Look up records of Bionty entities#

Biological entities and their ontologies are complex, and a single entity often carries many different identifiers.

Here we show Bionty’s lookup model for species, genes, proteins and cell markers. You’ll see how to

  • access the reference table via .df()

  • look up an entity term via .lookup()

  • look up an entity term via .fuzzy_match()

import bionty as bt
✅ Created /home/runner/.lamin/bionty/versions/sources_local.yaml!

.fields: fields of an ontology reference#

gene_bionty = bt.Gene()

gene_bionty


Gene
Species: human
Source: ensembl, release-108

📖 Gene.df(): ontology reference table
🔎 Gene.lookup(): autocompletion of ontology terms
🎯 Gene.fuzzy_match(): fuzzy match against ontology terms
🧐 Gene.inspect(): check if identifiers are mappable
👽 Gene.map_synonyms(): map synonyms to standardized names
🔗 Gene.ontology: Pronto.Ontology object
gene_bionty.fields
{'description',
 'ensembl_gene_id',
 'gene_type',
 'hgnc_id',
 'id',
 'ncbi_gene_id',
 'omim_id',
 'symbol',
 'synonyms'}

Fields can be accessed as attributes for autocompletion:

(You can pass them to the field parameter in any bionty function instead of strings.)

gene_bionty.ncbi_gene_id
ncbi_gene_id
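
The idea of exposing field names as attributes can be sketched in a few lines. This is only an illustration, not Bionty's implementation: the `FieldNames` class is a hypothetical helper introduced here, and only the field names themselves are taken from the output above.

```python
class FieldNames:
    """Hypothetical helper: exposes each field name as an attribute."""

    def __init__(self, names):
        self._names = frozenset(names)
        for name in self._names:
            # Each attribute simply holds its own name as a string.
            setattr(self, name, name)

    def __contains__(self, name):
        return name in self._names


fields = FieldNames(["symbol", "ensembl_gene_id", "ncbi_gene_id", "hgnc_id"])

# Attribute access can be tab-completed and raises AttributeError on a typo,
# whereas a mistyped string would only be caught later, if at all.
chosen = fields.ncbi_gene_id
print(chosen)
```

The practical benefit is that an editor or interactive shell can complete `fields.` for you, so field names do not have to be typed from memory.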

.df(): reference table#

Data scientists love DataFrames, and every entity has a reference table containing all the fields.

df = gene_bionty.df()
df.head()
id ensembl_gene_id symbol gene_type description ncbi_gene_id hgnc_id omim_id synonyms version
0 Lzl9xt ENSG00000210049 MT-TF Mt_tRNA mitochondrially encoded tRNA-Phe (UUU/C) [Sour... None HGNC:7481 None MTTF|trnF Ens107
1 ILAWa7 ENSG00000211459 MT-RNR1 Mt_rRNA mitochondrially encoded 12S rRNA [Source:HGNC ... None HGNC:7470 None 12S|MOTS-c|MTRNR1 Ens107
2 XkyeQz ENSG00000210077 MT-TV Mt_tRNA mitochondrially encoded tRNA-Val (GUN) [Source... None HGNC:7500 None MTTV|trnV Ens107
3 jDD2jW ENSG00000210082 MT-RNR2 Mt_rRNA mitochondrially encoded 16S rRNA [Source:HGNC ... None HGNC:7471 None 16S|HN|MTRNR2 Ens107
4 J58H9b ENSG00000209082 MT-TL1 Mt_tRNA mitochondrially encoded tRNA-Leu (UUA/G) 1 [So... None HGNC:7490 None MTTL1|TRNL1 Ens107

To retrieve the records for several gene symbols at once, set the symbol column as the index and select the corresponding rows with pandas:

df.set_index("symbol").loc[["LMNA", "TCF7", "BRCA1"]]
id ensembl_gene_id gene_type description ncbi_gene_id hgnc_id omim_id synonyms version
symbol
LMNA 96RlDv ENSG00000160789 protein_coding lamin A/C [Source:HGNC Symbol;Acc:HGNC:6636] 4000 HGNC:6636 150330 CMD1A|HGPS|LGMD1B|LMN1|LMNL1|MADA|PRO1 Ens107
TCF7 sXCrmQ ENSG00000081059 protein_coding transcription factor 7 [Source:HGNC Symbol;Acc... 6932 HGNC:11639 189908 TCF-1 Ens107
BRCA1 9FY8yO ENSG00000012048 protein_coding BRCA1 DNA repair associated [Source:HGNC Symbo... 672 HGNC:1100 113705 BRCC1|FANCS|PPP1R53|RNF53 Ens107

.lookup(): Look up terms and records with autocompletion#

Terms can be searched with auto-complete using a lookup object.

lookup = gene_bionty.lookup()

Terms that are valid Python identifiers can be fetched directly with the dot (.) accessor:

lookup.TCF7
Gene(id='sXCrmQ', ensembl_gene_id='ENSG00000081059', symbol='TCF7', gene_type='protein_coding', description='transcription factor 7 [Source:HGNC Symbol;Acc:HGNC:11639]', ncbi_gene_id='6932', hgnc_id='HGNC:11639', omim_id='189908', synonyms='TCF-1', version='Ens107')

For terms that are not valid Python identifiers, use square brackets [] instead, which also support autocompletion:

lookup["ADGRL1-AS1"]
Gene(id='v68LyZ', ensembl_gene_id='ENSG00000267169', symbol='ADGRL1-AS1', gene_type='lncRNA', description='ADGRL1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:55309]', ncbi_gene_id='100507373', hgnc_id='HGNC:55309', omim_id=None, synonyms=None, version='Ens107')

By default, lookup keys are generated from the name field (for genes, the symbol, as in the examples above).

You can specify another field to look up:

lookup = gene_bionty.lookup(gene_bionty.hgnc_id)
lookup["HGNC:10478"]
Gene(id='AdmgUK', ensembl_gene_id='ENSG00000204231', symbol='RXRB', gene_type='protein_coding', description='retinoid X receptor beta [Source:HGNC Symbol;Acc:HGNC:10478]', ncbi_gene_id='6257', hgnc_id='HGNC:10478', omim_id='180246', synonyms='H-2RIIBP|NR2B2|RCoR-1|RXR-beta|RXRbeta', version='Ens107')
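
To make the behaviour of a lookup object more concrete, here is a minimal sketch of a container that supports dot access, bracket access, and keying by a chosen field. It is an assumption-laden illustration rather than Bionty's actual code: the `Lookup` class and its `field` argument are hypothetical, and the toy records reuse only the symbols and HGNC IDs shown above.

```python
from types import SimpleNamespace


class Lookup:
    """Hypothetical lookup object keyed by one field of each record."""

    def __init__(self, records, field="symbol"):
        # Map the chosen field's value to the full record, skipping missing values.
        self._by_key = {
            getattr(r, field): r for r in records if getattr(r, field) is not None
        }

    def __getitem__(self, key):
        # Bracket access works for any key, e.g. "ADGRL1-AS1" or "HGNC:10478".
        return self._by_key[key]

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key.startswith("_"):
            raise AttributeError(key)
        try:
            return self._by_key[key]
        except KeyError:
            raise AttributeError(key) from None

    def __dir__(self):
        # Advertise identifier-like keys so interactive shells can complete them.
        return [k for k in self._by_key if k.isidentifier()]


records = [
    SimpleNamespace(symbol="TCF7", hgnc_id="HGNC:11639"),
    SimpleNamespace(symbol="ADGRL1-AS1", hgnc_id="HGNC:55309"),
    SimpleNamespace(symbol="RXRB", hgnc_id="HGNC:10478"),
]

by_symbol = Lookup(records)
print(by_symbol.TCF7.hgnc_id)           # dot access for identifier-like keys
print(by_symbol["ADGRL1-AS1"].hgnc_id)  # bracket access for everything else

by_hgnc = Lookup(records, field="hgnc_id")
print(by_hgnc["HGNC:10478"].symbol)
```

Keys such as `ADGRL1-AS1` contain characters that Python does not allow in attribute names, which is why the sketch leaves them out of `__dir__` and serves them through brackets only.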

.fuzzy_match(): Look up a term via fuzzy matching#

celltype_bionty = bt.CellType()


celltype_bionty.fuzzy_match("cytotoxic T cells")
ontology_id definition synonyms children __ratio__
name
cytotoxic T cell CL:0000910 A Mature T Cell That Differentiated And Acquir... cytotoxic T-cell|cytotoxic T lymphocyte|cytoto... [] 96.969697
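
For intuition about the `__ratio__` column, the sketch below scores strings with the standard library's `difflib.SequenceMatcher`, whose ratio is twice the number of matched characters divided by the total length of both strings. This is a stand-in chosen for illustration: which scorer Bionty uses internally is not stated here, and the helper names `ratio` and `best_match` are hypothetical. For this particular query the stand-in happens to give the same value as the output above.

```python
from difflib import SequenceMatcher


def ratio(query, candidate):
    """Similarity in [0, 100]: 200 * matched characters / total characters."""
    return 100 * SequenceMatcher(None, query, candidate).ratio()


def best_match(query, names):
    """Return (name, score) for the highest-scoring name."""
    scored = ((name, ratio(query, name)) for name in names)
    return max(scored, key=lambda pair: pair[1])


# Toy list of term names; a real ontology has thousands of entries.
names = ["cytotoxic T cell", "T cell", "B cell", "PP cell"]

name, score = best_match("cytotoxic T cells", names)
print(name, round(score, 6))  # cytotoxic T cell 96.969697
```

Here 16 of the 17 query characters match the 16-character term, giving 2 * 16 / 33, or about 96.97.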

By default, fuzzy_match also matches against synonyms:

celltype_bionty.fuzzy_match("P cell")
ontology_id definition synonyms children __ratio__
name
nodal myocyte CL:0002072 A Specialized Cardiac Myocyte In The Sinoatria... P cell|cardiac pacemaker cell|myocytus nodalis [CL:1000409, CL:1000410] 100.0

You can turn off synonym matching with synonyms_field=None:

celltype_bionty.fuzzy_match("P cell", synonyms_field=None)
ontology_id definition synonyms children __ratio__
name
PP cell CL:0000696 A Cell That Stores And Secretes Pancreatic Pol... type F enteroendocrine cell [CL:0002680] 92.307692
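
The effect of synonym matching can be sketched as follows: each record is scored by its name and, optionally, by every entry in its pipe-separated synonyms string, and the best of those scores counts. This is a simplified illustration using `difflib` as a stand-in scorer. The functions `score_record` and `best` and the `use_synonyms` flag are hypothetical names, not Bionty parameters.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity in [0, 100] based on matched characters."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def score_record(query, record, use_synonyms=True):
    """Best score over the record's name and, optionally, its synonyms."""
    candidates = [record["name"]]
    if use_synonyms and record["synonyms"]:
        candidates += record["synonyms"].split("|")
    return max(ratio(query, c) for c in candidates)


# Two toy records copied from the outputs above.
records = [
    {"name": "nodal myocyte",
     "synonyms": "P cell|cardiac pacemaker cell|myocytus nodalis"},
    {"name": "PP cell", "synonyms": "type F enteroendocrine cell"},
]


def best(query, use_synonyms):
    """Name of the record with the highest score."""
    top = max(records, key=lambda r: score_record(query, r, use_synonyms))
    return top["name"]


print(best("P cell", use_synonyms=True))   # nodal myocyte
print(best("P cell", use_synonyms=False))  # PP cell
```

With synonyms enabled, the exact synonym "P cell" gives "nodal myocyte" a perfect score; without them, only names are compared and "PP cell" is the closest.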

Match against another field (default is “name”):

celltype_bionty.fuzzy_match("CD8+ alpha beta T cells", field=celltype_bionty.definition)
ontology_id name synonyms children __ratio__
definition
A T Cell That Expresses An Alpha-Beta T Cell Receptor Complex. CL:0000789 alpha-beta T cell alpha-beta T lymphocyte|alpha-beta T-cell|alph... [CL:0000791, CL:0000790] 75.0

Return all results ranked by matching ratios:

celltype_bionty.fuzzy_match("P cell", return_ranked_results=True).head()
ontology_id definition synonyms children __ratio__
name
nodal myocyte CL:0002072 A Specialized Cardiac Myocyte In The Sinoatria... P cell|cardiac pacemaker cell|myocytus nodalis [CL:1000409, CL:1000410] 100.000000
double-positive, alpha-beta thymocyte CL:0000809 A Thymocyte Expressing The Alpha-Beta T Cell R... DP thymocyte|DP cell|double-positive, alpha-be... [CL:0002428, CL:0002430, CL:0002427, CL:000242... 92.307692
PP cell CL:0000696 A Cell That Stores And Secretes Pancreatic Pol... type F enteroendocrine cell [CL:0002680] 92.307692
pigmented ciliary epithelial cell CL:0002303 A Cell That Is Part Of Pigmented Ciliary Epith... PE cell [] 92.307692
GIP cell CL:0002278 An Enteroendocrine Cell Of Duodenum And Jejunu... type K enteroendocrine cell [] 85.714286

Tied results are all returned:

celltype_bionty.fuzzy_match("A cell", synonyms_field=None)
ontology_id definition synonyms children __ratio__
name
T cell CL:0000084 A Type Of Lymphocyte Whose Defining Characteri... T-lymphocyte|T lymphocyte|T-cell [CL:0002419, CL:0000798, CL:0000789, CL:0002420] 83.333333
B cell CL:0000236 A Lymphocyte Of B Lineage That Is Capable Of B... B lymphocyte|B-cell|B-lymphocyte [CL:0009114, CL:0001201] 83.333333
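
Ranking and tie handling can be sketched in the same spirit: score every name, sort by score, and keep every name that shares the top score. As before, `difflib` is a stand-in scorer and the helpers `ranked` and `top_matches` are hypothetical names used only for this illustration.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity in [0, 100] based on matched characters."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def ranked(query, names):
    """All (name, score) pairs, best first; equal scores keep input order."""
    scored = [(name, ratio(query, name)) for name in names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_matches(query, names):
    """Every name whose score equals the best score."""
    results = ranked(query, names)
    best_score = results[0][1]
    return [name for name, score in results if score == best_score]


names = ["PP cell", "T cell", "nodal myocyte", "B cell"]

print(top_matches("A cell", names))  # ['T cell', 'B cell']
```

"T cell" and "B cell" each share five of six characters with the query, so both score 2 * 5 / 12, about 83.33, and are returned together.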