About 'Gene Curation DS'

We have generated a dataset named 'Gene Curation DS', which is dedicated to annotating crucial information related to the gene functions listed below in plant, animal, and protozoan species.

In Plantae, the following data were created for 189 species. Of the total of 231 genome datasets, 174 (representing 132 species) were sourced from Plant GARDEN, while 58 species were acquired from NCBI Genome.

  • Functionally important genes
    Gene functions described in the literature were curated by matching the gene names appearing in the literature to the gene IDs defined in genome sequences. In addition, the Gene Ontology (GO), Trait Ontology (TO), Plant Ontology (PO), and Plant Experimental Conditions Ontology (PECO) terms included in the sentences were extracted (112 species). The primer sets (forward and reverse primers) of the genes were also curated (63 species).
  • Resistance Gene Analogs (RGAs; candidate R-genes)
    RGAs, which are considered potential resistance genes (R-genes) that act against pathogens by producing R proteins, were searched for in genome sequences (134 species).
  • Genes applied to patents
    Genes included in patent applications registered in the DNA Data Bank of Japan (DDBJ) PAT database were searched for in genome sequences (152 species).
  • Gene functions predicted by similarity searches
    Gene functions were predicted by similarity searches against UniProtKB (the UniProt knowledgebase, which provides a comprehensive, high-quality, and freely accessible resource for protein sequences and functional information), Araport (a complete annotation of Arabidopsis thaliana), and EggNOG (a hierarchical, functionally and phylogenetically annotated orthology resource) (188 species). Non-coding RNAs including microRNAs were searched for in genome sequences against Rfam (128 species).
  • Orthologous genes
    Orthologous groups of genes among 136 plant species were identified by clustering with Proteinortho. The genes were classified into orthologous groups and singletons.
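
The split into orthologous groups and singletons can be illustrated with a minimal Python sketch. It is not the pipeline used for this dataset and does not read Proteinortho output. The function name, the input shapes, and the gene IDs are hypothetical, and it assumes that a singleton is any gene that does not fall into a cluster of two or more genes.

```python
from typing import Iterable, List, Set, Tuple


def split_groups_and_singletons(
    all_genes: Iterable[str],
    clusters: Iterable[Iterable[str]],
) -> Tuple[List[Set[str]], Set[str]]:
    """Return (orthologous groups, singletons).

    Assumption: a cluster counts as an orthologous group only if it
    contains at least two distinct genes; every other gene is a singleton.
    """
    groups = [set(c) for c in clusters]
    groups = [g for g in groups if len(g) >= 2]
    # Union of all genes that ended up in some group.
    grouped = set().union(*groups)
    singletons = set(all_genes) - grouped
    return groups, singletons


# Hypothetical gene IDs from three species (a, b, c).
genes = ["a1", "a2", "b1", "c1"]
clusters = [["a1", "b1"], ["c1"]]
groups, singletons = split_groups_and_singletons(genes, clusters)
```

In this toy input, a1 and b1 form one group, while a2 (never clustered) and c1 (alone in its cluster) are reported as singletons.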

In Animalia and Protozoa, the third and fourth data types described above (genes applied to patents and gene functions predicted by similarity searches) were created for 185 and 58 species, respectively.

These curated data were summarized in Google Spreadsheets and in TogoDB, which is provided by the Database Center for Life Science (DBCLS) in Japan.


When viewed on a smartphone, species information will be displayed at the bottom of the page.


In the curation of useful genes, the gene names and their primer sequences were curated manually from the literature. The sentences containing the gene names were extracted using PubTator, and the correspondence between the gene names in the literature and those in the sentences was assigned automatically by in-house scripts. Therefore, there may be errors in the correspondence. If you find incorrect information, please contact gcds[at].kazusa.or.jp.
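
To show why automatic assignment of gene names to sentences can go wrong, here is a minimal Python sketch of one possible matching approach. It is not the in-house script mentioned above. The function name, the gene IDs, and the matching rule (case-insensitive, bounded by non-alphanumeric characters) are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def match_gene_names(
    sentence: str, name_to_id: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (gene name, gene ID) pairs whose name occurs in the sentence.

    Assumption: a name matches only when it is not directly preceded or
    followed by a letter or digit, so "FLC" does not match inside "FLC1".
    """
    hits = []
    for name, gene_id in name_to_id.items():
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append((name, gene_id))
    return hits


# Hypothetical curated names and genome gene IDs.
name_to_id = {"FT": "GENE_0001", "FLC": "GENE_0002"}
sentence = "Expression of FT was reduced, whereas FLC1 was unchanged."
hits = match_gene_names(sentence, name_to_id)
```

Even a simple rule like this can mis-assign short or ambiguous gene names, for example when a name coincides with an ordinary word or abbreviation, which is one way that errors in the correspondence might arise.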