The goals of the Drosophila Genome Center are to finish the sequence of the euchromatic genome of Drosophila melanogaster to high quality and to generate and maintain biological annotations of this sequence.
Allows users to search the NLM ChemIDplus database of over 370,000 chemicals. A user may enter compound identifiers such as Chemical Name, CAS Registry Number, Molecular Formula, Classification Code, Locator Code, and Structure or Substructure. New searchable features include search and display by Toxicity indicators such as Median Lethal Dose (LD50), by Physical/Chemical Properties such as LogP, and by Molecular Weight.
unites the expertise of five national laboratories—Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest—along with the HudsonAlpha Institute for Biotechnology to advance genomics in support of the DOE missions related to clean energy generation and environmental characterization and cleanup. JGI is operated by the University of California for the U.S. Department of Energy
Citations, some with abstracts or tables of contents, to literature on the ethical, legal, and social implications of genetic and genomic research and its applications from many disciplines and publication types including journals, newspapers, books, bills, laws, court decisions, reports, and audiovisuals.
an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.
provides a complete and up-to-date compilation of immunological and biological reagents and who sells them. Sources for antibodies, assays, cytokines, enzymes, recombinant proteins, tissues, and organs are included.
(from University of Pittsburgh Health Sciences Library)
provides quick access to the major bioinformatics databases, software tools, and related literature on the Web, in addition to HSLS-supported license-based resources. Each resource tab provides a federated search of a different pool of resources. Some resources will restrict access to authorized Pitt & UPMC users.
collects information on interacting sets of biomolecules involved in metabolic and signaling pathways, disease states, and other biological processes. BioSystems currently contains biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the EcoCyc (Escherichia coli K-12 MG1655) subset of the BioCyc databases and is designed to accommodate other data in the future. BioSystems records link to related literature, genes, protein sequences, structures, chemical data, to related BioSystems. When available each record links to detailed diagrams and annotations for individual pathways on the Web sites of the source databases.
Contains a collection of full-text books that can be searched online and that are linked to PubMed records through research paper citations within the text. The collection includes biomedical textbooks, other scientific titles, the NCBI News, and NCBI help manuals.
database of protein domains represented by sequence alignments and profiles for protein domains conserved in molecular evolution. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database. The source databases for Conserved Domains are Pfam, Smart, and COG
Database of Genotypes and Phenotypes) provides the results of studies that have investigated the interaction of genotype and phenotype including genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.
contains information about large-scale genomic variation, including large insertions, deletions, translocations and inversions. dbVar also provides associations of defined variants with phenotype information.
contains sequence records from the bulk EST (Expressed Sequence Tag) division of GenBank. These are typically short single-pass reads from cDNA libraries often generated as large survey project. Data from EST can be used to catalog expressed genes for a particular organ, tissue or cell type or general for a species, and compare expression levels of genes in various library sources
a searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information in Gene records includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
contains sequence and map data from the whole genomes of over 1000 species or strains. The genomes represent both completely sequenced genomes and those with sequencing in-progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
a database that provides the anatomical location of gene expression in the mouse brain using both in situ hybridization and transgenic mouse techniques on histological sections. The GENSAT records contain images that show the relative rates of transcription for each target gene in various regions of the brain.
database contains sequence records from the bulk GSS (Genome Survey Sequence) division of GenBank. These are the genomic equivalent of EST records; short single pass reads from gDNA libraries. Insert end and other reads from BAC and other large insert genomic libraries used to identify and assemble candidates for genome sequencing are common examples of GSS records.
contains automatically generated sets of homologous genes and their corresponding mRNA, genomic, and protein sequence data from selected eukaryotic organisms. Potential homologs from other organisms are included through sequence similarity to UniGene clusters.
The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
contains related nucleotide sequences that originate from comparative studies: phylogenetic, population, environmental (ecosystem), and mutational. Each record in the database is a set of nucleotide sequences representing the same molecule from the same species (population, mutation), different identifiable species (phylogenetic), or anonymous species from the same biological community (ecosystem).
a database of nucleic acid reagents designed for use in a wide variety of biomedical research applications including genotyping, gene expression studies, SNP discovery, genome mapping, and gene silencing. Probe records contain information on reagent distributors, probe effectiveness, and computed sequence similarities.
The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
a collection of related protein sequences (clusters) consisting of Reference Sequence proteins that are encoded by complete prokaryotic genomes as well those encoded eukaryotic organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.
a database that contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure.
contains unique, validated chemical structures (small molecules) that can be searched using names, synonyms or keywords. The compound records may link to more than one PubChem Substance record if different depositors supplied the same structure. Structures in PubChem Compounds are pre-clustered and cross-referenced by identity and similarity groups. Additionally, calculated properties and descriptors are available for searching and filtering of chemical structures. Compound records are linked to related PubChem Substance Records, PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay.
contains information on chemical substances including mixtures electronically submitted to PubChem by depositors. This includes any chemical structure information submitted, as well as chemical names, comments, and links to the depositor's web site.
database is a central repository for single nucleotide polymorphisms, microsatellites, and small-scale insertions and deletions. Both submitted SNPs and NCBI-produced non-redundant reference records (RefSNPs) that cluster reports of the same polymorphism from different sources are available. SNP also contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral polymorphisms and clinical mutations.
contains sequencing data from the next generation sequencing platforms. SRA accepts and presents data from all current next-generation sequencing platforms including 454 (Roche), Illumina, SOLiD (Applied Biosystems), HeliScope, and Complete Genomics. Data can include sequence, quality scores, color values, and intensity graphs depending on the platform involved.
The Structure or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). Structure records link to bibliographic information, the sequence databases, and to the NCBI taxonomy. Cn3D, the NCBI 3D structure viewer, allows for easy interactive visualization of molecular structures from Entrez.
database contains the names and phylogenetic lineages of the more than 160,000 organisms that have molecular data in the NCBI databases. New taxa are added to the Taxonomy database as data are deposited for them. The taxonomy records include links to all molecular data for the organism or group as well as links to outside classification resources. The taxonomy provides the major controlled vocabulary for classifying molecular data across the Entrez system.
database that provides automatically generated nonredundant sets (clusters) of transcript sequences, each cluster representing a distinct transcription locus (gene or expressed pseudogene). UniGene clusters also provide information on protein similarities, gene expression, cDNA clone reagents, and genomic location.
a service that allows searching for structural neighbors starting with a set of 3D-coordinates specified by the user. This service is meant to be used with newly determined protein structures that are not yet part of MMDB. Structure neighbors for proteins already in MMDB have been pre-computed and can simply be looked up from MMDB's Structure summary pages.
is a database of genes, inherited disorders and traits in animal species (other than human and mouse). The database contains textual information and references, as well as links to relevant records from OMIM, PubMed, and Gene.
OMIM ® , Online Mendelian Inheritance in Man ® . OMIM is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.
a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite.
PubGene helps you retrieve information on genes and proteins. The underlying structure of PubGene can be viewed as a "gene-centric" database. Gene and protein names are cross-referenced to each other and to terms that are relevant to understanding their biological function, importance in disease and relationship to chemical substances. The result is a "literature network" organizing information in a form that is easy to navigate.
allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make Protein Modelling accessible to all biochemists and molecular biologists worldwide.
This tool, also known as EMBL-Bank, constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.
This tool will take a DNA sequence and find the large, non-overlapping open reading frames using the E.coli genetic code and the sites for all Type II and commercially available Type III restriction enzymes that cut the sequence just once.