Guides: Research Portal: BioResearch Sources

BioResearch Tools

BLAST
Basic Local Alignment Search Tool

Provides access to a suite of sequence analysis programs for nucleotides, proteins, genomes, etc.
European Nucleotide Archive (ENA)

The European Nucleotide Archive (ENA) is an open, supported platform for the management, sharing, integration, archiving and dissemination of sequence data. It provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
Ensembl

The Ensembl project produces genome databases for vertebrates and other eukaryotic species.
NEBcutter 2.0
New England Biolabs

This tool takes a DNA sequence and finds the large, non-overlapping open reading frames using the E. coli genetic code, along with the sites for all Type II and commercially available Type III restriction enzymes that cut the sequence just once.
PlasMapper
Plasmid Mapper

This tool automatically generates and annotates plasmid maps using only the plasmid DNA sequence as input.
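
To illustrate the kind of single-cutter search the NEBcutter entry above describes, here is a minimal Python sketch. It is not NEBcutter's own method: it assumes a linear sequence and a tiny hand-picked table of three well-known Type II enzymes (EcoRI, BamHI, HindIII), and the names `RECOGNITION_SITES`, `count_sites`, and `single_cutters` are hypothetical, chosen only for this example.

```python
# Hypothetical, hand-picked table of palindromic Type II recognition sites.
# The real tool works from a far larger enzyme set; this is only a sketch.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(sequence, site):
    """Count (possibly overlapping) occurrences of site in a linear sequence."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def single_cutters(sequence, sites=RECOGNITION_SITES):
    """Return the sorted names of enzymes whose site occurs exactly once."""
    seq = sequence.upper()
    # The sites above are palindromic, so scanning one strand is sufficient.
    return sorted(
        name for name, site in sites.items() if count_sites(seq, site) == 1
    )


example = "ttgaattcaaggatccttggatccaa"
print(single_cutters(example))
```

In this example the sequence contains one EcoRI site and two BamHI sites, so only EcoRI is reported. Circular plasmids, non-palindromic sites, and ambiguous bases would need extra handling that this sketch leaves out.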

BioResearch Resources - A-NCBI GSS

Berkeley Drosophila Genome Project
The goals of the Drosophila Genome Center are to finish the sequence of the euchromatic genome of Drosophila melanogaster to high quality and to generate and maintain biological annotations of this sequence.

DOE Joint Genome Institute

unites the expertise of five national laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with the HudsonAlpha Institute for Biotechnology to advance genomics in support of the DOE missions related to clean energy generation and environmental characterization and cleanup. JGI is operated by the University of California for the U.S. Department of Energy.

GenBank
NIH Genetic Sequence Database

Annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2011 Jan;39(Database issue):D32-7).
GenEthx
Genetics and Ethics Database

Citations, some with abstracts or tables of contents, to literature on the ethical, legal, and social implications of genetic and genomic research and its applications from many disciplines and publication types including journals, newspapers, books, bills, laws, court decisions, reports, and audiovisuals.
GenScan

This server provides access to the program Genscan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
Haz-Map: Occupational Exposure to Hazardous Agents

Haz-Map® (Copyright © 2000-2002) is an occupational toxicology database designed to link jobs to hazardous job tasks which are linked to occupational diseases and their symptoms. It is a relational database of chemicals, jobs and diseases.
InterPro

an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.
KEGG
Kyoto Encyclopedia of Genes and Genomes

an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information.
Linscott's Directory of Immunological and Biological Reagents

provides a complete and up-to-date compilation of immunological and biological reagents and who sells them. Sources for antibodies, assays, cytokines, enzymes, recombinant proteins, tissues, and organs are included.
Molecular Biology Search

(from University of Pittsburgh Health Sciences Library) provides quick access to the major bioinformatics databases, software tools, and related literature on the Web, in addition to HSLS-supported license-based resources. Each resource tab provides a federated search of a different pool of resources. Some resources will restrict access to authorized Pitt & UPMC users.
Mouse Genome Informatics

the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
NCBI Homepage
National Center for Biotechnology Information

The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. Access all NCBI resources from this page.
NCBI Bookshelf

Contains a collection of full-text books that can be searched online and that are linked to PubMed records through research paper citations within the text. The collection includes biomedical textbooks, other scientific titles, the NCBI News, and NCBI help manuals.
NCBI Conserved Domains

database of protein domains conserved in molecular evolution, represented by sequence alignments and profiles. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database. The source databases for Conserved Domains are Pfam, SMART, and COG.
NCBI dbGaP
Database of Genotypes and Phenotypes

provides the results of studies that have investigated the interaction of genotype and phenotype, including genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits.
NCBI dbVar
Database of Genomic Structural Variation

contains information about large-scale genomic variation, including large insertions, deletions, translocations and inversions. dbVar also provides associations of defined variants with phenotype information.
NCBI EST
Expressed Sequence Tag

contains sequence records from the bulk EST (Expressed Sequence Tag) division of GenBank. These are typically short single-pass reads from cDNA libraries, often generated as part of large survey projects. EST data can be used to catalog expressed genes for a particular organ, tissue, or cell type, or generally for a species, and to compare expression levels of genes across library sources.
NCBI Gene

a searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information in Gene records includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
NCBI Genome

contains sequence and map data from the whole genomes of over 1000 species or strains. The genomes represent both completely sequenced genomes and those with sequencing in-progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
GENSAT
Gene Expression Nervous System Atlas

is a database that provides the anatomical location of gene expression in the mouse brain using both in situ hybridization and transgenic mouse techniques on histological sections. The GENSAT records contain images that show the relative rates of transcription for each target gene in various regions of the brain.
NCBI GEO DataSets

stores curated gene expression and molecular abundance data sets assembled by NCBI from the Gene Expression Omnibus (GEO) repository of microarray data.
NCBI GEO Profiles

is a database that stores individual gene expression and molecular abundance profiles assembled from the Gene Expression Omnibus (GEO) repository of microarray data.
NCBI GSS

contains sequence records from the bulk GSS (Genome Survey Sequence) division of GenBank. These are the genomic equivalent of EST records: short single-pass reads from gDNA libraries. Common examples of GSS records are insert-end and other reads from BAC and other large-insert genomic libraries, which are used to identify and assemble candidates for genome sequencing.

BioResearch Resources - NCBI HomoloGene-Z

NCBI HomoloGene

contains automatically generated sets of homologous genes and their corresponding mRNA, genomic, and protein sequence data from selected eukaryotic organisms. Potential homologs from other organisms are included through sequence similarity to UniGene clusters.
NCBI Nucleotide

The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
NCBI PopSet

contains related nucleotide sequences that originate from comparative studies: phylogenetic, population, environmental (ecosystem), and mutational. Each record in the database is a set of nucleotide sequences representing the same molecule from the same species (population, mutation), different identifiable species (phylogenetic), or anonymous species from the same biological community (ecosystem).
NCBI Protein

The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
NCBI Protein Clusters

a collection of related protein sequences (clusters) consisting of Reference Sequence proteins that are encoded by complete prokaryotic genomes, as well as those encoded by eukaryotic organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.
NCBI PubChem Bioassay

a database that contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure.
NCBI PubChem Compound

contains unique, validated chemical structures (small molecules) that can be searched using names, synonyms or keywords. A compound record may link to more than one PubChem Substance record if different depositors supplied the same structure. Structures in PubChem Compound are pre-clustered and cross-referenced by identity and similarity groups. Additionally, calculated properties and descriptors are available for searching and filtering chemical structures. Compound records are linked to related PubChem Substance records, PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay.
NCBI PubChem Substance

contains information on chemical substances including mixtures electronically submitted to PubChem by depositors. This includes any chemical structure information submitted, as well as chemical names, comments, and links to the depositor's web site.
NCBI SNP
Single Nucleotide Polymorphism

database is a central repository for single nucleotide polymorphisms, microsatellites, and small-scale insertions and deletions. Both submitted SNPs and NCBI-produced non-redundant reference records (RefSNPs) that cluster reports of the same polymorphism from different sources are available. SNP also contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral polymorphisms and clinical mutations.
NCBI SRA
Sequence Read Archive

contains sequencing data from next-generation sequencing platforms. SRA accepts and presents data from all current next-generation sequencing platforms, including 454 (Roche), Illumina, SOLiD (Applied Biosystems), HeliScope, and Complete Genomics. Data can include sequence, quality scores, color values, and intensity graphs, depending on the platform involved.
NCBI Structure

The Structure or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). Structure records link to bibliographic information, the sequence databases, and to the NCBI taxonomy. Cn3D, the NCBI 3D structure viewer, allows for easy interactive visualization of molecular structures from Entrez.
NCBI Taxonomy

database contains the names and phylogenetic lineages of the more than 160,000 organisms that have molecular data in the NCBI databases. New taxa are added to the Taxonomy database as data are deposited for them. The taxonomy records include links to all molecular data for the organism or group as well as links to outside classification resources. The taxonomy provides the major controlled vocabulary for classifying molecular data across the Entrez system.
NCBI VAST
Vector Alignment Search Tool

a service that allows searching for structural neighbors starting with a set of 3D-coordinates specified by the user. This service is meant to be used with newly determined protein structures that are not yet part of MMDB. Structure neighbors for proteins already in MMDB have been pre-computed and can simply be looked up from MMDB's Structure summary pages.
Nucleic Acids Research Database Collection

1330 carefully selected molecular biology databases
OMIA
Online Mendelian Inheritance in Animals

is a database of genes, inherited disorders and traits in animal species (other than human and mouse). The database contains textual information and references, as well as links to relevant records from OMIM, PubMed, and Gene.
OMIM
Online Mendelian Inheritance in Man

OMIM®, Online Mendelian Inheritance in Man®, is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.
PDB
Protein Data Bank

The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies.
Pfam: database of protein families

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
PIR
Protein Information Resource

an integrated public bioinformatics resource to support genomic, proteomic and systems biology research and scientific studies.
PROSITE
Database of Protein Families

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.
SMART
Simple Modular Architecture Research Tool

allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
SWISS-MODEL

is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make protein modelling accessible to all biochemists and molecular biologists worldwide.