HGVbaseG2P Help


Click the link to go to the help area of interest.

Database 'How to...' Guide
Reference Documents
Definitions/Glossary

This Help page provides 'How To' mini-tutorials on using the website, the reference documents referred to in the Help text, and a glossary that defines key terms used throughout the HGVbaseG2P website and describes specific parts of the site.

DATABASE 'HOW TO...' GUIDE

How to Understand the Database Content

HGVbaseG2P is built upon a basal layer of Markers that comprises all known SNPs and other variants from public databases such as dbSNP and the DBGV.

Allele and genotype frequency data, plus genetic association significance findings, are added on top of the Marker data, and organised in the same way that investigations are reported in typical journal manuscripts. Critically, no individual level genotypes or phenotypes are presented in HGVbaseG2P - only group level aggregated (summary level) data.
The largest unit in a data submission is a Study, which can be thought of as being equivalent to one journal article. A Study may contain one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. Sample Panels may be characterised in terms of various Phenotypes, and they may also be combined and/or split into Assayed Panels. The Assayed Panels are used as the basis for reporting allele/genotype frequencies (in 'Genotype Experiments') and/or genetic association findings (in 'Analysis Experiments'). Environmental factors are handled as part of the Sample Panel and Assayed Panel data structures.
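
The containment relationships described above can be sketched in a few lines of Python. This is only an illustration of how the units relate to one another: the class and field names are assumptions made for this example and are not taken from the HGVbaseG2P Object Model or its database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phenotype:
    name: str


@dataclass
class SamplePanel:
    name: str
    # Phenotypes in terms of which this panel is characterised.
    phenotypes: List[Phenotype] = field(default_factory=list)


@dataclass
class AssayedPanel:
    name: str
    # Sample Panels this Assayed Panel was split from and/or merged across.
    source_panels: List[SamplePanel] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    kind: str  # "genotype" (frequencies) or "analysis" (association findings)
    assayed_panels: List[AssayedPanel] = field(default_factory=list)


@dataclass
class Study:
    title: str
    sample_panels: List[SamplePanel] = field(default_factory=list)
    phenotypes: List[Phenotype] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)


# Hypothetical content, invented purely for illustration.
hypertension = Phenotype("hypertension")
cohort = SamplePanel("cohort", [hypertension])
cases = AssayedPanel("cases", [cohort])
controls = AssayedPanel("controls", [cohort])
study = Study(
    title="Example study",
    sample_panels=[cohort],
    phenotypes=[hypertension],
    experiments=[
        Experiment("frequencies", "genotype", [cases, controls]),
        Experiment("association", "analysis", [cases, controls]),
    ],
)
```

Here one Sample Panel is split into two Assayed Panels, and both Experiments report on those Assayed Panels rather than on the Sample Panel directly.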

How to Explore HGVbaseG2P One Study at a Time

First, locate a single Study of interest, by using one of these strategies:
1. use the simple search function in the top right hand corner of every page, or
2. use the search boxes on the Home Page to look for matching keywords amongst the Studies, Phenotypes, or Markers sections of the database, or
3. go directly to the Studies, Phenotypes, or Markers sections of the database by clicking on the appropriate tab, and then use the functions provided there to identify a Study you are interested in.
Second, click on the name of the Study you wish to explore in more detail, and you will be taken to that Study page (and the web-path to that Study will appear in the 'breadcrumb' text below the main database tabs).
Third, on the Study page you will see five new tabs, which you can click to reach the Study's detailed content: Summary, Panels (Sample Panels), Phenotypes, Genotype Experiments, and Analysis Experiments.
Fourth, from the Genotype Experiments and the Analysis Experiments views into the Study you will find links to the detailed frequency and association (p-value) datasets, links to a whole-genome graphical view of the Study, and links to see the information on one or more genome browsers.

How to Extract HGVbaseG2P Content from Many Studies Together

BioMart is an open source solution that enables complex searches of databases and the generation of large and multifaceted result sets. This powerful system also lends itself well to the development of GRID-based searching across databases, which is a direction in which HGVbaseG2P will steadily move. More information on the BioMart platform, together with general help texts, is available from the BioMart project website.

HGVmart is our adaptation of BioMart, designed to support 'power-users' of HGVbaseG2P. One key improvement we have made is a special 'compact' formatting option, via which each Marker's multiple allele-specific result sets are compressed into single-line summaries. Using HGVmart involves six steps:

First, use the 'Generate Data Exports' link on the Home page, or click on the HGVmart tab, to get to the HGVmart interface.
Second, click on the 'New' button, and by using the input boxes on the right choose the database 'HGVmart' (equivalent to the whole of HGVbaseG2P), and choose the required dataset ('Marker' if you want to search based upon particular Marker characteristics, or 'G2P Study' for searches based upon Study related content).
Third, click on the word 'Filters' on the left panel, and use the options in the right panel to set up your search parameters.
Fourth, click on the word 'Attributes' on the left panel, and select from the right panel the type of output you wish to generate ('Study', 'Allele Frequency', 'Genotype Frequency', 'Allele Associations', 'Genotype Associations').
Fifth, use the options in the right panel to define precisely which data fields you wish to output.
Sixth, click on the 'Results' button, and then use the options in the right panel both to specify your required output format and to export the data. When producing results for 'Allele Associations' or 'Genotype Associations', you can output them in a 'compact' way by selecting one of the options prefixed with 'COMPACT' from the output drop-down box.
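
To give a feel for what 'compact' output means, the following Python sketch collapses several allele-specific rows per Marker into one line per Marker. It is an illustration of the idea only: the marker names, the column layout, and the `allele:p-value` separator format are assumptions made for this example, not the actual layout produced by the 'COMPACT' options.

```python
from collections import defaultdict

# Hypothetical allele-specific association rows: (marker, allele, p_value).
rows = [
    ("MARKER_A", "A", 0.03),
    ("MARKER_A", "G", 0.03),
    ("MARKER_B", "C", 0.5),
    ("MARKER_B", "T", 0.41),
]


def compact(rows):
    """Collapse allele-specific rows into one tab-separated line per marker."""
    grouped = defaultdict(list)
    for marker, allele, p_value in rows:
        grouped[marker].append(f"{allele}:{p_value}")
    # Dictionaries keep insertion order, so markers appear as first seen.
    return [f"{marker}\t{'|'.join(parts)}" for marker, parts in grouped.items()]


compact_lines = compact(rows)
for line in compact_lines:
    print(line)
```

Four input rows become two output lines, one per Marker, which is the kind of reduction that makes large exports easier to scan.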

How to Submit Data

All data submitted to HGVbaseG2P will remain the property of the data generators and/or submitters, and all records will be presented in the database with links and acknowledgements leading back to the original data source. Any users who wish to obtain non-aggregated data will be instructed to make suitable requests to the relevant submitter and their data access authorities.

Data can be submitted with embargo dates or conditions attached. We will still process such datasets immediately to ensure the submission is complete and useable, but we will not release the submitted data to the public until instructed to do so.

When submitting genetic association data and/or allele/genotype frequency data to HGVbaseG2P, we require that the utilised Markers are all present in a major public marker/variation database (e.g., dbSNP). If this is not the case, we can assist you in depositing the Markers into a suitable database.

To submit genetic association and/or allele/genotype frequency data, please gather together the required information as specified in the Submission Guidance Notes and paste it into the Submission Data Template form for submission. Each submission will equate to one Study in HGVbaseG2P, but each Study (i.e., each submission) can include one or more Experiments.

Note: for the future, we are devising a standalone software tool that submitters will be able to download and install locally, which will actively guide them through the process of gathering and checking their data before submitting it. The tool will organise submission content into an XML formatted document that is stored on the submitter's hard disk, gather related information from sites across the internet (e.g., journal citation details, Marker IDs, and Allele specifications), and check for any inconsistencies in the total submission. This will make it simpler for users to assemble and check their submissions with care at their own pace, with the added benefit that they will be able to reuse components (e.g., assay details, clinical materials, and phenotype descriptions) from earlier submissions.

Questions on making submissions should be directed to:

How to View Your Data in the Ensembl Browser

Our DAS service has been suspended for the foreseeable future. Apologies for any inconvenience.

REFERENCE DOCUMENTS

HGVbaseG2P Nomenclature System

We have devised a completely new HGVbaseG2P Nomenclature System to ensure consistent and unambiguous presentation of alleles and genotypes. The system caters not only for simple sequence alleles and traditional presence/absence genotypes, but also copy-number variants and somatic variants, as well as quantitative and ratio classes of genotypes. It also offers a robust way to represent long alleles.

HGVbaseG2P Object Model v1.0

HGVbaseG2P data is organised into a series of relational database tables, a graphical overview of which is provided as a Data Model Diagram. The detailed structure of these tables is available in the form of a MySQL Relational Schema Definition.

Submitting data to HGVbaseG2P

Instructions on how to submit datasets to HGVbaseG2P are provided in the Submission Guidance Notes. To help you assemble your data correctly, we provide a Submission Data Template.

DEFINITIONS/GLOSSARY

What is a Study?

A Study in HGVbaseG2P is similar in scope to a journal article, comprising information relevant to a given research question or set of related questions. Data and analysis results from a study are grouped into one or more Experiments. The main fields in a Study entry are: Title, Abstract, Background, Objectives, KeyResults, Conclusions, StudyDesign, StudySizeReason, StudyPower, SourcesOfBias, Limitations, Acknowledgements, and SubmissionDate.

What is an Experiment?

Experiments in HGVbaseG2P are packages of information that address one discrete research question, and they are divided into 'Genotype Experiments' (providing summaries of allele/genotype frequencies in Assayed Panels) and 'Analysis Experiments' (providing summaries of genetic association findings in Assayed Panels). An Experiment may include data for any number of Markers and any number of Assayed Panels, but an Analysis Experiment will address no more than one Phenotype question. By way of example, a Genotype Experiment might summarise the use of a few sets of case and control Assayed Panels to explore the role of a gene or a genome region in predisposition to a specific disease, or it might comprise the results of a genome-wide association study. The main fields in an Experiment entry are: Objective, Outcome, and Comments.
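
As an illustration of the kind of group-level summary a Genotype Experiment reports, the Python sketch below derives genotype and allele frequencies from genotype counts in a single Assayed Panel. It assumes diploid genotypes, each given as a pair of alleles, and uses invented counts. It is not a description of how submitters or HGVbaseG2P actually compute these figures.

```python
from collections import Counter


def genotype_frequencies(genotype_counts):
    """Fraction of tested individuals carrying each genotype."""
    total = sum(genotype_counts.values())
    return {genotype: n / total for genotype, n in genotype_counts.items()}


def allele_frequencies(genotype_counts):
    """Fraction of all observed allele copies accounted for by each allele.

    Each diploid genotype contributes two allele copies per individual.
    """
    allele_counts = Counter()
    for (allele_1, allele_2), n in genotype_counts.items():
        allele_counts[allele_1] += n
        allele_counts[allele_2] += n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


# Hypothetical counts for one Marker in one Assayed Panel of 100 individuals.
genotype_counts = {("A", "A"): 40, ("A", "G"): 40, ("G", "G"): 20}

genotype_freqs = genotype_frequencies(genotype_counts)
allele_freqs = allele_frequencies(genotype_counts)
```

With these counts, allele A accounts for 120 of the 200 allele copies and allele G for the remaining 80. Only aggregated figures of this kind, never the individual genotypes behind them, are presented in the database.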

What is a Sample Panel?

A Sample Panel in HGVbaseG2P is a set of test subjects that are collected together and grouped into a named compilation to address some phenotype of interest. Typically, all the individuals in a Sample Panel are annotated in terms of one or more related Phenotypes, or share some commonality of another key metric (e.g., age, gender, ethnicity). Sample Panels may or may not be equivalent to the eventual groupings that are used as the basis for examining and reporting Experiment data, i.e., the Assayed Panels.

What is an Assayed Panel?

An Assayed Panel in HGVbaseG2P is a set of test subjects that are grouped into a named compilation, and used as the basis for examining and reporting Experiment data. Each Assayed Panel is derived from one or more Sample Panels (by splitting them into subsets and/or merging across Sample Panels) on the basis of some explicit phenotype criterion (such as presence/absence of a Phenotype, or a Phenotype value beyond some inclusion threshold).
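
The merging and splitting described above can be sketched as follows. The example works on individual-level records, which exist only on the data generator's side and are never stored in HGVbaseG2P. The panel names, subject identifiers, measurements, and threshold are all hypothetical.

```python
# Hypothetical individual-level records held by the data generator:
# each Sample Panel is a list of (subject_id, systolic_blood_pressure).
sample_panels = {
    "clinic_a": [("a1", 150), ("a2", 118), ("a3", 162)],
    "clinic_b": [("b1", 121), ("b2", 145)],
}

THRESHOLD = 140  # hypothetical inclusion threshold for the "cases" panel


def derive_assayed_panels(sample_panels, threshold):
    """Merge all Sample Panels, then split on a phenotype threshold."""
    merged = [subject for panel in sample_panels.values() for subject in panel]
    cases = [sid for sid, value in merged if value >= threshold]
    controls = [sid for sid, value in merged if value < threshold]
    return {"cases": cases, "controls": controls}


assayed_panels = derive_assayed_panels(sample_panels, THRESHOLD)
panel_sizes = {name: len(members) for name, members in assayed_panels.items()}
```

Two Sample Panels are merged and then split into two Assayed Panels on an explicit phenotype criterion. Only summary information about the resulting groups, such as their sizes, would be reported.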

What is a Phenotype?

A Phenotype in HGVbaseG2P is a reported characteristic or trait of interest, such as blood pressure. Phenotype information is organised into three sub-components: the 'Phenotype Property', which represents the concept of the trait under study; the 'Phenotype Method', which describes how the Phenotype Property was measured; and the 'Phenotype Value', which is a particular observation/result produced by measuring the Phenotype Property. Schemalet examples of this are available at the PaGE-OM website.
This system is very straightforward to use for the representation of ordinal or nominal Phenotype Values. To solve the problem of presenting quantitative Phenotype Values in a group of individuals (i.e., a Sample Panel or an Assayed Panel), HGVbaseG2P stores various statistics that define the group's distribution (e.g., mean, max, min, standard deviation). HGVbaseG2P does not store Phenotype information for single individuals.
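
A minimal Python sketch of such a group-level summary is shown below. The measurements are invented, and the choice of the sample standard deviation (rather than the population form) is an assumption, since this text does not say which is stored.

```python
import statistics


def summarise(values):
    """Describe a group's quantitative Phenotype Values without listing them."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "sd": statistics.stdev(values),
    }


# Hypothetical systolic blood pressure readings for one panel.
values = [118.0, 121.0, 145.0, 150.0, 162.0]
summary = summarise(values)
```

The resulting dictionary describes the distribution of the whole panel while revealing nothing about which individual had which value.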

What is a Marker?

In HGVbaseG2P we define a Marker as: "A DNA sequence for which identical or highly similar instances exist at one or more locations in a genome. Markers are typically used as the basis for designing an experimental assay for detection of those instances of that sequence". The range of Markers available in HGVbaseG2P is extensive, including the complete Marker content from other public depositories such as dbSNP, UniSTS, and DBGV.

What is a Genotype?

In HGVbaseG2P we define a Genotype as: "A qualitative or quantitative combination of alleles of one or more Markers or DNA regions, implied (by the result of running a genotyping assay) to be resident at one or more positions in the genome of a tested DNA sample". This definition thus focuses on the genotyping result and not absolute reality, i.e., detected genotypes may not always reflect the true status of the genome, since some assays are flawed in their design or application, and some DNA samples may be inaccurately genotyped. This definition also allows for haplotype genotypes, MarkerSet genotypes (composite Marker signals), and genotype classes that are something other than simple presence/absence detections. Specifically, we must also cater for copy-number variation and somatic variation, which implies quantitative and ratio genotypes will need to be supported. A new HGVbaseG2P Nomenclature System for genotypes has been devised, to help manage these various complexities.

What is an Allele?

In HGVbaseG2P we define an Allele as: "A specific version of a set of different sequence alternatives of a Marker or DNA region resident at one or more locations in a genome". To minimise confusion when referring to Alleles, HGVbaseG2P always presents Alleles in the context of their immediate flanking DNA sequences, and a new HGVbaseG2P Nomenclature System for Alleles has been devised.