Help:

Comparative Search:

The Comparative Search page allows you to perform several different statistical analyses on the microarray expression data across different subgroups: Never Smokers, Former Smokers and Current Smokers.

The statistical tests currently supported are: the T-Test, R-Test (Pearson Correlation), Wilcoxon Rank Test and Spearman Correlation.

In Section 1 of the Compsearch page, any one of these tests can be selected, along with a specific pair of patient subgroups to compare. For the Pearson and Spearman tests, several variables, including age, smoking pack-years and a range of lung function parameters, can be selected for correlation analysis.

In Sections 2 and 3, significance thresholds can be set for the Correlation tests (R-Test and Spearman) and the Mean Value Comparison tests (T-Test and Wilcoxon) respectively. For the Mean Value Comparison analyses, a P-Value threshold can be set so that only genes whose subgroup comparison results are more significant than the threshold are displayed. Section 3 also allows you to specify which column the threshold applies to: the raw P-Value or the adjusted Q-Value. Q-Values correct for the multiple comparison problem; they are the raw P-Values adjusted using the Q-value software by Storey et al.

For Correlation analyses, a Coefficient Value threshold can be set along with the P-Value threshold. The Coefficient value measures the strength of correlation between two variables: values closer to 1 or -1 indicate stronger positive or negative relationships respectively (linear for Pearson, monotonic for Spearman). For the Correlation tests, the default P-Value threshold is 0.05 and the default Correlation Coefficient threshold is 0.4. For the T-Test and Wilcoxon test, the default P-Value threshold is 0.001.
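The thresholds described above can be pictured with a short Python sketch. This is only an illustration and not the code used by the site: the function names, the gene identifiers and the numbers are hypothetical, and it assumes that the Coefficient threshold is applied to the absolute value of the coefficient and that "better than the threshold" means strictly smaller.

```python
def passes_correlation_thresholds(coefficient, p_value,
                                  p_threshold=0.05, coef_threshold=0.4):
    # Assumption: the coefficient threshold is compared against the absolute
    # value, so strong negative correlations are kept as well.
    return p_value < p_threshold and abs(coefficient) >= coef_threshold


def passes_mean_comparison_threshold(p_value, q_value,
                                     column="p", threshold=0.001):
    # `column` chooses whether the raw P-Value ("p") or the adjusted
    # Q-Value ("q") is compared against the threshold.
    value = p_value if column == "p" else q_value
    return value < threshold


# Hypothetical correlation results: (gene id, coefficient, P-Value).
results = [
    ("g1", 0.62, 0.01),
    ("g2", -0.55, 0.03),
    ("g3", 0.20, 0.001),   # significant, but the correlation is weak
    ("g4", 0.70, 0.20),    # strong correlation, but not significant
]

kept = [gene for gene, coef, p in results
        if passes_correlation_thresholds(coef, p)]
print(kept)
```

With the default thresholds, only genes that satisfy both conditions are kept. In the second function, a gene can pass on its raw P-Value and still fail once the threshold is applied to the Q-Value column.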

In Section 4, search results can be filtered to display only genes that pass the above statistical thresholds AND whose GO identifiers (set by the Gene Ontology Consortium) include the specified keywords, e.g. DNA repair, Cell Cycle or Apoptosis.

Finally, in Section 5, search results can be sorted by various parameters (for example Minimum P-Value, Maximum Correlation Coefficient, Fold Change or Gene Chromosomal Location) so that they are displayed in a meaningful order.

Advanced Search:

The Advanced Search page allows you to use the schema provided to formulate specific, complex SQL queries on the database. For online documentation on how to construct SQL queries, refer to the MySQL documentation site. Please note that only "Select" queries are allowed and that the ";" character is not necessary at the end of the SQL query statement.
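As a rough illustration of the kind of "Select" query meant here, the Python sketch below runs one against a small in-memory SQLite database. The table name, column names and rows are invented for this example; the real table and column names are those shown in the schema on the Advanced Search page, and the site itself uses MySQL rather than SQLite.

```python
import sqlite3

# Hypothetical stand-in table; not the real schema of this database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_demo (patient_id TEXT, age INTEGER, smoking_status TEXT)"
)
conn.executemany(
    "INSERT INTO patient_demo VALUES (?, ?, ?)",
    [
        ("P001", 45, "Never"),
        ("P002", 61, "Current"),
        ("P003", 52, "Current"),
        ("P004", 38, "Current"),
        ("P005", 70, "Former"),
    ],
)

# A read-only "Select" query, written without a trailing ";".
query = (
    "SELECT patient_id, age FROM patient_demo "
    "WHERE smoking_status = 'Current' AND age > 50 "
    "ORDER BY age"
)
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Only the text held in `query` corresponds to what you would type into the Advanced Search form; everything else just sets up sample data so the example can run on its own.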

Quick Info:

The Quick Info Search page allows you to quickly obtain a complete data readout of all the Patient Information or Sample tables. These include the Demographic, Smoking History, Lung Function, Diagnosis and Sample Information tables. Alternatively, you can use the second available option on this page to submit a list of Patient IDs and obtain the specified patient information for only the selected patients. In addition, the Quick Info page allows you to quickly determine which patient samples fall under which patient subgroup categories by selecting the "Patient/Sample Class Info" option from the first drop-down menu.

Gene Reference:

The Gene Reference Search page allows you to search the statistical test results (from the T-Test, Wilcoxon Rank Test, Pearson Correlation or Spearman Correlation) for genes that you specify as being of interest.

In addition, this page acts as a search engine that returns the Affymetrix ID for any gene based on a short user-specified keyword description of the gene. This is the second option on the page and can be run independently of the first (Statistical Result search) option. Finally, the third option retrieves the expression levels across all samples for any given gene.

Filtered Data Download:

The Filtered Data Download page allows you to download a file containing the complete statistical results for each gene in all the different statistical tests we have performed.

Transcriptome Search:

The Transcriptome Search page allows you to search the genes that make up the putative transcriptomes for the various subgroups.

In Section 1 of the Transcriptome Search page, you can specify which subgroup transcriptome (Never Smokers, Former Smokers or Current Smokers) you wish to search. The "CORE" transcriptome includes all the genes that are present in 100% of the samples from the chosen group. The 50% transcriptome includes all the genes that are present in at least 50% of the samples from that group. Clicking the appropriate link will bring up a Venn diagram showing the number of genes in each subgroup and the genes shared between them.
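The difference between the "CORE" and 50% transcriptomes can be sketched in Python as follows. The gene names and presence calls are made up, and the sketch assumes that each gene has one present/absent call per sample; how presence is actually called for this dataset is not described on this page.

```python
def transcriptome(presence_calls, min_fraction):
    """Return the genes present in at least `min_fraction` of the samples.

    `presence_calls` maps a gene id to a list of booleans, one per sample
    (True = the gene was called present in that sample).
    """
    selected = set()
    for gene, calls in presence_calls.items():
        if calls and sum(calls) / len(calls) >= min_fraction:
            selected.add(gene)
    return selected


# Hypothetical presence calls for four samples from one subgroup.
calls = {
    "geneA": [True, True, True, True],     # present in 100% of samples
    "geneB": [True, True, False, False],   # present in 50% of samples
    "geneC": [True, False, False, False],  # present in 25% of samples
}

core = transcriptome(calls, 1.0)   # "CORE" transcriptome
half = transcriptome(calls, 0.5)   # 50% transcriptome
print(sorted(core), sorted(half))
```

By construction, every gene in the "CORE" transcriptome is also in the 50% transcriptome for the same subgroup.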

Alternatively, in Section 2, genes can be selected using one of two variability parameters: the Expression Value Standard Deviation or, by default, (Standard Deviation/Mean Expression) x 100%. Thresholds are set by entering an absolute threshold number and then specifying whether you wish to return all genes above or below that threshold. The search results can be filtered to include only genes whose GO identifiers (Gene Ontology Consortium) include user-specified keywords.
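The default variability parameter, (Standard Deviation/Mean Expression) x 100%, can be worked through with a small Python sketch. The expression values and function names are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator); this page does not state which form of the standard deviation the site uses.

```python
import statistics


def coefficient_of_variation(values):
    # (Standard Deviation / Mean Expression) x 100%
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean expression is zero; the ratio is undefined")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(values) / mean * 100.0


def filter_by_variability(expression, threshold, direction="above"):
    # Return the genes whose variability is above or below the threshold.
    selected = []
    for gene, values in expression.items():
        cv = coefficient_of_variation(values)
        if (direction == "above" and cv > threshold) or \
           (direction == "below" and cv < threshold):
            selected.append(gene)
    return selected


# Hypothetical expression values across four samples.
expression = {
    "geneA": [10, 12, 8, 10],       # mean 10, SD about 1.63, about 16.3%
    "geneB": [100, 101, 99, 100],   # mean 100, SD about 0.82, about 0.8%
}

print(filter_by_variability(expression, 15, "above"))
print(filter_by_variability(expression, 15, "below"))
```

Dividing by the mean makes the measure relative, so a highly expressed gene is not flagged as variable simply because its absolute values are large.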

Section 3 provides a GO annotation tool. By specifying a GO category, you can determine whether that category is over- or under-represented in any of the putative transcriptomes.

In Section 4, a GO annotation tool called GOMINER (Zeeberg et al.) is used to generate a DAG file, which is a visual representation of the over- or under-representation of different GO categories in a specific putative transcriptome.

Graphing:

The Graphing page allows you to generate graphics depicting different microarray expression metrics across the entire dataset. These include: 1) An Expression Value Histogram for all samples. Given one or several Affymetrix IDs, the page will return a series of JPEG images, one per ID, each showing the histogram of expression values across all samples. 2) A Sample vs Sample Scatterplot, a JPEG image plotting the expression values of two different samples against each other.

In the Clustering Section of the Graphing page, you can generate clustering dendrograms of specific sample subgroups (Never Smokers or Current Smokers) by selecting from a choice of clustering methods and distance metrics. The clustering algorithm can use ALL the filtered gene expression values or a user-defined subset. The subset of genes can be selected using different variability parameters, which include Max/Min Expression Ratio and Max-Min Expression Difference. You can either specify the number of top variable genes to be used in the clustering or set a threshold so that only genes that pass it are included.
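The two ways of choosing a gene subset for clustering can be sketched in Python as follows. The gene names, expression values and function names are hypothetical. The sketch assumes that all expression values are positive (so the ratio is defined) and that passing a threshold means meeting or exceeding it.

```python
def max_min_ratio(values):
    # Max/Min Expression Ratio; assumes all values are positive.
    return max(values) / min(values)


def max_min_difference(values):
    # Max-Min Expression Difference.
    return max(values) - min(values)


def top_variable_genes(expression, metric, n):
    # Rank genes from most to least variable and keep the first n.
    ranked = sorted(expression, key=lambda gene: metric(expression[gene]),
                    reverse=True)
    return ranked[:n]


def genes_passing_threshold(expression, metric, threshold):
    # Keep every gene whose variability meets or exceeds the threshold.
    return [gene for gene, values in expression.items()
            if metric(values) >= threshold]


# Hypothetical expression values across three samples.
expression = {
    "g1": [100, 400, 200],      # ratio 4.0, difference 300
    "g2": [50, 60, 55],         # ratio 1.2, difference 10
    "g3": [1000, 1500, 1200],   # ratio 1.5, difference 500
}

print(top_variable_genes(expression, max_min_ratio, 2))
print(top_variable_genes(expression, max_min_difference, 2))
print(genes_passing_threshold(expression, max_min_ratio, 2.0))
```

The two parameters can rank genes differently: in this example the ratio favours g1, while the difference favours the more highly expressed g3.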

This page is best viewed at a resolution of 1024x768.

Copyright 2004 All Rights Reserved. Trustees of Boston University.