Classification of genes using probabilistic models of microarray
expression profiles
Paul Pavlidis, Christopher Tang and William Stafford Noble
Proceedings of BIOKDD 2001: Workshop on Data Mining in
Bioinformatics.
Abstract
Microarray expression data provides a new method for classifying genes
and gene products according to their expression profiles. Numerous
unsupervised and supervised learning methods have been applied to the
task of discovering and learning to recognize classes of co-expressed
genes. Here we present a supervised learning method based upon
techniques borrowed from biological sequence analysis. The expression
profile of a class of co-expressed genes is summarized in a
probabilistic model similar to a position-specific scoring matrix
(PSSM). This model provides insight into the expression
characteristics of the gene class, as well as accurate recognition
performance. Because the PSSM models are generative, they are
particularly useful when a biologist can identify a priori a
class of co-expressed genes but is unable to identify a large
collection of non-co-expressed genes to serve as a negative training
set. We validate the technique using expression data from
S. cerevisiae and C. elegans.
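
The idea of a PSSM-like model trained only on positive examples can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how expression values are represented, so the three-bin discretization, the thresholds, the pseudocount, the uniform background, and all function names below are assumptions made for illustration only.

```python
import math

def discretize(value, thresholds=(-0.5, 0.5)):
    """Map a log-ratio to a bin index: 0 = down, 1 = unchanged, 2 = up.
    The thresholds are hypothetical, chosen only for this example."""
    return sum(value > t for t in thresholds)

def train_model(profiles, n_bins=3, pseudocount=1.0):
    """For each experiment (column), estimate the probability of each bin
    among the class members, smoothed with a pseudocount (an assumption)."""
    n_experiments = len(profiles[0])
    model = []
    for j in range(n_experiments):
        counts = [pseudocount] * n_bins
        for profile in profiles:
            counts[discretize(profile[j])] += 1
        total = sum(counts)
        model.append([c / total for c in counts])
    return model

def log_odds_score(model, profile, background=None):
    """Sum per-experiment log-odds of the class model against a background
    distribution (uniform here, which is an assumption)."""
    n_bins = len(model[0])
    if background is None:
        background = [1.0 / n_bins] * n_bins
    score = 0.0
    for column, value in zip(model, profile):
        b = discretize(value)
        score += math.log(column[b] / background[b])
    return score

# Toy positive training set: three made-up genes measured in four experiments.
positives = [
    [1.2, 0.9, -1.1, 0.1],
    [1.0, 1.4, -0.8, 0.0],
    [0.8, 1.1, -1.3, -0.2],
]
model = train_model(positives)

# A profile resembling the class scores above zero; an opposite one below.
member_like = log_odds_score(model, [1.1, 1.0, -1.0, 0.0])
unrelated = log_odds_score(model, [-1.0, -0.9, 1.2, 1.5])
```

Only the positive class is used to fit the model, which matches the setting described above in which no negative training set is available. Each column of the fitted model can also be read directly as a summary of how the class behaves in that experiment.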