Hidden Markov Model Analysis of Motifs in Steroid Dehydrogenases and their Homologs
William N. Grundy
Timothy L. Bailey
Charles P. Elkan
Michael E. Baker
Biochemical and Biophysical Research Communications,
231(3):760-766.
Abstract
The increasing size of protein sequence databases is straining methods
of sequence analysis, even as the increased information offers
opportunities for sophisticated analyses of protein structure,
function and evolution. Here we describe a method called Meta-MEME
that uses artificial intelligence-based algorithms to build models of
families of protein sequences. These models can be used to search
protein sequence databases for remote homologs. The MEME (Multiple
Expectation-maximization for Motif Elicitation) software package
identifies motif patterns in a protein family, and these motifs are
combined into a hidden Markov model (HMM) that can be used as a
database searching tool. Meta-MEME is sensitive and accurate, as well
as automated and unbiased, making it suitable for the analysis of
large datasets. We demonstrate Meta-MEME on a family of
dehydrogenases that includes mammalian 11β-hydroxysteroid and
17β-hydroxysteroid dehydrogenases and their homologs in the short-chain
alcohol dehydrogenase family. We chose this dataset because it is
large and phylogenetically diverse, providing a good test of the
sensitivity and selectivity of Meta-MEME on a protein family of
biological interest. Indeed, Meta-MEME identifies at least 350
members of this family in Genpept96 and clearly separates these
sequences from non-homologous proteins. We also show how the MEME
motif output can be used for phylogenetic analysis.
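The pipeline described above (MEME motifs combined into an HMM that scores database sequences) can be illustrated with a minimal sketch. It is not Meta-MEME's implementation. It assumes a linear model in which the motifs occur in a fixed order, separated by background "spacer" states whose geometric lengths are set by a single hypothetical parameter `p_spacer`. The motif columns are toy log-odds values built by a hypothetical `consensus_column` helper rather than real MEME output, and `build_motif_hmm` and `viterbi_score` are illustrative names. The score is the best-path (Viterbi) log-odds score, in bits, of a sequence against an i.i.d. background.

```python
import math
from typing import Dict, List, Optional, Tuple

NEG_INF = float("-inf")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A motif column maps each residue to a log-odds score in bits:
# log2(P(residue | column) / P(residue | background)).
Column = Dict[str, float]


def consensus_column(residue: str, match: float = 2.0, mismatch: float = -1.0) -> Column:
    """Hypothetical helper: a toy column that favours a single residue."""
    return {aa: (match if aa == residue else mismatch) for aa in AMINO_ACIDS}


def build_motif_hmm(motifs: List[List[Column]], p_spacer: float):
    """Linear HMM: spacer, motif 1, spacer, motif 2, ..., final spacer.

    Spacer states emit from the background (log-odds 0 bits) and have
    geometric lengths, P(length = n) = p_spacer**n * (1 - p_spacer), n >= 0.
    Motif columns are visited strictly in order, with no gaps inside a motif.
    """
    stay, leave = math.log2(p_spacer), math.log2(1.0 - p_spacer)
    emissions: List[Optional[Column]] = []     # None marks a spacer state
    preds: List[List[Tuple[int, float]]] = []  # (previous state, log2 transition)
    start: Dict[int, float] = {}               # log2 P(first state)
    end: Dict[int, float] = {}                 # log2 P(stop | last state)

    def add_state(emit: Optional[Column]) -> int:
        emissions.append(emit)
        preds.append([])
        return len(emissions) - 1

    entry = None  # last column of the previous motif (None = sequence start)
    for motif in motifs:
        spacer = add_state(None)
        cols = [add_state(col) for col in motif]
        preds[spacer].append((spacer, stay))    # spacer self-loop
        preds[cols[0]].append((spacer, leave))  # spacer -> motif start
        if entry is None:
            start[spacer], start[cols[0]] = stay, leave
        else:
            preds[spacer].append((entry, stay))
            preds[cols[0]].append((entry, leave))
        for a, b in zip(cols, cols[1:]):
            preds[b].append((a, 0.0))           # log2(1): next column is forced
        entry = cols[-1]

    tail = add_state(None)                      # spacer after the last motif
    preds[tail] += [(tail, stay), (entry, stay)]
    end[tail] = end[entry] = leave
    return emissions, preds, start, end


def viterbi_score(seq: str, model) -> float:
    """Best-path log-odds score (bits) of a non-empty sequence vs. the background."""
    emissions, preds, start, end = model

    def emit(s: int, residue: str) -> float:
        col = emissions[s]
        return 0.0 if col is None else col.get(residue, NEG_INF)

    prev = [start.get(s, NEG_INF) + emit(s, seq[0]) for s in range(len(emissions))]
    for residue in seq[1:]:
        # Right-hand side is evaluated with the previous column before rebinding.
        prev = [
            max((prev[q] + t for q, t in preds[s]), default=NEG_INF) + emit(s, residue)
            for s in range(len(emissions))
        ]
    return max(prev[s] + t for s, t in end.items())
```

For example, with one toy two-column motif built from `consensus_column("G")` and `consensus_column("S")` and `p_spacer=0.5`, the sequence `"GS"` scores 2.0 bits, `"AGSA"` scores 0.0 bits, and `"AAAA"`, which lacks the motif, scores -6.0 bits. Sequences that contain the motifs in the expected order therefore rank above unrelated ones, which is the property a database search relies on.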