Protein family classification using sparse Markov transducers
Eleazar Eskin
William Noble Grundy
Yoram Singer
Proceedings of the Eighth International Conference on
Intelligent Systems for Molecular Biology. August 20-23, 2000.
pp. 134-135.
Abstract
In this paper we present a method for classifying proteins into
families using sparse Markov transducers (SMTs). Sparse Markov
transducers, similar to probabilistic suffix trees, estimate a
probability distribution conditioned on a sequence. SMTs generalize
probabilistic suffix trees by allowing for wild cards in the
conditioning sequences. Because substitutions of amino acids are
common in protein families, incorporating wild cards into the model
significantly improves classification performance. We present two
models for building protein family classifiers using SMTs. We also
present efficient data structures to improve the memory usage of the
models. We evaluate SMTs by building family classifiers using the
Pfam database.
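
To illustrate why wild cards in the conditioning sequence can help, the following is a minimal sketch, not the SMT algorithm from the paper. It assumes a simple count-based estimator with pseudocount smoothing; the `?` wild-card symbol, the function names, and the toy sequences are all hypothetical and chosen only for illustration. It compares how many training contexts support a prediction when the context must match exactly versus when one position is a wild card.

```python
from collections import Counter

WILDCARD = "?"  # hypothetical wild-card symbol; not notation from the paper


def matches(pattern, context):
    """Return True if context matches pattern; WILDCARD matches any symbol."""
    return len(pattern) == len(context) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, context)
    )


def conditional_counts(sequences, pattern):
    """Count the symbols that follow each context matching pattern."""
    k = len(pattern)
    counts = Counter()
    for seq in sequences:
        for i in range(k, len(seq)):
            if matches(pattern, seq[i - k:i]):
                counts[seq[i]] += 1
    return counts


def conditional_prob(sequences, pattern, symbol, alphabet_size=20, pseudo=1.0):
    """Pseudocount-smoothed estimate of P(symbol | pattern)."""
    counts = conditional_counts(sequences, pattern)
    total = sum(counts.values())
    return (counts[symbol] + pseudo) / (total + pseudo * alphabet_size)


# Toy "family": the second residue varies, the rest is conserved.
family = ["ACDG", "AEDG", "AKDG"]

exact = conditional_counts(family, "ACD")   # only one sequence matches
sparse = conditional_counts(family, "A?D")  # all three sequences match

print(exact["G"], sparse["G"])
print(conditional_prob(family, "ACD", "G"), conditional_prob(family, "A?D", "G"))
```

In this toy data the wild-card context pools evidence across the substituted position, so the estimate for the conserved residue rests on three observations instead of one. The models in the paper are more elaborate than this fixed-pattern count.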