Supplementary data for "Semi-Supervised Protein Classification using Cluster Kernels."
- ROC-50 scores for all families and all detection methods from the paper, in plain text format.
- ROC scores for all families and all detection methods from the paper, in plain text format.
- Plain text table specifying the positive and negative training and test sets for each family. Each row is one sequence and each column is one family (0 = not present; 1 = positive train; 2 = negative train; 3 = positive test; 4 = negative test). The same file is also available without headers.
- Summary of the data splits, giving the number of positive and negative training and test examples and the amount of unlabeled data for each family.
- Names of the SCOP families.
- Sequence file in FASTA format containing all sequences in SCOP version 1.59 with less than 95% identity.
- 7329x7329 kernel matrices for the methods used in the experiments (the sequence IDs indexing the rows and columns are provided in a separate file):
  - BLAST matrix, ASCII text file, gzipped (49 MB).
  - PSI-BLAST matrix, computed using all 7329 sequences as the database, ASCII text file, gzipped (52 MB).
  - Spectrum mismatch kernel (k = 5, m = 1), ASCII text file, gzipped (79 MB).
- The Spider software used in the experiments, a Matlab-based library of machine learning tools.
- Matlab scripts to run the semi-supervised experiments (using the Spider software).
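The ROC-50 and ROC score files report per-family results. As a rough guide to what those numbers measure, here is a minimal sketch that assumes the common definition of ROC-n: the area under the ROC curve up to the first n false positives, normalized so that a perfect ranking scores 1. The function name `roc_n` is hypothetical, the paper's own evaluation code may differ, and tied scores are broken by sort order rather than averaged.

```python
def roc_n(scores, labels, n=50):
    """Normalized area under the ROC curve up to the first n false positives.

    scores: one real-valued score per test sequence (higher = more likely positive).
    labels: 1 for a positive test sequence, 0 for a negative one.
    Ties are broken by sort order, not averaged (a simplifying assumption).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    num_pos = sum(1 for y in labels if y == 1)
    num_neg = len(labels) - num_pos
    if n <= 0 or num_pos == 0 or num_neg < n:
        raise ValueError("need n > 0, at least one positive and at least n negatives")
    # Rank test sequences from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    false_pos = 0
    area = 0
    for _, y in ranked:
        if y == 1:
            true_pos += 1
        else:
            # Each false positive adds a unit-width column whose height is
            # the number of positives ranked above it.
            false_pos += 1
            area += true_pos
            if false_pos == n:
                break
    return area / (n * num_pos)
```

Setting `n` to the total number of negative test examples makes the same function return the ordinary ROC score (area under the full ROC curve).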
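The label table (codes 0 to 4) can be turned into per-family index sets with a few lines of code. This is a minimal sketch for the headerless version of the file, assuming whitespace-separated integer codes, one row per sequence and one column per family; the helper names `read_label_table` and `split_for_family` are hypothetical. Counting the indices in each split gives numbers comparable to the summary of data splits, although the unlabeled-data counts are not derived here.

```python
# Meaning of each integer code in the label table, as documented above.
CODE_NAMES = {
    0: "not_present",
    1: "pos_train",
    2: "neg_train",
    3: "pos_test",
    4: "neg_test",
}


def read_label_table(lines):
    """Parse the headerless label table into a list of rows of integer codes.

    Assumes whitespace-separated integers, one row per sequence and one
    column per family; blank lines are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            rows.append([int(tok) for tok in fields])
    return rows


def split_for_family(rows, family):
    """Return 0-based row indices for each split of one family (0-based column)."""
    splits = {name: [] for name in CODE_NAMES.values()}
    for i, row in enumerate(rows):
        # An unexpected code raises KeyError rather than being silently ignored.
        splits[CODE_NAMES[row[family]]].append(i)
    return splits
```

With an open file handle `fh`, `split_for_family(read_label_table(fh), 0)["pos_train"]` would list the positive training sequences of the first family.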
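Reading the FASTA sequence file needs only a small parser. A minimal sketch, assuming the sequence ID is the first whitespace-separated token of each header line; `read_fasta` is a hypothetical helper, not part of the distributed files.

```python
def read_fasta(lines):
    """Return a list of (id, sequence) pairs in file order.

    Assumes the ID is the first whitespace-separated token after '>'.
    Lines before the first header and blank lines are ignored.
    """
    records = []
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None and line:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records
```

Returning a list rather than a dict keeps duplicate IDs and the original file order, which matters if records are later aligned with matrix rows by position.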
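Each 7329x7329 kernel matrix has about 53.7 million entries. The sketch below loads one using only the Python standard library, storing the values in a flat `array` of doubles (roughly 430 MB in memory). It assumes one matrix row per line, whitespace-separated numbers and no header; that layout should be checked against the actual files. The names `load_kernel` and `kernel_entry` are hypothetical.

```python
import gzip
from array import array


def load_kernel(path, n=7329):
    """Load an n x n kernel matrix stored as gzipped ASCII text.

    Assumes one matrix row per line with whitespace-separated numbers and
    no header line. Returns a flat row-major array of doubles.
    """
    values = array("d")
    rows = 0
    with gzip.open(path, "rt") as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != n:
                raise ValueError(f"row {rows} has {len(fields)} entries, expected {n}")
            values.extend(float(x) for x in fields)
            rows += 1
    if rows != n:
        raise ValueError(f"found {rows} rows, expected {n}")
    return values


def kernel_entry(values, n, i, j):
    """Kernel value for sequences i and j (0-based) in a flat row-major matrix."""
    return values[i * n + j]
```

If NumPy is available, `numpy.loadtxt` reads `.gz` files directly and returns a 2-D array, which is usually faster than this pure-Python loop; the entry for sequences i and j is then `K[i, j]`.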