HCPM is a program that finds groups (clusters) of similar structures in a large
set of protein models. The main features of HCPM are:
- Similarity can be measured either by crmsd (coordinate root-mean-square deviation) or by drmsd (distance root-mean-square deviation)
- HCPM removes very similar or identical structures
from the data set
- Clustering needs to be performed only once. During clustering, a hierarchy
describing the similarity of the structures is stored in an external file.
From this file, clusters can be recalculated for different input settings
at negligible additional computational cost.
- The program provides a heuristic for selecting an optimal value of the clustering
parameter (the cutoff for the merging distance)
- The program saves the centroid structure of every cluster
- A file containing all structures from a given (user-defined) cluster
can also be generated
- HCPM can cluster relatively large data sets, up to several tens of thousands of decoys.
- An iterative procedure makes it possible to cluster
data sets containing more than 100,000 protein models
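Because drmsd compares the internal distances of two models rather than their coordinates, it can be computed without first superimposing the models (crmsd, by contrast, requires an optimal superposition, which is not shown here). The following is a minimal C++ sketch of one common definition of drmsd, assuming each model is a list of atom coordinates (for example C-alpha atoms) given in the same order. The names `Point3`, `atom_distance` and `drmsd` are hypothetical and are not taken from the HCPM source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical coordinate type: one atom (e.g. a C-alpha atom) of a model.
struct Point3 {
    double x, y, z;
};

// Euclidean distance between two atoms of the same model.
double atom_distance(const Point3& p, const Point3& q) {
    const double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// drmsd between two models: the root-mean-square difference of all
// intramolecular pairwise distances d_ij (i < j). Internal distances do not
// change under rotation or translation, so no superposition is needed.
double drmsd(const std::vector<Point3>& a, const std::vector<Point3>& b) {
    assert(a.size() == b.size() && a.size() >= 2);  // same atoms, same order
    const std::size_t n = a.size();
    double sum = 0.0;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double diff = atom_distance(a[i], a[j]) - atom_distance(b[i], b[j]);
            sum += diff * diff;
            ++pairs;
        }
    }
    return std::sqrt(sum / static_cast<double>(pairs));
}
```

A rotated and translated copy of a model gives a drmsd of zero, while a uniformly scaled copy does not. The cost grows quadratically with the number of atoms, which is one reason clustering tools often restrict the comparison to C-alpha atoms.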
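Recalculating clusters from a stored hierarchy is cheap because no structure-to-structure distances need to be recomputed: only the recorded merge steps are replayed up to the chosen cutoff. The C++ sketch below illustrates this idea under stated assumptions. It assumes the hierarchy is a list of merge records (two structure indices plus the merging distance) produced by an agglomerative method with monotone merge distances, such as single, complete or average linkage. The `Merge` record, `DisjointSet` and `cut_hierarchy` are hypothetical; HCPM's actual file format and linkage rule are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical merge record: the clusters containing structures `a` and `b`
// were joined at merging distance `distance`.
struct Merge {
    std::size_t a, b;
    double distance;
};

// Minimal union-find over structure indices.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(std::size_t x, std::size_t y) { parent_[find(x)] = find(y); }

private:
    std::vector<std::size_t> parent_;
};

// Flat clusters for a given cutoff: replay only the stored merges whose
// merging distance does not exceed the cutoff. Returns a cluster label
// (0, 1, 2, ...) for each of the n structures.
std::vector<std::size_t> cut_hierarchy(std::size_t n, const std::vector<Merge>& merges,
                                       double cutoff) {
    DisjointSet ds(n);
    for (const Merge& m : merges) {
        assert(m.a < n && m.b < n);
        if (m.distance <= cutoff) ds.unite(m.a, m.b);
    }
    std::vector<std::size_t> root_label(n, n);  // n means "root not labelled yet"
    std::vector<std::size_t> labels(n);
    std::size_t next = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t root = ds.find(i);
        if (root_label[root] == n) root_label[root] = next++;
        labels[i] = root_label[root];
    }
    return labels;
}
```

Calling `cut_hierarchy` repeatedly with different cutoffs costs roughly linear time in the number of merges, compared with the far more expensive pairwise-distance computation done once during clustering.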
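The text does not say how HCPM defines the centroid structure of a cluster. One common choice for protein decoys is the medoid: the member with the smallest total distance (crmsd or drmsd) to all other members, which is always a real model from the data set. The C++ sketch below assumes that definition and a precomputed symmetric distance matrix. The function name `cluster_centroid` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Representative ("centroid") of one cluster, taken here to be the medoid:
// the member with the smallest sum of distances to all other members.
// `dist` is a symmetric matrix of pairwise distances (e.g. crmsd or drmsd);
// `members` lists the indices of the structures in the cluster.
std::size_t cluster_centroid(const std::vector<std::vector<double>>& dist,
                             const std::vector<std::size_t>& members) {
    assert(!members.empty());
    std::size_t best = members.front();
    double best_sum = std::numeric_limits<double>::infinity();
    for (std::size_t i : members) {
        double sum = 0.0;
        for (std::size_t j : members) sum += dist[i][j];
        if (sum < best_sum) {  // ties keep the earlier member
            best_sum = sum;
            best = i;
        }
    }
    return best;
}
```

Choosing a medoid rather than an averaged coordinate set avoids producing a physically unrealistic "mean" structure, at the cost of quadratic work in the cluster size.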
The program can be downloaded from here. It compiles with the GNU C++ compiler, version 3.2:
g++ -o hcpm codes.cpp
Feel free to contact the author with any questions.