Stability-based relative clustering validation to determine the best number of clusters
reval allows you to determine the best clustering solution without a priori knowledge.
It leverages a stability-based relative clustering validation method (Lange et al., 2004) that recasts
clustering as a supervised classification problem and selects the number of clusters
that leads to the minimum expected misclassification error, i.e., the most stable solution.
This library allows you to:
- Select any classification algorithm from the sklearn library;
- Select a clustering algorithm with an n_clusters parameter or the HDBSCAN density-based algorithm, i.e., choose among sklearn.cluster.KMeans, sklearn.cluster.AgglomerativeClustering, sklearn.cluster.SpectralClustering, hdbscan.HDBSCAN;
- Perform (repeated) k-fold cross-validation to determine the best number of clusters;
- Test the final model on a held-out dataset.
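The stability idea can be illustrated with a minimal sketch that uses only scikit-learn, NumPy, and SciPy. It is not the reval implementation and does not use the reval API: the helper names matched_error and cv_misclassification, the choice of KMeans and KNeighborsClassifier, the synthetic data, and the range of cluster numbers are all assumptions made for illustration. For each candidate number of clusters, the training fold is clustered, a classifier is trained on the resulting labels, and its predictions on the validation fold are compared with an independent clustering of that fold after the cluster labels have been optimally matched.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def matched_error(labels_a, labels_b, k):
    """Fraction of disagreeing labels after optimally permuting labels_b."""
    # overlap[i, j] counts points labelled i in labels_a and j in labels_b.
    overlap = np.zeros((k, k), dtype=int)
    for a, b in zip(labels_a, labels_b):
        overlap[a, b] += 1
    # Hungarian algorithm: one-to-one label matching with maximum agreement.
    rows, cols = linear_sum_assignment(-overlap)
    return 1.0 - overlap[rows, cols].sum() / len(labels_a)


def cv_misclassification(X, k, n_splits=2, seed=0):
    """Average validation misclassification error for k clusters (hypothetical helper)."""
    errors = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in folds.split(X):
        X_tr, X_val = X[train_idx], X[val_idx]
        # 1. Cluster the training fold and treat the cluster labels as classes.
        tr_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_tr)
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, tr_labels)
        # 2. Cluster the validation fold independently.
        val_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_val)
        # 3. Compare the classifier's predictions with the validation clustering.
        errors.append(matched_error(val_labels, clf.predict(X_val), k))
    return float(np.mean(errors))


# Synthetic data (assumption): three well-separated Gaussian blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=42,
)
scores = {k: cv_misclassification(X, k) for k in range(2, 6)}
for k, err in scores.items():
    print(f"k={k}: misclassification error = {err:.3f}")
```

The number of clusters with the lowest averaged error would then be selected. This sketch runs a single round of 2-fold cross-validation and omits repetitions, any normalization of the error, and the final evaluation on a held-out dataset.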
The theoretical background can be found in Lange et al. (2004), whereas the code is available on GitHub.
The analysis steps performed by the reval package are displayed below.
Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16(6), 1299-1323.
Cite as
Landi I, Mandelli V, Lombardo MV.
reval: A Python package to determine best clustering solutions with stability-based relative clustering validation.
Patterns (N Y). 2021 Apr 2;2(4):100228.
doi: 10.1016/j.patter.2021.100228. PMID: 33982023; PMCID: PMC8085609.
BibTeX alternative
@article{landi2021100228,
title = {reval: A Python package to determine best clustering solutions with stability-based relative clustering validation},
journal = {Patterns},
volume = {2},
number = {4},
pages = {100228},
year = {2021},
issn = {2666-3899},
doi = {https://doi.org/10.1016/j.patter.2021.100228},
url = {https://www.sciencedirect.com/science/article/pii/S2666389921000428},
author = {Isotta Landi and Veronica Mandelli and Michael V. Lombardo},
keywords = {stability-based relative validation, clustering, unsupervised learning, clustering replicability}
}