touchstone-rs · leaderboard

Streaming anomaly detectors, ranked by ROC-AUC (highest first)

No single metric tells the full story. ROC-AUC measures overall discrimination. PR-AUC and Average Precision reward detectors that rank true anomalies highly under class imbalance. The range-based variants (RangePrec, RangeRec, RangeF1, RangePR-AUC, RangePR-VUS, RangeROC-VUS) credit detectors that identify the extent of anomalous segments rather than just isolated points. Choose the metric that best reflects the cost structure of your application.
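To make the difference between ROC-AUC and Average Precision concrete, here is a minimal sketch on hypothetical toy data. The labels and scores are invented for illustration and are not taken from the benchmark, and the two helper functions are plain textbook formulations, not the implementation used by touchstone-rs.

```python
def roc_auc(labels, scores):
    """Fraction of (anomaly, normal) pairs where the anomaly scores higher.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true anomaly.

    Assumes at least one anomaly is present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 10 points, 2 anomalies, 2 high-scoring false alarms.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.10, 0.20, 0.90, 0.80, 0.30, 0.15, 0.25, 0.40, 0.70, 0.95]

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")            # ROC-AUC: 0.750
print(f"AvgPrec: {average_precision(labels, scores):.3f}")  # AvgPrec: 0.417
```

In this toy case two false alarms outrank both anomalies. ROC-AUC still looks respectable because most normal points score low, while Average Precision drops sharply because the top of the ranking is wrong.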

Rank	Detector	ROC-AUC	PR-AUC	AvgPrec	Precision	Recall	F1	RangePrec	RangeRec	RangeF1	RangePR-AUC	RangePR-VUS	RangeROC-VUS	time_sec
🥇	KMeans	0.6987	0.1796	0.1806	0.0680	0.3795	0.0954	0.0476	0.1981	0.0401	0.0799	0.0714	0.6046	0.4796
🥈	TiRex	0.6852	0.1037	0.1052	0.0566	0.2739	0.0751	0.0630	0.0872	0.0237	0.0282	0.0291	0.5962	288.0356
🥉	Toto-4m	0.6568	0.1471	0.1484	0.0753	0.3474	0.0954	0.0567	0.0902	0.0162	0.0252	0.0305	0.5698	285.0838
4	Stumpi	0.6414	0.1476	0.1487	0.0954	0.2980	0.1088	0.0670	0.1829	0.0396	0.1003	0.0788	0.5980	323.6492
5	TinyTimeMixer	0.5877	0.1093	0.1105	0.0645	0.2506	0.0769	0.0454	0.0497	0.0237	0.0190	0.0297	0.5616	91.7534
6	SAND	0.5712	0.0321	0.0325	0.0197	0.1135	0.0278	0.0206	0.0364	0.0107	0.0171	0.0383	0.5351	63.8342
7	NormalDistribution	0.4963	0.0318	0.0326	0.0227	0.1236	0.0316	0.0164	0.0296	0.0096	0.0112	0.0306	0.4876	0.0038
8	Baseline	0.4893	0.0190	0.0192	0.0183	0.0917	0.0235	0.0186	0.0024	0.0025	0.0133	0.0274	0.4956	0.0002

Time vs Quality (VUS Score)

Streaming detectors must process each data point in real time: a high-accuracy detector that cannot keep up with the data rate is unusable in practice. This plot shows the trade-off. Detectors in the upper-left corner (low runtime, high quality) offer the best of both worlds, while those in the lower-right sacrifice throughput without a quality gain. Quality is expressed as the mean of RangePR-VUS and RangeROC-VUS, metrics that integrate detector performance across all thresholds and temporal extents, which makes them the most comprehensive of the reported measures for streaming anomaly detection.
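The quality score and the trade-off can be reproduced from the table. The sketch below copies the RangePR-VUS, RangeROC-VUS and time_sec columns from the leaderboard, averages the two VUS values, and keeps the detectors that no faster detector matches or beats on quality. The function names and the Pareto-front framing are illustrative assumptions, not part of touchstone-rs.

```python
# (detector, RangePR-VUS, RangeROC-VUS, time_sec), copied from the leaderboard table.
ROWS = [
    ("KMeans", 0.0714, 0.6046, 0.4796),
    ("TiRex", 0.0291, 0.5962, 288.0356),
    ("Toto-4m", 0.0305, 0.5698, 285.0838),
    ("Stumpi", 0.0788, 0.5980, 323.6492),
    ("TinyTimeMixer", 0.0297, 0.5616, 91.7534),
    ("SAND", 0.0383, 0.5351, 63.8342),
    ("NormalDistribution", 0.0306, 0.4876, 0.0038),
    ("Baseline", 0.0274, 0.4956, 0.0002),
]


def quality(pr_vus, roc_vus):
    """Quality as described in the text: mean of RangePR-VUS and RangeROC-VUS."""
    return (pr_vus + roc_vus) / 2


def pareto_front(rows):
    """Detectors not dominated in (lower time, higher quality), fastest first."""
    front, best_quality = [], float("-inf")
    for name, pr_vus, roc_vus, _time in sorted(rows, key=lambda r: r[3]):
        q = quality(pr_vus, roc_vus)
        if q > best_quality:  # strictly better than every faster detector
            front.append(name)
            best_quality = q
    return front


for name, pr_vus, roc_vus, time_sec in ROWS:
    print(f"{name:<20} quality={quality(pr_vus, roc_vus):.4f} time_sec={time_sec}")

print(pareto_front(ROWS))  # ['Baseline', 'KMeans', 'Stumpi']
```

On these numbers KMeans (quality 0.3380) and Stumpi (quality 0.3384) are nearly tied, but Stumpi takes several hundred times longer, which is the kind of gap the plot is meant to expose.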