touchstone-rs

Evaluate · Compare · Advance

Streaming anomaly detectors, ranked by performance

No single metric tells the full story. ROC-AUC measures overall discrimination. PR-AUC and Average Precision reward detectors that rank true anomalies highly under class imbalance. The range-based variants (RangePrec, RangeRec, RangeF1, RangePR-AUC, RangePR-VUS, RangeROC-VUS) credit detectors that identify the full extent of anomalous segments rather than just isolated points. Choose the metric that best reflects the cost structure of your application.

Rank  Detector            ROC-AUC  PR-AUC  AvgPrec  Precision  Recall  F1      RangePrec  RangeRec  RangeF1  RangePR-AUC  RangePR-VUS  RangeROC-VUS  time_sec
🥇    KMeans              0.6987   0.1796  0.1806   0.0680     0.3795  0.0954  0.0476     0.1981    0.0401   0.0799       0.0714       0.6046        0.4796
🥈    TiRex               0.6852   0.1037  0.1052   0.0566     0.2739  0.0751  0.0630     0.0872    0.0237   0.0282       0.0291       0.5962        288.0356
🥉    Toto-4m             0.6568   0.1471  0.1484   0.0753     0.3474  0.0954  0.0567     0.0902    0.0162   0.0252       0.0305       0.5698        285.0838
4     Stumpi              0.6414   0.1476  0.1487   0.0954     0.2980  0.1088  0.0670     0.1829    0.0396   0.1003       0.0788       0.5980        323.6492
5     TinyTimeMixer       0.5877   0.1093  0.1105   0.0645     0.2506  0.0769  0.0454     0.0497    0.0237   0.0190       0.0297       0.5616        91.7534
6     SAND                0.5712   0.0321  0.0325   0.0197     0.1135  0.0278  0.0206     0.0364    0.0107   0.0171       0.0383       0.5351        63.8342
7     NormalDistribution  0.4963   0.0318  0.0326   0.0227     0.1236  0.0316  0.0164     0.0296    0.0096   0.0112       0.0306       0.4876        0.0038
8     Baseline            0.4893   0.0190  0.0192   0.0183     0.0917  0.0235  0.0186     0.0024    0.0025   0.0133       0.0274       0.4956        0.0002
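
The rows above appear to be ordered by ROC-AUC, and the order shifts when a different metric is chosen. The following is a minimal sketch of re-ranking by another column. The `ROWS` list, the `COLUMNS` mapping, and the `rank_by` helper are hypothetical names introduced for illustration and are not part of touchstone-rs; the values are copied from three columns of the table.

```python
# Subset of the leaderboard: (detector, ROC-AUC, PR-AUC, RangeF1).
# Values are copied from the table; the names below are illustrative only.
ROWS = [
    ("KMeans", 0.6987, 0.1796, 0.0401),
    ("TiRex", 0.6852, 0.1037, 0.0237),
    ("Toto-4m", 0.6568, 0.1471, 0.0162),
    ("Stumpi", 0.6414, 0.1476, 0.0396),
    ("TinyTimeMixer", 0.5877, 0.1093, 0.0237),
    ("SAND", 0.5712, 0.0321, 0.0107),
    ("NormalDistribution", 0.4963, 0.0318, 0.0096),
    ("Baseline", 0.4893, 0.0190, 0.0025),
]

# Position of each metric inside a row tuple.
COLUMNS = {"ROC-AUC": 1, "PR-AUC": 2, "RangeF1": 3}


def rank_by(metric):
    """Return detector names sorted from best to worst on the given metric."""
    idx = COLUMNS[metric]
    ordered = sorted(ROWS, key=lambda row: row[idx], reverse=True)
    return [row[0] for row in ordered]


if __name__ == "__main__":
    for metric in COLUMNS:
        print(metric, "->", ", ".join(rank_by(metric)))
```

With these numbers, KMeans stays first under PR-AUC, while TiRex drops from second place to fifth, which illustrates why the choice of metric matters.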

Time vs Quality (VUS Score)

Streaming detectors must process each data point in real time: a high-accuracy detector that cannot keep up with the data rate is unusable in practice. This plot reveals the trade-off. Detectors in the upper-left corner (low runtime, high quality) offer the best of both worlds, while those in the lower-right sacrifice throughput without a quality gain. Quality is expressed as the mean of RangePR-VUS and RangeROC-VUS. Both metrics integrate detector performance across all thresholds and temporal extents, which makes them the most comprehensive measures reported here for streaming anomaly detection.
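
The quality score described above can be recomputed from the leaderboard columns. The sketch below assumes quality is the plain arithmetic mean of RangePR-VUS and RangeROC-VUS and that runtime is the `time_sec` column as listed; how the plot scales its axes is not stated here. The `DETECTORS` mapping and the `quality` and `pareto_front` helpers are hypothetical names, not part of touchstone-rs. The code also lists the detectors that no other detector beats on both runtime and quality.

```python
# detector -> (RangePR-VUS, RangeROC-VUS, time_sec), copied from the table.
DETECTORS = {
    "KMeans": (0.0714, 0.6046, 0.4796),
    "TiRex": (0.0291, 0.5962, 288.0356),
    "Toto-4m": (0.0305, 0.5698, 285.0838),
    "Stumpi": (0.0788, 0.5980, 323.6492),
    "TinyTimeMixer": (0.0297, 0.5616, 91.7534),
    "SAND": (0.0383, 0.5351, 63.8342),
    "NormalDistribution": (0.0306, 0.4876, 0.0038),
    "Baseline": (0.0274, 0.4956, 0.0002),
}


def quality(range_pr_vus, range_roc_vus):
    """Assumed quality score: arithmetic mean of the two VUS metrics."""
    return (range_pr_vus + range_roc_vus) / 2.0


def pareto_front(detectors):
    """Return (name, time_sec, quality) for detectors that are not dominated.

    A detector is dominated if another one is at least as fast and at least
    as good, and strictly better on one of the two axes.
    """
    points = [(name, t, quality(pr, roc)) for name, (pr, roc, t) in detectors.items()]
    front = []
    for name, t, q in points:
        dominated = any(
            t2 <= t and q2 >= q and (t2 < t or q2 > q)
            for _, t2, q2 in points
        )
        if not dominated:
            front.append((name, t, q))
    # Sort by runtime so the front reads from fastest to slowest.
    return sorted(front, key=lambda point: point[1])


if __name__ == "__main__":
    for name, t, q in pareto_front(DETECTORS):
        print(f"{name}: time_sec={t:.4f}, quality={q:.4f}")
```

On the values in the table, the non-dominated set is Baseline, KMeans, and Stumpi. Stumpi edges out KMeans on quality by less than 0.001 while taking several hundred times longer, which is the kind of trade-off the plot is meant to expose.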