No single metric tells the full story. ROC-AUC measures overall discrimination between normal and anomalous points; PR-AUC and Average Precision (AvgPrec) reward detectors that rank true anomalies highly under class imbalance; and the range-based variants (RangePrec, RangeRec, RangeF1, RangePR-AUC, RangePR-VUS, RangeROC-VUS) credit detectors that identify the extent of anomalous segments rather than just isolated points. Choose the metric that best reflects the cost structure of your application. The table is ranked by ROC-AUC.

| Rank | Detector | ROC-AUC ↓ | PR-AUC | AvgPrec | Precision | Recall | F1 | RangePrec | RangeRec | RangeF1 | RangePR-AUC | RangePR-VUS | RangeROC-VUS | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | KMeans | 0.6987 | 0.1796 | 0.1806 | 0.0680 | 0.3795 | 0.0954 | 0.0476 | 0.1981 | 0.0401 | 0.0799 | 0.0714 | 0.6046 | 0.4796 |
| 🥈 | TiRex | 0.6852 | 0.1037 | 0.1052 | 0.0566 | 0.2739 | 0.0751 | 0.0630 | 0.0872 | 0.0237 | 0.0282 | 0.0291 | 0.5962 | 288.0356 |
| 🥉 | Toto-4m | 0.6568 | 0.1471 | 0.1484 | 0.0753 | 0.3474 | 0.0954 | 0.0567 | 0.0902 | 0.0162 | 0.0252 | 0.0305 | 0.5698 | 285.0838 |
| 4 | Stumpi | 0.6414 | 0.1476 | 0.1487 | 0.0954 | 0.2980 | 0.1088 | 0.0670 | 0.1829 | 0.0396 | 0.1003 | 0.0788 | 0.5980 | 323.6492 |
| 5 | TinyTimeMixer | 0.5877 | 0.1093 | 0.1105 | 0.0645 | 0.2506 | 0.0769 | 0.0454 | 0.0497 | 0.0237 | 0.0190 | 0.0297 | 0.5616 | 91.7534 |
| 6 | SAND | 0.5712 | 0.0321 | 0.0325 | 0.0197 | 0.1135 | 0.0278 | 0.0206 | 0.0364 | 0.0107 | 0.0171 | 0.0383 | 0.5351 | 63.8342 |
| 7 | NormalDistribution | 0.4963 | 0.0318 | 0.0326 | 0.0227 | 0.1236 | 0.0316 | 0.0164 | 0.0296 | 0.0096 | 0.0112 | 0.0306 | 0.4876 | 0.0038 |
| 8 | Baseline | 0.4893 | 0.0190 | 0.0192 | 0.0183 | 0.0917 | 0.0235 | 0.0186 | 0.0024 | 0.0025 | 0.0133 | 0.0274 | 0.4956 | 0.0002 |
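
To make the difference between ROC-AUC and Average Precision concrete, here is a minimal pure-Python sketch on toy data. The labels and scores are invented for illustration and are not taken from the benchmark. The two helper functions are hypothetical stand-ins rather than the leaderboard's evaluation code, and the Average Precision helper assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """Probability that a random anomaly outscores a random normal point.

    Ties between an anomaly and a normal point count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true anomaly.

    Assumption: scores are distinct, so the ranking is unambiguous.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (illustrative only): 20 points, 2 of them anomalous.
labels = [0] * 20
labels[5] = labels[12] = 1

# Low background scores, two false alarms on top, anomalies just below them.
scores = [0.05 + 0.01 * i for i in range(20)]
scores[0], scores[1] = 0.95, 0.90   # normal points that look anomalous
scores[5], scores[12] = 0.85, 0.80  # the true anomalies

print(f"ROC-AUC: {roc_auc(labels, scores):.4f}")
print(f"AvgPrec: {average_precision(labels, scores):.4f}")
```

On this toy input, two false alarms ranked above the anomalies barely dent ROC-AUC (about 0.89) but pull Average Precision down to about 0.42. The gap runs in the same direction as the one between the ROC-AUC and AvgPrec columns in the table.
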

Streaming detectors must process each data point in real time: a high-accuracy detector that cannot keep up with the data rate is unusable in practice. This plot reveals the trade-off between quality and runtime. Detectors in the upper-left corner (high quality, low runtime) offer the best of both worlds, while those in the lower-right sacrifice throughput without a quality gain. Quality is expressed as the mean of RangePR-VUS and RangeROC-VUS, two metrics that integrate detector performance across all thresholds and temporal extents, which makes them the most comprehensive measures reported here for streaming anomaly detection.
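
The quality score described above can be recomputed from the table. The sketch below copies the RangePR-VUS, RangeROC-VUS, and runtime values from the leaderboard, averages the two VUS columns, and lists the detectors that no other detector beats on both quality and runtime. The `pareto_front` helper and the use of a Pareto front are assumptions made here as one way to read the trade-off; the original plot may present it differently.

```python
# (detector, RangePR-VUS, RangeROC-VUS, runtime in seconds), copied from the table
rows = [
    ("KMeans", 0.0714, 0.6046, 0.4796),
    ("TiRex", 0.0291, 0.5962, 288.0356),
    ("Toto-4m", 0.0305, 0.5698, 285.0838),
    ("Stumpi", 0.0788, 0.5980, 323.6492),
    ("TinyTimeMixer", 0.0297, 0.5616, 91.7534),
    ("SAND", 0.0383, 0.5351, 63.8342),
    ("NormalDistribution", 0.0306, 0.4876, 0.0038),
    ("Baseline", 0.0274, 0.4956, 0.0002),
]

# Quality as defined in the text: mean of RangePR-VUS and RangeROC-VUS.
points = [(name, (pr_vus + roc_vus) / 2, secs) for name, pr_vus, roc_vus, secs in rows]
quality = {name: q for name, q, _ in points}


def pareto_front(points):
    """Return detectors not dominated on (higher quality, lower runtime).

    Hypothetical helper: a detector is dominated if another one is at least
    as good on both axes and strictly better on at least one.
    """
    front = []
    for name, q, t in points:
        dominated = any(
            q2 >= q and t2 <= t and (q2 > q or t2 < t)
            for _, q2, t2 in points
        )
        if not dominated:
            front.append(name)
    return front


for name, q, t in sorted(points, key=lambda p: -p[1]):
    print(f"{name:<20} quality={q:.4f}  time={t:.4f}s")

print("Pareto-optimal:", pareto_front(points))
```

With these numbers, only KMeans, Stumpi, and Baseline are undominated. Stumpi's edge over KMeans is very small (0.3384 versus 0.3380), while its runtime is several hundred times longer.
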