Validated across five domains · Same source · Same flags

The same binary, on five different signals.

No per-domain configuration. No retraining. No oracle thresholds. These are deployed numbers — what an integrator gets after install, power-on, walk-away.

Bearing precision
0.988
CWRU · deployed
Run-to-failure lead
3–4 days
before documented failure
Soak false positives
0
120 hours · 5 seeds
Soak samples
1.2 M
bearing normal-op traces · 0 FP
Domain 01 · Bearing vibration

The industry-standard bearing benchmark.

The Case Western Reserve University (CWRU) bearing dataset provides 27 distinct fault pairs spanning inner-race, ball, and outer-race faults at three severity levels. The recommended operating point produces the deployed numbers below, with no oracle threshold and no per-fault tuning.

Protocol

Dataset
Case Western Reserve University bearing data
Fault pairs
27 (inner race · ball · outer race × 3 severities)
Threshold
Self-calibrated · locked at deployment
Configuration
Default operating point
Default mode
Precision
0.988
of alarms raised, share correct
F1 score
0.829
harmonic mean
Recall
0.996
share of real faults caught
Tandem mode (single compile flag)
Precision
0.972
Recall
0.996
Use case
Rotating machinery
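For readers who want to check the metric definitions used in the cards above, here is a minimal sketch in Python. The function names and the confusion counts are hypothetical: the counts are chosen only so that they round to the published Default-mode precision and recall, and they are not taken from Flynn's benchmark output.

```python
def precision(tp: int, fp: int) -> float:
    """Of alarms raised, the share that were correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Of real faults, the share that were caught."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Hypothetical counts (NOT from the benchmark), picked to round to 0.988 / 0.996.
tp, fp, fn = 249, 3, 1
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")

# Harmonic mean of the published Default-mode precision and recall.
print(f"harmonic mean of 0.988 and 0.996 = {f1_score(0.988, 0.996):.3f}")
```

The harmonic mean of 0.988 and 0.996 is about 0.992. The published F1 of 0.829 is lower than that, so it is not the harmonic mean of the two Default-mode figures shown beside it; the page does not state how it was aggregated.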
Domain 02 · Run-to-failure

Caught two failing bearings. Stayed silent on the healthy ones.

An industry-standard run-to-failure dataset spans more than 30 days of continuous vibration from four bearings, two of which failed and two of which survived. Flynn ran on all eight accelerometer channels simultaneously — zero configuration changes between them.

Channels
8
simultaneous · zero config
Failing bearings detected
2 / 2
both before documented failure
Lead time
3–4 d
ahead of failure date
Healthy bearings
2 / 2
silent across full window
Duration
30+ days
continuous vibration
False alarms
0
across healthy channels

Protocol

Dataset
NASA IMS Bearing dataset
Setup
4 bearings · 8 accelerometer channels
Outcome
2 failed · 2 survived
Sampling
Continuous, multi-day
Configuration
Same source, same flags across channels
Lee, J., Qiu, H., Yu, G., Lin, J. and Rexnord Technical Services (2007). IMS, University of Cincinnati. "Bearing Data Set", NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
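Lead time here is the gap between a channel's first alarm and the documented failure date. The sketch below shows that arithmetic in Python. Every timestamp and channel name is a placeholder invented for illustration; none is taken from the IMS dataset or from Flynn's output.

```python
from datetime import datetime
from typing import Optional


def lead_time_days(first_alarm: Optional[datetime], failure: datetime) -> Optional[float]:
    """Days between the first alarm and the documented failure; None if no alarm was raised."""
    if first_alarm is None:
        return None
    return (failure - first_alarm).total_seconds() / 86_400


# Placeholder failure date and first-alarm times (hypothetical, for illustration only).
failure = datetime(2000, 1, 30, 12, 0)
first_alarms = {
    "failing_a": datetime(2000, 1, 27, 0, 0),
    "failing_b": datetime(2000, 1, 26, 12, 0),
}

leads = {name: lead_time_days(t, failure) for name, t in first_alarms.items()}
print(leads)
```

A healthy channel that stays silent across the full window has no first alarm, which is why the function accepts `None` and returns `None` for it.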
Domain 03 · Ambient & diurnal

Operationally compatible with shift-cycle review.

336 hours (14 days) of ambient time-series data covering temperature, environmental, and process-control signals: slow-changing data with strong day-night cycles. Flynn holds the false-positive rate to 0.074–0.080 per hour, roughly two alerts per day, which fits the cadence of normal operator review.

Protocol

Duration
336 hours
Signal types
Temperature · environmental · process-control
Review cadence
Shift-cycle compatible
FP / hour
0.074 – 0.080
across 336 hours
Alerts / day
~1.8 – 1.9
reviewable in normal rhythm
Tuning
None
same source as bearing
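The per-hour and per-day figures above are the same measurement in two units. A short Python sketch of the conversion, using only the numbers published on this page (the function names are illustrative):

```python
def alerts_per_day(fp_per_hour: float) -> float:
    """Convert a false-positive rate per hour into expected alerts per 24-hour day."""
    return fp_per_hour * 24


def alerts_over_window(fp_per_hour: float, hours: float) -> float:
    """Expected number of alerts over an observation window of the given length."""
    return fp_per_hour * hours


for rate in (0.074, 0.080):
    per_day = alerts_per_day(rate)
    total = alerts_over_window(rate, 336)
    print(f"{rate:.3f}/h -> {per_day:.1f} alerts/day, about {total:.0f} alerts in 336 h")
```

At these rates the 336-hour window corresponds to roughly 25 to 27 alerts in total, or just under two per day.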
Domain 04 · Electrical grid stability

Electrical-bus stability with no domain assumptions.

10,000 instances of electrical-grid stability data, each described by 12 features: a signal domain that shares nothing with rotating-machinery vibration. Same source, same flags, no per-domain configuration.

F1 score
0.532
no per-domain config
Instances
10,000
Features
12
Domain 05 · Soak testing

The number operators ask for first.

The dominant operational concern with anomaly detectors is the false-alarm rate. Across 120 simulated hours of synthetic vibration (5 random seeds × 24 hours each), Flynn produced zero false positives, which bounds the false-positive rate below 0.025 per hour at 95% confidence. On normal-operation traces from bearing data, it produced zero false positives across 1.2 million samples.

Protocol

Synthetic soak
120 hours · 5 seeds × 24 hours
Bearing soak
1.2 million normal-operation samples
Confidence
95% CI bound on FP rate
FP · synthetic
0
120 hours
FP · bearing
0
1.2 M samples
95% CI bound
< 0.025/h
false-positive rate
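One standard way to turn "zero events observed" into a rate bound is the exact Poisson upper limit, often called the rule of three. The sketch below assumes false positives arrive as a Poisson process with a constant rate and that the five seeds can be pooled into a single 120-hour exposure. The page does not state which method produced its bound, so treat this as a plausible reconstruction rather than the documented procedure.

```python
import math


def zero_event_upper_bound(exposure: float, confidence: float = 0.95) -> float:
    """One-sided upper bound on a Poisson rate when zero events were seen.

    With rate lam, P(0 events in exposure) = exp(-lam * exposure).
    Solving exp(-lam * exposure) = 1 - confidence gives the bound below.
    """
    return -math.log(1.0 - confidence) / exposure


# Synthetic soak: 0 false positives in 120 hours.
per_hour = zero_event_upper_bound(120)
print(f"FP rate < {per_hour:.4f} per hour at 95% confidence")

# Bearing soak: 0 false positives in 1.2 million samples (bound is per sample).
per_sample = zero_event_upper_bound(1_200_000)
print(f"FP rate < {per_sample:.2e} per sample at 95% confidence")
```

Under these assumptions the 120-hour result gives a bound just under 0.025 per hour, matching the published figure. The same formula applied to the bearing soak gives a bound of about 2.5 false positives per million samples.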

Zero false positives on bearing data. Operationally low on diurnal. The numbers operators actually care about.

Note on completeness: an accelerated-life bearing-degradation dataset compresses years of wear into hours, producing a signal profile well outside Flynn's standard deployment envelope. F1 = 0.268 on this test indicates useful signal is still extracted even when the operational regime is structurally hostile to Flynn's architectural assumptions. Included for completeness; not a recommended deployment scenario.

Reproduce these numbers

Benchmarks, source, reproduction.

Flynn's empirical claims are reproducible from committed benchmarks and committed source code. Artifacts and reproduction instructions are available to evaluation licensees.

Request evaluation access · Read the whitepaper
Email
tripp@entromorphic.com
Available
Source code · benchmarks · scripts · datasets manifest
Under
Evaluation license · NDA on request