Plain Theory Labs

One honest score for AI compute infrastructure.

MIT Supercloud 2022 · 73,367 jobs · Research HPC · Cambridge, MA · NVIDIA V100
Alibaba Helios 2020 · 361,498 jobs · Public ML trace · Beijing, China · Heterogeneous GPU fleet
Microsoft Philly 2017 · 74,020 jobs · DNN Training · Redmond, WA · NVIDIA K80, P100, V100
GPU Efficiency · 0.0 to 1.0 · jobs evaluated across 3 public datasets
PROFILE

The cluster characterizes itself first.

PROFILE runs before any other engine. It identifies the scheduler, hardware, and workload types in use — then routes each engine to the highest-fidelity data source available. Everything that follows depends on what PROFILE finds.

ACE

GPU utilization, measured honestly.

Of the GPU capacity allocated to each workload, how much was actually used? The answer is almost always surprising. ACE finds the gap between what was requested and what ran.
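
One way to picture the gap ACE describes is a ratio of GPU-hours used to GPU-hours allocated. The sketch below is illustrative only and is not ACE's actual method: the `Job` fields, the `allocated_compute_utilized` name, and the idea of weighting each job by its allocated GPU-hours are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    """Hypothetical per-job record; field names are assumptions."""
    gpus_allocated: int
    runtime_hours: float
    mean_gpu_utilization: float  # fraction between 0.0 and 1.0


def allocated_compute_utilized(jobs: Iterable[Job]) -> float:
    """Return used GPU-hours divided by allocated GPU-hours."""
    jobs = list(jobs)
    allocated = sum(j.gpus_allocated * j.runtime_hours for j in jobs)
    if allocated == 0:
        return 0.0  # nothing was allocated, so nothing to grade
    used = sum(
        j.gpus_allocated * j.runtime_hours * j.mean_gpu_utilization
        for j in jobs
    )
    return used / allocated


# An 8-GPU job running at 25% and a 1-GPU job running at 100%.
example_jobs = [Job(8, 10.0, 0.25), Job(1, 10.0, 1.0)]
example_score = allocated_compute_utilized(example_jobs)
```

Weighting by allocated GPU-hours means a large, idle allocation pulls the result down more than a small one, which is one reasonable reading of "what was requested versus what ran".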

COOL

Energy in versus useful work out.

Cooling is the largest non-compute energy cost in most facilities. COOL grades thermal efficiency against what is achievable for this facility type and climate — not a generic average.

FLUX

Is the carbon accounting actually traceable?

Carbon accounting in AI infrastructure ranges from rigorous to weakly documented. FLUX grades the methodology and detects when claimed renewable coverage cannot be traced from grid to certificate to claim.

PACE

The decisions made before work starts.

Scheduler policy determines how compute gets allocated before a single workload runs. PACE grades those decisions and finds the patterns that leave hardware waiting: over-requesting, poor backfill, fairshare imbalance.
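
Over-requesting, the first pattern named above, can be sketched as the share of requested GPUs a job never touched. This is a minimal illustration under stated assumptions, not PACE's grading logic: the function names, the `(job_id, requested, peak_used)` tuple layout, and the 0.5 threshold are all hypothetical.

```python
from typing import Iterable, List, Tuple


def over_request_ratio(requested_gpus: int, peak_gpus_used: int) -> float:
    """Fraction of the requested GPUs that the job never used."""
    if requested_gpus <= 0:
        raise ValueError("requested_gpus must be positive")
    unused = max(0, requested_gpus - peak_gpus_used)
    return unused / requested_gpus


def flag_over_requesting(
    jobs: Iterable[Tuple[str, int, int]], threshold: float = 0.5
) -> List[str]:
    """Return ids of jobs whose unused share meets the (assumed) threshold."""
    return [
        job_id
        for job_id, requested, peak_used in jobs
        if over_request_ratio(requested, peak_used) >= threshold
    ]


# (job_id, requested GPUs, peak GPUs used); values are made up.
sample = [("a", 8, 2), ("b", 4, 4), ("c", 2, 1)]
flagged = flag_over_requesting(sample)
```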

CORE

Is this the right hardware for this work?

A cluster optimized for training in 2021 may be the wrong tool for inference in 2026. CORE grades hardware-to-workload fit, fleet age, and whether the infrastructure was designed for what it is actually running.

GRADE

Five engines become one number.

GRADE weighs what every engine found and produces a PTL Score between 0.0 and 1.0. The weights are published. The formula is deterministic. The same inputs always produce the same number. The engine source is public.
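
A deterministic weighted score of this kind can be sketched as a normalized weighted average. The equal weights below are placeholders, not the published PTL weights, and `combine_scores` is a hypothetical name. The choice to combine ACE, COOL, FLUX, PACE, and CORE by weighted mean is also an assumption, since the formula itself is not given here.

```python
from typing import Dict

# Placeholder weights for illustration only; not the published values.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "ACE": 1.0,
    "COOL": 1.0,
    "FLUX": 1.0,
    "PACE": 1.0,
    "CORE": 1.0,
}


def combine_scores(
    scores: Dict[str, float], weights: Dict[str, float] = ASSUMED_WEIGHTS
) -> float:
    """Weighted mean of per-engine scores, each expected in [0.0, 1.0]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing engine scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    total_weight = sum(weights[name] for name in sorted(weights))
    # Fixed iteration order keeps floating-point summation reproducible.
    weighted = sum(weights[name] * scores[name] for name in sorted(weights))
    return weighted / total_weight


example = combine_scores(
    {"ACE": 0.5, "COOL": 1.0, "FLUX": 0.5, "PACE": 1.0, "CORE": 0.5}
)
```

Because the function has no hidden state or randomness, the same inputs give the same output on every run, which is the property the paragraph above claims for GRADE.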

ATLAS

Ranked by impact, not difficulty.

A score is only useful if it points somewhere. ATLAS ranks the changes most likely to improve the next assessment. Every recommendation is specific to this cluster and this workload profile. No generic advice.
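
Ranking by impact rather than difficulty amounts to sorting on the expected score gain and ignoring effort. The sketch below assumes each recommendation carries a hypothetical `estimated_score_gain`. The field names, titles, and numbers are invented for illustration and do not come from ATLAS.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical recommendation record; all fields are assumptions."""
    title: str
    estimated_score_gain: float  # expected change in the overall score
    effort_days: float           # recorded, but not used for ranking


def rank_by_impact(recs: Iterable[Recommendation]) -> List[Recommendation]:
    """Order recommendations from largest to smallest expected gain."""
    return sorted(recs, key=lambda r: r.estimated_score_gain, reverse=True)


# Made-up entries for illustration.
candidates = [
    Recommendation("Enable backfill scheduling", 0.03, effort_days=2),
    Recommendation("Right-size GPU requests", 0.08, effort_days=20),
    Recommendation("Retire oldest GPU nodes", 0.05, effort_days=60),
]
ranked_titles = [r.title for r in rank_by_impact(candidates)]
```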

CLAW

The cluster describes itself.

A NemoClaw-compatible agent that runs inside the infrastructure being assessed. No forms. No manual exports. CLAW collects from DCGM, Slurm, Kubernetes, and inference servers — then packages it for assessment.

research@plaintheory.org