Extending TraceML¶
This guide is for contributors adding a new metric, diagnosis, summary section, or compare field. It follows the current code layout and avoids older internal paths.
Mental Model¶
training process
-> samplers collect telemetry
-> runtime sender publishes batches
-> aggregator stores SQLite history
-> live display reads renderer/computer payloads
-> final report builds reporting sections
-> compare reads final summary JSON
Live UI and final summaries are separate paths. They can share diagnostics, but
they should pass explicit policies such as LIVE_STEP_TIME_POLICY or
SUMMARY_STEP_TIME_POLICY when thresholds differ.
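The steps above can be sketched as a shared check driven by an explicitly passed policy. This is a minimal illustration, not TraceML's real policy objects: the field name `slow_step_ratio` and the `is_slow_step` helper are assumptions; only the two policy names come from the codebase.

```python
from dataclasses import dataclass

# Hypothetical policy shape -- the real objects live in
# src/traceml/diagnostics/<domain>/policy.py and their fields may differ.
@dataclass(frozen=True)
class StepTimePolicy:
    slow_step_ratio: float  # a step is "slow" above this multiple of the median

# Live display tolerates more noise than the final summary,
# so the two paths pass different named policies.
LIVE_STEP_TIME_POLICY = StepTimePolicy(slow_step_ratio=2.0)
SUMMARY_STEP_TIME_POLICY = StepTimePolicy(slow_step_ratio=1.5)

def is_slow_step(step_ms: float, median_ms: float, policy: StepTimePolicy) -> bool:
    """Shared logic; each caller passes its own policy explicitly."""
    return step_ms > median_ms * policy.slow_step_ratio

print(is_slow_step(180.0, 100.0, LIVE_STEP_TIME_POLICY))     # False: within live tolerance
print(is_slow_step(180.0, 100.0, SUMMARY_STEP_TIME_POLICY))  # True: summary flags it
```

The point is that the diagnostic itself stays policy-free; thresholds arrive from the caller.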
Add a Diagnostic Rule¶
Diagnostics live under src/traceml/diagnostics/<domain>/.
Current domains include:
`system`, `process`, `step_time`, `step_memory`
Typical files:
- `context.py`: normalized input signals
- `policy.py`: thresholds and named policies
- `rules.py`: one rule class per issue
- `api.py`: public builder that runs the rules and selects the primary diagnosis
Add one rule class in rules.py, add it to the domain's default rule tuple,
then update priority sorting if the new issue should beat existing issues.
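A new rule typically looks like the sketch below. Every name here (`StepTimeContext`, `Diagnosis`, `DataloaderBottleneckRule`, `primary_diagnosis`) is a hypothetical stand-in for the real classes under `src/traceml/diagnostics/<domain>/`; only the file roles come from this guide.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical context shape (context.py normalizes the input signals).
@dataclass
class StepTimeContext:
    mean_step_ms: float
    dataloader_wait_ms: float

@dataclass
class Diagnosis:
    issue: str
    priority: int  # lower wins when several rules fire

# One rule class per issue (rules.py).
class DataloaderBottleneckRule:
    def check(self, ctx: StepTimeContext) -> Optional[Diagnosis]:
        if ctx.dataloader_wait_ms > 0.5 * ctx.mean_step_ms:
            return Diagnosis("dataloader_bottleneck", priority=10)
        return None

# The domain's default rule tuple; new rules are appended here.
DEFAULT_RULES = (DataloaderBottleneckRule(),)

def primary_diagnosis(ctx, rules=DEFAULT_RULES) -> Optional[Diagnosis]:
    """Run all rules, then pick the highest-priority hit (api.py's job)."""
    hits = [d for r in rules if (d := r.check(ctx)) is not None]
    return min(hits, key=lambda d: d.priority) if hits else None
```

Priority sorting is where a new issue "beats" existing ones: give it a lower priority value than the issues it should outrank.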
Tests should live in tests/diagnostics/ and cover:
- the rule triggers
- the rule does not trigger for normal input
- priority when multiple issues trigger together
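The first two cases reduce to a pair of tiny assertions. The rule below is an inline stand-in so the sketch stays self-contained; real tests would import the actual rule from the diagnostics domain.

```python
# Hypothetical test shape for tests/diagnostics/.
class HighMemoryRule:  # stand-in, not a real TraceML rule
    def check(self, used_frac: float):
        return "high_memory" if used_frac > 0.9 else None

def test_rule_triggers():
    assert HighMemoryRule().check(0.95) == "high_memory"

def test_rule_quiet_on_normal_input():
    assert HighMemoryRule().check(0.4) is None

test_rule_triggers()
test_rule_quiet_on_normal_input()
print("ok")
```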
Add a Summary Section¶
Final-report sections live under src/traceml/reporting/sections/.
Current sections:
`system`, `process`, `step_time`, `step_memory`
Each section follows this shape:
- `loader.py`: reads SQLite / section inputs
- `builder.py`: builds the JSON payload and card text
- `formatter.py`: renders the section text
- `model.py`: section-local data helpers
Register sections through src/traceml/reporting/final.py. Keep the aggregator
as a caller only; report assembly belongs in reporting.
Tests should live in tests/reporting/summary/. Prefer small SQLite fixtures
over large golden snapshots. Assert stable schema keys and a few important text
lines.
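A "small SQLite fixture, stable schema keys" test can be this compact. The loader, builder, table name, and payload keys below are all hypothetical stand-ins for the real section code under `src/traceml/reporting/sections/`.

```python
import sqlite3

# Hypothetical loader/builder stand-ins; the real ones read TraceML's
# actual SQLite schema and emit the real section payload.
def load_step_times(conn):
    return [row[0] for row in conn.execute("SELECT step_ms FROM steps")]

def build_section(conn):
    steps = load_step_times(conn)
    return {
        "schema": "step_time.v1",
        "count": len(steps),
        "mean_ms": sum(steps) / len(steps) if steps else None,
    }

# Small in-memory fixture instead of a golden snapshot.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (step_ms REAL)")
conn.executemany("INSERT INTO steps VALUES (?)", [(100.0,), (120.0,)])

payload = build_section(conn)
assert set(payload) == {"schema", "count", "mean_ms"}  # stable schema keys
assert payload["count"] == 2
print(payload["mean_ms"])  # 110.0
```

Asserting on the key set catches accidental schema drift without pinning every value.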
Add a Sampler¶
Runtime sampler selection is in src/traceml/runtime/sampler_registry.py.
To add a sampler:
- Implement a `BaseSampler` subclass under `src/traceml/samplers/`.
- Add a `SamplerSpec` to `DEFAULT_SAMPLER_REGISTRY`.
- Restrict it by `profiles` and `modes` so it only runs where needed.
- Add SQLite projection, renderer, or summary code only if the data is user-facing.
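The steps above can be sketched as follows. `BaseSampler`, `SamplerSpec`, and `DEFAULT_SAMPLER_REGISTRY` are real names from the guide, but their fields, the `GpuClockSampler`, and the `select_samplers` helper are assumptions about the shape, not the actual API in `src/traceml/runtime/sampler_registry.py`.

```python
from dataclasses import dataclass

class BaseSampler:
    def sample(self) -> dict: ...

class GpuClockSampler(BaseSampler):  # illustrative sampler, not a real one
    def sample(self) -> dict:
        return {"gpu_clock_mhz": 1410}

# Hypothetical spec shape: profiles/modes restrict where the sampler runs.
@dataclass(frozen=True)
class SamplerSpec:
    name: str
    factory: type
    profiles: frozenset = frozenset({"deep"})
    modes: frozenset = frozenset({"run", "watch"})

DEFAULT_SAMPLER_REGISTRY = (
    SamplerSpec("gpu_clock", GpuClockSampler, profiles=frozenset({"deep"})),
)

def select_samplers(profile: str, mode: str):
    """Only instantiate samplers whose spec matches the active profile/mode."""
    return [s.factory() for s in DEFAULT_SAMPLER_REGISTRY
            if profile in s.profiles and mode in s.modes]

print([type(s).__name__ for s in select_samplers("deep", "run")])  # ['GpuClockSampler']
print(select_samplers("default", "run"))  # []: restricted out of normal runs
```

Restricting by profile is how deep-profile-only samplers stay out of normal `run` and `watch` paths.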
Layer-level samplers are currently deep profile only. Keep advanced profiling
out of normal run and watch paths unless there is a strong reason.
Tests should live in tests/runtime/ for selection behavior and in a more
specific folder if the sampler has domain logic.
Add a Compare Metric¶
Compare code lives under src/traceml/reporting/compare/.
Important files:
- `sections/<section>.py`: extracts comparable values from the final summary JSON
- `model.py`: typed compare objects
- `verdict.py`: rule-based verdict selection
- `formatters.py`: terminal text output
- `core.py`: payload assembly
Add metric extraction to the relevant section comparer first. Only add a verdict rule if the metric should affect the top-level outcome. Only show a row in the text formatter if it helps users compare runs quickly.
Tests should live in tests/reporting/compare/ and cover missing data, changed
values, and verdict priority when multiple signals disagree.
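Extraction and verdict selection for a new metric might look like this sketch. The summary keys (`step_memory`, `peak_mb`), the verdict labels, and the 10% thresholds are hypothetical; the real shapes live under `src/traceml/reporting/compare/`.

```python
# Hypothetical section comparer: pull one comparable value out of the
# final summary JSON, tolerating missing data.
def extract_peak_memory(summary: dict):
    return summary.get("step_memory", {}).get("peak_mb")

# Hypothetical verdict rule: only add one of these if the metric
# should affect the top-level outcome.
def verdict(base, cand):
    if base is None or cand is None:
        return "no-data"
    if cand > base * 1.1:
        return "regression"
    if cand < base * 0.9:
        return "improvement"
    return "neutral"

base = {"step_memory": {"peak_mb": 1000.0}}
cand = {"step_memory": {"peak_mb": 1200.0}}
print(verdict(extract_peak_memory(base), extract_peak_memory(cand)))  # regression
print(verdict(extract_peak_memory({}), extract_peak_memory(cand)))    # no-data
```

Handling the missing-data path in the extractor keeps the verdict rule simple and matches the test cases listed above.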
Add Live Display¶
Live display code is renderer-driven. CLI and dashboard renderers may differ.
Relevant paths:
- `src/traceml/renderers/`
- `src/traceml/aggregator/display_drivers/`
Keep renderer methods focused on presentation. Put data shaping in a compute object or formatter when the logic is reusable or non-trivial.
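The compute/renderer split can be sketched like this. Both class names and the payload keys are hypothetical; the point is that the compute object owns the data shaping so CLI and dashboard renderers can reuse it.

```python
# Hypothetical compute object: reusable data shaping, no presentation.
class StepTimeCompute:
    def payload(self, step_times_ms):
        if not step_times_ms:
            return {"last_ms": None, "mean_ms": None}
        return {"last_ms": step_times_ms[-1],
                "mean_ms": sum(step_times_ms) / len(step_times_ms)}

# Hypothetical renderer: presentation only, consumes the payload.
class CliRenderer:
    def render(self, payload):
        if payload["last_ms"] is None:
            return "step time: NO DATA"
        return (f"step time: {payload['last_ms']:.0f} ms "
                f"(mean {payload['mean_ms']:.0f} ms)")

compute = StepTimeCompute()
print(CliRenderer().render(compute.payload([95.0, 105.0])))
# step time: 105 ms (mean 100 ms)
```

A dashboard renderer could consume the same payload and emit JSON instead of text.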
Fail Open¶
TraceML should not break user training because optional telemetry, rendering,
or reporting failed. Existing code logs advisory failures through
traceml.loggers.error_log.get_error_logger.
Use that pattern for non-critical paths:
from traceml.loggers.error_log import get_error_logger

logger = get_error_logger("MyComponent")
try:
    ...
except Exception as exc:
    logger.exception("[TraceML] MyComponent failed: %s", exc)
Prefer returning an empty payload, NO DATA diagnosis, or fallback text over
raising from live display, compare rendering, or final-report generation.
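Putting both halves together, a fail-open render path might look like the sketch below. Here `get_error_logger` is a local stand-in for `traceml.loggers.error_log.get_error_logger`, and the section name and fallback string are illustrative.

```python
import logging

# Stand-in for traceml.loggers.error_log.get_error_logger.
def get_error_logger(name: str) -> logging.Logger:
    return logging.getLogger(name)

def render_section(rows) -> str:
    logger = get_error_logger("StepMemorySection")
    try:
        return f"peak memory: {max(rows):.0f} MB"
    except Exception as exc:
        # Log the advisory failure, then return fallback text instead of raising.
        logger.exception("[TraceML] StepMemorySection failed: %s", exc)
        return "step memory: NO DATA"

print(render_section([512.0, 900.0]))  # peak memory: 900 MB
print(render_section([]))              # step memory: NO DATA (max() raised)
```

Training never sees the exception; the user sees a degraded but honest display.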
Test Layout¶
Tests are grouped by area:
tests/core/
tests/diagnostics/
tests/reporting/summary/
tests/reporting/compare/
tests/runtime/
tests/sdk/
tests/telemetry/
tests/display/
tests/integrations/
Keep tests close to the behavior they protect. The most valuable tests are small and direct: rule behavior, priority, schema shape, and fail-open behavior.