FAQ

Short answers to common questions that come up before or during adoption.

If you are new to TraceML, start with:

Do I need to replace W&B, MLflow, or TensorBoard?

No.

TraceML is designed to work alongside your existing stack.

Use your current tools for:

- experiment tracking
- artifacts
- dashboards
- reporting

Use TraceML for:

- bottleneck diagnosis
- stragglers
- wait-heavy behavior
- memory creep
- run-to-run bottleneck comparison from saved TraceML summary JSON files

See:

Use TraceML with W&B / MLflow

How is TraceML different from `torch.profiler`?

`torch.profiler` is a deeper profiling tool for detailed, low-level analysis such as operator and kernel timings.

TraceML is a lighter-weight bottleneck finder for real training runs.

A simple rule:

- use TraceML to find where the problem is
- use `torch.profiler` when you need deeper low-level analysis

How much code do I need to change?

Usually just this:

```python
import traceml

traceml.init(mode="auto")

with traceml.trace_step(model):
    ...  # your training step: forward, backward, optimizer step
```

For supported integrations:

- Hugging Face: use `TraceMLTrainer`
- Lightning: add `TraceMLCallback()`

`from traceml.decorators import trace_step` still works for backward compatibility, but the preferred public API is now the top-level `traceml.*` namespace.

Should I use `traceml.trace_step()` or `trace_step()`?

Prefer:

```python
import traceml

traceml.init(mode="auto")

with traceml.trace_step(model):
    ...
```

TraceML still supports:

```python
from traceml.sdk.decorators_compat import trace_step
```

for backward compatibility, but new examples and docs use the top-level `traceml.*` API. Legacy decorator imports are planned for deprecation starting in v0.3.0.

What is the difference between `auto`, `manual`, and `selective`?

Use:

- `traceml.init(mode="auto")` for the default TraceML workflow
- `traceml.init(mode="manual")` when you want fully explicit wrappers
- `traceml.init(mode="selective", ...)` when you want some automatic patching and some explicit wrapping

Start with `auto` unless you already know you need more control.

When should I use the wrapper APIs?

Use the wrapper APIs when you do not want the default automatic patching, or when part of your training loop is custom.

The main wrapper entry points are:

- `traceml.wrap_dataloader_fetch(...)`
- `traceml.wrap_forward(...)`
- `traceml.wrap_backward(...)`
- `traceml.wrap_optimizer(...)`

They are most relevant in `manual` or `selective` mode. Most users should start with `mode="auto"` and move to wrappers only if they need explicit control.
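Conceptually, a fetch wrapper measures how long each batch takes to arrive, separately from the rest of the step. The plain-Python sketch below illustrates only that idea. `TimedFetch` and `slow_loader` are hypothetical names. This is not how TraceML implements `traceml.wrap_dataloader_fetch(...)`; check the TraceML API reference for its real signature.

```python
import time
from typing import Any, Iterable, Iterator, List


class TimedFetch:
    """Hypothetical wrapper (not TraceML's API): times every next() on an iterable."""

    def __init__(self, iterable: Iterable[Any]) -> None:
        self._it: Iterator[Any] = iter(iterable)
        self.fetch_times_s: List[float] = []  # one entry per successfully fetched batch

    def __iter__(self) -> "TimedFetch":
        return self

    def __next__(self) -> Any:
        start = time.perf_counter()
        batch = next(self._it)  # StopIteration propagates, so the loop ends normally
        self.fetch_times_s.append(time.perf_counter() - start)
        return batch


def slow_loader(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a DataLoader: sleeps to simulate I/O per batch."""
    for i in range(n):
        time.sleep(0.01)
        yield i


loader = TimedFetch(slow_loader(3))
for batch in loader:
    pass  # forward, backward, and optimizer step would run here

print(len(loader.fetch_times_s))  # 3
```

Timing the fetch on its own is what separates input waits from compute time. The same wrapping pattern would apply to forward, backward, and optimizer calls.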

Does TraceML work with Hugging Face Trainer?

Yes.

See:

Hugging Face Trainer

Does TraceML work with PyTorch Lightning?

Yes.

See:

PyTorch Lightning

Does TraceML support DDP?

Yes, for single-node DDP.

TraceML can surface:

- input stragglers
- compute stragglers
- rank imbalance
- worst-rank vs median-rank skew
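As a rough illustration of worst-rank vs median-rank skew, the sketch below takes hypothetical per-rank timings for each phase. It flags the slowest rank when that rank's time exceeds the median by a chosen ratio. The 1.2 threshold, the phase names, and the data layout are assumptions for illustration. This is not TraceML's actual detection rule.

```python
from statistics import median
from typing import Dict, Optional


def find_straggler(per_rank_ms: Dict[int, float], ratio_threshold: float = 1.2) -> Optional[int]:
    """Return the slowest rank if it exceeds ratio_threshold x the median rank, else None."""
    med = median(per_rank_ms.values())
    if med <= 0:
        return None  # nothing meaningful to compare against
    worst_rank = max(per_rank_ms, key=per_rank_ms.__getitem__)
    skew = per_rank_ms[worst_rank] / med  # worst-rank vs median-rank ratio
    return worst_rank if skew > ratio_threshold else None


# Hypothetical mean per-step timings (ms) for a 4-rank single-node run.
timings = {
    "input": {0: 12.0, 1: 11.5, 2: 30.0, 3: 12.3},    # rank 2 waits on data
    "compute": {0: 95.0, 1: 96.0, 2: 94.5, 3: 97.0},  # ranks are balanced
}
for phase, per_rank in timings.items():
    print(phase, find_straggler(per_rank))
# input 2
# compute None
```

Comparing against the median rather than the mean keeps a single slow rank from inflating the baseline it is measured against.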

Does TraceML support multi-node?

Not yet.

Today, distributed support is single-node only, and single-node DDP is the main target.

Does TraceML support FSDP?

Yes, for single-node FSDP.

If you run into a problem with your setup, please open an issue with a minimal repro and your environment details.

Does TraceML support tensor parallel or pipeline parallel?

Not yet.

What is the difference between `watch`, `run`, and `deep`?

- `watch`: zero-code system and process visibility
- `run`: the default mode; step-aware bottleneck diagnosis and the best place to start for most users
- `deep`: optional, deeper layer-level inspection; best for short follow-up diagnostic runs

Start with `run`.

Is there a local UI?

Yes.

Run:

```bash
traceml run train.py --mode=dashboard
```

The local UI runs at http://localhost:8765.

Is there a summary-only mode?

Yes.

Run:

```bash
traceml run train.py --mode=summary
```

This skips the live UI and focuses on the end-of-run summary. It is a good fit when you want less terminal noise or want to forward TraceML summary fields into W&B or MLflow.

Can TraceML compare two runs?

Yes.

Use:

```bash
traceml compare run_a.json run_b.json
```

`traceml compare` is designed to consume TraceML `final_summary.json` artifacts.

It writes:

- a structured compare JSON
- a compact text report

A good workflow is:

1. Run each job with TraceML.
2. Keep the `final_summary.json` for each run.
3. Compare the two runs with `traceml compare`.
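`traceml compare` does this comparison for you. If you also want a quick scripted check of your own, the sketch below computes percent changes between two summaries for a few numeric fields. It assumes flat summaries with hypothetical keys (`step_time_ms`, `peak_memory_mb`). A real `final_summary.json` may be nested and use different keys, so inspect yours first.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def load_summary(path: str) -> dict:
    """Read one summary JSON file (e.g. a retained final_summary.json) into a dict."""
    return json.loads(Path(path).read_text())


def relative_changes(a: dict, b: dict, keys: Iterable[str]) -> Dict[str, float]:
    """Percent change from run A to run B for each numeric top-level key found in both."""
    changes: Dict[str, float] = {}
    for key in keys:
        va, vb = a.get(key), b.get(key)
        if isinstance(va, (int, float)) and isinstance(vb, (int, float)) and va != 0:
            changes[key] = 100.0 * (vb - va) / va
    return changes


# Hypothetical flat summaries; in practice: run_a = load_summary("run_a.json")
run_a = {"step_time_ms": 200.0, "peak_memory_mb": 8000.0}
run_b = {"step_time_ms": 250.0, "peak_memory_mb": 8000.0}
print(relative_changes(run_a, run_b, ["step_time_ms", "peak_memory_mb"]))
# {'step_time_ms': 25.0, 'peak_memory_mb': 0.0}
```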

See:

Compare Runs

Can I log TraceML output into W&B or MLflow?

Yes.

TraceML is designed to work alongside your existing tracking stack. The recommended low-noise path is:

1. Launch with `traceml run train.py --mode=summary`.
2. Call `traceml.final_summary()` near the end of your script.
3. Log selected fields from the returned dict into W&B or MLflow.
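Tracking tools generally expect flat name-to-number metrics; for example, `mlflow.log_metrics` takes a flat dict of metric names to values. The returned dict may be nested, so it helps to flatten it first. The sketch below flattens numeric leaves into `/`-separated names. The summary shape and keys are hypothetical placeholders, not TraceML's documented schema.

```python
from typing import Any, Dict


def flatten_numeric(d: Dict[str, Any], prefix: str = "traceml") -> Dict[str, float]:
    """Flatten nested dicts into {"prefix/a/b": value}, keeping only int/float leaves."""
    flat: Dict[str, float] = {}
    for key, value in d.items():
        name = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_numeric(value, name))  # recurse into nested sections
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
    return flat


# Hypothetical shape; the real dict comes from traceml.final_summary().
summary = {
    "diagnosis": "INPUT STRAGGLER",  # non-numeric, so it is skipped
    "step_time": {"median_ms": 180.0, "worst_rank_ms": 260.0},
    "memory": {"peak_mb": 7900},
}
metrics = flatten_numeric(summary)
print(metrics)
# {'traceml/step_time/median_ms': 180.0, 'traceml/step_time/worst_rank_ms': 260.0, 'traceml/memory/peak_mb': 7900.0}

# With MLflow installed and a run active:  mlflow.log_metrics(metrics)
# With W&B installed and a run active:     wandb.log(metrics)
```

Non-numeric fields such as a diagnosis label are better logged as a tag or parameter than as a metric.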

See:

Use TraceML with W&B / MLflow

Can I run without TraceML telemetry for a baseline?

Yes.

Use:

```bash
traceml run train.py --disable-traceml
```
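One use of a baseline run is estimating TraceML's overhead by comparing median step times with and without telemetry. The numbers below are made up for illustration. Substitute step times you recorded yourself, for example from your own training-loop logging.

```python
from statistics import median
from typing import Sequence


def overhead_pct(baseline_s: Sequence[float], traced_s: Sequence[float]) -> float:
    """Percent slowdown of the traced run's median step time relative to the baseline."""
    base = median(baseline_s)
    return 100.0 * (median(traced_s) - base) / base


# Illustrative step times in seconds (not real measurements).
baseline = [0.200, 0.202, 0.198, 0.201]
traced = [0.206, 0.204, 0.208, 0.205]
print(f"{overhead_pct(baseline, traced):.1f}%")  # 2.5%
```

The median is used so that a few unusually slow steps, such as warm-up steps, do not dominate the estimate.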

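What does `MEMORY CREEP` usually mean?

It usually means memory use keeps rising over time instead of staying stable.

A common cause is retaining tensors across steps, for example by storing graph-backed tensors in a persistent cache or list. Accumulating `total_loss += loss` instead of `total_loss += loss.item()` is a typical case, because it keeps autograd history alive across the training loop.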
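To show what "rising instead of staying stable" looks like numerically, here is a hypothetical heuristic. It fits a least-squares slope to per-step memory samples and flags a sustained upward trend. The 1 MB-per-step threshold and the sample values are assumptions for illustration. This is not TraceML's detection logic.

```python
from typing import Sequence


def memory_slope_mb_per_step(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory (MB) against step index."""
    n = len(samples_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_creep(samples_mb: Sequence[float], min_slope_mb: float = 1.0) -> bool:
    """Flag a steady climb of at least min_slope_mb per step (threshold is arbitrary)."""
    return memory_slope_mb_per_step(samples_mb) >= min_slope_mb


stable = [4000, 4010, 3995, 4005, 4002, 3998]      # noisy but flat
creeping = [4000, 4020, 4041, 4060, 4082, 4100]    # roughly +20 MB per step
print(looks_like_creep(stable), looks_like_creep(creeping))  # False True
```

A fitted slope tolerates step-to-step noise better than comparing only the first and last samples.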

See:

How to Read TraceML Output

What does `INPUT STRAGGLER` mean?

It means one rank is slower in the input path than the typical rank.

Common causes:

- uneven data loading
- preprocessing imbalance
- host-side jitter

See:

How to Read TraceML Output

What does `COMPUTE STRAGGLER` mean?

It means one rank is slower in compute than the typical rank.

Common causes:

- uneven input shapes or data
- rank-local branching or extra work
- compute imbalance in the forward pass, backward pass, or optimizer step

See:

How to Read TraceML Output

When should I use compare instead of live output?

Use live output when you want to understand the current run while it is still in progress.

Use `traceml compare` when you already have final summary JSON files and want to answer:

- did the run get slower or faster?
- did the diagnosis change?
- did memory or wait behavior regress?

Live output is for in-run diagnosis; compare is for run-to-run review after the runs have finished.
