Use TraceML with W&B, MLflow, or TensorBoard¶
You do not need to replace your existing tracking stack to use TraceML.
TraceML is designed to work alongside tools like:
- Weights & Biases (W&B)
- MLflow
- TensorBoard
- Lightning loggers
- other experiment or metric tracking tools
A good mental model is:
- your existing stack tracks runs, configs, metrics, and artifacts
- TraceML explains why training is slow
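A minimal sketch of that split, using a TensorBoard writer as the stand-in for your existing stack (the model, optimizer, loss, and data setup are assumed; `traceml.trace_step` also appears in the W&B and MLflow examples below):

```python
import traceml
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")  # assumed log directory

# model, criterion, optimizer, and dataloader are defined as usual
for step, batch in enumerate(dataloader):
    with traceml.trace_step(model):  # TraceML explains where step time goes
        optimizer.zero_grad(set_to_none=True)
        outputs = model(batch["x"])
        loss = criterion(outputs, batch["y"])
        loss.backward()
        optimizer.step()

    writer.add_scalar("loss", loss.item(), step)  # TensorBoard tracks the metric
```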
The short version¶
Use your existing tools for:
- experiment tracking
- loss and accuracy curves
- artifacts and checkpoints
- long-term dashboards
- team reporting
Use TraceML for:
- input bottlenecks
- compute-bound steps
- DDP stragglers
- wait-heavy behavior
- memory creep
- step-aware bottleneck diagnosis
- compact run-to-run comparison from saved TraceML summary JSON files
TraceML is the training bottleneck finder in the stack.
A common setup¶
A common setup looks like this:
- Hugging Face, Lightning, or a normal PyTorch loop for training
- W&B or MLflow for experiment tracking
- TraceML for bottleneck diagnosis during runs and end-of-run summaries
That means:
- keep your current tracking workflow
- add TraceML when you need to understand throughput problems
Why this setup makes sense¶
Tracking tools answer questions like:
- what config did this run use?
- what was the loss curve?
- which checkpoint belongs to this run?
- how did this run compare to the last one?
TraceML answers questions like:
- is the job input-bound or compute-bound?
- is one rank slower than the others?
- is the job spending time waiting?
- is memory drifting upward over time?
- which phase of the step is the real bottleneck?
These are different jobs.
Example: W&B + TraceML¶
You can keep using W&B logging in your training script and launch the run through TraceML.
Example:
```python
import traceml
import wandb

wandb.init(project="my-project")

# model, criterion, optimizer, and dataloader are defined as usual
for batch in dataloader:
    with traceml.trace_step(model):
        optimizer.zero_grad(set_to_none=True)
        outputs = model(batch["x"])
        loss = criterion(outputs, batch["y"])
        loss.backward()
        optimizer.step()

    wandb.log({"loss": loss.item()})
```
Run:
```bash
traceml run train.py
```
This gives you:
- W&B for run tracking
- TraceML for live bottleneck diagnosis
Log the final TraceML summary to W&B¶
If you want a quieter run and a structured TraceML summary you can log at the end, launch in summary mode:
```bash
traceml run train.py --mode=summary
```
Then request the finalized TraceML summary near the end of your script and log selected fields into W&B:
```python
import traceml
import wandb

wandb.init(project="my-project")

# training loop ...

summary = traceml.final_summary(print_text=True)
if summary is not None:
    wandb.summary["traceml/step_time_status"] = summary["step_time"]["diagnosis"]["status"]
    wandb.summary["traceml/step_memory_status"] = summary["step_memory"]["diagnosis"]["status"]
    wandb.summary["traceml/duration_s"] = summary.get("duration_s")
```
This lets W&B stay your experiment system of record while TraceML contributes a clean bottleneck diagnosis at the end of the run.
If you also store `final_summary.json` as a run artifact, you can reuse it later with `traceml compare`, as sketched below.
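For example, with W&B you might dump the summary to disk yourself and attach it as an artifact (a sketch, continuing from the block above; the file name and artifact name are arbitrary choices, not TraceML requirements):

```python
import json

import wandb

# `summary` is the dict returned by traceml.final_summary() above
if summary is not None:
    with open("final_summary.json", "w") as f:
        json.dump(summary, f, indent=2)  # write the full summary to disk

    artifact = wandb.Artifact("traceml-final-summary", type="traceml-summary")
    artifact.add_file("final_summary.json")  # attach the JSON file
    wandb.log_artifact(artifact)  # log it against the current run
```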
Example: MLflow + TraceML¶
You can do the same with MLflow.
```python
import traceml
import mlflow

mlflow.start_run()

# model, criterion, optimizer, and dataloader are defined as usual
for step, batch in enumerate(dataloader):
    with traceml.trace_step(model):
        optimizer.zero_grad(set_to_none=True)
        outputs = model(batch["x"])
        loss = criterion(outputs, batch["y"])
        loss.backward()
        optimizer.step()

    mlflow.log_metric("loss", float(loss.item()), step=step)
```
Run:
```bash
traceml run train.py
```
This gives you:
- MLflow for experiment tracking
- TraceML for bottleneck diagnosis
Log the final TraceML summary to MLflow¶
TraceML can also return a structured final summary for MLflow logging:
```python
import traceml
import mlflow

mlflow.start_run()

# training loop ...

summary = traceml.final_summary(print_text=True)
if summary is not None:
    mlflow.log_param(
        "traceml_step_time_status",
        summary["step_time"]["diagnosis"]["status"],
    )
    mlflow.log_param(
        "traceml_step_memory_status",
        summary["step_memory"]["diagnosis"]["status"],
    )
    mlflow.log_dict(summary, "traceml/final_summary.json")
```
This is a good fit when you want both a compact diagnosis in your run metadata and the full TraceML summary JSON attached to the run.
That attached JSON is also a good input for `traceml compare` later.
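To feed two of those attached JSONs to `traceml compare` later, one option is to pull them back down from the tracking server first (a sketch; the run IDs are placeholders, and the artifact path matches the `mlflow.log_dict` call above):

```python
import mlflow

# placeholders: the run IDs of the two runs you want to compare
path_a = mlflow.artifacts.download_artifacts(
    run_id="<run_a_id>", artifact_path="traceml/final_summary.json"
)
path_b = mlflow.artifacts.download_artifacts(
    run_id="<run_b_id>", artifact_path="traceml/final_summary.json"
)

# then, from a shell: traceml compare <path_a> <path_b>
print(path_a, path_b)
```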
Compare runs later¶
Once you have TraceML final summary JSON files for two runs, compare them with:
```bash
traceml compare run_a.json run_b.json
```
This is useful when:
- a training change may have made runs slower
- a dataloader or preprocessing change may have shifted the bottleneck
- a model or optimizer change may have moved time into a different phase
- dashboards look similar but throughput still feels worse
A practical workflow is:
- run training with TraceML summary mode
- keep the TraceML final summary JSON as a W&B or MLflow artifact
- compare two saved summaries locally with `traceml compare`
See Compare Runs.
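In shell form, that workflow might look like this (a sketch; the JSON file names are hypothetical, and saving the summaries as artifacts is covered in the W&B and MLflow sections above):

```bash
# run A, quiet run with a structured final summary
traceml run train.py --mode=summary
# ... save the final summary JSON as run_a.json ...

# run B, after the change you want to evaluate
traceml run train.py --mode=summary
# ... save the final summary JSON as run_b.json ...

traceml compare run_a.json run_b.json
```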
Keeping terminal output clean¶
If you use TraceML in CLI mode together with other loggers, the terminal can get noisy.
Good ways to reduce that:
- disable `tqdm` progress bars
- reduce extra console logging
- use `--mode=summary` if you only want the final TraceML summary
- use the local UI if the terminal feels crowded
Launch the local UI with:
```bash
traceml run train.py --mode=dashboard
```
This is often the cleanest option when you want:
- existing loggers to keep running
- TraceML visible at the same time
- less terminal clutter
Do I need to disable W&B or MLflow?¶
No.
For the cleanest local demo experience, you may temporarily disable external reporting, but it is not required.
For example, in some Hugging Face runs you might set:
- `report_to="none"`
- `disable_tqdm=True`
That is just to keep the terminal cleaner during local diagnosis runs.
It is not a TraceML requirement.
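In `transformers`, those two settings live on `TrainingArguments` (a minimal sketch; `output_dir` and the rest of the training setup are assumed):

```python
from transformers import TrainingArguments

# optional: quiet a local diagnosis run; not required by TraceML
args = TrainingArguments(
    output_dir="out",   # assumed output location
    report_to="none",   # skip W&B/MLflow/TensorBoard reporting for this run
    disable_tqdm=True,  # drop progress bars so the terminal stays readable
)
```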
Best way to adopt TraceML¶
A clean adoption path is:
- start with `traceml run train.py`
- use `--mode=summary` when you want a quieter run and a structured final summary
- log selected TraceML summary fields into W&B or MLflow if useful
- keep the TraceML final summary JSON for important runs
- compare two saved runs later with `traceml compare`
This usually gives the best balance between:
- low overhead
- clear diagnosis
- compatibility with your current stack
- simple run-to-run review