CLI Reference
rocm-trace-lite provides the rtl command-line tool. (rtl-legacy also works as an alias.)
rtl trace
Trace a GPU workload and generate profiling output.
rtl trace [-o OUTPUT] COMMAND [ARGS...]
Options:
| Flag | Default | Description |
|---|---|---|
| -o OUTPUT | | Output trace file path |
Output files generated:
| File | Description |
|---|---|
| | SQLite trace database (RPD format) |
| | Text summary of top kernels |
| | Compressed Perfetto JSON (open in ui.perfetto.dev) |
Examples:
# Basic tracing
rtl trace -o trace.db python3 my_model.py
# Multi-GPU with torchrun
rtl trace -o trace.db torchrun --nproc_per_node=4 train.py
# Trace a shell command
rtl trace -o trace.db -- ./my_hip_app --batch-size 32
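When the tracer is driven from a script rather than typed by hand, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of rocm-trace-lite: `build_trace_command` and `run_trace` are hypothetical helper names, and the sketch uses only the `-o` flag and the `--` separator shown in the examples above. It assumes `rtl` is on `PATH` when `run_trace` is called.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_trace_command(output: str, workload: Sequence[str]) -> List[str]:
    """Build the argv for `rtl trace -o OUTPUT -- COMMAND [ARGS...]`.

    Hypothetical helper: only the flags documented above are used.
    """
    if not workload:
        raise ValueError("workload command must not be empty")
    # The `--` separator keeps the workload's own flags (e.g. --batch-size)
    # from being interpreted as rtl options.
    return ["rtl", "trace", "-o", output, "--", *workload]


def run_trace(output: str, workload: Sequence[str]) -> int:
    """Run the traced workload and return the exit code of the rtl process."""
    cmd = build_trace_command(output, workload)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

Passing the workload as a list avoids shell quoting problems; for example, `run_trace("trace.db", ["./my_hip_app", "--batch-size", "32"])` corresponds to the last command above.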
rtl summary
Print top kernels and GPU utilization from a trace.
rtl summary [-n LIMIT] INPUT
Options:
| Flag | Default | Description |
|---|---|---|
| -n LIMIT | 20 | Number of top kernels to show |
Example:
rtl summary -n 10 trace.db
rtl convert
Convert an RPD trace to Perfetto/Chrome Trace JSON.
rtl convert [-o OUTPUT] INPUT
Options:
| Flag | Default | Description |
|---|---|---|
| -o OUTPUT | | Output JSON file path |
Example:
rtl convert trace.db -o trace.json
# Open trace.json in https://ui.perfetto.dev
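The converted file can also be inspected without a browser. The sketch below assumes the output follows the general Chrome Trace Event layout, which is either a bare JSON array of events or an object with a `traceEvents` key; which of the two `rtl convert` writes is not stated here, so both are handled. The function names are hypothetical, and the sketch reads plain (uncompressed) JSON only.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def load_trace_events(path: str) -> List[Dict[str, Any]]:
    """Load events from a Chrome Trace JSON file (array or object form)."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Object form: {"traceEvents": [...], ...}; array form: [...]
    if isinstance(data, dict):
        return list(data.get("traceEvents", []))
    return list(data)


def count_events_by_phase(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count events by their "ph" (phase) field, e.g. "X" for complete events."""
    return Counter(ev.get("ph", "?") for ev in events)
```

A quick check such as `count_events_by_phase(load_trace_events("trace.json"))` shows whether the file contains any events before it is loaded into the Perfetto UI.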
rtl info
Show structural information about a trace file.
rtl info INPUT
Example:
rtl info trace.db
Trace: trace.db
Size: 1.2 MB
Tables: rocpd_api, rocpd_api_ops, rocpd_copyapi, rocpd_kernelapi, rocpd_metadata, rocpd_monitor, rocpd_op, rocpd_string
rocpd_op: 728 rows
rocpd_string: 45 rows
Duration: 13.247s
Unique kernels: 5
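Because the trace is a SQLite database, the table list and row counts reported above can be reproduced with Python's standard `sqlite3` module. This is a minimal sketch of that idea, not the implementation of `rtl info`; `table_row_counts` is a hypothetical name, and the sketch relies only on SQLite's built-in `sqlite_master` catalog rather than on any RPD-specific column layout.

```python
import sqlite3
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's schema catalog; it lists all tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts: Dict[str, int] = {}
        for name in names:
            # Quote the identifier; table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Note that `sqlite3.connect` creates an empty database if the path does not exist, so check that the file is present before calling `table_row_counts("trace.db")`.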
Environment variables
| Variable | Values | Description |
|---|---|---|
| | file path | Output trace file (supports |
| | | Profiling mode. |
| | | Log per-call summary: intercept call count, device ID, batch skip decisions |
| | | Log per-packet details: AQL type, signal handle, kernel object address |