Installation
From pip (recommended)
pip install rocm-trace-lite
The pip package includes the pre-built librtl.so and CLI tools.
From source
Requirements
ROCm (for HSA headers: hsa/hsa.h, hsa/hsa_api_trace.h)
SQLite3 development headers
g++ with C++17 support
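Before building, it can help to confirm that the two HSA headers listed above are actually present. The function below is a minimal sketch and not part of rocm-trace-lite: the name check_hsa_headers is hypothetical, and the default ROCm prefix of /opt/rocm (overridable through a ROCM_PATH variable or the first argument) is an assumption about a typical install.

```shell
# Hypothetical pre-build check, not shipped with rocm-trace-lite.
# Assumption: ROCm headers live under <prefix>/include, where <prefix> is
# the first argument, else $ROCM_PATH, else /opt/rocm.
check_hsa_headers() {
    rocm="${1:-${ROCM_PATH:-/opt/rocm}}"
    missing=0
    for h in hsa/hsa.h hsa/hsa_api_trace.h; do
        if [ -f "$rocm/include/$h" ]; then
            echo "found   $rocm/include/$h"
        else
            echo "missing $rocm/include/$h" >&2
            missing=1
        fi
    done
    # Exit status 0 when both headers exist, 1 otherwise.
    return "$missing"
}
```

Calling check_hsa_headers with no arguments checks the default prefix; pass a different prefix as the first argument if ROCm is installed elsewhere.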
# Install build dependencies (Ubuntu/Debian)
sudo apt install libsqlite3-dev g++
# Clone and build
git clone https://github.com/sunway513/rocm-trace-lite.git
cd rocm-trace-lite
make -j
# Install the shared library system-wide
sudo make install
# Install CLI tools
pip install -e .
Verify installation
# Check the library has no forbidden dependencies
ldd librtl.so | grep -E "roctracer|rocprofiler"
# Should produce no output (clean dependency chain)
# Quick smoke test (requires GPU)
rtl trace -o test.db python3 -c "
import torch
x = torch.randn(512, 512, device='cuda')
y = x @ x
torch.cuda.synchronize()
"
rtl summary test.db
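The dependency check at the start of this section relies on reading the output by eye. For scripted use (for example in CI), the same grep can be wrapped so that it returns a non-zero exit status when a forbidden library shows up. This is a sketch only: check_forbidden_deps is a hypothetical helper, not a tool provided by rocm-trace-lite.

```shell
# Hypothetical helper, not part of rocm-trace-lite.
# Reads `ldd` output on stdin and fails if roctracer or rocprofiler appears.
check_forbidden_deps() {
    # grep exits 0 when it finds a match; matching lines go to stderr.
    if grep -E "roctracer|rocprofiler" >&2; then
        echo "FAIL: forbidden dependency found" >&2
        return 1
    fi
    echo "OK: clean dependency chain"
}

# Usage (run from the directory containing the library):
#   ldd librtl.so | check_forbidden_deps
```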
Troubleshooting
If rtl trace reports 0 GPU ops:
Check preflight output: rtl trace prints diagnostic messages before tracing. Look for warnings about missing libhsa-runtime64.so or librtl.so.
Multi-process workloads: frameworks like ATOM/vLLM spawn GPU workers in subprocesses. Set env vars globally before launching:
export HSA_TOOLS_LIB=$(python3 -c "from rocm_trace_lite import get_lib_path; print(get_lib_path())")
export RTL_OUTPUT=trace_%p.db
python3 my_model.py
After make install, run sudo ldconfig to update the linker cache.
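With RTL_OUTPUT=trace_%p.db, a multi-process run may leave several trace files behind. The sketch below loops over them and runs rtl summary on each. It assumes that %p is expanded per process so the files match trace_*.db, which the source implies but does not state; summarize_traces is a hypothetical helper name.

```shell
# Hypothetical helper, not part of rocm-trace-lite.
# Assumption: per-process traces are written as trace_<something>.db.
summarize_traces() {
    # Optional first argument: directory holding the trace files.
    dir="${1:-.}"
    found=0
    for db in "$dir"/trace_*.db; do
        [ -e "$db" ] || continue   # glob matched nothing
        found=1
        echo "== $db =="
        rtl summary "$db"
    done
    if [ "$found" -eq 0 ]; then
        echo "no trace_*.db files in $dir" >&2
        return 1
    fi
}
```

If the function reports that no trace files were found, the worker processes probably did not inherit HSA_TOOLS_LIB and RTL_OUTPUT, so check that both were exported before launch.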
Build targets
make # Build librtl.so
make install # Install to /usr/local/lib and /usr/local/bin
make test-cpu # Run non-GPU unit tests
make test-gpu # GPU smoke test (requires ROCm GPU)
make clean # Remove build artifacts
Runtime dependencies
| Library | Source | Required |
|---|---|---|
| | ROCm runtime | Yes |
| | System | Yes |
| | Not used | No |
| | Not used | No |
| | Not used (built-in shim) | No |