rocm-trace-lite
Getting Started
Installation
From pip (recommended)
From source
Requirements
Verify installation
Troubleshooting
Build targets
Runtime dependencies
Quick Start
Basic usage
View results
Terminal summary
Perfetto timeline
SQL queries
Multi-GPU / Distributed
Using roctx markers
CUDAGraph / HIP graph compatibility
Environment variables
Profiling modes
Environment variable mode
CLI Reference
rtl trace
rtl summary
rtl convert
rtl info
Environment variables
Tutorials
Tutorial: Profiling Prefill vs Decode with roctx Markers
Quick Start
1. Install RTL
2. Add roctx markers to your code
3. Run with RTL
4. Analyze by region
Example Output
Key Insights
Visualize in Perfetto
Notes
User Guide
Multi-GPU and Distributed Profiling
How it works
Usage
Diagnostic tool
Validated configurations
Diagnostic counters
CUDAGraph compatibility
Perfetto Visualization
Opening traces
Manual conversion
Trace structure
Single GPU behavior
Multi-GPU behavior
Output Format
Database schema
rocpd_op
rocpd_string
rocpd_metadata
rocpd_api
rocpd_api_ops
rocpd_kernelapi
rocpd_copyapi
rocpd_monitor
Built-in views
top
busy
Example queries
Architecture
How It Works
Architecture overview
Interception mechanism
1. Library loading
2. Queue interception
3. Signal injection profiling
4. Completion worker
5. Symbol resolution
6. roctx shim
Signal pool design
CUDAGraph / HIP graph handling
Batch skip (automatic)
Profiling modes (RTL_MODE)
Known limitation
Why signal injection?
Comparison with Other Profilers
Feature comparison
When to use rocm-trace-lite
When to use rocprofiler-sdk instead
Relationship to RPD
Development
Contributing
Development setup
Running tests
Code style
Before submitting a PR
Architecture guidelines
Changelog
v0.3.5
Documentation
v0.3.4
v0.3.3
v0.3.2
Bug fixes
v0.3.0
Profiling modes (RTL_MODE)
CUDAGraph / HIP graph compatibility (#67)
Signal forwarding
Testing
Bug fixes
v0.2.0
Signal injection profiling
Rename and consistency
Documentation
Testing
v0.1.1
Multi-process support (#28)
Packaging
Testing
v0.1.0
Initial release
Index