rocm-trace-lite

Getting Started

  • Installation
    • From pip (recommended)
    • From source
      • Requirements
      • Verify installation
      • Troubleshooting
    • Build targets
    • Runtime dependencies
  • Quick Start
    • Basic usage
    • View results
      • Terminal summary
      • Perfetto timeline
      • SQL queries
    • Multi-GPU / Distributed
    • Using roctx markers
    • CUDAGraph / HIP graph compatibility
    • Environment variables
      • Profiling modes
    • Environment variable mode
  • CLI Reference
    • rtl trace
    • rtl summary
    • rtl convert
    • rtl info
    • Environment variables

Tutorials

  • Tutorial: Profiling Prefill vs Decode with roctx Markers
    • Quick Start
      • 1. Install RTL
      • 2. Add roctx markers to your code
      • 3. Run with RTL
      • 4. Analyze by region
    • Example Output
    • Key Insights
    • Visualize in Perfetto
    • Notes

User Guide

  • Multi-GPU and Distributed Profiling
    • How it works
    • Usage
    • Diagnostic tool
    • Validated configurations
    • Diagnostic counters
    • CUDAGraph compatibility
  • Perfetto Visualization
    • Opening traces
    • Manual conversion
    • Trace structure
    • Single GPU behavior
    • Multi-GPU behavior
  • Output Format
    • Database schema
      • rocpd_op
      • rocpd_string
      • rocpd_metadata
      • rocpd_api
      • rocpd_api_ops
      • rocpd_kernelapi
      • rocpd_copyapi
      • rocpd_monitor
    • Built-in views
      • top
      • busy
    • Example queries

Architecture

  • How It Works
    • Architecture overview
    • Interception mechanism
      • 1. Library loading
      • 2. Queue interception
      • 3. Signal injection profiling
      • 4. Completion worker
      • 5. Symbol resolution
      • 6. roctx shim
    • Signal pool design
    • CUDAGraph / HIP graph handling
      • Batch skip (automatic)
      • Profiling modes (RTL_MODE)
      • Known limitation
      • Why signal injection?
  • Comparison with Other Profilers
    • Feature comparison
    • When to use rocm-trace-lite
    • When to use rocprofiler-sdk instead
    • Relationship to RPD

Development

  • Contributing
    • Development setup
    • Running tests
    • Code style
    • Before submitting a PR
    • Architecture guidelines
  • Changelog
    • v0.3.5
      • Documentation
    • v0.3.4
    • v0.3.3
    • v0.3.2
      • Bug fixes
    • v0.3.0
      • Profiling modes (RTL_MODE)
      • CUDAGraph / HIP graph compatibility (#67)
      • Signal forwarding
      • Testing
      • Bug fixes
    • v0.2.0
      • Signal injection profiling
      • Rename and consistency
      • Documentation
      • Testing
    • v0.1.1
      • Multi-process support (#28)
      • Packaging
      • Testing
    • v0.1.0
      • Initial release


© Copyright 2026, AMD.
