Comparison with Other Profilers
Feature comparison
Feature |
rocm-trace-lite |
rocprofiler-sdk |
roctracer + RPD |
|---|---|---|---|
Dependencies |
libhsa-runtime64 + libsqlite3 |
ROCm 6.0+ full stack |
roctracer + RPD |
GPU kernel timing |
HSA signal injection |
HW counters + callbacks |
roctracer callbacks |
HIP API tracing |
No |
Yes |
Yes |
HW performance counters |
No |
Yes |
No |
PC sampling |
No |
Yes |
No |
roctx markers |
Built-in shim |
libroctx64 |
libroctx64 |
Multi-GPU |
Automatic per-process merge |
Manual |
Manual |
Output format |
SQLite .db + Perfetto JSON |
Various |
RPD SQLite |
Overhead |
Low (signal pool, single worker) |
Medium-High |
Medium |
New HW bring-up |
Works immediately (HSA only) |
Requires rocprofiler support |
Requires roctracer support |
When to use rocm-trace-lite
Kernel profiling on new hardware where rocprofiler is not yet available
Lightweight CI regression testing of kernel performance
AI framework teams who need clean kernel profiling without heavy dependencies
Quick distributed profiling (TP>1) with automatic merge
When to use rocprofiler-sdk instead
You need HW performance counters (cache hit rates, occupancy, etc.)
You need PC sampling for hotspot analysis
You need HIP API tracing (API call timestamps, arguments)
You need the full ROCm profiling ecosystem
Relationship to RPD
rocm-trace-lite is a standalone alternative, not a fork of ROCm/rocmProfileData:
Written from scratch (no RPD code dependency)
Outputs RPD-compatible SQLite schema (interoperable with RPD tools)
Can coexist with original RPD on the same system