Output Format
rocm-trace-lite writes each trace as a standard SQLite database (.db file). The schema is compatible with RPD ecosystem tools.
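Because the output is an ordinary SQLite file, any SQLite client can open it. As a minimal sketch using Python's standard sqlite3 module, the helper below lists the tables and views found in a database. The in-memory database and the all_strings view are stand-ins invented for this example; with a real trace you would connect to the .db file instead.

```python
import sqlite3

def list_schema_objects(conn):
    """Return (type, name) pairs for every table and view in the database."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY type, name"
    )
    return [(kind, name) for kind, name in rows]

# Stand-in for a real trace: one table and one made-up view, held in memory.
# For a real trace, use sqlite3.connect("trace.db") ("trace.db" is a hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT UNIQUE)")
conn.execute("CREATE VIEW all_strings AS SELECT string FROM rocpd_string")

objects = list_schema_objects(conn)
print(objects)
```

Run against a real trace, the same helper should report the rocpd_* tables and the built-in views described in the rest of this page.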
Database schema
rocpd_op
GPU operations (kernel dispatches, roctx markers).
| Column | Type | Description |
|---|---|---|
| gpuId | INTEGER | GPU device index (0-based); -1 for host-side markers |
| | INTEGER | HSA queue handle |
| | INTEGER | Sequence number |
| | TEXT | Dispatch info string (hwq, workgroup, grid dimensions); NULL for markers |
| start | INTEGER | Start timestamp (nanoseconds) |
| end | INTEGER | End timestamp (nanoseconds) |
| | INTEGER | FK to |
| | INTEGER | FK to |
rocpd_string
Deduplicated string table for kernel names and operation types.
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| string | TEXT | Unique string value |
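To illustrate what deduplication means here, the sketch below interns strings into a table shaped like rocpd_string: a value is inserted only if it is not already present, and its id is returned either way. This is an illustration of the idea and not the tool's actual writer code; the column names id and string are taken from the example queries on this page, and the kernel names are made up.

```python
import sqlite3

def intern_string(conn, value):
    """Return the id for value, inserting it only if it is not already stored."""
    # The UNIQUE constraint makes a duplicate insert a no-op under OR IGNORE.
    conn.execute("INSERT OR IGNORE INTO rocpd_string (string) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT id FROM rocpd_string WHERE string = ?", (value,)
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT UNIQUE)")

first = intern_string(conn, "gemm_kernel")     # hypothetical kernel name
second = intern_string(conn, "gemm_kernel")    # same name again: same id
other = intern_string(conn, "reduce_kernel")   # different name: new id
stored = conn.execute("SELECT COUNT(*) FROM rocpd_string").fetchone()[0]
print(first, second, other, stored)
```

Storing each name once keeps rocpd_op rows small, since every dispatch carries only an integer reference instead of a full kernel name.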
rocpd_metadata
Trace-level metadata.
| Column | Type | Description |
|---|---|---|
| | TEXT | Metadata key |
| | TEXT | Metadata value |
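A key/value table like this maps naturally onto a dictionary. The following sketch reads it with Python's sqlite3 module. Since the column names are not shown above, it selects all columns and treats the last two as key and value, which also tolerates a leading id column if one exists. The stand-in column names (tag, value) and the sample rows are assumptions made for the example.

```python
import sqlite3

def read_metadata(conn):
    """Return trace metadata as a dict, using the last two columns as key and value."""
    rows = conn.execute("SELECT * FROM rocpd_metadata").fetchall()
    return {row[-2]: row[-1] for row in rows}

# Stand-in table: the column names and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rocpd_metadata (tag TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO rocpd_metadata VALUES (?, ?)",
    [("schema_version", "2"), ("hostname", "node01")],
)

metadata = read_metadata(conn)
print(metadata)
```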
rocpd_api
HIP API calls. Reserved for RPD compatibility; to be populated by a future HIP profiling mode.
| Column | Type | Description |
|---|---|---|
| | INTEGER | Process ID |
| | INTEGER | Thread ID |
| | INTEGER | Start timestamp (nanoseconds) |
| | INTEGER | End timestamp (nanoseconds) |
| | INTEGER | FK to |
| | INTEGER | FK to |
rocpd_api_ops
Association table linking API calls to GPU operations.
| Column | Type | Description |
|---|---|---|
| | INTEGER | FK to |
| | INTEGER | FK to |
rocpd_kernelapi
Kernel launch details (grid/workgroup dimensions).
| Column | Type | Description |
|---|---|---|
| | INTEGER | FK to |
| | TEXT | HIP stream handle |
| | INTEGER | Grid dimensions |
| | INTEGER | Workgroup dimensions |
| | INTEGER | Group segment size (bytes) |
| | INTEGER | Private segment size (bytes) |
| | INTEGER | FK to |
rocpd_copyapi
Memory copy details.
| Column | Type | Description |
|---|---|---|
| | INTEGER | FK to |
| | TEXT | HIP stream handle |
| | INTEGER | Copy size (bytes) |
| | TEXT | Destination address |
| | TEXT | Source address |
| | INTEGER | Copy kind (H2D, D2H, D2D, etc.) |
| | INTEGER | Synchronous flag |
rocpd_monitor
Monitoring data (reserved for future use).
| Column | Type | Description |
|---|---|---|
| | TEXT | Device type |
| | INTEGER | Device ID |
| | TEXT | Monitor metric type |
| | INTEGER | Start timestamp |
| | INTEGER | End timestamp |
| | TEXT | Metric value |
Built-in views
top
Top kernels by total GPU time.
SELECT * FROM top LIMIT 10;
| Column | Description |
|---|---|
| | Kernel name |
| | Number of invocations |
| | Total GPU time (ns) |
| | Average per-call time (ns) |
| | Minimum time |
| | Maximum time |
| | % of total GPU busy time |
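For readers who want to see how such a summary can be derived from the base tables, the sketch below computes comparable per-kernel statistics directly from rocpd_op and rocpd_string. It is an approximation written for illustration, not the view's actual definition: the output aliases (name, calls, total_ns, and so on) are invented, the gpuId >= 0 filter is an assumption to exclude host-side markers, and the sample data is made up. The end column is double-quoted because END is an SQL keyword.

```python
import sqlite3

# Per-kernel call count, total/avg/min/max duration, and share of total kernel time.
TOP_KERNELS_SQL = '''
SELECT s.string                AS name,
       COUNT(*)                AS calls,
       SUM(o."end" - o.start)  AS total_ns,
       AVG(o."end" - o.start)  AS avg_ns,
       MIN(o."end" - o.start)  AS min_ns,
       MAX(o."end" - o.start)  AS max_ns,
       100.0 * SUM(o."end" - o.start)
             / (SELECT SUM(k."end" - k.start) FROM rocpd_op k WHERE k.gpuId >= 0) AS pct
FROM rocpd_op o
JOIN rocpd_string s ON o.description_id = s.id
WHERE o.gpuId >= 0
GROUP BY s.string
ORDER BY total_ns DESC
'''

# Stand-in trace with only the columns this query needs; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT UNIQUE);
CREATE TABLE rocpd_op (gpuId INTEGER, start INTEGER, "end" INTEGER, description_id INTEGER);
INSERT INTO rocpd_string VALUES (1, 'gemm'), (2, 'reduce');
INSERT INTO rocpd_op VALUES (0, 0, 100, 1), (0, 200, 400, 1), (0, 500, 600, 2);
''')

top_rows = conn.execute(TOP_KERNELS_SQL).fetchall()
for row in top_rows:
    print(row)
```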
busy
GPU utilization per device.
SELECT * FROM busy;
| Column | Description |
|---|---|
| | GPU device index |
| | % of wall time with active kernels |
| | Total kernel dispatches |
| | Total GPU busy time (ns) |
| | Total wall time (ns) |
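One way to reason about "wall time with active kernels" is as the union of kernel intervals on each device, so that concurrently running kernels are counted once. The sketch below computes that from rocpd_op in Python. It is not the view's definition: it assumes wall time runs from the first kernel start to the last kernel end on each device, and the sample rows are made up.

```python
import sqlite3
from collections import defaultdict

def gpu_utilization(conn):
    """Return {gpuId: (busy_ns, wall_ns, percent)}, counting overlapping kernels once."""
    spans = defaultdict(list)
    query = 'SELECT gpuId, start, "end" FROM rocpd_op WHERE gpuId >= 0 ORDER BY gpuId, start'
    for gpu, start, end in conn.execute(query):
        merged = spans[gpu]
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    result = {}
    for gpu, merged in spans.items():
        busy = sum(end - start for start, end in merged)
        wall = merged[-1][1] - merged[0][0]  # first start to last end (assumption)
        result[gpu] = (busy, wall, 100.0 * busy / wall if wall else 0.0)
    return result

# Stand-in data: two overlapping kernels and a later one on GPU 0,
# one kernel on GPU 1, and a host-side marker (gpuId -1) that is ignored.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE rocpd_op (gpuId INTEGER, start INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO rocpd_op VALUES (?, ?, ?)",
    [(0, 0, 100), (0, 50, 150), (0, 300, 400), (1, 0, 100), (-1, 10, 20)],
)

utilization = gpu_utilization(conn)
print(utilization)
```

Summing raw durations instead of merging intervals would report more than 100% on a device that runs kernels concurrently, which is why the sketch merges first.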
Example queries
-- Find the slowest individual kernel execution
SELECT s.string, (o.end - o.start)/1000 as duration_us
FROM rocpd_op o
JOIN rocpd_string s ON o.description_id = s.id
ORDER BY (o.end - o.start) DESC
LIMIT 1;

-- Count kernels per GPU
SELECT gpuId, count(*) as ops
FROM rocpd_op
WHERE gpuId >= 0
GROUP BY gpuId;

-- Find NCCL/RCCL communication kernels
SELECT s.string, count(*) as calls
FROM rocpd_op o
JOIN rocpd_string s ON o.description_id = s.id
WHERE s.string LIKE '%nccl%'
GROUP BY s.string;
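The same queries can be run from a script. As a small sketch, the helper below generalizes the last query to any name pattern by passing it as a bound parameter to Python's sqlite3 module. The function name, the stand-in tables, and the kernel names in the sample data are all made up for the example.

```python
import sqlite3

def count_kernels_like(conn, pattern):
    """Return (kernel name, call count) rows for names matching a SQL LIKE pattern."""
    sql = (
        "SELECT s.string, COUNT(*) AS calls "
        "FROM rocpd_op o JOIN rocpd_string s ON o.description_id = s.id "
        "WHERE s.string LIKE ? "
        "GROUP BY s.string ORDER BY calls DESC"
    )
    # The pattern is bound as a parameter rather than formatted into the SQL text.
    return conn.execute(sql, (pattern,)).fetchall()

# Stand-in trace with hypothetical kernel names.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE rocpd_string (id INTEGER PRIMARY KEY, string TEXT UNIQUE);
CREATE TABLE rocpd_op (gpuId INTEGER, description_id INTEGER);
INSERT INTO rocpd_string VALUES (1, 'ncclDevKernel_AllReduce'), (2, 'gemm');
INSERT INTO rocpd_op VALUES (0, 1), (1, 1), (0, 2);
''')

nccl_calls = count_kernels_like(conn, "%nccl%")
print(nccl_calls)
```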