rocm-trace-lite v0.3.3

Hot Trace — Prefill vs Decode

MI355X  ·  Conc=64  ·  ISL/OSL=1k/1k
RTL lite mode  ·  Rank 0 trace

RTL Overview
Prefill
Decode
Collectives
Shared / both phases
Timeline = sampled op stream  ·  DSV32 + Qwen3 sweeps running