Today, achieving peak performance on modern AI accelerators often requires control over low-level hardware features. This trend is expected to intensify as more asynchronicity and dynamism ...