Wafer Reveals GLM-5.2 Inference Performance on AMD MI355X

Infrastructure/Platform | Sat Jul 04 2026 | 1 source

Wafer achieved 2,626 tok/s per node running GLM-5.2 on AMD MI355X, at half the cost of Blackwell.

Analysis

[Wafer] published GLM-5.2 inference benchmarks on AMD MI355X [1]

[Wafer] applied MXFP4 quantization via AMD Quark [1]

[sglang] was selected as the inference framework for serving GLM-5.2 with MXFP4 quantization [1]

[AMD Instinct MI355X] emerged as a low-cost inference alternative to Blackwell [1]
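
To put the per-node throughput figure in cost terms, the arithmetic can be sketched as below. This is a minimal illustration, not Wafer's methodology: the 2,626 tok/s per node figure comes from the summary above, while the hourly node price is a hypothetical placeholder and the function names are invented for this example. The source gives no Blackwell throughput or price, so no comparison is computed here.

```python
# Convert a per-node throughput figure into tokens per hour and
# cost per million tokens. Only the throughput value is from the article;
# the hourly price below is a made-up placeholder for illustration.

SECONDS_PER_HOUR = 3600


def tokens_per_hour(tok_per_s_per_node: float) -> float:
    """Tokens one node produces in an hour at a sustained rate."""
    return tok_per_s_per_node * SECONDS_PER_HOUR


def cost_per_million_tokens(node_usd_per_hour: float,
                            tok_per_s_per_node: float) -> float:
    """USD per one million tokens for a node billed by the hour."""
    return node_usd_per_hour / tokens_per_hour(tok_per_s_per_node) * 1_000_000


MI355X_TOK_PER_S_PER_NODE = 2626          # reported figure [1]
HYPOTHETICAL_NODE_USD_PER_HOUR = 20.0     # assumption, NOT from the source

if __name__ == "__main__":
    hourly = tokens_per_hour(MI355X_TOK_PER_S_PER_NODE)
    usd = cost_per_million_tokens(HYPOTHETICAL_NODE_USD_PER_HOUR,
                                  MI355X_TOK_PER_S_PER_NODE)
    print(f"tokens per node-hour: {hourly:,.0f}")
    print(f"USD per 1M tokens at the placeholder price: {usd:.4f}")
```

Substituting a real hourly node price for the placeholder gives a cost per million tokens that can be compared across hardware, provided the throughput figures were measured under the same workload.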

Sources