arXiv 2605.06169 · May 2026

Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers

Pengqi Lu

Live gradient-diagnosis viewer

The figure below replays the unstabilized 400-layer baseline run analysed in the paper, the same trajectory along which Mean Mode Screaming (MMS) develops. Each cluster is one anchor layer; tokens collapse when the selected metric is high. The default metric is Token Cosine Similarity (TCS, group (f)): in deep layers it climbs toward 1.0 and stays there, the signature of mean-dominated homogenization. Switch the metric on the left, scrub the timeline at the bottom, and drag to orbit the camera.

FLOW · run 0xylyord · 400 layers · crashed
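
To make the default metric concrete, here is a minimal sketch of a token cosine similarity computation. It assumes TCS is the mean pairwise cosine similarity between the token vectors at one layer; the paper's exact definition may differ. The function name `token_cosine_similarity` and the toy activations are hypothetical and are not taken from the paper or from the run shown in the viewer.

```python
import math
from itertools import combinations


def token_cosine_similarity(tokens):
    """Mean cosine similarity over all distinct pairs of token vectors.

    Assumed definition, for illustration only. Expects at least two
    tokens, none of which is the zero vector.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = list(combinations(tokens, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy activations: three mutually orthogonal tokens.
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Add a large shared mean vector to every token. The per-token
# differences are unchanged, but the common component now dominates.
shared_mean = [10.0, 10.0, 10.0]
collapsed = [[x + m for x, m in zip(tok, shared_mean)] for tok in diverse]

tcs_diverse = token_cosine_similarity(diverse)
tcs_collapsed = token_cosine_similarity(collapsed)

print(f"diverse tokens:        TCS = {tcs_diverse:.4f}")
print(f"mean-dominated tokens: TCS = {tcs_collapsed:.4f}")
```

In this toy case the orthogonal tokens score 0, while the same tokens with a large shared mean score close to 1, even though their differences are identical. That is the sense in which a TCS pinned near 1.0 indicates that the mean component dominates the token representations.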

BibTeX

@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026}
}