The figure below replays the un-stabilized 400-layer baseline run analysed in the paper — the same trajectory through which MMS develops. Each cluster is one anchor layer; tokens collapse when the selected metric is high. Default metric is Token Cosine Similarity (TCS, group (f)) — in deep layers it climbs toward 1.0 and stays there, the signature of mean-dominated homogenization. Switch metric on the left, scrub the timeline at the bottom, drag to orbit the camera.
@article{lu2026mms,
title = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
author = {Lu, Pengqi},
journal = {arXiv preprint arXiv:2605.06169},
year = {2026}
}