Natural Language Autoencoder

Residual-stream activations from layer 20 of Qwen2.5-7B are verbalized by nla-qwen2.5-7b-L20-av. Based on Natural Language Autoencoders (Anthropic, 2026).
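
As a rough illustration of the first half of that pipeline, the sketch below pulls per-token residual-stream activations out of the base model with Hugging Face Transformers. It is not taken from the demo's own code. The hub ID `Qwen/Qwen2.5-7B`, the helper names `pick_layer` and `residual_activations`, and the reading of "layer 20" as index 20 of the `hidden_states` tuple are all assumptions. Passing the activations to nla-qwen2.5-7b-L20-av is left out, since its interface is not described here.

```python
def pick_layer(hidden_states, layer):
    """Select one entry from a Hugging Face `hidden_states` tuple.

    The tuple has num_layers + 1 entries: entry 0 is the embedding output and
    entry k is the residual stream after k transformer blocks. Treating
    "layer 20" as entry 20 is an assumption about the demo's convention.
    """
    if not 0 <= layer < len(hidden_states):
        raise IndexError(f"layer {layer} is outside 0..{len(hidden_states) - 1}")
    return hidden_states[layer]


def residual_activations(text, model_id="Qwen/Qwen2.5-7B", layer=20):
    """Return (tokens, activations) for `text`; activations has shape (seq_len, hidden_size)."""
    # Heavy dependencies are imported lazily so pick_layer stays usable on its own.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Drop the batch dimension: one activation vector per input token.
    activations = pick_layer(outputs.hidden_states, layer)[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, activations
```

Calling `residual_activations("some input text")` downloads the full 7B checkpoint, so it needs the corresponding disk space and memory. Each returned token lines up with one row of the activation matrix, which is the per-token vector a verbalizer would take as input.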
