Step 1: The model predicts the next "Token" (patch) based on previous ones.
Step 2: It fills the grid sequentially (Autoregressive).
Step 3: A decoder smooths the final result.