Autoregressive Visual Generation

Step 1: The model predicts the next token (an image patch) from the tokens it has already generated.

Step 2: It fills the grid one token at a time, with each prediction feeding into the next (autoregressive).

Step 3: A decoder converts the completed token grid back into pixels and smooths the final result.
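
The three steps can be sketched in a few lines of Python. This is a toy illustration, not the model behind this page: GRID, VOCAB, predict_next, generate, and decode are hypothetical names, the "model" is a weighted random sampler that merely favors repeating the previous token, and the "decoder" is a simple neighborhood average standing in for a learned one.

```python
import random

GRID = 4   # assumed: a 4x4 grid of patch tokens
VOCAB = 8  # assumed: 8 possible token ids (codebook size)


def predict_next(prev_tokens, rng):
    """Step 1 stand-in: sample the next token id given the previous ones.

    A trained model would output learned probabilities; this toy version
    just makes the most recent token more likely to repeat.
    """
    weights = [1.0] * VOCAB
    if prev_tokens:
        weights[prev_tokens[-1]] += 3.0
    return rng.choices(range(VOCAB), weights=weights, k=1)[0]


def generate(seed=0):
    """Step 2: fill the grid sequentially, left to right, top to bottom."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(GRID * GRID):
        # Each new token is conditioned on everything generated so far.
        tokens.append(predict_next(tokens, rng))
    return [tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]


def decode(grid):
    """Step 3 stand-in: map tokens to gray levels, then average neighbors."""
    levels = [[t / (VOCAB - 1) for t in row] for row in grid]
    image = []
    for r in range(GRID):
        out_row = []
        for c in range(GRID):
            # Average over the cell and its in-bounds neighbors (box blur).
            neighbors = [
                levels[rr][cc]
                for rr in range(max(0, r - 1), min(GRID, r + 2))
                for cc in range(max(0, c - 1), min(GRID, c + 2))
            ]
            out_row.append(sum(neighbors) / len(neighbors))
        image.append(out_row)
    return image


grid = generate()
image = decode(grid)
```

Here grid holds the integer token ids in generation order and image holds the smoothed gray levels in the range 0 to 1. The left-to-right, top-to-bottom fill order is an assumption of this sketch; the steps above only say that the grid is filled sequentially.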