RAID — Retrieval-Augmented Inverse Dynamics

The full RAID pipeline

The paper architecture combines a frozen GR-1 encoder and prediction head with a RAID decoder that retrieves nearby transitions, attends over their actions, and gates the result into a 7-DOF command.

RAID architecture diagram: GR-1 encoder, dreaming head, memory bank, and RAID head with direct trunk, cross-attention prior, and per-dimension gate

Dream the next state, then retrieve the action that likely caused it

RAID uses a frozen world-model encoder to imagine the next latent state, then decodes the implied motor command with a retrieval-augmented inverse-dynamics head.

Problem. Given GR-1 features f_t, f_t+1 and a memory bank M = {(f_i, f_i+1, a_i)} of demonstrated transitions, RAID predicts the normalized action a_t that caused the transition from f_t to f_t+1.

At deployment, GR-1 supplies a one-step dreamed feature f̂_t+1. RAID retrieves the nearest demonstrated transitions in the joint feature space, attends over their actions, and blends that action prior with a direct MLP estimate.
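The retrieval step can be sketched as a brute-force nearest-neighbor search over concatenated transition features. This is a minimal illustration, not the paper's implementation: the Euclidean metric, the function name retrieve_top_k, and the toy 2-D features are assumptions; the source only says the nearest transitions are found in the joint feature space.

```python
import math

def retrieve_top_k(f_t, f_next, memory, k=3):
    """Return the k memory entries closest to the query transition.

    memory is a list of (f_i, f_i_next, a_i) tuples. Distance is Euclidean
    on the concatenated pair (an assumption; the metric is not specified).
    """
    query = list(f_t) + list(f_next)

    def distance(entry):
        f_i, f_i_next, _ = entry
        return math.dist(query, list(f_i) + list(f_i_next))

    return sorted(memory, key=distance)[:k]

# Toy memory bank with 2-D features and 2-D actions (real features are 384-D).
memory = [
    ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 1.2], [0.0, 1.0]),
    ([0.0, 0.1], [0.1, 0.1], [0.9, 0.1]),
    ([5.0, 5.0], [5.0, 5.0], [0.0, 0.0]),
]
neighbors = retrieve_top_k([0.0, 0.0], [0.1, 0.05], memory, k=2)
retrieved_actions = [a for _, _, a in neighbors]
```

A linear scan like this is only practical for small memory banks; a larger bank would likely call for an approximate nearest-neighbor index.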

GR-1 + RAID. We freeze the GR-1 encoder and use its 384-dimensional class token as the state representation. The decoder conditions on k = 3 retrieved demonstrations alongside the dreamed transition.

Direct trunk. A two-hidden-layer MLP d_φ estimates the action directly from concat(f_t, f̂_t+1), without using the retrieved demonstrations.
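A rough sketch of such a trunk in plain Python follows. The hidden width of 64, the ReLU activation, and the random initialization are assumptions chosen for illustration; only the two hidden layers, the 384-dimensional features, and the 7-dimensional output come from the text above.

```python
import random

def linear(x, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """He-style random initialization (an assumption, for illustration only)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def direct_trunk(f_t, f_next_dreamed, layers):
    """Two hidden layers with ReLU, then a linear action head."""
    h = list(f_t) + list(f_next_dreamed)
    for weights, bias in layers[:-1]:
        h = relu(linear(h, weights, bias))
    weights, bias = layers[-1]
    return linear(h, weights, bias)

rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM, ACTION_DIM = 384, 64, 7  # HIDDEN_DIM is hypothetical
layers = [
    init_layer(2 * FEATURE_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, ACTION_DIM, rng),
]
f_t = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
f_next_dreamed = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
direct_action = direct_trunk(f_t, f_next_dreamed, layers)
```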

Cross-attention prior. The query transition attends over retrieved actions:

α_i = softmax_i(qᵀk_i / √d)
â_prior = Σ_i α_i · a_i
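The attention step can be written out as scaled dot-product attention over the retrieved actions. In this sketch the query and keys are assumed to be already projected into a shared d-dimensional space (the learned projections are omitted), and the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_prior(query, keys, actions):
    """Return attention weights alpha_i and the weighted action prior."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    action_dim = len(actions[0])
    prior = [sum(w * a[j] for w, a in zip(weights, actions)) for j in range(action_dim)]
    return weights, prior

# k = 3 retrieved transitions, toy 2-D keys and 2-D actions.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
actions = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
weights, prior = attention_prior(query, keys, actions)
```

Because the weights sum to one, the prior is a convex combination of the retrieved actions and cannot leave their range in any dimension.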

Per-dimension gate. The final action is a dimension-wise blend of the direct estimate and the retrieval prior:

â = g ⊙ d_φ(f_t, f_t+1) + (1 − g) ⊙ â_prior
g = σ(W concat(f_t, f_t+1) + b)
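As a small worked example of the blend, the sketch below applies a sigmoid gate per action dimension. The gate logits stand in for W concat(f_t, f_t+1) + b and are hand-picked here rather than computed from features, which is an assumption made to keep the example short.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_blend(direct, prior, gate_logits):
    """Blend per dimension: g * direct + (1 - g) * prior, with g = sigmoid(logit)."""
    gate = [sigmoid(z) for z in gate_logits]
    return [g * d + (1.0 - g) * p for g, d, p in zip(gate, direct, prior)]

direct = [1.0, 1.0, 1.0]        # direct trunk estimate (toy values)
prior = [0.0, 0.0, 0.0]         # retrieval prior (toy values)
gate_logits = [0.0, 50.0, -50.0]
blended = gated_blend(direct, prior, gate_logits)
# Dimension 0 is an even mix, dimension 1 follows the trunk,
# dimension 2 follows the retrieval prior.
```

A separate gate value per dimension lets the model trust retrieval for some action components while relying on the trunk for others.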

Prior dropout and Gaussian jitter keep the model from simply copying the retrieved action, forcing the trunk and retrieval prior to share the work.
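One possible reading of these two regularizers is sketched below. The source does not give their exact form, so everything here is an assumption: prior dropout is taken to mean zeroing all retrieved actions for a training sample with probability drop_prob, and jitter is taken to mean adding zero-mean Gaussian noise to the retrieved actions. The function name and default values are hypothetical.

```python
import random

def regularize_retrieved(actions, rng, drop_prob=0.3, jitter_std=0.05):
    """Training-time noise on retrieved actions (hypothetical formulation).

    Returns (actions, dropped). When dropped is True the prior carries no
    information, so the direct trunk has to predict the action on its own.
    """
    if rng.random() < drop_prob:
        return [[0.0] * len(a) for a in actions], True
    jittered = [[v + rng.gauss(0.0, jitter_std) for v in a] for a in actions]
    return jittered, False

rng = random.Random(0)
retrieved = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
noisy, dropped = regularize_retrieved(retrieved, rng)
```

Both steps would be applied only during training; at evaluation the retrieved actions would pass through unchanged.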

Fine-tune the policy with GRPO

After behavior cloning, RAID is refined online with Group Relative Policy Optimization: the BC policy rolls out in LIBERO-Spatial, rewards are grouped across parallel trajectories, and a relative advantage drives policy updates.

GRPO training pipeline over the RAID visual policy: BC policy rollouts in LIBERO-Spatial (MuJoCo + EGL), grouped rewards, group advantage, and policy update.
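The group-relative advantage at the center of this loop can be illustrated with the normalization commonly used in GRPO: each rollout's reward is centered on the group mean and scaled by the group standard deviation. Whether RAID uses exactly this form, and the epsilon value, are assumptions; the rewards below are made-up success indicators.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each rollout relative to its group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four parallel rollouts of the same task, reward 1.0 for success.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
# Successful rollouts get positive advantages, failed ones negative.
```

When every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient signal.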

Transition grids

Two qualitative grids compare the current frame, GR-1's dreamed next frame, RAID actions, direct visual baseline actions, and ground truth across representative random seeds.

Seed 0

RAID, direct visual, and ground truth actions in the 200-demo LIBERO-Spatial setting.

Seed 1

Additional sampled transitions from the same 200-demo evaluation setup.

Retrieval gives the inverse-dynamics head a sharper action prior

On LIBERO-Spatial with 25 demonstrations, cross-attention RAID over GR-1 features reaches 0.132 validation MSE versus 0.842 for the same visual head without retrieval, a 6.4× reduction in error.

Validation MSE on LIBERO-Spatial for the direct visual and RAID visual policies at 25, 50, 100, and 200 demonstrations: RAID visual outperforms the direct visual baseline at every demonstration scale.
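The 6.4× figure for 25 demonstrations follows directly from the two reported MSE values. The short sketch below shows the ratio, along with one plausible way the metric could be computed; averaging over every action dimension of every sample is an assumption about the convention, not something the source states.

```python
def mse(predicted, target):
    """Mean squared error over all action dimensions of all samples."""
    errors = [
        (p - t) ** 2
        for pred, tgt in zip(predicted, target)
        for p, t in zip(pred, tgt)
    ]
    return sum(errors) / len(errors)

# Reported validation MSE at 25 demonstrations.
direct_mse, raid_mse = 0.842, 0.132
ratio = direct_mse / raid_mse
print(f"{ratio:.1f}x lower validation MSE")  # prints "6.4x lower validation MSE"
```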

Video demonstrations

Side-by-side LIBERO-Spatial rollouts show the direct visual baseline and the RAID visual policy under the same comparison setup.

Direct Visual Baseline

Direct action prediction from visual features without retrieval.

RAID Visual Policy

Retrieval-augmented action prediction using remembered transitions.

Retrieval-Augmented Inverse Dynamics for robotic manipulation