Training
Pre-training τ₀-WM from scratch costs tens of thousands of dollars; fine-tuning it on your captured dataset costs about a thousand. This page explains the difference, the cloud providers Midcore drives, and the operating procedure for going from finalised dataset to deployed checkpoint.
Pre-training vs fine-tuning
| | Pre-training (already done) | Fine-tuning (what you do) |
|---|---|---|
| Compute | 64 × H100 × 42 h | 16 × H100 × ~26 h (typical) |
| Total H100-hours | ~2,700 | ~416 |
| Batch size | 12,288 | 384 |
| Learning rate | 5 × 10⁻⁵ | 5 × 10⁻⁵ |
| Optimizer | AdamW | AdamW |
| Data scale | 27,300 hours of mixed video | 100 - 500 of your episodes |
| Result | A general bimanual manipulation policy. | The same policy biased toward your specific task. |
The numbers come straight from the τ₀-WM paper’s Section 3.3. The default fine-tune shape in Midcore’s Training screen mirrors them exactly because deviating without reason tends to slow convergence.
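The H100-hour totals in the table are just GPU count multiplied by wall-clock hours. A minimal sketch of that arithmetic follows; `RunShape` and its field names are illustrative only and are not a Midcore type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunShape:
    """Hypothetical container for a training run's shape (not a Midcore type)."""
    gpus: int        # number of H100s
    hours: int       # wall-clock duration in hours
    batch_size: int

    @property
    def gpu_hours(self) -> int:
        # Total H100-hours consumed by the run.
        return self.gpus * self.hours


# Values copied from the comparison table above.
PRETRAIN = RunShape(gpus=64, hours=42, batch_size=12_288)
FINETUNE = RunShape(gpus=16, hours=26, batch_size=384)

print(PRETRAIN.gpu_hours)  # 2688, which the table rounds to ~2,700
print(FINETUNE.gpu_hours)  # 416
```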
What it costs
At June 2026 cloud-spot pricing, one H100-hour runs roughly:
| Provider | H100 spot rate (USD/hour) | Notes |
|---|---|---|
| Modal | $3.10 - $3.30 | Pay-per-second, no minimum, fastest cold-start of the three. |
| Lambda Labs | $2.80 - $3.00 | Cheapest sustained, longer cold-start, queue-prone during conference season. |
| CoreWeave | $3.00 - $3.40 | Enterprise-focused; better suited to monthly reservations than to single jobs. |
For the default 16 × H100, 26-hour fine-tune (416 H100-hours), that’s roughly $1,200 to $1,400 per checkpoint, depending on the provider. The Training screen surfaces the live estimate before you click submit, so there’s no surprise.
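To see how the estimate falls out of the rate table, here is a rough sketch of the calculation. The function name `estimate_cost` is hypothetical, and the live estimate in the Training screen may account for factors this sketch ignores.

```python
# Spot-rate ranges (USD per H100-hour) copied from the table above.
SPOT_RATES = {
    "Modal": (3.10, 3.30),
    "Lambda Labs": (2.80, 3.00),
    "CoreWeave": (3.00, 3.40),
}


def estimate_cost(gpus: int, hours: float, rate_range: tuple) -> tuple:
    """Return the (low, high) cost in USD for a job of gpus x hours."""
    gpu_hours = gpus * hours
    low_rate, high_rate = rate_range
    return gpu_hours * low_rate, gpu_hours * high_rate


# Default fine-tune shape: 16 x H100 for 26 hours.
for provider, rates in SPOT_RATES.items():
    low, high = estimate_cost(16, 26, rates)
    print(f"{provider}: ${low:,.0f} to ${high:,.0f}")
```

At the listed rates the default shape lands between about $1,165 and $1,415 across the three providers, with Modal's range at about $1,290 to $1,375.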
The operating procedure
Assuming you’ve already captured and finalised a dataset (see Datasets), the path to a deployed fine-tune is:
- Open the Training screen.
- Pick your dataset from the dropdown. Only finalised datasets with a non-zero episode count appear.
- Pick a cloud provider. The submit button stays disabled until provider credentials are configured by your administrator.
- Confirm the default shape (16 × H100, 26 hours) or tweak as needed. Lower GPU count or hours if you’re experimenting; raise them if you’re chasing the last few percent of performance.
- Confirm the cost estimate. Click Submit fine-tune.
- The job appears in the running list and its status moves from “queued” to “running”. Click Poll to refresh status; longer-running deployments enable automatic polling.
- When the cloud provider reports `output_checkpoint_uri`, the job row exposes a Register in Vault button. Click it to bind the checkpoint to your project. The newly registered checkpoint becomes available to the Brain’s policy panel.
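The polling step in the procedure above can be pictured as a simple loop. This is a sketch under stated assumptions, not Midcore's client: `wait_for_checkpoint` and `fetch_job` are hypothetical names, the job is assumed to arrive as a dictionary, and the `"failed"` status is an assumed value (the page only names “queued” and “running”).

```python
import time
from typing import Callable, Optional


def wait_for_checkpoint(
    fetch_job: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 200,
) -> Optional[str]:
    """Poll until the job reports output_checkpoint_uri, fails, or we give up."""
    for _ in range(max_polls):
        job = fetch_job()
        uri = job.get("output_checkpoint_uri")
        if uri:
            return uri  # ready to be registered
        if job.get("status") == "failed":  # assumed status name
            return None
        time.sleep(interval_s)
    return None  # polling budget exhausted


# Canned responses standing in for a real provider.
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "running", "output_checkpoint_uri": "s3://example-bucket/run-1/ckpt"},
])
print(wait_for_checkpoint(lambda: next(responses), interval_s=0))
```

Capping the number of polls keeps a stuck job from blocking the caller indefinitely; pick an interval that suits the provider's cold-start behaviour.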
Evaluating a fine-tune
The right way to evaluate a fine-tune is to run it on your task. Two structured options:
- Real-world rollouts. Stand at the robot, queue 20 attempts via the Manipulate card, count successes. The most predictive evaluator; the most expensive.
- ACVS imagined rollouts. When ACVS weights are available, the Simulation screen evaluates a candidate’s imagined future without touching the robot. Cheaper, faster, but only as good as the simulator’s coverage of your specific scene.
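When you count successes over a small batch such as 20 attempts, it helps to remember how wide the uncertainty is. The sketch below uses the Wilson score interval, a standard choice for binomial proportions that is brought in here for illustration; nothing in Midcore prescribes it, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple:
    """Approximate 95% confidence interval for a success rate (Wilson score)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1 + z**2 / attempts
    centre = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / attempts + z**2 / (4 * attempts**2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


low, high = wilson_interval(successes=16, attempts=20)
print(f"16/20 successes: plausible true rate {low:.0%} to {high:.0%}")
```

With 16 successes out of 20 the interval spans roughly 58% to 92%, so two checkpoints that differ by a couple of successes in a 20-attempt batch are not clearly distinguishable.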
Checkpoint discipline
Every successful fine-tune produces a checkpoint blob. Treat these like releases:
- Register before activating. The checkpoint registry binds the dataset id and the fine-tune job id to the checkpoint hash. That’s the audit trail that lets you answer “what data made this model.”
- One active checkpoint per project. Brain’s policy panel reads the active checkpoint, so switching the active checkpoint changes what runs on the next infer call. There is no automatic rollback; the previous checkpoint stays registered for re-activation.
- Soft-deprecate, never delete. Removing a registered checkpoint also removes the ability to re-create the audit chain. Set `active = false` instead.
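The three rules above can be modelled in a few lines. This is an illustrative in-memory sketch, not the Vault's actual schema or API: `Checkpoint`, `ProjectRegistry`, and every field and method name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    checkpoint_hash: str
    dataset_id: str   # which data made this model
    job_id: str       # which fine-tune job produced it
    active: bool = False


@dataclass
class ProjectRegistry:
    """Hypothetical per-project registry; note there is no delete method."""
    checkpoints: dict = field(default_factory=dict)

    def register(self, ckpt: Checkpoint) -> None:
        # Keep the first registration so the audit trail is never overwritten.
        self.checkpoints.setdefault(ckpt.checkpoint_hash, ckpt)

    def activate(self, checkpoint_hash: str) -> None:
        if checkpoint_hash not in self.checkpoints:
            raise KeyError("register the checkpoint before activating it")
        # Exactly one checkpoint ends up active; the rest stay registered.
        for ckpt in self.checkpoints.values():
            ckpt.active = ckpt.checkpoint_hash == checkpoint_hash

    def deprecate(self, checkpoint_hash: str) -> None:
        # Soft-deprecate: flip the flag, keep the record.
        self.checkpoints[checkpoint_hash].active = False

    def active_checkpoint(self) -> Optional[Checkpoint]:
        return next((c for c in self.checkpoints.values() if c.active), None)


registry = ProjectRegistry()
registry.register(Checkpoint("hash-a", "dataset-1", "job-1"))
registry.register(Checkpoint("hash-b", "dataset-2", "job-2"))
registry.activate("hash-a")
registry.activate("hash-b")  # swap; hash-a stays registered
registry.activate("hash-a")  # manual rollback by re-activating
print(registry.active_checkpoint().checkpoint_hash)
```

Because nothing is ever removed, the dataset id and job id behind any past checkpoint remain available for audit.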
Continuous improvement loop
Once the basics are in place, the long-term cycle becomes:
- Deploy a checkpoint to production.
- Observe RCS-gated and blocked proposals in the audit log; they’re an automated “here’s where the model is weak” signal.
- Capture targeted teleop episodes for those weak spots.
- Fine-tune again, register, evaluate, swap the active checkpoint.
- Track the success rate trend across versions.
This is how a one-off integration becomes a continuously improving deployment. The audit log is the linchpin — without it, you don’t know which proposals to re-train on.
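One way to turn the audit log into a capture plan is to tally gated and blocked proposals by task. The sketch below assumes each audit record is a dictionary with `task` and `outcome` keys and that outcomes are labelled `"gated"`, `"blocked"`, or `"executed"`; the real audit log format is not described on this page, so treat all of these names as hypothetical.

```python
from collections import Counter


def weak_spots(audit_records: list, top_n: int = 3) -> list:
    """Return the tasks with the most gated or blocked proposals."""
    counts = Counter(
        record["task"]
        for record in audit_records
        if record["outcome"] in {"gated", "blocked"}
    )
    return counts.most_common(top_n)


# Made-up records for illustration only.
records = [
    {"task": "fold towel", "outcome": "executed"},
    {"task": "fold towel", "outcome": "gated"},
    {"task": "open drawer", "outcome": "blocked"},
    {"task": "open drawer", "outcome": "gated"},
    {"task": "open drawer", "outcome": "blocked"},
    {"task": "pour cup", "outcome": "executed"},
]
for task, count in weak_spots(records):
    print(f"{task}: {count} gated or blocked proposals")
```

The tasks at the top of the list are the natural candidates for the next round of targeted teleop capture.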