Training

Pre-training τ₀-WM from scratch costs tens of thousands of dollars; fine-tuning it on your captured dataset costs about a thousand. This page explains the difference, the cloud providers Midcore drives, and the operating procedure for going from finalised dataset to deployed checkpoint.

Pre-training vs fine-tuning

| | Pre-training (already done) | Fine-tuning (what you do) |
| --- | --- | --- |
| Compute | 64 × H100 × 42 h | 16 × H100 × ~26 h (typical) |
| Total H100-hours | ~2,700 | ~416 |
| Batch size | 12,288 | 384 |
| Learning rate | 5 × 10⁻⁵ | 5 × 10⁻⁵ |
| Optimizer | AdamW | AdamW |
| Data scale | 27,300 hours of mixed video | 100–500 of your episodes |
| Result | A general bimanual manipulation policy. | The same policy biased toward your specific task. |

The numbers come straight from the τ₀-WM paper’s Section 3.3. The default fine-tune shape in Midcore’s Training screen mirrors them exactly because deviating without reason tends to slow convergence.

What it costs

At June 2026 cloud-spot pricing, one H100-hour runs roughly:

| Provider | H100 spot rate (USD/hour) | Notes |
| --- | --- | --- |
| Modal | $3.10–$3.30 | Pay-per-second, no minimum, fastest cold-start of the three. |
| Lambda Labs | $2.80–$3.00 | Cheapest sustained, longer cold-start, queue-prone during conference season. |
| CoreWeave | $3.00–$3.40 | Enterprise; works best for monthly reservations vs single jobs. |

The default fine-tune of 16 H100s for 26 hours is 416 H100-hours, which works out to roughly $1,200 to $1,400 per checkpoint depending on the provider. The Training screen shows the live estimate before you click submit, so there are no surprises.
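The estimate can be reproduced by hand from the spot rates in the table above. The sketch below is only that arithmetic; the function names are illustrative and are not part of Midcore, and the live figure in the Training screen remains the authoritative one.

```python
# Spot-rate ranges in USD per H100-hour, copied from the table on this page.
SPOT_RATES = {
    "Modal": (3.10, 3.30),
    "Lambda Labs": (2.80, 3.00),
    "CoreWeave": (3.00, 3.40),
}


def gpu_hours(gpus: int, hours: float) -> float:
    """Total GPU-hours for a job that holds `gpus` GPUs for `hours` hours."""
    return gpus * hours


def cost_range(gpus: int, hours: float, rate_range: tuple) -> tuple:
    """Return the (low, high) cost in USD for a job at the given hourly rate range."""
    low_rate, high_rate = rate_range
    total = gpu_hours(gpus, hours)
    return (total * low_rate, total * high_rate)


if __name__ == "__main__":
    # Default fine-tune shape: 16 GPUs for 26 hours.
    for provider, rates in SPOT_RATES.items():
        low, high = cost_range(16, 26, rates)
        print(f"{provider}: ${low:,.0f} to ${high:,.0f}")
```

Changing the GPU count or the hours in the call shows how quickly a larger shape moves the bill, which is useful before tweaking the defaults.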

Why H100 and not A100

A100s are roughly half the per-hour price but ~40% slower on FlashAttention-3 + bf16 workloads. Net cost for a completed fine-tune ends up similar; H100 just finishes faster. If you have free A100 capacity and want to wait longer, both Modal and Lambda accept A100 selectors without further code changes.
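The trade-off can be checked with a couple of lines of arithmetic. The sketch below assumes "~40% slower" means the A100 delivers about 0.6× the H100's training throughput; that reading is an assumption, and if the figure instead means 40% more wall-clock time the ratio to plug in is 1/1.4. The function names are illustrative only.

```python
def relative_cost(price_ratio: float, throughput_ratio: float) -> float:
    """Cost of a finished job on GPU B relative to the same job on GPU A.

    price_ratio:      hourly price of B divided by hourly price of A
    throughput_ratio: training throughput of B divided by throughput of A
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return price_ratio / throughput_ratio


def relative_wall_clock(throughput_ratio: float) -> float:
    """How many times longer the job takes on B than on A."""
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return 1.0 / throughput_ratio


if __name__ == "__main__":
    # A100 vs H100: half the hourly price, assumed 0.6x the throughput.
    print(f"relative cost: {relative_cost(0.5, 0.6):.2f}")
    print(f"26 h on H100 becomes about {26 * relative_wall_clock(0.6):.0f} h on A100")
```

Under that assumption the A100 run costs about 0.83× the H100 run and takes roughly 43 hours instead of 26, so the saving is modest and the wait is the main difference.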

The operating procedure

Assuming you’ve already captured and finalised a dataset (see Datasets), the path to a deployed fine-tune is:

  1. Open the Training screen.
  2. Pick your dataset from the dropdown. Only finalised datasets with a non-zero episode count appear.
  3. Pick a cloud provider. The submit button stays disabled until provider credentials are configured by your administrator.
  4. Confirm the default shape (16 × H100, 26 hours) or tweak as needed. Lower GPU count or hours if you’re experimenting; raise them if you’re chasing the last few percent of performance.
  5. Confirm the cost estimate. Click Submit fine-tune.
  6. The job appears in the running list, and its status moves from “queued” to “running”. Click Poll to refresh the status; longer-running deployments enable automatic polling.
  7. When the cloud provider reports output_checkpoint_uri, the job row exposes a Register in Vault button. Click it to bind the checkpoint to your project. The newly registered checkpoint becomes available to the Brain’s policy panel.
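Steps 6 and 7 amount to polling a job until it reports output_checkpoint_uri. Midcore's client API is not documented on this page, so the following is a minimal sketch with hypothetical names: `JobStatus`, `wait_for_checkpoint`, and the terminal state "failed" are all assumptions, and `poll` stands in for whatever the Poll button does.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    # "queued" and "running" come from this page; "failed" is an assumed state.
    state: str
    output_checkpoint_uri: Optional[str] = None


def wait_for_checkpoint(
    poll: Callable[[], JobStatus],
    interval_s: float = 30.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports a checkpoint URI, then return it."""
    for _ in range(max_polls):
        status = poll()
        if status.output_checkpoint_uri:
            # The job is ready for the "Register in Vault" step.
            return status.output_checkpoint_uri
        if status.state == "failed":
            raise RuntimeError("fine-tune job failed before producing a checkpoint")
        sleep(interval_s)
    raise TimeoutError("job did not report a checkpoint within max_polls polls")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in practice the default `time.sleep` with a generous interval is enough for a 26-hour job.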

Evaluating a fine-tune

The right way to evaluate a fine-tune is to run it on your task. Two structured options:

  • Real-world rollouts. Stand at the robot, queue 20 attempts via the Manipulate card, count successes. The most predictive evaluator; the most expensive.
  • ACVS imagined rollouts. When ACVS weights are available, the Simulation screen evaluates a candidate’s imagined future without touching the robot. Cheaper, faster, but only as good as the simulator’s coverage of your specific scene.
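Twenty real-world attempts give a noisy estimate, so it helps to report an interval along with the raw success count. The sketch below uses the Wilson score interval, a standard choice for small binomial samples; it is an addition for illustration, not something Midcore computes for you as far as this page states.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a success rate (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))


if __name__ == "__main__":
    low, high = wilson_interval(18, 20)
    print(f"18/20 successes: {low:.2f} to {high:.2f}")
```

For 18 successes in 20 attempts the interval runs from roughly 0.70 to 0.97, so two checkpoints that score 17/20 and 18/20 are not distinguishable on that evidence alone.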

Common eval mistake

Evaluating only on the training distribution. If you captured 50 episodes of folding a navy-blue towel, fine-tuned, and now report 0.9 success on more navy-blue towels, you have not measured generalisation. Always include held-out colours, positions, and lighting conditions in the eval.
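One way to keep this honest is to report success separately for conditions that appeared in the fine-tuning data and conditions that did not. The sketch below assumes each rollout is logged as a (condition, succeeded) pair; that record shape and the function name are hypothetical, and the numbers in the usage example are made up purely for illustration.

```python
from collections import defaultdict


def success_by_split(rollouts, train_conditions):
    """Split rollout results into in-distribution and held-out success rates.

    rollouts:         iterable of (condition, succeeded) pairs
    train_conditions: set of conditions present in the fine-tuning dataset
    """
    tally = defaultdict(lambda: [0, 0])  # split -> [successes, attempts]
    for condition, succeeded in rollouts:
        split = "in_distribution" if condition in train_conditions else "held_out"
        tally[split][0] += int(succeeded)
        tally[split][1] += 1
    return {split: wins / attempts for split, (wins, attempts) in tally.items()}


if __name__ == "__main__":
    # Illustrative, invented results: 10 navy-blue attempts, 5 white attempts.
    example = (
        [("navy-blue", True)] * 9
        + [("navy-blue", False)]
        + [("white", True)] * 2
        + [("white", False)] * 3
    )
    print(success_by_split(example, train_conditions={"navy-blue"}))
```

A large gap between the two rates is the signal that the headline number is measuring memorisation of the training conditions.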

Checkpoint discipline

Every successful fine-tune produces a checkpoint blob. Treat these like releases:

  • Register before activating. The checkpoint registry binds the dataset id and the fine-tune job id to the checkpoint hash. That’s the audit trail that lets you answer “what data made this model.”
  • One active checkpoint per project. Brain’s policy panel reads the active checkpoint, so switching the active checkpoint changes what runs on the next infer call. There is no automatic rollback; the previous checkpoint stays registered for re-activation.
  • Soft-deprecate, never delete. Removing a registered checkpoint also removes the ability to re-create the audit chain. Set active = false instead.
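The three rules above can be modelled in a few lines. This is a toy in-memory sketch for a single project, not Midcore's registry: the class names, method names, and field names are all hypothetical, chosen only to mirror the dataset id, job id, checkpoint hash, and active flag described in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Checkpoint:
    checkpoint_hash: str
    dataset_id: str   # which data made this model
    job_id: str       # which fine-tune job produced it
    active: bool = False


@dataclass
class ProjectRegistry:
    checkpoints: Dict[str, Checkpoint] = field(default_factory=dict)

    def register(self, checkpoint_hash: str, dataset_id: str, job_id: str) -> Checkpoint:
        if checkpoint_hash in self.checkpoints:
            raise ValueError("checkpoint already registered")
        ckpt = Checkpoint(checkpoint_hash, dataset_id, job_id)
        self.checkpoints[checkpoint_hash] = ckpt
        return ckpt

    def activate(self, checkpoint_hash: str) -> None:
        # Register before activating: unknown hashes are rejected.
        if checkpoint_hash not in self.checkpoints:
            raise KeyError("register the checkpoint before activating it")
        # One active checkpoint per project: activating one deactivates the rest.
        for ckpt in self.checkpoints.values():
            ckpt.active = ckpt.checkpoint_hash == checkpoint_hash

    def deprecate(self, checkpoint_hash: str) -> None:
        # Soft-deprecate: the record and its audit fields stay in place.
        self.checkpoints[checkpoint_hash].active = False

    def active_checkpoint(self) -> Optional[Checkpoint]:
        return next((c for c in self.checkpoints.values() if c.active), None)
```

Because nothing is ever removed from `checkpoints`, rolling back is just another `activate` call on the earlier hash.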

Continuous improvement loop

Once the basics are in place, the long-term cycle becomes:

  1. Deploy a checkpoint to production.
  2. Observe RCS-gated and blocked proposals in the audit log; they’re an automated “here’s where the model is weak” signal.
  3. Capture targeted teleop episodes for those weak spots.
  4. Fine-tune again, register, evaluate, swap the active checkpoint.
  5. Track the success rate trend across versions.

This is how a one-off integration becomes a continuously improving deployment. The audit log is the linchpin — without it, you don’t know which proposals to re-train on.
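Step 2 of the loop can be as simple as counting gated and blocked proposals per task. The audit log's real schema is not described on this page, so the sketch below assumes each entry is a dict with hypothetical "task" and "outcome" keys, where "outcome" is "gated", "blocked", or something else for proposals that ran.

```python
from collections import Counter


def weak_spots(audit_entries, top_n: int = 3):
    """Return the tasks with the most gated or blocked proposals, most frequent first.

    audit_entries: iterable of dicts with assumed keys "task" and "outcome"
    """
    counts = Counter(
        entry["task"]
        for entry in audit_entries
        if entry["outcome"] in {"gated", "blocked"}
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Invented entries, for illustration only.
    log = [
        {"task": "fold_towel", "outcome": "blocked"},
        {"task": "fold_towel", "outcome": "gated"},
        {"task": "stack_cups", "outcome": "executed"},
        {"task": "stack_cups", "outcome": "gated"},
        {"task": "fold_towel", "outcome": "executed"},
    ]
    print(weak_spots(log))
```

The tasks at the top of that list are the natural candidates for the targeted teleop capture in step 3.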

Next: the actual screens

Everything above is theoretical until you sit in front of the app. Using the app is the surface-by-surface guide to where every concept on this page lives in the UI.