Foundations

Everything in robotics rests on three observations: rigid bodies move, sensors observe imperfectly, and actuators are bandwidth-limited. This page is a tight refresher on the vocabulary every Midcore screen assumes you know.

Rigid bodies and coordinate frames

A rigid body is an idealised object whose internal distances never change. A robot is a tree (or graph) of rigid bodies called links, connected by joints. The position and orientation of a link together form its pose: three numbers for translation (in metres) plus an orientation, which takes three numbers as Euler angles or a rotation vector, or four as a unit quaternion.

Every pose is expressed relative to some frame. Common frames you will see in Midcore:

| Frame | Origin | Used for |
| --- | --- | --- |
| World | A fixed point in the workspace (usually the floor under the robot base). | Mission planning, multi-robot coordination. |
| Robot base | The first non-moving link of the robot. | Reachability, base-relative obstacles. |
| Arm base | The first link of an arm (right at the shoulder mount). | End-effector poses for τ₀-WM and most VLA models. |
| End-effector (EE) | The gripper tip or tool centre point (TCP). | Grasps, tool offsets, calibration. |
| Camera | The optical centre of an image sensor. | Vision-based perception, eye-in-hand control. |

Frame discipline saves debugging

The single most common source of wasted bench time is sending a pose expressed in the wrong frame. The τ₀-WM contract Midcore implements is explicit: each end-effector pose is expressed in its own arm’s base frame, never in the world frame. The Designer’s τ₀-WM state panel renders the live values so you can confirm before you commit.
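To make the frame rule concrete, here is a minimal sketch of re-expressing a world-frame target in an arm's base frame. It assumes you already know where the arm base sits in the world; the helper names and all numbers are hypothetical and are not part of any Midcore API.

```python
import math

def make_transform(yaw, tx, ty, tz):
    """4x4 rigid transform: rotation about z by `yaw` (rad) plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def invert_transform(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    t_inv = [-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [Rt[r] + [t_inv[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_transform(T, p):
    """Map a 3D point through a 4x4 rigid transform."""
    return [sum(T[r][k] * p[k] for k in range(3)) + T[r][3] for r in range(3)]

# Hypothetical setup: arm base at (0.5, 0.2, 0.8) m in the world frame,
# rotated 90 degrees about the vertical axis.
world_from_arm = make_transform(math.pi / 2, 0.5, 0.2, 0.8)
arm_from_world = invert_transform(world_from_arm)

target_world = [0.5, 0.7, 1.0]
target_arm = apply_transform(arm_from_world, target_world)
print([round(v, 6) for v in target_arm])
```

The same physical point has very different coordinates in the two frames, which is why a pose should always travel with the name of the frame it is expressed in.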

Three ways to write a rotation

  • Euler angles (XYZ, ZYX, ...): three numbers. Human-readable, but they suffer from gimbal lock when two rotation axes align. Not safe for interpolation across configurations.
  • Quaternions (xyzw): four numbers on the unit 3-sphere. No gimbal lock and smooth interpolation via SLERP, but every rotation is represented twice (q and -q), which can confuse loss functions during training.
  • 6D continuous rotation (Zhou et al., CVPR 2019): the first two columns of the 3 × 3 rotation matrix, flattened. Six numbers, continuous everywhere, and the de facto choice for neural network rotation outputs. τ₀-WM’s policy head emits 6D internally; the wire format converts to quaternion at the boundary because it’s easier to inspect.
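Two of the points above are easy to check numerically. The sketch below recovers an orthonormal rotation from a 6D vector using Gram-Schmidt orthogonalisation, as described by Zhou et al., and shows that q and -q produce the same rotation matrix. It assumes the six numbers are stored column by column (first column, then second); that ordering and the function names are illustrative assumptions, not the τ₀-WM implementation.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def columns_from_6d(d6):
    """Gram-Schmidt: turn two raw 3-vectors into three orthonormal columns."""
    a1, a2 = d6[:3], d6[3:]
    b1 = normalize(a1)
    overlap = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([a2[i] - overlap * b1[i] for i in range(3)])
    b3 = cross(b1, b2)  # third column follows from the right-hand rule
    return b1, b2, b3

def matrix_from_quaternion(x, y, z, w):
    """Rotation matrix (row-major) from a unit quaternion in xyzw order."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

# Unnormalised, non-orthogonal input still yields a valid rotation.
b1, b2, b3 = columns_from_6d([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])

# 90 degrees about z, and its negated twin.
q = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
neg_q = tuple(-c for c in q)
same_rotation = matrix_from_quaternion(*q) == matrix_from_quaternion(*neg_q)
```

Every entry of the quaternion-to-matrix formula is quadratic in the components, so flipping all four signs changes nothing. A loss that compares raw quaternion components would still penalise -q as if it were a different answer.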

Forward and inverse kinematics

Given joint angles q = [q₁, q₂, …], forward kinematics (FK) gives you the resulting end-effector pose. Always solvable, fast, unique.
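As a minimal illustration, here is FK for a planar two-link arm. The link lengths are hypothetical defaults and the function is a teaching sketch, not a Midcore call.

```python
import math

def fk_planar_2r(q1, q2, l1=0.4, l2=0.3):
    """Forward kinematics of a planar 2-link arm.

    q1, q2: joint angles in radians; l1, l2: link lengths in metres.
    Returns the end-effector position (x, y) and heading theta.
    """
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    theta = q1 + q2
    return x, y, theta

# Fully stretched along x, then with the elbow bent 90 degrees.
print(fk_planar_2r(0.0, 0.0))
print(fk_planar_2r(0.0, math.pi / 2))
```

Each joint vector maps to exactly one pose, and the computation is a handful of trigonometric calls, which is why FK is cheap enough to run inside every control loop.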

Going the other way is harder. Inverse kinematics (IK) asks: given a target end-effector pose, what joint angles get me there? For 6-DOF arms with a spherical wrist there are up to 8 valid solutions (up to 16 for a general 6R geometry); for 7-DOF arms there is a continuous self-motion manifold. IK solvers either pick a closed-form solution (fast, brittle) or run a numerical Jacobian iteration (slower, more flexible).
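The multiple-solution problem already shows up in the simplest case. The sketch below is a closed-form IK for a planar two-link arm, which returns an elbow-down and an elbow-up answer for the same target position. Link lengths and function names are hypothetical; real 6-DOF and 7-DOF solvers are considerably more involved.

```python
import math

def fk_planar_2r(q1, q2, l1=0.4, l2=0.3):
    """Forward kinematics, used here only to verify the IK answers."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y

def ik_planar_2r(x, y, l1=0.4, l2=0.3):
    """Closed-form IK for a planar 2-link arm (position only).

    Returns a list of (q1, q2) solutions: empty if the target is out of
    reach, otherwise the two elbow configurations.
    """
    # Law of cosines gives the elbow angle.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        return []  # target lies outside the reachable annulus
    solutions = []
    for sign in (1.0, -1.0):
        s2 = sign * math.sqrt(1.0 - c2 * c2)
        q2 = math.atan2(s2, c2)
        q1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
        solutions.append((q1, q2))
    return solutions

solutions = ik_planar_2r(0.4, 0.3)
for q1, q2 in solutions:
    print((q1, q2), "->", fk_planar_2r(q1, q2))
unreachable = ik_planar_2r(1.0, 0.0)
```

Both joint vectors reach the same point, so the caller has to choose between them, typically by staying close to the current configuration or by avoiding joint limits.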

Important: a model like τ₀-WM does not solve IK explicitly. It learns the joint → pose → image relationship directly from data, then emits pose targets that you (or a downstream controller) realise via IK. This separation is why a single VLA generalises across embodiments: it commits to a pose, not a joint vector.

A note on dynamics

Kinematics describes where; dynamics describes how forces produce motion. The full robot dynamics equation:

  M(q) q̈ + C(q, q̇) q̇ + g(q) = τ + Jᵀ Fₑₓₜ

...says that the joint torques τ, plus any external force Fₑₓₜ on the end-effector (mapped into joint space by the Jacobian transpose Jᵀ), accelerate the robot through its inertia M(q) while fighting the Coriolis and centrifugal term C(q, q̇) q̇ and gravity g(q).
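For a feel of the magnitudes, the equation collapses nicely for a single revolute joint carrying a point mass: M becomes m·l², the Coriolis term vanishes, and g(q) becomes m·g·l·sin(q) with q measured from hanging straight down. The sketch below solves that reduced equation for the joint acceleration. The mass, length, and function names are hypothetical illustration values.

```python
import math

GRAVITY = 9.81  # m/s^2

def inertia(mass, length):
    """M(q) for a point mass on a massless link: constant m * l^2."""
    return mass * length * length

def gravity_torque(q, mass, length):
    """g(q): torque gravity exerts about the joint, q = 0 hanging down."""
    return mass * GRAVITY * length * math.sin(q)

def joint_acceleration(q, tau, mass, length, tau_ext=0.0):
    """Solve M q_ddot + g(q) = tau + tau_ext for q_ddot.

    tau_ext stands in for the J^T F_ext term; a single joint has no
    Coriolis contribution.
    """
    return (tau + tau_ext - gravity_torque(q, mass, length)) / inertia(mass, length)

mass, length = 2.0, 0.5          # hypothetical 2 kg payload on a 0.5 m link
horizontal = math.pi / 2

holding_torque = gravity_torque(horizontal, mass, length)
print(holding_torque)                                        # torque to hold level
print(joint_acceleration(horizontal, 0.0, mass, length))     # motor off
print(joint_acceleration(horizontal, holding_torque, mass, length))
```

With the motor off, the link accelerates downward at g / l rad/s²; with exactly the holding torque applied, the acceleration is zero. Any change of direction has to be paid for with torque beyond that holding value.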

Midcore’s Designer ships a Rapier-based physics preview that computes link world boxes, centre of mass, and the inertia ellipsoid live as you edit the robot. You don’t have to solve dynamics by hand — but you do have to understand that a 30 kg arm can’t reverse direction in one frame, and a 100 N grasp won’t hold a 200 N pull.

Sensors

| Sensor | What it returns | Where it shows up in Midcore |
| --- | --- | --- |
| RGB camera | H × W × 3 pixel grid. τ₀-WM’s default is 192 × 256. | Fed straight into the policy’s vision encoder. |
| Depth / RGB-D | Per-pixel distance, often aligned to RGB. | Used for collision-aware planning + twin-state geometry. |
| IMU | Linear acceleration + angular velocity in the sensor frame. | Inertial state fusion; appears in Designer’s sensor list. |
| Force / torque | 3 forces + 3 torques at the sensor (usually wrist-mounted). | Compliance + contact-rich manipulation; surfaced via the safety tile. |
| Joint encoders | Per-joint position (always) + velocity (often). | Forms the state vector you send to the policy. |
| Tactile | High-density local force / shear across the gripper fingertip. | Future; τ₀-WM’s authors flag this as a known limitation. |

Actuators

  • DC motors with planetary or harmonic-drive gearing — the default for industrial arms. Encoded position, position-velocity-current control loops.
  • Series-elastic actuators (SEA) — a spring between motor and load lets you measure (and control) torque cheaply. Common in compliance-critical platforms.
  • Parallel grippers — two fingers driven by a single linear stage. State reduces to a one-dimensional opening (0 = open, 120 = closed in the τ₀-WM observation space).
  • Multi-fingered hands — 5+ DOF, much richer state. Out of scope for τ₀-WM today; targeted by humanoid-focused successor models.

Manipulator and mobile taxonomies

The robotics literature carves the field into platform classes because the engineering tradeoffs differ wildly. Midcore tracks the same split:

| Class | Typical DOF | Examples (Designer templates) |
| --- | --- | --- |
| 6-DOF manipulator | 6 | UR5e, UR10e, FANUC LR Mate, ABB IRB 1200 |
| 7-DOF cobot | 7 | KUKA LBR iiwa 14 R820, generic cobot_7dof |
| Dual-arm bimanual | 14 (7+7) or 12 (6+6) | Dual-arm Franka FR3 (τ₀-WM ready) |
| Differential drive | 2 | Generic differential_drive_mobile |
| Quadruped | 12 | Boston Dynamics Spot, Unitree Go2 |
| Hexapod | 18 | Generic hexapod walker |
| Multirotor UAV | 4 | DJI Mavic 3 Enterprise, generic quadrotor |
| Fixed-wing UAV | 4 | Generic fixed_wing_uav |
| Subsea / surface marine | 3–6 | Subsea micro-ROV, USV workboat |
| Humanoid | 20–30 | humanoid_biped, humanoid_upper_body |

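For bimanual manipulation, the class τ₀-WM targets, the canonical state is the pair of EE poses plus the pair of gripper openings. Assuming each pose is sent as 3 translation values plus a 4-value quaternion, the two poses contribute 14 channels and the two gripper openings 2 more. That’s the 14 + 2 = 16-channel input the Designer’s τ₀-WM state panel renders.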
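A minimal sketch of packing such a state follows. It assumes each pose is 3 translation values plus an xyzw quaternion, and that the channels are ordered left pose, right pose, left gripper, right gripper. That ordering, the class, and the function name are hypothetical; check the actual τ₀-WM contract before relying on any layout.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ArmState:
    position: Tuple[float, float, float]                 # metres, in this arm's base frame
    quaternion_xyzw: Tuple[float, float, float, float]   # unit quaternion
    gripper: float                                       # 0 = open, 120 = closed

def pack_bimanual_state(left: ArmState, right: ArmState) -> List[float]:
    """Flatten two arm states into a 16-channel vector (assumed layout)."""
    for name, arm in (("left", left), ("right", right)):
        norm = math.sqrt(sum(c * c for c in arm.quaternion_xyzw))
        if abs(norm - 1.0) > 1e-6:
            raise ValueError(f"{name} quaternion is not unit length")
        if not 0.0 <= arm.gripper <= 120.0:
            raise ValueError(f"{name} gripper opening outside [0, 120]")
    return [
        *left.position, *left.quaternion_xyzw,
        *right.position, *right.quaternion_xyzw,
        left.gripper, right.gripper,
    ]

left = ArmState((0.30, 0.10, 0.25), (0.0, 0.0, 0.0, 1.0), 0.0)
right = ArmState((0.30, -0.10, 0.25), (0.0, 0.0, 0.0, 1.0), 120.0)
state = pack_bimanual_state(left, right)
print(len(state))
```

Validating the quaternion norm and the gripper range at packing time catches the most common malformed inputs before they reach the policy.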

Where to go from here

With these foundations you can read the rest of the section comfortably. Next up: World models explains why a generative video model is now the most effective way to plan manipulation, and what the τ₀-WM architecture contributes specifically.