Analytical MuJoCo MPC

Exact first- and second-order rigid-body dynamics derivatives for MuJoCo — replacing finite differences in the model-predictive-control loop.

Exploratory research, in progress (2026). This page summarizes work on computing exact analytical derivatives of MuJoCo’s dynamics and using them inside a whole-body MPC loop, instead of MuJoCo’s default finite differences. Measured results are labeled as such; one end-to-end MPC number is a projection and is called out explicitly.

TL;DR

MuJoCo is the de-facto simulator for robot control, but its dynamics Jacobians are computed by finite differencing — which is both slow (it re-simulates the system many times per derivative) and accuracy-limited. This project derives and implements the exact analytical rigid-body-dynamics (RBD) derivatives of MuJoCo’s own smooth dynamics and drops them into the MPC solver.

Two contributions:

  • First order (FO): the missing analytical \(\partial/\partial q\) half of MuJoCo’s dynamics — the position derivatives MuJoCo computes only by finite difference today.
  • Second order (SO): the exact analytical Hessian, enabling true exact-Hessian Newton steps in MPC (rather than Gauss–Newton / iLQG approximations).

Headline measured results: analytical FO matches finite-difference to 1e-8…1e-10 (3–6 orders inside the validation gate) and is up to 11.6× faster than MuJoCo’s mjd_transitionFD at humanoid scale; analytical SO is bit-identical to compiled CasADi autodiff while needing zero code generation and zero build RAM (CasADi runs out of memory at humanoid scale).


The quadruped MPC solution

The analytical derivatives drive a whole-body MPC controller on a Unitree B2 quadruped + Z1 arm (a loco-manipulation platform). The MPC plans over a 14-node horizon with a Fatrop OCP solver.

B2 quadruped + Z1 arm tracking a reference via whole-body MPC
Reference tracking — the B2+Z1 follows a commanded whole-body trajectory under the MPC.
B2 quadruped + Z1 arm performing a pulling task via whole-body MPC
Pulling / manipulation — the same MPC executing a contact-rich pull with the arm.



The problem

For MPC, you need the dynamics Jacobians \(\partial f/\partial x\) at every node, every tick. In MuJoCo today:

  • mjd_smooth_vel gives only the velocity half of the smooth dynamics analytically — i.e. \(\partial\,qacc/\partial v\).
  • The position half \(\partial\,qacc/\partial q\) has no analytical path — it is recovered by central finite differences via mjd_transitionFD, which costs \(O(n^2)\) forward simulations and is limited to ~1e-6 accuracy.
  • Contact/constraint derivatives are omitted entirely.

So the position Jacobian — the load-bearing term for MPC stability — is exactly the piece MuJoCo computes least accurately and most slowly.


Old vs. new derivative expressions

MuJoCo’s smooth forward dynamics (contacts disabled), with \(c \equiv qfrc\_bias\) the bias (Coriolis/centrifugal/gravity) force:

\[qacc = M(q)^{-1}\big(\tau - c(q,v)\big)\]

Old (baseline) — what MuJoCo uses today

\[\underbrace{\frac{\partial\, qacc}{\partial v}}_{\text{analytic, velocity half}} \ \text{via}\ \texttt{mjd\_smooth\_vel} \qquad\qquad \underbrace{\frac{\partial\, qacc}{\partial q}}_{\text{finite difference}} \approx \frac{qacc(q+\delta e_i,\,v) - qacc(q-\delta e_i,\,v)}{2\delta}\ \ (\texttt{mjd\_transitionFD})\]

New (this work) — the analytical position half

\[\frac{\partial\, qacc}{\partial q} = -M^{-1}\!\left[\frac{\partial c}{\partial q} + \frac{\partial M}{\partial q}\,qacc\right], \qquad \frac{\partial\, qacc}{\partial v} = -M^{-1}\frac{\partial c}{\partial v}\]

The \(\tfrac{\partial M}{\partial q}\,qacc\) term is not negligible — dropping it is a common and silent source of error. A clean, numerically-robust equivalent reuses the inverse-dynamics map \(\mathrm{id}(q,v,a) = M(q)\,a + c(q,v)\):

\[\boxed{\ \frac{\partial\, qacc}{\partial q} = -M^{-1}\,\frac{\partial\, \mathrm{id}(q,v,a_0)}{\partial q}\bigg|_{a_0 = qacc}\ }\]

which is evaluated in a single \(O(n^2)\) RNEA-style spatial recursion rather than \(O(n^2)\) full forward simulations.

Second order (the moat) — analytical Lagrangian Hessian

For a multiplier \(\lambda\), the four Hessian blocks needed by an exact-Hessian MPC:

\[\lambda^{\top}\nabla^2\tau = \lambda^{\top}\!\left[\ \frac{\partial^2\tau}{\partial q\,\partial q},\ \ \frac{\partial^2\tau}{\partial v\,\partial v},\ \ \frac{\partial^2\tau}{\partial q\,\partial v},\ \ \frac{\partial^2\tau}{\partial a\,\partial q}\ \right]\]

computed as a directional quantity in one adjoint pass — \(O(n^2)\) instead of forming the full \(O(n^3)\) tensor.


Derivative accuracy

FO accuracy gate (KS0). Analytical \(\partial\,qacc/\partial\{q,v\}\) vs. MuJoCo’s own central-difference qacc. Pre-registered gate: max relative error ≤ 1e-5 on every robot. Result — passes by 3–6 orders of magnitude:

Robot nv Base ∂qacc/∂q (max-rel vs FD) ∂qacc/∂v
panda 9 fixed 6.7e-9 2.6e-8
solo12 18 floating 4.9e-9 2.4e-7
talos 38 floating 6.2e-8 9.0e-8

SO accuracy. Analytical second-order vs. CasADi autodiff and finite difference on B2+Z1 (nv=25) — essentially exact (machine precision against CasADi):

Block vs. CasADi vs. FD
∂τ/∂q 5.9e-16 4.9e-10
∂τ/∂v 1.1e-15 7.4e-10
M (= ∂τ/∂a) 1.1e-15 8.7e-10
∂²τ/∂v∂v 8.1e-14 verified
∂²τ/∂q∂v 9.5e-13 verified
∂²τ/∂a∂q 4.8e-12 verified

Speed

FO speed vs. finite difference (KS1). Wall-clock per derivative evaluation; the gap grows with DoF as expected (exact \(O(n^2)\) RBD vs. \(O(n)\times\) forward-sim FD):

Robot nv FD mjd_transitionFD (µs) Analytical (µs) Speedup
panda 9 88.8 33.7 2.6×
solo12 18 238.9 45.0 5.3×
talos 38 1163.1 100.0 11.6×

Binding-level speed vs. compiled CasADi (same quantities, both compiled to C, B2+Z1, nv=25):

Quantity Analytical (µs) CasADi compiled-C (µs) Speedup
FO (∂τ/∂q, ∂τ/∂v, M) 26.2 49.0 1.9×
SO (4 Hessian blocks) 129.0 150.7 1.2×

…and crucially, the one-time codegen wall: analytical needs 0 generated code, 0 compile time, 0 peak RAM, whereas CasADi SO generates 8.83 MB of C, takes 63 s to compile, and runs out of memory (>32 GB) at humanoid scale.


MPC results

Measured end-to-end (B2+Z1, 14-node Fatrop OCP, CasADi derivatives). Derivatives dominate — ~80% of every control tick is spent in the Jacobian and Hessian callbacks:

Callback Per tick Share
Lagrangian Hessian (nlp_hess_l) ~27 ms ~45%
Constraint Jacobian (nlp_jac_g) ~20 ms ~32%
Fatrop linear solve ~7 ms ~11%
Constraints / gradient / misc ~2 ms ~3%
Total ~62 ms/tick  

Projected with analytical derivatives (component-wise measurements applied to the same loop — projection, not yet a closed-loop measurement): replacing nlp_jac_g (3.7 → 0.36 ms, ~10×) and nlp_hess_l (7.8 → 1.5 ms, ~5×) brings the tick to ~20 ms (~3× faster).

Scaling / buildability. Analytical builds and runs at every scale; CasADi’s SO tensor cannot be built at humanoid scale:

Robot nv Analytical FO (µs) Analytical SO (µs) CasADi SO build
panda 7 4.9 23.5 OK
hyq 18 16.1 69.3 OK (2.7 GB)
b2_z1 25 26.2 129.0 OK (63 s, 1.4 GB)
talos 38 OOM (>32 GB)

How it compares

Method Speed Accuracy Exact? Uses MuJoCo’s own model?
FD baseline (mjd_transitionFD) slow — \(O(n^2)\) sims ~1e-6 (FD-limited) no yes
WASP (Liang & Rakita, 2025) ~2× over FD approximate no yes
Analytical (this work) up to 11.6× over FD 1e-8 – 1e-10 yes yes

Models tested

Robot models used in the derivative and MPC benchmarks
Robots spanning fixed-base (panda) to floating-base humanoid (talos) and the B2+Z1 loco-manipulation platform.


Status & next steps

  • FO well-posedness (KS0) — analytical matches FD to 1e-8…1e-10 across nv = 9 → 38.
  • FO speed (KS1) — up to 11.6× over FD at humanoid scale.
  • SO correctness — bit-identical to CasADi, validated against FD, with no codegen wall.
  • Closed-loop accuracy vs. WASP — analytical (exact) vs. WASP (approximate) inside the live MPC.
  • Exact-Hessian vs. Gauss–Newton — does the exact Hessian buy faster MPC convergence? (the flagship SO question)

Stack: MuJoCo, rigid-body-dynamics derivatives (RNEA/ABA-style recursions), CasADi (baseline), Fatrop OCP, whole-body MPC, C++/Python.


Exploratory research conducted independently. Numbers above are taken from the project’s pre-registered result logs; the end-to-end analytical MPC tick time is a projection from per-component measurements and is labeled accordingly.