Simultaneous System Identification and Model Predictive Control with No Dynamic Regret

University of Michigan
IEEE Transactions on Robotics
We provide a control algorithm that simultaneously learns unknown dynamics and disturbances in a self-supervised manner from data collected on the fly, and uses the learned models for predictive control. We prove that the algorithm guarantees bounded suboptimality against the optimal controller in hindsight. The algorithm enables a quadrotor to track a reference trajectory despite voltage drop and a variety of unknown aerodynamic disturbances: ground effects, wind disturbances, and drag.
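One common way to formalize this suboptimality, which the abstract calls dynamic regret, is written below. The notation (stage cost $c_t$, horizon $T$, and the clairvoyant state-input pair $(x_t^\star, u_t^\star)$) is assumed here for illustration and is not taken from the paper.

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \sum_{t=1}^{T} c_t(x_t^\star, u_t^\star)
$$

Here $(x_t, u_t)$ is the trajectory produced by the learning controller, and $(x_t^\star, u_t^\star)$ is the trajectory of the optimal non-causal controller that knows in advance how the unknown dynamics and disturbances respond to its actions. Regret is sublinear when $\mathrm{Regret}_T / T \to 0$ as $T \to \infty$, i.e., the average per-step suboptimality vanishes.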

Abstract

We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. In particular, the algorithm enjoys sublinear dynamic regret, defined as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that may even adapt to the system's state and control input. The algorithm first generates random Fourier features to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). This model is updated online via least squares, using the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to keep the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
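The learning step described above (random Fourier features plus an online least-squares fit) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian (RBF) kernel, so the features are cosines with normally distributed frequencies and uniform phases, and it uses recursive least squares with ridge regularization. The class name `RFFLeastSquares` and its parameters (`num_features`, `bandwidth`, `reg`) are hypothetical.

```python
import numpy as np


class RFFLeastSquares:
    """Hypothetical helper: approximates an unknown map h(z) with random
    Fourier features and fits the feature weights online by recursive
    (ridge-regularized) least squares."""

    def __init__(self, input_dim, output_dim, num_features=100,
                 bandwidth=1.0, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: Gaussian (RBF) kernel, whose spectral density is normal;
        # frequencies and phases are drawn once and then kept fixed.
        self.W = rng.normal(scale=1.0 / bandwidth, size=(num_features, input_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.scale = np.sqrt(2.0 / num_features)
        # P = inverse of (reg * I + sum of phi phi^T); theta = feature weights.
        self.P = np.eye(num_features) / reg
        self.theta = np.zeros((num_features, output_dim))

    def features(self, z):
        """Random Fourier feature vector phi(z) of length num_features."""
        return self.scale * np.cos(self.W @ np.asarray(z, dtype=float) + self.b)

    def predict(self, z):
        """Current estimate h_hat(z) = phi(z)^T theta."""
        return self.features(z) @ self.theta

    def update(self, z, y):
        """Add one sample (z, y) with a Sherman-Morrison rank-one update; in exact
        arithmetic this equals the batch ridge-regression fit on all data so far."""
        phi = self.features(z)
        P_phi = self.P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)
        error = np.asarray(y, dtype=float) - phi @ self.theta
        self.theta += np.outer(gain, error)
        self.P -= np.outer(gain, P_phi)
```

In a control loop, one would call `update(z, y)` with z the (state, input) pair and y the observed residual (measured next state minus the nominal model's prediction), then use `predict` inside the MPC rollout as the learned correction term. The recursive form is chosen here so each new sample costs a fixed amount of work instead of re-solving the full least-squares problem.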

Overview of the simultaneous system identification and model predictive control pipeline. The pipeline consists of two interacting modules: (i) a model predictive control (MPC) module, and (ii) an online system identification module. The MPC module uses the current estimate of the unknown disturbances/dynamics from the system identification module to compute the next control input. Given the control input and the newly observed state, the online system identification module then updates its estimate of the unknown disturbances/dynamics.
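A toy version of this two-module loop is sketched below. It assumes a hypothetical scalar plant x_{t+1} = x_t + u_t + 0.5 sin(x_t), where the nominal part x_t + u_t is known and the term 0.5 sin(x_t) is unknown to the controller. As a deliberately crude stand-in for a real MPC solver, `plan` grid-searches one input held constant over the horizon; the identification module refits random-Fourier-feature weights by regularized least squares after every step. All function names and parameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np


def make_features(num_features=50, bandwidth=1.0, seed=0):
    """Random Fourier features for a scalar state (Gaussian-kernel assumption).
    Returns phi(x), mapping n states to an (n, num_features) matrix."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=num_features)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    scale = np.sqrt(2.0 / num_features)
    return lambda x: scale * np.cos(np.outer(np.atleast_1d(x), W) + b)


def plan(x, theta, phi, target, horizon=5, r=0.01, u_max=1.0):
    """(i) MPC module, crude stand-in: grid-search one input held constant over
    the horizon, rolling out the learned model x + u + phi(x) @ theta."""
    u_grid = np.linspace(-u_max, u_max, 401)
    xs = np.full_like(u_grid, x)
    cost = np.zeros_like(u_grid)
    for _ in range(horizon):
        xs = xs + u_grid + phi(xs) @ theta
        cost += (xs - target) ** 2 + r * u_grid ** 2
    # Receding horizon: only the first input is applied to the plant.
    return u_grid[np.argmin(cost)]


def run(num_steps=100, target=2.0, reg=1e-2, seed=0):
    phi = make_features(seed=seed)
    D = phi(0.0).shape[1]
    A = reg * np.eye(D)  # regularized Gram matrix of observed features
    c = np.zeros(D)      # sum of features times observed residuals
    theta = np.zeros(D)  # learned weights; unknown term initially estimated as 0
    x = 0.0
    errors = []
    for _ in range(num_steps):
        u = plan(x, theta, phi, target)
        x_next = x + u + 0.5 * np.sin(x)  # hypothetical true plant (hidden from plan)
        residual = x_next - (x + u)        # observed sample of the unknown term
        # (ii) System identification module: least-squares refit on all data so far.
        f = phi(x)[0]
        A += np.outer(f, f)
        c += f * residual
        theta = np.linalg.solve(A, c)
        x = x_next
        errors.append(abs(x - target))
    return np.array(errors), theta, phi
```

Calling `errors, theta, phi = run()` returns the per-step tracking error together with the learned weights, so `phi(2.0)[0] @ theta` can be compared with the true unknown term 0.5 sin(2). Refitting from the accumulated sums `A` and `c` keeps the update a plain least-squares solve; a recursive update would give the same weights at lower cost per step.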

Hardware Experiment

BibTeX


@article{zhou2024simultaneous,
  title={Simultaneous System Identification and Model Predictive Control with No Dynamic Regret},
  author={Zhou, Hongyu and Tzoumas, Vasileios},
  journal={IEEE Transactions on Robotics},
  year={2025},
  publisher={IEEE}
}

@inproceedings{zhou2025no,
  title={No-Regret Model Predictive Control with Online Learning of Koopman Operators},
  author={Zhou, Hongyu and Tzoumas, Vasileios},
  booktitle={2025 American Control Conference (ACC)},
  year={2025},
  organization={IEEE}
}