We need a new learning theory for embodied AI

Balázs Kégl
3 min read · Sep 25, 2024


This is an extended section from our Embodied AI ICML 2024 paper, written with Giuseppe Paolo and Jonas Gonzalez.

The principles of embodied AI challenge us to reevaluate traditional learning theory (1, 2), bridging a gap between supervised and reinforcement learning. Supervised learning, while foundational in AI, assumes that the data is drawn from an unknown but fixed distribution, collected independently of the learning process. This theory gives rise to the classical notions of generalization, over- and underfitting, bias and variance, and asymptotic or finite-sample statistical consistency. This framing is obviously highly useful: even those who are not explicitly doing theory use it transparently as their lingua technica and cognitive scaffolding when working with algorithms and analyzing results.
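For the record, here is the classical framing in standard notation (a minimal sketch with generic symbols, not taken from the paper): the learner receives an i.i.d. sample from a fixed, unknown distribution and minimizes the empirical risk as a proxy for the true risk it actually cares about.

```latex
% Classical supervised learning: a fixed, unknown distribution D generates
% the sample; the learning process has no influence on D.
S = \{(x_i, y_i)\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} D
% True risk (the quantity we care about) vs. empirical risk (what we minimize):
R(f) = \mathbb{E}_{(x,y) \sim D}\big[\ell(f(x), y)\big], \qquad
\hat{R}_S(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
% Generalization, over-/underfitting, bias-variance, and consistency are all
% statements about the gap R(f) - \hat{R}_S(f) under this fixed-D assumption.
```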

When embodied agents interact dynamically with their environment, data collection becomes part of the data science pipeline (1, 2). Classical supervised learning theory is insufficient to analyze these cases and to guide algorithm building. Extensions such as transfer learning, multitask learning, distribution shift, domain adaptation, or out-of-distribution generalization have been proposed to patch basic supervised learning theory, but most of them cling to the original framing, pretending that the data comes from outside the learning process and that the learning step alone encapsulates the value (business or otherwise) of the predictive pipeline. In practice, this is obviously not the case: the data on which we learn a predictor is often collected by the data scientist, who is responsible for the quality of the whole pipeline (1, 2). Furthermore, most of the debates around responsible AI revolve around the data, not the learning algorithm (1, 2). Collecting, selecting, and curating data is obviously part of the pipeline. The text we use to train LLMs is created by its authors rather than drawn from a distribution. When collection and model retraining are automated, the situation may be even worse. For example, in click-through-rate prediction (1, 2) or recommendation systems, the deployed predictor affects the data for the next round of training, generating an often adversarial feedback loop. A similar phenomenon is happening in the LLM world: as these AIs become the go-to tools for creative and business writing, the data collected for the next round of training will, to a large extent, come from the previous generation of LLMs.
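To make the feedback loop concrete, here is a toy Python sketch (purely illustrative; the setup and all numbers are assumptions, not from the paper): a CTR model is retrained each round on its own logs, but only the items it chooses to show generate new data, so the training distribution becomes an artifact of the deployed model.

```python
import numpy as np

# Toy sketch of the deploy-retrain feedback loop in CTR prediction /
# recommendation: the deployed model decides which items get shown, and
# only shown items produce training data for the next round.
rng = np.random.default_rng(0)

n_items, n_rounds, n_impressions, top_k = 20, 50, 1000, 5
true_ctr = rng.uniform(0.01, 0.20, size=n_items)   # ground-truth click rates

# Logged "training data": a weak, noisy prior of 50 impressions per item.
shows = np.full(n_items, 50.0)
clicks = rng.binomial(50, true_ctr).astype(float)
ever_shown = np.zeros(n_items, dtype=bool)

for _ in range(n_rounds):
    est_ctr = clicks / shows                 # "retrain": empirical CTR on the logs
    shown = np.argsort(est_ctr)[-top_k:]     # "deploy": show only the top-k items
    ever_shown[shown] = True
    # Feedback: only the shown items generate new clicks and impressions.
    clicks[shown] += rng.binomial(n_impressions, true_ctr[shown])
    shows[shown] += n_impressions

est_ctr = clicks / shows
err = np.abs(est_ctr - true_ctr)
print(f"{(~ever_shown).sum()} of {n_items} items were never shown after the prior")
print("mean CTR estimation error, shown items:  ", round(err[ever_shown].mean(), 4))
print("mean CTR estimation error, unshown items:", round(err[~ever_shown].mean(), 4))
# Items unlucky in the initial noisy logs are never shown again, so their
# estimates never improve: the training distribution is shaped by the model.
```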

Reinforcement Learning and related paradigms (Bayesian optimization or contextual bandits) offer a closer fit for embodied AI, where the prediction is not the end product but part of a predictive pipeline that also includes data collection. RL lets the data scientist design a higher-level objective and leaves it to the algorithm to optimize both the predictor and the data it is trained on. Here, the mismatch between theory and practice is different from that of supervised learning. The analysis in RL or bandit theory often focuses on the convergence of the agent to a theoretical optimum, given a fixed but often unknown environment. RL theory usually does not offer tools to analyze the data collected during the learning process, especially when the collection is semi-automatic (that is, when it includes a human curator in the loop). In practice, RL agents usually do not converge even in a stationary environment; rather, they individuate, which, quite perversely, makes the random seed part of the algorithm. This is even more pronounced in non-stationary environments where the agent’s actions alter the environment, a situation in which AGI will certainly find itself (1, 2).
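Here is a similar toy sketch of the individuation point (again illustrative, with made-up parameters): epsilon-greedy bandit agents that differ only in their random seed can settle on different arms, because each agent learns from data generated by its own early random choices.

```python
import numpy as np

# Toy sketch of "individuation": epsilon-greedy agents that differ only in
# their random seed can settle on different arms, because each one learns
# from data generated by its own early, random choices.
def run_agent(seed, n_arms=10, n_steps=5000):
    rng = np.random.default_rng(seed)
    true_means = np.linspace(0.4, 0.6, n_arms)   # nearly indistinguishable arms
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)
    for t in range(n_steps):
        eps = 1.0 / (1.0 + 0.01 * t)             # exploration decays over time
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(values))
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return int(np.argmax(counts))                # the arm this "individual" settled on

print("arm each seed settles on:", {seed: run_agent(seed) for seed in range(5)})
# Same algorithm, same environment: only the seed differs.
```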

A new learning theory for embodied AI must transcend these limitations. It should account for the dynamic, interactive nature of data in embodied AI, where the agent’s actions continuously reshape its learning environment. This theory should not just aim for optimal performance in a fixed setting but should embrace a spectrum of behaviors suitable for evolving environments. Moreover, it should provide diagnostics to assess the quality and relevance of data generated through these interactions.
