Mind everywhere, embodied AI of things, and the future of engineering
What will the next generation of systems look like?
In what follows, I will use a mobile telecommunication network as an example, with its human end-users, their intelligent end-devices, a network of base stations with its multi-layer distributed structure, and the systems engineers operating the network. That said, the concepts outlined here describe all future intelligent engineering systems, including smart buildings and cities, local area networks, and networks of self-driving vehicles and other IoT devices.
Inspired by Michael Levin’s Technological Approach to Mind Everywhere (TAME), I will argue that these intelligent networks of systems will look more like biological bodies than classical control systems, so their design should be inspired by the latest research in distributed cognition and multi-level developmental system biology. Besides the narrow pragmatic goal of building future engineering systems, I also believe that, unlike the current narrow AI paradigm focusing on disembodied function learning, understanding and creating these embodied systems will lead closer to AGI and perhaps even exhibit the first signs of artificial consciousness.
Current AI research is narrow, both in its inspiration (the body-mind dichotomy) and in its design: it essentially learns complicated but non-complex input-output functions on immense data and compute. Unlike these disembodied pieces of software, real intelligent agents are embodied within an environment in which they need to survive, building models of themselves and of the world, learned on small data collected by the agents themselves. They do not have a separate mind and body. They are designed to sense the relevant factors of their environment, and to prioritize and apply actions that always keep them within the operating range of their vital performance indicators. Furthermore, they are profoundly multi-layer: their cells, tissues, and organs are themselves intelligent, collaborating but also competing for resources within the body. Decisions about actions are made at the appropriate level, optimizing the goals of that level. Consciousness and the self arise from the need to make decisions about the top, system-level actions.
What will the next generation of telecommunication systems look like?
They will consist of intelligent parts
For example, each end-user device will try to understand its (naturally intelligent) end-user in order to give them the best user experience. They will have sensors and will build predictive models of the world, attempting to decrease the uncertainty (error) of their predictions.
They will have actions
that they can use to satisfy and optimize multiple key performance indicators (KPIs), which will often be contradictory (for example, video quality and battery life at the end-user level, or throughput and energy consumption at the base station).
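As a toy illustration of such a trade-off (all configuration names, KPI values, and weights are hypothetical), contradictory KPIs can be collapsed into a single score with a weighted scalarization, with the weights encoding the user's or operator's priorities:

```python
def scalarize(kpis: dict, weights: dict) -> float:
    """Collapse multiple, possibly contradictory KPIs into one score."""
    return sum(weights[k] * kpis[k] for k in kpis)

# Two candidate device configurations: high-quality video drains the battery.
configs = {
    "high_quality": {"video_quality": 0.9, "battery_life": 0.3},
    "power_saver":  {"video_quality": 0.5, "battery_life": 0.8},
}

# A user who values battery life over video quality.
weights = {"video_quality": 0.4, "battery_life": 0.6}

best = max(configs, key=lambda c: scalarize(configs[c], weights))
```

With these weights the device settles on the power-saver configuration; a different user profile would flip the choice, which is exactly why the weights must be learned per node rather than fixed globally.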
Exploration
They will need to use their actions not only to optimize their KPIs but also to explore and learn about the world, while making sure that the exploration is safe and does not significantly interfere with their primary goals. For example, base stations will roam some phones to discover optimal roaming policies, and phones will sometimes request more bandwidth than they need in order to probe the excess capacity of the base station.
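A minimal sketch of such safe exploration (action names and values are invented): epsilon-greedy selection, but restricted to a set of actions known to be safe, so the node trades a little KPI performance for learning without ever leaving its safe envelope.

```python
import random

def choose_action(q_values: dict, safe_actions: set, epsilon: float = 0.1,
                  rng: random.Random = None) -> str:
    """Epsilon-greedy action selection restricted to a safe action set."""
    rng = rng or random.Random(0)
    if rng.random() < epsilon:
        return rng.choice(sorted(safe_actions))               # explore, safely
    return max(safe_actions, key=lambda a: q_values.get(a, 0.0))  # exploit

# Even though "unsafe_boost" looks most rewarding, it is never chosen.
q = {"low_power": 1.0, "boost": 2.0, "unsafe_boost": 9.0}
safe = {"low_power", "boost"}
action = choose_action(q, safe, epsilon=0.0)
```

With epsilon = 0 the node purely exploits; any epsilon > 0 spends some actions on learning, but the safety constraint is enforced structurally rather than by the reward signal.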
They will be hierarchical and multi-agent
In classical multi-agent networks, nodes are connected both through communication channels and through interactions (an action of one node alters the state of its neighbors). In hierarchical networks, nodes will also be connected upward in the hierarchy (end-devices to base stations) and downward to their body parts (base stations to end-devices).
Partial control
Unlike in classical control, nodes in the hierarchy will not have full access either to their higher nodes or to their “body parts”. They will be able to request certain things from higher nodes, and persuade lower nodes to do something by shaping the environment of the lower nodes (for example, certain actions of a base station could make the end-device buffer more or less).
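The buffering example above can be sketched as reward shaping (all rewards and incentives are made up): the base station never commands the end-device; it only adds incentives on top of the device's own reward, and the device still maximizes for itself.

```python
def shaped_reward(base_reward: float, action: str, incentives: dict) -> float:
    """Persuasion, not control: the higher node adds incentives;
    the lower node still maximizes its own (shaped) reward."""
    return base_reward + incentives.get(action, 0.0)

# The device's own preferences slightly favor buffering less.
own_reward = {"buffer_more": 0.5, "buffer_less": 0.6}

# A base station anticipating congestion nudges the device toward buffering.
incentives = {"buffer_more": 0.3}

choice = max(own_reward, key=lambda a: shaped_reward(own_reward[a], a, incentives))
```

Without the incentive the device would buffer less; with it, the device's own optimization flips to buffering more. The base station has changed the device's environment, not its policy.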
Nodes will individuate
They will learn their locally optimal world model and action policy. For example, each phone will learn the habits of its user, and each base station will learn the “habits” of the cell it is serving. This means that new hardware will have to be warm-started with the models of the old node. Updates in the state and action spaces mean that this process will look more like training in a teacher-student relationship than like classically copying the old system onto the new hardware.
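A minimal sketch of that teacher-student warm start, assuming tabular policies and invented state names: the new node queries the old one on the states they share, and falls back to a default on states the teacher never saw, which it must then learn on its own.

```python
def distill(teacher_policy: dict, student_states: list) -> dict:
    """Warm-start a new node from an old one: copy the teacher's action
    wherever it has one; leave a default for states unknown to the teacher."""
    return {s: teacher_policy.get(s, "default") for s in student_states}

# The old base station's learned policy over its (smaller) state space.
teacher = {"low_traffic": "sleep", "high_traffic": "boost"}

# The new hardware distinguishes an extra state the teacher never saw.
student = distill(teacher, ["low_traffic", "high_traffic", "overload"])
```

Real systems would distill function approximators rather than tables, but the asymmetry is the same: the student inherits the teacher's experience only on the overlap of their state spaces.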
Nodes will communicate, bind to each other and to us
There will be classical low-level communication modes, such as an end-device connecting to a base station, but higher-level messages (such as a phone informing the base station where it will go in the next hour) may use human language. First, because end-devices will already communicate with human users in language; second, because node-to-node communication will then be transparent to the system engineers who operate the network, making the full organism more explainable than it is today.
The four pillars of embodied AIoT
World models / digital twins
We will need efficient and precise world models that can rapidly learn and adjust to new data. They will be self-calibrated: aware of their own uncertainties. They will also be consistently multi-timestep: we can query the same model about the state of the world at various time scales, and these predictions will be precise and self-consistent. They will be robust and auto-tuning, since in a multi-node organism there will not be enough AI engineers babysitting training and tuning at every level.
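Self-calibration in its smallest possible form (a running scalar estimate via Welford's online algorithm; a real world model would be far richer): every prediction comes with the model's own uncertainty about it, which shrinks as the node collects more data.

```python
class SelfCalibratedPredictor:
    """A model that reports its own uncertainty alongside its prediction."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        # Welford's online algorithm: numerically stable running mean/variance.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def predict(self) -> tuple:
        """Return (estimate, uncertainty), uncertainty = standard error."""
        var = self.m2 / (self.n - 1) if self.n > 1 else float("inf")
        return self.mean, (var / self.n) ** 0.5
```

The point is the interface, not the statistics: downstream control code can weigh this model's advice by the uncertainty it reports about itself, with no engineer in the loop.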
Model-based control
We need a versatile, modular, and scalable toolkit that includes various model-based control techniques (contextual bandits, BO, RL). Policies should be learned continually, building on previous and lower level policies, and deployed continuously. They should be robust to real-life constraints: model size, training and testing (planning) time limits, noise. Besides optimizing their goals, they should also be curious: safely exploring potentially interesting parts of the state space.
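A hypothetical sketch of the simplest member of that toolkit: an epsilon-greedy contextual bandit keeping per-context running reward averages (contexts, actions, and rewards are invented; BO and RL would slot into the same choose/learn interface).

```python
import random
from collections import defaultdict

class ContextualBandit:
    """Epsilon-greedy contextual bandit with per-(context, action) averages."""

    def __init__(self, actions: list, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.sums = defaultdict(float)    # total reward per (context, action)
        self.counts = defaultdict(int)    # pulls per (context, action)

    def choose(self, context: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)          # curious exploration
        return max(self.actions, key=lambda a:
                   self.sums[(context, a)] / max(self.counts[(context, a)], 1))

    def learn(self, context: str, action: str, reward: float) -> None:
        self.sums[(context, action)] += reward
        self.counts[(context, action)] += 1
```

The continual-learning requirement falls out naturally: `learn` is called after every interaction, so the policy is always deployed and always training.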
Communication through LLMs
Basic LLMs will need to be upgraded so that they remember their own experience. They should learn continuously, updating and storing their experience in their episodic memories. Like the models and agents above, they should be able to individuate while staying in contact with their fellow nodes, possibly developing new concepts and teaching them to human end-users and operators. They should remain transparent and explainable throughout.
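A stand-in sketch for such an episodic memory (word-overlap retrieval substitutes for the embedding search a real system would use; the stored episodes are invented): the node stores every interaction and recalls the most relevant one before responding.

```python
class EpisodicMemory:
    """Store past interactions; recall the most relevant one for a query."""

    def __init__(self):
        self.episodes = []

    def store(self, episode: str) -> None:
        self.episodes.append(episode)

    def recall(self, query: str) -> str:
        """Return the stored episode with the largest word overlap."""
        q = set(query.lower().split())
        return max(self.episodes,
                   key=lambda e: len(q & set(e.lower().split())),
                   default="")

memory = EpisodicMemory()
memory.store("user streamed video at noon")
memory.store("user phone idle overnight")
```

Because the memory is plain text, it doubles as an explainability log: an operator can read exactly which past episode shaped the node's current behavior.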
Hierarchical multi-agent systems
Classical “flat” multi-agent systems will have to be upgraded to create and manage hierarchies. Persuasion will replace tight control: higher levels will manage lower levels by shaping their environment and rewards. When the hierarchy is not dictated by hardware, node communities at the lower levels will collaboratively form higher levels, delegating to them some of their freedom in exchange for increased robustness and safety, and to avoid deadlocks and arms races. Single agents will have to be upgraded to absorb input from other nodes (at the same level, and at lower and higher levels) and to handle a changing reward landscape.
Inspirations
Michael Levin’s TAME. I am mainly just repeating his groundbreaking ideas and applying them to intelligent engineering systems.
Mark Solms: The Hidden Spring. Demystifying consciousness by tying it to affect and the Free Energy Principle.
Lina Bariah and Merouane Debbah: AI Embodiment Through 6G: Shaping the Future of AGI. Similar ideas from Telecom and 6G.
John Vervaeke’s positive program of bringing up AI. I heard a book is on the way.