Learning to Interact in World Latent for Team Coordination


Question

  • Imagine a team of robots collaborating to achieve a common goal, such as transporting materials in a warehouse or driving autonomously in shared traffic.
  • What information does each robot need to coordinate effectively with its teammates under partial observability?
  • While many factors may matter, we posit that two elements are likely crucial:
    • (i) Inter-agent relations – who influences whom, and how their roles complement one another.
    • (ii) Task-specific world information – a compact surrogate of the global state that helps each agent reason about its task.

Why Representation, Not Communication?


  • We acknowledge that communication is powerful for multi-agent coordination. However, communication is impractical in many real-world scenarios for two reasons:
    • It is vulnerable. Under a bandwidth limit on communication (Type I) or adversarial agents mounting message-corruption attacks (Type II), communication-based methods readily fail at deployment.
    • It is a fundamentally centralized solution. It relies on a shared communication channel or synchronized messaging, which limits scalability and contradicts the decentralized nature of many real-world systems.

The IWoL Representation

  • In this work, we propose IWoL, a novel representation learning framework for MARL that captures inter-agent relations and task-specific world information.
  • We highlight three key features of IWoL:
    • IWoL is simple and practical for team coordination. Thanks to a simple end-to-end on-policy training scheme, the IWoL representation is captured with a single encoder and multiple decoders that learn inter-agent relations and task-specific world dynamics. The learned representation can be used at deployment without additional modules, e.g., communication or state estimation.
    • IWoL is compatible. It works as a plug-and-play module that can be combined with existing MARL algorithms, improving their performance without altering their fundamental structure.
    • IWoL is scalable. It handles complex multi-agent coordination in both explicit and implicit communication settings, achieving strong performance on a variety of challenging robotic tasks.

Architectural Overview


  • Observational encoder: Each agent encodes its local observation into a latent vector via self-attention.
  • Communication module: We employ a communication module that learns inter-agent relationships in two steps: (i) additive self-attention with a Gumbel-softmax decides whom to communicate with, and (ii) a Transformer block processes the exchanged messages, i.e., what to communicate. During training, this module provides surrogate inter-agent relationship information to the IWoL encoder.
  • RL networks: Each agent has an individual policy and value function implemented with feed-forward layers; the policy network is stochastic. A minimal code sketch of this pipeline follows the list.
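The sketch below illustrates this pipeline in PyTorch. All module names, dimensions, and wiring details (the entity-set observation format, the pairwise gating scores, the Gaussian policy head) are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ObsEncoder(nn.Module):
    """Encodes an agent's local observation (a set of entity features) into a latent via self-attention."""
    def __init__(self, obs_dim, hidden_dim):  # hidden_dim assumed divisible by num_heads
        super().__init__()
        self.proj = nn.Linear(obs_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)

    def forward(self, entities):                  # entities: (B, n_entities, obs_dim)
        h = self.proj(entities)
        h, _ = self.attn(h, h, h)                 # self-attention over observed entities
        return h.mean(dim=1)                      # pooled latent f_i^t: (B, hidden_dim)

class CommModule(nn.Module):
    """(i) Gumbel-softmax gating decides whom to communicate with;
    (ii) a Transformer block processes the messages (what to communicate)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(2 * hidden_dim, 2)  # keep/drop logits per agent pair (additive-style scoring, an assumption)
        self.block = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)

    def forward(self, latents, tau=1.0):           # latents: (B, n_agents, D)
        B, N, D = latents.shape
        src = latents.unsqueeze(2).expand(B, N, N, D)
        dst = latents.unsqueeze(1).expand(B, N, N, D)
        logits = self.score(torch.cat([src, dst], dim=-1))           # pairwise scores
        gate = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 0]  # (B, N, N) binary but differentiable
        msgs = torch.einsum('bij,bjd->bid', gate, latents) / max(N - 1, 1)  # gated message mixing
        return self.block(msgs)                    # messages m_i^t: (B, n_agents, D)

class ActorCritic(nn.Module):
    """Per-agent feed-forward value function and stochastic (here Gaussian) policy."""
    def __init__(self, hidden_dim, act_dim):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1))

    def forward(self, z):
        return torch.distributions.Normal(self.mu(z), self.log_std.exp()), self.value(z)

During training, the messages produced by the communication module serve as reconstruction targets for the Interactive Decoder described in the next section; at deployment, the module can be dropped.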

Interactive World Latent (IWoL)


Our goal is to build an IWoL representation \(z_i^t\) that captures inter-agent relationships and privileged world information. To this end, IWoL comprises the following three modules. $$ \underbrace{z_i^t = \mathrm{Encoder}_{\mathrm{IW}}(f_i^t)}_{\mathrm{Interactive~World~Encoder}} \quad \quad \underbrace{\hat{m}^t_i = \mathrm{Decoder}_{\mathrm{I}}(z_i^t)}_{\mathrm{Interactive~Decoder}} \quad \quad \underbrace{\hat{s}^t_i = \mathrm{Decoder}_{\mathrm{W}}(z_i^t)}_{\mathrm{World~Decoder}} $$
  • Interactive World Encoder: The encoder takes the encoded local observation \(f_i^t\) as input and produces the IWoL representation \(z_i^t\).
  • Interactive Decoder: This reconstructs the messages \(m_i^t\) that agent \(i\) would receive from others, encouraging \(z_i^t\) to preserve inter-agent dependencies.
  • World Decoder: This reconstructs the agent’s privileged state \(s_i^{t}\), encouraging \(z_i^t\) to encode global signals beyond the local observation.
$$ \mathcal{L}_\pi^{\mathrm{Im}}({\phi_i}) = \mathcal{L}^{\mathrm{RL}}_\pi(\phi_i) + \lambda_{\mathrm{W}}\mathcal{L}_{\mathrm{W}} + \lambda_{\mathrm{I}}\mathcal{L}_{\mathrm{I}}, \quad \mathrm{where} ~~ \mathcal{L}_{\mathrm{W}} = \bigl\lVert\mathrm{Decoder}_{\mathrm{W}}(z^t_i)-s^{t}_{i}\bigr\rVert^{2}_{2} ~~ \mathrm{and} ~~ \mathcal{L}_{\mathrm{I}} = \bigl\lVert\mathrm{Decoder}_{\mathrm{I}}(z^t_i)-m^{t}_{i}\bigr\rVert^{2}_{2}$$
  • Here, \(\mathcal{L}^{\mathrm{RL}}_\pi(\phi_i)\) is the RL loss for the policy network with parameters \(\phi_i\).
  • \(\lambda_{\mathrm{W}}\) and \(\lambda_{\mathrm{I}}\) are the coefficients for the world and interactive losses, respectively (see the sketch below).
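A hedged sketch of this objective, continuing the PyTorch style above. The linear encoder and decoders, the dimensions, the detached targets, and the placeholder RL loss are all illustrative assumptions rather than the reference code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IWoL(nn.Module):
    """Interactive World Encoder plus the two reconstruction decoders (sizes assumed)."""
    def __init__(self, feat_dim=64, z_dim=32, msg_dim=64, state_dim=48):
        super().__init__()
        self.encoder_iw = nn.Linear(feat_dim, z_dim)   # z_i^t = Encoder_IW(f_i^t)
        self.decoder_i = nn.Linear(z_dim, msg_dim)     # reconstructs messages m_i^t
        self.decoder_w = nn.Linear(z_dim, state_dim)   # reconstructs privileged state s_i^t

def iwol_loss(model, f_i, m_i, s_i, rl_loss, lam_w=1.0, lam_i=1.0):
    """L = L_RL + lam_W * L_W + lam_I * L_I, with squared-error reconstruction losses."""
    z = model.encoder_iw(f_i)
    loss_w = F.mse_loss(model.decoder_w(z), s_i.detach())  # world loss: privileged-state target
    loss_i = F.mse_loss(model.decoder_i(z), m_i.detach())  # interactive loss: message target
    return rl_loss + lam_w * loss_w + lam_i * loss_i       # (detaching targets is an assumption)

# Shape-only usage example; rl_loss would come from the base on-policy MARL algorithm.
model = IWoL()
f, m, s = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 48)
total = iwol_loss(model, f, m, s, rl_loss=torch.tensor(0.0))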

Experiments (Cooperative MARL)

Environments


Experimental Results


  • IWoL variants achieve the best or second-best performance and success rate on most tasks.
  • In particular, our approach achieves up to \(48.2\%\) success on MQE and \(20.0\%\) on Bi-DexHands, where previous baselines record near-zero success on three tasks (Go1Sheep, Go1Seesaw, and Two Catch Underarm).

Citation

        
@misc{lee2025learninginteractworldlatent,
  title={Learning to Interact in World Latent for Team Coordination},
  author={Dongsu Lee and Daehee Lee and Yaru Niu and Honguk Woo and Amy Zhang and Ding Zhao},
  year={2025},
  eprint={2509.25550},
  archivePrefix={arXiv}
}
         

The website template was borrowed from Seohong Park, Michaël Gharbi, and Jon Barron.