Learning to Interact in World Latent for Team Coordination


Question

  • Imagine a team of robots collaborating to achieve a common goal, such as transporting materials in a warehouse or driving autonomously in shared traffic.
  • What information does each robot need to coordinate effectively with its teammates under partial observability?
  • While many factors may matter, we posit that two elements are likely crucial:
    • (i) Inter-agent relations – who influences whom, and how their roles complement one another.
    • (ii) Task-specific world information – a compact surrogate of the global state that helps each agent reason about its task.

Why Representation, Not Communication?


  • We acknowledge that communication is powerful for multi-agent coordination. However, communication is impractical in many real-world scenarios for two reasons:
    • It is vulnerable. Under a bandwidth limit on communication (Type I) or adversarial agents mounting message-corruption attacks (Type II), communication-based methods readily fail at deployment.
    • It is a fundamentally centralized solution. It relies on a shared communication channel or synchronized messaging, which limits scalability and contradicts the decentralized nature of many real-world systems.

The IWoL Representation

  • In this work, we propose IWoL, a novel representation learning framework for MARL that captures inter-agent relations and task-specific world information.
  • We highlight three key features of IWoL:
    • IWoL is simple and practical for team coordination. Thanks to a simple end-to-end on-policy training scheme, the IWoL representation is captured with a single encoder and multiple decoders that learn inter-agent relations and task-specific world dynamics. The learned representation can be used at deployment without additional modules, e.g., communication or state estimation.
    • IWoL is compatible. It works as a plug-and-play module that can be combined with existing MARL algorithms, improving their performance without altering their fundamental structure.
    • IWoL is scalable. It handles complex multi-agent coordination in both explicit and implicit communication settings, achieving strong performance on a variety of challenging robotic tasks.

Architectural Overview


  • Observational encoder: Each agent encodes its local observation into a latent vector via self-attention.
  • Communication module: We employ a communication module that learns inter-agent relationships in two steps: (i) additive self-attention with a Gumbel-softmax decides whom to communicate with, and (ii) a Transformer block processes the exchanged messages, i.e., what to communicate. During training, this module provides surrogate inter-agent relationship information to the IWoL encoder.
  • RL networks: Each agent has an individual policy and value function implemented with feed-forward layers; the policy network is stochastic. A minimal code sketch of this pipeline follows the list.
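The sketch below illustrates this pipeline in PyTorch. All module names, dimensions, and wiring details (the entity-set observation format, the pairwise gating scores, the Gaussian policy head) are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ObsEncoder(nn.Module):
    """Encodes an agent's local observation (a set of entity features) into a latent via self-attention."""
    def __init__(self, obs_dim, hidden_dim):  # hidden_dim assumed divisible by num_heads
        super().__init__()
        self.proj = nn.Linear(obs_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)

    def forward(self, entities):                  # entities: (B, n_entities, obs_dim)
        h = self.proj(entities)
        h, _ = self.attn(h, h, h)                 # self-attention over observed entities
        return h.mean(dim=1)                      # pooled latent f_i^t: (B, hidden_dim)

class CommModule(nn.Module):
    """(i) Gumbel-softmax gating decides whom to communicate with;
    (ii) a Transformer block processes the messages (what to communicate)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(2 * hidden_dim, 2)  # keep/drop logits per agent pair (additive-style scoring, an assumption)
        self.block = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)

    def forward(self, latents, tau=1.0):           # latents: (B, n_agents, D)
        B, N, D = latents.shape
        src = latents.unsqueeze(2).expand(B, N, N, D)
        dst = latents.unsqueeze(1).expand(B, N, N, D)
        logits = self.score(torch.cat([src, dst], dim=-1))           # pairwise scores
        gate = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 0]  # (B, N, N) binary but differentiable
        msgs = torch.einsum('bij,bjd->bid', gate, latents) / max(N - 1, 1)  # gated message mixing
        return self.block(msgs)                    # messages m_i^t: (B, n_agents, D)

class ActorCritic(nn.Module):
    """Per-agent feed-forward value function and stochastic (here Gaussian) policy."""
    def __init__(self, hidden_dim, act_dim):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1))

    def forward(self, z):
        return torch.distributions.Normal(self.mu(z), self.log_std.exp()), self.value(z)

During training, the messages produced by the communication module serve as reconstruction targets for the Interactive Decoder described in the next section; at deployment, the module can be dropped.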

Interactive World Latent (IWoL)


Our goal is to build an IWoL representation \(z_i^t\) that captures inter-agent relationships and privileged world information. To this end, IWoL comprises the following three modules. $$ \underbrace{z_i^t = \mathrm{Encoder}_{\mathrm{IW}}(f_i^t)}_{\mathrm{Interactive~World~Encoder}} \quad \quad \underbrace{\hat{m}^t_i = \mathrm{Decoder}_{\mathrm{I}}(z_i^t)}_{\mathrm{Interactive~Decoder}} \quad \quad \underbrace{\hat{s}^t_i = \mathrm{Decoder}_{\mathrm{W}}(z_i^t)}_{\mathrm{World~Decoder}} $$
  • Interactive World Encoder: The encoder takes the encoded local observation \(f_i^t\) as input and produces the IWoL representation \(z_i^t\).
  • Interactive Decoder: This reconstructs the messages \(m_i^t\) that agent \(i\) would receive from others, encouraging \(z_i^t\) to preserve inter-agent dependencies.
  • World Decoder: This reconstructs the agent’s privileged state \(s_i^{t}\), encouraging \(z_i^t\) to encode global signals beyond the local observation.
$$ \mathcal{L}_\pi^{\mathrm{Im}}({\phi_i}) = \mathcal{L}^{\mathrm{RL}}_\pi(\phi_i) + \lambda_{\mathrm{W}}\mathcal{L}_{\mathrm{W}} + \lambda_{\mathrm{I}}\mathcal{L}_{\mathrm{I}}, \quad \mathrm{where} ~~ \mathcal{L}_{\mathrm{W}} = \bigl\lVert\mathrm{Decoder}_{\mathrm{W}}(z^t_i)-s^{t}_{i}\bigr\rVert^{2}_{2} ~~ \mathrm{and} ~~ \mathcal{L}_{\mathrm{I}} = \bigl\lVert\mathrm{Decoder}_{\mathrm{I}}(z^t_i)-m^{t}_{i}\bigr\rVert^{2}_{2}$$
  • Here, \(\mathcal{L}^{\mathrm{RL}}_\pi(\phi_i)\) is the RL loss for the policy network with parameters \(\phi_i\).
  • \(\lambda_{\mathrm{W}}\) and \(\lambda_{\mathrm{I}}\) are the coefficients for the world and interactive losses, respectively (see the sketch below).
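A hedged sketch of this objective, continuing the PyTorch style above. The linear encoder and decoders, the dimensions, the detached targets, and the placeholder RL loss are all illustrative assumptions rather than the reference code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IWoL(nn.Module):
    """Interactive World Encoder plus the two reconstruction decoders (sizes assumed)."""
    def __init__(self, feat_dim=64, z_dim=32, msg_dim=64, state_dim=48):
        super().__init__()
        self.encoder_iw = nn.Linear(feat_dim, z_dim)   # z_i^t = Encoder_IW(f_i^t)
        self.decoder_i = nn.Linear(z_dim, msg_dim)     # reconstructs messages m_i^t
        self.decoder_w = nn.Linear(z_dim, state_dim)   # reconstructs privileged state s_i^t

def iwol_loss(model, f_i, m_i, s_i, rl_loss, lam_w=1.0, lam_i=1.0):
    """L = L_RL + lam_W * L_W + lam_I * L_I, with squared-error reconstruction losses."""
    z = model.encoder_iw(f_i)
    loss_w = F.mse_loss(model.decoder_w(z), s_i.detach())  # world loss: privileged-state target
    loss_i = F.mse_loss(model.decoder_i(z), m_i.detach())  # interactive loss: message target
    return rl_loss + lam_w * loss_w + lam_i * loss_i       # (detaching targets is an assumption)

# Shape-only usage example; rl_loss would come from the base on-policy MARL algorithm.
model = IWoL()
f, m, s = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 48)
total = iwol_loss(model, f, m, s, rl_loss=torch.tensor(0.0))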

Experiments (Cooperative MARL)

Environments


Experimental Results


  • IWoL variants achieve the best or second-best performance and success rate on most tasks.
  • In particular, our approach achieves up to \(48.2\%\) success on MQE and \(20.0\%\) on Bi-DexHands, where previous baselines record near-zero success on three tasks (Go1Sheep, Go1Seesaw, and Two Catch Underarm).

Citation

        
@misc{lee2025learninginteractworldlatent,
  title={Learning to Interact in World Latent for Team Coordination},
  author={Dongsu Lee and Daehee Lee and Yaru Niu and Honguk Woo and Amy Zhang and Ding Zhao},
  year={2025},
  eprint={2509.25550},
  archivePrefix={arXiv}
}
         

The website template was borrowed from Seohong Park, Michaël Gharbi, and Jon Barron.