Episodic Future Thinking Mechanism for Multi-agent Reinforcement Learning
NeurIPS 2024
Dongsu Lee
Soongsil Univ.
Minhae Kwon
Soongsil Univ.
Abstract
Understanding cognitive processes in multi-agent interactions is a primary goal in cognitive science. It can guide the direction of artificial intelligence (AI) research toward social decision-making in multi-agent systems, which involves uncertainty arising from character heterogeneity. In this paper, we introduce an episodic future thinking (EFT) mechanism for a reinforcement learning (RL) agent, inspired by cognitive processes observed in animals. To enable future thinking functionality, we first develop a multi-character policy that captures diverse characters with an ensemble of heterogeneous policies. Here, the character of an agent is defined as a different weight combination on reward components, representing distinct behavioral preferences. The future thinking agent collects observation-action trajectories of the target agents and uses the pre-trained multi-character policy to infer their characters. Once the characters are inferred, the agent predicts the upcoming actions of the target agents and simulates the potential future scenario. This capability allows the agent to adaptively select the optimal action, considering the predicted future scenario, in multi-agent interactions. To evaluate the proposed mechanism, we consider a multi-agent autonomous driving scenario with diverse driving traits as well as multiple particle environments. Simulation results demonstrate that the EFT mechanism with accurate character inference leads to a higher reward than existing multi-agent solutions. We also confirm that the reward improvement remains valid across societies with different levels of character diversity.
Research Questions
- In real-world settings, trained robot agents must confront and coordinate with heterogeneous agents, which makes adaptation and generalization challenging.
- How can we build robust RL agents that make adaptive decisions in multi-agent systems where they interact with heterogeneous agents?
- One might turn to multi-agent RL (MARL), agent modeling, or planning to help robot agents coordinate their actions. However, MARL and agent modeling struggle to adapt to unseen agents, while planning remains challenging due to the dynamic complexity of multi-agent systems.
EFTM: Episodic Future Thinking Mechanism
- The EFTM framework couples character inference with prediction of upcoming actions. With this framework, robot agents can adapt their decision-making by considering other agents' characters and upcoming actions.
Meta-character Agent
- This work introduces the character as a parameterization of the reward function in the RL formulation: a character c = (c_1, ..., c_K) is a weight vector over the reward components r_1, ..., r_K, yielding r_c(s, a) = Σ_k c_k · r_k(s, a).
- Different characters weigh each reward term differently, which diversifies the resulting behavioral patterns.
- An agent learns the meta-character policy, which can emulate the behavioral pattern of any given character by capturing the set of possible character-conditioned policies.
- Specifically, the meta-character policy takes a local observation and character parameters as input and outputs an action. In the training phase, the agent randomly samples a character parameter at every episode (see the sketch below).
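Below is a minimal PyTorch sketch of this idea, not the authors' implementation: a single network consumes the observation concatenated with a character vector, a fresh character is drawn per episode, and the scalar reward is the character-weighted sum of reward components. All names, dimensions, and the Dirichlet sampling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetaCharacterPolicy(nn.Module):
    """One policy network conditioned on both the local observation and a character vector."""

    def __init__(self, obs_dim, char_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + char_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, character):
        # Concatenating the character with the observation lets a single network
        # emulate the whole family of character-conditioned policies.
        logits = self.net(torch.cat([obs, character], dim=-1))
        return torch.distributions.Categorical(logits=logits)


def sample_character(char_dim):
    # A character is a weight combination over reward components; a Dirichlet
    # draw keeps it on the probability simplex (sampling scheme is our assumption).
    return torch.distributions.Dirichlet(torch.ones(char_dim)).sample()


def character_reward(reward_terms, character):
    # r_c(s, a) = sum_k c_k * r_k(s, a)
    return (character * reward_terms).sum()


# Example: at every episode a fresh character is drawn and fed to the policy.
policy = MetaCharacterPolicy(obs_dim=8, char_dim=3, act_dim=4)
c = sample_character(3)
action = policy(torch.randn(8), c).sample()
```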
Character Inference
- Once the meta-character policy is fully trained, the agent can infer other agents' characters from their trajectories.
- Character inference is the process of finding the character that best explains the observation-action pairs of a target agent. The methodology is maximum likelihood estimation via stochastic gradient ascent (see the sketch below).
- Complete knowledge of the target agent's reward components may not always be available. Our primary purpose is to interpret the target agent's behavioral pattern by inferring character coefficients over a sketched reward function.
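A minimal sketch of this step under the setup above: we ascend the log-likelihood of the target's observed actions under the trained meta-character policy with respect to the character. The softmax reparameterization that keeps the inferred character on the simplex is our assumption, not necessarily the authors' choice.

```python
import torch

def infer_character(policy, trajectory, char_dim, steps=200, lr=0.05):
    """MLE of the target's character via stochastic gradient ascent.

    trajectory: list of (obs, action) tensor pairs observed from the target agent.
    """
    # Optimize unconstrained logits; softmax maps them back to a weight vector.
    char_logits = torch.zeros(char_dim, requires_grad=True)
    opt = torch.optim.Adam([char_logits], lr=lr)
    for _ in range(steps):
        character = torch.softmax(char_logits, dim=-1)
        # Log-likelihood of the observed actions under the candidate character.
        log_lik = sum(policy(obs, character).log_prob(act) for obs, act in trajectory)
        loss = -log_lik  # ascent on the likelihood == descent on its negative
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(char_logits, dim=-1).detach()
```

In practice the trajectory can be a sliding window of the target's most recent observation-action pairs, so the inferred character tracks the agent it currently interacts with.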
Future Thinking and Decision-making
- After inferring a target agent's character, the agent can readily predict the target's upcoming action by feeding the inferred character into the meta-character policy.
- Next, the agent leverages a learned dynamics model or a principle-based dynamics model to foresee the upcoming future, and then makes its decision at the predicted future observation (see the sketch below).
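A minimal sketch of one decision step, assuming a one-step lookahead and a generic dynamics callable; the horizon, signatures, and names are illustrative assumptions.

```python
def eft_act(ego_policy, meta_policy, dynamics, ego_obs, targets):
    """One-step episodic future thinking.

    targets:  list of (target_obs, inferred_character) pairs for surrounding agents.
    dynamics: learned or principle-based model, (ego_obs, predicted_actions) -> future_obs.
    """
    # 1) Predict each target's upcoming action with the meta-character policy
    #    conditioned on its inferred character.
    predicted = [meta_policy(obs, c).probs.argmax() for obs, c in targets]
    # 2) Simulate the potential future observation under those predicted actions.
    future_obs = dynamics(ego_obs, predicted)
    # 3) Select the ego action at the predicted future observation.
    return ego_policy(future_obs).sample()
```

For the driving scenario a principle-based kinematics model could serve as the dynamics, while a learned transition model is the natural choice for particle environments; longer horizons simply iterate steps 1 and 2.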
Citation
@article{lee2024episodic,
title={Episodic Future Thinking Mechanism for Multi-agent Reinforcement Learning},
author={Lee, Dongsu and Kwon, Minhae},
journal={arXiv preprint arXiv:2410.17373},
year={2024}
}
The website template was borrowed from Seohong Park, Michaël Gharbi, and Jon Barron.