Emergent Collective Intelligence from Massive-Agent Cooperation and Competition
The key idea of this paper from Chen et al. is to study the emergence of artificial collective intelligence through massive-agent reinforcement learning. The authors provide evidence that collective intelligence can emerge from massive-agent cooperation and competition, producing behaviors that went beyond their expectations.
WHAT THEY DID: The authors propose a new massive-agent reinforcement learning environment, Lux, in which a large, dynamic population of agents split into two teams scrambles for limited resources and fights off the darkness. The authors use a pixel-to-pixel policy network coupled with the Proximal Policy Optimization (PPO) algorithm and Generalized Advantage Estimation (GAE) to mitigate the credit-assignment problem.
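As a sketch of how GAE assigns credit backward along a trajectory, the snippet below computes advantages from per-step rewards and value estimates. The discount and smoothing values shown are common defaults, not necessarily the paper's settings:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single trajectory.

    rewards: shape (T,); values: shape (T + 1,), where the last entry
    is the bootstrap value of the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy three-step trajectory with a terminal bootstrap value of 0.
adv = gae_advantages(np.array([0.0, 0.0, 1.0]),
                     np.array([0.5, 0.5, 0.5, 0.0]))
```

With a sparse reward arriving only at the last step, the weighted sum spreads credit back to the earlier actions, which is exactly the credit-assignment behavior the authors rely on.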
HOW IT WORKS: The pixel-to-pixel architecture takes images as input observations and uses a ResNet structure as the backbone. The authors design three training phases with different reward functions as a progressive curriculum to address the sparse-reward problem. Through self-play and the curriculum-learning phases, the authors observe several stages of massive-agent co-evolution, from atomic skills to group strategies.
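The pixel-to-pixel idea can be sketched as follows: the network maps a multi-channel image of the map to per-cell action logits, so all units share one network and each unit reads its action distribution from its own cell. This toy NumPy version stands in for the paper's ResNet backbone with a single 1x1-convolution-style linear layer; all shapes and sizes are illustrative:

```python
import numpy as np

def pixel_to_pixel_logits(obs, weights, bias):
    """Map a (C, H, W) observation to (A, H, W) per-cell action logits.

    obs:     (C, H, W) feature planes describing the map
    weights: (A, C) shared linear map applied at every cell (a 1x1 conv);
             a toy stand-in for the paper's ResNet backbone.
    bias:    (A,) per-action bias
    """
    C, H, W = obs.shape
    flat = obs.reshape(C, H * W)             # (C, H*W): one column per cell
    logits = weights @ flat + bias[:, None]  # (A, H*W): shared map per cell
    return logits.reshape(-1, H, W)

rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 8, 8))   # 4 feature planes on an 8x8 map
W_layer = rng.normal(size=(6, 4))  # 6 discrete actions per unit
b = np.zeros(6)
logits = pixel_to_pixel_logits(obs, W_layer, b)
# A unit standing on cell (y, x) reads its action logits from logits[:, y, x].
```

Because every cell is processed by the same weights, the number of agents can grow or shrink during a game without changing the network, which is what makes the architecture suitable for massive, dynamic agent populations.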
TECHNICAL DETAILS: The paper also gives a detailed description of the Lux AI Challenge environment, which can serve as a benchmark for AI research and development. It further documents implementation details such as feature engineering, network design, and the reinforcement learning setup, including the PPO implementation, GAE parameters, and discount factors used to optimize the policy network and estimate advantages.
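For reference, the PPO objective mentioned above has the standard clipped-surrogate form sketched below; the clip coefficient shown is the common default, not necessarily the value the paper uses:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and behavior policies; advantages: e.g. GAE estimates.
    """
    ratio = np.exp(logp_new - logp_old)  # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) surrogate;
    # we return its negative mean as a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping the ratio keeps each policy update close to the behavior policy, which is what lets PPO reuse rollout data for several gradient steps without destabilizing training.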
WHY IT MATTERS: The findings are important for researchers and practitioners because they provide evidence that collective intelligence can emerge in large populations of artificial agents, not only in human groups. Understanding how individually trained agents self-organize into group strategies that outperform solitary behavior sheds light on designing AI systems that learn from and interact with one another, which can be used to build more effective multi-agent AI systems.
GO DEEPER: Further readings that will elaborate on the concepts in this paper:
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. ArXiv. https://doi.org/10.48550/arXiv.1707.06347
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., & Abbeel, P. (2015). Trust Region Policy Optimization. ArXiv. https://doi.org/10.48550/arXiv.1502.05477
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. ArXiv. https://doi.org/10.48550/arXiv.1312.5602
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. ArXiv. https://doi.org/10.48550/arXiv.1602.01783