Emergent Collective Intelligence from Massive-Agent Cooperation and Competition

The key idea of this paper from Chen et al. is to study the emergence of artificial collective intelligence through massive-agent reinforcement learning. The authors provide evidence that collective intelligence can emerge from massive-agent cooperation and competition, leading to behaviors beyond their expectations.

WHAT THEY DID: The authors propose a new massive-agent reinforcement learning environment, Lux, where dynamic and massive agents in two teams scramble for limited resources and fight off the darkness. The authors train a pixel-to-pixel policy network with the Proximal Policy Optimization (PPO) algorithm and Generalized Advantage Estimation (GAE), which sidesteps the credit-assignment problem that arises when controlling thousands of individual agents.
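The clipped PPO objective this training builds on can be summarized in a few lines. The sketch below is a generic illustration with a common default clipping range of 0.2, not the authors' exact implementation:

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from Schulman et al. (2017).

    log_probs_new : log pi(a_t | s_t) under the current policy
    log_probs_old : log pi(a_t | s_t) under the rollout (old) policy
    advantages    : advantage estimates, e.g., from GAE
    clip_eps      : clipping range (0.2 is a common default, not the paper's value)
    """
    ratio = torch.exp(log_probs_new - log_probs_old)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; negate it to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()
```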

HOW IT WORKS: The pixel-to-pixel architecture takes image-like observations as input and uses a ResNet structure as its backbone. To address the sparse-reward problem, the authors design three training phases with different rewards as a progressive curriculum. Through self-play and these curriculum phases, they observe several stages of massive-agent co-evolution, from atomic skills to group strategies.
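A minimal sketch of what such a pixel-to-pixel actor-critic can look like is shown below; the layer sizes, class names, and number of residual blocks are illustrative assumptions rather than the paper's exact architecture. The key idea is that the policy head keeps the spatial resolution of the map, emitting action logits for every cell so one centralized network can control however many units are on the board.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 convolutional residual block that preserves the spatial resolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(x + h)

class PixelToPixelPolicy(nn.Module):
    """Illustrative pixel-to-pixel actor-critic: per-cell action logits plus a scalar value."""
    def __init__(self, in_channels: int, n_actions: int, hidden: int = 64, n_blocks: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, hidden, 3, padding=1)
        self.backbone = nn.Sequential(*[ResidualBlock(hidden) for _ in range(n_blocks)])
        self.policy_head = nn.Conv2d(hidden, n_actions, 1)   # action logits for every map cell
        self.value_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hidden, 1)
        )

    def forward(self, obs):  # obs: (batch, channels, height, width) image-like features
        h = self.backbone(torch.relu(self.stem(obs)))
        return self.policy_head(h), self.value_head(h)
```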

TECHNICAL DETAILS: The paper also gives a detailed description of the Lux AI Challenge environment, which can serve as a benchmark for AI research and development. It further documents implementation details such as feature engineering, network design, and the reinforcement learning setup, including the PPO implementation, GAE parameters, and discount factors used to optimize the policy network and estimate advantages.
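As a reference point for how the discount factor and the GAE parameter interact, here is the standard GAE recursion in a few lines of Python; the gamma and lambda values shown are common defaults, not necessarily those used in the paper.

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation over one rollout.

    rewards, dones : per-step rewards and episode-termination flags, length T
    values         : value estimates, length T + 1 (includes the bootstrap value)
    gamma, lam     : discount factor and GAE parameter (common defaults shown)
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of TD residuals, reset at episode boundaries.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```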

WHY IT MATTERS: The findings matter for researchers and practitioners because they provide evidence that collective intelligence can emerge purely from massive-agent cooperation and competition, without hand-designed coordination. Understanding how group strategies arise from individually learned behaviors helps explain how large populations of agents can achieve results that no single agent could reach alone. It also offers guidance for designing multi-agent AI systems, along with the environments and curricula used to train them, so that useful collective behaviors emerge rather than having to be engineered by hand.

GO DEEPER: Further readings that will elaborate on the concepts in this paper: 

  1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. ArXiv. https://doi.org/10.48550/arXiv.1707.06347

  2. Schulman, J., Levine, S., Moritz, P., Jordan, M. I., & Abbeel, P. (2015). Trust Region Policy Optimization. ArXiv. https://doi.org/10.48550/arXiv.1502.05477

  3. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. ArXiv. https://doi.org/10.48550/arXiv.1312.5602

  4. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. ArXiv. https://doi.org/10.48550/arXiv.1602.01783

Abhishek Gupta

Founder and Principal Researcher, Montreal AI Ethics Institute

Director, Responsible AI, Boston Consulting Group (BCG)

Fellow, Augmented Collective Intelligence, BCG Henderson Institute

Chair, Standards Working Group, Green Software Foundation

Author, AI Ethics Brief and State of AI Ethics Report

https://www.linkedin.com/in/abhishekguptamcgill/