Reinforcement Learning from Human Feedback: A Systematic Review
Journal of Artificial Intelligence Research • Vol. 12, No. 4
Abstract
This systematic review examines 147 studies on reinforcement learning from human feedback (RLHF) published between 2017 and 2024. We identify key methodological trends and open challenges in reward modeling, and propose a unified taxonomy for RLHF evaluation protocols. The review highlights convergence on Constitutional AI and debate-based approaches for scalable oversight.