Reinforcement Learning from Human Feedback: A Systematic Review

Jiyeon Kim*, Wei Chen

Review Article · 2023 · Open Access · Peer Reviewed

Abstract

This systematic review examines 147 studies on reinforcement learning from human feedback (RLHF) published between 2017 and 2023. We identify key methodological trends and open challenges in reward modeling, and we propose a unified taxonomy for RLHF evaluation protocols. The review highlights convergence on Constitutional AI and debate-based approaches for scalable oversight.

Publication Information

Accepted: October 5, 2023

Author Information

Jiyeon Kim (Corresponding Author)
Affiliation: KAIST School of Computing, South Korea
Affiliation: MIT Laboratory for Artificial Intelligence

Keywords: RLHF, reinforcement learning, human feedback, reward modeling, LLM alignment
