Session 4: Intro to RLHF and Forecasting
Intro to RLHF

We watched this video from Rational Animations as introductory material. It covers a failed attempt at RLHF during OpenAI’s training of GPT-2, caused by a bug in the reward function. We then went over the materials in Week 3 of BlueDot Impact’s AI Safety Fundamentals Course and discussed what RLHF is and some problems with the technique, both fundamental and tractable.

A Walkthrough in Forecasting

Motivation: AI Safety is concerned with a breakdown of society due to A(G)I, but as of 17/3/2024, neither of these has ever happened in human history....