Session 4: Intro to RHLF and Forecasting

Intro to RLHF We watched this video as introductory material from Rational Animations about failure trying to do RLHF in OpenAI’s training of GPT2, due to a bug in its reward function. We went over the materials in Week 3 of BlueDot Impact’s AI Safety Fundamentals Course. In it, we discussed what RLHF is and some problems with the technique, both fundamental and tractable. A Walkthrough in Forecasting Motivation: AI Safety is concerned with a breakdown of society due to A(G)I, but as of 17/3/2024, neither instances of these have ever happened in human history....

4/4/2024

Session 3.5: Guest Speaker AMA

The Q&A details can be found here: session 3 qna We asked Jason from Apart Research a few questions we have on AI Safety research and career development, and he was very patient and understanding of us. Very cool!

4/4/2024

Session 3: Potential of AI & Introduction to AI Alignment

TensorTrust Brainstorming We played TensorTrust again, joined the Discord server for the game, and tried out basic to less basic strategies in an attempt to understand how a prompt injection might work (1 hour) We created a docs of strategies that ended up being successful, some more obscure than others. Potential of AI & Introduction to AI Alignment We went through week 1 + 2 in BlueDot Impact’s AI Safety Fundamentals Course (30 mins) Potential of AI: We deconstruct some arguments for why AGI might not be possible by making these ground assumptions: AGI is something the world is trying to build AGI doesn’t need to be sentient, just sufficiently intelligent Intelligence is probably not a transcendental metaphysical property only humans have Humans are probably not the most intelligent thing something in the universe can be We talked about how AI could (and in many ways already can) have massive sway over our workspace, politics, and public opinion Intro to Alignment: We talked about outer and inner misalignment is We also discussed some misconceptions that may make it difficult to talk about misalignment: Anthropomorphisation: AI is not necessarily sentient, and talking about it with respect to human feelings is counterproductive....

4/4/2024

Session 2: Risks from Learnt Optimization

Mesa-optimization and inner misalignment We read and discussed this sequence / paper live (3 hours) The really high-level gist of it was that: In a machine learning setup, the gradient descent process (base optimizer) finds a model that does a task (base objective) well Base optimizer finds in the sea of options and possibilities an algorithm (model) it “likes” and uses it to solve its problem. The model could be doing optimization itself....

4/4/2024

Session 1: Introduction to Machine Learning and AI safety

Goals and benefits of and what even is UAAR Understanding AI, the problem of AI safety, and its importance Make new friends Network with people in the field Funding/resources members are eligible to receive: including but not limited to books, compute resources for projects, free drinks and snacks at meetings, etc. Opportunities members are opened up to: research fellowships, part-time and full-time work opportunities, funding for projects and career development. Making real, visible progress in AI safety & getting work experience (by participating in research sprints, other hackathon, doing self research, writing papers, and more) And always remember to have fun :) ML background knowledge We recapped a bit of the two 3B1B videos on Neural Networks and Gradient Descent....

2/4/2024