We played TensorTrust again, joined the Discord server for the game, and tried out basic to less basic strategies in an attempt to understand how a prompt injection might work (1 hour)
We created a docs of strategies that ended up being successful, some more obscure than others.
We went through week 1 + 2 in BlueDot Impact’s AI Safety Fundamentals Course (30 mins)
Potential of AI:
We deconstruct some arguments for why AGI might not be possible by making these ground assumptions:
AGI is something the world is trying to build
AGI doesn’t need to be sentient, just sufficiently intelligent
Intelligence is probably not a transcendental metaphysical property only humans have
Humans are probably not the most intelligent thing something in the universe can be
We talked about how AI could (and in many ways already can) have massive sway over our workspace, politics, and public opinion
Intro to Alignment:
We talked about outer and inner misalignment is
We also discussed some misconceptions that may make it difficult to talk about misalignment:
Anthropomorphisation: AI is not necessarily sentient, and talking about it with respect to human feelings is counterproductive.
The Terminator Effect: A rogue AI does not need a physical “body” to cause harm to our world, physical harm included.
Appeal to fiction: Probably the hardest part of beginner alignment discussion. The only thing we can base AGI expectations on is sci-fi, so people tend to have some unrealistic expectations.
Resources for learning/funding/career development#
We shared some resources and opportunities for further career development.
Open Philanthropy funding for career development
Global Challenges Project X-risk workshops
AI safety training / research programmes like ARENA, SPAR, MATS