Session 3: Potential of AI & Introduction to AI Alignment

mogu_apartment

TensorTrust Brainstorming

We played TensorTrust again, joined the Discord server for the game, and tried out basic to less basic strategies in an attempt to understand how a prompt injection might work (1 hour)
We created a docs of strategies that ended up being successful, some more obscure than others.

We went through week 1 + 2 in BlueDot Impact’s AI Safety Fundamentals Course (30 mins)
Potential of AI:
- We deconstruct some arguments for why AGI might not be possible by making these ground assumptions:
  - AGI is something the world is trying to build
  - AGI doesn’t need to be sentient, just sufficiently intelligent
  - Intelligence is probably not a transcendental metaphysical property only humans have
  - Humans are probably not the most intelligent thing something in the universe can be
- We talked about how AI could (and in many ways already can) have massive sway over our workspace, politics, and public opinion
Intro to Alignment:
- We talked about outer and inner misalignment is
- We also discussed some misconceptions that may make it difficult to talk about misalignment:
  - Anthropomorphisation: AI is not necessarily sentient, and talking about it with respect to human feelings is counterproductive.
  - The Terminator Effect: A rogue AI does not need a physical “body” to cause harm to our world, physical harm included.
  - Appeal to fiction: Probably the hardest part of beginner alignment discussion. The only thing we can base AGI expectations on is sci-fi, so people tend to have some unrealistic expectations.

We shared some resources and opportunities for further career development.
- Open Philanthropy funding for career development
- Global Challenges Project X-risk workshops
- AI safety training / research programmes like ARENA, SPAR, MATS
- Rationality camps