Quick Takes on LLM General Reasoning

Are LLMs general reasoners? (Not just LLMs: you can substitute foundation models or general-purpose transformers for LLMs.) If you define “general reasoning” as “thinking carefully and generalising that thinking to novel domains”, then yes, I think current LLMs are definitely capable of it. If you mean any novel domain that humans can generalise to, I lean towards yes for more advanced LLMs. If you mean generalisation to literally everything ever, then no, but humans are not general reasoners either....

January 3, 2025 · Jord Nguyen

RLHF/RLAIF: An Explainer

(This explainer post was made for a capstone project in HAISN’s 2nd AI Safety Intro cohort. Congrats to BotJP and Pauloda for finishing the project!) Introduction Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) are important techniques in machine learning. Much like pupils who learn from their teachers versus those who learn from other pupils, RLHF uses human feedback to align AI behavior with human preferences and values, while RLAIF (introduced by Anthropic through their “Constitutional AI” work) relies on AI-generated feedback to achieve similar goals....
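To make the contrast concrete, here is a minimal sketch (in PyTorch) of the reward-modelling step that both techniques share: a model is trained to score a preferred response above a rejected one, where the preference label comes from a human annotator in RLHF or from an AI judge in RLAIF. Everything here (the `RewardModel` class, dimensions, and the toy random batch) is illustrative, not the explainer’s actual code.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a response representation to a scalar reward."""

    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch of preference pairs. In practice these would be pooled
# transformer states for full responses; the label (chosen vs. rejected)
# comes from a human in RLHF, or from an AI judge in RLAIF.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry style pairwise loss: push the reward of the preferred
# response above the rejected one. A policy is then optimised against
# this learned reward (e.g. with PPO) in the full RLHF/RLAIF pipeline.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```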

December 26, 2024 · Nguyen The 'BotJP' Khai, Ho Phuc 'Pauloda' Giap

Why AI Metrics Are Misleading

What is a benchmark? In the world of AI, we often find it helpful to talk about how well a model performs a task. You might, for example, hear a friend gush over how much better GPT-4 is than GPT-3 because its answers are so much more accurate and coherent. In an attempt to assess model performance rigorously, researchers have devised a more systematic approach called “benchmarking”. Benchmarks help us evaluate how good our AI models are, for example:...

September 12, 2024 · Khiem, Zed, Nam, NotAd, Tost, Jord Nguyen

Can Language Models Determine Where You Live with Just a Single Photo?

TL;DR Members of HAISN participated in a research sprint where they tried to assess AI models’ ability to infer location from single images. They found that, depending on what clues an image contains, GPT-4o can land within a few hundred meters of where you are, or it can be a few continents off. Introduction We all know that AI models are surprisingly good at conversing, summarizing essays, writing code [7] [9] [13], passing the Turing test [10] and explaining complex topics in the style of Jerry Seinfeld [15]....

August 28, 2024 · Le 'Qronox' Lam, Aleksandr Popov, Jord Nguyen, Trung Dung 'mogu' Hoang, Marcel M, Felix Michalak

rAInboltBench: Benchmarking user location inference through single images

Abstract This paper introduces rAInboltBench, a comprehensive benchmark designed to evaluate the capability of multimodal AI models in inferring user locations from single images. The increasing proficiency of large language models with vision capabilities has raised concerns regarding privacy and user security. Our benchmark addresses these concerns by analysing the performance of state-of-the-art models, such as GPT-4o, in deducing geographical coordinates from visual inputs. By Le “Qronox” Lam, Aleksandr Popov, Jord Nguyen, Trung Dung “mogu” Hoang, Marcel M, Felix Michalak...

May 27, 2024

Benchmarking Dark Patterns in LLMs

Abstract This paper builds upon the research in Seemingly Human: Dark Patterns in ChatGPT (Park et al., 2024) by introducing a new benchmark of 392 questions designed to elicit dark-pattern behaviours in language models. We ran this benchmark on GPT-4 Turbo and Claude 3 Sonnet, and had them self-evaluate and cross-evaluate the responses. By Jord Nguyen, Akash Kundu, Sami Jawhar. Read the paper here

May 27, 2024