Abstract

This paper introduces rAInboltBench, a comprehensive benchmark designed to evaluate the capability of multimodal AI models to infer user locations from single images. The increasing proficiency of large language models with vision capabilities has raised concerns regarding privacy and user security. Our benchmark addresses these concerns by analysing the performance of state-of-the-art models, such as GPT-4o, in deducing geographical coordinates from visual input.

By Le “Qronox” Lam, Aleksandr Popov, Jord Nguyen, Trung Dung “mogu” Hoang, Marcel M, Felix Michalak

Read the paper here

This project is currently in development at the Apart Lab Fellowship.