Quick Summary
- Research Findings: AI systems struggle with basic tasks like reading an analogue clock and determining calendar dates. Success rates were low: 38.7% for clocks and 26.3% for calendars.
- Study Details: The findings were presented at the 2025 International Conference on Learning Representations (ICLR) and published on arXiv, though they have not yet been peer-reviewed.
- Testing Parameters: Custom datasets of clock and calendar images were tested across various multimodal large language models (MLLMs), including Meta’s Llama 3.2-Vision, Anthropic’s Claude-3.5 Sonnet, Google’s Gemini 2.0, and OpenAI’s GPT-4o.
- Challenges for AI: Tasks requiring spatial reasoning, such as detecting overlapping hands on clocks or handling diverse clock designs, proved difficult for AI models, in contrast to humans, who handle them almost intuitively.
- Calendar Issues: Although their training data includes examples explaining leap years, the models fail abstract reasoning tasks such as working out the date of a given day of the year (“What day will the 153rd day of the year be?”).
- Reasoning Gap: Unlike traditional computers, which run mathematical algorithms consistently, large language models predict outputs based on patterns rather than explicit rules or logical processes.
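To make the contrast in the last two points concrete, here is a minimal Python sketch of the rule-based approach to the "153rd day of the year" question. It is only an illustration and is not taken from the study. The function name `nth_day_of_year` is hypothetical, and the years 2024 and 2025 are assumptions, since the quoted question does not specify a year.

```python
import calendar
from datetime import date, timedelta


def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day (1-based) of the given year."""
    # Explicit leap-year rule: 366 days in a leap year, otherwise 365.
    days_in_year = 366 if calendar.isleap(year) else 365
    if not 1 <= n <= days_in_year:
        raise ValueError(f"{year} has only {days_in_year} days, got n={n}")
    # Day 1 is January 1, so step forward n - 1 days.
    return date(year, 1, 1) + timedelta(days=n - 1)


# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumed example years: one leap year and one common year.
for year in (2024, 2025):
    d = nth_day_of_year(year, 153)
    print(year, d.isoformat(), WEEKDAYS[d.weekday()])
# 2024 2024-06-01 Saturday
# 2025 2025-06-02 Monday
```

The same input always produces the same answer because the leap-year rule is applied explicitly, which is the consistency the summary attributes to traditional computing and finds lacking in pattern-based prediction.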
Indian Opinion Analysis
The research underscores a crucial limitation in current AI systems: they cannot reliably perform tasks involving spatial logic or calculated reasoning, despite advances in natural language processing and in other domains such as image generation and coding assistance. For India, a country increasingly focused on automation technologies, the study suggests prioritizing robust testing frameworks before integrating such MLLMs into sensitive sectors such as transportation scheduling or assistive tools.
India could leverage these insights by fostering interdisciplinary research collaborations between developers of machine learning algorithms and human factors specialists who understand how spatial cognition works in humans versus machines. Moreover, targeted training datasets adapted to culturally specific experiences (e.g., unique regional calendar designs) could improve accuracy, while grassroots education about responsible use could help keep public trust intact.
Read more: AI Models Can’t Tell Time Or Read A Calendar – Study Reveals