Today, artificial intelligence can describe images, recognize objects, and explain complex relationships. Progress has been enormous: so-called vision-language models (VLMs) combine text and image understanding in impressive ways. Yet they stumble at a seemingly simple task: counting. Researchers at the Institute for Information Systems (iisys) at Hof University of Applied Sciences now aim to address this issue.

“Many of the current models are very good at recognizing what is visible in an image, but they cannot reliably determine how many objects there are,” explains Prof. Dr. René Peinl of iisys. Errors become increasingly frequent when an image contains more than four or five objects of a single type.
Why counting is so difficult for AI
The problem runs deeper than it appears at first glance. Humans grasp small quantities intuitively but have to actively count larger ones. It is precisely this counting step that many AI models lack. In addition, existing training data is often unsuitable.
“Some datasets are too simple and only promote pattern recognition—others are too complex or flawed, for example due to hidden objects or unclear questions.”
Prof. Dr. René Peinl, Institute Director
The result: models “guess” or fall back on learned expectations—sometimes with surprisingly incorrect results.
The solution from Hof: An artificial dataset
To address this problem directly, iisys developed the SITUATE dataset. Instead of relying on real photos, the researchers generate artificial 3D scenes with clearly defined properties. “We wanted to create an environment where we can precisely control what happens in the image, and what doesn’t,” says Prof. Dr. René Peinl. The scenes contain geometric objects such as cubes, spheres, or cylinders, placed at clearly defined positions (e.g., “to the left of the table”). This makes it possible to ask targeted questions about the color, number, or location of the objects. The result is a training environment that does not depend on chance but trains specific skills.
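The idea of a fully controlled scene can be illustrated with a short Python sketch. It is not the SITUATE pipeline, which the article does not describe in detail. The vocabularies, the `SceneObject` class, and the functions `generate_scene` and `counting_question` are hypothetical names chosen for illustration, and the rendering of an actual 3D image is left out. The sketch only shows why the correct answer to a counting question is known by construction.

```python
import random
from dataclasses import dataclass

# Illustrative vocabularies; the values used in the real dataset are assumptions here.
SHAPES = ["cube", "sphere", "cylinder"]
COLORS = ["red", "green", "blue"]
RELATIONS = ["on the table", "to the left of the table", "to the right of the table"]


@dataclass(frozen=True)
class SceneObject:
    shape: str
    color: str
    relation: str  # position relative to a reference object


def generate_scene(num_objects: int, rng: random.Random) -> list[SceneObject]:
    """Sample a scene specification; every property is known by construction."""
    return [
        SceneObject(rng.choice(SHAPES), rng.choice(COLORS), rng.choice(RELATIONS))
        for _ in range(num_objects)
    ]


def counting_question(scene: list[SceneObject], shape: str) -> tuple[str, int]:
    """Build a counting question whose answer follows from the scene specification."""
    answer = sum(1 for obj in scene if obj.shape == shape)
    return f"How many {shape}s are in the image?", answer


rng = random.Random(0)  # fixed seed so the scene is reproducible
scene = generate_scene(7, rng)
question, answer = counting_question(scene, "cube")
```

Because the question is derived from the same specification that would drive the renderer, there are no hidden objects or ambiguous labels unless they are added on purpose.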
Learning through structure rather than chance
A distinctive aspect of the project is the way the AI learns to count. In addition to short answers, the project also uses detailed explanations in which the AI describes step by step what it sees and how it counts. An example: “There are two objects on the table, three next to it, five in total.” This so-called “chain-of-thought” approach is effective, at least with larger numbers.
“We see that this structured approach significantly improves the models’ performance when it comes to more complex counting tasks.”
Prof. Dr. René Peinl
However, this method also has its limits: with small numbers, the AI tends to “invent” additional objects so that its answer stays consistent with its own line of reasoning.
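The step-by-step answer format described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the format actually used in SITUATE: the function name `chain_of_thought_answer`, the location labels, and the exact wording of the generated sentence are all hypothetical.

```python
from collections import Counter


def chain_of_thought_answer(locations: list[str]) -> tuple[str, int]:
    """Count objects per location first, then sum the partial counts."""
    per_location = Counter(locations)  # keeps locations in first-seen order
    steps = [f"{count} {place}" for place, count in per_location.items()]
    total = sum(per_location.values())
    reasoning = "There are " + ", ".join(steps) + f": {total} in total."
    return reasoning, total


# Hypothetical scene: one entry per object, naming where it sits.
locations = [
    "on the table", "on the table",
    "next to the table", "next to the table", "next to the table",
]
reasoning, total = chain_of_thought_answer(locations)
print(reasoning)  # There are 2 on the table, 3 next to the table: 5 in total.
```

Splitting the count into partial sums mirrors the example sentence in the text: each group is small enough to grasp at a glance, and the total follows from simple addition.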
Better results—and new insights
The experiments clearly show that AI models trained with SITUATE generalize better. “A combination of different datasets yields the best results in the test series. However, we see that the type of training strongly influences how the AI thinks. What’s particularly exciting is that the models exhibit behavioral patterns reminiscent of humans. Small quantities are quickly grasped, while larger ones require structured strategies,” says Prof. Peinl. At the same time, it becomes clear that AI often does not develop a “true” concept of numbers but instead learns visual patterns.

Implications for the future of AI
The research from Hof also shows that advances in artificial intelligence depend not only on ever-larger models but above all on better data and well-thought-out training methods. “Our dataset shows that one can specifically address the weaknesses of the models and that synthetic, i.e., computer-generated, data is not automatically bad,” emphasizes Peinl.

A building block for more reliable AI systems
Whether in industry, medicine, or logistics, many applications rely on AI not only to recognize objects but also to count them precisely and interpret them correctly. With SITUATE, iisys at Hof University of Applied Sciences is making an important contribution to improving precisely these capabilities. Following the success of the first test, a second, significantly more diverse dataset is currently being developed, which will enable the learning of even more sophisticated counting strategies.