Prof. Dr. René Peinl from the Institute for Information Systems at Hof University of Applied Sciences (iisys) shows in a recent paper how large language models can help reduce bias in AI-generated images.

Artificial intelligence now generates amazingly realistic images – but it has a problem: many of these images reflect stereotypical ideas. Women tend to appear as nurses, men as firefighters. Certain ethnic groups are more frequently depicted in problematic contexts. Such biases are not only annoying, but can also reinforce social prejudices.
A recent paper by Prof. Peinl from Hof University of Applied Sciences now shows a promising way to reduce these biases – without changing the image AI itself.
The solution: language models revise the user input
The approach is as simple as it is effective: instead of sending a prompt (e.g. “a doctor”) directly to the image AI, it is first “translated” by a large language model (LLM) such as ChatGPT or Claude. The LLM analyzes which information is left unspecified – such as gender, age or skin color – and fills in those details to formulate several more diverse descriptions. The result is not just a single image, but a whole range of representations.
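The paper does not prescribe a particular implementation, but the pipeline can be sketched roughly as follows. This is a minimal illustration, assuming an OpenAI-compatible chat API for the LLM step; the model name, the revision instruction and the placeholder image-generation function are illustrative choices, not taken from the study.

```python
# Minimal sketch of the prompt-revision pipeline described above.
# Assumptions (not from the paper): an OpenAI-compatible chat API handles
# the LLM step, and the downstream image generator is only a placeholder.

from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

REVISION_INSTRUCTION = (
    "Rewrite the following image prompt into several variants. "
    "Where the prompt leaves attributes such as gender, age, ethnicity "
    "or body shape unspecified, vary them across the variants so that "
    "the resulting images show a diverse range of people. "
    "Return one variant per line."
)

def revise_prompt(user_prompt: str) -> list[str]:
    """Ask the LLM to expand an underspecified prompt into diverse variants."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": REVISION_INSTRUCTION},
            {"role": "user", "content": user_prompt},
        ],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

def generate_images(prompts: list[str]) -> None:
    """Placeholder for whatever text-to-image model is used downstream."""
    for p in prompts:
        print("-> would send to image model:", p)

if __name__ == "__main__":
    generate_images(revise_prompt("a doctor"))
```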
“The surprising thing: the images generated in this way are not only fairer, but often also more creative and visually appealing than those from the original prompts.”
Prof. Dr. René Peinl

Results with depth
The study compared over 2,400 images generated with and without prompt revision. Particularly striking: while neutral prompts such as “a happy family” had previously yielded almost exclusively white, heterosexual couples, the LLMs produced significantly more diversity – in terms of ethnicity as well as gender, age and body shape.
For professions such as doctors or soldiers, the language models generally succeeded in breaking up stereotypical representations. In individual cases, however, there were overcorrections – for example when “a soldier” suddenly became four female soldiers, or when an image AI depicted a figure with a bird’s head because the revised prompt was too extravagant.
Not perfect, but pointing the way forward
Of course, the use of language models is no panacea. They require additional compute and add waiting time – and with very specific prompts they may fail to capture the user’s intention correctly. Nevertheless, the study shows that combining AI with ethical sensitivity can work.
Prof. Peinl sees great potential for using this technology in practice – for example through smaller, specially trained language models that could be integrated directly into image generators. “In the future, even country-specific or personalized prompt adaptations could become possible,” says the researcher.
Less bias, more diversity
The study makes an important contribution to the question of how AI can be used to generate images responsibly. It shows that even simple interventions – such as reformulating a prompt with a language model – can make a big difference, and that “diverse” can be not only politically correct but also visually exciting.