This webpage accompanies the book chapter "Experiments in the Relationship between Art History and Text-to-Image Models" by Amalia Foka in the book Artificial Intelligence and Art History: Looking at Pictures in an Algorithmic Culture, ed. Kathryn Brown (Proceedings of the British Academy).

The Arnolfini Portrait (1434) by Jan van Eyck

DALL-E, Midjourney, and Stable Diffusion all demonstrate distinct strengths and weaknesses in their interpretations of The Arnolfini Portrait. While DALL-E captures formal elements effectively, it struggles with complex prompts and with a consistent understanding of iconography. Midjourney excels at replicating visual aesthetics and capturing the overall mood but falls short in understanding deeper symbolism and accurately depicting specific details. Stable Diffusion, on the other hand, focuses on compositional elements and fundamental principles, yet struggles with intricate details and nuanced interpretations. All three models show limitations in accurately representing the original artwork's symbolism and deeper meanings, and all exhibit varying degrees of bias in their training data. Notably, all three models struggle to associate religious symbols with modern settings, which indicates a potential area for further development and refinement of the models for art interpretation and generation.

Impression, Sunrise (1872) by Claude Monet

DALL-E, Midjourney, and Stable Diffusion each offer distinct interpretations of Monet's Impression, Sunrise based on varying prompts. DALL-E demonstrates a solid grasp of the painting's visual elements, particularly in capturing the interplay of light and atmosphere. However, it sometimes struggles to fully incorporate the painting's industrial elements and symbolism. Midjourney excels in emulating the loose brushwork, atmospheric lighting, and focus on fleeting moments of light and colour characteristic of Impressionism. Stable Diffusion captures some elements of Impressionist style but struggles with composition, symbolism, and faithfulness to the original painting's details. Its interpretations tend to be more traditional and realistic, lacking the loose, expressive brushwork that defines Impressionism. This could be attributed to gaps in its training data regarding artworks and the Impressionist style.

Autumn Rhythm (Number 30) (1950) by Jackson Pollock

Midjourney emerges as the most adept at capturing the essence of Pollock's style, while DALL-E and Stable Diffusion offer alternative approaches with varying degrees of success. Midjourney consistently produced the most accurate interpretations of the prompts, showcasing a strong understanding of Abstract Expressionism. However, it also demonstrated limitations in translating specific details and symbolism from the prompts into the generated images. Unless explicitly instructed otherwise, DALL-E struggled to grasp the abstract nature of the prompts and defaulted to more representational styles, deviating significantly from Pollock's signature drip technique and chaotic spontaneity. Stable Diffusion tended to prioritise surface-level details and patterns over the deeper emotional resonance and expressive power that define Abstract Expressionism. In some cases, it generated more structured and simplified interpretations of landscapes, lacking the raw energy and unfiltered expression that Pollock's work embodies.

Winged Victory of Samothrace (c. 190 BCE)

While some models manage to capture the overall dynamism and symbolic elements of the original sculpture, others struggle with specific details, such as the missing head and arms, likely due to a predominant focus on complete figures within their training data. Notably, all models exhibit a strong association between marble and ancient artwork, consistently producing sculptures reminiscent of antiquity. This bias likely stems from the overrepresentation of marble sculptures in historical and artistic datasets, and it highlights a potential blind spot in the models' understanding of the medium's diverse applications. Furthermore, biases within the training data become apparent, particularly in DALL-E's gendered interpretations of themes like power and victory, which often associate these concepts with male figures. While the other models did not explicitly exhibit this bias, they sometimes struggled to faithfully convey the symbolism of power and triumph, a difficulty particularly evident in the sculptures generated by Midjourney.

The Dinner Party (1974-1979) by Judy Chicago

All three models show a consistent pattern of omission across both prompts. While aspects of femininity are captured, the complex themes of female power and sexuality central to the original artwork are notably absent. Midjourney subtly portrays women in positions of power, while Stable Diffusion exhibits the most stereotypical representation of femininity, particularly in the first installation, with its focus on soft colours and wedding banquet aesthetics. All three models appear to be biased towards conventional beauty standards of femininity, neglecting themes of power and sexuality. This results in interpretations that tend to be surface-level, failing to capture the depth and complexity of the original's feminist message.