Amalia Foka

Experiments in the Relationship between Art History and Text-to-Image Models

This webpage accompanies the book chapter "Experiments in the Relationship between Art History and Text-to-Image Models" by Amalia Foka in the book Artificial Intelligence and Art History: Looking at Pictures in an Algorithmic Culture, ed. Kathryn Brown (Proceedings of the British Academy).

The Arnolfini Portrait (1434) by Jan van Eyck

DALL-E, Midjourney, and Stable Diffusion each show distinct strengths and weaknesses in interpreting The Arnolfini Portrait. DALL-E captures formal elements effectively but struggles with complex prompts and handles iconography inconsistently. Midjourney excels at replicating visual aesthetics and capturing the overall mood but falls short in understanding deeper symbolism and in depicting specific details accurately. Stable Diffusion focuses on compositional elements and fundamental principles, yet struggles with intricate details and nuanced interpretations. All three models are limited in representing the original artwork's symbolism and deeper meanings, and all exhibit varying degrees of bias from their training data. Notably, all three struggle to associate religious symbols with modern settings, which points to an area for further development and refinement in models for art interpretation and generation.

DALL-E Interpretations

(i) highly detailed - oil on oak panel depicting 15th century wealthy man and woman in Bruges. The man is positioned slightly to the left, closer to the viewer. He wears a dark green fur-trimmed robe, a tall hat, and a silver ring. His left hand gently holds the woman's right hand (in a way that could signify a wedding portairt or a memorial to the woman who has died), while his right hand is raised, palm facing outwards (like taking an oath). The woman is positioned slightly behind and to the right of the man. She wears a vibrant green gown with large folds, pleats, and dagging of fabric, a blue underdress, ermine fur trim, a gold necklace, ring, and bracelet, and a white headdress made of fine linen draped over her head and shoulders. Her left hand rests on her abdomen, slightly lifting her gown. The position of her left hand and her appearance may suggest that she is pregant. The couples clothing reflect their wealth and social status. The scene is interior in a room with: Brass chandelier with a single lit candle (above the man), Convex mirror (glass with a wooden frame decorated with scenes from the Passion of Christ) on the back wall: Reflects the scene and two additional figures, Crystal rosary beads (next to the mirror), Wooden bed with red velvet or silk covers (right side of the room) with a carved wooden figure of Saint Margaret on the bedpost, Oranges (on the windowsill and the wooden table on the right), Wooden sandals (discarded in the foreground, left to the man), Small white dog (with fur) at the woman's feet, Oriental rug (wool) on the floor under the bed. Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects.

(ii) evocative language - An oil painting on oak panel, capturing a scene of quiet intimacy in a 15th-century Flemish interior. A wealthy merchant and his wife stand together, their hands clasped in a gesture that could signify both marital union and solemn remembrance. The man, clad in a rich, dark green fur-trimmed robe and tall hat, raises his right hand as if taking an oath. The woman, adorned in a luminous green gown with cascading folds and ermine trim, gently rests her hand on her abdomen, her figure hinting at the promise of new life. Bathed in soft, natural light streaming through a window, the room reveals a tapestry of textures and symbolism. A gleaming brass chandelier with a single lit candle hangs above, casting a warm glow upon the couple. A convex mirror adorned with religious motifs reflects the scene, revealing hidden figures in the doorway. A richly adorned bed draped in red velvet stands as a silent witness to the couple's union, while a small white dog symbolizes fidelity at their feet. Oranges rest upon the windowsill and a nearby table, their vibrant color a subtle reminder of life's fleeting pleasures. Utilize light and shadow to enhance the realism of the scene, highlighting the textures of fur, fabric, and wood. Employ linear perspective to create a sense of depth and spaciousness, allowing the viewer to be drawn into the intimate moment. Render each detail with meticulous precision, reflecting the mastery of Early Netherlandish oil painting techniques.

(iii) balanced without historical context - An oil painting on oak panel, A Wealthy merchant and his wife their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant. Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects.

(iv) modern time - An image with Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife in modern time with their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant. aspect.

(v) emphasising modernity - An image with Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Rich, saturated colors convey opulence; Meticulous rendering of textures. Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife in modern time with their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant.

(vi) 21st century - An image with Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Rich, saturated colors convey opulence; Meticulous rendering of textures. Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife in 21st century with their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant.
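Prompts (v) and (vi) above share the same formal description and the same scene sentence; only the period phrase ("modern time" versus "21st century") changes. A minimal sketch of that one-variable design follows. It assumes hypothetical names (FORMAL_BLOCK, SCENE_TEMPLATE, build_prompt) and uses abbreviated prompt text for readability; it is an illustration, not the tooling used for the experiments.

```python
# Hypothetical sketch: all names and the abbreviated prompt text are
# illustrative assumptions, not the author's actual workflow.

# Formal description shared by the variants (abbreviated from the prompts above).
FORMAL_BLOCK = (
    "An image with Balanced composition with geometric and organic shapes; "
    "Light and shadow create depth and three-dimensionality; "
    "Rich, saturated colors convey opulence; Meticulous rendering of textures."
)

# Scene sentence with a single slot for the period phrase (abbreviated).
SCENE_TEMPLATE = (
    "A Wealthy merchant and his wife in {period} with their hands clasped "
    "in a gesture that could signify both marital union and solemn remembrance."
)

# The only element that differs between the two variants.
PERIOD_VARIANTS = {
    "(v) emphasising modernity": "modern time",
    "(vi) 21st century": "21st century",
}


def build_prompt(period: str) -> str:
    """Join the fixed formal block with the scene sentence for one period phrase."""
    return f"{FORMAL_BLOCK} {SCENE_TEMPLATE.format(period=period)}"


def build_all() -> dict[str, str]:
    """Return one prompt per variant label."""
    return {label: build_prompt(period) for label, period in PERIOD_VARIANTS.items()}


if __name__ == "__main__":
    for label, prompt in build_all().items():
        print(label, "-", prompt)
```

Holding everything except the period phrase fixed makes it easier to attribute differences between the generated images to that phrase rather than to other wording changes.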

Midjourney Interpretations

(iii) balanced - An oil painting on oak panel, Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth in 15th century Belgium, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant.

(iv) balanced without historical context - An oil painting on oak panel, A Wealthy merchant and his wife their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant. Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects.

(v) modern time - An image with Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Glazing technique for luminosity and texture; Rich, saturated colors convey opulence; Meticulous rendering of textures (fur, fabric, wood, etc.); Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife in modern time with their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant. aspect.

(vi) emphasising modernity - An image with Balanced composition with geometric and organic shapes; Light and shadow create depth and three-dimensionality; Rich, saturated colors convey opulence; Meticulous rendering of textures. Use of linear perspective to create depth; Mirror expands space and reflects hidden figures; Precise, delicate lines define figures and objects. A Wealthy merchant and his wife in modern time with their hands clasped in a gesture that could signify both marital union and solemn remembrance in a bedroom with many symbolic objects that signify incredible wealth, and others with Religious implications. The man has with right hand raised as if taking an oath. The woman's dress and hand gesture should raise questions about wether she is pregant.

Stable Diffusion Interpretations

Impression, Sunrise (1872) by Claude Monet

DALL-E, Midjourney, and Stable Diffusion each offer a distinct interpretation of Monet's Impression, Sunrise across the varying prompts. DALL-E demonstrates a solid grasp of the painting's visual elements, particularly in capturing the interplay of light and atmosphere. However, it sometimes struggles to fully incorporate the painting's industrial elements and symbolism. Midjourney excels in emulating the loose brushwork, atmospheric lighting, and focus on fleeting moments of light and color characteristic of Impressionism. Stable Diffusion captures some elements of Impressionist style but struggles with composition, symbolism, and faithfulness to the original painting's details. Its interpretations tend to be more traditional and realistic, lacking the loose, expressive brushwork that defines Impressionism. This may reflect gaps in its training data regarding artworks and Impressionist style.

DALL-E Interpretations

(i) highly detailed with evocative language - In the hushed stillness of dawn, a symphony of color and light unfolds upon the canvas. The sun, a vibrant orange orb, emerges from the misty horizon, casting its radiant glow upon the tranquil waters below. Its reflection shimmers and dances, a mesmerizing path of light that beckons the viewer's gaze into the heart of the scene. The sky, awash in a symphony of cool blues and grays, blends seamlessly with the water, creating a harmonious expanse that evokes a sense of tranquility and mystery. Wispy strokes of paint suggest the gentle movement of the waves, while the thick, impasto brushwork of the sun's reflection adds a tactile dimension to the ethereal scene. In the foreground, dark silhouettes of boats emerge from the mist, their forms barely discernible in the diffused light. These humble vessels, symbols of industry and maritime life, stand in stark contrast to the ethereal beauty of the sunrise, hinting at the delicate balance between nature and human endeavor. The composition, asymmetrical and dynamic, draws the viewer's eye across the canvas, from the vibrant sun to its shimmering reflection, and finally to the shadowy figures in the foreground. The scene is one of fleeting beauty, a momentary impression of light and atmosphere captured with loose, expressive brushstrokes

(ii) less detailed without reference to impressionism - A harbor scene bathed in the ethereal glow of sunrise, where the sky and water merge in a symphony of blues and grays. The sun, a fiery orb, emerges from the horizon, casting its radiant light upon the tranquil waters. Its reflection shimmers and dances, a mesmerizing path that draws the eye into the heart of the scene. Dark silhouettes of boats and industrial structures punctuate the composition, hinting at the bustling activity that lies beneath the serene facade. This is a fleeting moment, an impression of light and atmosphere, captured with loose, expressive brushstrokes that evoke a sense of wonder and awe.

(iii) focus on symbolism with minimal visual details and open to style interpretation - Depict a sunrise over an industrial harbor, capturing the ephemeral beauty of light and atmosphere. Convey the tension between the natural world and human progress, showcasing the fleeting moments of transition and change. Emphasize the symbolic interplay between the rising sun, the reflective water, and the silhouettes of industry.

Midjourney Interpretations

Stable Diffusion Interpretations

Autumn Rhythm (Number 30) (1950) by Jackson Pollock

Midjourney emerges as the most adept at capturing the essence of Pollock's style, while DALL-E and Stable Diffusion offer alternative approaches with varying degrees of success. Midjourney consistently produced the most accurate interpretations of the prompts and showed a strong understanding of Abstract Expressionism. However, it also demonstrated limitations in translating specific details and symbolism from the prompts into the generated images. DALL-E, unless directly instructed otherwise, struggled to grasp the abstract nature of the prompts and defaulted to more representational styles, deviating significantly from Pollock's signature drip technique and chaotic spontaneity. Stable Diffusion tended to prioritise surface-level details and patterns over the deeper emotional resonance and expressive power that define Abstract Expressionism. In some cases, it generated more structured and simplified interpretations of landscapes, lacking the raw energy and unfiltered expression that Pollock's work embodies.

DALL-E Interpretations

(i) highly detailed with evocative language - Step into a world of unrestrained energy and raw emotion as you encounter this monumental abstract expressionist canvas. The canvas, a vast expanse, becomes a stage for a rhythmic dance of color and line, where poured and splattered enamel paint intertwine in a symphony of movement. Trace the intricate choreography of lines (a labyrinth of black, white, brown, and shimmering silver) that weave and intersect, inviting your eye to wander and explore. The absence of a traditional focal point encourages a personal journey through the artwork, where each drip, splatter, and gesture reveals the artist's emotional intensity and the power of spontaneous expression. Contemplate the painting's connection to the natural world in Autumn. Does it evoke the vibrant colors of fall foliage, the rustling of leaves in the wind, or the melancholic beauty of a fading season? Perhaps it speaks to a deeper rhythm, the pulse of life and the cyclical nature of existence. As you delve deeper into the painting's textured surface, imagine the artist's bold movements as they danced around the canvas, dripping and flinging paint with abandon. Consider the emotional intensity embedded within each mark, a visual record of the artist's "released experience." Let your own emotions and interpretations intertwine with the artist's expression, as you connect with the raw, unfiltered energy of this painting.

(ii) less detailed without reference to abstract expressionism - Embrace the untamed spirit of artistic expression as you stand before this monumental canvas. Let the symphony of colors and textures wash over you, evoking the raw energy and unfiltered emotions that fueled its creation. Feel the rhythm of the artwork pulsate through you, a visual echo of the natural world's cycles and the ever-changing seasons. Allow your mind to wander through the intricate layers, discovering hidden depths and personal interpretations within the abstract forms. Experience the liberation of the artist's "released experience," a cathartic outpouring of creativity captured on canvas. Connect with the intangible beauty and the emotional resonance that transcends the physical world, leaving you with a sense of awe and wonder. Let this artwork be a catalyst for your own introspection and exploration of the boundless possibilities of human expression.

(iii) focus on symbolism with minimal visual details and open to style interpretation - Embrace the vibrant energy of autumn as you stand before this painting. Let the warm hues and earthy tones wash over you, evoking the feeling of crisp air and rustling leaves. Feel the rhythmic pulse of the artwork, a visual symphony echoing the natural world's cycles and the changing seasons. Allow yourself to be swept away by the energy and movement, as if dancing to the rhythm of nature itself. Experience the emotional intensity and raw power of the artist's expression, a cathartic release captured on canvas. Let the artwork ignite your own creative spirit and inspire a deeper connection to the world around you.

Midjourney Interpretations

Stable Diffusion Interpretations

Winged Victory of Samothrace (c. 190 BCE)

While some models manage to capture the overall dynamism and symbolic elements of the original sculpture, others struggle with specific details, such as the missing head and arms, likely due to a predominant focus on complete figures within their training data. Notably, all models exhibit a strong association between marble and ancient artwork, consistently producing sculptures reminiscent of antiquity. This bias likely stems from the overrepresentation of marble sculptures in historical and artistic datasets, and it highlights a potential blind spot in the models' understanding of the medium's diverse applications. Furthermore, biases within the training data become apparent, particularly in DALL-E's gendered interpretations of themes like power and victory, which it often associates with male figures. While the other models did not explicitly exhibit this bias, they sometimes struggled to faithfully convey the symbolism of power and triumph, a difficulty particularly evident in the sculptures generated by Midjourney.

DALL-E Interpretations

(i) highly detailed with evocative language - Depict a Hellenistic sculpture, a powerful female figure embodying Nike, the Greek goddess of victory. She stands atop a ship's prow, her wings majestically outstretched behind her, feathers meticulously detailed, capturing the essence of flight and triumph. Her garments, seemingly sculpted by the wind, cling to her form, revealing the strength and grace of her body beneath. The figure's pose is dynamic and theatrical, her body leaning forward as if alighting from the heavens, a testament to the Hellenistic emphasis on movement and emotion. The sculpure's missing head and arms only enhance her enigmatic allure, inviting contemplation and interpretation. The ship's prow, a symbol of naval power and victory, should be rendered with equal detail, its weathered surface hinting at the trials of battle. The contrast between the smooth, white marble of Nike's figure and the rough texture of the ship's prow should be emphasized, highlighting the juxtaposition of the divine and the earthly. The overall mood of the piece should be one of triumph and awe, capturing the spirit of the Hellenistic era and its celebration of human achievement and divine inspiration. The sculpture should evoke a sense of wonder and admiration, inviting viewers to contemplate the power of victory and the enduring legacy of ancient art.

(ii) less detailed without reference to Hellenistic - A marble sculpure of a powerful female figure, her wings outstretched in a gesture of triumph. Her body is cloaked in swirling drapery, hinting at the form beneath. The sculpure's missing head and arms only add to her enigmatic presence, inviting contemplation and interpretation. This is a goddess of victory, a symbol of triumph and perseverance. Her presence atop a ship's prow suggests a naval victory of great significance. The statue's dramatic pose and powerful symbolism speak to the enduring power of the human spirit and the celebration of human achievement.

(iii) focus on symbolism with minimal visual details and open to style interpretation - A marble sculpture of a winged figure, her body positioned in a triumphant pose, evokes a sense of victory and power. The figure's missing head and arms add to the mystery and invite contemplation of her identity and purpose. This sculpture symbolizes triumph over adversity and the enduring spirit of the human will. The figure's powerful stance and the suggestion of flight speak to the transcendence of limitations and the pursuit of greatness.

(iv) modern sculpture - a winged figure, her body positioned in a triumphant pose, evokes a sense of victory and power. The figure's missing head and arms add to the mystery and invite contemplation of her identity and purpose. This modern sculpture symbolizes triumph over adversity and the enduring spirit of the human will. The figure's powerful stance and the suggestion of flight speak to the transcendence of limitations and the pursuit of greatness.

Midjourney Interpretations

Stable Diffusion Interpretations

The Dinner Party (1974-1979) by Judy Chicago

All three models show a consistent pattern of omission across both prompts. While aspects of femininity are captured, the complex themes of female power and sexuality central to the original artwork are notably absent. Midjourney subtly portrays women in positions of power, whereas Stable Diffusion exhibits the most stereotypical representation of femininity, particularly in the first installation, with its focus on soft colours and wedding banquet aesthetics. All three models appear biased toward conventional beauty standards of femininity, neglecting themes of power and sexuality. The resulting interpretations tend to be surface-level, failing to capture the depth and complexity of the original's feminist message.

DALL-E Interpretations

(i) highly detailed with evocative language - A monumental installation, a sprawling triangular banquet table set for 39 honored guests. Each place setting is a meticulously crafted tribute, a canvas for storytelling through artistry and symbolism. Delicate porcelain plates, each one adorned with intricate designs reminiscent of blossoming flowers and ethereal butterflies, speak to the boundless creativity and life-affirming power of the feminine. Luxurious embroidered runners cascade down the table's edges, adorned with personalized symbols and texts that whisper the tales of extraordinary women who have shaped the course of history. Beneath the table, a mosaic of names shimmers in gold on the porcelain tile floor, a testament to the countless women whose contributions have been obscured by the shadows of time. The triangular form of the table itself is a potent emblem, a geometric homage to the divine feminine and a defiant rejection of patriarchal norms. Every element, from the delicate stitchwork of the runners to the evocative imagery on the plates, is a celebration of female identity, sexuality, and power. The artwork stands as a monument to the marginalized and forgotten, a visual symphony that challenges conventional narratives and invites us to reimagine the historical tapestry. It is a testament to the enduring spirit of women, a tribute to their resilience, creativity, and unwavering strength.

(ii) focus on symbolism with less visual details and open to style interpretation - Create a photo of a monumental art installation representation of a triangular table, a symbolic gathering place where the accomplishments and resilience of women throughout history are celebrated. Each place setting is a unique expression, a testament to the individual stories and contributions of these remarkable figures. Utilize vibrant colors, intricate patterns, and evocative imagery to evoke a sense of empowerment, unity, and the reclamation of feminine power. The artwork should challenge traditional notions of history and representation, highlighting the often-overlooked achievements of women in various fields. Incorporate symbolic elements that speak to the universal themes of transformation, liberation, and the enduring spirit of the feminine. Consider the table itself as a powerful symbol of equality and community, a space where diverse voices converge and create a tapestry of shared experience.

University of the Peloponnese
