Supplementary Material for "Evolving Three Dimension (3D) Abstract Art: Fitting Concepts by Language"

Colab Notebook for code: [Open in Colab] [Download]

Author: Yingtao Tian

Abstract: Computational creativity has contributed heavily to abstract art in the modern era, allowing artists to create high-quality, abstract two-dimensional (2D) art with a high level of controllability and expressibility. However, even though computational approaches have shown promising results in making concrete 3D art, computationally addressing abstract 3D art with high quality and controllability remains an open question. To fill this gap, we propose to explore computational creativity in making abstract 3D art by bridging evolution strategies (ES) and 3D rendering through customizable parameterization of scenes. We demonstrate that our approach is capable of placing semi-transparent triangles in 3D scenes that, when viewed from specified angles, render into films that look like an artist's specification expressed in natural language. This provides a new way for artists to easily express creative ideas for abstract 3D art.

Table of Contents

Evaluation 1: Our method places semi-transparent triangles in three-dimensional (3D) space using Evolution Strategies.
Evaluation 2: Our method is capable of making 3D art that follows the spatial abstract art style and looks like what humans specify in natural language text.
(There is no Evaluation 3, to keep the numbering of evaluations in line with the figures in the paper.)
Evaluation 4: Our method uses its budget of triangles in order of increasing granularity, first using triangles for the general shape and then moving towards fine-grained details.
Evaluation 5: In our method, different runs lead to equally plausible yet largely different 3D art.
Evaluation 6: In our method, the fixed transparency setting allows more global control of the scene, and the learnable one provides great flexibility in how triangles are related to the space.
Evaluation 7: Our method produces a single piece of 3D art that looks different from different angles.

Evaluation 1: Our proposed method places semi-transparent triangles in three-dimensional (3D) space using Evolution Strategies.

This corresponds to Figure 1 in the paper: Our proposed method places semi-transparent triangles in three-dimensional (3D) space using Evolution Strategies. Using Mitsuba 3, a ray-tracing-based renderer, the film rendered at each of possibly multiple cameras is compared with its corresponding user-specified text prompt using the distance between their representations embedded by CLIP. These distances, aggregated by averaging, are used as the fitness in the sense of Evolution Strategies, which optimize the parameters of the triangles to achieve better fitness.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
The prompt for all cameras/films is "Ancient Roman painting, Fourth Style, Third Style, second Style, Pompeii".
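
The fitness described above can be sketched in a few lines. This is a minimal illustration and not the code from the notebook: the embeddings are plain lists of floats standing in for CLIP outputs, cosine distance is an assumed choice (the text only says "distance"), and the names `cosine_distance` and `fitness` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (assumed metric; the text only says 'distance')."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def fitness(film_embeddings: Sequence[Vector],
            prompt_embeddings: Sequence[Vector]) -> float:
    """Negated mean distance over cameras, so that a higher value is better."""
    if len(film_embeddings) != len(prompt_embeddings):
        raise ValueError("need exactly one prompt embedding per camera film")
    distances = [cosine_distance(f, p)
                 for f, p in zip(film_embeddings, prompt_embeddings)]
    return -sum(distances) / len(distances)


# Toy stand-ins for embeddings: camera 1 matches its prompt, camera 2 is orthogonal to it.
films = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[1.0, 0.0], [1.0, 0.0]]
print(fitness(films, prompts))
```

In an actual pipeline the film embeddings would come from rendering the scene at each camera and encoding the images, and the prompt embeddings from encoding the text; both steps are left out here on purpose.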

Evaluation 2: Our method is capable of making 3D art that follows the spatial abstract art style and looks like what humans specify in natural language text.

This corresponds to Figure 2 in the paper: Several examples of the evolved 3D art produced by our method, where the evolution process places triangles inside the unit cube, visualized by a black frame, and sets the triangles' colors and transparencies, forming a spatial configuration. In each example shown here, four cameras look at the space from four sides, although this is an arbitrary choice and the number and directions of the cameras can differ. The film from each camera, which captures the rendered image, is compared with the prompt. It can be observed that our method is capable of making 3D art that follows the spatial abstract art style and looks like what humans specify in natural language text.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
The prompt for all cameras/films is "Walt Disney World".

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
The prompt for all cameras/films is "A painting of Human".

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
The prompt for all cameras/films is "A bright, vibrant, dynamic, spirited, vivid painting of a dog".

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
The prompt for all cameras/films is "A vivid, colorful bird".
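
The scene parameterization described in this evaluation (triangles inside the unit cube, each with a color and a transparency, viewed by four cameras) could be decoded from a flat parameter vector roughly as follows. Everything concrete here is an assumption made for illustration: the 13-value layout per triangle, the squashing function, the camera distance, the choice of axes, and the names `decode`, `squash` and `side_cameras` are not taken from the paper or the notebook.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]

# Assumed layout: 9 vertex coordinates + 3 color channels + 1 transparency.
PARAMS_PER_TRIANGLE = 13


class Triangle(NamedTuple):
    vertices: Tuple[Point, Point, Point]
    color: Point
    transparency: float


def squash(x: float) -> float:
    """Map any real number into [0, 1] (a logistic curve written with tanh for stability)."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def decode(params: Sequence[float]) -> List[Triangle]:
    """Turn an unconstrained flat vector into triangles that stay inside the unit cube."""
    if len(params) % PARAMS_PER_TRIANGLE != 0:
        raise ValueError("parameter count must be a multiple of PARAMS_PER_TRIANGLE")
    triangles = []
    for start in range(0, len(params), PARAMS_PER_TRIANGLE):
        v = [squash(p) for p in params[start:start + PARAMS_PER_TRIANGLE]]
        vertices = (tuple(v[0:3]), tuple(v[3:6]), tuple(v[6:9]))
        triangles.append(Triangle(vertices, tuple(v[9:12]), v[12]))
    return triangles


def side_cameras(distance: float = 2.0) -> List[Point]:
    """Four assumed camera origins around the cube center, one per side, at equal height."""
    c = 0.5
    return [(c + distance, c, c), (c, c, c + distance),
            (c - distance, c, c), (c, c, c - distance)]


scene = decode([0.0] * (2 * PARAMS_PER_TRIANGLE))  # two triangles
print(len(scene), side_cameras())
```

Squashing keeps every decoded value in a valid range no matter what the optimizer proposes, which is one common way to let an unconstrained search drive a bounded scene description.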

Evaluation 4: Our method uses its budget of triangles in order of increasing granularity, first using triangles for the general shape and then moving towards fine-grained details.

This corresponds to Figure 4 in the paper: Our method generating with the text prompt "Walt Disney World" and four cameras, using 10, 25, 50, and 100 triangles respectively. The results show that our method uses its budget of triangles in order of increasing granularity, first using triangles for the general shape and then moving towards fine-grained details.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- 10 Triangles.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- 25 Triangles.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- 50 Triangles.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- 100 Triangles.

Evaluation 5: In our method, different runs lead to equally plausible yet largely different 3D art.

This corresponds to and extends Figure 5 in the paper: Our method generating two configurations, each with two independent runs. The first configuration uses the text prompt "A bright, vibrant, dynamic, spirited, vivid painting of a dog" with four cameras, while the second uses "Walt Disney World". Different runs lead to equally plausible yet largely different 3D art. An artist could thus be "in the loop" by choosing among the variants produced by these runs.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
A bright, vibrant, dynamic, spirited, vivid painting of a dog -- First Run.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
A bright, vibrant, dynamic, spirited, vivid painting of a dog -- Second Run.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- First Run.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Walt Disney World -- Second Run.
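
Why independent runs can end up in different places is easy to see on a toy problem. The sketch below uses a simple elitist (1+λ) evolution strategy with Gaussian mutation, which is a stand-in and not necessarily the ES variant used in the paper, and a hypothetical `toy_fitness` in which every coordinate is equally good at +1 or -1, replacing the rendering and CLIP comparison.

```python
import random
from typing import Callable, List, Tuple


def toy_fitness(params: List[float]) -> float:
    """Stand-in fitness with many equally good optima (each coordinate at +1 or -1)."""
    return -sum(min((p - 1.0) ** 2, (p + 1.0) ** 2) for p in params)


def evolve(fitness: Callable[[List[float]], float], dim: int, seed: int,
           generations: int = 200, population: int = 16,
           sigma: float = 0.1) -> Tuple[List[float], float]:
    """Elitist (1+lambda) ES: keep the parent unless a mutated child scores higher."""
    rng = random.Random(seed)  # the seed is the only difference between runs
    parent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    parent_score = fitness(parent)
    for _ in range(generations):
        children = [[p + rng.gauss(0.0, sigma) for p in parent]
                    for _ in range(population)]
        scored = [(fitness(child), child) for child in children]
        best_score, best_child = max(scored, key=lambda pair: pair[0])
        if best_score > parent_score:
            parent, parent_score = best_child, best_score
    return parent, parent_score


first_run = evolve(toy_fitness, dim=8, seed=0)
second_run = evolve(toy_fitness, dim=8, seed=1)
print([round(p, 2) for p in first_run[0]], first_run[1])
print([round(p, 2) for p in second_run[0]], second_run[1])
```

Because each seed starts from a different random point and draws different mutations, the two runs return different parameter vectors, which mirrors the variants an artist could choose between.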

Evaluation 6: In our method, the fixed transparency setting allows more global control of the scene, and the learnable one provides great flexibility in how triangles are related to the space.

This corresponds to Figure 6 in the paper: Our method generating with the text prompt "Walt Disney World" under four transparency settings. We show three fixed transparencies of 0%, 50%, and 80%, compared with the default setting of learnable transparency. While the fixed transparency setting allows more global control of the scene, the learnable one provides great flexibility in how triangles are related to the space.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Fixed transparency of 0%.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Fixed transparency of 50%.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Fixed transparency of 80%.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Learnable transparency.
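
One way to picture the difference between the two settings is to look at what the optimizer is allowed to change. In the sketch below, a fixed transparency is a single global value shared by all triangles, while a learnable transparency adds one searched value per triangle. The 12-value geometry-and-color layout, the squashing function, and the names `params_per_triangle` and `transparencies` are assumptions made for illustration only.

```python
import math
from typing import List, Optional, Sequence

# Assumed layout: 9 vertex coordinates + 3 color channels per triangle.
GEOMETRY_AND_COLOR = 12


def squash(x: float) -> float:
    """Map any real number into [0, 1]."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def params_per_triangle(fixed_transparency: Optional[float]) -> int:
    """Learnable transparency costs one extra searched value per triangle."""
    return GEOMETRY_AND_COLOR + (1 if fixed_transparency is None else 0)


def transparencies(params: Sequence[float], n_triangles: int,
                   fixed_transparency: Optional[float] = None) -> List[float]:
    """Per-triangle transparency: one global constant, or read from the parameters."""
    if fixed_transparency is not None:
        return [fixed_transparency] * n_triangles
    stride = params_per_triangle(None)
    if len(params) != n_triangles * stride:
        raise ValueError("unexpected parameter count for learnable transparency")
    return [squash(params[i * stride + GEOMETRY_AND_COLOR])
            for i in range(n_triangles)]


# Fixed setting: every triangle shares the value chosen by the user (80% here).
print(transparencies([], n_triangles=3, fixed_transparency=0.8))
# Learnable setting: each triangle's value comes from its own slot in the vector.
print(transparencies([0.0] * 26, n_triangles=2))
```

Under these assumptions the fixed setting gives the user one knob for the whole scene, whereas the learnable setting leaves that choice to the search, triangle by triangle.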

Evaluation 7: Our method produces a single piece of 3D art that looks different from different angles.

This corresponds to Figure 7 in the paper: Our method generating with different text prompts at different cameras. The text prompt for cameras 1 and 3 is "Walt Disney World" and for cameras 2 and 4 is "an annoyed cat". Our method produces a single piece of 3D art that looks different from different angles.

[Figure: result 3D model; rendered films and prompts for cameras 1 to 4]
Prompts 1 and 3: "Walt Disney World". Prompts 2 and 4: "an annoyed cat".
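
The per-camera prompt assignment used here can be written down as a short sketch. The helper names `prompts_for_cameras` and `multi_prompt_fitness` are hypothetical, and `exact_match` is only a toy placeholder for a real film-to-text similarity such as one computed from CLIP embeddings.

```python
from typing import Callable, List, Sequence


def prompts_for_cameras(n_cameras: int, prompts: Sequence[str]) -> List[str]:
    """Cycle through the prompts so that camera i uses prompts[i % len(prompts)]."""
    return [prompts[i % len(prompts)] for i in range(n_cameras)]


def multi_prompt_fitness(films: Sequence[object], prompts: Sequence[str],
                         similarity: Callable[[object, str], float]) -> float:
    """Average the per-camera similarity between each film and its own prompt."""
    if len(films) != len(prompts):
        raise ValueError("need exactly one prompt per camera film")
    return sum(similarity(f, p) for f, p in zip(films, prompts)) / len(films)


def exact_match(film: object, prompt: str) -> float:
    """Toy placeholder: the 'film' is a label, scored 1.0 only if it equals the prompt."""
    return 1.0 if film == prompt else 0.0


camera_prompts = prompts_for_cameras(4, ["Walt Disney World", "an annoyed cat"])
print(camera_prompts)

# Pretend films: cameras 1, 2 and 4 already match their prompts, camera 3 does not.
films = ["Walt Disney World", "an annoyed cat", "something else", "an annoyed cat"]
print(multi_prompt_fitness(films, camera_prompts, exact_match))
```

Since a single scene feeds all four films, raising this average pushes the same set of triangles to satisfy both prompts at once, each from its own viewing directions.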