AI is capable of creativity

I’m surprised how often I hear well-informed people argue that modern AI models are still “just repeating things that are in the training data.” This is simply not true. Large language models routinely solve math problems they have never seen, write poems that have never been written, and implement software algorithms that are not in their training set.

Of course, these outputs are similar to things seen in the training data, but only in the same way that we humans mostly solve math problems similar to ones we have seen, write poems using familiar elements of language, and write computer programs based on strategies we learned from other programs.

This is not to say that AI models have reached or exceeded the limit of what humans can do. For instance, I am not aware of any AI models that have invented entirely new fields of research. Indeed, AI models are not yet competitive with most (if not all) experienced professionals. But in terms of everyday creativity — which involves copying, combining, and transforming known ideas — current AI models are quite capable.

Part of the confusion may come from the need to “un-bundle creativity” as I wrote about previously. We may not be used to viewing creativity on a spectrum from “somewhat creative” to “as creative as an experienced professional.”

Another reason for misunderstanding may simply be our natural resistance to the idea that some of the things that used to be uniquely human are no longer so. (Though some animal species also exhibit creative problem solving.)

We might also see generic or cliché outputs and mistakenly attribute them to a lack of creativity. In reality, these generic responses come from models that are specifically trained to be neutral and multi-purpose. By default, popular systems do not veer far off the beaten path — for the same reason that most corporations do not hire erratic geniuses as spokespeople. They are still capable of creativity if prompted.

Finally, we might unnecessarily conflate creativity with agency. Part of being an artist is being moved — knowing what you want to create. Chatbots are designed to be assistants, only responding when prompted, so they do not have this type of intrinsic agency. A human needs to specify the goal and the constraints. But this still leaves plenty of room for the AI model to create novel solutions.

If the definition of creativity requires capabilities to be uniquely human, then the word is useless in discussions about AI. If the definition requires equivalence to what humans can do, then the word is useless until (and if) we reach that point. To meaningfully discuss the impact of the technology now, we need to acknowledge the spectrum of creativity and the AI models’ very real capabilities for creative problem solving and artistic expression.

Un-bundling intelligence

Something I hear a lot in debates about AI is some variation of: “sure, this chatbot can [do online research, tutor you in chemistry, search for drug candidates, …], but it’s not really intelligent.”

A very similar sentiment was common in the 1960s and ’70s when electronic computers were becoming widespread. “Sure, it can solve thousands of equations in one second, but it’s not really intelligent.”

We would have previously said such a performance would make a person extraordinarily intelligent, but we needed to un-bundle this capability of super speedy calculation from “intelligence” so that the word could keep its everyday meaning as “what only humans can do”. The field of artificial intelligence has thus been getting the “intelligence” rug pulled out from under it for decades, as we discovered how to make computers ever smarter.

If “intelligence” is defined as mental abilities that only humans have, then saying that a chatbot is “not really intelligent” is a tautology — one equals one. We figured out how to make a computer do it and thus it no longer fits in this definition of “intelligent”. It’s an utterly boring statement that doesn’t tell us anything about the more impactful questions of how this technology will affect the world.

In order to have more meaningful conversations about the new capabilities of AI systems, we need to get more comfortable with the un-bundling of intelligence and stop getting distracted by words whose meanings have become ambiguous in the computer age.

Landscape photography store

One of my photographs was recently featured in the Seattle Times. Someone asked if they could purchase prints, so I went ahead and created a storefront where anyone can buy art prints and other products displaying my landscape photographs.

Robin Stewart Photography Studio

It’s just a hobby for me, but I keep an eye out for beautiful scenes, and every now and then I get lucky. If you make a purchase, most of the money goes to the print shop and I get a small commission.

Definition of spirituality

“Spirituality is recognizing and celebrating that we are all inextricably connected to each other by a power greater than all of us, and that our connection to that power and to one another is grounded in love and compassion. Practicing spirituality brings a sense of perspective, meaning, and purpose to our lives.”

-Brené Brown (source)

Autonomous vehicle timeline

Every time I see a news update on self-driving cars, I wonder, “when are autonomous vehicles actually going to become mainstream?”

To try to answer that question — and to provide another example of putting numbers in context — I performed the following analysis.

There are many definitions of “autonomous”, but for this analysis I’m going to focus on “Level 4” or above, meaning the driver is not required to pay attention to the road. This does not include current Tesla vehicles, nor most other commercially available driver-assist systems.

Waymo is by far the leading company in the US right now as measured by Level 4 (or above) autonomous miles driven. They are frequently in the tech news and have a conspicuous presence on the streets of San Francisco. Waymo’s press releases have become more indirect over time, but based on my analysis, I estimate that their autonomous vehicle fleet drove about 27 million miles in 2024.

That’s just another “big number”, so let’s put it in context. According to the Federal Highway Administration, in 2024 the total distance driven by all US drivers was 3.2 trillion miles. If we conservatively assume that Waymo makes up 1/4 of all autonomous miles today, that implies that only about 1 out of every 30,000 miles driven was autonomous.
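For the curious, here is that arithmetic as a small Python sketch. It simply restates the figures and the 1/4 assumption above; the variable names are just for illustration.

```python
waymo_2024_miles = 27e6    # estimated Waymo autonomous miles in 2024
total_2024_miles = 3.2e12  # total miles driven by all US drivers in 2024 (FHWA)

# Conservative assumption from above: Waymo accounts for 1/4 of all autonomous miles.
autonomous_miles = 4 * waymo_2024_miles

print(f"About 1 in {total_2024_miles / autonomous_miles:,.0f} miles was autonomous")
# -> About 1 in 29,630 miles was autonomous
```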

If we assume that growth proceeds exponentially (for example, doubling every year), how long will it take for autonomous vehicles to become mainstream?

I plotted the numbers. Note that the vertical axis is logarithmic! If it weren’t, all of the data points in the lower-left would be smooshed along the bottom of the graph. Straight lines on a logarithmic scale signify exponential growth.

Waymo’s autonomous mileage approximately tripled each year between 2014 and 2019 (from 100,000 to about 8 million). If we use that optimistic growth rate, autonomous mileage could approach ubiquity around 2034. If we instead use Waymo’s average 75% yearly growth rate from 2014 to 2024, we approach ubiquity around 2044.
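Here is the same back-of-the-envelope projection as a small Python sketch, using the 2024 estimate and the two growth rates above. The exact year shifts by one or so depending on rounding, so treat the output as a rough landmark rather than a forecast.

```python
import math

start_miles = 27e6       # estimated autonomous miles driven in 2024
target_miles = 3.2e12    # total miles driven by all US drivers per year

for label, growth in [("tripling each year", 3.0), ("75% yearly growth", 1.75)]:
    years = math.log(target_miles / start_miles) / math.log(growth)
    # int() truncates, which matches the "approach ubiquity" framing above
    print(f"{label}: approaches total US mileage around {2024 + int(years)}")

# -> tripling each year: approaches total US mileage around 2034
# -> 75% yearly growth: approaches total US mileage around 2044
```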

In other words, even if everything goes extremely well for the autonomous car industry, it will probably take at least another decade before driverless vehicles become mainstream.

This is largely determined simply by the massive scale of the auto industry. Waymo’s website states, “We have over 40 million miles of real-world driving experience — that’s enough to drive to the Moon and back 80 times.” Then again, by that same yardstick, Americans as a whole drove to the moon and back about 6.4 million times in 2024 alone.

Obnoxious stage

[When] we become aware of the high costs of assuming responsibility for others’ feelings and trying to accommodate them at our own expense… we may get angry. I refer jokingly to this stage as the obnoxious stage because we tend toward obnoxious comments like, “That’s your problem! I’m not responsible for your feelings!” when presented with another person’s pain. We are clear what we are not responsible for, but have yet to learn how to be responsible to others in a way that is not emotionally enslaving. […]

At the third stage, emotional liberation, we respond to the needs of others out of compassion, never out of fear, guilt, or shame. Our actions are therefore fulfilling to us, as well as to those who receive our efforts. […] At this stage, we are aware that we can never meet our own needs at the expense of others. Emotional liberation involves stating clearly what we need in a way that communicates we are equally concerned that the needs of others be fulfilled.

-Marshall Rosenberg, Nonviolent Communication (3rd ed.), pp. 59-60

Seeing what’s missing

“[A need] is a perceived lack, something that is missing. Needfinding is thus a paradoxical activity—what is sought is a circumstance where something is missing. In order to find and articulate a need, this missing thing must be seen and recognized by someone.”

-Rolf Faste (via The Essence of Software p. 249)

Specification and AI

I’ve been looking for a grounded way to reason about the limits and potential of the new era of AI technology. Is it mostly a fun toy, or will future advances put most people out of a job (or somewhere in between)?

I take inspiration from a fun computer science activity where I pretend to be a robot trying to cross a crowded classroom, and a group of kids takes turns instructing me to take a step forward, back, left, or right. Inevitably, one of their instructions won’t quite line up and a step will send me crashing into a desk (which is also part of the fun).

The takeaway is that computers do exactly what you tell them to do, not necessarily what you want them to do. In other words, the core problem is specification: how to translate the needs and goals in your head into instructions that a computer can follow.
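As a toy illustration of that literal-mindedness (a hypothetical sketch, not the actual classroom activity), here is a tiny “robot” that executes step instructions exactly as given, desk collisions and all:

```python
# A toy "classroom robot" on a grid: it follows step instructions literally,
# with no idea what the instructors actually meant. Desk positions are made up.
DESKS = {(2, 0), (2, 1)}
MOVES = {"forward": (1, 0), "back": (-1, 0), "left": (0, -1), "right": (0, 1)}

def run(instructions, start=(0, 0)):
    x, y = start
    for step in instructions:
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
        if (x, y) in DESKS:
            return f"Crashed into a desk at {(x, y)}!"
    return f"Arrived safely at {(x, y)}."

print(run(["forward", "right", "right", "forward", "forward"]))  # arrives at (3, 2)
print(run(["forward", "right", "forward"]))  # one step off: crashes at (2, 1)
```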

AI tools clearly raise the level at which you can communicate: it is now plausible to use higher-level concepts like “walk across the classroom while avoiding desks.” But no matter how smart, an AI still can’t read your mind. It might know what you’ve done in the past, and what other people have done. But it doesn’t know what you want to do today unless you can describe it.

In other words, the extent to which AI tools can automate a task depends on how complicated it is to specify what you want.

Since I’m a software developer, let’s imagine a future intelligent assistant that might take my job by being able to fulfill a request like “build a great weather app”. Will such a tool ever come to exist?

What makes a weather app great? There’s no definitive answer — rather it’s a question of what you happen to want, today. How much do you care about the rain vs. wind vs. clouds? How much do you care about today’s conditions vs. tomorrow and next week? How much detail do you want to see? How much time are you willing to wait for data to load? How much will it cost? You’ll have to tell the imagined AI assistant about all the things you care about and don’t care about in order for it to make an app that’s great for you. That might still require a lot of work from you, the human.

Consider all the time people spend in meetings trying to get everyone on the same page about how, exactly, to best move forward. I don’t see how AI technology would remove the need for this. If you want to take everyone’s goals into account, you’ll still need to spend a lot of time talking it all through with the AI. If you skip that step and ask the AI to make decisions, you’ll only be getting a cultural average and/or a roll of the dice. That might be good enough in some cases, but it’s certainly not equivalent.

On the other hand, when requests are relatively simple and your goals are relatively universal, AI is likely to be transformative.

Either way, the limit of automation is the complexity of specifying what you want.

AI Fashion

As a way to experiment with recent generative AI tools, I challenged myself to design a piece of clothing for each color of the rainbow. The results are a sort of fashion line with a theme of bold, angular patterns.

I experimented with a variety of tools and approaches, but all of the above were generated using free tools based on Stable Diffusion XL: either the macOS app Draw Things or the open source project Fooocus. I also used Pixelmator Pro in a few cases to fix small issues with faces, hands, and clothing via classic photo editing techniques.

Each image was selected from around 5 to 50 alternatives, each of which took between 1 and 6 minutes for the system to generate (depending on hardware and settings). So the gallery above represents at least 10 hours of total compute time.

In some cases, I needed to iterate repeatedly on the prompt text, adding or emphasizing terms to guide the system towards the balance of elements that I wanted. In other cases, I just needed to let the model produce more images (with the same prompt) before I found one that was close enough to my vision. In a few cases, I used a promising output image as the input for a second round of generation in order to more precisely specify the scene, outfit, and pose.
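I worked entirely in GUI apps, but for readers who prefer a script, roughly the same text-to-image plus image-to-image workflow can be sketched with the open source diffusers library. This is a minimal sketch, not what I actually ran: it assumes a GPU with enough memory, and the prompt, strength, and filename are just placeholders.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Text-to-image: generate a batch of candidates from a prompt.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
prompt = "fashion photo, model wearing a bold angular red outfit, studio lighting"
candidates = pipe(prompt, num_images_per_prompt=4).images

# Image-to-image: feed a promising candidate back in to refine the
# scene, outfit, and pose while keeping the overall composition.
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refined = refiner(
    prompt=prompt + ", detailed fabric texture",
    image=candidates[0],
    strength=0.5,   # lower values stay closer to the input image
).images[0]
refined.save("red_outfit.png")
```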

It’s impressive to see how realistic the results from these tools are getting, though the tools certainly have limits. If you specify too many details, some will be ignored, especially if they do not commonly co-occur. I also started to get a feel for the limits and biases of the training data, as evidenced by how much weight I needed to give different words before the generated images would actually start to reflect their meaning.

It’s also clear that the model does not have a deep understanding of physics or anatomy. AI-generated images famously struggle with hands, sometimes using too many fingers or fusing them together. It also often failed to depict mechanical objects with realistic structure — I more or less gave up trying to generate bicycles, barbells, and drum sticks.

Overall, the experience of generating the fashion gallery felt less like automation and more like a new form of photography. Rather than having to buy gear, hire a model, sew an outfit, and travel to a location, you can describe all those things in words and virtually take the photo. But you still need the artistic vision to come up with a concept, as well as the editorial discretion to discard the vast majority of images — which is also the case in traditional photography.

Lastly, it was interesting to notice that the process of adjusting prompts and inspecting results was not so different from trying to communicate with another person. You’re never sure exactly how your words will be interpreted, and you sometimes need to iterate for a while to come to a shared understanding of “less like that” and “more like this”.