Specification and AI

I’ve been looking for a grounded way to reason about the limits and potential of the new era of AI technology. Is it mostly a fun toy, will future advances put most people out of a job, or does the truth lie somewhere in between?

I take inspiration from a fun computer science activity where I pretend to be a robot trying to cross a crowded classroom, and a group of kids takes turns instructing me to take a step forward, back, left, or right. Inevitably, one of their instructions won’t quite line up and a step will send me crashing into a desk (which is also part of the fun).

The takeaway is that computers do exactly what you tell them to do, not necessarily what you want them to do. In other words, the core problem is specification: how to translate the needs and goals in your head into instructions that a computer can follow.
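To make the classroom activity concrete, here is a minimal sketch of a robot that follows step instructions literally. Everything in it is an assumption made for illustration: the grid coordinates, the desk position, and the `follow` function are hypothetical and not part of the original activity.

```python
# Hypothetical illustration: the grid layout, desk position, and names are invented.
MOVES = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


def follow(instructions, start, desks):
    """Execute each instruction literally and stop at the first collision."""
    x, y = start
    for step_number, instruction in enumerate(instructions, start=1):
        dx, dy = MOVES[instruction]
        x, y = x + dx, y + dy
        # The robot has no notion of "avoid desks" unless someone encodes it.
        if (x, y) in desks:
            return f"crashed into a desk at {(x, y)} on step {step_number}"
    return f"stopped at {(x, y)}"


desks = {(0, 2)}  # assumed: a single desk directly in the robot's path

# What the kids said: walk straight ahead.
print(follow(["forward", "forward", "forward"], (0, 0), desks))

# What the kids wanted: the same destination, but stepping around the desk.
print(follow(["forward", "right", "forward", "forward", "left", "forward"], (0, 0), desks))
```

The first call reports a crash on the second step, while the second call reaches the far side. The robot's behavior is identical in both cases; only the specification changed.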

AI tools clearly raise the level at which you can communicate: it is now plausible to use higher-level concepts like “walk across the classroom while avoiding desks.” But no matter how smart, an AI still can’t read your mind. It might know what you’ve done in the past, and what other people have done. But it doesn’t know what you want to do today unless you can describe it.

In other words, the extent to which AI tools can automate a task depends on how complicated it is to specify what you want.

Since I’m a software developer, let’s imagine a future intelligent assistant that might take my job by being able to fulfill a request like “build a great weather app”. Will such a tool ever come to exist?

What makes a weather app great? There’s no definitive answer — rather it’s a question of what you happen to want, today. How much do you care about the rain vs. wind vs. clouds? How much do you care about today’s conditions vs. tomorrow and next week? How much detail do you want to see? How much time are you willing to wait for data to load? How much will it cost? You’ll have to tell the imagined AI assistant about all the things you care about and don’t care about in order for it to make an app that’s great for you. That might still require a lot of work from you, the human.

Consider all the time people spend in meetings trying to get everyone on the same page about how, exactly, to best move forward. I don’t see how AI technology would remove the need for this. If you want to take everyone’s goals into account, you’ll still need to spend a lot of time talking it all through with the AI. If you skip that step and ask the AI to make decisions, you’ll only be getting a cultural average and/or a roll of the dice. That might be good enough in some cases, but it’s certainly not equivalent.

On the other hand, when requests are relatively simple and your goals are relatively universal, AI is likely to be transformative.

Either way, the limit of automation is the complexity of specifying what you want.

AI Fashion

As a way to experiment with recent generative AI tools, I challenged myself to design a piece of clothing for each color of the rainbow. The results are a sort of fashion line with a theme of bold, angular patterns.

I experimented with a variety of tools and approaches, but all of the above were generated using free tools based on Stable Diffusion XL: either the macOS app Draw Things or the open-source project Fooocus. I also used Pixelmator Pro in a few cases to fix small issues with faces, hands, and clothing via more classic photo editing techniques.

Each image was selected from around 5 to 50 alternatives, each of which took between 1 and 6 minutes for the system to generate (depending on hardware and settings). So the gallery above represents at least 10 hours of total compute time.

In some cases, I needed to iterate repeatedly on the prompt text, adding or emphasizing terms to guide the system towards the balance of elements that I wanted. In other cases, I just needed to let the model produce more images (with the same prompt) before I found one that was close enough to my vision. In a few cases, I used a promising output image as the input for a second round of generation in order to more precisely specify the scene, outfit, and pose.

It’s impressive to see how realistic these tools are getting, though they certainly have limits. If you specify too many details, some will be ignored, especially if they do not commonly co-occur. I also started to get a feel for the limits and biases of the training data, as evidenced by how much weight I needed to give different words before the generated images would actually start to reflect their meaning.

It’s also clear that the model does not have a deep understanding of physics or anatomy. AI-generated images famously struggle with hands, sometimes adding too many fingers or fusing them together. The model also often failed to depict mechanical objects with realistic structure; I more or less gave up trying to generate bicycles, barbells, and drum sticks.

Overall, the experience of generating the fashion gallery felt less like automation and more like a new form of photography. Rather than having to buy gear, hire a model, sew an outfit, and travel to a location, you can describe all those things in words and virtually take the photo. But you still need the artistic vision to come up with a concept, as well as the editorial discretion to discard the vast majority of images — which is also the case in traditional photography.

Last, it was interesting to notice that the process of adjusting prompts and inspecting results was not so different from trying to communicate with another person. You’re never sure exactly how your words will be interpreted, and you sometimes need to iterate for a while to come to a shared understanding of “less like that” and “more like this”.

Slow Software

The “slow food” movement encourages people to take the time to cook and savor meals made with love. It emphasizes care rather than the efficiency and utilitarianism of “fast food”.

What does the software version of that look like?

The typical software application is a product of a business trying to maximize profit and efficiency. There is a constant push to release new features and upgrades as quickly as possible, at the lowest cost possible. “Quality” is defined as the minimum possible bar that is still acceptable to paying customers. It is designed with the same priorities as “fast food”.

Of course, “slow” in the software world usually refers to annoyingly unresponsive user experiences. But here I mean “slow” in the sense that the software itself was designed and built slowly and with care, the same way that a meal can be prepared and eaten slowly and with care. Such software is likely to actually be more responsive because its architecture has been more carefully honed.

Part of my design process involves doing a lot of research and prototyping and then… doing nothing for a while. I call this the “percolation” phase of design. The ideas percolate in my head, mostly subconsciously, as I go about my life. And surprisingly often, a much better solution than any I’ve previously thought of will randomly pop into my head days, weeks, or even years later!

If you’re on a treadmill of designing and building as fast as possible, there is simply no time to arrive at these better solutions.

Somatic

“It’s like a worldwide itch, like all of my limbs are phantom limbs, like I swallowed a live animal much larger than me that promptly died. And now I have to walk around with it inside of me, soul and all.”

“It’s like I was physically punched in the solar plexus with something electric that sent a shock up through my face and head.”

-Sean Cole (This American Life #806)

Magic

“My definition of magic: competence so much more advanced than yours with such alien mental models that you cannot predict the outcomes of the model at all.”

“Magicians are wearing not just better, but fundamentally differently shaped lenses to the rest of us. And regardless of your skills and experience, it is likely that you are a magician to someone else.”

-Jessica Watson Miller, “Becoming a magician”

Responsible technology

“As we move into the age when technology will obliterate and ‘eat the world’ so much faster than our responsibilities can catch up, it’s no longer ok [for technologists] to say it’s someone else’s responsibility to define what responsibility means.”

-Aza Raskin (“The Three Rules of Humane Tech” podcast episode)

Software Design

Software design is not the same as user interface design. … If a user interface is designed after the fact, that is like designing an automobile’s dashboard after the engine, chassis, and all other components and functions are specified. …

Many people who think of themselves as working on the design of software simply lack the technical grounding to be an effective participant in the overall process. Naturally, programmers quickly lose respect for people who fail to understand fundamental technical issues. [But] the technical demands of writing the code are often so strenuous that the programmer can lose perspective on the larger issues affecting the design of the product. …

Software designers should be trained more like architects than like computer scientists. Software designers should be technically very well grounded without being measured by their ability to write production-quality code. … Software design needs to be recognized as a profession in its own right, a disciplinary peer to computer science and software engineering.

Mitchell Kapor (1991), via Daniel Jackson

Scarcity as a fiction

“People who see the glass as ‘half-empty’ [are] wedded to a fiction, for ‘emptiness’ and ‘lack’ [are] abstractions of the mind, whereas ‘half-full’ is a measure of the physical reality under discussion. The so-called optimist, then, is the only one attending to real things, the only one describing a substance that is actually in the glass.”

-Zander & Zander, The Art of Possibility (p.109)

Design by thought experiment

It is infeasible to find a design by exploring a space of options and testing each option empirically. It simply costs too much to implement more than one or two options. …

The long established approach to evaluating designs involves expert critique. The designer… evaluates the proposed design on the basis of prior experience and deep knowledge of the domain. She anticipates the reaction of users based on the accumulated evidence of how similar users have responded to similar designs in the past. She predicts potential problems by mentally simulating the design, imagining its use in various scenarios, and subjecting the design to known corner cases. …

By critiquing ideas as they arise, a skilled designer can navigate a path from a vague intuition to a concrete design proposal, [and only then] does it make sense to expose it to user testing.

From the footnotes of The Essence of Software (Jackson 2021), pp. 218-219

The same is true even in science, where the decisions of which questions to ask and which hypotheses to test play an outsized, often underrated role. The use of “thought experiments”, past experience, domain knowledge, and logic is necessary in order to narrow down the vast space of possibilities.

Boredom as integrity

“I don’t know why adults were so freaked out about kids being bored. Boredom is where I get ideas about what I might want to do or create next! When I’m too busy to be bored, I just start forgetting who I am.”

-Allyson Dinneen (Notes From Your Therapist)