Mar 22, 2024

Consistent image generation with Microsoft Copilot

I have found the DALL-E 3 image generation in Microsoft Copilot to be pretty fantastic. I’m amazed at how simply I can go from idea to an image that meets the need (which typically falls somewhere between what I’d hack out in paint.net and a free stock image).

As easy as it is, I’ve felt like my results were sometimes all over the place. In fact, sometimes I wouldn’t get back anything close to what I had imagined. I also had a hard time getting similar images when trying (what I thought were) similar prompts.

I decided to commit a little time every day to getting better at image prompting. I learned a lot.

Prompting for Consistent Results

There’s a very simple, yet powerful, AI Art Prompting Guide from Microsoft which has a few things I liked. One is that it gives clues as to what the copilot is looking for in a prompt, including some suggestions for prompt structure and refinement of prompts. The guide also has some swell visual examples of art styles.

In addition, I tested out some of the suggestions found in guides for Midjourney and Stable Diffusion. This yielded mixed results, but it did help me focus in on what was important and what made a good prompt.

An interesting realization: when I ran an image prompt through Microsoft Copilot (https://copilot.microsoft.com), even when using the Designer GPT, I could see that key parts of my prompt were sometimes being dropped, presumably because the copilot was trying to “help.” So now I’m using the Image Creator from Microsoft Designer, which is mostly the same but isn’t messing with my prompts (at least not as overtly as Microsoft Copilot).

Building a Prompt Generator

After working with the prompt template from the Microsoft guide for a while, I decided to make a simple prompt generator.

My basic idea was to use a fill-in-whichever-blanks approach to generate my desired prompt. Pick an art style (by selecting an image that has the right vibe), answer a couple of questions, then copy the generated prompt.

This is what the prompt builder looks like

To assemble this, I used a few ingredients:

Images I’d created - These would make for a nice “visual” selector of an image style. Also, since I had already used the resources to generate images as part of my experiments, it was nice to put them to use.
The prompt template - Granted, the core structure was suggested by the MS documentation, but a prompt generator site would make for a great structured means of assembling prompts.
GitHub Copilot - Building the site was a great excuse to use GH Copilot.

For a couple of hours’ work, I think this came together great. I also used the opportunity to stretch some Flask skills, so that in the future I can add more art styles by updating a database instead of manually editing the website’s code.
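
To illustrate the "update a database instead of the code" idea, here is a minimal sketch. It is not the code behind the site: the use of SQLite, the art_styles table, its columns, the helper names, and the image paths are all assumptions made up for this example. The point is only that adding a style becomes a matter of inserting a row.

```python
import sqlite3


def create_style_table(conn):
    # Hypothetical schema: one row per art style, with the image used as its visual selector.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS art_styles ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT UNIQUE NOT NULL, "
        "image_path TEXT NOT NULL)"
    )


def add_style(conn, name, image_path):
    # Adding a style is a data change, not a code change.
    conn.execute(
        "INSERT INTO art_styles (name, image_path) VALUES (?, ?)",
        (name, image_path),
    )
    conn.commit()


def list_styles(conn):
    # Return the styles sorted by name, ready to hand to a page template.
    rows = conn.execute(
        "SELECT name, image_path FROM art_styles ORDER BY name"
    ).fetchall()
    return [{"name": name, "image_path": path} for name, path in rows]


# In-memory database so the sketch runs without creating any files.
conn = sqlite3.connect(":memory:")
create_style_table(conn)
add_style(conn, "watercolor", "static/styles/watercolor.png")
add_style(conn, "pixel art", "static/styles/pixel-art.png")

for style in list_styles(conn):
    print(style["name"], style["image_path"])
```

In a Flask app, a view function could pass the result of list_styles to the page template to render the image picker.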

Final thoughts

If you’re curious, and you haven’t perused the prompt builder website, the template I’m using for the prompt is a combination of [1 primary image style] [2 description of the subject of the image] [3 scene or setting] [4 additional elements] [5 additional image style details]. I find that getting comfortable with those elements has been a great way to improve my image prompting.
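
For anyone who would rather see the five-part structure as code, here is a rough sketch of the fill-in-whichever-blanks idea. The function name, the parameter names, and the choice to join the parts with commas are my assumptions for this example, not necessarily what the site does; blank fields are simply skipped.

```python
def build_prompt(style, subject, setting="", extras="", style_details=""):
    """Join the filled-in parts in template order, skipping any left blank."""
    # Order follows the template: style, subject, setting, extras, style details.
    parts = [style, subject, setting, extras, style_details]
    # The comma separator is an assumption; any consistent separator would do.
    return ", ".join(part.strip() for part in parts if part and part.strip())


prompt = build_prompt(
    style="watercolor painting",
    subject="a red fox reading a book",
    setting="in a cozy library",
    style_details="soft pastel colors",
)
print(prompt)
# watercolor painting, a red fox reading a book, in a cozy library, soft pastel colors
```

Keeping each element as its own argument makes it easy to change one piece (say, the setting) while holding the rest fixed, which is handy when aiming for a consistent look across images.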

Let me know if you have additional suggestions or resources to continue getting better!