RECAP: Learning AI Chapter 1
I wanted to explore more about OpenAI, ChatGPT, and DALL-E. I have been using ChatGPT since March of 2023. However, for this course, I bought a subscription to GPT-4 as well as to DALL-E 2. I later learned that GPT-4 also comes with access to DALL-E 3, but more on the differences between those later.
To aid me in my quest to learn more about these tools and their capabilities, I'm reading Learning AI with ChatGPT and Google Colab: The Easiest, Quickest Way to Start Learning AI by Dr. Kelsey.
I wanted to immediately learn more about DALL-E. As a marketer, I often have to create vector graphics and other visual content for work. The issue: I am not a graphic designer. I always end up using free stock images or buying vectors to make do. If DALL-E can fill in the gaps where my artistic ability falls short, I would be saving both time and money.
Here is what I learned.
Getting Different Outputs
One of the prompts the book suggests trying to see what DALL-E can do is the "Seasonal Image" example: "A cozy winter scene featuring a snow-covered cabin surrounded by pine trees, with a snowman in the front yard." I tried the same prompt on GPT-4 using DALL-E 3 and directly on DALL-E 2. The differences were striking!
I then took to GPT-4 to ask why the outcomes were so different. The short answer is that these are different machine-learning models, updated over time based on training data and feedback, with some randomness built in. According to GPT-4,
"When you use the same prompt in different versions of DALL-E, such as DALL-E 2 versus DALL-E 3, you may get different outputs due to several factors:
- Model Architecture: Each version of DALL-E is built on a different model architecture, which influences how it interprets prompts and generates images.
- Training Data: The versions are likely trained on different datasets, and possibly also with different objectives or techniques, which affects the kind of imagery they produce.
- Learning and Improvement: As newer versions are developed, they incorporate feedback and learning from previous versions, often resulting in more sophisticated image generation.
- Randomness: There is an element of randomness in the image generation process. Even with the same version of DALL-E, the same prompt can yield different results on different occasions.
- Algorithm Updates: Newer versions may have updated algorithms that better understand context, style, and content, leading to different interpretations of the same prompt.
- User Feedback and Iteration: Developers may adjust the models based on user feedback to improve image relevance and quality, which can change the outputs over time.
- System Limitations and Enhancements: Each version may have different limitations or enhanced capabilities (like resolution, aspect ratio options, etc.), impacting the final image.
- Prompt Engineering: Over time, the understanding of how to craft prompts (prompt engineering) for AI image generation has evolved, and newer models might be better tuned to respond to more nuanced prompts.
Each of these aspects can cause two versions of DALL-E to produce different images even when given the same input."
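If you want to reproduce this comparison outside of ChatGPT, the OpenAI API lets you send the same prompt to both models. Here is a minimal sketch using the official openai Python package (it assumes you have an API key stored in the OPENAI_API_KEY environment variable):

```python
# Send the book's cabin prompt to DALL-E 2 and DALL-E 3 and print
# the resulting image URLs so the outputs can be compared side by side.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

prompt = ("A cozy winter scene featuring a snow-covered cabin surrounded "
          "by pine trees, with a snowman in the front yard.")

# The two models accept different size options, so each gets its own.
for model, size in [("dall-e-2", "512x512"), ("dall-e-3", "1024x1024")]:
    response = client.images.generate(
        model=model,
        prompt=prompt,
        n=1,        # DALL-E 3 only allows one image per request
        size=size,
    )
    print(model, "->", response.data[0].url)
```

Run it twice and you will likely get different images even from the same model, which lines up with the randomness point above.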
Uploading Your Own Images
Something else I quickly learned is that DALL-E 2 and 3 cannot alter an image you upload. I tried making changes to a picture of my fiancé and me on our recent vacation to Kauai, the filming location of Jurassic Park. Hence the dinosaur. I asked DALL-E 2 to change my picture into a cartoon-style image. The results were far from satisfactory.
Of course, I then took to GPT-4 to ask why I could not alter my own photographs in DALL-E 2. Here is what GPT-4 had to say:
"DALL-E does not have the capacity to directly take your uploaded pictures and transform them into different styles. It's a generative AI model that creates images from textual descriptions. So while you cannot upload an image and have DALL-E modify it, you can provide DALL-E with a text description of what you want, and it can generate new images from scratch based on that description.
For instance, you could describe a character or scene in detail, and DALL-E could generate an image in a Disney style, or any other style you specify. However, the result wouldn't be a modified version of an existing photograph but a new image that reflects the elements and style described in the text."
For instances like this, I think I'm better off using Photoshop filters. I might also try some of the other AI image-editing tools out there, like Midjourney.
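A side note for anyone comfortable with a little code: the limitation above applies to the ChatGPT interface. The OpenAI API does expose a DALL-E 2 "variations" endpoint that accepts an uploaded image, though it produces loose remixes of the photo rather than a controlled style change like "make this a cartoon." A minimal sketch (it assumes a square PNG under 4 MB named kauai.png):

```python
# Ask DALL-E 2 for variations of an existing photo.
# Assumes a square PNG under 4 MB named "kauai.png" in the working directory.
from openai import OpenAI

client = OpenAI()

with open("kauai.png", "rb") as image_file:
    response = client.images.create_variation(
        image=image_file,
        model="dall-e-2",  # variations are a DALL-E 2 feature
        n=2,               # ask for two takes
        size="512x512",
    )

for item in response.data:
    print(item.url)
```

Because variations remix the composition rather than restyle a specific photo, the Photoshop-filter route probably still wins for turning a vacation picture into a cartoon.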
Using Descriptive Text
Dr. Kelsey shared an interview with Wendy Aw, a copywriter turned AI/ML engineer. In this interview, Aw mentions the advantage of using concise, descriptive language to communicate with AI and get better results. I decided to put this to the test. Since I could not upload images, I wanted to capture the likeness of my dogs using as much descriptive language as possible. It worked perfectly for one dog, but not so much for the other.
I decided to start with the baby. Enzo is an Aussiedoodle puppy with beautiful blue eyes. I went into GPT-4 and, using DALL-E 3, input the prompt "Create an oil painting portrait of a black and white Aussiedoodle puppy with blue eyes, a white snout, black fur on the ears and around the eyes, and a white stripe on top of the head."
I think it got pretty close!
Next up was Matilda. She is a double-merle cockapoo with beautiful, starry blue eyes and the cutest underbite. I knew trying to explain her idiosyncrasies would be too difficult for AI, so I stuck to the basics: "Create an oil painting portrait of a white cockapoo dog with blue eyes. The left eye looks like it has black eyeliner. The right eye only has eyeliner on the top outer corner. She has two small pink spots on both nostrils. She has long ears." I tried multiple variations of the same prompt, making small changes each time, but I could not get a portrait that resembled Matilda as closely as Enzo's AI portrait resembled him.
I thought I would use AI to my advantage. I uploaded the same image of Matilda into GPT-4 and asked it to describe her in detail. GPT-4 responded: "This is an image of a medium-sized dog with a curly, dense coat of white fur that appears soft to the touch. The dog has a distinctive, teddy-bear-like appearance, with a round face and expressive eyes. Its eyes are light-colored with a hint of blue, and one eye has a visible reddish-pink area... The dog's nose is black, with a slight sheen... and its nostrils are well-defined... The ears hang just below the level of its jaw, which is muscular and slightly protruding, typical of a dog that might be strong and active."
I tried inputting the above description into DALL-E 3, and I think it got worse.
I gave up after this. I did not attempt to create a portrait of my Shih Tzu.
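For anyone who wants to automate the describe-then-generate loop I stumbled through above, the two steps can be chained in one script: ask a vision-capable GPT-4 model to describe the photo, then feed that description to DALL-E 3 as a prompt. A rough sketch follows; the model name and file names are my assumptions, so swap in whatever your account has access to:

```python
# Describe a local photo with a vision-capable GPT-4 model, then use
# that description as a DALL-E 3 prompt. Model and file names here are
# assumptions; adjust them to your own account and files.
import base64
from openai import OpenAI

client = OpenAI()

# Encode the photo as a base64 data URL so it can go in a chat message.
with open("matilda.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

description = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this dog's appearance in detail for an "
                     "oil painting portrait prompt."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
).choices[0].message.content

portrait = client.images.generate(
    model="dall-e-3",
    prompt=f"An oil painting portrait of a dog: {description}",
    n=1,
    size="1024x1024",
)
print(portrait.data[0].url)
```

As the Matilda attempt above showed, though, a more detailed description does not guarantee a closer likeness; the script mostly saves the copying and pasting.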
There are still growing pains with AI in general and DALL-E specifically, but the future of what these tools can accomplish is exciting!