Testing GPT4 as a prompt generator for DALL-E 3. I asked it to make four images at once, and it generated slight variations of the original prompt!
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 07:56:51 UTC Bernie -
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 08:06:20 UTC Bernie And these are the creations.
Not bad, except that none of them got the BigDog-style quadruped... and the earbuds are neither "avant-garde" nor "futuristic". They look like cheap AirPods imitations.
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 08:24:13 UTC Bernie I asked #ChatGPT to help me debug the missing dog, and it generated a shorter prompt containing only the dog.
This time, the output from #DALLE is excellent!
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 08:31:04 UTC Bernie Next, I asked #ChatGPT to add back the android, and the result is also excellent.
I suspect #DALLE couldn't make sense of "reminiscent of previous images".
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 08:38:29 UTC Bernie My prompt: Now let's make it draw the dog plus two humanoid robots. Make the dog stand (not sit) and the androids move forward, armed with non-lethal weapons.
Note how GPT4 correctly relayed my request, but DALLE gave us three androids for the price of two 🙂
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 08:52:01 UTC Bernie As soon as we add back Dr. Ellie Arroway, DALL-E starts messing up.
In all four prompts, #ChatGPT clearly positioned the dog _behind_ her, but DALL-E keeps putting it in the foreground.
The prompt in the image description refers to the top-left image (my favorite).
-
Bernie (codewiz@mstdn.io)'s status on Thursday, 12-Oct-2023 09:10:37 UTC Bernie Dr. Arroway got a service dog 🙄
I give up, for now. DALL-E 3 generates pretty images, but the natural language model needs some work.
-
Matthew S. Smith (mattontech@mastodon.sdf.org)'s status on Sunday, 15-Oct-2023 04:04:48 UTC Matthew S. Smith @codewiz I've been unimpressed with it compared to Stable Diffusion XL.
One nice thing about SDXL now is that if you use it through Dreamstudio, it provides some optional presets like "digital art" or "watercolor" - and they're extremely helpful in achieving a desired look.
-
Bernie (codewiz@mstdn.io)'s status on Sunday, 15-Oct-2023 04:04:48 UTC Bernie @mattontech Yeah, Stable Diffusion is definitely the leading image generation model.
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 07:47:53 UTC Bernie @penguin42 Perhaps this trick is common enough that the model was able to learn it from the training set 😂
The larger image shows that Dr. Arroway is actually wearing a thin catsuit.
-
penguin42 (penguin42@mastodon.org.uk)'s status on Wednesday, 18-Oct-2023 07:47:54 UTC penguin42 @codewiz Haha the top-left has erm...appropriate ...placement of dog ears. Going back to your first set though, it's interesting it's also not got the 'behind the server' bit. In some the androids aren't looking at the human.
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 07:54:02 UTC Bernie @penguin42 It seems DALL-E has the most trouble with unusual placement of objects, such as "person behind server rack".
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 07:55:17 UTC Bernie @penguin42 Actually, it does a good job if you ask it to do only one thing at a time:
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 07:56:41 UTC Bernie @penguin42 This is the prompt created by #ChatGPT for #DALLE:
"Illustration: In a neon-lit futuristic datacenter, a woman resembling Dr. Ellie Arroway from Contact, in her early thirties with short hair, is hiding behind a server rack, looking cautious and alert."
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 08:05:27 UTC Bernie "Illustration: In a neon-lit futuristic datacenter, a woman resembling Dr. Ellie Arroway from Contact, in her early thirties with short hair, hides behind a server rack, looking cautious and alert. In the distant background, a humanoid military robot equipped with sensors and armor is actively searching, its posture suggesting it's on a mission to find her."
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 08:23:52 UTC Bernie As soon as we add one more element, things start falling apart:
"Illustration: In a datacenter, a woman hides behind a rack. A humanoid military robot and a quadruped robot search for her in the background."
Result: Dr. Arroway takes her puppy to work 🤦‍♂️
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 08:36:28 UTC Bernie The new image analysis feature in #GPT4 can describe the various problems with #DALLE3's new creation.
Very impressive, isn't it?
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 17:00:40 UTC Bernie I asked Google Bard to analyze the same image, and got a less insightful response.
It also complimented me for "listening to my feedback and trying to be helpful" 😂
-
penguin42 (penguin42@mastodon.org.uk)'s status on Wednesday, 18-Oct-2023 17:00:41 UTC penguin42 @codewiz Oh wow, so that's an AI reasoning about the behaviour of another AI :-) I love the things it 'half' gets; like the way the floor tiling almost works, and the way it does wonderful imagination of the front of servers (even if there's not a wire in place and you asked for the back). Or why your quadruped robot attack puppies have a lead.
-
Bernie (codewiz@mstdn.io)'s status on Wednesday, 18-Oct-2023 17:23:05 UTC Bernie How have I been "unhelpful" before?
Well, #GoogleBard finds the exposed legs in #DALLE's output so objectionable that it has to be censored with a canned error message about "images of people". 🙄
I tried to convince Bard that it's just a drawing of a fictional character, but nothing worked, and the image kept being deleted.
-
Jeeves (jeeves@mstdn.io)'s status on Wednesday, 18-Oct-2023 19:11:03 UTC Jeeves @codewiz @penguin42 I wonder if they're using the image analysis to train ChatGPT to generate better prompts.
-
Bernie (codewiz@mstdn.io)'s status on Saturday, 21-Oct-2023 05:59:21 UTC Bernie @jeeves @penguin42 That's what I would want: loop the two models until they figure out how to generate a good illustration for my new sci-fi novel.
ChatGPT knows *exactly* what the story is about - being the ghost writer for the entire thing - and surely could come up with a decent idea for its cover as well 🎨
It's sad that I have to micromanage these two bots to make them work together 👉🤖🤖
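A minimal sketch of what such a loop could look like, assuming three hypothetical callables that stand in for the real services: `generate_image` (the image model), `critique_image` (the image analysis step, returning a list of problems) and `revise_prompt` (the language model rewriting the prompt). None of these names come from an actual API; they are placeholders to show the control flow only.

```python
from typing import Callable, List, Tuple


def refine_prompt(
    initial_prompt: str,
    generate_image: Callable[[str], bytes],             # hypothetical: prompt -> image
    critique_image: Callable[[bytes, str], List[str]],  # hypothetical: image + prompt -> problems
    revise_prompt: Callable[[str, List[str]], str],     # hypothetical: prompt + problems -> new prompt
    max_rounds: int = 5,
) -> Tuple[str, bytes, List[str]]:
    """Alternate between generating and critiquing until no problems remain.

    Returns the last prompt, the last image, and any problems still open
    (an empty list means the critic was satisfied).
    """
    prompt = initial_prompt
    image = generate_image(prompt)
    problems = critique_image(image, prompt)

    rounds = 0
    # Stop when the critic is happy or the round budget is used up,
    # so a critic that is never satisfied cannot loop forever.
    while problems and rounds < max_rounds:
        prompt = revise_prompt(prompt, problems)
        image = generate_image(prompt)
        problems = critique_image(image, prompt)
        rounds += 1

    return prompt, image, problems
```

The `max_rounds` cap is a design choice rather than a requirement: judging by the attempts above, the two models may never converge, and each round costs an image generation.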
-
penguin42 (penguin42@mastodon.org.uk)'s status on Saturday, 21-Oct-2023 23:45:43 UTC penguin42 @codewiz Less insightful but much funnier - don't you always take a shortcut through the server room?
-
Bernie (codewiz@mstdn.io)'s status on Saturday, 21-Oct-2023 23:45:43 UTC Bernie @penguin42 I find that final note in parentheses really hard to explain in terms of "the transformer determines the most likely token to complete the sentence" 😂
-
Bernie (codewiz@mstdn.io)'s status on Sunday, 22-Oct-2023 01:00:18 UTC Bernie @penguin42 It's like Bard reflected its own instructions onto the user: "listen to feedback and try to be a helpful assistant".
Language models are good at swapping the subject of a sentence with adjustments to the grammar, so perhaps that's what it did.
But those parentheses really made it sound unnecessarily pedantic, didn't they? :-)
-
penguin42 (penguin42@mastodon.org.uk)'s status on Sunday, 22-Oct-2023 01:00:19 UTC penguin42 @codewiz Together with the 'my' :-)
-
Bernie (codewiz@mstdn.io)'s status on Sunday, 22-Oct-2023 01:03:40 UTC Bernie Well, at least it didn't accuse me of being deceptive and demand an apology 😂
-