From the paper:
> The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.
Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that does something similar: it takes a given starting prompt, a list of GenAI models, and a max number of retries.
It generates images, evaluates them with a separate multimodal AI, and automatically rewrites failed prompts, repeating up to the retry limit.
It's not perfect (the nine-pointed star example in particular), but oftentimes the recognition side of a multimodal model is superior to its generative capabilities, so you can run it in a sort of REPL until you get the desired outcome.
https://genai-showdown.specr.net/image-editing
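For the curious, a rough sketch of that loop in Python. The `generate`, `evaluate`, and `rewrite` callables are hypothetical stand-ins for whatever model APIs you wire in; this isn't the site's actual code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Attempt:
    model: str
    prompt: str
    image_path: str
    passed: bool
    critique: str

def run_showdown(
    start_prompt: str,
    models: list[str],
    max_retries: int,
    generate: Callable[[str, str], str],               # (model, prompt) -> image path
    evaluate: Callable[[str, str], tuple[bool, str]],  # (image path, goal) -> (passed, critique)
    rewrite: Callable[[str, str], str],                # (goal, critique) -> revised prompt
) -> dict[str, Attempt]:
    """Generate -> judge -> rewrite-and-retry, per model, up to the retry cap."""
    results: dict[str, Attempt] = {}
    for model in models:
        prompt = start_prompt
        for _ in range(max_retries + 1):
            image_path = generate(model, prompt)
            passed, critique = evaluate(image_path, start_prompt)
            results[model] = Attempt(model, prompt, image_path, passed, critique)
            if passed:
                break
            # Feed the judge's critique back in and retry with a rewritten prompt.
            prompt = rewrite(start_prompt, critique)
    return results
```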
What do you use for evaluation? gemini-2.5-pro is at the top of MMLU and has been the best for me, but I'm always looking for something better.
Recently I've found myself getting the evaluation simultaneously from OpenAI GPT-5, Gemini 2.5 Pro, and Qwen3 VL to give it a kind of "voting system". Purely anecdotal, but I do find that Gemini is the most consistent of the three.
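The voting itself is trivial; in this sketch the judges are hypothetical callables wrapping whichever vision-model APIs you use:

```python
from typing import Callable

# A judge takes (image_path, goal_prompt) and returns pass/fail. These are
# hypothetical wrappers around whichever vision-model APIs you actually call.
Judge = Callable[[str, str], bool]

def majority_vote(image_path: str, goal: str, judges: list[Judge]) -> bool:
    """Pass only if a strict majority of judges approve the image."""
    votes = sum(judge(image_path, goal) for judge in judges)
    return votes > len(judges) / 2
```

With three judges that's a 2-of-3 consensus, which smooths out any single model's blind spots.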
Interesting, I'll give voting a shot, thanks.
I love your site; I stumble across it once a month, it seems.
Or there's another very similar site, but I'm pretty sure it's yours.
Thanks! It's probably the same site. It used to only be a showdown of text-to-image models (Flux, Imagen, Midjourney, etc), but once there was a decent number of image-to-image models (Kontext, Seedream, Nano-Banana) I added a nav bar at the top so I could do similar comparisons for image editing.
The license is CC BY-NC-ND - I’m not sure who is going to be able to use it given the NC-ND part… especially given the potential uncertainty over what uses count as commercial and what counts as derivative works. OTOH, given the bulk of this dataset is AI outputs, its copyrightability is an open question.
Looks like the dataset is distilled from Gemini Nano-Banana.
Definitely very useful, but I'm so curious how the original datasets for these image editing models were created. I'm guessing a lot of it is synthetic data, with scenes constructed programmatically using layers.
Can it be? Has Apple FINALLY joined the party? Very ironic that they're using an open dataset from Google... and Gemini, also from Google, for the prompts.
I'm happy to see something from Apple, but this seems so low-tech that it could be one of my own local ComfyUI workflows.
I confess that I don't quite get the point here - is it just that they've paid the inference costs for a dataset that can be used for distillation/other research?
Essentially yes, it's a dataset that can be used to train or fine-tune another model, or for similar research. From the site:
> Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing. Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.
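To make the multi-turn part concrete, an example might look something like the record below. This is purely illustrative; it is not the dataset's actual schema, and the paths and field names are made up.

```python
# Purely illustrative shape for a multi-turn editing example; NOT the actual
# Pico-Banana-400K schema. Paths and field names are hypothetical.
example = {
    "source_image": "openimages/abc123.jpg",
    "turns": [
        {"instruction": "Make the sky overcast.", "edited_image": "edit_1.png"},
        {"instruction": "Now add a red umbrella.", "edited_image": "edit_2.png"},
    ],
    # Reward-style signal: a preferred vs. rejected result for the same turn.
    "preference": {"chosen": "edit_2.png", "rejected": "edit_2_rejected.png"},
}
```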
Relevant, from Randall Munroe: https://what-if.xkcd.com/29/
> What if I took a swim in a typical spent nuclear fuel pool? Would I need to dive to actually experience a fatal amount of radiation? How long could I stay safely at the surface?
> Assuming you’re a reasonably good swimmer, you could probably survive treading water anywhere from 10 to 40 hours. At that point, you would black out from fatigue and drown. This is also true for a pool without nuclear fuel in the bottom.
You meant to comment on this post: https://news.ycombinator.com/item?id=45708292
Shit, thank you.
Really cool - looking to Apple to lead the on-device AI space in short order...