
Google Fixes Gemini’s People-Generating Feature

Back in February, Google paused its AI-powered chatbot Gemini’s ability to generate images of people after users complained of historical inaccuracies. Told to depict “a Roman legion,” for example, Gemini would show an anachronistic group of racially diverse soldiers while rendering “Zulu warriors” as stereotypically Black.

Google CEO Sundar Pichai apologized, and Demis Hassabis, the co-founder of Google’s AI research division DeepMind, said that a fix should arrive “in very short order” — within the next couple of weeks. It ended up taking much, much longer than that (despite some Googlers pulling 120-hour workweeks!). But in the coming days, Gemini will once again be able to create pics showing people.

Well… sort of.

Only certain users — specifically those signed up for one of Google’s paid Gemini plans, Gemini Advanced, Business or Enterprise — will regain Gemini’s people-generating feature as part of an early access, English-language-only test.

Google wouldn’t say when the test will expand to the free Gemini tier and other languages.

“Gemini Advanced gives our users priority access to our latest features,” a Google spokesperson said. “This helps us gather valuable feedback while delivering a highly anticipated feature first to our premium subscribers.”

So what fixes did Google implement for people generation? According to the company, Imagen 3, the latest image-generating model built into Gemini, contains mitigations to make the people images Gemini produces more “fair.” For example, Imagen 3 was trained on AI-generated captions designed to “improve the variety and diversity of concepts associated with images in [its] training data,” according to a technical paper Google shared. And the model’s training data was filtered for “safety,” plus “review[ed] … with consideration to fairness issues,” claims Google.

We asked for more details about Imagen 3’s training data, but the spokesperson would only say that the model was trained on “a large dataset comprising images, text and associated annotations.”

“We’ve significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, collaborating with independent experts to ensure ongoing improvement,” the spokesperson continued. “Our focus has been on rigorously testing people generation before turning it back on.”

Imagen 3 and Gems

In a spot of better news, all Gemini users will get Imagen 3 within the week — minus people generation for those not subscribed to the premium Gemini tiers.

Google says that Imagen 3 understands the text prompts it translates into images more accurately than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and errors, Google claims, and is the best Imagen model yet for rendering text.

A sample from Google’s Imagen 3.
Image Credits: Google

To allay concerns about the potential for deepfakes, Imagen 3 will use SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to various forms of AI-originated media. Google previously announced Imagen 3 would use SynthID, so this doesn’t come as much surprise. But I’ll note that the contrast between how Google’s treating image generation in Gemini and how it’s treating it in other products, like its Pixel Studio, is a bit curious.

Another sample from Imagen 3.
Image Credits: Google

Alongside Imagen 3, Google’s rolling out Gems for Gemini — albeit only for Gemini Advanced, Business and Enterprise users. Like OpenAI’s GPTs, Gems are custom-tailored versions of Gemini that can act as “experts” on particular topics (e.g. vegetarian cooking).

Here’s how Google describes them in a blog post: “With Gems, you can create a team of experts to help you think through a challenging project, brainstorm ideas for an upcoming event, or write the perfect caption for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive, or difficult tasks.”

To create a Gem, users write instructions and give it a name, and they’re off to the races.

Gems are available on desktop and mobile in 150 countries and “most languages,” Google says (but they’re not supported in Gemini Live just yet). There are several examples at launch, including a “learning coach,” a “career guide,” a “brainstormer” and a “coding partner.”


We asked Google whether it plans to let users publish and use other users’ Gems, similar to GPTs on OpenAI’s GPT Store. The answer was “no,” basically.

“Right now, we’re focused on learning how people will use Gems for creativity and productivity,” the spokesperson said. “Nothing further to share at this time.”

Dakidarts
