Say what you will about generative AI. But it’s commoditizing — or, at least, it appears to be.
In early August, both Google and OpenAI slashed prices on their budget-friendliest text-generating models. Google reduced the input price for Gemini 1.5 Flash (the cost to have the model process text) by 78% and the output price (the cost to have the model generate text) by 71%. OpenAI, meanwhile, decreased the input price for GPT-4o by half and the output price by a third.
According to one estimate, the average cost of inference — the cost to run a model, essentially — is falling at a rate of 86% annually. So what’s driving this?
For one, there’s not much to set the various flagship models apart in terms of capabilities.
Andy Thurai, principal analyst at Constellation Research, told me: “We expect the pricing pressure to continue with all AI models if there is no unique differentiator. If the consumption is not there, or if the competition is gaining momentum, all of these providers need to be aggressive with their pricing to keep the customers.”
John Lovelock, VP analyst at Gartner, agrees that commoditization and competition are responsible for the recent downward pressure on model prices. He notes that models have been priced on a cost-plus basis since inception — in other words, priced to recoup the millions of dollars spent to train them (OpenAI’s GPT-4 reportedly cost $78.4 million) and the server costs to run them (ChatGPT was at one point costing OpenAI ~$700,000 per day). But now data centers have reached a size — and scale — to support discounts.
Vendors, including Google, Anthropic, and OpenAI, have embraced techniques like prompt caching and batching to yield additional savings. Prompt caching lets developers store specific “prompt contexts” that can be reused across API calls to a model, while batching asynchronously processes groups of low-priority (and consequently cheaper) model inference requests.
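To make the savings mechanics concrete, here is a toy sketch in Python of how prompt caching and batching could each lower a bill. It is an illustration under stated assumptions, not any vendor's actual API or pricing: the `ToyModelEndpoint` class, the word-count tokenizer, and all three rates are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical rates, chosen only for illustration (not any vendor's pricing).
FULL_RATE = 1.0       # cost units per token processed from scratch
CACHED_RATE = 0.5     # cost units per token of a previously cached prefix
BATCH_DISCOUNT = 0.5  # multiplier applied to deferred, batched requests


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class ToyModelEndpoint:
    """Hypothetical endpoint that only tracks what each call would cost."""
    cached_prefixes: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def request(self, prefix: str, question: str) -> float:
        """Price an immediate call; a prefix seen before is billed at the cached rate."""
        prefix_rate = CACHED_RATE if prefix in self.cached_prefixes else FULL_RATE
        self.cached_prefixes.add(prefix)
        return count_tokens(prefix) * prefix_rate + count_tokens(question) * FULL_RATE

    def enqueue(self, prompt: str) -> None:
        """Defer a low-priority prompt instead of running it right away."""
        self.pending.append(prompt)

    def run_batch(self) -> float:
        """Price all deferred prompts together at the discounted batch rate."""
        total = sum(count_tokens(p) for p in self.pending) * FULL_RATE * BATCH_DISCOUNT
        self.pending.clear()
        return total


if __name__ == "__main__":
    endpoint = ToyModelEndpoint()
    context = "You are a helpful assistant for a bookstore"
    print(endpoint.request(context, "Which novels are new"))    # full-price prefix
    print(endpoint.request(context, "Which novels are cheap"))  # cached prefix
    endpoint.enqueue("Summarize review one")
    endpoint.enqueue("Summarize review two")
    print(endpoint.run_batch())                                 # discounted batch
```

In this sketch, the second call that reuses the same context costs less than the first because only the new question is billed at the full rate, and the deferred requests trade immediacy for a lower price. Real services differ in how caches expire and how quickly batches complete.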
Major open model releases like Meta’s Llama 3 are likely having an impact on vendor pricing, too. While the largest and most capable of these aren’t exactly cheap to run, they can be competitive with vendors’ offerings, cost-wise, when run on an enterprise’s in-house infrastructure.
The question is whether the price declines are sustainable.
Generative AI vendors are burning through cash — fast. OpenAI is said to be on track to lose $5 billion this year, while rival Anthropic projects that it will be over $2.7 billion in the hole by 2025.
Lovelock thinks that the high capex and operational costs could force vendors to adopt entirely new pricing structures.
“With cost estimates in the hundreds of millions of dollars to create the next generation of models, what will cost-plus pricing result in for the consumer?” he asked.
We’ll find out soon enough.
News
Musk supports SB 1047: X, Tesla and SpaceX CEO Elon Musk has come out in support of California’s SB 1047, a bill that requires makers of very large AI models to create and document safeguards against those models causing serious harm.
AI Overviews speak poor Hindi: Ivan writes that Google’s AI Overviews, which give AI-generated answers in response to certain search queries, make lots of mistakes in Hindi, like suggesting “sticky things” as something to eat during summer.
OpenAI backs AI watermarking: OpenAI, Adobe and Microsoft have thrown their support behind a California bill requiring tech companies to label AI-generated content. The bill is headed for a final vote in August, Max reports.
Inflection adds caps to Pi: AI startup Inflection, whose founders and most of whose staff were hired away by Microsoft five months ago, plans to cap free access to its chatbot Pi as the company’s focus shifts toward enterprise products.
Stephen Wolfram on AI: Ron Miller interviewed Stephen Wolfram, the founder of Wolfram Alpha, who said he sees philosophy entering a new “golden age” due to the growing influence of AI and all of the questions that it’s raising.
Waymo drives kids: Waymo, the Alphabet subsidiary, is reportedly considering a subscription program that would let teens hail one of its cars solo and send pickup and drop-off alerts to those kids’ parents.
DeepMind workers protest: Some workers at DeepMind, Google’s AI R&D division, are displeased with Google’s reported defense contracts — and they’re said to have circulated a letter internally to indicate as much.
AI startups fuel SPV buying: VCs are increasingly buying shares of late-stage startups on the secondary market, often in the form of financial instruments called special purpose vehicles (SPVs), as they try to get pieces of the hottest AI companies, Rebecca writes.
Research paper of the week
As we’ve written about before, many AI benchmarks don’t tell us much. They’re too simple, or too esoteric. Or there are glaring errors in them.
Aiming to develop better evaluations specifically for vision-language models (VLMs), i.e., models that can understand both photos and text, researchers at the Allen Institute for AI (AI2) and elsewhere recently released a test bench called WildVision.
WildVision consists of an evaluation platform that hosts around 20 models, including Google’s Gemini Pro Vision and OpenAI’s GPT-4o, and a leaderboard that reflects people’s preferences in chats with the models.
In developing WildVision, the AI2 researchers say they found that even the best VLMs hallucinated and struggled with contextual cues and spatial reasoning. “Our comprehensive analysis … indicates future directions for advancing VLMs,” they wrote in a paper accompanying the release of the testing suite.
Model of the week
It’s not a model per se, but this week, Anthropic launched its Artifacts feature for all users, which turns conversations with the company’s Claude models into apps, graphics, dashboards, websites and more.
Launched in preview in June, Artifacts — which is now available for free on the web and Anthropic’s Claude apps for iOS and Android — provides a dedicated window that shows the creations you’ve made with Claude. Users can publish and remix artifacts with the broader community, while subscribers to Anthropic’s Team plan can share artifacts in more locked-down environments.
Here’s how Michael Gerstenhaber, product lead at Anthropic, described Artifacts in an interview: “Artifacts are the model output that puts generated content to the side and allows you, as a user, to iterate on that content. Let’s say you want to generate code — the artifact will be put in the UI, and then you can talk with Claude and iterate on the document to improve it so you can run the code.”
Worth noting is that Poe, Quora’s subscription-based, cross-platform aggregator for AI models, including Claude, has a feature similar to Artifacts called Previews. But unlike Artifacts, Previews isn’t free — it requires paying $20 per month for Poe’s premium plan.
Grab bag
OpenAI might have a Strawberry up its sleeve.
That’s according to The Information, which reports that the company is trying to release a new AI product that can reason through problems better than its existing models. Strawberry — previously called Q*, which yours truly wrote about last year — is said to be able to solve complex math and programming problems it hasn’t seen before, as well as word puzzles like The New York Times’ Connections.
The downside is that it takes more time to “think.” It’s unclear how much more time it needs compared to OpenAI’s best model today, GPT-4o.
OpenAI hopes to launch some form of Strawberry-infused model this fall, potentially on its AI-powered chatbot platform ChatGPT. The company’s also reportedly using Strawberry to generate synthetic data to train models, including its next major model code-named Orion.
Expectations for Strawberry are sky-high in AI enthusiast circles. Can OpenAI meet them? It’s difficult to say — but I’m hoping for an improvement in ChatGPT’s spelling abilities, at the very least.