Labeling and annotation platforms might not get the attention flashy new generative AI models do. But they’re essential. The data on which many models train must be labeled, or the models won’t be able to interpret that data during the training process.
Annotation is a vast undertaking, requiring thousands to millions of annotations for the larger and more sophisticated datasets in use. To help ease the burden, Eric Landau and Ulrik Hansen founded Encord, which they describe as a “data development” platform for companies managing and preparing their data for AI models.
Now, the company has an additional $30 million in its coffers thanks to a Series B round led by Next47. The new capital, which brings Encord’s war chest to $50 million, will be put toward doubling the size of Encord’s product, engineering and AI research teams over the next six months and expanding the company’s San Francisco offices, the co-founders said.
“By the end of the year, we expect to grow our team to 100 employees, up from 70 currently,” Landau added. “We now have dual headquarters in London and San Francisco, with team members across the globe.”
Landau first worked with big data systems while conducting particle physics research as an undergraduate at Stanford. Hansen worked in global markets at J.P. Morgan, where he dealt in emerging market derivatives.
Hansen says that the seed of the idea for Encord came while he was working on data-intensive AI projects during a computer science master’s program at Imperial College London. Frustrated by the time-consuming nature of data curation and labeling, Hansen met with Landau, whom he knew from the entrepreneurial scene in London, about ways they might solve the data problem together.
“Combining Hansen’s software development expertise with my insights from quantitative research to automate data development, we launched the first iteration of Encord’s product during Y Combinator in the spring of 2021,” Landau told Dakidarts. “The Encord platform equips enterprises with tools to prepare their data for AI and assess how effectively that data supports their models.”
With the size of the data annotation and labeling market estimated to grow to $3.6 billion by 2027, Encord is one of many vendors competing for contracts. Besides the elephant in the room — Scale AI — there are startups like Datasaur, which lets customers create models automatically from sets of labels; Heartex, which is building an open source data “development” platform; and data annotation tooling provider Dataloop.
Encord stands apart, Landau says, through the versatility of its platform.
Using Encord, teams can explore and visualize datasets — including image, video and voice datasets — pulled in from private and public cloud storage and compare the performance of different models trained on the same sets. The platform attempts to detect model accuracy issues and suggest additional training data that could help to rectify those issues.
“Unlike piecemeal solutions that only address specific parts of your data stack, Encord lets you consolidate all your data workflows in one platform,” Landau said. “Through this consolidation, companies gain traceability that sheds light on the often opaque ‘black box’ of AI, helping to understand why a model makes specific decisions.”
Encord’s strategy seems to be working well so far. The company has 120 customers, including Philips, buzzy AI startup Synthesia and healthcare providers Cedars-Sinai and Northwell Health, as well as contracts with unnamed military and government agencies. Landau claims that Encord increased revenue 4x over the last year and that it could be cash-flow positive by 2025 if it weren’t continuing to grow headcount.
“We’re feeling the opposite of a slowdown,” Landau said. “That being said, we are aware of the broader market conditions and have taken a conservative approach to deploying capital.”
Other participants in the new funding round included Y Combinator, CRV and Crane Venture Partners.