Today, most organizations understand that an AI model is only as good as the data it is trained on. Companies put particular focus on sourcing and annotating data correctly, but for computer vision models the task becomes considerably harder. This is largely due to the scarcity of high-quality 2D and 3D visual training data.
A study conducted by Datagen itself found that 99% of computer vision (CV) teams have had a machine learning (ML) project canceled because of insufficient training data, while 100% have experienced delays for the same reason. At the core, these teams either lack domain-specific, fair and correctly annotated training data, or what they have is simply not enough to deliver the expected results.
Datagen’s synthetic data platform
Founded in 2018, Datagen tackles this problem by providing computer vision teams with a self-service platform for designing synthetic datasets. It allows users to create on-demand datasets of people and customize them according to parameters such as ethnicity, gender, environmental interaction and expression. This way, companies get training data for their application not only at large scale but also with high variance. They can also define how much of the total dataset should be devoted to one particular subject.
“Our platform includes a range of tools and generators, including application-specific tools, such as our ‘in-cabin automotive’ solution, which is an interface optimized for producing data to train driver monitoring systems (DMS). Datagen Faces Generator, meanwhile, allows the user to control attributes like age, gender, facial expression and gaze direction, as well as scene-specific parameters, such as camera location and lighting,” Datagen CEO Ofir Zuk (Chakon) told VentureBeat.
“With our application-specific solutions, like the aforementioned in-cabin automotive generator, users can control the identity of the subject and generate that subject playing out certain common DMS scenarios, such as ‘Falling asleep at the wheel’ or ‘Using their cellular phone.’ For each of these scenarios, the user can generate a range of subjects engaging in these activities in 10-second animated clips – again, with variation in the scene, lighting, camera angle, etc. Once the parameters are set to the user’s liking, the engine then generates a robust, targeted dataset of still images and/or animated clips that can be applied in training,” he explained.
Ultimately, the solution enables enterprises to do away with manual sourcing and annotation and instead obtain the 2D and 3D visual data they need at scale and with ease. Computer vision teams can use it to get to market faster, whether they are developing applications for robotics, smart security/monitoring or another area.
“Labeling real-world visual data is not only incredibly time-consuming and resource-intensive, but it’s also a major source of errors and inconsistencies. With Datagen, you’re able to not only skip the time and expense of human annotation but also ensure much higher data quality. Datagen modalities provide accurate annotations for each image — for example, the exact head yaw/pitch/roll, the exact direction of the eye gaze — at levels of detail and accuracy that cannot be achieved with real-life data and manual annotation,” the CEO added.
With the fresh round, which was led by Scale Venture Partners, Datagen plans to accelerate growth and strengthen its position as the leading synthetic data provider for computer vision projects. The company’s revenue has grown eightfold year over year since launch, and its customer base includes Fortune 100 companies and three of the top five tech giants.
While the CEO did not name those customers, he noted that Datagen is not tied to a single industry or use case within computer vision.
“We’ve already seen considerable success with our application-specific offerings, such as our in-cabin automotive solution. Moving forward, we’ll be expanding our human-centric offering to additional domains that cater to our customers’ needs. The Metaverse will also be an even larger area of focus for us moving forward. As interest and demand continue to outpace development, we see a significant opportunity for synthetic data to serve as a significant enabler of the Metaverse. Lastly, we’re actively developing additional tools and solutions on top of data generation, with the goal of establishing a comprehensive, streamlined infrastructure for computer vision,” he emphasized.
Demand for synthetic data continues to surge
Globally, demand for synthetic data is expected to keep growing across AI applications, including computer vision model training. According to Gartner, by 2024, 60% of the data used for the development of analytics and AI projects will be synthetically generated, and by 2030, synthetic data will surpass real data as the preferred tool for training AI models.
Other companies operating in the same space include Mostly AI, Rendered AI, YData and Synthetaic. However, Datagen claims to stand out because it allows CV teams to simulate dynamic humans and objects in their context. Teams can generate data, train, evaluate and repeat the cycle to improve the accuracy of their models.
“The Datagen Platform uses proprietary, virtual camera technology so users can ‘photograph’ real-world 3D data in photo-realistic simulations, thus creating hyperrealistic environments and training data. Finally, Datagen’s Zero PII design provides teams with photo-realistic, human training data without any concerns around personally identifiable information (PII),” the CEO said. “By design, Datagen’s product infrastructure supports modularity and expandability of use cases and domains with close to zero overhead. This way Datagen can offer a rapidly increasing number of use cases that cover the growing needs of enterprise customers.”