Why Frontier Data?
Through frontier data, Gata seeks to drive, rather than follow, frontier AI advancements.
Data innovation is at the heart of today's frontier AI, and tomorrow's. A striking example is ChatGPT, whose notable capabilities were made possible by Reinforcement Learning from Human Feedback (RLHF), which relies on human preference data: human judgments comparing alternative AI outputs. Although OpenAI popularized the approach only in 2022, human preference data has quickly become standard in aligning AI with human goals. This rapid adoption highlights the profound impact of data innovation on AI advancement.
Data innovation will be equally crucial in advancing AI from human-level to superhuman performance. On the quantitative side, we are nearing the limits of publicly available and accessible data. On the qualitative side, as AI exceeds human intelligence across various fields, even the most knowledgeable humans will no longer be able to provide effective guidance. Meeting these challenges demands new ways of creating and curating data, innovations that will be pivotal to unlocking AI's next wave of breakthroughs.
However, most data companies miss out on the significant upside of data because they neither innovate nor own the datasets they produce. They provide data labelling as a service and capture marginal value. In contrast, AI companies capture the majority of value by driving data innovation in-house and outsourcing data production. For example, even though Scale AI provided human preference data for ChatGPT, it did not share in ChatGPT's remarkable upside, a pattern repeated throughout the industry.
Gata aims to become a decentralized version of LAION, focused on researching and producing frontier data necessary to unlock frontier-level AI advancements. By using token-based incentives, Gata can rapidly translate AI research into large-scale data production. Through the ownership of frontier data, we ensure that individuals who help create this data also own a share of the AI breakthroughs it enables.
Non-frontier data, such as image segmentation for autonomous driving or sentiment analysis for social media, remains a viable business segment. However, Gata chooses to focus on frontier data because it powers the most advanced AI innovation. Non-frontier data typically serves narrow use cases and is generated on demand for a single client. As a result, owning non-frontier data offers little strategic leverage for broader AI advancement. In contrast, developing frontier data aligns with Gata's mission to drive frontier AI advancements and ensures that those who contribute to its creation can share in the value it generates.