# Why Frontier Data?

<figure><img src="/files/o5ei1OKKkPVk1dotwOYG" alt=""><figcaption></figcaption></figure>

**Data innovation is at the heart of today's frontier AI, and it will be at the heart of tomorrow's.** A striking example is ChatGPT, whose notable capabilities were made possible by Reinforcement Learning from Human Feedback (RLHF). RLHF relies on a novel kind of data: human preference data, which records human judgments about which of several AI outputs is better. Although OpenAI popularized the approach in 2022, human preference data quickly became a standard tool for aligning AI with human goals. This rapid adoption shows how profoundly data innovation shapes AI progress.

**Data innovation will be equally crucial in advancing AI from human-level to superhuman performance.** On the quantitative side, we are nearing the limits of publicly available data. On the qualitative side, as AI surpasses human intelligence across various fields, even the most knowledgeable humans will no longer be able to provide effective guidance. Meeting these challenges demands new ways of creating and curating data, and these innovations will be pivotal to unlocking AI's next wave of breakthroughs.

**However, most data companies miss out on the significant upside of data because they neither innovate nor own the datasets they produce.** They provide data labelling as a service and capture only marginal value. In contrast, AI companies capture most of the value by driving data innovation in-house and outsourcing only the production work. For example, even though Scale AI provided human preference data for ChatGPT, it did not share in ChatGPT's remarkable upside, a pattern repeated throughout the industry.

**Gata aims to become a decentralized version of** [**LAION**](https://laion.ai)**, focused on researching and producing frontier data necessary to unlock frontier-level AI advancements. By using token-based incentives, Gata can rapidly translate AI research into large-scale data production. Through the ownership of frontier data, we ensure that individuals who help create this data also own a share of the AI breakthroughs it enables.**

## A Note on Non-Frontier Data

Non-frontier data—such as image segmentation for autonomous driving or sentiment analysis for social media—remains a viable business segment. However, Gata chooses to focus on frontier data because it powers the most advanced AI innovation. Non-frontier data typically serves narrow use cases and is generated on demand for a single client. As a result, owning non-frontier data offers little strategic leverage for broader AI advancement. In contrast, developing frontier data aligns with Gata’s mission to drive frontier AI advancements and ensures that those who contribute to its creation can share in the value it generates.
