Reddit COO Says Platform is 'Incredibly Important' for Training AI

When it comes to advancements in the technology space, AI has taken center stage in terms of rapid evolution and influence. It is no surprise that companies are investing heavily in this technology, often at astronomical cost.

Peculiar as it might seem, according to Reddit Chief Operating Officer (COO) Jen Wong, the popular online platform has the potential to contribute greatly to the advancement of AI.

Come to think of it, it makes a lot of sense. The platform holds 19 years’ worth of user data, which Wong has reportedly said will be highly beneficial for the evolution and development of language models.

The platform has struck a deal with Google to provide its content for AI training purposes. Like any long-running social media platform, it holds a large library of “original human ideas” recorded over the years, which has now become a valuable resource for AI language models.

In a recent interview, Reddit COO Jen Wong highlighted the platform’s significance in the realm of AI training.

The platform has over 70 million daily active users and 850 million monthly users. The millions of posts, comments, and messages serve as a treasure trove of data spanning a wide range of topics and interests.

AI models that are meant to understand human sentiment and context can benefit greatly from platforms like these.

The Abundance of Diverse, Human-Generated Data

For years, people have generated content on social media platforms like Twitter, Facebook, Instagram, or in this case, Reddit, almost all of which is authentic and original.

Anyone who aims to study the human experience can very well grasp its essence through these platforms. Just imagine the scope of what an AI can comprehend through decades of this data.

One of the primary factors that makes Reddit valuable for AI training is the sheer abundance and diversity of human-generated data on the platform.

Unlike curated datasets or synthetic data, Reddit’s content reflects the nuances, complexities, and idiosyncrasies of real-world human interactions.

For instance, ChatGPT, the most popular AI language model, was trained on a variety of text data such as books, articles, research papers, and web pages. Many users have reported flaws in the system, the most infamous of which is the AI’s weak grasp of human context.

ChatGPT can give nonsensical or inaccurate responses. It does not always sound human or, more accurately, its output can still be told apart from human writing.

Researchers are constantly trying to develop methods to improve AI systems, especially their ability to mimic human speech.

A library of data that captures human linguistic behavior is invaluable for training AI models to understand and interpret natural language.

Reddit COO on Language Programming and Linguistics

Have you ever wondered how ChatGPT understands what to say? How does an AI generate human-like responses?

Natural language processing (NLP) is a branch of artificial intelligence that enables computers to understand human speech and text, which is referred to as natural language. The goal is to interpret, manipulate, and generate human language.

Human language is filled with intricacies that make it difficult for any language model to replicate. A model does not inherently understand the intended meaning behind a particular sentence; it has to learn how to respond from the examples in its training data.

The complex parts of a language such as English, like sarcasm, idioms, metaphors, and usage exceptions, are difficult for language models to capture.
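
To see why sarcasm in particular is so slippery, consider a minimal sketch in Python of a keyword-counting sentiment scorer. This is a toy written for illustration, not how ChatGPT or any production model works. The word lists and the function name naive_sentiment are hypothetical.

```python
import re

# Hypothetical, hand-picked word lists used only for this illustration.
POSITIVE_WORDS = {"great", "love", "wonderful", "fantastic"}
NEGATIVE_WORDS = {"terrible", "hate", "awful", "broken"}


def naive_sentiment(text: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


sincere = "I love this phone, the camera is fantastic."
sarcastic = "Oh great, my phone died again. I just love that."

print(naive_sentiment(sincere))    # 2
print(naive_sentiment(sarcastic))  # 2
```

Both sentences get the same positive score, even though a human reader immediately hears the second one as a complaint. Counting words misses the tone entirely, which is the kind of gap that large amounts of real human conversation are expected to help close.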

As AI continues to reshape industries, companies must prioritize access to diverse data to train their AI systems.

Business leaders and AI developers should recognize the value that Reddit, along with other social media platforms, holds for training AI models.