OpenAI, the developer behind ChatGPT, is advocating the use of artificial intelligence (AI) in content moderation, saying it could make social media platforms more efficient by speeding up difficult tasks.
The company said that its latest GPT-4 AI model could significantly shorten content moderation timelines from months to hours, ensuring improved consistency in labeling.
Moderating content is challenging for social media companies like Facebook parent Meta, necessitating the coordination of numerous moderators globally to prevent users from accessing harmful material like child pornography and highly violent images.
“The process (of content moderation) is inherently slow and can lead to mental stress on human moderators. With this system, the process of developing and customizing content policies is trimmed down from months to hours,” the company said in a statement.
According to the statement, OpenAI is actively investigating the use of large language models (LLMs) to tackle these issues. Its language models, such as GPT-4, are suited to content moderation because they can make moderation decisions guided by policy guidelines.
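To make the idea of a policy-guided decision concrete, here is a minimal sketch of how a written policy and a piece of content might be combined into a single prompt, with the model's reply reduced to one label. This is not OpenAI's implementation. The policy text, the labels K0 and K1, and every function name are hypothetical, and `stub_model` is a keyword-based stand-in so the example runs without calling any real model.

```python
from typing import Callable

# Hypothetical policy text and label set, invented for illustration only.
POLICY = """\
K0: No policy violation.
K1: Content that gives instructions for violence.
"""
LABELS = ("K0", "K1")


def build_prompt(policy: str, content: str) -> str:
    """Combine the written policy and the content into one instruction."""
    return (
        "You are a content moderator. Apply the policy below.\n\n"
        f"Policy:\n{policy}\n"
        f"Content:\n{content}\n\n"
        f"Answer with exactly one label from: {', '.join(LABELS)}."
    )


def parse_label(reply: str) -> str:
    """Return the first known label in the reply, or flag it for human review."""
    for token in reply.replace(",", " ").replace(".", " ").split():
        if token.upper() in LABELS:
            return token.upper()
    return "NEEDS_HUMAN_REVIEW"


def moderate(content: str, model: Callable[[str], str], policy: str = POLICY) -> str:
    """Label one piece of content using whatever model callable is supplied."""
    return parse_label(model(build_prompt(policy, content)))


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; keys off one word so the example runs offline."""
    content = prompt.split("Content:\n", 1)[1]
    return "K1" if "attack" in content.lower() else "K0"


if __name__ == "__main__":
    print(moderate("How to attack a person step by step", stub_model))
    print(moderate("Nice weather today", stub_model))
```

Because the policy is plain text passed in as an argument, changing the rules means editing a string rather than retraining moderators, which is the kind of speedup the statement describes. Replies that contain no recognizable label are routed to a person instead of being guessed at.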
GPT-4’s predictions can be used to refine smaller models that handle large volumes of data. The approach improves content moderation in several ways, including more consistent labels, a faster feedback loop and a lighter mental burden on human moderators.
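One way such a feedback loop could work, sketched under the assumption that a small set of examples has already been labeled by human policy experts: compare the model's labels with the human ones, measure how often they agree, and send the disagreements back for review so that either the policy wording or the labels can be corrected. The example IDs, labels, and function name below are hypothetical.

```python
from typing import Dict, List, Tuple


def agreement_report(
    human: Dict[str, str], model: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Return the agreement rate and the IDs where the two label sets differ."""
    # Only compare examples that both the humans and the model have labeled.
    shared = sorted(set(human) & set(model))
    if not shared:
        return 0.0, []
    disagreements = [i for i in shared if human[i] != model[i]]
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements


# Hypothetical labels for four examples.
human_labels = {"ex1": "K0", "ex2": "K1", "ex3": "K0", "ex4": "K1"}
model_labels = {"ex1": "K0", "ex2": "K1", "ex3": "K1", "ex4": "K1"}

rate, to_review = agreement_report(human_labels, model_labels)
print(f"agreement: {rate:.0%}; review: {to_review}")
# prints: agreement: 75%; review: ['ex3']
```

Each disagreement is a prompt for a person to decide whether the policy was ambiguous or the model simply misapplied it. Repeating the comparison after every policy edit is what makes the loop quick.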
The statement highlighted that OpenAI is currently working to enhance GPT-4’s prediction accuracy. One avenue being explored is the integration of chain-of-thought reasoning or self-critique. Additionally, it is experimenting with methods to identify unfamiliar risks, drawing inspiration from constitutional AI.
OpenAI’s goal is to utilize models to detect potentially harmful content based on broad descriptions of harm. Insights gained from these endeavors will contribute to refining current content policies or crafting new ones in uncharted risk domains.
On Aug. 15, OpenAI CEO Sam Altman clarified that the company refrains from training its AI models using user-generated data.