Naver trained a “GPT-3-like” Korean language model
Naver, a Seongnam, South Korea-based company that operates an eponymous search engine, announced this week that it has trained one of the largest AI language models of its kind, called HyperCLOVA. Naver says the system has learned 6500 times more Korean data than OpenAI’s GPT-3 and contains 204 billion parameters, the parts of the machine learning model learned from historical training data. (GPT-3 has 175 billion parameters.)
For nearly a year, OpenAI’s GPT-3 has remained among the largest AI language models ever created. Via an API, people have used it to automatically write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate Python code for deep learning. But GPT-3 has key limitations, chief among them that it is only available in English.
According to Naver, HyperCLOVA was trained on 560 billion tokens of Korean data, 97% of it in the Korean language, compared with the 499 billion tokens on which GPT-3 was trained. Tokens, a way of separating pieces of text into smaller units in natural language, can be words, characters, or parts of words.
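To make the three kinds of token concrete, here is a minimal Python sketch. It is a toy illustration only: the function names are hypothetical, and the fixed-size chunking in `subword_tokens` is a deliberately naive stand-in for the learned subword vocabularies that production models use. Nothing here reflects how HyperCLOVA or GPT-3 actually tokenize text.

```python
def word_tokens(text):
    """Split on whitespace: each token is a whole word."""
    return text.split()


def char_tokens(text):
    """Each non-whitespace character is its own token."""
    return [ch for ch in text if not ch.isspace()]


def subword_tokens(text, size=3):
    """Cut each word into pieces of at most `size` characters.

    Assumption: fixed-size chunks are a naive placeholder for a
    learned subword scheme; real tokenizers choose pieces from data.
    """
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces


sample = "language models"
print(word_tokens(sample))     # whole words
print(char_tokens(sample))     # single characters
print(subword_tokens(sample))  # parts of words
```

The same sentence yields very different token counts under each scheme, which is why token totals such as 560 billion and 499 billion are only roughly comparable across models that tokenize differently.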
In a translated press release, Naver said it will use HyperCLOVA to deliver “differentiated” experiences across its services, including the Naver search engine’s autocorrect feature. “Naver plans to support HyperCLOVA [for] small and medium businesses, creators, and startups,” the company said. “Since AI can be harnessed with a few-shot learning method that provides straightforward explanations and examples, anyone who is not an AI expert can easily create AI services.”
OpenAI’s policy director, Jack Clark, called HyperCLOVA a “notable” achievement because of the scale of the model and because it fits in with the trend of generative model diffusion, with several players developing “GPT-3 style” models. In April, a research team from Chinese company Huawei quietly detailed PanGu-Alpha (stylized PanGu-α), a 750-gigabyte model with up to 200 billion parameters that was trained on 1.1 terabytes of Chinese-language ebooks, encyclopedias, news, social media, and web pages.
“Generative models ultimately reflect and amplify the data they are trained on – so different nations care a lot about how their own culture is represented in these models. Therefore, Naver’s announcement is part of a general trend for different nations to assert their own AI capacity [and] capability via training frontier models like GPT-3,” Clark wrote in his weekly Import AI newsletter. “[We’ll] wait for more technical details to see if [it’s] really comparable to GPT-3.”
Some experts believe that while HyperCLOVA, GPT-3, PanGu-α, and similar models are impressive in terms of performance, they don’t move the ball forward on the research side of the equation. Rather, they are prestige projects that demonstrate the scalability of existing techniques or serve as a showcase for a company’s products.
Naver does not claim that HyperCLOVA overcomes other blockers in natural language, like correctly answering math problems or responding to questions without paraphrasing training data. More problematically, it is also possible that HyperCLOVA contains the types of bias and toxicity found in models like GPT-3. Among others, prominent AI researcher Timnit Gebru has questioned the wisdom of building large language models, examining who benefits from them and who is harmed. The effects of AI and machine learning model training on the environment have also been raised as serious concerns.
The co-authors of an OpenAI and Stanford paper suggest ways to address the negative consequences of large language models, such as enacting laws that require companies to acknowledge when text is generated by AI, perhaps along the lines of California’s bot law.
Other recommendations include:
- Training a separate model that acts as a filter for content generated by a language model
- Deploying a suite of bias tests to run models through before allowing people to use them
- Avoiding certain specific use cases
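The first two recommendations can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the names `is_acceptable`, `filtered_generate`, and `run_bias_suite` are hypothetical, and the word blocklist is a crude placeholder for the separately trained filter model the paper's co-authors have in mind. It is not taken from the paper or from any vendor's API.

```python
from typing import Callable, Iterable, List

# Hypothetical placeholder; a real filter would be a trained classifier.
BLOCKLIST = {"badword"}


def is_acceptable(text: str, blocklist=BLOCKLIST) -> bool:
    """Stand-in filter: reject text containing any blocklisted word."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words.isdisjoint(blocklist)


def filtered_generate(generate: Callable[[str], str], prompt: str,
                      fallback: str = "[withheld]") -> str:
    """Run the language model, then pass its output through the filter."""
    candidate = generate(prompt)
    return candidate if is_acceptable(candidate) else fallback


def run_bias_suite(generate: Callable[[str], str],
                   prompts: Iterable[str]) -> List[str]:
    """Return the test prompts whose outputs the filter rejects."""
    return [p for p in prompts if not is_acceptable(generate(p))]
```

In this arrangement `generate` is any callable that maps a prompt to text, so the same filter can gate live traffic through `filtered_generate` and screen a model before release through `run_bias_suite`.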
The consequences of failing to take any of these steps could be catastrophic over the long term. In recent research, the Center on Terrorism, Extremism and Counterterrorism at the Middlebury Institute of International Studies asserts that GPT-3 could reliably generate “informative” and “influential” text that might radicalize people into far-right extremist ideologies and violent behavior. And toxic language models deployed in production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English,” for example, to ensure the models work better for them, which might discourage minority speakers from engaging with the models in the first place.