DeepMind Experimenting with Its Nascent Gopher 280 Billion Parameter Language Model
When it comes to language models, the name of the game is the development of ever more powerful models that push the boundaries of generating accurate and usable text from large stores of data.
The latest research comes from Alphabet’s DeepMind division, which on Dec. 8 unveiled Gopher, a new 280 billion parameter language model, along with several smaller models, as part of work aimed at delivering further insights in this fast-growing area of AI and machine learning.
The experiments analyzed the performance of six Transformer-based language models, ranging in size from 44 million parameters up to Gopher’s 280 billion, across 152 diverse tasks, comparing how they performed against other language models in use.
According to a 118-page paper detailing the results of the experiments, DeepMind researchers achieved state-of-the-art performance on the majority of the tests.
Performance gains occurred as the language models were scaled up, particularly in areas such as reading comprehension, fact-checking and the identification of toxic language, the company said in a blog post. At the same time, logical and mathematical reasoning saw less benefit from the larger model. The research provided a holistic analysis of the training dataset and the model’s behavior, including the intersection of model scale, bias and toxicity. The research also applied the language models to topics such as AI safety and the mitigation of downstream harms caused by the technology.
The Gopher results were described in one of three research papers released by DeepMind on Dec. 8. The other papers were a study of ethical and social risks associated with large language models and a paper investigating a new architecture with better training efficiency.
In a blog post about the Gopher research, DeepMind researchers Jack Rae, Geoffrey Irving and Laura Weidinger wrote that their experiments found Gopher’s performance exceeded that of existing language models on several tasks, including the Massive Multitask Language Understanding (MMLU) benchmark. On MMLU, “Gopher demonstrates a significant advancement towards human expert performance over prior work,” the researchers wrote. In addition to quantitative evaluations of Gopher, the researchers also explored the model through direct interaction. “Among our key findings was that, when Gopher is prompted towards a dialogue interaction (like in a chat), the model can sometimes provide surprising coherence.”
For example, in one experiment, Gopher was able to discuss cell biology and provide a correct citation despite no dialogue-specific fine-tuning, the researchers noted.
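Gopher itself has not been released publicly, so the dialogue prompting the researchers describe can only be illustrated with an open stand-in. The minimal Python sketch below assumes the Hugging Face transformers library and uses GPT-2 in place of Gopher; the prompt text and sampling settings are illustrative assumptions, not details from DeepMind’s paper. The idea is simply to frame the prompt as a chat transcript and let the model continue it.

# Minimal sketch of dialogue-style prompting. GPT-2 (via Hugging Face
# transformers) stands in for Gopher, which is not publicly available;
# the prompt and sampling settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Framing the prompt as a chat transcript nudges the model to continue the dialogue.
prompt = "User: What does a ribosome do in a cell?\nGopher:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=60,                    # keep the continuation short
    do_sample=True,                       # sample rather than decode greedily
    top_p=0.9,                            # nucleus sampling
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))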
“However, our research also detailed several failure modes that persist across model sizes, amongst them a tendency for repetition, the reflection of stereotypical biases, and the confident propagation of incorrect information.”
Understanding those failures, however, can be helpful, the researchers wrote. “This type of analysis is important, because understanding and documenting failure modes gives us an insight into how large language models could lead to downstream harms and shows us where mitigation efforts in research should focus to address those issues.”
This language modeling work is crucial, the researchers wrote, “because the development and study of more powerful language models – systems that predict and generate text – have tremendous potential for building [ever more] advanced AI systems” to benefit humanity by summarizing information, providing expert advice and following instructions via natural language. “Developing beneficial language models requires research into their potential impacts, including the risks they pose. This includes collaboration between experts from varied backgrounds to thoughtfully anticipate and address the challenges that training algorithms on existing datasets can create.”
The DeepMind work with the Gopher model comes as similar language modeling projects continue at OpenAI, Nvidia, Inspur and others.
OpenAI’s GPT-3 project is a massive natural language model that runs exclusively on Microsoft Azure. GPT-3, which stands for Generative Pre-trained Transformer 3, is an autoregressive language model with 175 billion parameters, which OpenAI claims is ten times more than any previous non-sparse language model. The first version, GPT-1, arrived in 2018, while the second version, GPT-2, debuted in 2019. With the release of GPT-3 in 2020, natural language processing (NLP) gained more power and use cases in the enterprise than ever before.
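The “autoregressive” label means the model generates text one token at a time, with each prediction conditioned on all the tokens produced so far. The toy Python loop below makes that explicit; it is a sketch that again assumes GPT-2 from the Hugging Face transformers library, since GPT-3 itself is only reachable through OpenAI’s hosted API.

# Toy illustration of autoregressive decoding: repeatedly predict the next
# token from the tokens generated so far. GPT-2 is a small, freely
# downloadable stand-in for GPT-3 here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Natural language processing lets software", return_tensors="pt").input_ids
for _ in range(10):                          # generate ten tokens, one at a time
    logits = model(ids).logits               # scores for every vocabulary token
    next_id = logits[0, -1].argmax()         # greedy choice: most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(ids[0]))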
In November, Nvidia unveiled its new NeMo Megatron large language model framework and its latest customizable 530 billion parameter Megatron-Turing model at its GTC21 conference. The framework trains language models with trillions of parameters, while the Megatron-Turing NLG (natural language generation) 530 billion parameter customizable large language model will be trainable for new domains and languages, according to Nvidia.
In October, China-based Inspur AI Research revealed the availability of its Yuan 1.0 language model, which has 245.7 billion parameters and was trained on 5TB of data. Yuan 1.0 was built from the ground up as a model for the Chinese language, which is complex and required a development approach distinct from English, according to Inspur AI Research.
DeepMind is a U.K.-based AI research company that was acquired by Google in 2014, four years after its founding, and is now part of Google’s parent company, Alphabet.