This month, Google forced out a prominent AI ethics researcher after she voiced frustration with the company for making her withdraw a research paper. The paper pointed out the risks of language-processing artificial intelligence, the type used in Google Search and other text analysis products.
Among the risks is the large carbon footprint of developing this kind of AI technology. By some estimates, training an AI model generates as much carbon as building and driving five cars over their lifetimes.
I am a researcher who studies and develops AI models, and I am all too familiar with the skyrocketing energy and financial costs of AI research. Why have AI models become so power hungry, and how are they different from traditional data center computation?
Today’s training is inefficient
Traditional data processing jobs done in data centers include video streaming, email and social media. AI is much more computationally intensive because it needs to read through lots of data until it learns to understand it – that is, until it is trained.
This training is very inefficient compared to how people learn. Modern AI uses artificial neural networks, which are mathematical computations that mimic neurons in the human brain. The strength of each neuron’s connection to its neighbors is a parameter of the network called a weight. To learn how to understand language, the network starts with random weights and adjusts them until its output agrees with the correct answer.
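The adjust-the-weights loop described above can be sketched in a few lines of Python. This is a toy single-weight example of the general idea, not a real language model:

```python
import random

# Toy illustration of training by weight adjustment: a single "neuron"
# computing y = w * x learns the made-up rule y = 2 * x.
random.seed(0)
w = random.random()                        # start with a random weight
data = [(x, 2 * x) for x in range(1, 6)]   # inputs and correct answers

for step in range(200):                    # many rounds of adjustment
    for x, target in data:
        error = w * x - target             # how wrong the output is
        w -= 0.01 * error * x              # nudge the weight toward correct

print(round(w, 3))                         # converges to 2.0
```

Real networks have billions of weights and adjust them with gradient descent over huge batches of data, but the principle is the same: nudge the weights until the output matches the correct answer.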
A common way to train a language network is by feeding it lots of text from websites like Wikipedia and news outlets with some of the words masked out, and asking it to guess the masked words. An example is ‘my dog is cute,’ with the word ‘cute’ masked out. At first, the model gets them all wrong, but after many rounds of adjustment, the connection weights start to change and pick up patterns in the data. The network eventually becomes accurate.
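Building those masked-word training examples can be sketched as follows. The masking rate and the function below are illustrative assumptions, not BERT’s exact procedure:

```python
import random

random.seed(42)

def mask_sentence(sentence, mask_rate=0.15):
    """Hide a fraction of the words; the model must guess them back."""
    words = sentence.split()
    masked, answers = [], {}
    for i, word in enumerate(words):
        if random.random() < mask_rate:
            masked.append("[MASK]")
            answers[i] = word   # the correct answer the model is scored on
        else:
            masked.append(word)
    return " ".join(masked), answers

masked, answers = mask_sentence("my dog is cute")
print(masked, answers)
```

During training, the model’s guesses for each `[MASK]` position are compared against the stored answers, and the weights are adjusted to reduce the mistakes.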
A recent model called Bidirectional Encoder Representations from Transformers (BERT) used 3.3 billion words from English books and Wikipedia articles. Moreover, BERT reads this data set not once, but 40 times during training. By comparison, an average child learning to talk might hear 45 million words by age five, 3,000 times fewer than BERT.
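The comparison works out as straightforward arithmetic:

```python
# Arithmetic behind the comparison above.
bert_words = 3_300_000_000 * 40   # corpus read 40 times during training
child_words = 45_000_000          # words heard by an average five-year-old
print(bert_words // child_words)  # 2933 -- roughly 3,000 times more
```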
Looking for the right structure
What makes language models even more expensive to build is that this training process happens many times over the course of development. That is because researchers want to find the best structure for the network – how many neurons, how many connections between neurons, how fast the parameters should change during learning and so on. The more combinations they try, the better the chance that the network will achieve high accuracy. Human brains, in contrast, do not need to find an optimal structure – they come with a prebuilt structure that has been honed by evolution.
As companies and academics compete in the AI space, the pressure is on to improve on the state of the art. Even a 1 percent improvement in accuracy on difficult tasks like machine translation is considered significant and leads to good publicity and better products. But to get that 1 percent improvement, a researcher might train the model thousands of times, each with a different structure, until the best one is found.
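The search over structures can be pictured as a grid of settings, where every combination is a separate full training run. The specific values below are hypothetical, chosen only to show how the cost multiplies:

```python
from itertools import product

# Hypothetical structure search: each combination of layer count,
# neurons per layer and learning rate is one full training run.
num_layers = [2, 4, 8, 12, 24]
neurons_per_layer = [128, 256, 512, 768, 1024]
learning_rates = [1e-5, 3e-5, 1e-4, 3e-4]

configs = list(product(num_layers, neurons_per_layer, learning_rates))
print(len(configs))  # 5 * 5 * 4 = 100 full training runs
```

Add one more setting with a handful of options and the count multiplies again, which is why finding the best structure can mean thousands of training runs.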
Researchers at the University of Massachusetts Amherst estimated the energy cost of developing AI language models by measuring the power consumption of common hardware used during training. They found that training BERT once has the carbon footprint of a passenger flying a round trip between New York and San Francisco. However, searching over different structures – that is, training the algorithm multiple times on the data with slightly different numbers of neurons, connections and other parameters – made the cost the equivalent of 315 passengers, or an entire 747 jet.
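The kind of back-of-the-envelope estimate involved looks like this. The power draw, training time and grid emissions factor below are made-up illustrative numbers, not the study’s measurements:

```python
# Back-of-the-envelope carbon estimate for one training run.
# All three inputs are assumed values for illustration only.
power_kw = 10.0          # assumed average draw of the training hardware (kW)
hours = 96.0             # assumed length of one training run
emission_factor = 0.4    # assumed kg of CO2 emitted per kWh on the local grid

energy_kwh = power_kw * hours
co2_kg = energy_kwh * emission_factor
print(energy_kwh, co2_kg)  # roughly 960 kWh and 384 kg of CO2
```

Multiply that by the hundreds or thousands of runs in a structure search, and the flight comparisons above stop looking surprising.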
Bigger and warmer
AI models are also much bigger than they need to be, and growing bigger every year. A more recent language model similar to BERT, called GPT-2, has 1.5 billion weights in its network. GPT-3, which created a stir this year because of its high accuracy, has 175 billion weights.
Researchers have discovered that having bigger networks leads to better accuracy, even if only a tiny fraction of the network ends up being useful. Something similar happens in children’s brains when neuronal connections are first added and then pruned away, but the biological brain is much more energy efficient than computers.
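One way to see that only a fraction of a network ends up mattering is pruning: zeroing out the smallest weights and checking how many survive. A minimal sketch with made-up weights:

```python
# Pruning sketch: zero out the smallest-magnitude weights and count
# how many survive. The weights and threshold are invented for illustration.
weights = [0.9, -0.05, 0.6, 0.01, -0.8, 0.02, 0.7, -0.03]
threshold = 0.1

pruned = [w if abs(w) >= threshold else 0.0 for w in weights]
kept = sum(1 for w in pruned if w != 0.0)
print(kept, len(weights))  # 4 of 8 weights survive pruning
```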
AI models are trained on specialized hardware, such as graphics processing units, which draw more power than traditional CPUs. If you own a gaming laptop, it probably has one of these graphics processing units to create advanced graphics for, say, playing Minecraft RTX. You might also notice that it generates a lot more heat than an ordinary laptop.
All of this means that developing advanced AI models carries a large carbon footprint. Unless we switch to 100 percent renewable energy sources, AI progress may stand at odds with the goals of cutting greenhouse gas emissions and slowing climate change. The financial cost of development is also becoming so high that only a few select labs can afford to do it, and they will be the ones setting the agenda for what kinds of AI models get developed.
Do more with less
What does this mean for the future of AI research? Things may not be as bleak as they look. The cost of training might come down as more efficient training methods are invented. Similarly, while data center energy use was predicted to explode in recent years, this has not happened thanks to improvements in data center efficiency, more efficient hardware and cooling.
There is also a trade-off between the cost of training the models and the cost of using them, so spending more energy at training time to come up with a smaller model might actually make using it cheaper. Because a model will be used many times in its lifetime, that can add up to large energy savings.
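The trade-off amortizes like this; all numbers are invented for illustration:

```python
# Training-vs-use trade-off with made-up energy figures (kWh).
big_train, big_per_query = 100.0, 0.010      # cheap to find, costly to run
small_train, small_per_query = 300.0, 0.002  # costlier search, cheaper to run

def lifetime_energy(train, per_query, queries):
    """Total energy: one-time training cost plus per-use cost."""
    return train + per_query * queries

queries = 100_000  # times the model is used over its lifetime
big_total = lifetime_energy(big_train, big_per_query, queries)
small_total = lifetime_energy(small_train, small_per_query, queries)
print(big_total, small_total)  # roughly 1100 vs 500 kWh
```

With these assumed numbers, the extra 200 kWh spent finding the smaller model pays for itself well before the 100,000th query.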
In my lab’s research, we have been looking at ways to make AI models smaller by sharing weights, or using the same weights in multiple parts of the network. We call these shapeshifter networks because a small set of weights can be reconfigured into a larger network of any shape or structure. Other researchers have shown that weight sharing has better performance in the same amount of training time.
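Weight sharing in general can be sketched as several layers pointing at one underlying parameter block. This is a generic illustration of the idea, not the shapeshifter networks’ actual method:

```python
# Generic weight-sharing sketch: four layers reuse one 4x4 weight block,
# so only 16 unique weights exist instead of 64.
shared_block = [[0.1] * 4 for _ in range(4)]  # one 4x4 weight matrix
layers = [shared_block] * 4                   # all layers alias the same weights

unique_weights = 4 * 4           # parameters actually stored and trained
unshared_weights = 4 * (4 * 4)   # what four independent layers would need
print(unique_weights, unshared_weights)  # 16 64
```

Updating the shared block during training updates every layer at once, which is where the savings in memory and energy come from.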
Going forward, the AI community should invest more in developing energy-efficient training schemes. Otherwise, there is a risk that AI becomes dominated by a select few who can afford to set the agenda, including what kinds of models are developed, what kinds of data are used to train them, and what the models are used for.
This story originally appeared in The Conversation.