Microsoft, OpenAI may have solved a fundamental AI bottleneck

µ-Parametrization could be the key to tuning hyperparameters for massive AI models


Microsoft and OpenAI have developed a new method for optimizing massive AI models, such as GPT-3, that are too expensive to train multiple times.

A blog post published by Microsoft Research describes a technique called µ-Parametrization (or µP), which exploits similarities between the behavior of small- and large-scale AI models to reduce the compute needed to tune them.

Although you’d need a doctorate to make sense of the specifics, the essential message is this: with µ-Parametrization, it should become cheaper and simpler to develop larger-scale AI models that outperform those available today.

Optimizing AI models

As explained in the blog post, one reason large AI models are difficult to train effectively is that we have little insight into how their behavior changes as they scale. As a result, the larger the AI model, the less well-tuned researchers currently expect it to be.

However, µ-Parametrization offers a route to tuning large-scale models at much lower cost and with much greater efficiency, by capitalizing on the insight that, under certain conditions, neural networks of different sizes share the same optimal hyperparameters (HPs).

Essentially, this means hyperparameters tuned on a small model can be transferred to a much larger one, instead of tuning an entire multi-billion-parameter model directly.
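To make the idea concrete, here is a minimal Python sketch of one µP-style rule, under simplifying assumptions: only the model's width changes, the optimizer is Adam, and the function name `mup_adam_lr`, the `base_width` argument and the layer categories are hypothetical, not taken from Microsoft's package. The global learning rate tuned on the small model is reused unchanged; the parametrization itself rescales per-layer learning rates by the width ratio.

```python
def mup_adam_lr(base_lr: float, base_width: int, width: int, layer_kind: str) -> float:
    """Per-layer Adam learning rate for a model of `width`, given a global
    learning rate `base_lr` that was tuned on a small proxy of `base_width`.

    Simplified µP-style rule (an illustrative assumption, not the package's API):
      * "hidden" and "output" weight matrices: scale LR by base_width / width
      * "input" weights and "bias" vectors: keep LR unchanged
    """
    ratio = base_width / width
    if layer_kind in ("hidden", "output"):
        return base_lr * ratio
    if layer_kind in ("input", "bias"):
        return base_lr
    raise ValueError(f"unknown layer kind: {layer_kind!r}")


# The same tuned value is reused at every width; only the per-layer
# scaling applied by the parametrization changes as the model gets wider.
tuned_lr = 1e-3  # hypothetical value found by tuning the small proxy
for width in (256, 1024, 4096):
    print(width, mup_adam_lr(tuned_lr, 256, width, "hidden"))
```

The design choice worth noticing is that the hyperparameter being tuned stays width-independent, and all of the width dependence is moved into the parametrization.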

“µP’s principled way of parameterizing the model and selecting the learning rate make it easier for anybody to scale the training of deep neural networks. Such an elegant combination of beautiful theory and practical impact,” said Johannes Gehrke, Lab Director at Microsoft Research.


To put the theory into practice, Microsoft worked with OpenAI to apply µ-Parametrization to GPT-3, a natural language model whose largest version has 175 billion parameters.


“After parameterizing a version of GPT-3 with relative attention in µP, we tuned a small proxy model with 40 million parameters before copying the best hyperparameter combination to the 6.7-billion parameter variant of GPT-3,” Microsoft explained.
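The proxy workflow Microsoft describes can be sketched in a few lines of Python. This is illustrative only: `proxy_loss` is a hypothetical stand-in for training the small proxy model and reading off its validation loss (here a toy function with a known minimum), and the search grid and config keys are assumptions, not the values Microsoft used.

```python
import math
from itertools import product


def proxy_loss(lr: float, init_std: float) -> float:
    # Hypothetical stand-in for "train the small proxy, return validation loss".
    # A toy bowl with its minimum at lr=1e-3, init_std=0.02, for illustration only.
    return (math.log10(lr) + 3) ** 2 + 100 * (init_std - 0.02) ** 2


def tune_on_proxy(lrs, init_stds):
    # Cheap step: exhaustive grid search, run on the small model only.
    best = min(product(lrs, init_stds), key=lambda hp: proxy_loss(*hp))
    return {"lr": best[0], "init_std": best[1]}


def transfer(best_hps, large_config):
    # Copy the winning hyperparameters into the large model's config unchanged.
    # Assumes both models use µP; that is what is expected to keep them near-optimal.
    return {**large_config, **best_hps}


best = tune_on_proxy([1e-4, 3e-4, 1e-3, 3e-3], [0.01, 0.02, 0.04])
large = transfer(best, {"n_params": 6_700_000_000})
print(best, large)
```

Only `tune_on_proxy` involves repeated training runs, and they all happen at the small scale; the large model is trained once with the copied settings.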

The results were striking: the resulting 6.7-billion-parameter model outperformed the same-sized model from the original GPT-3 paper, and the tuning stage consumed just 7% of the compute used to pretrain the 6.7-billion-parameter model.

To help other practitioners benefit from these findings, Microsoft has published a PyTorch package designed to help them integrate µ-Parametrization into their existing models, since implementing it by hand can be finicky in practice.

However, the company also acknowledges that much remains to be understood about how AI models scale, and has pledged to continue its work to “derive more principled approaches to large-scale machine learning”.

Joel Khalili is the News and Features Editor at TechRadar Pro, covering cybersecurity, data privacy, cloud, AI, blockchain, internet infrastructure, 5G, data storage and computing. He’s responsible for curating our news content, as well as commissioning and producing features on the technologies that are transforming the way the world does business.

