While Apple may have entered the large language model (LLM) arena later than some competitors, recent moves showcase the company's innovative strides. Apple's researchers unveiled a groundbreaking method aimed at significantly reducing the memory required to run LLMs. Their approach uses inventive storage and memory-management techniques that dynamically transfer the model's weights between flash memory and DRAM while maintaining impressively low latency.
This breakthrough could allow future iPhones and Macs to harness the power of LLMs without overwhelming the device's memory, a significant advantage for Apple, which has no hyperscaler business and therefore has a strong stake in on-device AI.
The memory demands of LLMs have long posed a challenge: a 7-billion-parameter model stored at 16-bit precision requires over 14 gigabytes of RAM for its weights alone, surpassing the capacity of many edge devices. Rather than relying on typical quantization methods to shrink the model, Apple's approach addresses the problem of deploying full models on hardware with limited memory.
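For context, the 14-gigabyte figure follows directly from the per-weight storage cost: at 16-bit (half-precision) floating point, each parameter occupies two bytes. The short back-of-the-envelope calculation below illustrates the arithmetic; the variable names are just for illustration and are not from the paper.

```python
# Back-of-the-envelope memory estimate for a 7B-parameter model
# whose weights are stored in 16-bit (half-precision) floating point.
num_parameters = 7_000_000_000      # 7 billion weights
bytes_per_parameter = 2             # fp16 / bf16 uses 2 bytes per weight

total_bytes = num_parameters * bytes_per_parameter
total_gb = total_bytes / 1e9        # decimal gigabytes

print(f"Approximate weight storage: {total_gb:.1f} GB")  # ~14.0 GB
```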
The strategy involves storing the LLM on flash memory and incrementally loading its weights into RAM for inference. Flash memory is far more plentiful than DRAM on consumer devices, but it is also slower, and naive inference approaches built on it can be sluggish and energy-intensive. In their paper, Apple's researchers introduce a suite of optimization techniques to streamline the process, loading data from flash into DRAM quickly while keeping memory consumption low.
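As a minimal sketch of the general idea only (the paper's actual techniques for deciding which weights to load go well beyond this), one can keep the weight file on flash storage and memory-map it, so that only the slices needed for the current layer are copied into DRAM when they are read, instead of loading the whole model up front. The file name and model dimensions below are hypothetical and kept tiny so the example runs anywhere.

```python
import numpy as np

# Toy model dimensions (hypothetical, far smaller than a real LLM).
NUM_LAYERS, ROWS, COLS = 4, 64, 32

# Simulate a flash-resident weight file; in practice this already exists.
flash_file = "toy_weights.bin"
np.random.rand(NUM_LAYERS, ROWS, COLS).astype(np.float16).tofile(flash_file)

# Memory-map the file: this does not pull the weights into DRAM yet.
weights = np.memmap(flash_file, dtype=np.float16, mode="r",
                    shape=(NUM_LAYERS, ROWS, COLS))

def load_rows(layer: int, row_indices: np.ndarray) -> np.ndarray:
    """Copy only the requested rows of one layer from flash into DRAM."""
    return np.array(weights[layer, row_indices, :])  # explicit DRAM copy

# During inference, fetch just the rows the current layer actually needs.
needed = np.array([0, 3, 17])
dram_block = load_rows(2, needed)
print(dram_block.shape)  # (3, 32)
```

The design point this illustrates is that DRAM usage scales with the rows actually requested per layer rather than with the full model size, which is why the slower flash medium becomes workable at all.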
Testing their techniques on different models and hardware setups, Apple's research team achieved remarkable results. On an Apple M1 Max, for instance, per-token latency dropped from 2.1 seconds with a naive loading approach to approximately 200 milliseconds. The team demonstrated the ability to run LLMs twice the size of the available DRAM, with inference speeds 4-5x faster on CPU and 20-25x faster on GPU than traditional loading methods.
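To make the "seconds per token" unit concrete (this is not a reproduction of the paper's benchmark), per-token latency is typically measured by timing a generation loop and dividing by the number of tokens produced; the paper's speedups compare this figure under naive loading versus the optimized scheme. The sketch below uses a hypothetical stand-in for the token generator.

```python
import time

def measure_per_token_latency(generate_next_token, num_tokens: int = 32) -> float:
    """Time a generation loop and return the average seconds per token."""
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_next_token()     # stand-in for producing one token
    elapsed = time.perf_counter() - start
    return elapsed / num_tokens

# Example with a dummy "model" that simply sleeps briefly per token.
latency = measure_per_token_latency(lambda: time.sleep(0.01))
print(f"~{latency * 1000:.0f} ms per token")
```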
The researchers assert that their findings hold significant implications for future work, emphasizing the need to consider hardware characteristics when developing inference-optimized algorithms. The paper underscores how Apple pairs machine-learning insight with deep knowledge of hardware and memory design, offering a glimpse of the practical, applicable research the company may contribute going forward. These innovations are expected to find their way into upcoming Apple products, bringing advanced AI capabilities to consumer devices.