
AI Demands Are Fueling Hardware Acceleration 

As Moore’s law chip scaling reaches the point of diminishing returns, processor architectures are evolving to accelerate everything from AI model training and inference to the data centers that support those demanding workloads and sprawling edge computing deployments.

Those chip design trends were a focus of this week’s Linley Fall Processor Conference, during which principal analyst Linley Gwennap made the case for hardware accelerators as Moore’s law runs out of steam. Along with AI and edge computing, the emerging chip architectures are being used to boost application performance as well as the data center infrastructure that supports those enterprise workloads.

The handwriting is on the wall, Gwennap argued: Chip scaling down to the 3nm node is yielding few performance benefits, with factors like increased electrical resistance negating gains in transistor density, speed and power. Moreover, the benefits of expanding transistor counts are outweighed by cost and power limits.

As an example, Gwennap noted that Nvidia’s 7nm Ampere GPU runs at a lower clock speed than its predecessor, the 12nm Volta processor.

Those and other leading-edge chips are manufactured by Taiwan Semiconductor Manufacturing Co. In an early sign that the costs associated with chip scaling outweigh the advantages, GlobalFoundries threw in the towel at the 7nm node. Instead, TSMC’s main competitor is focusing on 12nm and higher nodes, targeting low-power embedded chip applications.

Another emerging processor strategy is replacing huge transistor counts with novel hardware accelerator designs, Gwennap said. For example, application-specific accelerators in the form of GPUs and network processors are being used to boost computing and memory resources. A case in point is the growing number of AI accelerator chips, along with data processing units, or DPUs.

Linley Gwennap

“AI accelerators differ from processors,” Gwennap said, with many using parallel computing designs like systolic arrays “to break the register-file bottleneck.” They also employ large SRAMs rather than cache memory as a way to boost performance.
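To make the idea concrete, here is a minimal Python sketch of an output-stationary systolic array multiplying two small matrices; it illustrates the general technique rather than any vendor’s actual design. Each processing element sees only the operands handed to it by its left and upper neighbors and keeps its own accumulator, so no central register file sits in the datapath.

```python
# Minimal sketch of an output-stationary systolic array (illustration only,
# not any vendor's design). Each PE receives operands from its left and
# upper neighbors and accumulates its own output element.
import numpy as np

def systolic_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))      # one accumulator per PE (the "stationary" output)
    a_reg = np.zeros((n, m))    # operand each PE passes to its right neighbor
    b_reg = np.zeros((n, m))    # operand each PE passes to its lower neighbor
    for t in range(n + m + k - 2):          # cycles for the skewed wavefront
        for i in reversed(range(n)):        # update back-to-front so values
            for j in reversed(range(m)):    # shift exactly one PE per cycle
                a_in = a_reg[i, j - 1] if j > 0 else (
                    A[i, t - i] if 0 <= t - i < k else 0.0)   # skewed feed of row i
                b_in = b_reg[i - 1, j] if i > 0 else (
                    B[t - j, j] if 0 <= t - j < k else 0.0)   # skewed feed of col j
                acc[i, j] += a_in * b_in
                a_reg[i, j] = a_in
                b_reg[i, j] = b_in
    return acc

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Because operands hop from neighbor to neighbor instead of returning to a shared register file each cycle, the array sustains one multiply-accumulate per PE per cycle, which is the bottleneck-breaking behavior Gwennap described.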

Meanwhile, DPUs are being aimed at networking bottlenecks as those functions move from CPUs to smart network interface cards. Software-defined networking functions such as virtual routers, flexible storage and cloud radio-access networks can be accelerated using custom architectures, Gwennap argued in his keynote.

Another hardware acceleration strategy is in-memory processing, an architecture that places compute units as close as possible to memory arrays. In conventional designs, fetching data consumes more power than computing the results. “In-memory compute breaks [the] bottleneck,” the chip industry analyst said. “In-memory designs can greatly reduce power requirements.”
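The data-movement argument can be put in rough numbers. The per-operation energies below are approximate, process-dependent figures widely cited in the literature for an older 45nm process (actual chips vary), but the orders of magnitude show why moving compute next to memory pays off:

```python
# Rough, process-dependent energy-per-operation figures (order-of-magnitude
# estimates commonly cited for a 45nm process; modern chips differ).
ENERGY_PJ = {
    "32-bit float add":  0.9,
    "32-bit float mult": 3.7,
    "32-bit SRAM read":  5.0,    # small on-chip SRAM
    "32-bit DRAM read":  640.0,  # off-chip memory access
}

compute = ENERGY_PJ["32-bit float mult"] + ENERGY_PJ["32-bit float add"]
for src in ("32-bit SRAM read", "32-bit DRAM read"):
    fetch = 2 * ENERGY_PJ[src]   # two operands fetched per multiply-accumulate
    print(f"{src}: fetch/compute energy ratio ~ {fetch / compute:.0f}x")
# With operands coming from DRAM, moving the data costs on the order of
# hundreds of times more energy than the arithmetic itself -- the gap that
# in-memory and near-memory designs aim to close.
```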

The requirement to accelerate AI workloads is driven by the reality that complex image- and language-processing models are much larger than those used for simpler tasks. For example, OpenAI’s third-generation language generator, GPT-3, contains 175 billion parameters, a figure cited by Linley Group. By comparison, the ResNet-50 convolutional neural network uses only about 26 million.

The general-purpose language model uses deep learning to produce text. In September, OpenAI exclusively licensed the technology to Microsoft.
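A back-of-the-envelope comparison, assuming 16-bit weights (a common inference format, and an assumption here rather than a figure from the conference), shows what that parameter gap means in memory terms:

```python
# Back-of-the-envelope comparison of the two models cited above,
# assuming 2 bytes (FP16) per parameter -- a common inference format.
GPT3_PARAMS   = 175e9   # GPT-3
RESNET_PARAMS = 26e6    # ResNet-50

bytes_per_param = 2
print(f"Parameter ratio: ~{GPT3_PARAMS / RESNET_PARAMS:,.0f}x")
print(f"ResNet-50 weights: ~{RESNET_PARAMS * bytes_per_param / 1e6:.0f} MB")
print(f"GPT-3 weights:     ~{GPT3_PARAMS * bytes_per_param / 1e9:.0f} GB")
# ResNet-50 fits comfortably in a single accelerator's memory; GPT-3's
# weights alone exceed the memory of any single GPU of this era, one reason
# such models drive demand for larger, faster accelerator systems.
```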

While smaller models are adequate for simpler tasks like image recognition, workloads like natural language processing are more complex. Hence, “models are large and growing much more quickly,” Gwennap noted, requiring greater hardware acceleration.

The market tracker estimates model size is growing by a factor of 20 each year, but those larger models are producing more accurate results, including human-like text.
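As a rough illustration of what compounding at that rate implies (an extrapolation for illustration only, not a Linley Group forecast):

```python
# Rough extrapolation of the ~20x-per-year model-growth estimate cited above
# (an illustration, not a Linley Group forecast).
growth_per_year = 20
base_params = 175e9          # GPT-3-scale starting point
for years in (1, 2, 3):
    projected = base_params * growth_per_year ** years
    print(f"After {years} year(s): ~{projected / 1e12:,.1f} trillion parameters")
# A compounding 20x annual rate reaches trillions of parameters within a
# year -- far outpacing the transistor-count gains Moore's law once delivered.
```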

The clearest example of the rise of hardware acceleration is Nvidia’s push into enterprise data centers, Gwennap said. The GPU maker’s Volta V100 leads the market for AI training, and its Turing T4 is making inroads on inference workloads. The Ampere A100 accelerators now shipping are expected to extend Nvidia’s dominance in the data center AI market, where Linley Group estimates Nvidia’s revenues jumped 80 percent in the first half of 2020. That figure excludes Nvidia’s Mellanox unit.

Meanwhile, hardware accelerators are shifting to the network edge as cloud vendors deploy micro-data centers closer to data while network service providers place servers near cell towers or broadband gear. “Some distributed services require AI or network acceleration,” Gwennap said. Among those services are advanced driver-assistance systems, or ADAS.

While steady progress is being made on the chip side, accelerator software stacks remain “weak,” he added. Most AI frameworks rely on Nvidia’s CUDA developer toolkit, and accelerator vendors must port applications to their own chips. Compatibility issues have so far hampered efforts by cloud vendors and startups seeking to tap the burgeoning market for hardware acceleration.
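That dependence is visible in everyday application code. The sketch below, which assumes PyTorch is installed, shows the typical device-selection pattern: frameworks reach for a CUDA device first, so rival accelerators must plug into the framework’s backends before code like this can run on their silicon unchanged.

```python
# Minimal sketch of the framework-level dependence described above,
# assuming PyTorch is installed. Application code typically asks for a
# CUDA device and falls back to the CPU; non-Nvidia accelerators must
# integrate with the framework's backends to appear here at all.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(1024, 1024).to(device)   # weights land on the chosen device
x = torch.randn(8, 1024, device=device)
y = model(x)                                      # runs via CUDA kernels on a GPU
print(f"Ran a forward pass on: {device}")
```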

“ResNet-50 is easy, real workloads are hard,” the industry analyst concluded.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).
