
Busting Four Common AI, ML, and Data Storage Myths 

Today, consumers and business users interact with artificial intelligence (AI) and machine learning (ML), often without realizing it. From the consumer perspective, that spans everything from streaming a favorite show to hailing a car service at a moment’s notice. From a business perspective, organizations use AI and ML to gain better insights in support of their business objectives.

AI/ML is fundamentally about pattern recognition: the ability to recognize and understand patterns in real time to improve business processes, enterprise outcomes, and people’s lives. According to a 2022 Market Research Insights Report, the AI market is projected to grow from $387.45 billion in 2022 to $1,394.30 billion by 2029.

As more organizations adopt these deep learning technologies, IT teams are working out how to cost-effectively build and manage the infrastructure that supports the opportunities AI and ML deliver, and how to scale it for future business growth. One element that must not be underestimated, and should in fact be a priority, is the data storage infrastructure required to support these emerging applications.

Here are four common AI/ML storage myths that need to be busted.

1. AI/ML applications must be supported by high-IOPS all-flash storage

To “feed the beast,” the accelerator needs data to be available whenever and wherever it is required, but that does not mean AI/ML storage is only about raw speed. All-flash storage systems deliver impressively high IOPS, but they can also drain your IT budget.

As with most AI/ML applications, accelerators vary widely in performance. For example, the computation per image in object-recognition applications takes long enough that a hybrid (hard-disk drive and solid-state drive) system can perform comparably to an all-NVMe solution, at a much lower price. IT teams must balance their compute accelerators and AI/ML workloads against their storage options to find the optimal solution. Independent benchmark reports such as MLPerf can help here.
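
To make that balancing act concrete, here is a minimal back-of-envelope sketch in Python. The throughput and image-size figures are assumptions chosen purely for illustration, not numbers from the article:

```python
# Illustrative back-of-envelope sketch (figures are assumptions, not from the
# article): estimate the aggregate read bandwidth the storage system must
# sustain so the accelerators never wait on data.

def required_storage_bandwidth(samples_per_sec_per_gpu: float,
                               avg_sample_bytes: float,
                               num_gpus: int) -> float:
    """Aggregate read bandwidth (bytes/sec) needed to keep every GPU fed."""
    return samples_per_sec_per_gpu * avg_sample_bytes * num_gpus

# Example: an object-recognition job where each GPU consumes 2,000 images/sec
# and the stored images average 150 KB each.
bw = required_storage_bandwidth(samples_per_sec_per_gpu=2_000,
                                avg_sample_bytes=150_000,
                                num_gpus=8)
print(f"Sustained read bandwidth needed: {bw / 1e9:.1f} GB/s")  # ~2.4 GB/s
# A figure in this range can often be met by a well-configured hybrid HDD/SSD
# system, which is why all-NVMe is not automatically required.
```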

2. AI/ML is all about the GPU

Before the emergence of modern graphics processing units (GPUs) with extreme computational power, the AI/ML applications and neural networks in use today were little more than an interesting concept. There is no doubt that accelerator silicon is critical for AI/ML applications, but without adequate storage and networking, it is worthless.

Storage and networking are the hands that “feed the beast”: they ensure that the next set of data is always available to the accelerator before it has finished with the current set. Organizations must therefore consider the choice of storage and networking infrastructure as carefully as they consider the GPU. Each element must be balanced to achieve the optimal result: too much storage performance or capacity will be wasted, while too little will leave expensive computational silicon idle.
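
One common way to keep an accelerator fed is to stage upcoming batches while the current one is being processed. The sketch below is a simplified illustration of that idea; `read_batch` is an assumed placeholder for reads against the storage system, not a specific product API:

```python
# Minimal sketch of the "feed the beast" pattern: a background thread stages
# the next batches from storage while the accelerator works on the current one.
import queue
import threading

def prefetching_loader(read_batch, num_batches, depth=2):
    """Yield batches while a worker thread keeps up to `depth` batches staged."""
    staged = queue.Queue(maxsize=depth)

    def worker():
        for i in range(num_batches):
            staged.put(read_batch(i))  # blocks only if the consumer falls behind
        staged.put(None)               # sentinel: no more data

    threading.Thread(target=worker, daemon=True).start()
    while (batch := staged.get()) is not None:
        yield batch  # the accelerator consumes without waiting on storage I/O

# Usage sketch (hypothetical helpers):
#   for batch in prefetching_loader(read_batch=load_from_storage, num_batches=1000):
#       train_step(batch)
# If the storage system cannot keep the queue full, the accelerator sits idle,
# which is exactly the imbalance described above.
```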

3. AI/ML can make effective use of a dedicated single-use storage system

Organizations gain the most value from AI/ML when it is applied to their core data sources. Banks have already benefitted from adopting these technologies for fraud detection, and drug manufacturers can better analyze data from experimentation or manufacturing to speed up drug development. Multiple industry-leading retailers are also implementing AI technologies at the core of their technology and business infrastructures to best address the needs of their customers. Many businesses no longer view AI/ML as experimental side projects but as an integral part of the business and a catalyst for future growth. As such, these applications are best served by storage within the company’s core IT infrastructure rather than by a dedicated, single-use system.

4. Tiering reduces AI/ML storage costs

Tiered storage is a common strategy to maximize storage resources and minimize costs. “Hot” mission-critical and frequently accessed data is placed on expensive and fast storage media (e.g., SSDs), while “cold” archival data that is very rarely accessed or updated is kept on the cheapest storage devices (e.g., tape).

Since there is no such thing as “cold” AI/ML data, this model cannot be applied to these types of applications.

Because all AI/ML training data is read on every training run, tiering part of the data off to slower storage layers results in significant performance issues. AI/ML storage solutions must treat all data as “hot” and ensure it is always available.
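
A rough calculation, using assumed dataset sizes and tier bandwidths, shows why: since every epoch reads the full dataset, the slow tier dominates epoch time as soon as a meaningful fraction of the data lands on it.

```python
# Rough illustration (all numbers assumed): every epoch reads the full training
# set, so epoch I/O time is the sum over tiers of (bytes on tier / tier bandwidth).

def epoch_io_seconds(tier_layout):
    """tier_layout: list of (bytes_on_tier, tier_bandwidth_bytes_per_sec) pairs."""
    return sum(size / bandwidth for size, bandwidth in tier_layout)

dataset = 20e12  # 20 TB of training data, read in full on every epoch

all_hot = epoch_io_seconds([(dataset, 10e9)])        # everything on a 10 GB/s tier
tiered = epoch_io_seconds([(0.2 * dataset, 10e9),    # 20% kept "hot" at 10 GB/s
                           (0.8 * dataset, 0.5e9)])  # 80% "cold" at 0.5 GB/s

print(f"all hot: {all_hot / 3600:.1f} h/epoch   tiered: {tiered / 3600:.1f} h/epoch")
# all hot: ~0.6 h/epoch vs. tiered: ~9.0 h/epoch; the "cold" 80% dominates.
```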

It’s also worth noting that the accuracy of AI/ML workloads increases with the volume of training data available. This means that the storage infrastructure must be able to scale without disruption as training data volumes expand. Scale-out linear growth, in contrast to storage tiering, is a key storage requirement for these environments.

Innovations in AI/ML are poised to fuel massive digital transformation across the enterprise and deliver better business outcomes. If adoption is not implemented and managed properly, however, it will affect nearly every aspect of an organization, and not in a positive way. According to the Gartner hype cycle, many related technologies, such as edge AI, decision intelligence, and deep learning, are expected to reach mainstream adoption in the next two to five years. As organizations embark on their own digital journeys, don’t let the underlying storage infrastructure be an afterthought: it plays a critical role in your organization’s ability to maximize the potential of AI/ML applications.

About the Author

MLCommons storage working group co-chair and Panasas software architect Curtis Anderson is a senior storage professional with more than 34 years of experience in a wide variety of storage and I/O software, with a primary focus on filesystem implementations. He holds 10 patents in the areas of continuous data protection, replication of deduplicated file data over a network, forensics collection after storage system failures, and network replication of filesystem transaction logs.
