
Tackling AI Data Infrastructure Energy Challenges with Solidigm

This research note covers enterprise, on-premises AI deployments. The availability of cloud resources, ARM alternatives, TPUs, and other cloud-based solutions is acknowledged but out of scope.

One of the key takeaways of AI Data Infrastructure Field Day 1 (AIDIFD1) was undoubtedly the topic of energy efficiency in AI. Solidigm provided valuable insights on the subject, making it a good opportunity for TECHunplugged to revisit the topic before covering Solidigm’s approach.

Why is that important? In 2023, a research paper noted that the computing power required by AI was doubling every 100 days. Another 2023 white paper, from Schneider Electric, estimated AI’s global power demand at the time at 4.5 GW (enough to power three flux capacitors), with projections of 14 to 18.7 GW by 2028.

Another interesting takeaway from this white paper is the split of AI workloads by use case and location: in 2023, workloads were split between 20% (model) training and 80% inference, while for 2028 the company estimates 15% training vs. 85% inference. Furthermore, placement in 2023 was estimated at 95% core data center vs. 5% edge, while the 2028 projection forecasts 50% core DC vs. 50% edge.

Finally, another study provides further detail on AI power consumption, with footprint estimates for major AI players. An interesting twist is that it refers to TWh (terawatt-hours), a more common way of measuring consumption (whereas the Schneider Electric study uses GW to measure demand), but we digress.

AI Processing Stages

Understanding AI processing stages not only gives an end-to-end overview of the various steps in an AI workflow; it also helps determine whether the organization needs to implement the entire flow, or only a subset of AI processing capabilities.

Figure 1 – Data is Everywhere in the AI Pipeline – Source: Solidigm – Note the bottom elements of the diagram showing capacity ranges during each stage of data processing (first line), workload characteristics (second line) and data locality (third line).

The diagram above (Fig. 1) provides a full view of an AI processing flow, including, in the middle, the sub-steps relevant to model development, and breaks down as follows:

  1. Data Ingest: data is collected from various sources, transferred, and aggregated into a central repository for further use.
  2. Data Preparation: data ingested in step 1 is cleaned (errors and duplicates removed) and normalized / structured into a standardized format. This step is essential to provide quality data to the model (see the sketch after this list).
  3. Model Development: This step is relevant for industries with specific requirements, where custom models are needed and / or where sensitive data must be handled (healthcare, finance, etc.). Key drivers for model development may include highly specialized tasks, proprietary data usage, control concerns over model behavior, and performance requirements. Some organizations can decide to use generic models instead and skip this step.
  4. Inference: During this step, the trained model is applied to new, unseen data to generate an output. Inference does not improve the original model.
  5. RAG (Retrieval-Augmented Generation): Helps improve model responses and factual accuracy by providing access to external data sources.
  6. Fine-Tuning: This step adapts a pre-trained model to perform specific tasks or work with specific data sets, to improve output relevance and performance.
  7. Archive: If necessary, datasets, models, and checkpoints may be archived for future use, reproducibility, or to comply with regulatory requirements.
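
To make the data-handling stages more concrete, here is a minimal Python sketch covering ingest, preparation, and archive. The file names, column handling, and pandas-based cleanup are illustrative assumptions on our part, not part of Solidigm’s material.

```python
# Minimal, illustrative sketch of the data-handling stages (ingest, preparation,
# archive). File names and column handling are hypothetical; requires pandas and pyarrow.
from pathlib import Path

import pandas as pd


def ingest(sources: list[Path]) -> pd.DataFrame:
    """Step 1 - Data Ingest: collect raw records from several sources into one repository."""
    return pd.concat([pd.read_csv(src) for src in sources], ignore_index=True)


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Step 2 - Data Preparation: remove duplicates and errors, normalize the schema."""
    cleaned = raw.drop_duplicates().dropna()
    cleaned.columns = [col.strip().lower() for col in cleaned.columns]
    return cleaned


def archive(dataset: pd.DataFrame, target: Path) -> None:
    """Step 7 - Archive: persist the curated dataset for reuse, reproducibility, or compliance."""
    dataset.to_parquet(target, index=False)


if __name__ == "__main__":
    raw = ingest([Path("sensor_a.csv"), Path("sensor_b.csv")])  # hypothetical sources
    archive(prepare(raw), Path("curated_dataset.parquet"))
```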

Energy Consumption Considerations

The table below helps understand whether AI processing stages are CPU-bound or GPU-bound. Customization options can significantly impact the target AI infrastructure, its design, and energy consumption.

| Step | CPU-bound | GPU-bound | Comment |
|---|---|---|---|
| Data Ingest | Yes | No | Primarily storage I/O & CPU-intensive. |
| Data Preparation | Yes | No | Data cleanup and normalization relies on CPU operations. |
| Model Development | No | Yes | Primarily GPU-bound and GPU-intensive. |
| Inference | No | Yes | GPU compute required for real-time & low-latency apps. CPU usable for small models or cost-sensitive use cases. |
| RAG | Yes | Yes | Combines CPU for data retrieval and GPU for response generation. |
| Fine-tuning | No | Yes | GPU required for reasonable timeframes. |
| Archive | Yes | No | Primarily storage I/O, no performance requirements. |
Table 1 – Workload compute characteristics – Source: TECHunplugged

Looking at the table above, we can understand how our own AI workflow implementation will impact our infrastructure design and power consumption from a CPU, GPU, and storage perspective. For the sake of brevity, we will skip networking and other data center optimizations.
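
As a rough illustration of how the compute profile of each stage translates into infrastructure power draw, the sketch below sums per-component budgets for a hypothetical node. Every wattage figure is a placeholder assumption, not a vendor specification.

```python
# Back-of-the-envelope power estimate for a hypothetical AI node.
# Every wattage value below is an illustrative assumption, not a vendor specification.
COMPONENT_WATTS = {
    "cpu": 350,   # high-core-count server CPU under load (assumed)
    "gpu": 700,   # data-center GPU under load (assumed)
    "ssd": 25,    # high-capacity NVMe SSD (assumed)
    "hdd": 10,    # nearline HDD (assumed)
}


def node_power(cpus: int = 2, gpus: int = 0, ssds: int = 0, hdds: int = 0) -> int:
    """Return an estimated per-node power draw in watts."""
    return (cpus * COMPONENT_WATTS["cpu"] + gpus * COMPONENT_WATTS["gpu"]
            + ssds * COMPONENT_WATTS["ssd"] + hdds * COMPONENT_WATTS["hdd"])


# A GPU-bound training/inference node vs. a CPU-bound ingest/preparation node:
print("training node:", node_power(cpus=2, gpus=8, ssds=8), "W")   # 6500 W
print("ingest node:  ", node_power(cpus=2, ssds=12), "W")          # 1000 W
```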

CPU Perspective

From a CPU perspective, x86 architectures currently dominate most AI / ML deployments (especially in enterprises), so organizations are typically offered the classic choice between Intel and AMD processors. Intel Xeon CPUs generally offer better AI-oriented extensions (DL Boost, AVX-512, etc.), while AMD EPYC CPUs usually provide better energy efficiency, which can be significant at scale.

Regardless, most of the compute-intensive activities are offloaded to GPUs, making the CPU choice a secondary consideration when it comes to energy efficiency, with the notable exception of smaller models optimized for CPU usage (in which case overall energy efficiency is considerably less relevant).
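
For organizations that want to check which AI-relevant x86 extensions their existing fleet exposes, a quick look at /proc/cpuinfo on Linux is enough. The snippet below is a minimal sketch; the flag list is a non-exhaustive assumption.

```python
# Quick check of AI-relevant x86 instruction-set extensions on Linux.
# The flag list is illustrative and non-exhaustive.
AI_FLAGS = {"avx2", "avx512f", "avx512_vnni", "amx_tile", "amx_int8"}


def detect_ai_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set[str]:
    """Return the subset of AI_FLAGS advertised by the first CPU entry."""
    with open(cpuinfo_path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return AI_FLAGS & set(line.split(":", 1)[1].split())
    return set()


print("AI-relevant CPU extensions found:", detect_ai_flags() or "none")
```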

GPU Perspective

If it wasn’t clear by now, GPUs are the workhorse for the overwhelming majority of AI workloads and consume insane amounts of power. This regularly sparks articles about AI’s massive power usage, the limits of the power grid, the necessity to switch to new energy generation methods, and so on.

The vast majority of GPUs are currently produced by NVIDIA and are available in various generations and performance configurations. Other players are trying to shake up the market: for example, AMD with their Instinct MI300 Series, and Intel with their Data Center GPU Max and Flex series.

When selecting GPUs, organizations will have to balance business requirements & objectives with the harsh realities of the field:

  • Technical requirements: Do use cases focus on Inference, RAG, Fine-Tuning or a combination? Will the organization build and train its own model, or will it rely on existing ones?
  • Environmental constraints: datacenter total power capacity as well as max power draw per rack will impact server configuration, including GPU models and GPU counts (actual GPU draw can be sampled as shown in the sketch after this list).
  • Costs: even the most ambitious, unicorn turn-GPU-into-gold AI project can be notably downsized once it goes through the ruthless and unforgiving Tunnel of Revised Expectations™ (a.k.a. the financial project review).
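
On the environmental-constraints point, actual GPU power draw can be sampled directly rather than estimated. The sketch below wraps nvidia-smi (assuming NVIDIA GPUs and drivers are present) and is a monitoring aid, not a sizing tool.

```python
# Sample current GPU power draw via nvidia-smi (requires NVIDIA GPUs and drivers).
import subprocess


def gpu_power_draw() -> list[tuple[str, float]]:
    """Return (gpu_name, watts) for each GPU reported by nvidia-smi."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    readings = []
    for line in output.strip().splitlines():
        name, watts = (field.strip() for field in line.split(","))
        readings.append((name, float(watts)))
    return readings


for name, watts in gpu_power_draw():
    print(f"{name}: {watts:.1f} W")
```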

Storage Perspective

The storage perspective focuses on the AI infrastructure stack, which typically consists of a huge data lake backed by object storage (see Step 1 – Data Ingest in the previous sections).

Although all-flash deployments are becoming more commonplace, most architectures are based on hybrid deployments. These include massive capacity HDD-based JBODs fronted by an all-flash tier (usually TLC 3D NAND flash) that acts as a cache or tiering mechanism (more on flash later).

The view below (Fig. 2) presents workload characteristics from a storage perspective (sequential / random reads vs. sequential / random writes), while also taking into account the storage tiers involved.

Figure 2 – Data Movement in an AI Cluster – Source: Solidigm

Storage has a significant impact from a power consumption standpoint:

  • According to Meta, HDDs use up to 35% of available power in the Meta AI Recommendation Engine
  • Another study by Microsoft and Carnegie Mellon University states that storage accounts for 33% of operational energy consumption in Azure general-purpose cloud

Furthermore, storage density shouldn’t be overlooked either. In addition to being power-hungry, HDDs offer stagnating density, with a maximum raw capacity currently around ~32 TB per unit (and typical deployments at ~24 TB per HDD).

In contrast, QLC-based SSDs are now available in the ~60 TB range, providing superior capacity, better storage density and power efficiency at a similar $/GB price point.
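
To illustrate the density argument with a rough calculation, the sketch below compares how many drives a fixed raw-capacity target requires. The 10 PB target is a hypothetical assumption; the capacity points (24 TB HDD, 61.44 TB QLC SSD) come from the text above.

```python
# Rough drive-count comparison for a fixed raw-capacity target.
# Per-drive capacities match the text above; the 10 PB target is hypothetical.
import math

TARGET_RAW_TB = 10_000  # 10 PB raw

DRIVES = {
    "HDD (24 TB)": 24.0,
    "QLC SSD (61.44 TB)": 61.44,
}

for label, capacity_tb in DRIVES.items():
    count = math.ceil(TARGET_RAW_TB / capacity_tb)
    print(f"{label}: {count} drives")

# Fewer drives means fewer enclosures, controllers, and servers to power and cool,
# which is where much of the system-level power saving shown later (Fig. 3) comes from.
```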

Solidigm’s Approach: A Case Study

Solidigm is a leading innovator in flash storage, known for its high-performance, energy-efficient data center SSDs, available in a variety of form factors.

During AIDIFD1, the company showcased its latest D5-P5336 NVMe QLC SSD, and demonstrated how QLC-based AI storage infrastructure significantly reduces power and rack space consumption, freeing up both physical space and power for additional GPUs and storage.

In the example below, a typical NAS solution (based on 7.68TB TLC SSDs and 24TB HDDs) requires roughly 4 racks and consumes ~32,000 W. In contrast, an all-QLC NAS (based on the Solidigm 61.44TB D5-P5336 SSD) requires half a rack and consumes 6,900 W. These figures clearly demonstrate the impact of QLC flash.
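
Using only the figures quoted above, the savings translate into simple ratios (a sanity-check calculation, not additional vendor data):

```python
# Sanity-check arithmetic on the figures quoted above.
hybrid_watts, qlc_watts = 32_000, 6_900
hybrid_racks, qlc_racks = 4, 0.5

print(f"power reduction: {(1 - qlc_watts / hybrid_watts) * 100:.0f}%")        # ~78%
print(f"rack-space reduction: {hybrid_racks / qlc_racks:.0f}x fewer racks")   # 8x
```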

Figure 3 – QLC power efficiency compared to hybrid NAS (TLC flash + HDD JBOD) – Source: Solidigm

Not only does Solidigm QLC flash beat hybrid storage from a power efficiency and density perspective, but it also provides a noticeable improvement over TLC flash, as can be seen in the diagram below.

Even in that case, the power reduction and density improvement free up over 10% of power capacity that can be dedicated to compute cluster growth.

Figure 4 – QLC power efficiency over TLC flash – Source: Solidigm

Finally, we mentioned earlier that different workloads have different storage I/O characteristics. Organizations can therefore tailor their storage choices by selecting the most appropriate SSD media, depending on which stages of the AI workflow they use.

The figure below outlines these different steps and storage I/O patterns, matched with relevant Solidigm portfolio solutions.

Figure 5 – Solidigm SSD Portfolio – Source: Solidigm

Conclusion

Although each organization’s AI business use cases and implementations will differ, energy requirements will primarily derive from GPU, Storage, and CPU consumption.

Since GPUs are the driving force behind all AI workloads, energy optimization on the GPU side can only come from two directions: either heavily optimizing the models, or GPU vendors significantly improving the energy efficiency of their hardware. Neither of these changes is likely to happen in the immediate future, leaving optimizations to be sought elsewhere.

As we noted above, storage is a significant area of improvement. TECHunplugged has long been a firm believer in QLC 3D NAND capabilities, both from a capacity and energy efficiency perspective. The capacity doubling we observed further improves storage density, which in turn improves overall energy efficiency; as QLC flash capacity keeps growing, even more power will be freed up to feed power-hungry GPUs.

Innovation is fast-paced in the AI world: we can expect more energy-efficient CPUs, new dedicated processing units, and more efficient GPU cards (or at least some competition). But when it comes to power efficiency (and assuming the organization has already optimized its AI workflow and usage), the low-hanging fruit remains the storage layer. In that context, Solidigm demonstrated practical expertise, measurable outcomes, and significant improvements achievable with their 61.44TB D5-P5336 NVMe QLC SSD.

Additional Resources

Overall, Solidigm presented four different segments at the event, all available on this page. We are also sharing links to the individual videos below: