This post is part of a sponsored Pure Storage blog post series. To learn more about Pure Storage, please visit purestorage.com.
Organizations and citizens across the world are increasingly concerned about sustainability matters. Two drivers for this are the visible impact of climate change and its devastating consequences, and the steep increase of energy prices in some regions of the world, caused by an extremely volatile geopolitical situation.
The sustainability of organizations is usually evaluated in the environmental area of Environmental, Social, and Corporate Governance (ESG) reports. And increasingly, sustainability is taking a larger role in how organizations evaluate their IT infrastructure. Due to the vastness of this topic, this article will focus on the sustainability aspect of storage systems, with a particular emphasis on measuring storage efficiency.
In the context of making IT more sustainable, organizations seek to achieve multiple objectives: reduce overall power consumption (combining the power draw of infrastructure systems and adjacent power draw, for example from air conditioning systems), reduce greenhouse gas (GHG) emissions, increase the use of renewable energies, and reduce e-waste.
IT infrastructure systems such as servers, storage appliances, network switches and so on are usually rated across two metrics: energy consumption (in kilowatt-hours, abbreviated kWh) and cooling requirements (expressed in British Thermal Units, or BTU). These two figures express an absolute rating that, while helpful for understanding consumption and cooling requirements, provides no insight into efficiency.
To measure storage efficiency accurately, the power consumption must be related to capacity and performance. When the storage platform is capacity-oriented (for instance, for backup or “cold” data storage), the W/TB ratio could be used, indicating how many watts are needed to power one terabyte of capacity. Taking an arbitrary example, a solution rated 1 W/TB would be more efficient than one rated 5 W/TB, as it would only take 1 watt to power a terabyte of capacity instead of 5 watts.
For performance-oriented storage systems (such as those backing critical applications), energy consumption must be balanced against the overall performance of the system and the business outcomes achieved. At a similar wattage, a solution delivering a higher IOPS count would be more efficient, although this is only a starting point, since performance can be more difficult to quantify than capacity.
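The two ratios described above can be sketched in a few lines of code. All systems and figures below are hypothetical, chosen only to illustrate how the metrics compare:

```python
# Illustrative sketch of the two efficiency ratios discussed above.
# All power, capacity, and IOPS figures are hypothetical assumptions.

def watts_per_tb(power_w: float, capacity_tb: float) -> float:
    """Capacity-oriented metric: watts needed to power one terabyte."""
    return power_w / capacity_tb

def iops_per_watt(iops: float, power_w: float) -> float:
    """Performance-oriented metric: IOPS delivered per watt drawn."""
    return iops / power_w

# Two hypothetical capacity-oriented systems with equal capacity.
a = watts_per_tb(power_w=1_000, capacity_tb=1_000)  # 1.0 W/TB
b = watts_per_tb(power_w=5_000, capacity_tb=1_000)  # 5.0 W/TB
# Lower is better: system "a" needs 1 watt per terabyte instead of 5.

# Two hypothetical performance-oriented systems at a similar wattage.
p = iops_per_watt(iops=500_000, power_w=2_000)  # 250.0 IOPS/W
q = iops_per_watt(iops=200_000, power_w=2_000)  # 100.0 IOPS/W
# Higher is better: system "p" delivers more work per watt.
```

Note that the two metrics point in opposite directions (lower W/TB is better, higher IOPS/W is better), so they should not be mixed when comparing systems.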
These are starting points that do not consider specifics such as storage efficiency mechanisms (raw vs. effective capacity) or workload I/O profiles (random vs. sequential IOPS, read vs. write operations), but benchmarks can be developed by industry boards to standardize measurements.
Architecture Impact on Energy Consumption
Storage efficiency would be simple to determine if all systems used the same components, or at least a broad set of similar ones. While that may have been true a decade or two ago, the situation is vastly different today. Two main categories impact a solution’s energy efficiency: its hardware design (the sum of all components), and its software (the operating system and all the data efficiency algorithms).
An overwhelming majority of storage solutions rely on the use of x86 processors at the heart of their architecture. These versatile processors are excellent “jack-of-all-trades” but come with trade-offs: they are not inherently optimized for storage or data reduction activities. They are also power-hungry with heat dissipation and cooling requirements as byproducts.
Power-efficient x86 CPUs exist but come with performance drawbacks that make them ill-suited for operation at scale and mission-critical workloads. Parsimonious usage of CPU power requires the development of powerful data reduction algorithms, which will be covered below.
Architectures must balance the capacity, performance, durability, power draw, and cost of each media type to design the most appropriate solution. Always superior to hard disk drives in terms of performance, flash media has recently managed to reach pricing parity with HDDs thanks to QLC flash.
Even if flash is better from an energy efficiency and thermal footprint standpoint, there can be significant differences between storage architectures. Some are essentially hard disk-based designs that were retrofitted to use commodity off-the-shelf flash drives. Other storage architectures have been built and optimized for flash from the ground up.
Seemingly minor inefficiencies in the former approach add up to create multiple issues: for example, the flash translation layer (FTL), which handles data placement on SSDs, is managed individually on each drive, creating overhead, making data placement less efficient, and reducing capacity usage (and endurance for capacity-oriented media such as QLC). In contrast, storage architectures built with proprietary flash modules eliminate this FTL bottleneck, delivering greater density and increased flash durability. While this overhead may not be challenging for small-scale deployments, it becomes a major hurdle when operating at scale.
Finally, it’s worth mentioning the impact of data efficiency algorithms. These can be combined with data placement on the flash media and include other techniques such as compression and deduplication. While data reduction algorithms consume CPU power, they also significantly reduce capacity usage, thus allowing more data to be stored in the same physical footprint. This has positive implications on energy draw and environmental factors such as heat generation and cooling requirements, not to mention the physical space required.
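The effect of data reduction on the W/TB metric can be made concrete with a small sketch. The 3:1 data reduction ratio below is an illustrative assumption, not a figure from any specific product:

```python
# Sketch of how data reduction changes the efficiency picture:
# effective capacity = raw capacity x data reduction ratio (DRR).
# The DRR of 3.0 used below is a hypothetical assumption.

def effective_watts_per_tb(power_w: float, raw_tb: float, drr: float) -> float:
    """W/TB computed against effective (post-reduction) capacity."""
    return power_w / (raw_tb * drr)

# A hypothetical 500 TB raw system drawing 1,000 W:
raw_ratio = effective_watts_per_tb(power_w=1_000, raw_tb=500, drr=1.0)  # 2.0 W/TB
eff_ratio = effective_watts_per_tb(power_w=1_000, raw_tb=500, drr=3.0)  # ~0.67 W/TB
# Same hardware, same power draw, but three times the data stored per
# watt -- which is why raw vs. effective capacity must be stated when
# quoting W/TB figures.
```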
Workload Capabilities and Impact on Energy Efficiency
How data is being used also plays a role in a system’s efficiency: different workloads dictate different performance, capacity, and cost requirements. The ability to propose different storage profiles (performance vs. capacity) and the ability to intelligently move data across them (even to object or cold storage, on-prem or in clouds) contributes to more sustainable outcomes.
The architectural considerations described previously have a trickle-down impact on what can be called “environmental” aspects, in the sense of the environment where the infrastructure is hosted.
Capacity-dense systems using optimized flash media in a compact form factor will objectively achieve better results compared to rack-scale, HDD-based systems. Less heating and less rack space reduce cooling needs as well as power consumption, whereas racks and racks of HDD-based systems will draw significantly more energy and generate more heat.
But as noted above, there can be large differences between all-flash architectures as well. While these effects can be moderate or irrelevant for small deployments, they become game-changing at data center scale, turning into a key decision factor.
A last aspect to consider is greenhouse gas (GHG) emissions, and how to mitigate them. Quantifying GHG emissions can be a tricky endeavor; the most widely used methodology follows the US-based Environmental Protection Agency (EPA) GHG framework, which defines three scopes:
- Scope 1 – direct GHG emissions that occur from sources that are controlled or owned by an organization
- Scope 2 – indirect GHG emissions associated with the purchase of electricity, steam, heat, or cooling
- Scope 3 – indirect GHG emissions resulting from the activities of assets not owned or controlled by the reporting organization, but that the organization indirectly impacts in its value chain (including e-waste)
Organizations can noticeably reduce their GHG emissions by focusing on Scopes 1 and 2. Scope 2 is where the IT and storage infrastructure efficiency as described above can lead to significant GHG emissions reduction. This includes power savings at scale, both direct savings from using a more efficient and dense storage solution, and indirect savings thanks to the reduction of associated environmental costs such as cooling.
Organizations can further reduce Scope 2 GHG emissions by selecting greener energy sources, including renewables. Also worth exploring are data center locations: for example, data centers located in countries with cooler climates and broad availability of renewable energy sources (such as hydroelectricity) will require less energy for cooling and the compound GHG emissions will be lower.
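A back-of-the-envelope Scope 2 estimate multiplies energy consumed by the grid’s emission factor. The annual consumption and emission factors below are illustrative assumptions, not published values:

```python
# Rough Scope 2 estimate: emissions = energy consumed x grid emission
# factor. All figures below are hypothetical, for illustration only.

def scope2_kg_co2e(annual_kwh: float, factor_kg_per_kwh: float) -> float:
    """Estimated Scope 2 emissions in kg CO2e for purchased electricity."""
    return annual_kwh * factor_kg_per_kwh

# The same storage footprint on two hypothetical grids:
fossil_heavy = scope2_kg_co2e(annual_kwh=50_000, factor_kg_per_kwh=0.8)   # 40,000 kg
hydro_rich   = scope2_kg_co2e(annual_kwh=50_000, factor_kg_per_kwh=0.02)  # 1,000 kg
# Both levers discussed above appear in the product: a denser, more
# efficient system lowers annual_kwh, while a greener grid lowers the
# emission factor.
```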
Storage efficiency is dictated by multiple variables, with one absolute imperative: performance and capacity requirements must be met. For performance-driven workloads, power consumption should be evaluated relative to overall performance; for capacity-oriented workloads, watts per terabyte (W/TB) should be used.
These metrics provide a fair baseline for evaluation, but other criteria need to be assessed as well: the density of the storage solution can play an important role on the solution efficiency, due to reduced environmental requirements (rack space, power draw, cooling). Optimized architectures built on modern flash media, with efficient data reduction mechanisms will deliver more value.
Storage efficiency is a nascent requirement in organizations but is expected to become a priority with CIOs and decision makers as the impact of climate change is heavily felt in our daily lives across the globe. In addition, exploding utility costs are forcing organizations to prioritize efficiency to address the steep increase in fixed costs. With an understanding of the basics involved, it is possible for organizations to make impactful choices about their storage infrastructure.