When the first All-Flash Arrays (AFAs) were introduced back in 2011, many enterprises, analysts and established enterprise storage vendors felt that these types of systems would be too expensive for widespread use in the enterprise. But by 2019, AFAs were generating almost 80% of primary external storage revenues, while the revenue streams for Hybrid Flash Arrays (HFAs) and HDD-only arrays were in decline.
The key to this success was the narrowing of the cost-per-gigabyte gap between HDDs and SSDs, driven both by the use of in-line data reduction technologies and by the continuing decline in flash costs that is still strongly under way.
Latency-sensitive primary workloads gave AFA penetration another boost as well, since the much lower latency of SSDs unlocked cost savings not available with HDD-based arrays. Most importantly, significantly higher compute utilization meant both fewer servers and lower software licensing costs.
Can the domination of primary external storage spend by flash-based systems be repeated for less latency-sensitive tier 2 and secondary storage workloads? Some industry observers are skeptical, because the HDDs used for secondary workloads cost far less than the high-performance HDDs that AFAs displaced in primary workloads. Nor do many secondary workloads have the low-latency requirements that did so much to help flash dominate primary storage spend, making the bar that much higher.
Low latency aside, flash does offer a number of benefits that are of interest for secondary workloads. Unlike primary workloads, where low latency and very high availability tend to be the key requirements, secondary workloads tend to focus more on capacity scalability and $/GB cost. But let's look at some of the other benefits flash brings to the table:
- Higher throughput and bandwidth. The ability to move large data sets quickly can be very important in backup and disaster recovery environments, both for quickly ingesting new backups and for speeding recovery. Big data analytics workloads also often need to move large data sets, so these capabilities could be of interest there as well, particularly with analytics workloads that have some sort of time-sensitivity.
- Increased infrastructure density. Many secondary workloads scale into the petabyte (PB) range, and there are many data centers dealing with tens or hundreds of PBs of data that must be retained over time. Today, the largest HDDs top out at around 14TB, although component suppliers are already talking about 20TB HDDs on their roadmaps. 30TB SSDs, on the other hand, have already been shipping for about a year, and component suppliers have larger SSDs on their roadmaps as well. In building a 10PB infrastructure, a system using the largest available HDDs would be at least twice as large as one using the largest available SSDs. Not only does this require more floor space, but it also requires more energy and cooling capacity, factors which contribute to a higher overall TCO for HDD-based systems. While the cost savings from energy and floor space consumption could be minimal if a system is only several hundred TBs in size, with PB-scale systems those savings become much more significant.
- Lower $/GB cost. Increased device density drops the $/GB cost of that storage: fifteen 1TB SSDs cost more than a single 15TB SSD. Unlike with HDDs, where multiple smaller-capacity devices were needed to hit higher performance requirements, flash read performance is not impacted in this way at all. And because writes in enterprise systems are generally acknowledged from NVRAM on a controller rather than from an SSD, there is not much need to deploy multiple smaller devices to scale write performance for most enterprise workloads.
In addition to flash pricing drops due to volume and competitive pressures, we are also seeing lower flash costs from denser cell technologies like triple-level cell (TLC) and quad-level cell (QLC) NAND. As flash media gets denser, there are endurance and reliability issues that must be addressed, but so far vendors have been able to introduce software that makes these denser flash media technologies "enterprise-class".
- Better reliability. HDDs are mechanical devices with moving parts; solid-state devices have no moving parts and as such are more reliable. RAID and erasure coding approaches ensure that device failures do not impact application service availability, but there is still the hassle factor of having to identify and replace failed devices. This issue is only compounded in larger configurations like those often used for secondary workloads.
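The density comparison above is easy to sanity-check with back-of-envelope arithmetic. The sketch below (Python) uses only the drive capacities cited in the text (14TB HDDs, 30TB SSDs); it counts raw devices for a hypothetical 10PB build and ignores RAID/erasure-coding overhead, which would scale both counts similarly:

```python
import math

TB_PER_PB = 1000  # decimal terabytes per petabyte, as drives are marketed

def devices_needed(target_pb: float, drive_tb: float) -> int:
    """Round up the number of drives required to reach the raw capacity target."""
    return math.ceil(target_pb * TB_PER_PB / drive_tb)

hdd_count = devices_needed(10, 14)  # 10PB on the largest HDDs cited (14TB)
ssd_count = devices_needed(10, 30)  # 10PB on the largest SSDs cited (30TB)

print(hdd_count, ssd_count)              # 715 vs. 334 devices
print(round(hdd_count / ssd_count, 2))   # 2.14x more HDD slots to power and cool
```

The roughly 2x device count is what drives the floor space, energy, and cooling differences described above, before even considering per-device power draw.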
In 2015, Western Digital introduced the first AFA targeted for tier 2 and secondary workloads, dubbed InfiniFlash, but the system did not do well due to both price and functionality limitations. In 2017, Nimble Storage (now owned by HPE), introduced what it called its Secondary Flash Array, an HFA that had been specifically optimized for secondary storage workloads, but it withdrew that product from the market in 2018.
Also in 2018, Pure Storage noticed that its FlashBlade AFA, which had originally been introduced for scale-out file system and big data analytics workloads, was often being purchased by customers as a backup appliance because of its ability to move large data sets very quickly. FlashBlade offered a much more aggressive price point than Western Digital had with its InfiniFlash system, due not only to flash price decreases but also to the fact that Pure Storage used its own 52TB solid-state devices rather than off-the-shelf SSDs (allowing it to deliver a lower $/GB cost at scale).
Pure Storage sought to serve the secondary storage markets more directly by introducing a system in 2019, the FlashArray//C, that had been specifically built for and was targeted at tier 2 and secondary storage workloads. In 2019, a storage startup, VAST Data, also introduced an all-flash platform specifically targeted at these types of workloads. IDC expects to see more introductions like this from vendors in the future, including some established vendors.
Will these types of products find a growth market? An AFA, regardless of whether it is targeted at primary or secondary workloads, does not necessarily have to offer the same $/GB cost as alternative HDD-based platforms. It only has to be close enough that the other flash benefits (throughput, ease of use, bandwidth, density, reliability, and lower TCO) win the day.
Just as in the primary storage markets, AFAs were initially sold for the most performance-sensitive workloads, and as costs dropped over time AFAs became sufficiently cost-effective for a broader set of workloads. IDC expects the same to happen in the secondary markets. Those workloads that have some performance sensitivity (in terms of throughput or bandwidth) will migrate to these systems first, and as secondary AFA costs drop over time they will be used for a broader set of workloads. There will likely always be some cold storage environments, like deep archive, that will never move to flash, but that doesn't mean that secondary AFAs won't be a better value proposition than HDD-based systems for a lot of tier 2 and secondary workloads.
Learn more about the evolving All-Flash Array market shares in our latest study: