
Deep Learning: When Data Gets Smart

By Frank Mueller

Deep learning has advanced data storage to the point where the storage platform gets smarter and faster as more data flows through it. This smarter approach helps enterprises and managed service providers build a sustainable process for their business amid the exponential growth of data and the evolving challenges facing data infrastructure.

Throwing “faster and better” hardware at these challenges is not always the answer. That has at times been a hard lesson for the storage industry to accept, but more intelligent software, anchored in deep learning, can make a significant difference.

Deep learning, the most sophisticated subset of machine learning under the broader umbrella of artificial intelligence, enables a storage system to take concrete, software-driven action: it learns over time and continuously adapts to specific data patterns and user behavior.

To enable this behavior, a trie (prefix tree) structure is recommended; it has proven to be extremely performant in multi-petabyte environments. It differs from classical structures such as a hash table in that the cost of an operation depends on the length of the key rather than on how much data is already stored: inserts, modifications, and deletions all complete at essentially the same latency, providing consistent performance from the first bytes of data to multiple petabytes.

For the sake of comparison, a person who enters a search term into the Google search engine and sees a variety of search-related suggestions is experiencing a trie structure in action. Applying this structure to enterprise-class storage has been enlightening.
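
To make the idea concrete, here is a minimal trie sketch in Python (purely illustrative, not any vendor's implementation). Inserting a key and collecting prefix-based suggestions both cost time proportional to the key length, regardless of how many entries the trie already holds, which is the property that keeps latency consistent as the data set grows:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child node
        self.is_key = False  # True if a stored key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        # Cost is proportional to len(key), not to how many keys are stored.
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def suggestions(self, prefix, limit=5):
        # Walk down to the prefix node, then collect completions -
        # the same idea behind search-box autocomplete.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            cur, word = stack.pop()
            if cur.is_key:
                results.append(word)
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return results

trie = Trie()
for term in ["volume", "volume-snapshot", "vol-clone", "backup"]:
    trie.insert(term)
print(trie.suggestions("vol"))  # e.g. ['vol-clone', 'volume', 'volume-snapshot']
```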

The outcome in storage is that users can expect the fastest possible response times. The system analyzes the incoming data and its query patterns and ensures that the relevant data, along with data likely to be requested next, is always available in DRAM, the fastest tier of memory.
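
As a rough sketch of that principle (illustrative only; a production system would use far more sophisticated models), the following Python snippet tracks which block tends to be read after which and pulls the predicted next block into a DRAM-resident cache before it is requested. The names `PredictivePrefetcher`, `backend_read`, and `dram_cache` are invented for the example:

```python
from collections import defaultdict, Counter

class PredictivePrefetcher:
    """Toy first-order model: remembers which block tends to follow which."""

    def __init__(self, backend_read):
        self.backend_read = backend_read          # function: block_id -> bytes
        self.dram_cache = {}                      # block_id -> data ("DRAM" tier)
        self.transitions = defaultdict(Counter)   # prev block -> next-block counts
        self.last_block = None

    def read(self, block_id):
        # Serve from DRAM if the block was already staged there.
        data = self.dram_cache.get(block_id)
        if data is None:
            data = self.backend_read(block_id)
            self.dram_cache[block_id] = data

        # Learn the access pattern and prefetch the most likely successor.
        if self.last_block is not None:
            self.transitions[self.last_block][block_id] += 1
        self.last_block = block_id
        if self.transitions[block_id]:
            predicted, _ = self.transitions[block_id].most_common(1)[0]
            if predicted not in self.dram_cache:
                self.dram_cache[predicted] = self.backend_read(predicted)
        return data
```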

Deep learning is what fuels and guides this continuous learning process, and it is what makes a “set-it-and-forget-it” approach possible in enterprise storage. You don’t have to think about the storage infrastructure; let the software-defined storage solution do the thinking for you.

Deep learning capabilities in a storage system are designed to consider and evaluate multiple possibilities in real time. Like a human being, the system learns from its “guesses” – in other words, from the actions it anticipates in predictive mode.

It’s similar to a person who guesses what a customer wants before they even ask, because there is a history of the customer’s preferences to draw on. Deep learning emulates human intelligence in the same situations, provided it has access to the data.

This way of thinking about storage capabilities is different, and it’s not something merely promised for the future. It exists today, and it goes further than basic AI-enabled capabilities (recognizing requests the system has seen before) or even machine learning. Storage continues to evolve and get smarter over time.

New tools have been developed to intelligently pre-stage data from low-cost, high-capacity persistent back-end storage into ultra-fast front-end cache. This delivers data to users and applications at lower latency and, therefore, higher speed.
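
A hedged sketch of what pre-staging might look like in Python: a background worker drains a queue of predicted block IDs and copies those blocks from the slow back end into the front-end cache ahead of demand. The queue, the cache dictionary, and the `backend_read` callable are placeholders invented for illustration:

```python
import queue
import threading

def prestage_worker(predicted_blocks, backend_read, front_cache):
    """Background worker: pull predicted blocks from slow back-end storage
    into the fast front-end cache before applications ask for them."""
    while True:
        block_id = predicted_blocks.get()      # block IDs emitted by the model
        if block_id is None:                   # sentinel to stop the worker
            break
        if block_id not in front_cache:
            front_cache[block_id] = backend_read(block_id)

# Illustrative wiring: the prediction model feeds block IDs into the queue,
# and the worker stages them in the background.
predicted_blocks = queue.Queue()
front_cache = {}
worker = threading.Thread(
    target=prestage_worker,
    args=(predicted_blocks, lambda b: f"data-for-{b}".encode(), front_cache),
    daemon=True,
)
worker.start()
predicted_blocks.put("blk-17")   # a block the model expects to be read soon
```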

The shift in organizations from time-critical to real-time business requirements has created gaps that deep learning can fill. IT teams have to deal with the higher expectation for 100% availability – no longer just 99.99999%.

Because of the use of DRAM and deep learning, a next-generation storage platform with a triple-redundant architecture can deliver 100% availability. The intelligence built into the data infrastructure makes this possible: the software compensates for the unpredictability of potential hardware failures, an age-old risk that comes with any hardware.
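
To show the general shape of redundancy handled in software (a deliberately simplified sketch, not the actual architecture of any product), the following Python fragment writes each block to three replicas and tolerates the loss of some of them:

```python
class Replica:
    """Stand-in for one of three redundant nodes (illustrative only)."""
    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, block_id, data):
        if not self.online:
            raise IOError("replica unreachable")
        self.blocks[block_id] = data

def replicated_write(block_id, data, replicas):
    # Send the write to every replica; the request succeeds as long as at
    # least one copy lands, and failed replicas are repaired in the background.
    acks = 0
    for replica in replicas:
        try:
            replica.write(block_id, data)
            acks += 1
        except IOError:
            continue
    if acks == 0:
        raise IOError("no replica accepted the write")
    return acks

replicas = [Replica(), Replica(), Replica()]
replicas[1].online = False                               # simulate a hardware failure
print(replicated_write("blk-42", b"payload", replicas))  # -> 2
```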

On top of this come more stringent requirements for higher performance and lower latency. Deep learning is interwoven into an architecture that can deliver on these requirements. Whether the CIO wants SSD or HDD does not really matter, because the deep learning-powered software sits above the medium on which it runs, making it effectively hardware-agnostic. This opens up significant new possibilities for enterprise organizations to rethink how they address data storage needs.
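
One way to picture this hardware-agnostic layering, as a hypothetical sketch in Python, is an abstract medium interface that the intelligent software targets, so the placement logic never cares whether an SSD or an HDD sits underneath. The class names here are assumptions for the example:

```python
from abc import ABC, abstractmethod

class Medium(ABC):
    """The deep learning layer talks to this interface, never to the device."""

    @abstractmethod
    def read(self, block_id): ...

    @abstractmethod
    def write(self, block_id, data): ...

class SSDMedium(Medium):
    def __init__(self):
        self._blocks = {}
    def read(self, block_id):
        return self._blocks[block_id]
    def write(self, block_id, data):
        self._blocks[block_id] = data

class HDDMedium(Medium):
    def __init__(self):
        self._blocks = {}
    def read(self, block_id):
        return self._blocks[block_id]
    def write(self, block_id, data):
        self._blocks[block_id] = data

def place_block(medium: Medium, block_id, data):
    # The placement logic is identical regardless of the underlying medium.
    medium.write(block_id, data)
```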

IT leaders are wise to look for a storage platform with sophisticated algorithms, metadata management structures, and next-generation storage features. All components, from the RAID to the clustered services, are implemented in software to allow constant optimization.

Traditional storage arrays aim to place the most active data (aka “hot data”) in flash cache to achieve performance parity with all-flash arrays (AFAs). A better approach, with deep learning capabilities, is to place all of the hot data in DRAM, so reads are served at DRAM speed, which is faster than flash. It’s like having an “all-DRAM array” experience, with built-in intelligence that an AFA simply does not have, on top of an architecture that supports both SSD and HDD.

But SSD still has a role to play. In the real world, a thick SSD flash layer can serve as a cushion for DRAM misses. As the “deep learning brain” learns the I/O patterns and optimizes data placement, the flash layer’s role shifts from absorbing DRAM misses to absorbing changes in I/O patterns that the algorithm may not be able to predict. A periodic audit, for example, may request data that is not in DRAM. The deep learning algorithm can detect this, learn from it, and make smarter placement decisions as a result.
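
A simplified read path illustrating this tiering might look like the following Python sketch (the names and tiers are assumptions for the example, not a real product’s code): DRAM is checked first, the flash layer absorbs DRAM misses, and anything unpredicted, such as an audit touching cold data, falls through to the persistent back end:

```python
def tiered_read(block_id, dram, ssd, hdd_read):
    """Illustrative read path: DRAM first, SSD flash as the cushion for
    DRAM misses, and the persistent back end as the last resort."""
    if block_id in dram:                 # hot data served at DRAM speed
        return dram[block_id]
    if block_id in ssd:                  # DRAM miss absorbed by the flash layer
        data = ssd[block_id]
    else:                                # unpredicted access, e.g. an audit
        data = hdd_read(block_id)
        ssd[block_id] = data             # keep a copy in flash for next time
    dram[block_id] = data                # the model decides later whether it stays
    return data
```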

A good time to start incorporating deep learning capabilities into your data infrastructure is when you start consolidating multiple storage arrays into a single platform, which should be able to scale to a multi-petabyte level. Then you’ll really see what deep learning can do.
