Unstructured data and cloud object storage: a perfect match

Whether it’s a patient anxiously awaiting their X-ray, a millennial gaming on Twitch, or law enforcement scanning surveillance videos to catch a thief – the future of data is unstructured.

80% of all data in the world is unstructured. And it’s growing. IDC estimates an additional 28.7% compound annual growth rate (through 2025) with the expanding data flow from the Internet of Things (IoT).

The challenges also compound. To meet the voracious appetite for data, IT professionals are expected to serve this data to their customers at hyper-scale speed while keeping data secure and costs low. Traditional block storage can’t keep pace, and public cloud storage can raise security concerns. On-premises flash storage can be fast, but it’s expensive, and it leaves open the question of how to cost-optimise the storage of older data.

Paul Speciale, Chief Product Officer, Scality

And then think of the scale. At the beginning of the millennium, a sizable database could comprise 100 gigabytes of storage, and the idea of managing a terabyte of data was almost unheard of. Now, a single full body CT scan is 40GB of file data. Multiply that by thousands of patients and then again by several years, and we are moving towards zettabytes of data. Legacy block storage arrays, and even file systems, were simply not designed for this level of scale.

Enter cloud object storage

Over a decade ago, the “hyperscaler” cloud vendors (AWS, Azure, Google) developed cloud storage services based on the object model to address the scalability requirements, geographic reach and economics of vast unstructured data storage. This was soon followed by storage software vendors who created on-prem solutions based on similar technology principles. The key features of these object storage solutions included:

A flat namespace: this ensures the ability to scale beyond the hierarchical directory structure of file systems, by maintaining a much simpler and more scalable namespace of keys (object identifiers) that map to values (objects, representing the actual data payloads).
RESTful APIs: instead of stateful (session-based) file system protocols such as SMB and NFS, the cloud model demands stateless request/response protocols that use the language and transport mechanism of the internet: HTTP. This makes cloud object storage effective at internet scale, tolerant of higher latencies, and suited to services that are far more distributed than legacy block- or file-based applications.
Rich metadata: the ability to “tag” data (objects) with additional attributes that describe the object data. This extends the value and semantics of the data beyond the simple attributes captured in a file system (for example, file size, owner, permissions).
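
The first and third features can be pictured with a small sketch. The Python below is a toy, in-memory model and not any vendor's implementation: the class names, keys and tag names are all hypothetical, chosen only to show a single flat map of keys to objects, with "folders" reduced to key prefixes and metadata used to find objects by what they describe.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: an opaque data payload plus descriptive metadata tags."""
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store (hypothetical): one flat map from key to object."""

    def __init__(self) -> None:
        # No directory tree: every object lives in a single key -> value map.
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: a prefix match on keys.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def find_by_tag(self, name: str, value: str) -> list[str]:
        # Metadata lets callers select objects by what the data describes.
        return sorted(
            k for k, obj in self._objects.items()
            if obj.metadata.get(name) == value
        )


# Illustrative keys and tags only; the payloads are placeholders.
store = FlatObjectStore()
store.put("scans/patient-001/ct.dcm", b"ct payload", modality="CT", department="radiology")
store.put("scans/patient-002/xray.dcm", b"xray payload", modality="XR", department="radiology")
store.put("video/camera-7/clip.mp4", b"video payload", source="surveillance")

print(store.list_keys("scans/"))
print(store.find_by_tag("modality", "CT"))
```

A production object store distributes this map across many nodes and sites, but the contract seen by the application stays close to this: put, get and list by key, plus metadata attached to each object.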

While object storage started as an ideal storage solution for active archive applications and less frequently accessed data, it has now evolved well beyond that. Today’s object storage solutions provide scale-out performance that is well suited to large media content delivery, online cloud services with thousands of simultaneous access requests, and big data analytics applications. The use of flash media will become widely embraced by vendors to further expand the performance envelope of object storage and increase the types of applications that can leverage it.

For many years, a major barrier to the widespread acceptance of cloud object storage was the lack of a standard or default RESTful API comparable to NFS and SMB, which became the de facto protocols for file-based network storage systems. Several competing object protocol standards emerged, but today most Independent Software Vendors (ISVs) have embraced the AWS S3 (Simple Storage Service) API as their default API for accessing cloud-based object storage, as well as on-prem object storage solutions. This has to a large extent simplified enterprise application adoption of cloud object storage.
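
To make the S3-style model concrete, here is a minimal Python sketch of what a single "put object" request looks like on the wire. It only describes the request and does not send it. The endpoint, bucket and key are hypothetical, the helper `describe_put_object` is invented for illustration, and request signing (AWS Signature Version 4) and the data payload are deliberately left out. In practice an application would normally use an S3 SDK rather than build requests by hand.

```python
from urllib.parse import quote


def describe_put_object(endpoint: str, bucket: str, key: str,
                        metadata: dict[str, str]) -> dict:
    """Describe an S3-style PUT Object request without sending it.

    Hypothetical helper for illustration: authentication headers and the
    request body are omitted.
    """
    # Path-style addressing: the bucket and the object key sit in the URL path.
    url = f"{endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"
    # S3 carries user-defined metadata as x-amz-meta-* request headers.
    headers = {f"x-amz-meta-{name}": value for name, value in metadata.items()}
    return {"method": "PUT", "url": url, "headers": headers}


request = describe_put_object(
    "https://objects.example.com",  # hypothetical on-prem or cloud endpoint
    "medical-imaging",              # hypothetical bucket name
    "scans/patient 001/ct.dcm",     # the key is just a string; "/" is a convention
    {"modality": "CT"},
)
print(request["method"], request["url"])
print(request["headers"])
```

Because each request is a self-contained HTTP call, the same application code can target a public cloud or an on-prem S3-compatible system by changing little more than the endpoint and credentials.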

Implementing cloud object storage and planning for the future

When organisations take a fresh look at their data management strategy, the flexibility, scalability and ease of management that come with cloud object storage make it an attractive solution for most of the unstructured data an organisation holds. The first step is to carefully assess your long-term data retention and access needs, including how your data storage and data use requirements will change as your business evolves to meet new and emerging demands. For example, considerable capacity will be required for IoT data.

Other considerations may include how you build out your presence across clouds, how you manage multi-cloud environments, and your edge strategy. The increase in core enterprise (private) and public clouds and the emergence of edge computing from billions of devices are creating important new data management challenges. Exabytes of data will be generated and consumed on the edge, with dedicated local cloud infrastructures deployed to serve large communities of edge users and devices such as remote and branch offices, sports arenas, hospitals and more. The resulting data deluge will require proven solutions to store, govern and orchestrate significant volumes of data in the core and at the edge.

Ultimately, it comes down to data access: putting data where it can be best leveraged, allowing access to it by those who should have it, and barring those who shouldn’t. The priorities in today’s data economy revolve around data security and multi-tenancy, agility, and support for both cloud-native and legacy apps. These all share equal importance as organisations deploy a mix of traditional core and edge data centres; local private cloud stacks such as AWS Outposts, Azure Stack, and Google GKE; and public cloud. When exploring cloud object storage, look for a provider that shares your priorities and can partner with you as your organisation moves forward.