Healthcare and the evolution to DataOps

The healthcare industry is transforming quickly as technology evolves across society. Breakthroughs such as IoT, advanced imaging, genomic mapping, artificial intelligence and machine learning are re-shaping healthcare and promise a revolution in patient care and patient outcomes. Underlying these shifts, as in so much else being changed by technology, is the data revolution: organisations are adopting modern data technologies to support bold new use cases.

Just as in the wider economy, healthcare organisations, particularly the larger ones, have begun to understand that they must be nimble and adapt if they are to keep delivering cutting-edge healthcare services as both technological capability and patient (or consumer) expectations rise. A significant big data stack, including Kafka, Impala and Spark Streaming deployments among others, is becoming standard, with a focus on the applications, on developer needs and, ultimately, on the business value the healthcare provider gains by better meeting physician requirements and patient needs.

At this stage of market maturity, many large organisations will have built innovative data applications on top of growing data pipelines, providing useful new services and insights for their customers and employees. Along the way, most also come to see that it is extremely difficult to troubleshoot and manage such data applications manually. Even forward-thinking organisations with a developer-focused culture can find that they are sinking huge chunks of their time into diagnosing and fixing failing applications, taking their focus away from creating new apps that drive core business value. In this industry, that's unacceptable: the programmers should be building applications that usher in the next generation of healthcare.

Impala and Spark Streaming are two big data technologies that developers increasingly employ to support next-generation use cases. Both are commonly found in applications that rely on streaming data, which is prevalent in the healthcare sector: Spark Streaming processes the data as it arrives, while Impala serves SQL queries over it. Unfortunately, both are difficult to manage. Applications built with them often suffer slowdowns and intermittent crashes, and Spark Streaming, in particular, can be very hard even to monitor.

When key data applications are not performing as expected and programmers are wasting time trying to troubleshoot them, an application performance management (APM) solution can help. It provides insight into the parts of a data application that are known to perform poorly but where previously there was no visibility.

Tame Impala

Impala is a distributed analytic SQL engine running on the Hadoop platform. Understanding Impala's memory consumption, drilling down into the detail of individual queries, and coming up with recommendations on how to make queries run faster or use data across nodes more efficiently are all challenges for the DataOps (or DevOps) team.

The most efficient approach would be to use an APM to analyse the query pattern against Impala (the mix of inserts and selects, the data touched, and data locality across the Hadoop cluster) and have an AI system offer a few key insights based on that analysis. You might discover, say, that most of the time was spent scanning Hadoop's file system across nodes and combining the results. After computing stats on the underlying table, a simple operation done with Impala's COMPUTE STATS statement, you may be able to improve performance by 12-18x.
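The "computing stats" step can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any particular APM product: it assumes Impala's COMPUTE STATS and SHOW TABLE STATS statements, the table name claims.encounters is hypothetical, and the helper only builds the SQL text. Running the statements would need a client connection to an Impala daemon, which is left out here.

```python
import re
from typing import List

# Accept only plain identifiers, optionally qualified as "database.table",
# so the generated SQL cannot be altered by a malformed name.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def stats_statements(table: str) -> List[str]:
    """Build the Impala statements that refresh and then inspect table stats.

    `table` is a hypothetical name such as "claims.encounters".
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        f"COMPUTE STATS {table}",     # gather table and column statistics
        f"SHOW TABLE STATS {table}",  # check that row counts are now populated
    ]


if __name__ == "__main__":
    for statement in stats_statements("claims.encounters"):
        print(statement)
```

With statistics in place, the query planner has row counts and column cardinalities to work from when it chooses join order and join strategy, which is why such a small operation can make a large difference.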

Light a spark

Spark Streaming is a lightning-quick processing and analytics engine that's well suited to handling enormous quantities of streaming data. As with Impala, the DataOps team would generally find their lives improved by insights and recommendations that alleviate the headaches common with the technology. An APM platform might, for example, report that the cluster doesn't have enough memory for many of its Spark Streaming jobs, which may be causing slowdowns and crashes. With specific recommendations on how to re-configure Spark Streaming, a process that's typically complicated and replete with costly missteps, the team would be far more efficient. In addition, you might be able to save significant CPU resources by distributing parallel tasks across the available cores.
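To make the kind of re-configuration described above concrete, here is a rough Python sketch of a sizing heuristic. The property names (spark.executor.instances, spark.executor.cores, spark.executor.memory, spark.default.parallelism) are standard Spark settings, but the function, its defaults and the arithmetic are illustrative assumptions, not the output of any real recommendation engine. In particular, the reserved memory figure is a guess meant to cover the operating system and per-executor overhead.

```python
from typing import Dict


def executor_sizing(
    node_memory_gb: int,
    node_cores: int,
    nodes: int,
    cores_per_executor: int = 4,  # assumed default, tune per workload
    reserved_gb: int = 8,         # assumed headroom per node for OS and overhead
    tasks_per_core: int = 2,      # assumed number of parallel tasks per core
) -> Dict[str, str]:
    """Suggest Spark settings so executors fit in memory and all cores get work."""
    if min(node_memory_gb, node_cores, nodes, cores_per_executor, tasks_per_core) <= 0:
        raise ValueError("all sizes must be positive")
    executors_per_node = node_cores // cores_per_executor
    usable_gb = node_memory_gb - reserved_gb
    if executors_per_node < 1 or usable_gb < executors_per_node:
        raise ValueError("node is too small for this executor shape")

    memory_per_executor_gb = usable_gb // executors_per_node
    total_executors = executors_per_node * nodes
    total_cores = total_executors * cores_per_executor
    return {
        "spark.executor.instances": str(total_executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{memory_per_executor_gb}g",
        "spark.default.parallelism": str(total_cores * tasks_per_core),
    }


if __name__ == "__main__":
    # Hypothetical cluster: three nodes, each with 64 GB of RAM and 16 cores.
    for key, value in executor_sizing(64, 16, nodes=3).items():
        print(f"{key}={value}")
```

The point of the sketch is that these settings are changed in configuration, not in application code, which is what makes automated recommendations practical to apply.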

The aim of the game

Ultimately, the DataOps team wants recommendations and configuration changes that require no alterations to code, together with performance improvements that give an immediate boost to critical business applications. Those insights and recommendations should cover the entire big data deployment, eliminating the need to manage each data pipeline with its own siloed tool.

Modern data applications are fuelling healthcare's technical transformation. By improving data application performance, organisations are better able to deliver a pioneering healthcare experience, achieving better patient outcomes, new services and greater business value. Without an application performance management solution to offer DataOps teams a helping hand, developers and IT teams would be bogged down in troubleshooting rather than creating new applications and optimising the business. The IT solution is one part of the answer, but it should sit within a deeper cultural shift: doing more with existing data and evolving to a DataOps mindset.


Shivnath Babu

Co-founder & CTO at Unravel Data Systems

