Part I of the series “Laying the foundation of a data-driven enterprise with Hadoop” can be accessed here.
Hadoop can be used with any type of data, whether it is batch, interactive, or real-time. It can be applied to any data, whether it comes from a traditional system or from the internet of things (IoT), and it can be deployed anywhere: on-premise, in the cloud, on appliances, on Linux, or on Windows. Just as important, it enables a consistent experience for bringing that data together in a way that is interoperable with the tools you already have.
At the center of the platform is the technology we call YARN. We view it as a data operating system. Just as Windows multitasks applications such as Microsoft Office or Adobe Photoshop that run on top of it, YARN is the operating platform for Hadoop. It enables a wide range of data processing engines, open source as well as from partners such as Microsoft's HDInsight, Talend, and others, to run natively on the platform and benefit from its scalability.
Hadoop: A modern platform
A modern data platform needs operations, security, and governance, so these capabilities are built into the platform. This makes it easy to provision, manage, and monitor, on-premise or in the cloud, and to manage high availability. It should also help manage the lifecycle of the platform and of the workloads running on it, and raise active alerts when a workload needs hands-on attention.
Data governance is clearly important for managing data across its life cycle and for understanding the lineage of that data. Hadoop is no different from any other data system in your enterprise; most of them participate in data governance.
Top use case: The Single View of X
In the top middle, probably the top use cases we see are the single view use cases: a single view of the customer, of the product, of the supply chain, or of the patient.
Collecting disparate data from siloed data sets and bringing them together, so that you can join them in a way you have not been able to before, is a very big use case for driving additional revenue or delivering better care.
Fast data, or data in motion, combined with rich historical data, deep machine learning over that history, and data modeling, is what underpins predictive analytics. In many cases you will see businesses transforming themselves with predictive analytics applications. That is the landscape of the journey: folks pick one use case and then move on to others.
Build on top of simple use cases OR use them as reference points
Single view use case
To give you an example, let's have a look at Mercy Corps. Most of us are patients at some point, whether at birth or later in life, and Mercy Corps is really about delivering transformational outcomes at scale when it comes to patients. They have been on a journey toward "1 Patient/1 Record", clearly a single view use case. They deal with a million patients across hospitals and clinics. They bring in data from their electronic health record (EHR) system, Epic, and load it into Hadoop, where they can begin to join and aggregate it with other data sources around the patient or the clinical care lab. They can also onboard real-time patient sensor data, so they can perform a better analysis of the patient and deliver better care. Together with internal systems data and third-party data sets, all of this lands in a central location that provides the single view of the patient.
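As a rough illustration of what "joining data around the patient" means, here is a minimal sketch in plain Python. It is not Mercy Corps' pipeline and does not use any Hadoop API; the source names (`ehr`, `labs`, `sensors`), the `patient_id` key, the sample records, and the `build_single_view` helper are all hypothetical. On a real cluster the same grouping would be expressed in one of the processing engines that run on the platform.

```python
from collections import defaultdict


def build_single_view(*sources):
    """Group records from several named sources under one entry per patient.

    Each source is a (source_name, records) pair, and every record is a dict
    carrying a 'patient_id' key (a hypothetical shared identifier).
    """
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources:
        for record in records:
            patient_id = record["patient_id"]
            # Keep every record, grouped by the system it came from.
            details = {k: v for k, v in record.items() if k != "patient_id"}
            view[patient_id][source_name].append(details)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {pid: dict(by_source) for pid, by_source in view.items()}


# Hypothetical sample data standing in for siloed systems.
ehr = [{"patient_id": "p1", "name": "Pat Example"}]
labs = [{"patient_id": "p1", "test": "glucose", "value": 5.4}]
sensors = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "p2", "heart_rate": 80},
]

single_view = build_single_view(("ehr", ehr), ("labs", labs), ("sensors", sensors))
print(single_view["p1"])
```

The sketch keeps each source's records separate under the patient rather than flattening them into one row, so nothing is lost when two systems use the same field name.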
Beyond the single view of a patient, another use case is bringing online the free-text lab notes that, historically, Mercy Corps was never able to search across. They went from never being able to discover those insights to finding them in a matter of seconds. If we look at Mercy Corps' journey toward becoming data driven, it started in the lower hexagons, the cost savings use cases, and then moved above the line into the transformational business outcome use cases.
They are continuing their journey with things like vital signs monitoring, preventive care, medical decision support, and device data ingest, and they are working on a lab notes archive, operational efficiencies, and a single view from a doctor's perspective.
The series is based on the Hortonworks webinar titled “Laying the foundation for a Data-Driven Enterprise with Hadoop”. It can be accessed here.