The Changing Approach to Building Large-Scale Data Platforms

As the big data trend continues to accelerate, our approach to building ‘data platforms’, ‘strategic data lakes’ or ‘big data enablers’ (take your pick of terms) is undergoing a major change. 

Traditionally, companies would decide they needed to enhance their data capability and would create a programme to build a large-scale, technically complex platform before touching any data or creating any value. 

This meant massive upfront investment, with no proven ROI until the platform was finished, and contributed to the oft-quoted “85% of big data projects fail” statistic.

The preferred approach now focuses first and foremost on the end users, rather than the technology. Figure out how they want to use data to create value and work backwards from there. Start by building the minimum viable product needed to deliver value, then iterate over time based on user feedback.

The result is that your end users can start creating value sooner, upfront investment is lower and ROI is seen much earlier. 

Two trends have opened up this approach: cloud-native services and bringing Agile to big data.

Awakening to Cloud-Native Data

The awakening began when a plethora of native data services started to become available across the three main public clouds (AWS, GCP, Azure). Foundational services such as Storage and Compute are a given but you can also deploy advanced machine learning services almost at the click of a button. This has made building data capabilities that include advanced data processing and analytics much more accessible to your average enterprise. 

By implementing these native services, delivering a cutting-edge data platform no longer requires many months, or years, of upfront investment. Experimentation and innovation flourish. Not only does this shorten the development process, it also avoids the long procurement and implementation cycles of on-premises Hadoop or similar.

But this isn’t only a cloud story. 

Bringing Agile to Big Data

Modern data teams are adopting more agile ways of working, shortening the cycles for introducing new features and new data sets to end users. They understand DataOps principles and the importance of building a platform with quality, compliance, scalability and operability at its heart.


Why You Should Build Your Large-Scale Data Platform Backwards

Data teams are shifting their approach by prioritising the needs of the end users and engaging directly with them early in the build process. Working this way gives teams the confidence to assist and challenge users directly, ensuring that the requirements they work from have a known business value.

The key to this approach is working backwards from what your new platform will ultimately achieve. Data teams should begin by asking: What are you going to learn from your data? What will you do with your data? 

When a data team starts by understanding the organisation’s requirements, they can then follow this back to figure out what knowledge is required. This will, in turn, lead them on to the analytics or machine learning needed to extract that knowledge from the data and finally on to how this data will be gathered and stored.

Crucially, it is only then that platform build costs are incurred, with quick iterations that lead to business value. This shift in approach is fundamental to the success of these enlightened data teams.

So, rather than pushing data through the platform, data teams should be thinking instead of pulling knowledge through the platform.

Remember, ‘Big Data’ is worthless but ‘Big Knowledge’ is priceless.




Mark Pybus

Big Data Practice Director, EMEA

Mark Pybus leads the EMEA data practice. He understands the value and difficulties inherent in large-scale data and uses his expertise across the big data landscape to deliver outstanding value.
