Data centralisation is critical for innovation and AI

Corporate AI has a dirty secret. Despite all the investment and the best of intentions, nearly half of enterprises report falling short of their AI ambitions.

Fivetran research shows that companies lose an average of six percent of their global annual revenue (about $406 million) by relying on ineffective, time-consuming, and expensive data processes. Despite the tremendous spend on enterprise AI, organisations are still relying on weak data foundations. The result: underperforming models built on inaccurate or low-quality data.

Poor data processes do more than hurt an organisation’s bottom line. They consume expensive engineering and analytical resources, holding them back from innovation and wasting time, talent, and opportunity. To support both thriving teams and high-performing AI models, organisations need to get data centralisation right.

The importance of data centralisation

To build practical and effective business AI, data teams need to train or augment models with their company’s own unique, proprietary data. That means moving data to where AI systems can actually use it. Nearly three-quarters of organisations manage more than 500 data sources, but many struggle to centralise such a volume and variety of data. As a result, data becomes siloed, scattered, and even duplicated across multiple locations, complicating efforts to train or augment AI. Without a clear modelling layer to unify and make sense of those sources, the data often lacks consistency and context, making it difficult to use effectively in AI models or analytics.

In fact, C-suite decision-makers identify data integration as the main challenge to achieving AI readiness. There has never been a more urgent need for high-performance, best-in-class data integration that can scale with the demands of modern AI.

While advanced algorithms are important for AI, even the most sophisticated models will struggle without access to large volumes of clean, well-organised data. That data needs context. A semantic or modelling layer helps make sense of raw information by defining relationships and terminology, standardising formats, and giving AI systems a shared understanding across different sources. Without this layer, models often fail to reason effectively or to distinguish what matters from what does not. AI techniques like retrieval-augmented generation (RAG) depend heavily on reliable, contextualised data. When the foundations are weak, the chances of hallucinations or inaccurate results increase.
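To make the idea concrete, here is a minimal, hypothetical Python sketch of what a modelling layer does: it maps differently named fields from two imaginary sources onto shared business terms and standardises their formats before the data reaches an AI or analytics workload. The source names, field names, and normalisers are invented for illustration, not drawn from any particular product.

```python
from datetime import datetime

# Hypothetical semantic-layer mapping: raw source fields -> shared business terms.
SEMANTIC_MODEL = {
    "crm":     {"cust_id": "customer_id", "signup_dt": "signup_date", "rev": "revenue_usd"},
    "billing": {"customerId": "customer_id", "created": "signup_date", "amount": "revenue_usd"},
}

# One normaliser per canonical field, so every source ends up in the same format.
NORMALISERS = {
    "customer_id": str,
    "signup_date": lambda v: datetime.fromisoformat(str(v)).date().isoformat(),
    "revenue_usd": float,
}

def to_canonical(source: str, record: dict) -> dict:
    """Translate one raw record into the shared, canonical schema."""
    mapping = SEMANTIC_MODEL[source]
    out = {}
    for raw_field, canonical in mapping.items():
        if raw_field in record:
            out[canonical] = NORMALISERS[canonical](record[raw_field])
    return out

# Two sources describe the same customer with different field names and formats;
# after modelling, both rows share one schema an AI system can reason over.
print(to_canonical("crm",     {"cust_id": 42, "signup_dt": "2024-03-01", "rev": "199.0"}))
print(to_canonical("billing", {"customerId": "42", "created": "2024-03-01T09:30:00", "amount": 199}))
```

The point is less the code than the contract it encodes: once every source resolves to the same customer_id, signup_date, and revenue_usd, downstream models and RAG pipelines can reason over one consistent schema instead of guessing at each source’s quirks.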

Some organisations try to avoid centralising their data by using federated or zero-copy methods, accessing data in real time across multiple systems through multi-cloud platforms. While this can work for certain use cases, it is rarely effective for AI. These setups introduce serious issues with data integrity, performance, and governance, as they entirely lack a shared, standardised schema with consistent structures, data types, and relationships. AI needs more than just access to data. It needs data that is properly contextualised, consistent, and ready to be put to work. That level of understanding requires data to be centralised and modelled in a way that accurately communicates how the business operates. When this is done well, teams can spend less time stitching things together and more time delivering insights and building new ideas.

Centralised data requires automated data integration

The more time a data engineer can spend building scalable models and defining clear data relationships (a semantic or modelling layer), the more effective the AI will be. However, data engineers spend 44 percent of their time building, maintaining, and rebuilding data pipelines. As a result, nearly 70 percent say they don’t have enough time for that high-value data work. Instead, they are stuck on arduous, resource-intensive DIY pipelines that are difficult to scale.
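For readers unfamiliar with the term, the sketch below shows what a "DIY pipeline" often looks like in practice. It is a hypothetical, deliberately minimal Python extract-and-load script; the endpoint, database, and field names are placeholders, and production versions also need retries, incremental loading, schema-drift handling, and monitoring, which is where most of the maintenance burden comes from.

```python
import json
import sqlite3
import urllib.request

# Hypothetical DIY pipeline: the endpoint, table, and every field name are
# hard-coded, so any upstream schema change means editing and re-testing this script.
SOURCE_URL = "https://example.com/api/orders"   # placeholder endpoint
DB_PATH = "warehouse.db"

def extract():
    """Pull the full order list from the source API (assumes a JSON array of objects)."""
    with urllib.request.urlopen(SOURCE_URL) as resp:
        return json.load(resp)

def load(rows):
    """Load rows into a local SQLite table, replacing rows with the same id."""
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL, created TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders VALUES (:order_id, :amount, :created)",
        rows,  # breaks if the source renames or drops any of these keys
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(extract())
```

Multiply this by hundreds of sources and the 44 percent figure becomes easy to believe: every upstream change lands back on an engineer’s desk.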

The chief consequence of manual data engineering work is that teams cannot produce timely, actionable insights. As a result, leaders are unable to make effective data-driven decisions. In fact, 85 percent of data leaders admit their companies have lost money due to decisions based on outdated or error-prone data.

Put simply, no matter how heavily organisations may invest in AI, if their data insights are unreliable and data teams are overburdened, they will have little to show for it.

Freeing data teams to focus on AI, not infrastructure

Data engineers can’t deliver meaningful AI outcomes if they’re stuck maintaining fragile, manual data pipelines just to centralise data. Automating this work is essential, as it frees up technical teams to focus on building models, conducting analysis, and delivering insights that drive real business impact. With less time spent on plumbing and more time spent on innovation, organisations can move faster and get more value from both their data and their people.

Through automated data movement, data engineers can shift their focus from pipeline maintenance to identifying and leveraging the most useful data for AI initiatives. This enables teams to generate more accurate insights and deliver innovative data products that can improve decisions, strategies, and processes of all kinds.

For example, when Intercom, a leading AI-first customer service platform, switched from a bespoke, engineering-intensive process to automated data movement, it not only solved its data quality issues but also boosted productivity. Where Intercom previously needed two or three engineers working on data integration full-time, one person can now achieve the same outcome in less than half a week. This shift has freed up the team to focus on business-critical initiatives.

The fastest path to AI impact starts with your data team

Successful organisations recognise that data engineers are already solving complex problems, but often in the wrong places. Too much of their creativity is spent maintaining brittle, engineering-intensive pipelines instead of contributing to AI initiatives. By automating data engineering tasks, companies free these teams to apply their skills toward higher-value, AI-focused problem-solving. This accelerates innovation and makes better use of their technical talent.

Most companies aren’t falling short on AI because they lack ambition. They are falling short because their data infrastructure is not built to support it. Data engineers are spending their time on the wrong problems, patching together fragile pipelines instead of enabling accurate, timely data for AI. Until that changes, AI investments will continue to miss the mark. Automating data engineering is not just a productivity improvement. It is the foundation for building AI systems that actually deliver results.

Taylor Brown, COO and Co-founder at Fivetran

Taylor Brown is COO and Co-founder at Fivetran, a global leader in data movement, helping customers use their data to power everything from AI applications and ML models, to predictive analytics and operational workloads.
