Your Data Isn't Ready. Neither Is Anyone Else's.

AI Strategy & Insight | Brooklyn Solutions | Part 3 of 5

The AI data readiness myth that is stopping procurement teams from starting, and the minimum viable standard that lets you begin anyway.

Nick Francis · Co-Founder & CTO, Brooklyn Solutions · 2025/26

Previous in this series: Why Procurement Transformations Keep Failing

74% of procurement leaders say their data isn’t AI-ready. The instinct is to treat that as a reason to wait. It isn’t. It’s a reason to understand what ‘ready’ actually means, and why the platform you’re about to buy is the most effective tool for getting there.

The Statement That Stops Programmes Before They Start

There’s a conversation that happens, with remarkable consistency, early in almost every procurement transformation discussion. It goes something like this: “We’d love to adopt AI-powered supplier management, but our data is such a mess that we wouldn’t get the benefit. We’d need to clean it up first.”

It’s understandable. It’s also, in most cases, the wrong conclusion drawn from the right observation.

Yes, your data is probably in a poor state. According to Deloitte’s research, 74% of procurement leaders say the same thing [1]. MIT’s 2025 Project NANDA study found that 95% of organisations deploying generative AI saw zero measurable return. In the vast majority of cases, the failure was not the AI model. It was data readiness, workflow integration, and the absence of a defined outcome [2]. Gartner predicts that 60% of AI projects lacking AI-ready data will be abandoned before they deliver value [3].

So the concern is legitimate. But “my data is a mess” has become, for many organisations, a reason not to start. That is a mistake, because the platform you are about to implement is almost certainly the most effective tool available for cleaning it.

The Chicken and the Egg: Reframed

Here’s the trap: if you try to clean your data before moving to a platform, you will end up building significant parts of that platform yourself. The structured database, the data validation rules, the deduplication logic, the taxonomy standards: these are exactly what a well-designed supplier management or procurement platform provides as part of its architecture. Attempting to replicate them in spreadsheets or bespoke scripts before onboarding is expensive, time-consuming, and largely redundant.

Any good platform, designed properly, should accommodate any starting level of data maturity and begin improving it from day one. The structured onboarding process, the data model, the validation logic at point of entry: these are the mechanisms that progressively clean your data as you use the system, apply your policies, and embed your governance. The system does not just store your data. It disciplines it.

This is a fundamentally different framing from “clean before you start.” It’s “start, and let the system help you clean.” The condition is not perfect data. It’s the right policies, the right process design, and the right governance framework, applied consistently from the point of onboarding.

Start With Your Cleanest Data, Not Your Most Complete

“The question is not ‘is my data good enough to start?’ It’s ‘where is my data good enough to start first?’ Those are very different questions — and they lead to very different outcomes.”

The practical starting point is not to attempt a wholesale data remediation programme. It’s to identify where your cleanest, most reliable data already exists, and begin there.

In supplier management, this almost always means your most critical and strategically important suppliers. These are the vendors your team manages actively, the ones you have contracts with, the ones you have onboarded through proper due diligence. Your data on these suppliers is already more structured, more complete, and more trustworthy than your data on the long tail.

This is where you wrap your governance standards first. This is where you run your first AI-assisted processes. This is where you prove the value. And then, using supplier segmentation as a framework, you progressively extend that standard to the next tier of suppliers, based on their criticality to your operation.

The same MVP principle applies here at a data level. Brooklyn VMS supports an unlimited number of custom fields, and we have seen customers configure hundreds, even thousands, of data points per supplier. But that level of completeness is not a precondition for starting. The practical approach is to work out, at a minimum, what you need to capture per supplier and what you need to capture per contract, and configure the platform accordingly. Brooklyn’s compliance rules allow you to enforce those minimum field requirements at the point of data entry, flagging missing or incomplete data before it embeds itself in your records, so that the standard you set is the standard the system maintains.
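To make the idea concrete, here is a minimal sketch in Python of enforcing a minimum field set at the point of entry. It is not Brooklyn’s compliance rules engine: the field names, the REQUIRED_SUPPLIER_FIELDS tuple, and the accept_supplier function are all assumptions chosen for illustration.

```python
# Hypothetical minimum standard per supplier. These field names are
# illustrative assumptions, not a real platform schema.
REQUIRED_SUPPLIER_FIELDS = ("legal_name", "category", "criticality_tier", "owner")


def missing_fields(record: dict, required: tuple) -> list:
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


def accept_supplier(record: dict) -> tuple:
    """Accept a record only if it meets the minimum standard.

    Returns (accepted, gaps) so the caller can show the user what to fix.
    """
    gaps = missing_fields(record, REQUIRED_SUPPLIER_FIELDS)
    return (not gaps, gaps)


if __name__ == "__main__":
    draft = {"legal_name": "Acme Ltd", "category": "IT"}
    print(accept_supplier(draft))
```

The design choice worth noting is that the check returns the list of gaps rather than a bare yes or no, so incomplete data is flagged back to the person entering it instead of being silently stored.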

What AI-Ready Data Actually Means, and Why You Do Not Need Perfection

“AI does not need perfect data. It needs representative data — enough examples of what good looks like that it can begin to recognise the pattern.”

There’s a widespread misconception that AI requires pristine, complete, fully-normalised data to function. It does not. What it requires is data that is representative of the outcomes you are trying to produce, structured well enough that it reflects what good looks like, even if only in a limited number of examples.

As discussed in Part 1 of this series, if you have a well-designed process, with defined inputs, defined outputs, and clear success criteria at each stage, and you’ve run that process end-to-end even three or four times, you already have something valuable: a baseline. You have documentation of how the process should run. You have examples of what good output looks like. You have, in effect, a training signal.

That is your minimum viable data standard. Not perfect coverage. Not complete historical records. A small set of well-structured, well-understood examples that tell the AI what it’s aiming for. From that foundation, every subsequent run improves the signal. Every process cycle adds to the evidence base.

Identifying your minimum viable data standard means defining the minimum amount of data you need per artefact, per supplier record, per contract record, per process instance, and then enforcing it. In practice, this means agreeing on consistent categorisation, a shared taxonomy, normalised naming conventions, and common terminology before data enters the system. Once that standard is set, two things become possible. First, the platform can enforce it through compliance rules, so data quality improves continuously rather than degrading over time. Second, AI can actively help you close the gaps, identifying where the minimum viable standard is not being met, suggesting values based on comparable records, and flagging anomalies that a manual review would miss. The minimum viable standard is not just a quality threshold. It is the brief you give the AI.
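As a rough illustration of “suggesting values based on comparable records”, the sketch below uses a simple majority vote among peer records. This is a deliberately basic stand-in, not an AI model and not how any particular platform does it. The field names and the suggest_value function are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple


def suggest_value(target: dict, records: list, field: str, match_on: str) -> Optional[Tuple[str, float]]:
    """Suggest a value for a missing field from comparable records.

    Records are treated as comparable when they share the same value for
    `match_on`. Returns (suggested_value, share_of_peers_agreeing), or None
    when no comparable record has the field populated.
    """
    peers = [
        r[field]
        for r in records
        if r.get(match_on) == target.get(match_on) and r.get(field)
    ]
    if not peers:
        return None
    value, count = Counter(peers).most_common(1)[0]
    return value, count / len(peers)


if __name__ == "__main__":
    # Illustrative data only.
    suppliers = [
        {"industry": "software", "category": "IT"},
        {"industry": "software", "category": "IT"},
        {"industry": "software", "category": "Professional Services"},
        {"industry": "logistics", "category": "Freight"},
    ]
    incomplete = {"industry": "software", "category": ""}
    print(suggest_value(incomplete, suppliers, field="category", match_on="industry"))
```

Returning the share of agreeing peers alongside the suggestion lets a data owner decide whether to accept it, which keeps a human accountable for the final value.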

How to Actually Clean Your Data: Three Practical Levers

Design your processes to clean as they run

Process design is the first lever, and it’s underused. A well-designed onboarding or supplier data process, built with structured fields, validated inputs, and mandatory data standards, does not just collect data. It enforces quality at the point of entry. Every supplier that passes through a properly designed onboarding workflow arrives in your system with a baseline level of data completeness and accuracy. Over time, this compounds.

This is why the process design work described in Part 1 and Part 2 of this series is not separate from data readiness. It is data readiness. The way you design your process determines the quality of the data that process produces.

Use ETL pipelines and platform-assisted cleansing

Before data reaches your platform, particularly during initial onboarding of legacy data, Extract, Transform, Load (ETL) pipelines are your most powerful tool. A well-configured ETL pipeline can deduplicate records, standardise naming conventions, enforce taxonomy alignment, and flag incomplete or inconsistent entries before they pollute your new system.

Most platform providers either offer ETL tooling directly or have established integration partnerships that can help. The investment in a proper ETL process at onboarding is almost always recovered quickly in reduced remediation effort downstream. The specific capability to look for is deduplication logic that handles variant spellings and naming conventions. This is a persistent problem in supplier data, where “Vendor A” and “Vendor A Inc.” are the same entity to a human and two entirely different records to an AI [4].
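A minimal sketch of that kind of deduplication logic, in Python, might look like the following. It assumes a simple approach: normalise each name by lower-casing it, stripping punctuation, and dropping trailing legal-form suffixes, then group records that share a normalised key. The suffix list and function names are illustrative assumptions, and a production pipeline would need a fuller, locale-aware list and probably fuzzy matching as well.

```python
import re
from collections import defaultdict

# Illustrative, incomplete list of legal-form suffixes (an assumption).
LEGAL_SUFFIXES = {
    "inc", "incorporated", "ltd", "limited", "llc",
    "plc", "corp", "corporation", "co", "gmbh",
}


def normalise_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Keep at least one token so a name like "Co" is not reduced to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_duplicates(names: list) -> list:
    """Return groups of raw names that share a normalised key (size > 1 only)."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise_name(name)].append(name)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    print(group_duplicates(["Vendor A", "Vendor A Inc.", "Vendor B Ltd"]))
```

The groups this produces are candidates for merging, not confirmed duplicates. Two genuinely different companies can share a normalised name, so a review step before records are merged is a sensible safeguard.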

Establish your golden source, and govern it

The third lever is architectural: identifying and designating your system of record for each data domain, and then treating that designation seriously. Gartner’s System of Record / System of Information framework is a useful structure here. For any given data entity, a supplier, a contract, a spend category, there should be one authoritative source. Other systems may reference or display that data, but only one system owns it.

In supplier management, your VMS or supplier management platform should be the system of record for supplier master data. Your ERP for transactional spend. Your CLM for contract data. These designations need to be explicit, documented, and enforced, because AI draws on all of these, and conflicting data across systems is one of the most common causes of AI outputs that feel “wrong” without anyone being able to pinpoint why.

In Brooklyn’s architecture, Brooklyn VMS is the system of record for supplier master data and the Brooklyn CLM module is the system of record for contract data. Where customers have existing ERP or financial systems, those remain the system of record for transactional spend. Ask Brooklyn, our conversational AI layer, draws on the correct system of record for each query type, whether that is supplier profile and risk data, contract obligations and key dates, or spend analytics. Where Brooklyn is the system of record, data is governed directly within the platform. Where another system holds the record, Brooklyn is configured as the system of information, referencing that data without duplicating or overwriting it. The result is that AI outputs are grounded in data that is both authoritative and traceable: exactly what the EU AI Act’s transparency requirements are increasingly asking organisations to demonstrate.
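The principle of one authoritative source per data domain can be expressed as a small lookup. The Python sketch below is purely illustrative and does not describe how Ask Brooklyn or any Brooklyn module is implemented. The Domain enum, the system labels, and both functions are hypothetical names.

```python
from enum import Enum


class Domain(Enum):
    SUPPLIER_MASTER = "supplier_master"
    CONTRACT = "contract"
    TRANSACTIONAL_SPEND = "transactional_spend"


# Exactly one system of record per domain, mirroring the designations
# described in the text. The labels are placeholder strings (assumptions),
# not real connection identifiers.
SYSTEM_OF_RECORD = {
    Domain.SUPPLIER_MASTER: "VMS",
    Domain.CONTRACT: "CLM",
    Domain.TRANSACTIONAL_SPEND: "ERP",
}


def source_for(domain: Domain) -> str:
    """Return the system a query about this domain should read from."""
    return SYSTEM_OF_RECORD[domain]


def can_write(system: str, domain: Domain) -> bool:
    """Only the designated system of record may modify data in a domain.

    Every other system acts as a system of information: read-only.
    """
    return SYSTEM_OF_RECORD[domain] == system


if __name__ == "__main__":
    print(source_for(Domain.CONTRACT))
    print(can_write("VMS", Domain.TRANSACTIONAL_SPEND))
```

Because the mapping is a dictionary keyed by domain, each domain can have only one owner by construction, which is the property that keeps outputs traceable to a single source.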

Data Quality Is Not a Project. It’s a Discipline.

One of the most important mindset shifts procurement organisations need to make: data quality is not something you achieve and then move on from. It degrades. Suppliers change names, merge, or cease trading. Contracts expire. Spend categories evolve. People leave and take institutional knowledge with them. Without active stewardship, even clean data can become stale within 12 to 18 months.
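One simple way to make that decay visible is to record when each record was last reviewed and flag anything outside an agreed window. The Python sketch below assumes a hypothetical last_reviewed field and a 365-day window. Both are illustrative choices, not a recommendation or a platform feature.

```python
from datetime import date, timedelta

# Assumed review window (illustrative). Set this to whatever your
# governance policy actually requires.
REVIEW_WINDOW = timedelta(days=365)


def stale_records(records: list, today: date) -> list:
    """Return records never reviewed, or last reviewed outside the window."""
    stale = []
    for record in records:
        last = record.get("last_reviewed")  # hypothetical field name
        if last is None or today - last > REVIEW_WINDOW:
            stale.append(record)
    return stale


if __name__ == "__main__":
    suppliers = [
        {"id": "S1", "last_reviewed": date(2025, 1, 10)},
        {"id": "S2", "last_reviewed": date(2023, 6, 1)},
        {"id": "S3"},  # never reviewed
    ]
    for record in stale_records(suppliers, today=date(2025, 6, 1)):
        print(record["id"])
```

Passing today in as a parameter, rather than reading the clock inside the function, keeps the check repeatable for audits and easy to test.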

This connects directly to the governance and ownership arguments in Part 2. The RACI model applied to process steps applies equally to data: for every critical data entity, someone needs to be responsible for its accuracy, accountable for its completeness, and empowered to enforce the standard. Data stewardship is not a technology function. It’s a business function that happens to be supported by technology.

Info-Tech Research Group’s Data Priorities 2026 report is blunt on this point: inconsistent data quality, unclear data governance, and low data literacy are continuing to undermine AI readiness across enterprises, not because organisations lack the tools, but because they have not embedded the discipline [5]. The tool is necessary but not sufficient. The governance is what makes it stick.

The Regulatory Dimension: Data Readiness Is Now a Compliance Requirement

There is a further reason to take data readiness seriously that goes beyond operational performance: regulation.

The EU AI Act, which entered into force in August 2024 with its obligations for high-risk AI systems being phased in, sets data governance requirements for those systems: the data sets used to train, validate, and test them must be relevant, sufficiently representative, and examined for possible bias. For procurement functions operating AI-assisted supplier risk assessment, contract compliance monitoring, or spend analytics, this means the quality and governance of your underlying data is not just an operational matter. It is a compliance matter.

Organisations that have built proper data governance frameworks, with documented data lineage, defined quality standards, and clear ownership, are not just better positioned to benefit from AI. They are better protected when regulators, auditors, or major customers ask how their AI decisions are made. This is a core part of why Brooklyn’s architecture centralises AI observability, and why our BISO-28 governance policy addresses data governance explicitly. The ability to show what data informed an AI output, and to demonstrate that data was governed, is rapidly becoming as important as the output itself.

The Minimum Viable Data Standard: A Starting Checklist

Before deploying AI on any procurement process, the following represents a practical minimum threshold, not perfection, but enough to start with confidence:

✓ Supplier master data: your critical and strategic suppliers are identifiable, deduplicated, and correctly named in a single system of record.
✓ Process examples: you have at least 3–5 complete end-to-end examples of the target process, with documented inputs, outputs, and outcomes at each stage.
✓ Taxonomy alignment: your spend categories, supplier classifications, and contract types use a consistent, documented naming convention. Even if imperfect, it must be consistent.
✓ Data ownership: every critical data entity has a named owner who is accountable for its accuracy and empowered to enforce the standard.
✓ ETL or onboarding cleansing: legacy data has been passed through at least basic deduplication and completeness validation before entering your system of record.
✓ Defined “good”: you can articulate what a correct output looks like for the AI process you are targeting, even if only in qualitative terms to start.

The Right Question

The organisations successfully deploying AI in procurement right now are not the ones that waited until their data was perfect. They are the ones that identified where their data was good enough to start, defined what “good enough” meant in practice, built the governance to maintain that standard going forward, and used the platform, and the discipline of using it, to progressively raise the bar.

So the question is not “is our data ready?” The question is: “where is our data ready enough to start, and what’s our plan for the rest?”

That is a question every procurement organisation can answer. And it leads somewhere far more productive than waiting.

About the Author

Nick Francis

Nick Francis is a well-established and experienced CxO who delivers digital and security-focused transformation through the design, build, and deployment of cost-effective, highly automated, industry-leading solutions. He has worked across the private and public sectors, in industries including financial services, insurance, legal, utilities, retail, public sector, and government. He specialises in transformation activity that optimises processes and operational expenditure and increases productivity. He has significant experience in compliance, risk, and control activities in highly regulated industries, in the standardisation of technologies, and in streamlining internal processes and continuous improvement, driving consistency and efficiency across an organisation while holding customer, colleague, and partner experience at a premium.

Co-Founder, CTO & CISO · Brooklyn Solutions

25+ years in technology leadership across financial services, government, and B2B SaaS. Lean Six Sigma practitioner.

References & sources

[1] Deloitte / Suplari: 74% of procurement leaders say their data isn’t AI-ready. Referenced across multiple 2025–2026 industry reports, including Suplari’s Procurement Trends 2026 and the SpecLens AI in Procurement Complete Guide 2026.
[2] MIT Project NANDA (July 2025): 95% of organisations deploying generative AI saw zero measurable return. Primary failure causes identified as data readiness, workflow integration, and absence of defined outcome, not model performance.
[3] Gartner (February 2025): 60% of AI projects lacking AI-ready data will be abandoned before delivering value. Gartner definition of AI-ready data: aligned to specific use cases, actively governed at asset level, supported by automated pipelines with quality gates, managed through live metadata, and continuously quality-assured.
[4] SpecLens AI in Procurement Complete Guide 2026: “If your historical PO data says Vendor A in one system and Vendor A Inc. in another, the AI sees two vendors.”
[5] Info-Tech Research Group, Data Priorities 2026 (January 2026): inconsistent data quality, unclear data governance, and low data literacy are continuing to undermine AI readiness and enterprise decision-making heading into 2026.
