Every B2B e-commerce project has a moment, usually three weeks in, when it becomes obvious that the customer-master data is going to be a bigger problem than the storefront. It happens to the best teams. The pitch deck for the re-platform talked about UI, mobile, search, AI. The actual blocker turns out to be that the same customer exists three times in the ERP under slightly different names, and each duplicate has a different set of pricing agreements.
This article is the workflow we now run on every project before we write any storefront code. It’s the work nobody scopes and everybody pays for one way or another.
Why the storefront makes this worse
In an ERP-only world, the AR clerk who runs into a customer duplicate just picks one and moves on. Humans absorb data quality issues silently for years. The moment you put a self-service B2B storefront in front of those records, every flaw becomes a customer-facing bug:
- "Why is the price different from what my rep quoted?" → Two records, two pricing agreements.
- "My order history is empty." → They’re looking at duplicate #2 but ordered against duplicate #1.
- "Why is my account suddenly on credit hold?" → AR put #1 on hold. Storefront sees #2.
- "Where’s my ship-to from last month?" → Stored on a third duplicate that nobody knew about.
The five issues we always find
- Duplicates. The same legal entity exists 2–N times. Often created over years as different sales reps onboarded the same customer through different territories or product lines.
- Inconsistent ship-to / bill-to relationships. Some customers have all their ship-tos under one parent; others have separate top-level customer records per location. The ERP may not enforce a consistent pattern.
- Stale assignments. Sales reps who left, payment terms that haven’t been reviewed in five years, customer classes that map to discontinued product lines.
- Implicit business logic in fields nobody documented. "Customer Class = X means we ship via FedEx." That logic is in someone’s head, not the ERP’s configuration.
- Tax exemption certificates that aren’t enforced. The certificate is in the file cabinet. The ERP customer record doesn’t know about it. The storefront will charge tax in a state where it shouldn’t.
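Some of these issues can be surfaced mechanically before anyone opens a spreadsheet. As a minimal sketch of the stale-assignment check, assuming the extract carries a `sales_rep` field and a `terms_reviewed_on` date (both field names are hypothetical, as is the list of active reps, which would normally come from HR or the CRM):

```python
from datetime import date


def stale_assignments(customers, active_reps, today, review_years=5):
    """Return (customer_id, reason) pairs for assignments that need review."""
    findings = []
    for c in customers:
        if c["sales_rep"] not in active_reps:
            findings.append((c["customer_id"], "sales rep no longer active"))
        reviewed = c.get("terms_reviewed_on")  # None means never reviewed
        if reviewed is None or (today - reviewed).days > review_years * 365:
            findings.append((c["customer_id"], "payment terms overdue for review"))
    return findings


# Illustrative records only; IDs, reps, and dates are made up.
customers = [
    {"customer_id": "C001", "sales_rep": "jlee", "terms_reviewed_on": date(2024, 3, 1)},
    {"customer_id": "C002", "sales_rep": "bsmith", "terms_reviewed_on": date(2017, 6, 15)},
    {"customer_id": "C003", "sales_rep": "jlee", "terms_reviewed_on": None},
]
for customer_id, reason in stale_assignments(customers, {"jlee"}, date(2025, 1, 1)):
    print(customer_id, reason)
```

The output is a worklist for the business team, not a set of automatic fixes. Someone still has to decide who inherits the departed rep's accounts.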
The workflow we run
Phase 1: extract and profile (week 1)
Pull every customer record. We don’t care about pricing details yet; we care about identity. Extract: customer ID, legal name, DBA, primary address, primary contact, year created, last order date, sales rep assignment, parent ID (if a hierarchy exists). Run the extract through fuzzy matching: we combine a token-set ratio on names with Levenshtein distance on addresses, then send the candidate pairs to human review.
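To make the matching step concrete, here is a minimal standard-library sketch. It is not our production tooling: the token-set ratio below is a simplified version built on `difflib.SequenceMatcher`, the thresholds (`name_min`, `addr_min`) are placeholder assumptions to tune against your own data, and the field names are hypothetical. A dedicated library such as RapidFuzz would be the more usual choice on real volumes.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def token_set_ratio(a, b):
    """Simplified token-set ratio in [0, 1]: compare the shared tokens with
    each side's shared-plus-unique tokens and keep the best score."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    common = " ".join(sorted(ta & tb))
    full_a = " ".join(filter(None, [common, " ".join(sorted(ta - tb))]))
    full_b = " ".join(filter(None, [common, " ".join(sorted(tb - ta))]))
    pairs = [(common, full_a), (common, full_b), (full_a, full_b)]
    return max(SequenceMatcher(None, x, y).ratio() for x, y in pairs)


def levenshtein(a, b):
    """Classic edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def address_similarity(a, b):
    """Edit distance scaled to [0, 1], where 1 means identical after normalizing."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def duplicate_candidates(customers, name_min=0.85, addr_min=0.80):
    """Return (id, id, name_score, address_score) pairs for human review."""
    found = []
    for left, right in combinations(customers, 2):
        name_score = token_set_ratio(left["legal_name"], right["legal_name"])
        addr_score = address_similarity(left["address"], right["address"])
        if name_score >= name_min and addr_score >= addr_min:
            found.append((left["customer_id"], right["customer_id"],
                          round(name_score, 2), round(addr_score, 2)))
    return found


# Illustrative records only.
customers = [
    {"customer_id": "C001", "legal_name": "Acme Manufacturing Inc.", "address": "100 Main St, Dayton OH"},
    {"customer_id": "C002", "legal_name": "ACME Manufacturing", "address": "100 Main Street, Dayton OH"},
    {"customer_id": "C003", "legal_name": "Baker Supply Co", "address": "42 River Rd, Toledo OH"},
]
print(duplicate_candidates(customers))
```

The token-set ratio scores a name at 1.0 whenever one side's tokens are a subset of the other's, which is why the address check and the human review both matter. Comparing every pair is quadratic, so on a large customer master you would first group records (by postal code, for example) and compare only within groups.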
Phase 2: present the duplicate report (week 2)
Hand the duplicate candidates back to the AR / sales-ops team. Their reaction tells you the project’s actual scope. If the response is "yeah, those are the ones we knew about, we’ll merge them," you’re in good shape. If the response is "wait, I didn’t realize we had three records for Acme Manufacturing, let me check with Bob," you’ve found a long tail of project work nobody scoped.
Phase 3: hierarchy and ship-to normalization
Pick a pattern: parent customer + child ship-to records, OR flat customers with a shared billing address. Both work. The wrong answer is "we’ll let it stay both ways." Document the choice, write a small migration script, run it.
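As a sketch of what that migration can look like for the parent + child pattern, assume the business has signed off on a mapping from location-level customer IDs to the surviving parent ID. The function name, the `parent_of` mapping, and the record fields are all hypothetical; a real script would write back through the ERP's own import tooling rather than build dictionaries in memory.

```python
def normalize_to_parent_child(records, parent_of):
    """Fold location-level customer records into parent + child ship-tos.

    records:   dicts with customer_id, legal_name, address (hypothetical fields)
    parent_of: {location_customer_id: surviving_parent_id}, approved by the business
    Returns {parent_id: {"legal_name": ..., "ship_tos": [...]}}.
    """
    by_id = {r["customer_id"]: r for r in records}
    for child, parent in parent_of.items():
        if child not in by_id or parent not in by_id:
            raise ValueError(f"mapping {child} -> {parent} references an unknown record")
        if parent in parent_of:
            raise ValueError(f"{parent} is both a parent and a child; flatten the mapping")

    normalized = {}
    for r in records:
        parent_id = parent_of.get(r["customer_id"], r["customer_id"])
        parent = normalized.setdefault(
            parent_id, {"legal_name": by_id[parent_id]["legal_name"], "ship_tos": []}
        )
        # Keep the old ID on the ship-to so order history can be re-pointed later.
        parent["ship_tos"].append({"source_id": r["customer_id"], "address": r["address"]})
    return normalized


# Illustrative records only.
records = [
    {"customer_id": "C001", "legal_name": "Acme Manufacturing", "address": "100 Main St"},
    {"customer_id": "C002", "legal_name": "Acme Toledo", "address": "42 River Rd"},
    {"customer_id": "C003", "legal_name": "Baker Supply", "address": "7 Elm Ave"},
]
print(normalize_to_parent_child(records, {"C002": "C001"}))
```

Rejecting chained mappings up front is deliberate: a record that is both a parent and a child usually means the mapping was assembled from two different people's spreadsheets and needs another look.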
Phase 4: enrich and validate
Add the data the storefront will need that the ERP didn’t enforce: web user emails, role assignments per ship-to (buyer, approver, AP), tax-exemption flags. This is also the moment to ask "should sales reps still see customer X who hasn’t ordered in three years?" — soft archive resolves a lot of clutter.
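The soft-archive question is easy to turn into a candidate list. A minimal sketch, assuming each record carries a `last_order_date` (a hypothetical field name) and using the three-year window from the example above:

```python
from datetime import date


def years_before(day, years):
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def soft_archive_candidates(customers, today, inactive_years=3):
    """Return IDs of customers whose last order is older than the cutoff.

    Customers with no recorded order are skipped; they need a separate look.
    """
    cutoff = years_before(today, inactive_years)
    return [
        c["customer_id"]
        for c in customers
        if c["last_order_date"] is not None and c["last_order_date"] < cutoff
    ]


# Illustrative records only.
customers = [
    {"customer_id": "C001", "last_order_date": date(2025, 1, 10)},
    {"customer_id": "C002", "last_order_date": date(2021, 11, 30)},
    {"customer_id": "C003", "last_order_date": None},
]
print(soft_archive_candidates(customers, today=date(2025, 6, 1)))
```

Treat the result as a list for sales ops to confirm. A customer on a multi-year buying cycle can look dormant without being gone.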
Phase 5: lock and re-test
Once the cleaned records are deployed, write integration tests that prove the basics: every active customer has exactly one primary record, every ship-to belongs to a customer, every customer has at least one valid contact. Run those before every release.
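Those three checks are small enough to sketch in one function. The version below is an illustration under assumptions, not our actual harness: `entity_key` stands in for whatever identifies a legal entity after the merge, "valid contact" is reduced to "has an email containing @", and all field names are hypothetical.

```python
from collections import Counter


def check_customer_master(customers, ship_tos, contacts):
    """Return a list of violations; an empty list means all three checks pass."""
    problems = []

    # 1. Every active legal entity has exactly one primary record.
    active = [c for c in customers if c["active"]]
    primaries = Counter(c["entity_key"] for c in active if c["is_primary"])
    for key in sorted({c["entity_key"] for c in active}):
        if primaries[key] != 1:
            problems.append(f"{key}: {primaries[key]} primary records, expected 1")

    # 2. Every ship-to points at a customer that exists.
    known_ids = {c["customer_id"] for c in customers}
    for s in ship_tos:
        if s["customer_id"] not in known_ids:
            problems.append(f"ship-to {s['ship_to_id']}: unknown customer {s['customer_id']}")

    # 3. Every customer has at least one contact with an email address.
    has_contact = {k["customer_id"] for k in contacts if "@" in (k.get("email") or "")}
    for c in customers:
        if c["customer_id"] not in has_contact:
            problems.append(f"{c['customer_id']}: no valid contact")

    return problems
```

In a test suite this becomes a single assertion that the returned list is empty, run against a fresh extract. Returning every violation, rather than failing on the first, gives the business team one complete list to work through per run.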
How to size this in a proposal
Bracket it as 10–15% of the total project budget for B2B re-platforms. That sounds high until you’ve been on the wrong side of it. We’d rather scope it explicitly than let it bleed across every other line item and turn into "why is this project late?"
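For a sense of scale, here is the arithmetic on a purely hypothetical project total of 400,000 (the figure is made up for illustration; only the 10–15% bracket comes from our experience):

```python
def cleanup_budget_range(total_budget, low_pct=10, high_pct=15):
    """Return (low, high) for the data-cleanup line item, in whole currency units."""
    return total_budget * low_pct // 100, total_budget * high_pct // 100


low, high = cleanup_budget_range(400_000)  # hypothetical project total
print(f"Customer-master cleanup: {low:,} to {high:,}")
# prints "Customer-master cleanup: 40,000 to 60,000"
```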
Who owns it
The cleanup is mostly business work, not IT work. The right team is your AR lead + your sales-ops lead, with us providing the scripts, the duplicate-detection tooling, and the test harness. If the customer team is engaged from week one, this finishes on time. If they’re looped in at week ten, the project slips.
If you’re scoping a B2B re-platform and the customer master hasn’t been audited in five years, talk to us before you scope the storefront. The order matters.