Activating SAP and Salesforce Data in Databricks for Manufacturing Enterprises: Governance-First, BI-Ready, CI/CD-Managed

Section 1
Manufacturing enterprises are operating with a data paradox: they generate more operational data than ever before, from SAP production orders to Salesforce pipeline activity, yet the teams who need that data most still rely on manually reconciled spreadsheets, shadow BI tools, and periodic extracts that are outdated before they're shared.
The root cause is not a shortage of data. It is a shortage of trustworthy, governed, BI-ready data: data that is current, reconciled across systems, quality-validated, and accessible through a governed platform that the business can self-serve.
This blueprint addresses that gap by laying out a proven approach to activating SAP and Salesforce data in Databricks using a medallion architecture, Unity Catalog governance, and CI/CD-managed data products.
Section 2
System fragmentation at scale. SAP is typically the system of record for order-to-cash, procure-to-pay, finance, and production. Salesforce manages the commercial pipeline, account relationships, and pricing agreements. These two systems are rarely integrated in a way that supports analytical workloads.
Complex, time-sensitive metrics. OTIF (on-time, in-full), production backlog, GR/IR (goods receipt / invoice receipt) accruals, and pricing leakage require joining multiple SAP modules (SD, MM, PP, FI) with Salesforce objects, applying business rules that often exist only in ABAP custom code, and refreshing frequently enough to be actionable.
ABAP technical debt. Over years of SAP customisation, manufacturers accumulate ABAP reporting programs, user exits, and enhancement spots that encode critical business logic. This logic is rarely documented, is difficult to test, and becomes a significant risk as ECC decommission deadlines approach.
Shadow analytics proliferation. When the official data platform cannot meet business demand, teams build local solutions: Excel extracts, Power BI reports connected directly to SAP via RFC, Salesforce reports that don't reconcile with finance. The result is contradictory numbers and eroded trust.
Section 3
Customertimes implements a three-zone medallion architecture (Bronze, Silver, Gold) on Databricks, aligned to Unity Catalog governance. Each zone has a distinct purpose, quality standard, and access policy.
[Architecture diagram: Sources (SAP ECC / S/4HANA, Salesforce CRM) → Databricks Medallion Lakehouse → Consumption (Power BI / Tableau)]
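To make the zone responsibilities concrete, the sketch below shows a minimal Bronze → Silver → Gold flow in PySpark. It is illustrative only: the catalog, schema, table, and file-path names are assumptions, not the delivered pipeline.

```python
# Minimal sketch of the three-zone flow; catalog, schema, table, and path names are
# illustrative assumptions, not the delivered pipeline.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw SAP extract as-is, adding load metadata
bronze = (spark.read.parquet("/Volumes/raw/sap/vbak/")          # hypothetical landing path
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.mode("append").saveAsTable("manufacturing.bronze.sap_vbak")

# Silver: de-duplicate, type, and conform the raw extract
silver = (spark.table("manufacturing.bronze.sap_vbak")
          .dropDuplicates(["VBELN"])
          .withColumn("order_date", F.to_date("ERDAT", "yyyyMMdd")))
silver.write.mode("overwrite").saveAsTable("manufacturing.silver.sales_orders")

# Gold: business-ready aggregate consumed directly by BI tools
gold = (spark.table("manufacturing.silver.sales_orders")
        .groupBy("VKORG", "order_date")
        .agg(F.sum("NETWR").alias("net_order_value")))
gold.write.mode("overwrite").saveAsTable("manufacturing.gold.daily_order_value")
```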
Section 4
Customertimes implements a catalogue hierarchy aligned to manufacturing data domains:
Each Gold data product has a named data product owner (typically a business stakeholder), a data steward (governance team), and a data engineer (technical maintainer). This is documented in the Unity Catalog tag taxonomy and enforced through change management.
All columns are tagged on ingestion in the Bronze zone using Unity Catalog system tags:
pii.classification: personal_identifiable — customer name, contact, bank details
pii.classification: sensitive_business — pricing, margin, rebate conditions
pii.classification: public — material descriptions, plant codes, cost centres
Column masking policies are applied automatically based on the tag + role combination, with no manual intervention required on new tables that inherit the schema.
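The sketch below shows the underlying Unity Catalog primitives this relies on, run as SQL from a Databricks notebook. In the delivered solution the tag-to-mask binding is automated from the tag taxonomy; the table, column, group, and function names here are illustrative assumptions.

```python
# Illustrative only: tag a Bronze column and bind a masking function to it.
# Table, column, group, and function names are assumptions for the sketch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Classify the column with a Unity Catalog tag
spark.sql("""
  ALTER TABLE manufacturing.bronze.sap_kna1
  ALTER COLUMN NAME1 SET TAGS ('pii.classification' = 'personal_identifiable')
""")

# Define a masking function keyed on group membership
spark.sql("""
  CREATE OR REPLACE FUNCTION manufacturing.governance.mask_pii(value STRING)
  RETURN CASE
    WHEN is_account_group_member('pii_readers') THEN value
    ELSE '***MASKED***'
  END
""")

# Bind the mask to the tagged column
spark.sql("""
  ALTER TABLE manufacturing.bronze.sap_kna1
  ALTER COLUMN NAME1 SET MASK manufacturing.governance.mask_pii
""")
```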
Section 5
Every Silver and Gold table includes automated DQ checks executed as part of the pipeline run:
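As an illustration, checks of this kind can be expressed as Delta Live Tables expectations; the rule names, constraints, and source dataset below are assumptions rather than the delivered DQ rule set.

```python
# Illustrative Delta Live Tables expectations; rule names, constraints, and the
# source dataset are assumptions, not the delivered DQ rules.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver sales orders with row-level data quality checks")
@dlt.expect_or_drop("valid_document_number", "VBELN IS NOT NULL")   # drop rows failing this rule
@dlt.expect_or_drop("valid_order_date", "order_date IS NOT NULL")
@dlt.expect_or_fail("non_negative_net_value", "NETWR >= 0")         # stop the run on violation
def silver_sales_orders():
    # bronze_sap_vbak is assumed to be defined elsewhere in the same DLT pipeline
    return (
        dlt.read("bronze_sap_vbak")
        .withColumn("order_date", F.to_date("ERDAT", "yyyyMMdd"))
    )
```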
For every incremental load, Customertimes generates a reconciliation report comparing:
These reports are stored in a dedicated audit schema and surfaced in the observability dashboard, with drill-down to the specific records that failed or were quarantined.
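A minimal sketch of one such reconciliation step is shown below, assuming source-side control totals are captured at extraction time; all table and column names are illustrative.

```python
# Hypothetical reconciliation step: compare source-side control totals with what landed
# in Silver, then persist the result to the audit schema. Names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source_totals = spark.table("manufacturing.bronze.sap_vbak_control_totals")  # captured at extraction
loaded_totals = (spark.table("manufacturing.silver.sales_orders")
                 .agg(F.count("*").alias("row_count"),
                      F.sum("NETWR").alias("net_value_sum")))

report = (source_totals.crossJoin(loaded_totals)
          .withColumn("row_count_delta", F.col("row_count") - F.col("source_row_count"))
          .withColumn("net_value_delta", F.col("net_value_sum") - F.col("source_net_value_sum"))
          .withColumn("run_timestamp", F.current_timestamp()))

report.write.mode("append").saveAsTable("manufacturing.audit.load_reconciliation")
```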
The standard observability dashboard covers:
Section 6
OFFER 1 OF 3
Quick Starter: SAP + Salesforce Data Activation
BI-ready in Databricks, in weeks, not quarters
The Quick Starter is designed for manufacturing organisations that need to demonstrate value from their Databricks investment quickly, or those building a governed data foundation for the first time. It delivers BI-ready data from SAP and Salesforce in 4–6 weeks.
In Scope:
SAP and Salesforce data available in Databricks Gold zone, passing all DQ checks
Pipeline SLA met: daily refresh by agreed time for 5 consecutive days without manual intervention
Business stakeholder signs off on output accuracy against source system reconciliation report
CI/CD pipeline operational: dev → staging → prod with automated test gate (see the test-gate sketch after this list)
Data dictionary and runbook delivered, reviewed, and accepted by client data engineering team
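As one example of the automated test gate criterion above, the CI/CD pipeline could run a small pytest suite against the staging catalog before promoting to prod; the catalog, table, and column names below are assumptions.

```python
# Hypothetical promotion gate executed by CI/CD (e.g. as a pytest suite) against staging
# before deployment to prod. Catalog, table, and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
GOLD_TABLE = "manufacturing_staging.gold.daily_order_value"

def test_gold_table_is_not_empty():
    assert spark.table(GOLD_TABLE).limit(1).count() == 1

def test_no_null_business_keys():
    assert spark.table(GOLD_TABLE).filter(F.col("order_date").isNull()).count() == 0

def test_net_values_are_non_negative():
    assert spark.table(GOLD_TABLE).filter(F.col("net_order_value") < 0).count() == 0
```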
Section 7
OFFER 2 OF 3
ABAP Logic Migration & Decommission
De-risk your ECC exit, with a clean path to SAP Datasphere
Custom ABAP code is one of the most underestimated risks in SAP ECC decommission programmes. Business-critical calculations (pricing adjustments, allocation rules, exception classifications) are embedded in ABAP programs, user exits, and BAdIs that are not documented, not tested, and often known to only a small number of people.
This offer systematically inventories, documents, and migrates that logic to Databricks-native code, reducing ECC exit risk and creating a forward path to SAP Datasphere if required by the enterprise roadmap.
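To illustrate what "migrated to Databricks-native code" means in practice, the sketch below shows a hypothetical pricing adjustment of the kind often buried in a user exit, re-expressed in PySpark and reconciled against a reference extract produced by the legacy ABAP program. The rule, tables, and columns are invented for illustration.

```python
# Hypothetical example: a pricing-adjustment rule originally coded in an ABAP user exit,
# re-expressed as a PySpark transformation. The rule and all names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("manufacturing.silver.sales_order_items")

migrated = orders.withColumn(
    "adjusted_net_value",
    F.when((F.col("customer_group") == "OEM") & (F.col("order_qty") >= 1000),
           F.col("net_value") * 0.97)          # volume adjustment formerly applied in ABAP
     .otherwise(F.col("net_value"))
)

# Regression check against a reference extract produced by the legacy ABAP program
legacy = spark.table("manufacturing.audit.abap_reference_output")
diff = (migrated.join(legacy, "order_item_id")
        .filter(F.abs(F.col("adjusted_net_value") - F.col("legacy_adjusted_net_value")) > 0.01))
assert diff.count() == 0, "Migrated logic does not reconcile with the ABAP reference output"
```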
Phase A: Inventory and Documentation (Weeks 1–2)
Phase B: Migration to Databricks (Weeks 3–6)
Phase C: Validation and Sign-Off (Weeks 7–10)
ABAP inventory completed, reviewed, and signed off by SAP programme lead.
All high-priority programs migrated and reconciled at agreed pass rate.
Regression test suite operational in Delta Live Tables.
Decommission sign-off checklist completed and approved by programme sponsor.
No critical reconciliation discrepancies unresolved at handover.
Section 8
OFFER 3 OF 3
Finance 360 (Manufacturing)
Procurement, AR/DSO, and commercial data products delivered as governed CI/CD marts
Finance leaders in manufacturing need a single, governed view of the business that reconciles procurement spend, accounts receivable, and commercial performance, without waiting on IT for every report or reconciling four spreadsheet versions at month-end.
Finance 360 delivers a purpose-built suite of data products for manufacturing finance, combining SAP FI, MM, and SD module data with Salesforce commercial data into governed, BI-ready Databricks Gold marts.
Procurement Data Product
Source: SAP MM — EKKO, EKPO, EKES, EKET, EKBE, MARA, LFM1
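As an illustration of the logic behind this product, a heavily simplified GR/IR accrual derivation from PO history (EKBE) might look like the sketch below; the delivered data product also handles reversals, currencies, and tolerances, and the table locations shown are assumptions.

```python
# Simplified GR/IR sketch from SAP PO history (EKBE): goods receipt value vs. invoice
# receipt value per PO item. Reversals, currencies, and tolerances are omitted here.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

ekbe = spark.table("manufacturing.silver.sap_ekbe")

grir = (ekbe.groupBy("EBELN", "EBELP")
        .agg(F.sum(F.when(F.col("VGABE") == "1", F.col("DMBTR")).otherwise(0)).alias("gr_value"),
             F.sum(F.when(F.col("VGABE") == "2", F.col("DMBTR")).otherwise(0)).alias("ir_value"))
        .withColumn("grir_open_value", F.col("gr_value") - F.col("ir_value")))

grir.write.mode("overwrite").saveAsTable("manufacturing.gold.grir_accruals")
```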
AR / DSO Data Product
Source: SAP FI-AR — BSID, BSAD, KNA1, VBAK
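The core DSO arithmetic is straightforward; the sketch below shows a simple-average variant built from open AR items, with the revenue source and the exact DSO convention treated as assumptions to be agreed with Finance.

```python
# Illustrative simple-average DSO; table locations, the revenue source, and the netting
# of credits are simplifications to be agreed with Finance.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def days_sales_outstanding(ar_balance: float, period_revenue: float, days_in_period: int = 30) -> float:
    """Receivables relative to revenue for the period, scaled to the period length."""
    return (ar_balance / period_revenue) * days_in_period

ar_balance = (spark.table("manufacturing.silver.sap_bsid")      # open customer line items
              .agg(F.sum("DMBTR").alias("open_ar"))
              .collect()[0]["open_ar"])
period_revenue = (spark.table("manufacturing.silver.sap_vbrk")  # billing documents (assumed revenue source)
                  .agg(F.sum("NETWR").alias("rev"))
                  .collect()[0]["rev"])

print(f"Simple DSO estimate: {days_sales_outstanding(ar_balance, period_revenue):.1f} days")
```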
Commercial Data Product
Source: SAP SD (VBAK, VBAP, LIPS, VBRK, VBRP) + Salesforce (Opportunity, Account, Order, PricebookEntry)
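For the commercial product, a simplified OTIF calculation over conformed Silver tables, segmented by Salesforce account, might look like the sketch below. The date basis, tolerances, and table and column names are assumptions; the production OTIF definition is agreed with Operations first.

```python
# Simplified OTIF sketch over conformed Silver tables; date basis, tolerances, and
# partial-delivery handling are deliberately omitted, and all names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

order_lines = spark.table("manufacturing.silver.sales_order_lines")   # conformed from VBAK/VBAP
deliveries  = spark.table("manufacturing.silver.delivery_lines")      # conformed from LIPS/LIKP
accounts    = spark.table("manufacturing.silver.sf_account")          # conformed from Salesforce Account

otif = (order_lines
        .join(deliveries, ["order_number", "order_line"], "left")
        .join(accounts, F.col("customer_id") == F.col("sap_customer_number"), "left")
        .withColumn("on_time", F.col("actual_gi_date") <= F.col("requested_delivery_date"))
        .withColumn("in_full", F.col("delivered_qty") >= F.col("ordered_qty"))
        .groupBy("account_name")
        .agg((F.sum(F.when(F.col("on_time") & F.col("in_full"), 1).otherwise(0)) /
              F.count("*")).alias("otif_rate")))

otif.write.mode("overwrite").saveAsTable("manufacturing.gold.otif_by_account")
```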
Governed CI/CD Marts
All Finance 360 data products are delivered as versioned, tested, CI/CD-managed Gold tables:
All three Gold data products operational and passing DQ checks
Finance leader sign-off on GR/IR accrual output vs. SAP FBL3N / MB5S reconciliation
DSO output within agreed tolerance vs. SAP standard AR aging report
OTIF metric agreed and validated by Operations lead
CI/CD pipeline operational with automated test gate across all three marts
First live month-end close supported without escalation to engineering team
Section 9
Quantifying the ROI of a data platform programme requires honesty: some benefits are directly measurable (reduced manual effort, avoided licence costs), while others are indirect (better decisions, faster close, reduced audit risk). This section provides the logical framework and cost drivers without inventing specific figures.
How to use this section
We recommend using this model as a structured conversation with your Finance team during the assessment phase, substituting your organisation's actual operational numbers. A well-constructed, defensible business case built on actual data is more valuable, and more trusted internally, than an inflated projection.
Shadow BI consolidation: How many separate BI or extract tools can be retired when Gold marts are self-service?
SAP BW / BEx rationalisation: How many BW InfoProviders can be retired if ABAP logic migrates to Databricks?
ECC dual-run cost: Does ABAP migration remove blockers to ECC retirement, reducing parallel running cost?

Month-end close: How many FTE-days per period are spent on manual GR/IR, AR aging, and intercompany reconciliation?
Pipeline rework: What is the engineering cost of fixing broken pipelines caused by SAP schema changes or DQ failures?
Audit preparation: How much time is spent assembling lineage, access logs, and DQ evidence for internal or external auditors?

Pricing leakage recovery: If leakage detection identifies unrecognised discounts or rebate errors, even a small percentage of revenue can be material at manufacturing scale.
OTIF penalty avoidance: If earlier OTIF visibility enables intervention before a customer penalty threshold is breached, what is the avoided cost?

DSO reduction: If earlier dispute identification accelerates cash collection by even a few days at scale, what is the working capital impact?
Decision cycle time: If Finance and Operations have daily-refreshed, trusted data instead of weekly manual reports, what decisions can be made faster?
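For the DSO reduction driver above, the arithmetic itself is a one-liner; the sketch below is a helper that can be dropped into the model, with inputs to be replaced by your organisation's actual figures rather than any projection of ours.

```python
def working_capital_released(annual_revenue: float, dso_days_reduced: float) -> float:
    """Cash freed by a DSO reduction: average daily revenue multiplied by the days saved."""
    return (annual_revenue / 365.0) * dso_days_reduced

# Example shape of a call (figures are placeholders, not projections):
# working_capital_released(annual_revenue=..., dso_days_reduced=...)
```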
Section 10
Each row represents a workstream. Phases are designed to be composable — Phases 2A and 2B can run in sequence or overlap depending on team capacity and business priority.
Phase 1
Weeks 1–6
Quick Starter: SAP + Salesforce Data Activation
♦ Go-Live W6
Phase 2A
Weeks 7–18
Finance 360 (Manufacturing) — Procurement, AR/DSO, Commercial
♦ Live Close W17
Phase 2B
Weeks 7–16
ABAP Logic Migration & Decommission — Inventory → Migrate → Validate
♦ Sign-Off W16
Phase 3
Month +5
Scale: New Data Products • Self-Service Enablement • Advanced Analytics
Section 11
Use this checklist to evaluate your organisation's readiness and to structure your conversation before the assessment call with Customertimes.
We have an active Databricks workspace, or have budget and approval to stand one up
We can identify the SAP modules most critical to our data needs (SD, MM, FI, PP, etc.)
We have a named SAP technical contact who can provide schema access and extraction credentials
We have a named Salesforce administrator who can configure API access and object permissions
We know which 2–3 business use cases would deliver the most immediate value if data were trusted and timely
We have (or are willing to establish) a data governance policy that defines data ownership
We understand our GDPR, SOX, or industry-specific data handling obligations for the data we want to activate
We have a view on which data assets contain PII and which are purely operational or financial
We know our ECC decommission timeline, or can find out within 2 weeks
We have a process for change management when data definitions or pipeline logic changes
There is an executive sponsor (CIO, VP Data, or CFO) who can unblock access and resource requests
The Finance team is willing to validate output accuracy against source system reports
Our data engineering team (internal or outsourced) can own the platform post-handover
We are open to a time-boxed engagement with clear scope and written success criteria
We have capacity for 2–3 hours per week from a business stakeholder during the engagement
We are committed to Databricks as our primary analytical platform, or evaluating it seriously
We are willing to implement Unity Catalog governance (or already have it in place)
Our cloud environment (Azure, AWS, or GCP) is compatible with Databricks deployment requirements
We are open to implementing Git-based version control for data pipeline code
We understand the difference between ELT (transform in Databricks) and ETL (transform in middleware) and have a preference
Leadership understands that this is a data platform programme, not a reporting tool purchase
We have a realistic timeline expectation: 4–6 weeks for Quick Starter, not 4–6 days
We are prepared to make decisions on data definitions, access policies, and use case priorities in a timely manner
We see this as the beginning of a data product discipline, not a one-time project
We have communicated to business stakeholders that initial outputs will be validated before replacing existing reports
Book a free 60-minute data landscape assessment with Customertimes. We'll map your SAP and Salesforce environment, identify your highest-value use cases, and recommend the right offer, with no commitment required.