As reliance on Hadoop and Spark grows for data management, processing and analytics, data integration strategies should evolve to exploit big data platforms in support of digital business, Internet of Things (IoT) and analytics use cases. While Hadoop is typically used for batch data processing, Spark supports low-latency, in-memory processing. Integration leaders should understand the patterns of integration described below and align their use cases with vendor offerings.
1) Native ETL and ELT in Apache Hadoop & Spark platforms – data integration performed natively on the Hadoop & Spark platforms without leveraging data integration tools, instead using native tools (such as Pig, Hive and MapReduce). Skill-set scarcity is a potential concern with this pattern.
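To make the native pattern concrete, below is a minimal, plain-Python sketch of the MapReduce model that native tools such as Pig and Hive ultimately compile jobs down to. The dataset and the `event_type` field are hypothetical; a real job would run distributed across a Hadoop cluster rather than in a single process.

```python
# Plain-Python sketch of the MapReduce paradigm (map -> shuffle -> reduce).
# Illustrative only; not a distributed implementation.
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs -- here, one count per event type.
    for record in records:
        yield record["event_type"], 1

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values -- here, a simple sum.
    return {key: sum(values) for key, values in groups.items()}

events = [
    {"event_type": "click"},
    {"event_type": "view"},
    {"event_type": "click"},
]
counts = reduce_phase(shuffle(map_phase(events)))
# counts == {"click": 2, "view": 1}
```

A Hive query such as `SELECT event_type, COUNT(*) FROM events GROUP BY event_type` expresses the same aggregation declaratively, which is why native-tool fluency (rather than a vendor tool) is the key skill requirement in this pattern.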
2) Data integration offering specific to Hadoop & Spark platforms (distinct from traditional data integration offering) – incumbent vendors providing a dedicated big data integration offering, separate from their traditional one, that runs in the Hadoop & Spark platforms. A separate offering potentially avoids disruption to traditional data integration workflows.
3) The same offering for both traditional and big data integration needs – vendors leveraging a single offering to serve both traditional data integration and big data integration.
4) Data pipelines in Hadoop & Spark platforms – vendors providing an end-to-end data management solution (ingestion, organization, transformation, enrichment and quality), including integration.
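The stages such a pipeline chains together can be sketched in plain Python. The stage functions, sample schema and validation rule below are all hypothetical; a real pipeline would execute each stage as distributed Hadoop or Spark jobs.

```python
# Hypothetical end-to-end pipeline sketch: each stage is a function over a
# list of records. Illustrative only; a real pipeline runs as Spark jobs.
def ingest():
    # Ingestion: pull raw records from a source (hard-coded sample here).
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}]

def transform(records):
    # Transformation: cast string fields to proper numeric types.
    return [
        {**r, "amount": float(r["amount"]) if r["amount"] is not None else None}
        for r in records
    ]

def enrich(records):
    # Enrichment: add a derived attribute.
    return [{**r, "is_large": (r["amount"] or 0) > 100} for r in records]

def quality_check(records):
    # Quality: drop records that fail validation (missing amount).
    return [r for r in records if r["amount"] is not None]

def run_pipeline():
    return quality_check(enrich(transform(ingest())))

result = run_pipeline()
# result == [{"id": 1, "amount": 10.5, "is_large": False}]
```

Composing the stages as discrete steps mirrors how pipeline offerings let teams add or swap stages (for example, a new enrichment source) without rewriting the whole flow.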
5) Self-service data preparation using Hadoop & Spark platforms – self-service data preparation offerings that use Hadoop & Spark platforms to meet the processing requirements of data preparation workloads.
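A minimal sketch of the kinds of operations such preparation tools apply (standardization, null handling, deduplication) is shown below. The field names and rules are hypothetical; self-service offerings push equivalent logic down to Hadoop or Spark for execution at scale.

```python
# Hypothetical data preparation sketch: standardize values, fill nulls,
# and deduplicate. Illustrative only; real tools run this on the cluster.
def prepare(records):
    seen = set()
    cleaned = []
    for r in records:
        # Standardize casing/whitespace and fill missing values.
        city = (r.get("city") or "unknown").strip().lower()
        key = (r["customer_id"], city)
        if key in seen:  # deduplicate on (customer_id, city)
            continue
        seen.add(key)
        cleaned.append({"customer_id": r["customer_id"], "city": city})
    return cleaned

raw = [
    {"customer_id": 1, "city": "  Boston "},
    {"customer_id": 1, "city": "boston"},
    {"customer_id": 2, "city": None},
]
prepared = prepare(raw)
# prepared == [{"customer_id": 1, "city": "boston"},
#              {"customer_id": 2, "city": "unknown"}]
```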