Five Patterns of Big Data Integration

admin | March 21st, 2016

As organizations rely more on Hadoop and Spark for data management, processing and analytics, data integration strategies should evolve to exploit these big data platforms in support of digital business, Internet of Things (IoT) and analytics use cases. Hadoop is typically used for batch data processing, while Spark also supports low-latency processing. Integration leaders should understand the patterns of integration described below and align their use cases with vendor offerings.

1) Native ETL and ELT in Apache Hadoop & Spark platforms – data integration performed natively in the Hadoop & Spark platforms with their own tools (such as Pig, Hive and MapReduce) rather than with dedicated data integration tools. A potential concern with this pattern is the scarcity of the skills needed to build and maintain these jobs.
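To make pattern 1 more concrete, here is a minimal sketch, written in plain Python rather than as an actual Hadoop or Spark job, of the map, shuffle and reduce phases that a hand-written MapReduce job (or a Pig or Hive script compiled into one) would perform for a simple ETL step. The record format, the sample data and the function names are hypothetical illustrations, not part of any platform or vendor offering.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw input: comma-separated "device_id,timestamp,temperature" lines,
# e.g. IoT sensor readings landed in the cluster before transformation.
RAW_LINES = [
    "sensor-1,2016-03-21T10:00:00,21.5",
    "sensor-2,2016-03-21T10:00:00,19.0",
    "sensor-1,2016-03-21T10:05:00,22.5",
    "bad line",  # malformed record, dropped during the map phase
    "sensor-2,2016-03-21T10:05:00,20.0",
]


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Map: parse each line and emit (key, value) pairs, skipping malformed input."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue
        device_id, _timestamp, temperature = parts
        try:
            yield device_id, float(temperature)
        except ValueError:
            continue


def shuffle_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    """Reduce: aggregate each key's values (here, the average temperature)."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def run_etl(lines: Iterable[str]) -> dict[str, float]:
    """Chain the three phases; the returned dict stands in for the 'load' target."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for device, avg in sorted(run_etl(RAW_LINES).items()):
        print(f"{device}: {avg:.2f}")
```

Run as a script, it prints `sensor-1: 22.00` and `sensor-2: 19.50`. On a real cluster the framework distributes the map and reduce phases across nodes and performs the shuffle over the network; writing and tuning such jobs by hand is where the skills concern above comes in.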

2) Data integration offerings specific to Hadoop & Spark platforms – incumbent vendors providing a dedicated big data integration offering, distinct from their traditional data integration offering, that runs in Hadoop & Spark platforms. A separate offering can avoid disrupting existing traditional data integration workflows.

3) The same offering for both traditional and big data integration needs – vendors using a single offering for both traditional and big data integration:

  • Incumbent vendors evolving their traditional data integration product to support big data integration.
  • Emerging vendors providing data integration products that support both traditional and big data integration.

4) Data pipelines in Hadoop & Spark platforms – vendors providing end-to-end solutions that include integration:

  • Vendors providing an end-to-end data management solution (ingestion, organization, transformation, enrichment, and quality), including integration.
  • Vendors providing an end-to-end analytics solution (ingestion, organization, transformation, enrichment, and analytics), including integration.
  • Vendors providing a framework for building and deploying data applications, including integration.

5) Self-service data preparation using Hadoop & Spark platforms – self-service data preparation offerings that rely on Hadoop & Spark platforms to handle the processing that data preparation requires.