Orchestrating Dynamic Big Data End to End ETL Pipeline

Authors: Syed Azimuddin Inamdar, Sayyid Abrar, Gayatri Bajantri

Nowadays, data is said to be the new currency and the key to success. Gathering rich, high-quality information from numerous sources dispersed across the world requires considerable effort and time, and many other challenges arise while moving data from its starting point to its end point. Data ETL (extract, transform, load) pipelines are employed to improve the overall efficiency of the flow of data from its source to its final destination. At the same time, they automate that flow and reduce human involvement. Despite existing work on ETL pipelines, research on this topic remains limited. ETL pipelines are abstract representations of end-to-end data pipelines. To exploit the full potential of a data pipeline, we need to identify the events that occur within it and how they are related across an end-to-end pipeline. This paper gives an overview of designing a conceptual model of a data pipeline, which may further be used as a means of communication among various data teams.
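The abstract argues that the events in a pipeline, and the way they are related, should be made explicit. Here is a minimal sketch of that idea. It assumes a pipeline can be described as named stages with declared dependencies. The `Stage` class, the stage names, and the sample records are hypothetical illustrations, not the authors' conceptual model. With that description, the stages can be ordered with a topological sort and run from source to destination:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical pipeline step: a name, the function it runs, and the stages it depends on."""
    name: str
    run: Callable[[Dict[str, Any]], Any]
    depends_on: List[str] = field(default_factory=list)


def execution_order(stages: List[Stage]) -> List[str]:
    """Depth-first topological sort: every stage is placed after all of its dependencies."""
    by_name = {s.name: s for s in stages}
    order: List[str] = []
    state: Dict[str, str] = {}  # "visiting" while on the DFS stack, "done" once placed

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"cycle detected at stage {name!r}")
        state[name] = "visiting"
        for dep in by_name[name].depends_on:
            visit(dep)
        state[name] = "done"
        order.append(name)

    for s in stages:
        visit(s.name)
    return order


def run_pipeline(stages: List[Stage]) -> Dict[str, Any]:
    """Run stages in dependency order, passing each stage the outputs of its dependencies."""
    by_name = {s.name: s for s in stages}
    outputs: Dict[str, Any] = {}
    for name in execution_order(stages):
        stage = by_name[name]
        outputs[name] = stage.run({dep: outputs[dep] for dep in stage.depends_on})
    return outputs


# --- Usage with made-up records; the list below stands in for the final destination ---
warehouse: List[dict] = []


def extract(_: Dict[str, Any]) -> List[dict]:
    # Hypothetical raw records pulled from a source system
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "bad"}, {"id": 3, "amount": "4"}]


def transform(inputs: Dict[str, Any]) -> List[dict]:
    clean = []
    for row in inputs["extract"]:
        try:
            clean.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError:
            pass  # drop rows whose amount cannot be parsed as a number
    return clean


def load(inputs: Dict[str, Any]) -> int:
    warehouse.extend(inputs["transform"])
    return len(inputs["transform"])


pipeline = [
    Stage("load", load, ["transform"]),
    Stage("extract", extract),
    Stage("transform", transform, ["extract"]),
]
print(execution_order(pipeline))  # ['extract', 'transform', 'load']
print(run_pipeline(pipeline)["load"])  # 2
```

Declaring dependencies explicitly, rather than hard-coding the call sequence, means the stages can be listed in any order. The same description also documents how the data flows, which is in the spirit of the abstract's point about communication between data teams.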

Authors and Affiliations

Syed Azimuddin Inamdar
Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India
Sayyid Abrar
Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India
Gayatri Bajantri
Department of Computer Science Engineering, VTU, SECAB Institute of Engineering and Technology, Vijayapura, Karnataka, India

Keywords: Big Data, ETL Pipeline

Publication Details

Published in : Volume 8 | Issue 5 | September-October 2021
Date of Publication : 2021-08-26
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 47-53
Manuscript Number : CSEIT21857
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Syed Azimuddin Inamdar, Sayyid Abrar, Gayatri Bajantri, "Orchestrating Dynamic Big Data End to End ETL Pipeline", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 8, Issue 5, pp.47-53, September-October-2021.