Data Engineer
Position Purpose:
Data Engineers build and support data pipelines and the data marts built off those pipelines. Both must be scalable, repeatable and secure. They help facilitate getting data from a variety of different sources, in the correct format, assuring that it conforms to data quality standards and that downstream users can get to that data timeously. This role functions as a core member of an agile team.
These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly.
They enable solutions that handle large volumes of data in batch and real time, leveraging emerging technologies from both the big data and cloud spaces.
Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analysing and visualising large datasets. They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats to deliver innovative solutions.
Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields. These professionals have knowledge of data analysis, end-user requirements and business requirements analysis, which they use to develop a clear understanding of the business need and to incorporate it into a technical solution. They have a solid understanding of physical database design and the systems development lifecycle. This role must work well in a team environment.
Job objectives:
- Design and develop data feeds from an on-premises environment into a data lake in the AWS cloud
- Design and develop programmatic transformations of the data to correctly partition it, format it and validate or correct its data quality
- Design and develop programmatic transformations, combinations and calculations to populate complex data marts based on feeds from the data lake
- Provide operational support to data mart data feeds and data marts
- Design infrastructure required to develop and operate data lake data feeds
- Design infrastructure required to develop and operate data marts, their user interfaces and the feeds required to populate them
Task information:
Design and develop data feeds from an on-premises environment into a data lake in the AWS cloud
- Establish the functional and non-functional requirements for the feed
- Work with the integration team to design the process for managing and monitoring the feeds to Shoprite standards
- Work with the integration team to build and test the feed components (see the illustrative sketch below)
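As a small, hedged illustration of this kind of feed work, the sketch below lands an extracted on-premises file in an S3 landing zone using boto3. The bucket name, prefix layout and file path are hypothetical placeholders introduced for illustration only; they are not the actual Shoprite environment or its standards.

```python
# Minimal sketch, assuming boto3 is installed and AWS credentials are
# resolved from the environment. Bucket, prefix and path are hypothetical.
import datetime

import boto3


def land_extract(local_path: str, bucket: str = "example-data-lake-landing") -> str:
    """Upload an extracted on-premises file into a date-partitioned S3 prefix."""
    filename = local_path.rsplit("/", 1)[-1]
    key = f"sales/ingest_date={datetime.date.today():%Y-%m-%d}/{filename}"
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    print(land_extract("/data/exports/sales_extract.csv"))
```

Keying the landing prefix by ingest date is one simple way to make the feed repeatable and easy to monitor; the actual layout would follow the standards agreed with the integration team.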
Design and develop programmatic transformations of the data to correctly partition it, format it and validate or correct its data quality
- Establish the functional and non-functional requirements for formatting and validating the data feed
- Design processes appropriate to high-volume data feeds for managing and monitoring the feeds to Shoprite standards
- Build and test the formatting and validation transformation components (see the illustrative sketch below)
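The sketch below is a minimal PySpark illustration of the partitioning, formatting and validation work described above: it casts raw columns to the expected types, separates rows that fail simple quality rules, and writes the clean rows partitioned by business date. All paths, column names and rules are assumptions for illustration, not prescribed standards.

```python
# Minimal sketch, assuming a PySpark environment (for example on EMR) and
# hypothetical landing/curated paths and column names.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("format-and-validate-sales").getOrCreate()

raw = spark.read.option("header", True).csv("s3://example-data-lake/landing/sales/")

# Format: enforce the types expected by downstream consumers.
formatted = (
    raw.withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Validate: flag rows with missing keys, unparseable dates or negative amounts.
is_valid = (
    F.col("store_id").isNotNull()
    & F.col("sale_date").isNotNull()
    & F.col("amount").isNotNull()
    & (F.col("amount") >= 0)
)
valid = formatted.where(is_valid)
rejected = formatted.where(~is_valid)

# Partition: write clean data by business date; quarantine the rest for correction.
valid.write.mode("overwrite").partitionBy("sale_date").parquet(
    "s3://example-data-lake/curated/sales/"
)
rejected.write.mode("overwrite").parquet("s3://example-data-lake/quarantine/sales/")
```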
Design and develop programmatic transformations, combinations and calculations to populate complex data marts based on feeds from the data lake
- Establish the requirements that a data mart should support
- Design the target data model, the transformations and the feeds, appropriate to high-volume data flows, required to populate the data marts
- Build and test the target data model, the transformations and the feeds required to populate the data marts (see the illustrative sketch below)
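As a hedged sketch of the data mart population work described above, the PySpark example below joins curated sales data to a store dimension, pre-aggregates it into a daily fact table, and writes it to a mart location. The tables, paths and columns are hypothetical assumptions, not an actual target model.

```python
# Minimal sketch, assuming curated lake data exists at the hypothetical
# paths below and that the mart target path is writable.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("populate-daily-sales-mart").getOrCreate()

sales = spark.read.parquet("s3://example-data-lake/curated/sales/")
stores = spark.read.parquet("s3://example-data-lake/curated/store_dim/")

# Combine and calculate: one row per store per day in the target model.
daily_store_sales = (
    sales.join(stores, "store_id")
         .groupBy("sale_date", "store_id", "region")
         .agg(
             F.sum("amount").alias("total_sales"),
             F.count(F.lit(1)).alias("line_count"),
         )
)

daily_store_sales.write.mode("overwrite").partitionBy("sale_date").parquet(
    "s3://example-data-mart/daily_store_sales/"
)
```

Pre-aggregating and partitioning by the date that mart queries filter on is a common way to keep end-user tools responsive on high-volume data.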
Provide operational support to data mart data feeds and data marts
- Identify and perform maintenance on the feeds as appropriate
- Work with the front-line support team and operations to support the feed in production
Design infrastructure required to develop and operate data lake data feeds
- Specify infrastructure requirements for the feed and work with the operations team to implement those requirements and deploy the solution and future updates
Design infrastructure required to develop and operate data marts, their user interfaces and the feeds required to populate them
- Specify infrastructure required to develop and operate data marts
- Specify the infrastructure, in terms of front-end tools, required to exploit the data marts for end users, and work with the front-end team to deploy a complete solution for the user
- Specify and build any feeds required to populate front-end tools and work with the front-end team to optimise the performance of the overall solution
Report Structure:
- Reports to the manager for Data Management and Decision Support
Impact of Decisions
- Time Span – Operational
- Problem solving – Complex to Highly Complex
- Risk of decisions – High Internal
- Financial impact – Medium
- Influence of work – Operational
- Work proficiency – Professional
- Demands of change – High
Job Related Experience
Experience | Time | Essential | Desirable |
Retail operations | 4+ years | X | |
Business Intelligence | 4+ years | X | |
Big Data | 2+ years | X | |
Extract, Transform and Load (ETL) processes | 4+ years | X | |
Cloud: AWS | 2+ years | X | |
Agile exposure, Kanban or Scrum | 2+ years | X | |
Formal Qualification
Qualification | Time | Essential | Desirable |
IT-related | 3 years | X | |
AWS Certification, at least to associate level | | X | |
Job Related Knowledge
Knowledge | Time | Essential | Desirable |
Creating data feeds from on-premises to AWS Cloud | 24 months | X | |
Supporting data feeds in production on a break-fix basis | 24 months | X | |
Creating data marts using Talend or a similar ETL development tool | 48 months | X | |
Manipulating data using Python and PySpark | 24 months | X | |
Processing data using the Hadoop paradigm, particularly EMR, AWS's distribution of Hadoop | 24 months | X | |
DevOps for Big Data and Business Intelligence, including automated testing and deployment | 24 months | X | |
Job Related Skills
Skill | Time | Essential | Desirable |
Talend | 12 months | X | |
AWS: EMR, EC2, S3 | 12 months | X | |
Python | 12 months | X | |
PySpark or Spark | 12 months | X | |
Business Intelligence data modelling | 36 months | X | |
SQL | 36 months | X | |
COMPETENCIES
Essential
- Planning & Organising (Structuring tasks)
- Evaluating problems
- Executing assignments
- Achieving success
- Analytical thinking
- Communication
Desirable
- Creative thinking