Data Engineer

Position Purpose:

Data Engineers build and support data pipelines and the data marts built off those pipelines. Both must be scalable, repeatable and secure. They facilitate getting data from a variety of different sources, in the correct format, assure that it conforms to data quality standards and assure that downstream users can get to that data timeously. This role functions as a core member of an agile team.

These professionals are responsible for the infrastructure that provides insights from raw data, handling and integrating diverse sources of data seamlessly.

They enable solutions that handle large volumes of data in batch and in real time, leveraging emerging technologies from both the big data and cloud spaces.

Additional responsibilities include developing proofs of concept and implementing complex big data solutions with a focus on collecting, parsing, managing, analysing and visualising large datasets. They know how to apply technologies to solve the problems of working with large volumes of data in diverse formats to deliver innovative solutions.

Data Engineering is a technical job that requires substantial expertise in a broad range of software development and programming fields. These professionals have knowledge of data analysis, end-user requirements and business requirements analysis, which they use to develop a clear understanding of the business need and to incorporate that need into a technical solution. They have a solid understanding of physical database design and the systems development lifecycle. This role must work well in a team environment.

 

Job objectives:

  • Design and develop data feeds from an on-premise environment into a data lake hosted in an AWS cloud environment
  • Design and develop programmatic transformations of the data to correctly partition it, format it and validate or correct its data quality
  • Design and develop programmatic transformations, combinations and calculations to populate complex data marts based on feeds from the data lake
  • Provide operational support to data mart data feeds and data marts
  • Design infrastructure required to develop and operate data lake data feeds
  • Design infrastructure required to develop and operate data marts, their user interfaces and the feeds required to populate them

 

Task information:

  • Design and develop data feeds from an on-premise environment into a data lake hosted in an AWS cloud environment
  • Establish the functional and non-functional requirements around the feed
  • Work with the integration team to design the process for managing and monitoring the feeds to Shoprite standards
  • Work with the integration team to build and test the feed components

Design and develop programmatic transformations of the data to correctly partition it, format it and validate or correct its data quality

  • Establish the functional and non-functional requirements for formatting and validating the data feed
  • Design processes appropriate to high-volume data feeds for managing and monitoring the feeds to Shoprite standards
  • Build and test the formatting and validation transformation components
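Purely as an illustration of the kind of formatting-and-validation transformation this task describes (it is not part of the job specification, and the record fields and quality rule are invented for the example), a minimal pure-Python sketch might partition a feed by date while routing records that fail a data quality check to a reject pile:

```python
from collections import defaultdict

# Hypothetical data quality rule: every record needs a non-empty id
# and a sale_date in YYYY-MM-DD form.
def is_valid(record):
    if not record.get("id"):
        return False
    parts = record.get("sale_date", "").split("-")
    return len(parts) == 3 and all(p.isdigit() for p in parts)

def partition_and_validate(records):
    """Split a feed into date-partitioned valid records and rejects."""
    partitions = defaultdict(list)  # partition key (date) -> valid records
    rejects = []                    # records failing the quality rule
    for record in records:
        if is_valid(record):
            partitions[record["sale_date"]].append(record)
        else:
            rejects.append(record)
    return dict(partitions), rejects

feed = [
    {"id": "1", "sale_date": "2024-01-15", "amount": 10.0},
    {"id": "",  "sale_date": "2024-01-15", "amount": 5.0},   # fails: empty id
    {"id": "2", "sale_date": "2024-01-16", "amount": 7.5},
]
partitions, rejects = partition_and_validate(feed)
```

In practice this logic would run in PySpark over partitioned files in the data lake rather than over in-memory lists, but the partition/validate/reject structure is the same.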

 

Design and develop programmatic transformations, combinations and calculations to populate complex data marts based on feeds from the data lake

  • Establish the requirements that a data mart should support
  • Design the target data model, the transformations and the feeds, appropriate to high volume data flows, required to populate the data marts
  • Build and test the target data model, the transformations and the feeds required to populate the data marts
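As an illustrative sketch only (the schema and names below are invented, not taken from this job specification), the combine-and-calculate step that populates a data mart might join a fact feed against a dimension and derive an aggregated measure:

```python
# Hypothetical fact feed and product dimension.
sales = [
    {"product_id": "p1", "qty": 2, "unit_price": 10.0},
    {"product_id": "p2", "qty": 1, "unit_price": 30.0},
    {"product_id": "p1", "qty": 3, "unit_price": 10.0},
]
products = {"p1": "grocery", "p2": "electronics"}  # product_id -> category

def build_revenue_by_category(sales, products):
    """Combine the fact feed with the dimension and calculate revenue per category."""
    mart = {}
    for row in sales:
        category = products[row["product_id"]]   # the "combination" (join)
        revenue = row["qty"] * row["unit_price"] # the "calculation"
        mart[category] = mart.get(category, 0.0) + revenue
    return mart

mart = build_revenue_by_category(sales, products)
```

A real implementation would express the same join-and-aggregate in SQL, Talend or PySpark against the data lake, but the shape of the transformation is as above.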

Provide operational support to data mart data feeds and data marts

  • Identify and perform maintenance on the feed as appropriate
  • Work with the front-line support team and operations to support the feed in production

Design infrastructure required to develop and operate data lake data feeds

  • Specify infrastructure requirements for feed and work with operations team to implement those requirements and deploy the solution and future updates

Design infrastructure required to develop and operate data marts, their user interfaces and the feeds required to populate them

  • Specify infrastructure required to develop and operate data marts
  • Specify infrastructure in terms of the front-end tools required to exploit the data marts for end-users, and work with the front-end team to deploy a complete solution for the user
  • Specify and build any feeds required to populate front-end tools and work with the front-end team to optimise the performance of the overall solution

 

Report Structure:

  • Reports to manager for Data Management and Decision Support

 

Impact of Decision

  • Time Span – Operational
  • Problem solving – Complex to Highly Complex
  • Risk of decisions – High Internal
  • Financial impact – Medium
  • Influence of work – Operational
  • Work proficiency – Professional
  • Demands of change – High

 

Job Related Experience

  • Retail operations: 4+ years (Desirable)
  • Business Intelligence: 4+ years (Essential)
  • Big Data: 2+ years (Desirable)
  • Extract Transform and Load (ETL) processes: 4+ years (Essential)
  • Cloud AWS: 2+ years (Essential)
  • Agile exposure, Kanban or Scrum: 2+ years (Essential)

 

Formal Qualification

  • IT-related qualification: 3 years (Essential)
  • AWS Certification, at least to associate level (Essential)

 

Job Related Knowledge

  • Creating data feeds from on-premise to AWS Cloud: 24 months (Essential)
  • Supporting data feeds in production on a break-fix basis: 24 months (Essential)
  • Creating data marts using Talend or a similar ETL development tool: 48 months (Essential)
  • Manipulating data using Python and PySpark: 24 months (Essential)
  • Processing data using the Hadoop paradigm, particularly using EMR, AWS’s distribution of Hadoop: 24 months (Essential)
  • DevOps for Big Data and Business Intelligence, including automated testing and deployment: 24 months (Essential)

 

Job Related Skills

  • Talend: 12 months (Essential)
  • AWS: EMR, EC2, S3: 12 months (Essential)
  • Python: 12 months (Essential)
  • PySpark or Spark: 12 months (Desirable)
  • Business Intelligence data modelling: 36 months (Essential)
  • SQL: 36 months (Essential)

 

COMPETENCIES

Essential

  • Planning & Organising (Structuring tasks)
  • Evaluating problems
  • Executing assignments
  • Achieving success
  • Analytical thinking
  • Communication

 

Desirable

  • Creative thinking