POSITION SUMMARY
Air Products’ goal is to be the safest, most diverse, and most profitable industrial gas company in the world, providing excellent service to our customers. Our 4S principles are Safety, Simplicity, Speed, and Self-confidence. Effective use of data and analytics is critical to help the company achieve these goals. Our IT Data and Analytics team is seeking an Analytics Data Engineer to help us build and maintain our Amazon Web Services (AWS) S3 Data Lake.
...
Nature and Scope
The Analytics Data Engineer is responsible for operationalizing data pipelines that support the company's analytics initiatives. The primary responsibilities include building, managing, and optimizing data flows from sources such as SAP ECC into our S3 data lake and Redshift cluster. The primary skills needed include proficiency with Qlik (Attunity) Replicate, Qlik (Attunity) Compose, AWS Glue, Athena, Redshift, EMR, Hive, Spark, and S3. Experience operating and tuning EMR and/or Redshift clusters is desirable.
The data lake enables consumers such as data scientists and business and IT data analysts to complete advanced analytics projects as well as business reporting. The data engineer is expected to collaborate with data scientists, data analysts, and other data consumers to productionize the data models and algorithms those users develop, improving the overall efficiency of advanced analytics projects. Additionally, the data engineer is responsible for ensuring that data quality, governance, and data security procedures are followed while curating data for the Data Lake. Advanced proficiency in PySpark and Python ETL modules is required.
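As an illustration of the PySpark curation work described above, the minimal sketch below reads raw replicated SAP data from an S3 landing zone, applies basic quality rules, and writes a curated, partitioned copy back to the lake. It is an illustrative sketch only; the bucket, path, and column names are hypothetical, not actual Air Products systems.

    # Minimal PySpark curation sketch (illustrative only; names are hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("curate-sap-orders").getOrCreate()

    # Read raw replicated SAP data from the lake's landing zone.
    raw = spark.read.parquet("s3://example-data-lake/raw/sap_ecc/orders/")

    # Basic quality rules: require a key, normalize the date, remove duplicates.
    curated = (
        raw.filter(F.col("order_id").isNotNull())
           .withColumn("order_date", F.to_date("order_date"))
           .dropDuplicates(["order_id"])
    )

    # Write the curated data back to the lake, partitioned for downstream consumers.
    (curated.write
            .mode("overwrite")
            .partitionBy("order_date")
            .parquet("s3://example-data-lake/curated/sap_ecc/orders/"))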
PRINCIPAL ACCOUNTABILITIES
- Build and maintain data pipelines in support of the enterprise AWS S3 Data Lake.
- Contribute to the core design of data architecture, data pipelines, data models and schemas, and implementation plans for the data lake
- Enable an innovative approach to data platforms in order to greatly increase the flexibility, scalability, and reliability of IT services at an optimal cost
- Work in cross-disciplinary teams to understand enterprise needs and ingest rich data sources.
- Work with analytics and data science team members to optimize data platforms to better meet their needs
- Maintain the proper infrastructure to support ETL from a variety of sources using SQL, SAP Data Services, and big data technologies
- Design ETL processes based on enterprise architecture and custom project needs
- Perform design reviews, plan, develop, and resolve technical issues
- Work closely with management to prioritize business and information request backlogs
- Ensure data governance and data security procedures are followed
- Perform data replication with Qlik Replicate and maintain data marts with Qlik Compose
- Leverage EMR and Hive to process data mart ETL, including inserts, updates, and deletes (see the sketch after this list)
- Implement data warehouses on platforms such as AWS Redshift
- Research, experiment, and utilize leading data and analytics technologies in AWS
- Educate and train yourself and others as you evangelize the merits of data and analytics
- Be proactive in keeping your skills fresh
- Generate new ideas; never say or think "that's not my job."
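For the EMR and Hive item above, one common pattern is reconciling change records landed by the replication tool into a data mart table. The minimal sketch below keeps the latest change per key, drops deleted keys, and writes the reconciled result to a staging table; the database, table, and column names (including the op change flag) are hypothetical assumptions, not the actual data mart design.

    # Illustrative PySpark sketch of applying CDC changes (inserts, updates, deletes)
    # to a Hive-backed data mart table on EMR. All names are hypothetical.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("apply-cdc-to-mart")
             .enableHiveSupport()
             .getOrCreate())

    target  = spark.table("mart.customer_dim")          # current state of the mart
    changes = spark.table("staging.customer_changes")   # change rows with op flag and timestamp

    # Keep only the most recent change per business key.
    latest = (changes
              .withColumn("rn", F.row_number().over(
                  Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())))
              .filter("rn = 1")
              .drop("rn"))

    # Target rows untouched by any change, plus the latest inserts/updates;
    # keys whose latest change is a delete are dropped entirely.
    unchanged = target.join(latest, "customer_id", "left_anti")
    upserts   = latest.filter(F.col("op") != "D").drop("op", "change_ts")
    merged    = unchanged.unionByName(upserts)

    # Write to a staging table; in practice the target would be swapped or
    # overwritten only after validation.
    merged.write.mode("overwrite").saveAsTable("mart.customer_dim_reconciled")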
JOB REQUIREMENTS
- 4-year college degree required; Bachelor's degree in an Information Technology field or a related technical discipline preferred
- 4+ years as a Python, PySpark, Scala, or Java software developer building scalable, real-time streaming ETL applications and data warehouses
- Proficiency working within AWS and its tools (S3, Glue, EMR, Athena, Redshift)
- Proficient programming experience with Python, PySpark, and Hive
- Experience maintaining infrastructure as code using Terraform or CloudFormation
- Advanced understanding of both SQL and NoSQL technologies such as MongoDB and DocumentDB
- Solid understanding of data warehouse design patterns and best practices
- Experience working with and processing large data sets in a time-sensitive environment while minimizing errors
- Hands-on experience working with big data technologies (Hadoop, Hive, Spark, Kafka)
- Hands-on experience working with Qlik (Attunity) Replicate and Qlik (Attunity) Compose
- Ability to develop test plans and stress test platforms
- Demonstrated strength in process development, process adherence, and process improvement
- Experience with complex job scheduling
- Effective analytical, conceptual, and problem-solving skills
- Must be organized, disciplined, and task/goal oriented
- Able to prioritize and coordinate work through interpretation of high-level goals and strategy
- Effective team player with a positive attitude
- Strong oral and written English communication skills
Location
AS-IN-Pune