Responsibilities:
Gather and process raw data at scale.
Define data retention policies.
Work closely with our data science team to support ad hoc analysis of the data.
Acquire data from primary or secondary sources and maintain databases/systems.
Create data sets and tables to store data. Work on data analysis, modeling, and architecture.
Make and test database schema changes when needed.
Identify user needs to create and administer the data lake.
Ingest and move data between relational and NoSQL databases, including Oracle, SQL Server, MongoDB, Elasticsearch, Google BigQuery, etc.
Implement security measures by encrypting sensitive data.
Build and modify ETL jobs to improve performance and efficiency.
Work closely with team members in the US to understand requirements and guide the team through implementation.
Integrate data from multiple sources using ETL tools (Talend, Informatica, PDI, Kapow, etc.).
Qualifications:
B.Tech or B.E. degree in Computer Science or a related technical field.
8+ years of experience with database systems (Oracle, SQL Server, MySQL, Google BigQuery, MongoDB, Elasticsearch, etc.).
6+ years of experience with distributed computing principles and programming languages (Python, Java, C, etc.).
6+ years of experience with Big Data platforms (Hadoop, Hortonworks, etc.) and NoSQL databases and related tools (HBase, Spark, Hive, Cloudera, etc.).