Data Engineer
Summary
Title: Data Engineer
ID: 1034
Department: CRISP Health (www.crisphealth.org)
Description

Position Title: Data Engineer – CRISP Insights

Reports Directly To: Senior Data Engineer Technical Lead

Location: Columbia, MD

Hours Per Week: 40

Risk Designation: High

Job Summary

The Data Engineer will be responsible for supporting HIT activities in the Baltimore/Washington region. The successful candidate will work to design, develop, and maintain a data analytics platform solution. CRISP Shared Services currently runs an Azure Data Lake Storage Gen2 / Azure Data Factory / Spark / Databricks infrastructure with approximately 300 production use cases that consolidate several streams of healthcare data. This role will primarily focus on creating new use cases through Databricks programming, and it requires communication, technical proficiency, collaboration, and a willingness to learn new things. This role assists a senior data engineer in implementing new ingestion pipelines and troubleshooting production issues.


This position will have a base salary.

This position requires the candidate to be in the Columbia office 1 day a week.

Essential Duties and Responsibilities

Include the following. Other duties may be assigned.

  • Design, build, and maintain large, complex data processing pipelines that meet business requirements
  • Ensure the quality of deliverables by developing automated controls and performing unit, integration, and user acceptance testing
  • Develop scalable and re-usable frameworks for ingestion and transformation of large data sets
  • Troubleshoot and perform root cause analysis on data pipeline issues
  • Work with source system owners and business owners to incorporate business changes into data pipelines
  • Create comprehensive documentation of data workflows, processes, and infrastructure

Qualifications

To perform this job successfully, the incumbent must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • Advanced programming skills in PySpark, Python, and Spark SQL
  • Thorough understanding of data lakes, raw/enriched/curated layer concepts, and ETL within the Azure framework
  • Proven development experience building complex data pipelines for lakehouses and data warehouses using Agile methodology
  • Experience in architecture, design, and implementation using Databricks and Azure Data Factory

Required Experience/Education

  • 4-year college degree (required)
  • 4+ years of experience with data integration (required)
  • 2+ years of advanced programming in PySpark
  • Experience working with Databricks (preferred)

