Title: Data Engineer
ID: 1034
Department: CRISP Health (www.crisphealth.org)
Position Title: Data Engineer – CRISP Insights
Reports Directly To: Senior Data Engineer Technical Lead
Location: Columbia, MD
Hours Per Week: 40
Risk Designation: High
Job Summary
The Data Engineer will be responsible for supporting HIT activities in the Baltimore/Washington region. The successful candidate will design, develop, and maintain a data analytics platform. CRISP Shared Services currently runs an Azure Data Lake Storage Gen2 / Azure Data Factory / Spark / Databricks infrastructure with approximately 300 production use cases that consolidate several streams of healthcare data. This role will primarily focus on creating new use cases through Databricks programming and requires strong communication, technical proficiency, collaboration, and a willingness to learn new things. The role assists a senior data engineer in implementing new ingestion pipelines and troubleshooting production issues.
This position will have a base salary.
This position requires the candidate to be in the Columbia office 1 day a week.
Essential Duties and Responsibilities
Include the following. Other duties may be assigned.
- Design, build, and maintain large, complex data processing pipelines that meet business requirements
- Ensure the quality of deliverables by developing automated controls and performing unit, integration, and user acceptance testing
- Develop scalable, reusable frameworks for ingesting and transforming large data sets
- Troubleshoot and perform root cause analysis on data pipeline issues
- Work with source system owners and business owners to incorporate business changes into data pipelines
- Create comprehensive documentation of data workflows, processes, and infrastructure
Qualifications
To perform this job successfully, the incumbent must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Advanced programming skills in PySpark, Python, and Spark SQL
- Thorough understanding of data lakes, raw/enriched/curated layer concepts, and ETL within the Azure framework
- Proven development experience building complex data pipelines for lakehouses/data warehouses using Agile methodology
- Experience in architecture, design, and implementation using Databricks and Azure Data Factory
Required Experience/Education
- Four-year college degree (required)
- 4+ years of experience working with data integration (required)
- 2+ years of advanced programming in PySpark
- Experience working with Databricks (preferred)