Data Engineer

Harvard Medical School

Boston, MA

Job posting number: #7103659

Posted: June 21, 2022

Application Deadline: Open Until Filled

Job Description

Position Description

The Center for Computational Biomedicine (CCB) is a new center within the Blavatnik Institute at Harvard Medical School. Our mission is to provide cutting-edge computational capabilities, data analysis, and data integration technologies to support medical and biological research within the Medical School. Based at the Harvard Medical School Longwood Campus, we are part of a vibrant community of scientists, physicians, and engineers whose goal is to advance the boundaries of knowledge and improve patient care. The working environment combines the best features of a startup (fast pace, flexibility, flat hierarchies) with those of one of the leading medical schools (excellent benefits, outstanding opportunities for learning, great resources, name recognition).

CCB is looking for an individual to join the Data and Analytic Platforms Group, a group of engineers and scientists developing data warehousing and analytic solutions in support of epidemiology, healthcare economics, machine learning, and basic science research.

The Group works to reduce the burden on faculty by developing centrally managed and shareable data solutions to be used across research silos. We curate very large public and private healthcare utilization (insurance claims, electronic health record), multi-omics, environmental exposure, and social determinants data sets, provision access to those curated data sets, and develop analytic frameworks to accelerate reproducible academic research on top of them. Collectively these data sets contain information relating to hundreds of millions of patients.

This position reports to the Director of the CCB Data and Analytic Platforms Group. Primary responsibilities will include designing and implementing relational database architecture (schema, indexing, stored procedures, ETL processes, etc.) to warehouse multi-terabyte data sets in Microsoft SQL Server. This will include periodically evaluating query performance metrics to ensure real-time availability to the research community and recommending modifications to the underlying database platform to resolve any identified issues. The bulk of this design work will be left up to the candidate, while a small portion will involve refactoring (or strategically deciding to abandon) existing ETL / indexing strategies. The data sets will be staged into a combination of proprietary schemas and the open-source i2b2 data model.

Additional opportunities will be available for the candidate to interact with individual scientific research teams to help improve their workflows.

**The below Typical Core Duties are a generalized list provided by Harvard's Job Frameworks, and may not actually reflect the job-specific responsibilities of this position.

Basic Qualifications

Minimum of seven years’ post-secondary education or relevant work experience

Additional Qualifications and Skills

Bachelor’s Degree in Computer Science or a related field preferred.
At least 5 years’ experience as a software systems architect, including experience developing solutions with both relational database systems and at least one of the following languages: Java, Python, R.
Master’s Degree in a related field (Computer Science / Electrical Engineering, Bioinformatics, Statistics, Data Science, etc.) preferred.
Excellent communication skills, both written and oral
Experience with Microsoft SQL Server or cloud-based data warehousing technologies
Experience designing and maintaining multi-terabyte analytic relational databases, including index and query optimization
Experience orchestrating and optimizing Extract-Transform-Load (ETL) processes for multi-terabyte data warehouses
Comfort doing basic system administration in a Linux environment
Comfort doing basic system administration in a Windows environment
Experience with relational database index optimization
Experience with containerized (Docker or Singularity) workflows/paradigms
Experience with non-relational database systems (graph, key/value, document, array data stores)
Experience with the R statistical computing platform
Experience with Java
Experience with Python
Experience with high-performance computing
Comfort independently exploring distributed computing and database technologies and generating executive reports
Experience with public cloud platforms (AWS, Azure, Google Cloud)

Harvard Medical School strives to cultivate an environment that promotes inclusiveness and collaboration among students, faculty and staff and to create new avenues for discussion that will advance our shared mission to improve the health of people throughout the world.
