Key Responsibilities:
- Design and maintain extract-transform-load (ETL) workflows, data
dictionaries, and validation rules; run routine data quality (DQ) checks
and produce facility-level DQ scorecards (see the DQ sketch after this
list).
- Coordinate encrypted transfers, access controls, and audit logs;
support in-country teams in executing central scripts locally.
- Build analysis-ready datasets and documentation (README, data
schemas, provenance).
- Manage the full data engineering pipeline across nine to ten
countries, covering extraction, transformation, validation, documentation,
and version control of all study datasets.
- Implement and maintain data quality frameworks, including
completeness, internal consistency, timeliness, audit trails,
reconciliation checks, and facility feedback on data gaps.
- Maintain secure country-specific servers, encryption, anonymization
processes, access control registries, and audit logs for compliance with
national data laws.
- Ensure correct execution of script-based federated analysis
workflows, supporting in-country teams to run centrally developed code
without exporting raw data (see the federated-run sketch below).
- Produce quarterly clean datasets, documentation packs, and
harmonized structures ready for analysis, including wide/long reshaping,
panel construction, and facility-period indexing (see the reshaping
sketch below).
- Collaborate with analysts and statisticians to support data linkage,
preparation of model-ready datasets, and metadata curation.
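
For illustration only, a minimal sketch of the kind of facility DQ
scorecard described above, written in Python/pandas. All column names
(facility_id, report_date, value) are hypothetical, not a prescribed
schema:

    import pandas as pd

    def dq_scorecard(df: pd.DataFrame, expected_periods: int) -> pd.DataFrame:
        """Per-facility completeness, missingness, and timeliness (sketch only)."""
        grouped = df.groupby("facility_id")
        return pd.DataFrame({
            # Completeness: share of expected reporting periods actually present.
            "completeness": grouped["report_date"].nunique() / expected_periods,
            # Missingness: share of records with no reported value.
            "pct_missing_value": grouped["value"].apply(lambda s: s.isna().mean()),
            # Timeliness: most recent report received from the facility.
            "last_report": grouped["report_date"].max(),
        })

    demo = pd.DataFrame({
        "facility_id": ["F01", "F01", "F02"],
        "report_date": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-01-31"]),
        "value": [120.0, None, 98.0],
    })
    print(dq_scorecard(demo, expected_periods=2))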
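Likewise, a hedged sketch of the federated pattern referenced above:
centrally written code runs inside each country's environment, row-level
data stays on the local server, and only aggregates are exported. File
paths and column names here are assumptions for illustration:

    import pandas as pd

    def run_central_analysis(local_csv: str, out_csv: str) -> None:
        """Read row-level data locally; export only facility-level aggregates."""
        raw = pd.read_csv(local_csv, parse_dates=["report_date"])  # never leaves the server
        aggregate = (
            raw.groupby(["facility_id", raw["report_date"].dt.to_period("Q")])
               .agg(n_records=("value", "size"), mean_value=("value", "mean"))
               .reset_index()
        )
        aggregate.to_csv(out_csv, index=False)  # only this summary is shared centrally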
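Finally, a small sketch of the wide/long reshaping and facility-period
panel construction mentioned above, again in pandas; the indicator
columns (anc_2024Q1, anc_2024Q2) are invented examples:

    import pandas as pd

    wide = pd.DataFrame({
        "facility_id": ["F01", "F02"],
        "anc_2024Q1": [40, 55],  # hypothetical indicator, one column per quarter
        "anc_2024Q2": [44, 60],
    })

    # Wide -> long: one row per facility per reporting period.
    long = wide.melt(id_vars="facility_id",
                     var_name="indicator_period", value_name="anc_visits")
    long[["indicator", "period"]] = long["indicator_period"].str.split("_", expand=True)

    # Panel construction: a facility-period MultiIndex ready for analysis.
    panel = long.set_index(["facility_id", "period"]).sort_index()[["anc_visits"]]
    print(panel)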
Qualifications
- Master's degree in Biostatistics, Health Informatics, or Data
Management. Mandatory
- At least 3 years' experience managing large routine health
datasets and health facility assessments. Mandatory
- Proficiency in SQL, Python, and R; data modelling; data quality
frameworks; documentation; security best practices. Mandatory
How to Apply
