Solutions Applied Data Scientist, Healthcare

Posted on May 23, 2026

Company Overview

We are building Protege to solve the biggest unmet need in AI — getting access to the right training data. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data. We are backed by world-class investors and partnering with ambitious teams in AI. Our team is lean, fast-moving, and focused on velocity and impact.

Role Overview

We are hiring a Solutions Applied Data Scientist to design, construct, and validate complex healthcare data cohorts used for AI model training. This role sits within the delivery organization and partners with Solutions Leads and delivery engineers to solve complex data challenges that arise during customer projects.
The role focuses on applied dataset construction, feasibility analysis, and data validation rather than research or model development.

Responsibilities

Work as a technical partner to Solutions Leads to solve complex data problems including cohort construction, multi-source dataset assembly, feasibility analysis, and data validation.
  • Write complex SQL queries to construct cohorts and implement inclusion/exclusion logic
  • Join datasets across multiple data sources and validate linkages
  • Investigate missing or anomalous data and perform data completeness analysis
  • Evaluate whether requested variables or labels exist in available data sources and propose proxies when needed
  • Partner with Solutions Engineers and platform teams when pipeline or infrastructure changes are required
  • Develop reusable tooling and workflows (SQL templates, automated validation checks, scripts) to increase delivery efficiency

What Success Looks Like

30 days: Learn delivery workflows, healthcare data partner realities, and start contributing to feasibility and validation work.
60 days: Independently support scoped technical escalations, write and validate SQL/Python workflows, and help answer feasibility questions.
90 days: Handle the hardest dataset problems with limited oversight, improve QA and repeatability, and propose workflow or platform improvements.

What You Bring

  • Experience with large structured healthcare datasets
  • Strong SQL and Python skills for complex queries, data analysis, and scripting
  • Experience joining and transforming large datasets and working with structured file formats (CSV, Parquet)
  • Experience performing data validation and exploratory analysis
  • Ability to translate ambiguous requirements into concrete data logic and communicate with technical and non-technical stakeholders
  • Experience using Claude Code / Codex

Protege Values

Pass the Loved Ones’ Test: Act with integrity and do the right thing.
Always Find a Way: Be resourceful and resilient.
Go Fast and Grow Fast: Move with urgency and learn quickly.
Practice Kindness and Candor: Communicate directly and respectfully.
Deliver Together: Collaborate and share ownership.
Own the Outcome. Hone the Craft: Take pride in your work and continuously improve.

How to Apply

To apply, click the "Apply for this Job" button on the Protege job posting and complete the linked application form.

Application Link

Visit: https://jobs.ashbyhq.com/protege/076f8f75-6890-4bf5-976d-328f6f583057/application