Research Scientist, Speech Technologies
Posted on April 28, 2025
About Us:
Hippocratic AI is building a safety-focused large language model (LLM) for the healthcare industry. Our team, composed of former researchers from Microsoft, Meta, NVIDIA, Apple, Stanford, Johns Hopkins, and Hugging Face, is reinventing the next generation of foundation model training and alignment to create AI-powered conversational agents for real-time patient-AI interactions.
Why Join Our Team:
- Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.
- Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.
- Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.
- World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.
Responsibilities:
- Design, develop, evaluate, and update data-driven models for speech-first applications.
- Participate in research activities including the application and evaluation of speech technologies in the medical domain.
- Research and implement state-of-the-art (SOTA) models for conversational speech recognition from zero to one.
Basic Qualifications:
- PhD with 3+ years of experience in speech recognition or a related field, or a Master's degree with 5+ years of hands-on ASR experience.
- Experience designing and developing algorithms for accurate and efficient speech recognition for both streaming and non-streaming use cases.
- Experience with training, evaluating, and optimizing ASR models for various factors including accuracy, latency, and resource utilization.
- Experience with preprocessing and curating large speech datasets for training models.
- Strong programming skills with working knowledge of Python & C++.
- Comfort working in a Linux/Unix command-line environment.
- Team player with good communication skills (oral and written).
Preferred Qualifications:
- Experience with building state-of-the-art ASR solutions, including setting up data pipelines, model architectures, and evaluation pipelines from zero to one.
- Hands-on experience with ESPNET, Kaldi, and PyTorch.
- Experience with CUDA.
- Experience leveraging large language models (LLMs) for enhanced speech recognition tasks.
- Experience with neural/end-to-end (E2E) endpoint modeling.
- Publications in tier-1 journals or conferences in the field of speech recognition/NLP.
How to Apply:
Click the "Apply for this Job" button on the job posting page to submit your application.