Senior Data Scientist
We are looking for an experienced Data Scientist with a focus on large clinical text datasets (electronic medical records) who will use LLMs to extract structured information such as cancer diagnosis, stage, treatment, and timing. The role involves processing clinical documents, prompt engineering, evaluating outputs against human-labeled data, and collaborating closely with clinical and data science teams.

Job Description
Customertimes is a global digital engineering, product development, and technology consulting company. Headquartered in New York, we have a team of 1300+ experts and offices in 12 countries.

Requirements:
- Strong proficiency in Python;
- 3+ years of relevant working experience in a technical capacity, with a focus on ML. Prior experience with LLMs is strongly preferred;
- Familiarity with LLM-based prompt engineering and text processing workflows;
- Basic familiarity with Spark, Databricks, and MLflow environments (for interacting with infrastructure, not deep configuration);
- Experience working in AWS environments;
- Comfort with object-oriented programming concepts (e.g., using/extending classes, Pydantic models);
- Experience working with Anthropic models (preferred) or other major LLMs (OpenAI, Llama, etc.);
- Familiarity with RAG (Retrieval Augmented Generation) concepts (not core but potentially useful in the future);
- Basic exposure to classical NLP (e.g., LSTM architectures, non-LLM text processing);
- Healthcare or clinical text processing experience is a strong plus, but not required;
- Understanding of model evaluation metrics (classification, regression) and statistical validation basics.

Responsibilities:
- Process electronic medical record documents into a form suitable for LLMs;
- Design and craft effective prompts for information extraction;
- Evaluate LLM performance against human-labeled ground truth;
- Collaborate with clinicians, epidemiologists, and statisticians to align extracted data with clinical meaning;
- Troubleshoot and iterate extraction pipelines;
- Integrate with internal ML infrastructure and production environments.

What We Offer:
- 20 days of paid vacation, 15 days of paid sick leave with a doctor’s note, and 5 days of paid sick leave without a doctor’s note;
- Medical insurance coverage for employees, plus an option for family insurance coverage at a corporate rate;
- Support for participation in professional development opportunities (webinars, conferences, trainings, etc.);
- Regular team-building opportunities as well as bi-annual company-wide events;
- Flexible work arrangements (in-office, remote, or hybrid), based on employee preference and manager approval.
