Senior Data Scientist

We are looking for an experienced Data Scientist with a focus on large clinical text datasets (electronic medical records) who will use LLMs to extract structured information such as cancer diagnosis, stage, treatment, and timing. The role involves processing clinical documents, prompt engineering, evaluating outputs against human-labeled data, and collaborating closely with clinical and data science teams.

Type:
Full-time
Remote

Job ID:
JR - 85541

Technologies:
Data Science
Machine Learning
Deep Learning
Large Language Models (LLMs)
Python
Spark
Databricks
MLflow
AWS
OpenAI
LLAMA
Retrieval Augmented Generation (RAG)
NLP
Locations:
LATAM
Dominican Republic
Portugal
Spain
Poland
Romania
Ukraine
Georgia
South Africa

Job Description

Customertimes is a global digital engineering, product development, and technology consulting company. Headquartered in New York, we have a team of 1300+ experts and offices in 12 countries.  

Requirements:

  • Strong proficiency in Python;
  • 3+ years of relevant work experience in a technical capacity with a focus on ML (prior experience with LLMs is strongly preferred);
  • Familiarity with LLM-based prompt engineering and text processing workflows;
  • Basic familiarity with Spark, Databricks, and MLflow environments (for interacting with infrastructure, not deep configuration);
  • Experience working in AWS environments;
  • Comfort with object-oriented programming concepts (e.g., using/extending classes, Pydantic models);
  • Experience working with Anthropic models (preferred) or other major LLMs (OpenAI, LLAMA, etc.);
  • Familiarity with RAG (Retrieval Augmented Generation) concepts (not core but potentially useful in the future);
  • Basic exposure to classical NLP (e.g., LSTM architectures, non-LLM text processing);
  • Healthcare or clinical text processing experience is a strong plus, but not required;
  • Understanding of model evaluation metrics (classification, regression) and statistical validation basics.

Responsibilities:

  • Process electronic medical record documents into a form suitable for LLMs;
  • Design and craft effective prompts for information extraction;
  • Evaluate LLM performance against human-labeled ground truth;
  • Collaborate with clinicians, epidemiologists, and statisticians to align extracted data with clinical meaning;
  • Troubleshoot and iterate extraction pipelines;
  • Integrate with internal ML infrastructure and production environments.

What We Offer:

  • 20 days of paid vacation, 15 days of paid sick leave with a doctor’s note, and 5 days of paid sick leave without a doctor’s note;
  • Medical insurance coverage for employees, plus an option for family insurance coverage at a corporate rate; 
  • Support for participation in professional development opportunities (webinars, conferences, trainings, etc.); 
  • Regular team-building opportunities as well as bi-annual company-wide events; 
  • Flexible work environments, including in-office, remote, or hybrid, based on employee preference and manager approval. 