Research Engineer
Generative AI

Job Description

Role Overview

We are looking for a Research Engineer to sit at the intersection of applied science and product engineering.

You will be part of a small team responsible for evaluating new product directions before they reach our engineering org: assessing feasibility, running rapid experiments, and owning the ML foundations that power what we build.

This is not a pure research role, nor a pure engineering role: you will read a PRD, determine whether the idea is worth pursuing, prove it with code, and then stay in the room to help build it.

Your focus is Generative AI. You will own the research layer for any product direction involving foundation models, from LLM-backed features and RAG pipelines to multimodal applications and agentic systems, and you will be expected to move fast without losing engineering rigour.

The role combines rapid technical judgment, disciplined experimentation, and close collaboration with product and engineering teams in a high-ownership environment.

Key Responsibilities

Feasibility & Architecture

Receive PRDs and produce clear feasibility assessments with risk-rated recommendations

Author architecture documents that bridge research findings and engineering implementation, forming the basis for user stories and sprint planning

Collaborate with product and engineering teams to translate research outcomes into buildable specs

Rapid Experimentation

Design and execute time-boxed experiments in sandboxed environments to validate or invalidate product hypotheses quickly

Build lightweight MVPs to demonstrate technical viability before full engineering investment

Know when to stop: ruthlessly prioritise signal over polish in the exploration phase

LLM & Foundation Model Research

Select, evaluate, and adapt foundation models to product requirements, knowing when a model is good enough off the shelf and when it needs fine-tuning or replacement

Own the full experimentation lifecycle: model selection, prompt development, adaptation, and evaluation, treating each as an engineering discipline with proper versioning and reproducibility

Evaluate open-weight model families (Qwen, Llama, Mistral) against proprietary APIs (OpenAI, Anthropic, Gemini) for cost, latency, and quality trade-offs in production workloads

RAG & Knowledge Systems

Architect and prototype retrieval-augmented systems, owning the full pipeline from data ingestion and chunking to entity extraction, knowledge-graph construction, and response quality evaluation

Work with graph-based RAG architectures (GraphRAG / LightRAG) that combine vector retrieval with structured knowledge graphs for domain-specific reasoning

Agents & Orchestration

Prototype agentic systems using multi-agent frameworks and assess their reliability, failure modes, and cost profiles before recommending them for productisation

Design agent orchestration patterns including stateful multi-turn conversations, task decomposition, and tool-use pipelines

ML Infrastructure & Experiment Tracking

Set up and maintain experiment tracking infrastructure and enforce rigorous logging discipline across the team

Lead data curation efforts: sourcing, cleaning, versioning, and documenting datasets

Build reusable research infrastructure (evaluation harnesses, prompt registries, model comparison tooling, baseline suites) that speeds up iteration

Proactively identify gaps in internal infrastructure (model serving, observability for LLM calls, cost monitoring, evaluation frameworks) and surface them as prioritised proposals for the engineering roadmap

Required Qualifications

Education

MS in Computer Science, Mathematics, Physics, Statistics, or a related quantitative discipline

PhD strongly preferred

Experience & Technical Depth

Deep, hands-on experience with LLMs in production or near-production settings

Strong Python engineering skills with a genuine commitment to clean, performant, maintainable code; you apply SOLID principles in research contexts as readily as in production

Docker proficiency; our entire platform is containerised (20+ services), and you must be comfortable building, debugging, and composing multi-service Docker environments daily

Experience with open-weight model serving infrastructure (SGLang, vLLM, or TGI), including GPU memory management, numeric precision and quantization formats (BF16 / FP8 / FP4), and inference optimization

Solid understanding of the trade-offs between proprietary APIs (OpenAI, Anthropic, Gemini) and open-weight models (Qwen, Llama, Mistral families), including when to call an API versus self-host

Experience with vector databases and hybrid retrieval architectures (dense + sparse, vector + graph)

Awareness of responsible AI considerations including hallucination, bias, safety, and content filtering

Evaluation, Infrastructure & Communication

Experience with LLM evaluation methodology; you have used or built tools like RAGAS, DeepEval, or custom eval harnesses

Strong written communication: your architecture docs and feasibility assessments are readable by engineers and product managers alike

Comfort operating in time-boxed exploration cycles where reproducibility, logging discipline, and pragmatic decision-making matter as much as model quality

Ability to move from PRD to prototype to implementation guidance without losing engineering rigour

Nice to Have

Experience with knowledge-graph-augmented retrieval (GraphRAG / LightRAG) or entity-relation extraction pipelines; we use graph + vector hybrid RAG, not just plain vector search

Experience with multi-agent orchestration frameworks (AutoGen, LangChain, CrewAI, or custom orchestration)

TypeScript proficiency; comfort working across the stack accelerates prototyping and production handoff

Domain experience in healthcare, medical coding (ICD-10, CPT, SNOMED CT), or regulated industries

Familiarity with distributed task processing (Celery or similar) for async ML workloads

Experience with NLP tooling beyond LLMs, including entity recognition, fuzzy matching, or biomedical text processing

Exposure to differential privacy or synthetic data generation

Experience with observability for ML systems, including distributed tracing, LLM call monitoring, or cost tracking

Experience with systematic prompt optimization frameworks (DSPy or equivalent compiler-driven approaches)

Languages

Fluent English (mandatory)

Italian is a plus

What We Offer

A central role in shaping new product directions before full engineering investment

Direct exposure to frontier foundation-model, RAG, multimodal, and agentic system work in a regulated healthcare-data environment

High ownership across research, architecture, experimentation, and production handoff

The opportunity to build reusable ML research infrastructure that compounds team velocity

Flexible, international, and mission-driven working environment

Apply now

Complete the form below to express your interest in this role.