Research Engineer
Generative AI
Role Overview
We are looking for a Research Engineer to sit at the intersection of applied science and product engineering.
You will be part of a small team that evaluates new product directions before they reach our engineering org: assessing feasibility, running rapid experiments, and owning the ML foundations that power what we build.
This is not a pure research role, nor a pure engineering role: you will read a PRD, determine whether the idea is worth pursuing, prove it with code, and then stay in the room to help build it.
Your focus is Generative AI. You will own the research layer for any product direction involving foundation models, from LLM-backed features and RAG pipelines to multimodal applications and agentic systems, and you will be expected to move fast without losing engineering rigour.
The role combines rapid technical judgment, disciplined experimentation, and close collaboration with product and engineering teams in a high-ownership environment.
Key Responsibilities
Feasibility & Architecture
Receive PRDs and produce clear feasibility assessments with risk-rated recommendations
Author architecture documents that bridge research findings and engineering implementation, forming the basis for user stories and sprint planning
Collaborate with product and engineering teams to translate research outcomes into buildable specs
Rapid Experimentation
Design and execute time-boxed experiments in sandboxed environments to validate or invalidate product hypotheses quickly
Build lightweight MVPs to demonstrate technical viability before full engineering investment
Know when to stop: ruthlessly prioritise signal over polish in the exploration phase
LLM & Foundation Model Research
Select, evaluate, and adapt foundation models to product requirements, knowing when a model is good enough off the shelf and when it needs fine-tuning or replacement
Own the full experimentation lifecycle: model selection, prompt development, adaptation, and evaluation, treating each as an engineering discipline with proper versioning and reproducibility
Evaluate open-weight model families (Qwen, Llama, Mistral) against proprietary APIs (OpenAI, Anthropic, Gemini) for cost, latency, and quality trade-offs in production workloads
RAG & Knowledge Systems
Architect and prototype retrieval-augmented systems, owning the full pipeline from data ingestion and chunking to entity extraction, knowledge-graph construction, and response quality evaluation
Work with graph-based RAG architectures (GraphRAG / LightRAG) that combine vector retrieval with structured knowledge graphs for domain-specific reasoning
Agents & Orchestration
Prototype agentic systems using multi-agent frameworks and assess their reliability, failure modes, and cost profiles before recommending them for productisation
Design agent orchestration patterns including stateful multi-turn conversations, task decomposition, and tool-use pipelines
ML Infrastructure & Experiment Tracking
Set up and maintain experiment tracking infrastructure and enforce rigorous logging discipline across the team
Lead data curation efforts: sourcing, cleaning, versioning, and documenting datasets
Build reusable research infrastructure (evaluation harnesses, prompt registries, model comparison tooling, baseline suites) that speeds up iteration
Proactively identify gaps in internal infrastructure (model serving, observability for LLM calls, cost monitoring, evaluation frameworks) and surface them as prioritised proposals for the engineering roadmap
Required Qualifications
Education
MS in Computer Science, Mathematics, Physics, Statistics, or a related quantitative discipline
PhD strongly preferred
Experience & Technical Depth
Deep, hands-on experience with LLMs in production or near-production settings
Strong Python engineering skills with a genuine commitment to clean, performant, maintainable code; you apply SOLID principles in research contexts as readily as in production
Docker proficiency; our entire platform is containerised (20+ services), and you must be comfortable building, debugging, and composing multi-service Docker environments daily
Experience with open-weight model serving infrastructure (SGLang, vLLM, or TGI), including GPU memory management, numeric precision and quantisation formats (BF16 / FP8 / FP4), and inference optimisation
Solid understanding of the trade-offs between proprietary APIs (OpenAI, Anthropic, Gemini) and open-weight models (Qwen, Llama, Mistral families), including when to call an API versus self-host
Experience with vector databases and hybrid retrieval architectures (dense + sparse, vector + graph)
Awareness of responsible AI considerations including hallucination, bias, safety, and content filtering
Evaluation, Infrastructure & Communication
Experience with LLM evaluation methodology; you have used or built tools like RAGAS, DeepEval, or custom eval harnesses
Strong written communication: your architecture docs and feasibility assessments are readable by engineers and product managers alike
Comfort operating in time-boxed exploration cycles where reproducibility, logging discipline, and pragmatic decision-making matter as much as model quality
Ability to move from PRD to prototype to implementation guidance without losing engineering rigour
Nice to Have
Experience with knowledge-graph-augmented retrieval (GraphRAG / LightRAG) or entity-relation extraction pipelines; we use graph + vector hybrid RAG, not just plain vector search
Experience with multi-agent orchestration frameworks (AutoGen, LangChain, CrewAI, or custom orchestration)
TypeScript proficiency; comfort working across the stack accelerates prototyping and production handoff
Domain experience in healthcare, medical coding (ICD-10, CPT, SNOMED CT), or regulated industries
Familiarity with distributed task processing (Celery or similar) for async ML workloads
Experience with NLP tooling beyond LLMs, including entity recognition, fuzzy matching, or biomedical text processing
Exposure to differential privacy or synthetic data generation
Experience with observability for ML systems, including distributed tracing, LLM call monitoring, or cost tracking
Experience with systematic prompt optimisation frameworks (DSPy or equivalent compiler-driven approaches)
Languages
Fluent English (mandatory)
Italian is a plus
What We Offer
A central role in shaping new product directions before full engineering investment
Direct exposure to frontier foundation-model, RAG, multimodal, and agentic system work in a regulated healthcare-data environment
High ownership across research, architecture, experimentation, and production handoff
The opportunity to build reusable ML research infrastructure that compounds team velocity
A flexible, international, and mission-driven working environment
