Filip Nový

AI/ML Engineer — Retrieval, GraphRAG, and Applied Machine Learning

MSc Data Science, University of Southern Denmark (June 2026)
Prague, Czech Republic · EU Citizen

About

I build retrieval-augmented generation systems that actually work in production. My master's thesis, a hybrid GraphRAG chatbot over 879 scientific papers, combined a Neo4j knowledge graph, dense vector search, and multi-query reformulation into a deployed application (algaebot.filipnovy.dk). A paper isolating the contribution of each pipeline component is under review at KONVENS 2026.

Previously I spent 1.5 years at T-Mobile Czech Republic engineering SQL data pipelines and shipping Power BI dashboards adopted by regional sales teams. I'm looking for AI/ML engineering roles where rigorous evaluation matters — not just building demos, but measuring whether they work.

Skills

Languages

Python (PyTorch, LangChain, FastAPI, Pydantic), SQL, R, Java basics

ML & NLP

Transformers, dense embeddings, cross-encoder reranking, RAG / GraphRAG, vector DBs (ChromaDB), knowledge graphs (Neo4j)

LLM & Evaluation

OpenAI / DeepSeek APIs, prompt engineering, RAGAS, custom eval frameworks, structured output (instructor), agentic workflows

Tools & MLOps

Git, Docker, Streamlit, Power BI, Plotly, DigitalOcean

Selected Projects

AlgaeBot — Hybrid GraphRAG for Scientific Literature

Master's Thesis · Neo4j · ChromaDB · LangChain · BGE · RAGAS · Streamlit

Production RAG system over 879 peer-reviewed PDFs. Full pipeline: PDF extraction with OCR fallback, recursive chunking (~32k chunks), knowledge graph construction (~60k entities, 100k+ edges), two-stage summarization, multi-query retrieval with cross-encoder reranking. Factorial ablation study isolating graph expansion and community summaries with statistical validation. +4.8% faithfulness over baseline. Paper under review at KONVENS 2026.

[Figures: AlgaeBot system architecture; Neo4j knowledge graph visualization]

Deep Learning Portfolio

PyTorch · TensorFlow

From-scratch implementations: CNNs, RNNs/LSTMs, autoencoders, VAEs, GANs, Transformers, plus backpropagation and SGD variants. Built across coursework and self-study.

Vision Transformer for Plant Disease Classification

PyTorch · 87K images · 38 classes · 99.77% accuracy

Fine-tuned ViT with custom attention heads. Highest grade (12/12) on oral exam.

Quantitative Trading System

Python · Flask · Survival Analysis · Linear Programming

End-to-end trading advisor: Weibull survival model for order-fill probability, linear program for capital allocation. Ranked top 0.1% of 460,000+ accounts by actively traded capital.

Current rank: top 503 traders globally.

[Figure: Trading system dashboard]

Experience

Apr 2023 – Aug 2024

Data Analyst Trainee · T-Mobile Czech Republic

Engineered SQL pipelines against Oracle and MS SQL to consolidate revenue data. Designed Power BI dashboards adopted by regional sales teams across the Czech Republic. Translated between business stakeholders and the analytics team.

2025

Student Tutor · SDU Kolding

Mentored incoming MSc students across 30+ nationalities. Recognized in written recommendation by Programme Coordinator.

Education

2024 – 2026

MSc Data Science · University of Southern Denmark, Kolding

Thesis: Domain-Specific Chatbot for Algae Research — A Hybrid GraphRAG Approach. Supervisor: Tariq Yousef. Coursework (grade 12): Deep Learning, Applied ML, NLP, Data Visualization.

2020 – 2024

BSc Applied Computer Science · Prague University of Economics (VŠE)

Thesis: An Analysis of the Current State of Research on Artificial General Intelligence (grade A). Erasmus exchange in Poland.

Publications

Not All Context Helps: Isolating Graph Expansion and Community Summaries in Scientific RAG

Under review at KONVENS 2026

Controlled factorial ablation evaluating retrieval components over 879 scientific papers. Identified super-additive interaction between graph expansion and community summaries. Optimized configuration achieves +4.8% faithfulness.