Yannik Hesse
Research & Machine Learning Engineer
Jena, Germany
I work on reinforcement learning and agentic AI. I did my Master's at RWTH Aachen University with an exchange at ETH Zurich. Right now I'm fine-tuning AI agents for computer use at Warmwind.
Research
Learning to Search and Searching to Learn for Generalization in Planning
With Michael Aichmüller and Hector Geffner. A self-improving learning framework built on weighted A* (WA*) that combines a value heuristic (a Relational GNN) with best-first search: the heuristic guides the search, and the search data updates the heuristic via Q-learning. The resulting heuristics generalize zero-shot to much larger instances: trained on Blocksworld with fewer than 30 blocks, they solve instances with 488 blocks without search. Evaluated on Sokoban, PushWorld, The Witness, and the IPC 2023 benchmarks.
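To illustrate the search half of that loop, here is a minimal sketch of weighted A* with a pluggable heuristic. It is not the paper's implementation: the grid domain, the Manhattan-distance heuristic (standing in for the learned Relational GNN value), and the names `weighted_a_star`, `successors`, and `weight` are assumptions made for the example. This variant does not reopen already expanded states.

```python
import heapq

def weighted_a_star(start, is_goal, successors, heuristic, weight=2.0):
    """Best-first search on f = g + weight * h. Returns a path or None."""
    counter = 0  # tie-breaker so the heap never has to compare states
    frontier = [(weight * heuristic(start), counter, start)]
    g = {start: 0.0}          # cheapest known cost to reach each state
    parent = {start: None}    # back-pointers for path reconstruction
    closed = set()
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if state in closed:
            continue
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        closed.add(state)
        for nxt, cost in successors(state):
            if nxt in closed:
                continue  # no reopening in this simple variant
            new_g = g[state] + cost
            if nxt not in g or new_g < g[nxt]:
                g[nxt] = new_g
                parent[nxt] = state
                counter += 1
                f = new_g + weight * heuristic(nxt)
                heapq.heappush(frontier, (f, counter, nxt))
    return None

# Toy domain (hypothetical): move on a 5x5 grid from (0, 0) to (4, 4).
SIZE, GOAL = 5, (4, 4)

def grid_successors(state):
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield (nx, ny), 1.0

def manhattan(state):
    # Stand-in for a learned value heuristic.
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])

path = weighted_a_star((0, 0), lambda s: s == GOAL, grid_successors, manhattan)
```

In a learning setup, `manhattan` would be replaced by the trained model's estimate, and the states expanded during search would serve as training data for the next update.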
On Limits of GNNs for Planning in Pushworld
Combines heuristic search with a learned value function for RL in planning domains. Tested on PushWorld, a benchmark for tool use and long-horizon reasoning. Extends R-GNNs with global attention pooling, reaching performance competitive with the classical planner LAMA.
Parallel Taping in Adjoint Algorithmic Differentiation
Speeds up adjoint AD by splitting the primal into partial functions and recording tapes concurrently with OpenMP. Includes checkpointing and rematerialization for memory-bound settings. Achieved up to a 4x speedup on a single machine. Results were later integrated into dco/c++.
Open Source
Contribution to Puffer.ai
Built high-speed RL environments in C and training algorithms in PyTorch for PufferLib, an open-source high-performance reinforcement learning library. My 2048 environment is now a core entry in their benchmarking stack. Also featured on zen2048.com.
pdfalign
Table extraction with OCR from PDFs. Uses mean shift to align and pull structured data from scanned and digital documents.
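As a rough illustration of how mean shift can align table columns, the sketch below clusters the x-coordinates of text boxes with a flat-kernel, one-dimensional mean shift. It is not pdfalign's actual code: the function name `mean_shift_1d`, the `bandwidth` parameter, and the sample coordinates are assumptions made for the example.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift in 1D. Returns (centres, labels)."""
    if not points:
        return [], []
    modes = [float(p) for p in points]
    for _ in range(max_iter):
        shifted = []
        for m in modes:
            # Move each mode to the mean of the points inside its window.
            neighbours = [p for p in points if abs(p - m) <= bandwidth]
            shifted.append(sum(neighbours) / len(neighbours))
        converged = max(abs(a - b) for a, b in zip(modes, shifted)) < tol
        modes = shifted
        if converged:
            break
    # Merge modes that ended up close together into one cluster centre.
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if abs(m - c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels

# Hypothetical left edges (in pixels) of OCR word boxes from three columns.
xs = [50, 52, 51, 200, 198, 203, 350, 352]
centres, labels = mean_shift_1d(xs, bandwidth=20)
# Snap every box to its column centre to align the table.
aligned = [centres[label] for label in labels]
```

Mean shift fits this kind of problem because the number of columns does not have to be known in advance; only the window size `bandwidth` needs to be chosen.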
Experience
Researcher, ML Lab, RWTH Aachen
Research on classical planning + deep RL. Built tree-based training algorithms with PyTorch and Slurm.
Data Engineer Intern, Infineon, Singapore
Built deep learning models for auditing automation. NLP and CV for document processing with LangChain, LLaMA, PyTorch.
Exchange, ETH Zurich
UNITECH exchange semester. Computer vision, deep learning, planning for autonomous robots. Course project: UDRL.
Education
M.Sc. Computer Science, RWTH Aachen
Summa cum laude. Thesis grade 1.0. Finished within the standard period of study (Regelstudienzeit). Focus on deep learning and reinforcement learning. Includes UNITECH exchange at ETH Zurich.
B.Sc. Computer Science, RWTH Aachen
Grade 1.4 (very good). Finished within the standard period of study (Regelstudienzeit). Focus on mathematics.
Skills
Machine Learning
- Reinforcement Learning
- Agentic AI / Computer Use
- Deep Learning (PyTorch)
- Fine-Tuning / RLHF
- Computer Vision
- NLP / LLMs
Programming
- Python
- C / C++
- SQL
Tools
- PyTorch / PyTorch Geometric
- Slurm / HPC Clusters
- Docker
- Git
- LangChain / Hugging Face
Research
- Scientific Writing
- LaTeX
- Experiment Design
Games
I love logic puzzles and competitive games. Here are some I built in my free time.
Poker Probability
How good are you at judging probabilities? Play poker, but all you do is estimate the winning probability.
Play →
NPM Guesser
Big npm projects pull in wild dependencies. How many of these random packages with millions of downloads do you actually know?
View project →