Yannik Hesse

Research & Machine Learning Engineer

Jena, Germany

I work on reinforcement learning and agentic AI. I did my Master's at RWTH Aachen University with an exchange at ETH Zurich. Right now I'm fine-tuning AI agents for computer use at Warmwind.

CV GitHub LinkedIn X

Research

Paper (under review)

Learning to Search and Searching to Learn for Generalization in Planning

With Michael Aichmüller and Hector Geffner. A self-improving learning framework built on weighted A* (WA*) that combines a value heuristic (Relational GNN) with best-first search: the heuristic guides the search, and the search data updates the heuristic via Q-Learning. The resulting heuristics generalize zero-shot to much larger instances, e.g. trained on Blocksworld with fewer than 30 blocks, then solving instances with 488 blocks without search. Evaluated on Sokoban, PushWorld, The Witness, and IPC 2023 benchmarks.

PyTorch PyTorch Geometric R-GNNs WA* Q-Learning
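
To make the search half of that loop concrete, here is a minimal weighted A* sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the heuristic is a hand-written Manhattan distance standing in for the learned R-GNN value function, the 5x5 grid domain and the names `weighted_a_star`, `grid_successors`, and `manhattan` are made up for this example, and the Q-Learning update that would refine the heuristic from search data is left out.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def weighted_a_star(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[State, float]]],
    heuristic: Callable[[State], float],
    weight: float = 2.0,
) -> Optional[List[State]]:
    """Best-first search ordered by f(s) = g(s) + weight * h(s)."""
    counter = 0  # tie-breaker so the heap never has to compare states
    frontier = [(weight * heuristic(start), counter, start)]
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}

    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            # Walk the parent pointers back to the start.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt, cost in successors(state):
            new_g = g[state] + cost
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = state
                counter += 1
                f = new_g + weight * heuristic(nxt)
                heapq.heappush(frontier, (f, counter, nxt))
    return None  # frontier exhausted without reaching a goal


# Hypothetical toy domain: a 5x5 grid with unit-cost moves.
GOAL = (4, 4)


def grid_successors(state):
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx <= 4 and 0 <= ny <= 4:
            yield (nx, ny), 1.0


def manhattan(state):
    # Stand-in for a learned value heuristic.
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])


path = weighted_a_star((0, 0), lambda s: s == GOAL, grid_successors, manhattan)
```

With an admissible heuristic, a `weight` above 1 trades optimality for fewer expansions: the returned path costs at most `weight` times the optimum. A learned heuristic carries no such guarantee unless it happens to be admissible.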

Master's Thesis

On Limits of GNNs for Planning in Pushworld

Combines heuristic search with a learned value function for RL in planning domains. Tested on PushWorld, a benchmark for tool use and long-horizon reasoning. Extends R-GNNs with global attention pooling, reaching performance competitive with the classical planner LAMA.

PyTorch PyTorch Geometric GNNs Reinforcement Learning

Read thesis →

Bachelor's Thesis

Parallel Taping in Adjoint Algorithmic Differentiation

Speeds up adjoint AD by splitting the primal into partial functions and recording tapes concurrently with OpenMP. Includes checkpointing and rematerialization for memory-bound settings. Achieved up to a 4x speedup on a single machine. Results were later integrated into dco/c++.

C++ CMake OpenMP HPC

Read thesis → View code →

Open Source

Reinforcement Learning

Contribution to Puffer.ai

Built high-speed RL environments in C and training algorithms in PyTorch for PufferLib, an open-source high-performance reinforcement learning library. My 2048 environment is now a core entry in their benchmarking stack. Also featured on zen2048.com.

C Raylib PyTorch

View project →

Document Processing

pdfalign

Table extraction with OCR from PDFs. Uses mean shift to align and pull structured data from scanned and digital documents.

Tesseract Mean Shift Python PyPI

View project →
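
As a rough illustration of how mean shift can align OCR output into table columns, the sketch below clusters the x-coordinates of word boxes with a flat-kernel, one-dimensional mean shift. It is not pdfalign's actual code or API: the function names `mean_shift_1d` and `assign_columns`, the bandwidth value, and the sample coordinates are all assumptions made for this example.

```python
from typing import List, Sequence


def mean_shift_1d(
    points: Sequence[float], bandwidth: float, iters: int = 50, tol: float = 1e-6
) -> List[float]:
    """Flat-kernel mean shift: move each estimate to the mean of the input
    points within `bandwidth` of it, until nothing moves more than `tol`.
    Returns the converged mode for each input point."""
    modes = [float(p) for p in points]
    for _ in range(iters):
        moved = 0.0
        new_modes = []
        for m in modes:
            neighbours = [p for p in points if abs(p - m) <= bandwidth]
            centre = sum(neighbours) / len(neighbours)
            moved = max(moved, abs(centre - m))
            new_modes.append(centre)
        modes = new_modes
        if moved < tol:
            break
    return modes


def assign_columns(x_positions: Sequence[float], bandwidth: float) -> List[int]:
    """Label each x-position with a column index, numbered left to right."""
    modes = mean_shift_1d(x_positions, bandwidth)
    centres: List[float] = []
    labels: List[int] = []
    for m in modes:
        # Merge modes that converged to (almost) the same place.
        for i, c in enumerate(centres):
            if abs(c - m) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    # Renumber clusters so that column 0 is the leftmost one.
    order = sorted(range(len(centres)), key=lambda i: centres[i])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[label] for label in labels]


# Hypothetical left edges (in pixels) of word boxes returned by an OCR engine.
left_edges = [50, 52, 49, 200, 203, 198, 351, 349]
columns = assign_columns(left_edges, bandwidth=20)
```

Mean shift suits this kind of problem because the number of columns does not have to be known in advance. Only the bandwidth needs choosing, and it should sit between the typical OCR jitter within a column and the gap between neighbouring columns.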

Experience

2025 – present

ML Engineer, Warmwind

Fine-tuning agentic AI for computer use.

2025 – 2026

Researcher, ML Lab, RWTH Aachen

Research on classical planning + deep RL. Built tree-based training algorithms with PyTorch and Slurm.

2024

Data Engineer Intern, Infineon, Singapore

Built deep learning models for auditing automation. NLP and CV for document processing with LangChain, LLaMA, PyTorch.

2023 – 2024

Exchange, ETH Zurich

UNITECH exchange semester. Computer vision, deep learning, planning for autonomous robots. Course project: UDRL →

Education

2022 – 2025

M.Sc. Computer Science, RWTH Aachen

Summa cum laude. Thesis grade 1.0. Finished in Regelstudienzeit. Focus on deep learning and reinforcement learning. Includes UNITECH exchange at ETH Zurich.

2019 – 2022

B.Sc. Computer Science, RWTH Aachen

Grade 1.4 (very good). Finished in Regelstudienzeit. Focus on mathematics.

Skills

Machine Learning

Reinforcement Learning
Agentic AI / Computer Use
Deep Learning (PyTorch)
Fine-Tuning / RLHF
Computer Vision
NLP / LLMs

Programming

Python
C / C++
SQL

Tools

PyTorch / PyTorch Geometric
Slurm / HPC Clusters
Docker
Git
LangChain / Hugging Face

Research

Scientific Writing
LaTeX
Experiment Design

Games

I love logic puzzles and competitive games. Here are some I built in my free time.

Reaction

Two-player game. Remove all the orbs from your opponent's board.

Play →

Poker Probability

How good are you at judging probabilities? Play poker, but your only move is to estimate the winning probability.

Play →

NPM Guesser

Big npm projects pull in wild dependencies. How many of these random packages with millions of downloads do you actually know?

View project →