Yannik Hesse

Research & Machine Learning Engineer

Jena, Germany

I work on reinforcement learning and agentic AI. I did my Master's at RWTH Aachen University with an exchange at ETH Zurich. Right now I'm fine-tuning AI agents for computer use at Warmwind.

Research

Paper (under review)

Learning to Search and Searching to Learn for Generalization in Planning

With Michael Aichmüller and Hector Geffner. A self-improving learning framework that combines a value heuristic (a Relational GNN) with WA*, a best-first search: the heuristic guides the search, and the search data updates the heuristic via Q-Learning. The resulting heuristics generalize zero-shot to much larger instances. For example, a heuristic trained on Blocksworld instances with fewer than 30 blocks solves instances with 488 blocks without search. Evaluated on Sokoban, PushWorld, The Witness, and the IPC 2023 benchmarks.
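As a rough illustration of the search side only, here is a minimal weighted A* sketch in Python. It is not the paper's implementation: the `heuristic` callable merely stands in for the learned R-GNN value, the toy number-line domain and all function names are made up for this example, and the Q-Learning update of the heuristic is omitted.

```python
import heapq

def weighted_a_star(start, is_goal, successors, heuristic, weight=2.0):
    """Best-first search ordered by f = g + weight * h.

    Returns the list of states from start to a goal, or None if none is found.
    `heuristic` is a placeholder for a learned value function (assumption).
    """
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(weight * heuristic(start), counter, start)]
    best_g = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt, cost in successors(state):
            g = best_g[state] + cost
            if g < best_g.get(nxt, float("inf")):
                best_g[nxt] = g
                parent[nxt] = state
                counter += 1
                heapq.heappush(frontier, (g + weight * heuristic(nxt), counter, nxt))
    return None

# Hypothetical toy domain: walk from 0 to 10 in steps of 1 or 2, unit cost each.
GOAL = 10

def toy_successors(s):
    return [(s + step, 1.0) for step in (1, 2) if s + step <= GOAL]

def toy_heuristic(s):
    return (GOAL - s) / 2  # hand-written stand-in for a learned heuristic

path = weighted_a_star(0, lambda s: s == GOAL, toy_successors, toy_heuristic)
print(path)
```

A larger `weight` makes the search greedier with respect to the heuristic, trading solution quality for fewer expansions.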

PyTorch, PyTorch Geometric, R-GNNs, WA*, Q-Learning

Master's Thesis

On Limits of GNNs for Planning in Pushworld

Combines heuristic search with a learned value function for RL in planning domains. Tested on PushWorld, a benchmark for tool use and long-horizon reasoning. Extends R-GNNs with global attention pooling, reaching performance competitive with the classical planner LAMA.

PyTorch, PyTorch Geometric, GNNs, Reinforcement Learning
Read thesis →

Bachelor's Thesis

Parallel Taping in Adjoint Algorithmic Differentiation

Speeds up adjoint AD by splitting the primal into partial functions and recording their tapes concurrently with OpenMP. Includes checkpointing and rematerialization for memory-bound settings. Achieved up to a 4x speedup on a single machine. The results were later integrated into dco/c++.

C++, CMake, OpenMP, HPC
Read thesis → View code →

Open Source

Reinforcement Learning

Contribution to Puffer.ai

Built high-speed RL environments in C and training algorithms in PyTorch for PufferLib, an open-source high-performance reinforcement learning library. My 2048 environment is now a core entry in their benchmarking stack. Also featured on zen2048.com.

C, Raylib, PyTorch
View project →

Document Processing

pdfalign

OCR-based table extraction from PDFs. Uses mean shift to align text and extract structured data from scanned and digital documents.
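To give a feel for how mean shift can align table cells, here is a minimal one-dimensional sketch in Python. It assumes OCR has already produced the left x-coordinate of each word. The coordinates, the `bandwidth` value, and the function names are invented for illustration, and pdfalign's actual implementation may work differently.

```python
def mean_shift_1d(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift on scalar values; returns sorted cluster centers."""
    modes = [float(p) for p in points]
    for _ in range(iters):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Shift each mode to the mean of the points inside its window.
            window = [p for p in points if abs(p - m) <= bandwidth]
            shifted = sum(window) / len(window)
            moved = max(moved, abs(shifted - m))
            new_modes.append(shifted)
        modes = new_modes
        if moved < tol:
            break
    # Merge modes that converged to (nearly) the same location.
    centers = []
    for m in sorted(modes):
        if not centers or m - centers[-1] > bandwidth:
            centers.append(m)
    return centers

def assign_columns(points, centers):
    """Map each x-coordinate to the index of its nearest column center."""
    return [min(range(len(centers)), key=lambda i: abs(p - centers[i])) for p in points]

# Hypothetical left edges of OCR word boxes, in pixels (made-up data).
x_left = [50, 52, 49, 200, 203, 198, 351, 349]
centers = mean_shift_1d(x_left, bandwidth=20)
columns = assign_columns(x_left, centers)
print(centers, columns)
```

Mean shift does not need the number of columns in advance, which suits tables whose layout is unknown; the bandwidth plays the role of an alignment tolerance.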

Tesseract, Mean Shift, Python, PyPI
View project →

Experience

2025 – present

ML Engineer, Warmwind

Fine-tuning agentic AI for computer use.

2025 – 2026

Researcher, ML Lab, RWTH Aachen

Research on classical planning + deep RL. Built tree-based training algorithms with PyTorch and Slurm.

2024

Data Engineer Intern, Infineon, Singapore

Built deep learning models for auditing automation. NLP and CV for document processing with LangChain, LLaMA, PyTorch.

2023 – 2024

Exchange, ETH Zurich

UNITECH exchange semester. Computer vision, deep learning, planning for autonomous robots. Course project: UDRL →

Education

2022 – 2025

M.Sc. Computer Science, RWTH Aachen

Summa cum laude. Thesis grade 1.0. Finished within the standard period of study (Regelstudienzeit). Focus on deep learning and reinforcement learning. Includes UNITECH exchange at ETH Zurich.

2019 – 2022

B.Sc. Computer Science, RWTH Aachen

Grade 1.4 (very good). Finished within the standard period of study (Regelstudienzeit). Focus on mathematics.

Skills

Machine Learning

  • Reinforcement Learning
  • Agentic AI / Computer Use
  • Deep Learning (PyTorch)
  • Fine-Tuning / RLHF
  • Computer Vision
  • NLP / LLMs

Programming

  • Python
  • C / C++
  • SQL

Tools

  • PyTorch / PyTorch Geometric
  • Slurm / HPC Clusters
  • Docker
  • Git
  • LangChain / Hugging Face

Research

  • Scientific Writing
  • LaTeX
  • Experiment Design

Games

I love logic puzzles and competitive games. Here are some I built in my free time.

Reaction

Two-player game. Remove all the orbs from your opponent's board.

Play →

Poker Probability

How good are you at judging probabilities? Play poker, but the only thing you pick is the probability of winning.

Play →

NPM Guesser

Big npm projects pull in wild dependencies. How many of these random packages with millions of downloads do you actually know?

View project →