Reinforced Exploits, Not Optimized Rewards
run_tests() function instead…
MSc CS @ ETH Zürich · MATS 10.0 · AI Alignment
MSc Computer Science student at ETH Zürich and MATS 10.0 scholar. I work on empirical AI alignment: midtraining interventions, generalization, RL, and reward hacking.
Incoming Scholar, Technical Track. Working on extending Model Spec midtraining.
Independent empirical AI safety research on reward hacking in code models: when it emerges, whether it generalizes, and whether it can be detected.
“Anthropomorphic Misalignment Research Needs Stronger Evidence” (spotlight + oral).