


As an intern (or Master Thesis student) in the Industrial Inspection group, you will investigate the use of Vision-Language-Action models (VLAs, see references below) in industrial tasks. These recent models claim to directly generate robot commands to perform simple tasks prompted textually by a user. However, it remains unclear to what extent these models can be reliably deployed in industrial settings, where robustness and accuracy are paramount. The goal of this project is to evaluate the ability of VLA models to execute industry-level tasks featuring more complex environments and more specialised objects that are less common in the internet data these models are trained on. This evaluation will include benchmarking existing models and fine-tuning them for specific industrial tasks. We aim to publish the results of this project at a leading conference in Machine Learning or Robotics and to develop an internal demonstration of the solution.

Starting date: June 1st 2026, for 6 months

Your responsibilities
* Review the literature on VLA models
* Implement existing VLAs on our two-arm robotic setup
* Develop a challenging yet realistic benchmark to test existing VLAs
* Enhance current VLAs, either by fine-tuning them with industry-specific data or by contributing methodological advancements
* Summarise the contributions and findings in a paper intended for submission to a top-tier conference
* Create a fully operational demonstration setup that can be presented to customers and stakeholders