Thesis Application

I am highly interested in completing my Bachelor’s thesis at the Chair for Machine Learning for Computer Vision and would like to discuss a potential thesis topic with you at an early stage. The thesis will be conducted in cooperation with PIT-CUP GmbH, where I am currently employed as a working student. My work there focuses on enabling state-of-the-art Vision-Language Models (VLMs) to run completely offline on mobile devices. I have developed a working prototype in .NET MAUI that performs complex defect detection (e.g., analyzing cracks or rust) without any cloud connectivity.

Unlike standard mobile deployments, the system I have engineered implements several advanced optimization techniques to overcome the hardware limitations of mobile processors (NPU/CPU). Specifically, I have implemented a static Key-Value (KV) caching mechanism and a split-architecture inference pipeline using ONNX Runtime. These modifications allow large transformer models to run with a fixed memory footprint and a zero-allocation generation loop, avoiding the garbage-collection latency issues typical of .NET environments. Illustrative sketches of both mechanisms are attached in the appendix at the end of this letter.

For my Bachelor’s thesis, I would like to evaluate this architecture scientifically. My goal is to quantify the performance impact of static KV-caching versus dynamic allocation on mobile hardware, analyze the memory trade-offs of component splitting (Vision Encoder vs. Text Decoder), and validate that LoRA-merged quantization retains sufficient accuracy for industrial inspection tasks.

Proposed Thesis Topic

Optimization and Evaluation of Static KV-Caching and Split-Architecture Pipelines for Offline Vision-Language Models on Mobile Devices

Topic Description

This thesis investigates the architectural challenges of deploying large Vision-Language Models on resource-constrained mobile hardware without internet connectivity. A custom .NET MAUI prototype serves as the experimental platform. The core scientific contribution lies in the comparative analysis of inference strategies. Specifically, the thesis will evaluate:

- Static KV-Caching: the performance impact of pre-allocating transformer attention buffers to eliminate runtime memory-allocation overhead on mobile CPUs.
- Architecture Splitting: the memory-bandwidth benefits of separating the Vision Encoder and Text Decoder into independently loaded components, allowing strict memory management during the inference cycle.
- Quantization Robustness: the trade-offs between execution speed (via INT8/FP16 quantization) and accuracy on domain-specific tasks (defect classification) when using LoRA-merged models.

The work aims to provide an empirically validated reference architecture for deploying complex multi-modal AI models at the edge.
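
Appendix: Illustrative Code Sketches

The two C# sketches below illustrate the mechanisms described above. They are minimal examples, not the production implementation: the model files, tensor names (input_ids, logits, past_key_values.*, present.*), and all dimensions are hypothetical placeholders, and a real decoder export would additionally require position and attention-mask inputs, which are omitted for brevity. The first sketch shows static KV-cache pre-allocation using ONNX Runtime's OrtValue/IoBinding API (available in the C# package from roughly version 1.16): every buffer is created once before decoding, bound for both the "past" input and the "present" output, and updated in place, so the generation loop performs no managed allocations and the .NET garbage collector stays idle during decoding.

using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;

// Hypothetical model dimensions; real values come from the exported decoder.
const int NumLayers = 24, NumHeads = 16, HeadDim = 64, MaxSeqLen = 1024, VocabSize = 32000;
long[] cacheShape = { 1, NumHeads, MaxSeqLen, HeadDim };

using var session = new InferenceSession("decoder_static_kv.onnx");
using var binding = session.CreateIoBinding();
var keepAlive = new List<OrtValue>(); // keep the OrtValues alive while bound

// Pre-allocate every cache buffer exactly once, before generation starts.
for (int layer = 0; layer < NumLayers; layer++)
{
    foreach (var kind in new[] { "key", "value" })
    {
        var buffer = new float[NumHeads * MaxSeqLen * HeadDim]; // allocated once
        var value = OrtValue.CreateTensorValueFromMemory(buffer, cacheShape);
        keepAlive.Add(value);
        binding.BindInput($"past_key_values.{layer}.{kind}", value);
        // Binding the matching output to the same buffer updates the cache in
        // place; if the export forbids aliasing, two pre-allocated buffers can
        // be swapped per step instead.
        binding.BindOutput($"present.{layer}.{kind}", value);
    }
}

// Token input and logits output are also bound to fixed buffers, so the
// decode loop below allocates nothing on the managed heap.
var inputIds = new long[1];
var logits = new float[VocabSize];
var inputIdsValue = OrtValue.CreateTensorValueFromMemory(inputIds, new long[] { 1, 1 });
var logitsValue = OrtValue.CreateTensorValueFromMemory(logits, new long[] { 1, 1, VocabSize });
keepAlive.Add(inputIdsValue);
keepAlive.Add(logitsValue);
binding.BindInput("input_ids", inputIdsValue);
binding.BindOutput("logits", logitsValue);

using var runOptions = new RunOptions();
for (int step = 0; step < 32; step++)        // zero-allocation generation loop
{
    session.RunWithBinding(runOptions, binding);
    inputIds[0] = ArgMax(logits);            // greedy decoding, written in place
}

static long ArgMax(float[] x)
{
    int best = 0;
    for (int i = 1; i < x.Length; i++) if (x[i] > x[best]) best = i;
    return best;
}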
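
The second sketch outlines the split-architecture pipeline: the Vision Encoder and the Text Decoder are loaded as separate InferenceSession instances, and the encoder is disposed before the decoder is created, so the two weight sets are never resident simultaneously; this is what bounds the peak memory footprint on the device. Again, the file names, the input name pixel_values, and the image resolution are assumptions chosen for illustration.

using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Placeholder input; the real pipeline fills this tensor from the camera image.
var pixelValues = new DenseTensor<float>(new[] { 1, 3, 336, 336 });

// 1. Encode the image; the encoder session (and its weights) is freed on return.
float[] imageEmbeddings = RunVisionEncoder(pixelValues);

// 2. Only now load the decoder, so encoder and decoder weights never coexist
//    in memory during the inference cycle.
using var decoder = new InferenceSession("text_decoder.onnx");
// ... autoregressive generation over imageEmbeddings as the prompt prefix,
//     e.g. with the static KV-cache loop from the first sketch ...

static float[] RunVisionEncoder(DenseTensor<float> pixelValues)
{
    // Scoped session: the encoder's native memory is released on dispose,
    // before the decoder is ever loaded.
    using var encoder = new InferenceSession("vision_encoder.onnx");
    var inputs = new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor("pixel_values", pixelValues)
    };
    using var results = encoder.Run(inputs);
    return results.First().AsEnumerable<float>().ToArray();
}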