Academic Job Offers
Would you like to join our team as a PhD student or long-term full staff member? We encourage strong students to send an application (including a motivation letter, CV, etc.) to Prof. Stiefelhagen (rainer stiefelhagen∂kit edu).
Please note that we do not offer summer internships.
HiWi Job Offers
Student Assistant — Scientific Linux Infrastructure
Support the administration of our Linux-based research servers, identity management, and virtualization infrastructure. This is not first-level IT support — the focus is on scientific computing, working alongside the responsible staff.
Tasks
GPU server setup and integration; administration of the Linux domain, Kerberos, and LDAP; work on LXC containers and KVM/QEMU VMs, including migration from LXC to VM-based setups.
Occasional workstation setup and minor hardware work on research systems (e.g. robotics). Almost no routine office IT.
Required skills
Solid Linux experience and a strong interest in server administration. Useful: shell scripting, networking basics, Kerberos/LDAP, LXC, KVM/QEMU.
What you gain
Hands-on experience with research-grade Linux infrastructure: server administration, virtualization, networking, identity management, and scientific computing platforms (+ money).
Interested? Contact CVHCI Admins. Online since April 2026.
HiWi for the SMART AGE project [pdf] (online since September 2021)
HiWi for Mobility Assistance Systems (iOS User Interface Development) [pdf] (online since February 2025)
HiWi for Mobility Assistance Systems (Hardware Integration) [pdf] (online since February 2025)
Bachelor/Master Theses
Vision in Robotics
Perception and action for robotic agents — vision–language–action models, world action models, object segmentation, grasp planning.
| Topic | Level | Supervisor | Online since | Description |
|---|---|---|---|---|
| Various topics related to Vision–Language–Action (VLA) Models, World Action Models (WAM), Robotics Simulation (Isaac Sim), object detection, segmentation, grasp planning, etc. Open supervision across robotic perception and action learning. Multiple concrete directions available; contact David Schneider to shape a topic that fits your background. | MA | D. Schneider | April 2026 | Contact |
| Cross-image visual prompting for open-vocabulary detection with SAM-3: Transfer visual prompts across images to enable open-vocabulary object detection. | MA | D. Schneider | April 2026 | |
More in this area: further topics in robotic perception, embodied learning, and manipulation — contact David Schneider.
Human motion, label noise, domain generalization & GenAI
Activity recognition, muscle activation estimation, synthetic data generation, domain adaptation and generalization.
| Topic | Level | Supervisor | Online since | Description |
|---|---|---|---|---|
| A robust approach towards imperfect scribble semantic segmentation (available): Label-noise-tolerant learning from scribble annotations for semantic segmentation. | MA | Kunyu Peng | December 2024 | |
| Referring object tracking in long video (available): Language-guided object tracking across long-horizon video sequences. | MA | Kunyu Peng | August 2024 | |
| LLMs for human muscle activation interpretation / estimation: Use large language models to interpret and estimate muscle activation patterns from motion data. | MA | D. Schneider | April 2026 | |
More in this area: further topics on deep learning for human activity understanding are available — contact David Schneider or Kunyu Peng.
Document analysis & Vision–Large Language Models
Layout analysis, retrieval-augmented generation, anomaly detection, and unified representations for complex documents.
| Topic | Level | Supervisor | Online since | Description |
|---|---|---|---|---|
| Document anomaly detection: Detect anomalous regions, artifacts, or forgeries in complex document layouts. | MA | O. Moured · Y. Chen | February 2025 | |
| Multilingual few-shot document layout analysis: Few-shot layout analysis that generalizes across scripts and languages. | MA | O. Moured · Y. Chen | February 2025 | |
| Vision-based long-document information retrieval: Retrieve information from long documents end-to-end with vision–language models. | MA | O. Moured · Y. Chen | February 2025 | |
| Unified document representation: A single representation spanning text, layout, and visual elements of documents. | MA | O. Moured · Y. Chen | March 2025 | |
More in this area: further theses on document analysis, generation, and VLLMs with RAG — contact O. Moured or Yufan Chen.
Medical computer vision
Surgical video understanding, interactive segmentation, and learning under generalization, adaptation and data-scarcity constraints.
| Topic | Level | Supervisor | Online since | Description |
|---|---|---|---|---|
| Benchmarking hallucinations in multi-turn surgical video dialogue (new): Evaluate VLLM hallucinations in multi-turn dialogue grounded on surgical video. | MA | K. Peng · J. Wei | December 2025 | |
More in this area: interactive segmentation for medical image analysis — contact Z. Marinov. Medical CV under generalization, adaptation and data-scarcity — contact Simon Reiß.
Scene segmentation & understanding
Open-ended scene understanding, vision–language, and embodied AI.
| Topic | Level | Supervisor | Online since | Description |
|---|---|---|---|---|
| Computer vision for real-world scene understanding: Open-ended scene understanding beyond closed-vocabulary benchmarks. | MA | J. Zhang | February 2025 | |
More in this area: vision-and-language and embodied AI — contact J. Zheng.
Other topics
Visual in-context learning, few-shot, self-/semi-supervised and data-centric learning — contact Simon Reiß.
Your own proposal
Don’t see a matching topic? We welcome your own proposals in any of the areas above. Please reach out to the supervisor whose group best matches your interests.
Additional open positions across KIT are listed at ACCESS∂KIT.
