Professor Catherine Zhao

CSENG Computer Science & Eng
College of Science & Engineering
Twin Cities

Project Title:

Deep Learning With Large Vision Language Models

This team focuses on developing trustworthy AI systems for multiple applications. Specifically, their current research builds on a variety of multi-modal tasks (i.e., visual attention, knowledge, reasoning, generation, planning) with large foundation models (e.g., LLM: LLaMA, PaLM; VLM: LLaVA; AIGC: ControlNet, SD). This work covers the preprocessing of large-scale real and simulated datasets (images, videos) and the training and fine-tuning of LLMs, VLMs, and AIGC models, which requires substantial GPU computing resources. For instance, fine-tuning a LLaMA model with 7B parameters usually requires more than 90 GB of VRAM (i.e., 4-8 A100 GPUs) and high data throughput. Besides training these models, the researchers also need to use generative models to create large-scale simulated data and then optimize the LLMs, VLMs, and AIGC models with that data.
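As a rough illustration of where a figure above 90 GB can come from, the sketch below estimates the memory taken by model state alone when fully fine-tuning a 7B-parameter model. It assumes mixed-precision training with the Adam optimizer, a common rule of thumb of about 16 bytes per parameter; this setup is an assumption for illustration and is not stated in the project description. Activations, batch size, and framework overhead are excluded, so actual usage would be higher.

```python
# Back-of-the-envelope estimate of GPU memory for full fine-tuning.
# Assumption (not from the project description): mixed-precision training
# with Adam, which keeps roughly 16 bytes of state per parameter.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def model_state_gb(num_params: float) -> float:
    """Memory for weights, gradients, and optimizer state, in GB (10^9 bytes)."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


if __name__ == "__main__":
    estimate = model_state_gb(7e9)  # 7B parameters
    print(f"Model state for 7B parameters: about {estimate:.0f} GB")
```

Under these assumptions the model state alone comes to about 112 GB, which already exceeds a single A100 and has to be sharded across several GPUs before any activation memory is counted.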
