Efficient Training and Inference of Multimodal Foundation Models
This project aims to design elastic MLLM architectures, integrate zero-cost neural architecture search (NAS), and develop novel model compression techniques that jointly optimize accuracy and resource efficiency. By combining efficient training and inference strategies such as speculative sampling, it seeks to substantially improve inference speed and resource utilization across diverse application scenarios. A minimal sketch of the speculative-sampling idea is given below.
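The following Python sketch illustrates the core accept/reject loop of speculative sampling, not the project's actual implementation: a cheap draft model proposes several tokens, and the large target model verifies them, accepting each with probability min(1, p_target/p_draft) and resampling from the residual distribution on rejection. The functions `draft_probs`, `target_probs`, and `speculative_step`, the toy vocabulary, and the toy distributions are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16  # toy vocabulary size (assumption for illustration)


def draft_probs(prefix):
    """Toy stand-in for a small, fast draft model's next-token distribution."""
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()


def target_probs(prefix):
    """Toy stand-in for the large target model's next-token distribution."""
    logits = np.sin(np.arange(VOCAB) * 1.3 + len(prefix)) + 0.1 * np.cos(np.arange(VOCAB))
    e = np.exp(logits - logits.max())
    return e / e.sum()


def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target model.

    In a real system the k target evaluations run as one batched forward pass,
    which is where the speedup comes from; here they are sequential for clarity.
    """
    # 1) Draft phase: sample k tokens from the cheap model.
    drafted, draft_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        p_draft = draft_probs(ctx)
        tok = int(rng.choice(VOCAB, p=p_draft))
        drafted.append(tok)
        draft_dists.append(p_draft)
        ctx.append(tok)

    # 2) Verification phase: accept each draft token with prob min(1, p_t/p_d).
    accepted = []
    ctx = list(prefix)
    for tok, p_draft in zip(drafted, draft_dists):
        p_target = target_probs(ctx)
        if rng.random() < min(1.0, p_target[tok] / p_draft[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual distribution and stop.
            residual = np.maximum(p_target - p_draft, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            return accepted

    # 3) All drafts accepted: emit one bonus token from the target model.
    accepted.append(int(rng.choice(VOCAB, p=target_probs(ctx))))
    return accepted


print("tokens emitted this step:", speculative_step([1, 2, 3]))
```

This accept/reject rule preserves the target model's output distribution exactly, so the draft model only affects speed, never quality, which is why speculative sampling can be layered on top of the compression and elastic-architecture techniques above.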
Related Patents
- Resource-Efficient Training for MLLMs via Adaptive Data Filtering
- Efficient Collaborative Inference for Autoregressive MLLMs
- Device and Edge Collaborative Personalized Inference for MLLMs with Heterogeneous Resources
- Efficient MLLM Training via Expanded Pipeline Stages
- Self-Driven Feedback and Symbolic Collaboration for MLLM Inference
- LoRA and MoE-Based Fine-Tuning for MLLMs
- Elastic Architecture Design and Pruning for MLLMs with Heterogeneous Experts
- Efficient MLLM Initialization via Weight Inheritance
