DeepSeek Series - VLM
Paper list
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
- JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
- Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
General information
DeepSeek-VL and DeepSeek-VL2 are multimodal understanding models: they take images and text as input and produce text, with no visual generation.
The Janus series are unified models that generate both text (for multimodal understanding) and images. Janus and Janus-Pro generate images autoregressively, predicting discrete image tokens one at a time, while JanusFlow uses rectified flow, which, like diffusion models, iteratively refines the output from noise into an image.
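To make the contrast between the two generation styles concrete, here is a toy sketch in Python. Everything in it is hypothetical and only illustrates the two sampling loops, not the actual Janus or JanusFlow code: `toy_next_token_probs` stands in for the language model over an assumed 4-entry image-token codebook, and `toy_velocity` stands in for a trained velocity network, replaced here by the exact straight-line velocity toward one fixed target vector.

```python
import random

# --- Autoregressive generation (Janus / Janus-Pro style, heavily simplified) ---
CODEBOOK_SIZE = 4  # hypothetical size of a tiny discrete image-token codebook


def toy_next_token_probs(prefix):
    # Hypothetical stand-in for the language model: given the tokens generated
    # so far, favour the token that follows the last one (cyclically).
    last = prefix[-1] if prefix else 0
    favoured = (last + 1) % CODEBOOK_SIZE
    return [0.7 if i == favoured else 0.1 for i in range(CODEBOOK_SIZE)]


def generate_autoregressive(num_tokens):
    tokens = []
    for _ in range(num_tokens):
        probs = toy_next_token_probs(tokens)
        # Greedy decoding: append the most probable token, then condition on it.
        tokens.append(max(range(CODEBOOK_SIZE), key=lambda i: probs[i]))
    return tokens


# --- Rectified-flow generation (JanusFlow style, heavily simplified) ---
def toy_velocity(x, t, target):
    # Hypothetical stand-in for a trained velocity network v(x, t).
    # With a single known target, the straight-line velocity is (target - x) / (1 - t).
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def generate_rectified_flow(target, num_steps=10, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise at t = 0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # Euler step along dx/dt = v(x, t)
    return x


if __name__ == "__main__":
    print(generate_autoregressive(5))  # [1, 2, 3, 0, 1]
    print(generate_rectified_flow([0.5, -0.25]))  # approximately [0.5, -0.25]
```

The autoregressive loop emits one discrete token per step, each conditioned on the previous ones, whereas the rectified-flow loop updates the whole continuous sample at every step, moving it from noise toward the image. In a real model the velocity is predicted by a neural network rather than computed from a known target.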