Megatron-LM (2)
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Megatron-LM (1)
Resources about distributed training with Megatron-LM
Github: https://github.com/NVIDIA/Megatron-LM
NeMo documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
NeMo is a cloud-native generative AI framework built on top of Megatron-LM.
Overview of Megatron-Core: https://docs.nvidia.com/megatron-core/developer-guide/latest/index.html
Official APIs with formal product support…
Megatron-LM is based primarily on the following three papers. Let's take some notes on them.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
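The core idea of this paper is tensor (model) parallelism inside each Transformer block: the first MLP weight matrix is split column-wise across GPUs and the second row-wise, so the nonlinearity can be applied locally and only one all-reduce is needed to recover the output. A minimal NumPy sketch of that split, simulating two "GPUs" in a single process (names and shapes are illustrative, not Megatron-LM APIs):

```python
import numpy as np

# Toy dimensions (illustrative only).
batch, d_model, d_ff = 4, 8, 16
rng = np.random.default_rng(0)

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Reference: unpartitioned forward pass.
Y_ref = gelu(X @ A) @ B

# Tensor parallelism over 2 simulated GPUs:
# split A column-wise and B row-wise, so GELU is applied locally on each shard.
A1, A2 = np.split(A, 2, axis=1)
B1, B2 = np.split(B, 2, axis=0)

Y1 = gelu(X @ A1) @ B1   # partial result on "GPU 0"
Y2 = gelu(X @ A2) @ B2   # partial result on "GPU 1"

# One all-reduce (here simply a sum) recovers the full output.
Y_tp = Y1 + Y2
assert np.allclose(Y_ref, Y_tp)
```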
Video Understanding Overview (Part 2)
Video: https://www.youtube.com/watch?v=J2YC0-k57NM
C3D
Uses 3D convolutions; conceptually easy to understand. It circumvents optical flow, which is time-consuming to compute, but 3D convolutions themselves are also computationally heavy.
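To illustrate why 3D convolution is heavy: the kernel also spans the temporal axis, so both parameters and FLOPs grow relative to a 2-D convolution, and the whole clip must be processed at once. A minimal PyTorch sketch (shapes are illustrative, not the exact C3D architecture):

```python
import torch
import torch.nn as nn

# A single 3x3x3 convolution over a video clip, as in C3D-style models.
# Input layout: (batch, channels, frames, height, width).
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=(3, 3, 3), padding=1)

clip = torch.randn(1, 3, 16, 112, 112)   # 16 RGB frames of 112x112
features = conv3d(clip)
print(features.shape)   # torch.Size([1, 64, 16, 112, 112])

# Compare parameter count with a 2-D counterpart on a single frame.
conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv3d.parameters()))  # 5248 weights (~3x the 2-D layer)
print(sum(p.numel() for p in conv2d.parameters()))  # 1792 weights
```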
BLIP-2 - Querying Transformer (Q-Former)
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
DPO - Direct Preference Optimization
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Tutorial: https://www.youtube.com/watch?v=hvGa5Mba4c8
Paper: https://arxiv.org/abs/2305.18290
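The key point of the paper: instead of fitting a separate reward model and running RL, preference pairs are optimized directly with a logistic loss on the log-probability ratios between the trainable policy and a frozen reference model. A minimal PyTorch sketch of that loss, assuming per-sequence log-probabilities are already summed (variable names are my own, not from the paper's code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities
    log pi(y|x) for the chosen / rejected responses under the trainable
    policy and the frozen reference model.
    """
    # Implicit "reward" of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Push the chosen response above the rejected one via a sigmoid loss.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean()

# Toy usage with random log-probabilities.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```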
Mamba
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Quantization
Tutorial: https://www.youtube.com/watch?v=0VdNflU08yA
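For reference, the basic building block behind most quantization schemes is affine (scale + zero-point) quantization of a float tensor to int8 and back. A minimal NumPy sketch of asymmetric per-tensor quantization (details differ across methods; this is only the common idea, not any specific library's API):

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric per-tensor affine quantization: x ~ scale * (q - zero_point)."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale).astype(np.int32)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)
print("max abs error:", np.abs(x - x_hat).max())   # roughly scale / 2
```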