Megatron-LM (2)
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Megatron-LM (1)
Resources about distributed training with Megatron-LM
Github: https://github.com/NVIDIA/Megatron-LM
NeMo documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
NeMo is a cloud-native generative AI framework built on top of Megatron-LM.
Overview of Megatron-Core: https://docs.nvidia.com/megatron-core/developer-guide/latest/index.html
Official APIs with formal product support…
Megatron-LM is based primarily on the following three papers. Let's take some notes on them.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
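The core idea of this paper is tensor (model) parallelism inside each Transformer block: the first MLP weight matrix is split column-wise across GPUs and the second row-wise, so the nonlinearity can be applied locally and only one all-reduce is needed to recover the output. A minimal NumPy sketch of that split, simulating two "GPUs" in a single process (names and shapes are illustrative, not Megatron-LM APIs):

```python
import numpy as np

# Toy dimensions (illustrative only).
batch, d_model, d_ff = 4, 8, 16
rng = np.random.default_rng(0)

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Reference: unpartitioned forward pass.
Y_ref = gelu(X @ A) @ B

# Tensor parallelism over 2 simulated GPUs:
# split A column-wise and B row-wise, so GELU is applied locally on each shard.
A1, A2 = np.split(A, 2, axis=1)
B1, B2 = np.split(B, 2, axis=0)

Y1 = gelu(X @ A1) @ B1   # partial result on "GPU 0"
Y2 = gelu(X @ A2) @ B2   # partial result on "GPU 1"

# One all-reduce (here simply a sum) recovers the full output.
Y_tp = Y1 + Y2
assert np.allclose(Y_ref, Y_tp)
```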
Video Understanding Overview (Part 2)
Video: https://www.youtube.com/watch?v=J2YC0-k57NM
C3D
Uses 3D convolutions; conceptually easy to understand. It circumvents optical flow, which is time-consuming to compute, but 3D convolutions themselves are also computationally heavy.
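To illustrate why 3D convolution is heavy: the kernel also spans the temporal axis, so both parameters and FLOPs grow relative to a 2-D convolution, and the whole clip must be processed at once. A minimal PyTorch sketch (shapes are illustrative, not the exact C3D architecture):

```python
import torch
import torch.nn as nn

# A single 3x3x3 convolution over a video clip, as in C3D-style models.
# Input layout: (batch, channels, frames, height, width).
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=(3, 3, 3), padding=1)

clip = torch.randn(1, 3, 16, 112, 112)   # 16 RGB frames of 112x112
features = conv3d(clip)
print(features.shape)   # torch.Size([1, 64, 16, 112, 112])

# Compare parameter count with a 2-D counterpart on a single frame.
conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv3d.parameters()))  # 5248 weights (~3x the 2-D layer)
print(sum(p.numel() for p in conv2d.parameters()))  # 1792 weights
```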
BLIP-2 - Querying Transformer (Q-Former)
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
DPO - Direct Preference Optimization
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Tutorial: https://www.youtube.com/watch?v=hvGa5Mba4c8
Paper: https://arxiv.org/abs/2305.18290
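The key point of the paper: instead of fitting a separate reward model and running RL, preference pairs are optimized directly with a logistic loss on the log-probability ratios between the trainable policy and a frozen reference model. A minimal PyTorch sketch of that loss, assuming per-sequence log-probabilities are already summed (variable names are my own, not from the paper's code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities
    log pi(y|x) for the chosen / rejected responses under the trainable
    policy and the frozen reference model.
    """
    # Implicit "reward" of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Push the chosen response above the rejected one via a sigmoid loss.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean()

# Toy usage with random log-probabilities.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```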
Mamba
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Quantization
Tutorial: https://www.youtube.com/watch?v=0VdNflU08yA
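For reference, the basic building block behind most quantization schemes is affine (scale + zero-point) quantization of a float tensor to int8 and back. A minimal NumPy sketch of asymmetric per-tensor quantization (details differ across methods; this is only the common idea, not any specific library's API):

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric per-tensor affine quantization: x ~ scale * (q - zero_point)."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale).astype(np.int32)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)
print("max abs error:", np.abs(x - x_hat).max())   # roughly scale / 2
```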