Resources about distributed training with Megatron-LM

Github: https://github.com/NVIDIA/Megatron-LM
Document on NeMo: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

NeMo is a cloud-native generative AI framework built on top of Megatron-LM.

Overall view of Megatron-Core: https://docs.nvidia.com/megatron-core/developer-guide/latest/index.html

Megatron-Core provides official APIs with formal product support.

Megatron-LM is essentially based on the following three papers. Let's take some notes on them.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
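
The core idea of this paper is intra-layer (tensor) model parallelism: the first GEMM of each transformer MLP is split column-wise across GPUs and the second row-wise, so the GeLU in between needs no communication and a single all-reduce per forward pass combines the partial outputs. Below is a minimal PyTorch sketch of that scheme, not Megatron-LM's actual implementation; the class name, dimensions, and launch setup are illustrative.

```python
# Minimal sketch of Megatron-style tensor parallelism for a transformer MLP.
# Not Megatron-LM's actual code: names are illustrative, and it assumes
# torch.distributed is already initialized (e.g. launched with torchrun).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist


class TensorParallelMLP(nn.Module):
    def __init__(self, hidden_size: int, ffn_hidden_size: int, world_size: int):
        super().__init__()
        assert ffn_hidden_size % world_size == 0
        shard = ffn_hidden_size // world_size
        # Column-parallel: each rank keeps a slice of the first weight's output columns.
        self.fc1 = nn.Linear(hidden_size, shard)
        # Row-parallel: each rank keeps the matching slice of the second weight's input rows.
        self.fc2 = nn.Linear(shard, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeLU can be applied per shard because the split is along fc1's output
        # columns, so no communication is needed between the two GEMMs.
        partial = self.fc2(F.gelu(self.fc1(x)))
        # A single all-reduce sums the partial outputs across tensor-parallel ranks.
        # (Megatron-LM wraps this collective in a custom autograd.Function so that
        # gradients are communicated correctly in the backward pass.)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial
```

The paper applies the same column/row split to the self-attention block by partitioning it across attention heads.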

Video: https://www.youtube.com/watch?v=J2YC0-k57NM

C3D

3D convolution is conceptually simple: by convolving over space and time jointly, it circumvents the use of optical flow, which is time-consuming to compute. The downside is that 3D convolution itself is also computationally heavy.
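
To make that trade-off concrete, here is a toy PyTorch comparison, not the actual C3D network: the 16-frame 112x112 clip shape follows the C3D setup, while the channel counts are arbitrary.

```python
# Toy comparison of 2D vs. 3D convolution on a short video clip.
# Illustrative only: a single layer of each kind, not the C3D architecture.
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, height, width)

conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # spatial only, one frame at a time
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)  # spatio-temporal (3x3x3 kernel)

# The 3D kernel mixes information across neighboring frames directly, so motion is
# captured without a separate optical-flow stream.
out = conv3d(clip)  # -> (1, 64, 16, 112, 112)

# But the extra temporal dimension multiplies parameters (and FLOPs) roughly by the
# temporal kernel size:
print(sum(p.numel() for p in conv2d.parameters()))  # 64*3*3*3   + 64 = 1,792
print(sum(p.numel() for p in conv3d.parameters()))  # 64*3*3*3*3 + 64 = 5,248
```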
