ZeRO (DeepSpeed from Microsoft)
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
paper (2019 arxiv, SC’20): https://arxiv.org/abs/1910.02054
website: https://www.deepspeed.ai/
| Tool | Full name | Purpose in Hexo | Key benefits |
|---|---|---|---|
| NVM | Node Version Manager | Manage Node.js versions for Hexo projects. | Avoid compatibility issues; work with multiple projects requiring different Node.js versions. |
| NPM | Node Package Manager | Install Hexo, plugins, and dependencies. | Streamlined dependency management; consistent environment across systems. |
GitHub: https://github.com/NVIDIA/Megatron-LM
Document on NeMo: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
NeMo is a cloud-native generative AI framework built on top of Megatron-LM.
Overall view of Megatron-Core: https://docs.nvidia.com/megatron-core/developer-guide/latest/index.html
Official APIs with formal product support…
Megatron-LM is essentially based on the following three papers. Let’s take some notes on them.
Video: https://www.youtube.com/watch?v=J2YC0-k57NM
3D convolution is easy to understand. It circumvents the use of optical flow, which is time-consuming to compute. But 3D convolution is itself time-consuming, since it involves heavy computation.
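To put a rough number on the cost of 3D convolution mentioned above, here is a minimal sketch that counts multiply-accumulates (MACs) for a single layer. The layer shape (64 channels, 3x3 spatial kernel, temporal kernel of 3, 16 frames of 56x56 output) is a made-up example, not taken from any paper, and the helper names `conv2d_macs` and `conv3d_macs` are hypothetical. Bias terms and padding effects are ignored.

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """MACs for one 2D conv layer applied to a single frame."""
    return c_in * c_out * k * k * h_out * w_out


def conv3d_macs(c_in, c_out, k, kt, t_out, h_out, w_out):
    """MACs for one 3D conv layer; kt is the temporal kernel size."""
    return c_in * c_out * kt * k * k * t_out * h_out * w_out


# Hypothetical layer shape, chosen only for illustration.
frames = 16
per_frame_2d = conv2d_macs(64, 64, 3, 56, 56)
total_2d = frames * per_frame_2d            # 2D conv run on every frame
total_3d = conv3d_macs(64, 64, 3, 3, frames, 56, 56)

print(f"2D conv over {frames} frames: {total_2d:,} MACs")
print(f"3D conv over {frames} frames: {total_3d:,} MACs")
print(f"ratio: {total_3d / total_2d:.1f}x")
```

Under these assumptions the 3D layer costs exactly the temporal kernel size (here 3) times as much as running the 2D layer on every frame, and the weight count grows by the same factor.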
Tutorial: https://www.youtube.com/watch?v=hvGa5Mba4c8
Paper: https://arxiv.org/abs/2305.18290