Training a multi-billion-parameter LLM is usually a highly
experimental process involving lots of trial and error. Typically, a
team starts with a much smaller model, verifies that the approach
is promising, and then scales up to more and more parameters.
Keep in mind that as you scale, issues will arise that require
addressing and that simply aren't present when training smaller
models on smaller datasets. A sketch of this progressive scale-up
follows below.
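As a minimal sketch of this progressive scale-up, the hypothetical Python snippet below sweeps a ladder of model configurations from smallest to largest, tracking each run with Weights & Biases so results can be compared before committing compute to the full-size model. The project name, parameter counts, and config fields are illustrative assumptions, not recommendations.

```python
import wandb

# Hypothetical ladder of model sizes for preliminary scaling
# experiments; the parameter counts and dimensions are placeholders.
SCALING_LADDER = [
    {"name": "125M", "n_layers": 12, "d_model": 768},
    {"name": "1.3B", "n_layers": 24, "d_model": 2048},
    {"name": "13B",  "n_layers": 40, "d_model": 5120},
]

for cfg in SCALING_LADDER:
    # One tracked run per model size, so loss curves across scales
    # can be compared before training the largest configuration.
    run = wandb.init(project="llm-scaling-study", config=cfg, reinit=True)
    # ... train the model at this size, logging metrics as you go, e.g.:
    # wandb.log({"train/loss": loss, "tokens_seen": tokens})
    run.finish()
```

Only once the smaller rungs of such a ladder look promising does it make sense to spend the compute budget on the largest model.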
Let’s look at some common pre-training steps, starting with
architecture.