Large language models (LLMs) such as OpenAI’s GPT series and Google’s BERT have become fundamental technologies driving a growing number of applications, from automated customer service to sophisticated research tools. They are trained by ingesting vast amounts of text from books, websites, and other digital sources, learning the statistical properties of language so they can generate coherent, contextually relevant text in response to a prompt or other input.
However, these models are expensive to build and train, given the enormous parameter counts and computational power involved. This article looks at the costs of bringing such generative AI models to life, focusing on infrastructure needs, data management, and the increasingly pivotal role of cloud computing.
The Anatomy of Large Language Models
LLMs are typically built on the transformer architecture, which uses an attention mechanism to relate every part of the input text to every other part. This allows the model to weigh the importance of different parts of the input differently, depending on the context provided by the rest of the text. BERT, for example, reads text bidirectionally (left-to-right and right-to-left), learning each word’s context from the entire sentence, which makes it particularly effective for tasks that require a deep understanding of language context.
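To make the idea concrete, below is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer layer. It is an illustrative toy (a single head operating on random vectors), not any particular model’s implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: each position weighs every other
    position by the similarity of their query and key vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # context-weighted mixture of values

# Toy example: a "sequence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention over the sequence
print(out.shape)                                     # (4, 8)
```

Each row of the output is a context-weighted mixture of the whole sequence, which is what lets the model relate distant parts of the text to one another.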
The Cost of Training
Now, let’s take a look at the cost of training LLMs with cloud services. Training an LLM requires substantial computational resources, mainly high-end GPUs and specialized AI hardware. For example, estimates for the compute cost of training GPT-3 alone range from around $500,000 to $4.6 million, depending on the specific hardware and the operational efficiencies achieved during training.
As more and more AI development shifts to the cloud, cloud services have become one of the easiest and most reliable ways to train LLMs. Their scalability is excellent for the fluctuating demands of AI training cycles, but that convenience comes at a price.
At the NVIDIA GTC 2024 conference, NVIDIA CEO Jensen Huang revealed that training the GPT-MoE-1.8T model took 3 to 5 months on 25,000 Ampere-based GPUs (most likely A100s). He estimated that the same model could be trained on Hopper (H100) GPUs in about 90 days using only 8,000 of them. Most users will not train LLMs from scratch because of the vast cost involved; instead, they will build on pre-trained models produced by others (such as ChatGPT or Llama 2).
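To put those figures in perspective, here is a back-of-envelope calculation of what that much GPU time might cost on rented cloud hardware. The hourly rate below is an assumption for illustration only, not a quoted price from any provider.

```python
# Back-of-envelope cloud cost for the GPT-MoE-1.8T figures quoted above.
# The $/GPU-hour rate is purely illustrative -- actual cloud pricing varies widely.
gpus = 25_000            # Ampere-class GPUs (per the GTC 2024 keynote)
days = 120               # midpoint of the quoted 3-5 month window
hourly_rate = 2.00       # assumed on-demand $/GPU-hour; substitute your provider's rate

gpu_hours = gpus * days * 24
cost = gpu_hours * hourly_rate
print(f"{gpu_hours:,.0f} GPU-hours ~= ${cost:,.0f}")   # 72,000,000 GPU-hours ~= $144,000,000
```

Even rough numbers like these make it clear why almost everyone starts from a pre-trained checkpoint rather than training from scratch.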
Training LLMs with Cloud GPU Services
There are two main methods of training LLMs using cloud GPU services:
- Hosting your own model
- Pay per token
Hosted models in the cloud
If you host your own model in the cloud, some cloud service providers, such as LayerStack, offer full stacks that cover the complete machine learning lifecycle, from data storage and compute to deployment and management. However, the costs involved go beyond the price of the GPUs: in a cloud setup, you must also account for virtual CPUs (vCPUs), memory (RAM), and storage.
All these components add up, so optimizing resource use to stay within budget is essential. Cloud providers typically charge based on compute time, the amount of memory allocated, and the amount of data stored or transferred, which makes training large AI models costly. Given that training an LLM can take months, these costs accumulate over time, especially if you train over multiple iterations on large datasets.
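As a sketch of how these line items combine, the snippet below totals a hypothetical monthly bill for a modest self-hosted training setup. Every rate and quantity is a placeholder assumption; substitute your provider’s actual pricing.

```python
# Rough monthly bill for a self-hosted training setup.
# All rates below are placeholders -- check your provider's actual pricing.
HOURS_PER_MONTH = 730

resources = {
    # item: (quantity, unit price in USD)
    "gpu_hours":    (8 * HOURS_PER_MONTH,   2.50),   # 8 GPUs, $/GPU-hour
    "vcpu_hours":   (64 * HOURS_PER_MONTH,  0.04),   # 64 vCPUs, $/vCPU-hour
    "ram_gb_hours": (512 * HOURS_PER_MONTH, 0.005),  # 512 GB RAM, $/GB-hour
    "storage_gb":   (20_000,                0.02),   # 20 TB of dataset storage, $/GB-month
    "egress_gb":    (5_000,                 0.08),   # data transferred out, $/GB
}

total = sum(qty * price for qty, price in resources.values())
for name, (qty, price) in resources.items():
    print(f"{name:>14}: ${qty * price:>10,.2f}")
print(f"{'total':>14}: ${total:>10,.2f}")
```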
Alternatively, some users may choose to pay per token. Here’s how it works.
Pay-per-token (PPT) model for LLM utilization
Pay-per-token (PPT) models have emerged as a cost-effective way to access large language models (LLMs). Companies such as OpenAI and Google AI pre-train massive LLMs and make them available to the public via APIs. Developers and businesses can use models such as GPT-3 without the cost and difficulty of training one themselves.
Users are not responsible for the initial training and infrastructure costs; instead, they pay a fee based on the number of tokens (roughly equivalent to words or sub-words) processed by the LLM while generating text, translating, or writing code.
Compared with in-house training, this model is far more economical for use cases that do not require heavy, sustained use of the LLM, since users pay only for the resources they consume.
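The arithmetic is straightforward. The sketch below estimates a monthly pay-per-token bill for a hypothetical workload; the per-token prices are illustrative assumptions, not any provider’s published rates.

```python
# Estimate a monthly pay-per-token bill.
# Prices are hypothetical placeholders, not any provider's actual rates.
PRICE_PER_1K_INPUT = 0.0015    # $ per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.0020   # $ per 1,000 output tokens (assumed)

requests_per_day = 10_000
avg_input_tokens = 500
avg_output_tokens = 250

monthly_input = requests_per_day * avg_input_tokens * 30
monthly_output = requests_per_day * avg_output_tokens * 30

cost = (monthly_input / 1_000) * PRICE_PER_1K_INPUT \
     + (monthly_output / 1_000) * PRICE_PER_1K_OUTPUT
print(f"~{monthly_input + monthly_output:,} tokens/month ~= ${cost:,.2f}")
```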
What makes training LLMs so costly?
Training large language models requires extensive computing power: these models have billions of learnable parameters and involve repeated computations over lengthy periods on sophisticated GPU hardware. The cloud services providing this infrastructure are expensive, with costs driven by compute time, storage, and data transfer.
Steps to Reduce the Cost of Training LLMs
Though the cost of training LLMs can still be significant, there are strategies to use resources more efficiently and reduce spending:
- Implement model optimization techniques:
  - Carefully select the model architecture
  - Optimize training data
  - Use knowledge distillation
  - Employ mixed-precision training (see the sketch after this list)
- Consider hardware optimizations:
  - Monitor and optimize hardware utilization
  - Choose the right hardware for your needs
- Explore different cloud service providers and pricing models
- Collaborate and leverage open-source tools:
  - Employ open-source frameworks
  - Collaborate with a research institution
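As an example of one item from the list above, here is a minimal mixed-precision training loop in PyTorch. The model and data are dummy placeholders; the point is the autocast/GradScaler pattern, which reduces memory use and speeds up matrix math on modern GPUs.

```python
import torch
from torch import nn

# Minimal mixed-precision training loop (PyTorch).
# The model and data are dummy placeholders for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), target)  # forward pass in reduced precision
    scaler.scale(loss).backward()                        # scale loss to avoid gradient underflow
    scaler.step(optimizer)
    scaler.update()
```

Running most of the forward and backward pass in 16-bit precision roughly halves activation memory, which lets the same hardware fit larger batches or models and shortens training time.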
Data requirements and costs
Data is the lifeblood of LLMs. The quality, volume, and diversity of the data critically affect a model’s effectiveness and accuracy. Collecting, cleaning, and maintaining data incurs considerable cost, and the dataset must be large and diverse enough for the model to minimize biases and generalize across different contexts and inputs. Assembling such a dataset involves an enormous amount of labor, including human labor such as labeling for supervised learning tasks.
In short, data is not free, and even efficient data management adds substantial costs. Below are several financial considerations involved in data management for LLMs:
Data acquisition: There are primarily two routes to acquiring data for LLM training: purchasing already collected datasets and licensing existing ones. Well-known research institutes and private companies curate text and code datasets for LLM training, and these can be quite expensive depending on their size, subject-area specificity, and quality.
Data storage: Storing extremely large datasets often dominates the data management process and can add substantial cost. Traditional on-premise storage can be expensive to maintain and scale. Cloud storage gives an organization the option to scale (at further cost) and can save money, but data held in the cloud incurs storage fees over time, especially for datasets in the terabyte or petabyte range.
Data preprocessing: Raw data is rarely usable for LLM training in its unprocessed state, so it usually needs significant cleaning, labeling, and formatting. Preprocessing can involve:
Cleaning: Removing noise, duplicates, and malformed or irrelevant text.
Labeling: Annotating examples where supervised training or evaluation requires it.
Formatting: Ensuring the dataset formatting remains consistent and compatible with the LLM framework.
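As a rough illustration of what such a pass can look like, here is a small Python sketch that normalizes whitespace, drops very short fragments, removes exact duplicates, and emits one JSON record per line. The thresholds and rules are illustrative, not a production pipeline.

```python
import json
import re

def preprocess(records):
    """Minimal cleaning/formatting pass: normalize whitespace, drop short or
    duplicate documents, and emit one JSON object per line for training."""
    seen = set()
    for text in records:
        text = re.sub(r"\s+", " ", text).strip()   # normalize whitespace
        if len(text) < 20:                         # drop fragments too short to be useful (illustrative threshold)
            continue
        if text in seen:                           # exact-duplicate removal
            continue
        seen.add(text)
        yield json.dumps({"text": text})

raw_docs = ["  Example   document one...  ", "short", "  Example   document one...  "]
for line in preprocess(raw_docs):
    print(line)                                    # only the first document survives
```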
Implementing Cost-Saving Strategies in LLM Development
When applied, these strategies allow researchers and developers to save a large portion of the cost that would otherwise go into training LLMs. Careful model optimization, combined with efficient hardware, cloud services, and cost-saving training configurations, minimizes the financial burden that LLM development imposes.
In conclusion, as AI and language models continue to develop, bearing the associated costs remains a real concern. However, with continued improvements in hardware, software, and training techniques, we can expect these costs to become more manageable over time. The challenge, then, is to pursue higher-capacity AI models while keeping cost, environmental impact, and accessibility in view.