Large Language Models for Commercial Use

This blog explains what a license is for LLM models and why it is important. We resolve any doubts that you might have about the licensing of these models so that you do not run into legal troubles while using, modifying, or sharing them.

Commercial LLMs

Seeing LLMs (Large language models) being used for a variety of value-generating tasks all across the industry, every business wants to get their hands on them. However, before you start using these models commercially, it is important to understand the licensing and legal norms around this.

In this blog, we will resolve any doubts that you might have about the licensing of these models so that you do not run into legal troubles while using, modifying, or sharing them.

šŸ“Œ
We will continuously update this blog to cover all major LLMs and their licensing implications.

Can you just start using LLMs for your business?

We had a chat with a few leaders, and it turns out licensing LLMs commercially is more complicated. Let's take the example of Ā Vicuna.

Vicuna is an open-source chatbot trained by fine-tuning on LLaMA

If you deploy the Vicuna 13B model using a hugging face, you would find that the team behind the project has released just the delta weights of the models, which in turn need to be applied to the Original LLaMA model to make it work.

The Vicuna model card would show the license to be Apache 2.0 license, making one believe that the model can be used commercially.

Vicuna 13B Delta weights model Hugging Face Model Card

However, the LLaMA weights are not available commercially, making the vicuna model, in turn, only usable in research settings, not commercially.

Confusing, right? Let us try to explain how this works

Different types of licenses and what do they mean?

Here is a table with some of the common licenses that LLMs are found to have:

License LLMs Permissive or Copyleft Patent Grant Commercial Use Redistribution Modification
Apache 2.0 BERT, XLNet, XLM-RoBERTa, Flan-UL2, Cerebras, Bloom and Dolly 2.0 Permissive Yes Yes (with attribution) Yes Yes
MIT GPT-2, T5, and BLOOM Permissive No Yes (with attribution) Yes Yes
GPL-3.0 GLM-130B and NeMO LLM Copyleft Yes (for GPL-3.0 licensed software only) Yes (with source code) Yes (with source code) Yes
Proprietary GPT-3, LaMDA and Cohere Varies Varies Varies Varies Varies

Copyleft licenses like GPL-3.0 require that any derivative works of the software be licensed under the same license. This means that if you use GPL-3.0-licensed software in your project, your project must also be licensed under GPL-3.0.

Permissive licenses like Apache 2.0 and MIT allow users to use, modify and distribute the software under the license with minimal restrictions on how they use it or how they distribute it.

Explaining some of the common licenses that LLMs are licensed under:

Apache 2.0 License

Under this license, users must give credit to the original authors, include a copy of the license, and state any changes made to the software. Users must also not use any trademarks or logos associated with the software without permission.

MIT License

This license allows anyone to use, modify, and distribute the software for any purpose as long as they include a copy of the license and a notice of the original authors. The MIT License is similar to the Apache 2.0 License, but it does not have any conditions regarding trademarks or logos.

GPL-3.0 License

It allows anyone to use, modify, and distribute the software for any purpose as long as they share their source code under the same license. This means that users cannot create proprietary versions of the software or incorporate it into closed-source software without disclosing their code. The GPL-3.0 License also has some other conditions, such as providing a disclaimer of warranty and liability and ensuring that users can access or run the software without any restrictions.

Proprietary License

The last type of license for LLMs that we will discuss is the proprietary license, which is a non-open source license that grants limited rights to use the software under certain terms and conditions. The proprietary license usually requires users to pay a fee or obtain permission to access or use the software and may impose restrictions on how the software can be used or modified. The proprietary license may also prohibit users from sharing or distributing the software or its outputs without authorization.

RAIL License

RAIL license is a new copyright license that combines an Open Access approach to licensing with behavioral restrictions aimed at enforcing a vision of responsible AI. This license has certain use-based restrictions like it cannot be used in

  1. Anything that violates laws and regulations
  2. Exploit or harm minors, or uses that discriminate or harm ā€œindividuals or groups based on social behavior or known or predicted personal or personality characteristics.ā€

Some models under this license are- OPT, Stable Diffusion, BLOOM

The following table summarizes the license details of each LLM:

LLM License
Sheet2 Model,Domain,License ,Number of Parameters (in millions),Pretraining Dataset,Results,creator,Inference Speed,Release DateAlpaca-medium,general,MIT,7000,Base model Llama. Fine tuned on interactions collected from DaVinci model.,Almost equivalent with GPT-3 variant of ChatGPT. ,Stanford,Mā€¦

Which models can I use?

The problem with LLMs for commercial use is that they may not be open-source or may not allow commercial use (Models built on top of Meta's LLaMA model). This means that companies may have to pay to use them or may not be able to use them at all. Additionally, some companies may prefer to use open-source models for reasons such as transparency and the ability to modify the code.

There are several open-source language models that can be used commercially for free.

Bloom

Bloom is an open-access multilingual language model that contains 176 billion parameters and is trained for 3.5 months on 384 A100ā€“80GB GPUs.

It is licensed under bigscience-bloom-rail-1.0 license. This restricts BLOOM to not be used for certain use cases like for giving medical advice and medical results interpretation. This is in addition to other restrictions that are present under the RAIL license (Described above)

Dolly 2.0

Dolly 2.0 is a 12B parameter language model based on the EleutherAI Pythia model family and fine-tuned exclusively on a new, high-quality human-generated instruction following dataset, crowd-sourced among Databricks employees. It is the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use. The entirety of Dolly 2.0, including the training code, the dataset, and the model weights, are open-sourced and suitable for commercial use.

Cerebras LLMs

Cerebras has released seven open-source Large Language Models (LLMs) for both research and commercial for-profit use cases. They released 7 models, which range in size from 111 million to 13 billion parameters. Their parameters have been trained using the Chinchilla formula.

Cerebras isnā€™t as advanced as either LLaMA or ChatGPT (get-3.5-turbo) but is designed to run on a budget and is fully open-source.

It is licensed under Apache 2.0.

Eleuther AI Models (Polyglot, GPT Neo, GPT NeoX, GPT-J, Pythia)

EleutherAI has trained and released several LLMs and the codebases used to train them. Several of these LLMs were the largest or most capable available at the time and have been widely used since in open-source research applications.

H2O GPT

The H2OGPT model is one of the largest language models available today and requires significant resources to run.

It has 20 billion parameters and requires a GPU with 24GB VRAM to run. The model has been trained on a variety of datasets and can be fine-tuned for specific tasks. Compared to other models like GPT-3, H2OGPT is smaller but still provides state-of-the-art performance.

It is available for commercial usage.

GOOGLE'S FLAN

Google has been doing strong research work related to LLMs as it is evident when they released the FLAN-T5 model last year.

These models are open source models and are considered to be a good alternative to GPT-3 when it comes to performance for some specific tasks such as NLP applications. They have the following pre-trained checkpoints available : FLAN-T5 small, FLAN-T5-base, FLAN-T5-large, FLAN-T5-XL and FLAN-T5 XXL.

Google released another model this year, FLAN-UL2. Ā This model is also based on the T5 architecture. It outperforms the previous open source models released by Google and is the largest of them all.

These models are available to use commercially and are licensed under Apache 2.0.

StableLM

Stability AI released it's first Large Language Model, StableLM, in April 2023. Stability AI has been working on developing open source models for the public to use as they gained a lot of popularity with Stable Diffusion, a scalable tool for creating images from text prompts.

StableLM is currently available in form of two models - StableLM-Alpha and StableVicuna. StableLM-Alpha has been released with paramaters size of 3B and 7B, with other larger parameters size model variants will be released in the near future. It has been built on the enormous dataset of The Pile.

Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0) while the fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0), in alignment with Stanford's Alpaca license guidelines.


What have we been building?

We at TrueFoundry have been building solutions for enterprises that want to host LLMs on their own cloud or on-prem clusters. We provide:

  1. 80% lesser cost than OpenAI APIs
  2. Model catalog to create APIs and fine-tune LLMs on your cloud account with a single click.
  3. Management of all the infrastructure for running and serving LLMs

Sounds like something you would want to explore? We always love chatting with builders.

Have a ā˜•ļø with us