First Documented Code Notebooks in French for Fine-Tuning LLaMa-1 and LLaMa-2

#research #ai

August 2023

For the past few months, ChatGPT has been facing competition from an open-access ecosystem of large language models (LLMs). LLaMa, MPT, Falcon, Pythia: alternatives are emerging every week with a wide variety of sizes (from 38 million to 65 billion parameters), corpora (primarily in English or multilingual), and usage licenses (from research use only to open source).

Meta released a new open version of LLaMa, LLaMa-2, on July 18, 2023. The license removes most of the previous usage restrictions (except for companies with a product used by... at least 700 million people). On this occasion, Opsci has released the FabriqueLLM project, a unique compilation of French-language educational resources for training open-source models, notably containing the first series of code notebooks documented in French for fine-tuning LLaMa-1 and LLaMa-2. This project is part of Opsci's broader research initiative on LLMs: assessing their multilingual capabilities, fine-tuning them for specific applications in underrepresented European languages and rare dialects, and promoting the creation and use of more culturally inclusive and relevant corpora for training foundation models by emerging AI actors.

The first version of LLaMa was not distributed as open source but only for non-commercial research purposes. Even with this restriction, LLaMa has had a significant impact on the development of a rich ecosystem of open LLM alternatives to ChatGPT. Both the methods and the datasets commonly used to adapt LLMs were initially designed for LLaMa.

LLaMa is available in several sizes: 7B, 13B, 33B, and 65B for LLaMa-1, and 7B, 13B, and 70B for LLaMa-2, where B stands for billions of parameters. This range of sizes contributes largely to the model's success: it can adapt to a wide variety of infrastructures and needs. LLaMa-1 13B is probably the most widely used open LLM today, thanks to its good balance between performance and usability. To make the most of it, we are releasing three code notebooks, first published on LeBonLLM's website, a community platform devoted to the French AI ecosystem and launched by Opsci in partnership with Datactivist:

  • Fine-tuning with LLaMa-1 7B, which should run without issues on the free version of Google Colab (about 20 minutes per epoch on a dataset of 2,000 instructions).
  • Fine-tuning with LLaMa-1 13B, which requires the paid version of Google Colab: training takes 13-15 GB of GPU RAM, slightly more than the free version offers.
  • Fine-tuning with LLaMa-2 7B, which also requires the paid version of Google Colab. This code notebook is still experimental and will likely be optimized to run on the free version soon.
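As a rough sanity check on those memory figures, parameter count times bytes per parameter gives a lower bound on the VRAM needed just to hold the weights. The arithmetic below is back-of-envelope and assumes 4-bit quantized weights (0.5 byte per parameter); activations, gradients, and optimizer state add several more gigabytes during fine-tuning, which is why 13B lands in the 13-15 GB range in practice:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float = 0.5) -> float:
    """Approximate memory (GB) to hold the model weights alone.

    Assumes 4-bit quantization (0.5 byte/parameter) by default;
    fine-tuning overhead (activations, optimizer state) comes on top.
    """
    return n_params * bytes_per_param / 1e9

for name, n in [("LLaMa 7B", 7e9), ("LLaMa 13B", 13e9)]:
    print(f"{name}: ~{weight_memory_gb(n):.1f} GB for 4-bit weights")
```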

The code notebook for LLaMa-1 is based on LLMTune, a research project from Cornell Tech and Cornell University. The original version hit several bugs when run on Google Colab, which we have fixed; in particular, consecutive checkpoints were never deleted, eventually saturating Colab's RAM. For LLaMa-2, we used a script by Younes Belkada.
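The fix for that saturation bug amounts to pruning older checkpoints after each save. A minimal stand-alone sketch of the rotation logic (directory names are illustrative; in a transformers-based script, the `save_total_limit` training argument achieves the same effect):

```python
import re
import shutil
from pathlib import Path

def prune_checkpoints(output_dir: str, keep: int = 2) -> None:
    """Delete all but the `keep` most recent checkpoint-<step> directories."""
    pattern = re.compile(r"checkpoint-(\d+)$")
    checkpoints = sorted(
        (p for p in Path(output_dir).iterdir()
         if p.is_dir() and pattern.search(p.name)),
        key=lambda p: int(pattern.search(p.name).group(1)),
    )
    for old in checkpoints[:-keep]:
        shutil.rmtree(old)  # free space before the next save
```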

As with Falcon, this notebook is deliberately tested on a challenging instruction set: 2,000 excerpts from a 17th-century French novel (Astrée, available on HuggingFace) that are part of our VintageLLM open instruction collection. It is a good way to quickly assess the impact of fine-tuning: even with relatively innocuous prompts, the model tends to respond in archaic French.

Les Lamas des provinces peruanes ſont d'une mœurs des plus civilizées: La majorité de ces peuples ſont catholiques. Ils ont des lettres & des langages perfeitement differenfles & du Roi de Naples auſſi bien qu'au Roi de France. Les plus fins de ce peuple qui demeurent à Parigo ont une civilité & une maniere de porter la perruque qui eſt une école. De tous les animaux des Indes, ce ne ſont pas ces Lama qui paroiſt: les plus sages. Nous avons d'ailleurs vus qu'ils pouffent l'un l'autre des couleurs d'une fureur fi extraordinaire & qu'ils entretiennent une vive haine pour les m…
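Fine-tuning on such an instruction set boils down to rendering each (instruction, response) pair into a single training string. A minimal sketch of an Alpaca-style template, which these datasets commonly follow (the exact field names and wording in our notebooks may differ):

```python
def format_example(instruction: str, response: str) -> str:
    """Render one instruction/response pair as a single training string
    (Alpaca-style template; the notebooks' actual template may differ)."""
    return (
        "Below is an instruction. Write a response that completes it.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

print(format_example(
    "Décris les mœurs des lamas.",
    "Les Lamas des provinces peruanes ſont d'une mœurs des plus civilizées...",
))
```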

This demonstration runs for a single epoch, which is enough for a first glimpse. To get a good model, we recommend running the fine-tuning for three epochs. On our demonstration dataset of 2,000 instructions, one epoch takes about 1 hour 15 minutes.
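Those timings are easy to extrapolate with rough arithmetic, assuming throughput stays constant across epochs:

```python
def training_time_minutes(n_examples: int, seconds_per_example: float,
                          epochs: int) -> float:
    """Estimated wall-clock training time in minutes."""
    return n_examples * seconds_per_example * epochs / 60

# 1h15 per epoch on 2,000 instructions -> ~2.25 s per instruction,
# so the recommended three epochs take about 225 minutes (3h45).
per_example = 75 * 60 / 2000
print(f"{per_example:.2f} s per instruction")
print(f"3 epochs: {training_time_minutes(2000, per_example, 3):.0f} min")
```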