In March 2023, we published a comprehensive survey paper titled “A Survey of Large Language Models”. The survey has since been updated to its 13th version, encompassing 83 pages of main content and covering over 900 references. Its aim is to systematically organize the research advancements and core technologies of large language models, discussing a large number of related works. Since its release as a preprint, the survey has garnered considerable attention from readers.
Following the release of the English survey article, several readers inquired about the availability of a corresponding Chinese version. In response, we published a Chinese translation of the survey in August 2023. To further provide Chinese study materials on large language models, we began compiling a Chinese book at the end of December 2023 and have recently completed the initial draft. Unlike the English survey article, the Chinese book is designed for readers who are new to large language model technology. We have therefore substantially updated and reorganized the content to present a comprehensive framework and roadmap of large language model technology. The book is intended for advanced undergraduate and early graduate students with a background in deep learning and can serve as an introductory reference. For more information about the Chinese book project, please see the links below:
Chinese Book Download Links
Download Link 1: https://github.com/LLMBook-zh/LLMBook-zh.github.io/blob/main/LLMBook.pdf
Download Link 2: http://aibox.ruc.edu.cn/zws/index.htm
Organization of the Book Chapters
Part One: Background and Fundamentals
Chapter 1: Introduction
- Development history of large language models
- Overview of key technologies
Chapter 2: Fundamentals
- Scaling Law
- Development history of the GPT series models
Chapter 3: Large Language Model Resources
- Open-source models
- Data
- Code libraries
Part Two: Pre-training
Chapter 4: Data Preparation
- Data collection
- Data cleaning
- Data balancing
- Curriculum learning methods
Chapter 5: Model Architecture
- Transformer structure
- Mainstream architectures of large language models
- Detailed architectural improvements
Chapter 6: Model Pre-training
- Pre-training tasks
- Optimization parameter settings
- Parallel training methods
Part Three: Fine-tuning and Alignment
Chapter 7: Instruction Fine-tuning
- Instruction data collection and synthesis methods
- Instruction fine-tuning strategies and effects
Chapter 8: Human Alignment
- 3H standards (helpful, honest, harmless)
- RLHF algorithms
- Non-RL alignment algorithms
Part Four: Using Large Language Models
Chapter 9: Decoding and Deployment
- Decoding and generation algorithms
- Decoding acceleration algorithms
- Model compression algorithms
Chapter 10: Prompt Learning
- Basic prompting methods
- In-context learning
- Chain of thought
Chapter 11: Planning and Agents
- Complex planning methods
- Building agents
Part Five: Evaluation and Applications
Chapter 12: Evaluation
- Evaluation metrics and methods
- Basic and advanced capability evaluation
- Evaluation systems
Chapter 13: Applications
- Overview of applications in research and professional fields
During the writing of this book, we received extensive feedback and many revision suggestions from colleagues, for which we express our heartfelt gratitude. We hope that readers will continue to support and follow our Chinese book on large language models; your support and feedback are our greatest motivation to move forward. This first edition is only a starting point, and we plan to continuously update and improve the content online. We especially welcome valuable criticisms and suggestions from readers, and we will acknowledge significant contributors on our website. If you have any comments, feedback, or suggestions, please use the GitHub Issues page (https://github.com/LLMBook-zh/LLMBook-zh.github.io/issues) or contact us via email.
To better organize and disseminate the latest advancements and technical frameworks of large language model technology, we offer the following supplementary resources for readers to reference and use while reading this book.
LLMBox: We have developed LLMBox, a comprehensive code toolkit for developing and working with large language models, built around a standardized training process and a comprehensive model evaluation framework. LLMBox aims to serve as a unified pipeline for training and using large language models, integrating numerous practical features to provide high flexibility and efficiency in both the training and utilization stages. Toolkit link: https://github.com/RUCAIBox/LLMBox.
YuLan: The YuLan series models are chat-capable large language models developed jointly by the faculty and students of the Gaoling School of Artificial Intelligence at Renmin University of China. The name “YuLan” is derived from the university's emblematic flower. The latest version has been pre-trained entirely from scratch and then supervised fine-tuned with curriculum learning techniques on bilingual Chinese-English data, including high-quality instructions and human preference data. Model link: https://github.com/RUC-GSAI/YuLan-Chat