August 3-7, 2025
Toronto, Canada | KDD 2025 Conference
Presented by NVIDIA Deep Learning Solution Architects
Tutorial Abstract
With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, there is growing interest in the AI community in how to unlock similar capabilities in other large language models (LLMs).
This hands-on tutorial dives into practical methods for building reasoning capabilities in LLMs through two primary approaches:
- Knowledge Distillation: Transferring capabilities from advanced reasoning models
- Reinforcement Learning: Further enhancing capabilities through post-training techniques
Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further.
Target Audience and Prerequisites
Target audience: Data scientists and engineers who are interested in enhancing reasoning capabilities in LLMs for downstream tasks.
Skill Level: The tutorial accommodates participants from beginners to intermediate users, and it is paced so that beginners can follow comfortably.
Prerequisites:
- Basic knowledge of deep learning and LLM concepts
- Familiarity with Python programming
Tutorial Outline
This part covers the core methodologies for eliciting reasoning capabilities in large language models (LLMs). First, we will explain how knowledge distillation transfers reasoning abilities from large models to smaller ones, covering concepts like long Chain-of-Thought data and teacher-student frameworks. Then, we will introduce how reinforcement learning (RL) can be applied to post-train LLMs for enhanced reasoning, including reward design and various RL algorithms.
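To make the teacher-student idea concrete, the following is a minimal sketch of a soft-label distillation loss for a single token position, written in plain Python. It is an assumption about one common variant (temperature-scaled KL divergence between teacher and student distributions); the tutorial itself may instead distill by fine-tuning the student on teacher-generated Chain-of-Thought text. The function names and the default temperature are illustrative, not taken from any framework.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one token position, scaled by T^2.

    The T^2 factor is the usual convention that keeps gradient magnitudes
    comparable across temperatures (illustrative default, an assumption).
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# A student that matches the teacher incurs no loss; a mismatched one does.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In a real training loop this quantity would be averaged over all token positions in a batch and computed on tensors rather than Python lists.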
This part introduces the lab environment and the libraries, frameworks, and datasets used in the exercises. We'll guide participants through a quick verification step to ensure everyone's environment is correctly configured.
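A verification step of this kind might look like the sketch below. It checks the Python version against the 3.8+ requirement listed under Software Environment and reports whether a GPU is visible. The function name `check_environment` is hypothetical, and the use of PyTorch to detect CUDA is an assumption; the actual lab may ship its own check.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("torch",)):
    """Return a small report on the interpreter, packages, and GPU visibility.

    `packages` lists import names to look for; "torch" is an assumed default.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None

    report["cuda_available"] = None  # stays None when it cannot be determined
    if report.get("torch"):
        try:
            import torch

            report["cuda_available"] = torch.cuda.is_available()
        except Exception:
            # A broken install should show up in the report, not crash the check.
            report["cuda_available"] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A GPU is only recommended, so a `cuda_available` value of `False` or `None` is a warning rather than a failure.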
In this lab, participants will learn how to automatically generate long Chain-of-Thought data that encapsulates reasoning processes from advanced reasoning models, such as DeepSeek-R1. We'll demonstrate how to filter low-quality data and prepare it for distillation into smaller models using NeMo's data processing tools.
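The filtering step can be sketched in plain Python as follows. This is not NeMo's data processing API. It is a stand-in that assumes a hypothetical record schema (`question`, `response`, `reference`) and assumes the teacher wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1 does. The length thresholds are arbitrary placeholders.

```python
import re

# Reasoning block followed by the final answer (assumed response layout).
THINK_PATTERN = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)


def split_response(response):
    """Return (reasoning, answer), or None if the response is malformed."""
    match = THINK_PATTERN.fullmatch(response.strip())
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def filter_traces(records, min_reasoning_chars=20, max_reasoning_chars=20000):
    """Keep well-formed, correct, non-duplicate reasoning traces."""
    kept, seen_questions = [], set()
    for record in records:
        parts = split_response(record["response"])
        if parts is None:
            continue  # missing or malformed reasoning block
        reasoning, answer = parts
        if not (min_reasoning_chars <= len(reasoning) <= max_reasoning_chars):
            continue  # degenerate (too short) or runaway (too long) reasoning
        if answer != record["reference"].strip():
            continue  # final answer disagrees with the reference
        if record["question"] in seen_questions:
            continue  # keep only the first good trace per question
        seen_questions.add(record["question"])
        kept.append({"question": record["question"],
                     "reasoning": reasoning,
                     "answer": answer})
    return kept


records = [
    {"question": "What is 12 squared?",
     "response": "<think>12 * 12 = 144, so the square is 144.</think> 144",
     "reference": "144"},
    {"question": "What is 7 + 5?",
     "response": "12",  # no reasoning block
     "reference": "12"},
    {"question": "What is 9 - 3?",
     "response": "<think>Subtracting 3 from 9 leaves 5, I believe.</think> 5",
     "reference": "6"},  # wrong final answer
]
kept = filter_traces(records)
```

Exact string matching on the answer is the simplest possible correctness check; math or code tasks usually need a normalizer or a verifier instead.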
This lab will walk participants through implementing knowledge distillation in open-source models like Qwen and Llama using NVIDIA's NeMo framework. We'll cover the technical details of setting up a distillation training experiment in NeMo, monitoring training progress effectively, and evaluating the results.
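When distillation is done on teacher-generated text, training reduces to supervised fine-tuning on prompt and completion pairs, with the loss computed only on the teacher-written completion. The sketch below illustrates that data layout under stated assumptions: the prompt template, the `<think>` wrapper, and both function names are hypothetical, and a whitespace split stands in for a real tokenizer. It does not use NeMo's API.

```python
def build_sft_example(question, reasoning, answer):
    """Turn one teacher trace into a prompt/completion pair (assumed template)."""
    prompt = f"Question: {question}\nAnswer:"
    completion = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {"prompt": prompt, "completion": completion}


def completion_only_mask(prompt_tokens, completion_tokens):
    """1 where the loss is computed (completion tokens), 0 on the prompt."""
    return [0] * len(prompt_tokens) + [1] * len(completion_tokens)


example = build_sft_example("What is 2 + 2?", "2 + 2 = 4.", "4")

# Whitespace splitting is a placeholder for the student model's tokenizer.
prompt_tokens = example["prompt"].split()
completion_tokens = example["completion"].split()
mask = completion_only_mask(prompt_tokens, completion_tokens)
```

Masking the prompt means the student is trained to reproduce the teacher's reasoning and answer, not the question text.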
This lab will teach participants how to set up a reinforcement learning environment for post-training LLMs to further enhance reasoning capabilities. We'll cover how to use RL frameworks and how to design reward functions, and we'll guide participants through the full RL fine-tuning process.
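As an illustration of reward design, here is a minimal rule-based reward that combines a format check with an accuracy check, in the spirit of the format and accuracy rewards described for DeepSeek-R1. The tag layout (`<think>` then `<answer>`), the weights, and the function name are assumptions made for this sketch, not the lab's actual reward.

```python
import re

# Expected layout: a reasoning block followed by a tagged final answer.
RESPONSE_FORMAT = re.compile(
    r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL
)


def reasoning_reward(response, reference, format_weight=0.2, accuracy_weight=1.0):
    """Score one sampled response against a reference answer.

    Returns 0.0 for a malformed response, `format_weight` for a well-formed
    but wrong one, and `format_weight + accuracy_weight` for a correct one.
    """
    match = RESPONSE_FORMAT.fullmatch(response.strip())
    if match is None:
        return 0.0  # no credit without the expected structure
    predicted = match.group(1).strip()
    accuracy = 1.0 if predicted == reference.strip() else 0.0
    return format_weight + accuracy_weight * accuracy


correct = reasoning_reward("<think>2 + 2 = 4</think> <answer>4</answer>", "4")
wrong = reasoning_reward("<think>2 + 2 = 5</think> <answer>5</answer>", "4")
malformed = reasoning_reward("The answer is 4.", "4")
```

Keeping the format term small relative to the accuracy term discourages the policy from collecting reward through well-formed but incorrect outputs.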
In conclusion, this tutorial will revisit the two post-training approaches discussed earlier, summarizing their suitable applications, limitations, and unresolved challenges. Participants will also receive a comprehensive set of resources, including online Jupyter Notebook tutorials, curated "awesome lists," and best practices for distillation and reinforcement learning (RL) training.
Meet Our Instructors
Zhaopeng Qiu
NVIDIA Solution Architect
Currently working at NVIDIA as a Deep Learning Solution Architect. He graduated from Peking University in 2018. His research focuses on large language models (LLMs), recommender systems, and natural language processing (NLP). He has authored over 20 research papers published in leading journals and conferences, including KDD, AAAI, WWW, TKDE, NAACL, COLING, and others.
Jingqi Zhang
NVIDIA Solution Architect
Currently a Solution Architect at NVIDIA, specializing in large language models (LLMs). His work focuses on various aspects of LLMs including training methodologies, practical applications, and reasoning models. Prior to joining NVIDIA, Jingqi obtained both his Bachelor's and Master's degrees in Computer Science and Technology from Xi'an Jiaotong University.
Shuang Yu
NVIDIA Solution Architect
A Solution Architect at NVIDIA focusing on LLMs. She holds a Bachelor's Degree in Automation and a Master's Degree in Computer Science from Tsinghua University. Before joining NVIDIA, Shuang worked as a software architect at IBM, where she led the development of an enterprise-level machine learning platform.
Technical Requirements
Technical Resources
- Jupyter notebooks
- APIs for accessing various reasoning models (free tiers)
- NeMo framework (open-source)
- Sample datasets for distillation and RL training
- Open-source models (Qwen, Llama)
Software Environment
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Docker (optional)
Learning Outcomes
By the end of this session, participants will be equipped with:
- Theoretical Understanding: Core concepts of knowledge distillation and reinforcement learning for reasoning
- Practical Skills: Hands-on experience in data preparation and processing techniques
- Technical Implementation: Ability to use NeMo framework for model distillation experiments
- Advanced Techniques: Knowledge of applying RL for post-training reasoning enhancement
- Real-world Application: Practical experience that can be applied to their own projects
Important Links