📅 August 3-7, 2025
📍 Toronto, Canada | KDD 2025 Conference
🤝 Presented by NVIDIA Deep Learning Solution Architects

🎯 Tutorial Abstract

With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, there's growing interest in the AI community about how to unlock similar capabilities in other large language models (LLMs).

This hands-on tutorial dives into practical methods for building reasoning capabilities in LLMs through two primary approaches:

  • Knowledge distillation of long Chain-of-Thought data from advanced reasoning models into smaller models
  • Reinforcement learning (RL) post-training to further enhance reasoning

Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further.

👥 Target Audience and Prerequisites

Target audience: Data scientists and engineers who are interested in enhancing reasoning capabilities in LLMs for downstream tasks.

Skill Level: The tutorial is designed to accommodate participants with varying levels of expertise, from beginners to moderately experienced users, with the pace set so that beginners can follow comfortably.

Prerequisites:

📚 Tutorial Outline

🌟 Introduction and Lab Overview
⏱️ 20 Minutes
This part covers the core methodologies that can be leveraged to incentivize reasoning capabilities in large language models (LLMs). First, we will explain how knowledge distillation transfers reasoning abilities from large models to smaller ones, covering concepts such as long Chain-of-Thought data and teacher-student frameworks. Then, we will introduce how reinforcement learning (RL) can be applied to post-train LLMs for enhanced reasoning, including reward design and various RL algorithms.
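
To make the long Chain-of-Thought / teacher-student idea concrete, here is a minimal illustrative training record; the field names are hypothetical, and the <think>...</think> convention mirrors the style of DeepSeek-R1 outputs rather than any fixed schema used in the lab.

```python
# Illustrative only: a single distillation training record pairing a question
# with the teacher's long Chain-of-Thought trace and final answer.
# Field names are hypothetical; <think>...</think> mirrors DeepSeek-R1-style output.
sample = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "response": (
        "<think>\n"
        "Average speed = distance / time = 120 km / 1.5 h = 80 km/h.\n"
        "Check: 80 km/h * 1.5 h = 120 km, which matches the given distance.\n"
        "</think>\n"
        "The average speed is 80 km/h."
    ),
}

# During distillation, the student model is fine-tuned to reproduce the full
# response (reasoning trace plus final answer) given only the prompt.
```
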
🔧 Hands-on: Setting Up the Environment for the Lab
⏱️ 10 Minutes
In this part we will introduce the lab environment and the libraries, frameworks, and datasets used in the exercises. We'll then guide participants through a quick verification step to ensure everyone's environment is correctly configured.
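
As a rough sketch of what such a verification step can look like (the actual check used in the lab may differ), a single notebook cell is enough to confirm library versions and GPU visibility:

```python
# Quick environment sanity check: report versions and confirm GPU visibility.
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```
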
๐Ÿ” Hands-on: Extracting Reasoning Data
โฑ๏ธ 30 Minutes
In this lab, participants will learn how to automatically generate long Chain-of-Thought data that captures the reasoning processes of advanced reasoning models such as DeepSeek-R1. We'll demonstrate how to filter out low-quality samples and prepare the remaining data for distillation into smaller models using NeMo's data processing tools.
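
Below is a minimal sketch of this idea, assuming an OpenAI-compatible endpoint serving a DeepSeek-R1 model; the endpoint URL, model id, and filtering heuristic are placeholders, and the lab itself relies on NeMo's data processing tools for the real pipeline.

```python
# Sketch: collect long Chain-of-Thought traces from a reasoning model and keep
# only well-formed ones. Endpoint URL, model id, and thresholds are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")  # placeholder endpoint

def generate_trace(question: str) -> str:
    """Ask the reasoning model for a response containing its reasoning trace."""
    resp = client.chat.completions.create(
        model="deepseek-r1",  # placeholder model id
        messages=[{"role": "user", "content": question}],
        temperature=0.6,
    )
    return resp.choices[0].message.content

def keep(trace: str) -> bool:
    """Toy quality filter: require a closed reasoning block and non-trivial length."""
    return "</think>" in trace and len(trace.split()) > 50

questions = ["A train travels 120 km in 1.5 hours. What is its average speed?"]
dataset = []
for q in questions:
    trace = generate_trace(q)
    if keep(trace):
        dataset.append({"prompt": q, "response": trace})
```
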
🧪 Hands-on: Distilling Reasoning Capability into Smaller Models
⏱️ 60 Minutes
This lab will walk participants through implementing knowledge distillation for open-source models like Qwen and Llama using NVIDIA's NeMo framework. We'll cover the technical details of setting up a distillation training experiment in NeMo, monitoring training progress effectively, and evaluating the results.
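
The lab performs this step with the NeMo framework; purely as a hedged illustration of the underlying idea (distillation via supervised fine-tuning of the student on teacher traces), the generic PyTorch/Transformers sketch below uses a placeholder student model and toy data and is not the NeMo workflow.

```python
# Generic sketch of distillation-as-supervised-fine-tuning: the student learns
# to reproduce the teacher's reasoning traces. This is NOT the NeMo workflow
# used in the lab; model name, data, and hyperparameters are placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder student model
tokenizer = AutoTokenizer.from_pretrained(student_name)
model = AutoModelForCausalLM.from_pretrained(student_name)
model.train()

# Toy distillation data: prompt plus teacher trace (reasoning + final answer).
records = [
    {"prompt": "What is 15% of 80?",
     "response": "<think>0.15 * 80 = 12</think>\n15% of 80 is 12."},
]

optimizer = AdamW(model.parameters(), lr=1e-5)
for record in records:
    text = record["prompt"] + "\n" + record["response"] + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM loss over the full sequence (labels = input ids).
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
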
🚀 Hands-on: Post-training using Reinforcement Learning
⏱️ 60 Minutes
This lab will teach participants how to set up a reinforcement learning environment for post-training LLMs to further enhance reasoning capabilities. We'll cover how to use RL frameworks and how to design reward functions, and we'll guide participants through the full RL fine-tuning process.
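
As an illustrative sketch of reward design for verifiable tasks (the tag convention, answer-extraction rule, and weights below are assumptions, not the lab's actual reward), a rule-based reward might score each sampled completion for format and answer correctness:

```python
# Sketch of a rule-based reward for RL post-training on verifiable tasks:
# reward well-formed reasoning traces and correct final answers.
# Tag convention, extraction rule, and weights are illustrative assumptions.
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion contains exactly one closed <think>...</think> block."""
    return 1.0 if len(re.findall(r"<think>.*?</think>", completion, re.DOTALL)) == 1 else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the last number after the reasoning block matches the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.split("</think>")[-1])
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # Weighted sum; in PPO/GRPO-style training this scalar scores each sampled rollout.
    return 0.2 * format_reward(completion) + 0.8 * accuracy_reward(completion, ground_truth)

print(total_reward("<think>120 / 1.5 = 80</think>\nThe answer is 80.", "80"))  # 1.0
```
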
🎓 Conclusion and Resources
⏱️ 20 Minutes
This closing part will revisit the two post-training approaches discussed earlier, summarizing their suitable applications, limitations, and unresolved challenges. Participants will also receive a comprehensive set of resources, including online Jupyter Notebook tutorials, curated "awesome lists," and best practices for distillation and reinforcement learning (RL) training.

๐Ÿ‘จโ€๐Ÿซ Meet Our Instructors

Zhaopeng Qiu
NVIDIA Solution Architect
Zhaopeng is a Deep Learning Solution Architect at NVIDIA. He graduated from Peking University in 2018. His research focuses on large language models (LLMs), recommender systems, and natural language processing (NLP). He has authored over 20 research papers published in leading journals and conferences, including KDD, AAAI, WWW, TKDE, NAACL, and COLING.
Jingqi Zhang
NVIDIA Solution Architect
Jingqi is a Solution Architect at NVIDIA, specializing in large language models (LLMs). His work focuses on various aspects of LLMs, including training methodologies, practical applications, and reasoning models. Prior to joining NVIDIA, he obtained both his Bachelor's and Master's degrees in Computer Science and Technology from Xi'an Jiaotong University.
Shuang Yu
NVIDIA Solution Architect
Shuang is a Solution Architect at NVIDIA focusing on LLMs. She holds a Bachelor's degree in Automation and a Master's degree in Computer Science from Tsinghua University. Before joining NVIDIA, she worked as a software architect at IBM, where she led the development of an enterprise-level machine learning platform.

๐Ÿ› ๏ธ Technical Requirements

Technical Resources

  • Jupyter notebooks
  • APIs for accessing various reasoning models (free tiers)
  • NeMo framework (open-source)
  • Sample datasets for distillation and RL training
  • Open-source models (Qwen, Llama)

Software Environment

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • Docker (optional)

🎯 Learning Outcomes

By the end of this session, participants will be equipped with:

  • An understanding of how knowledge distillation and reinforcement learning can build reasoning capabilities in LLMs
  • Hands-on experience generating and filtering long Chain-of-Thought data from advanced reasoning models such as DeepSeek-R1
  • Practical experience distilling reasoning capabilities into smaller open-source models (Qwen, Llama) using the NeMo framework
  • Familiarity with RL post-training, including reward design and the overall fine-tuning workflow
  • A curated set of resources and best practices for continued work on distillation and RL training

🔗 Important Links