🧠 Incentivizing Reasoning in LLMs

📅 August 3-7, 2025

📍 Toronto, Canada | KDD 2025 Conference

🤝 Presented by NVIDIA Deep Learning Solution Architects

🎯 Tutorial Abstract

With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, the AI community is increasingly interested in how to unlock similar capabilities in other large language models (LLMs).

This hands-on tutorial dives into practical methods for building reasoning capabilities in LLMs through two primary approaches:

Knowledge Distillation: Transferring capabilities from advanced reasoning models
Reinforcement Learning: Further enhancing capabilities through post-training techniques

Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further.

👥 Target Audience and Prerequisites

Target audience: Data scientists and engineers who are interested in enhancing reasoning capabilities in LLMs for downstream tasks.

Skill level: The tutorial accommodates participants with varying levels of expertise, from beginners to moderately experienced users, and is paced so that beginners can follow comfortably.

Prerequisites:

Basic knowledge of deep learning and LLM concepts
Familiarity with Python programming

📚 Tutorial Outline

🌟 Introduction and Lab Overview

⏱️ 20 Minutes

This part covers the core methodologies for incentivizing reasoning capabilities in large language models (LLMs). First, we will explain how knowledge distillation transfers reasoning abilities from large models to smaller ones, covering concepts such as long Chain-of-Thought data and teacher-student frameworks. Then, we will introduce how reinforcement learning (RL) can be applied to post-train LLMs for enhanced reasoning, including reward design and various RL algorithms.

🔧 Hands-on: Setting Up the Environment for the Lab

⏱️ 10 Minutes

In this part, we will introduce the lab environment and the libraries, frameworks, and datasets used in the exercises. We'll guide participants through a quick verification step to ensure everyone's environment is correctly configured.
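
As a rough illustration of what such a verification step might look like, the sketch below checks the Python version, looks for `nvidia-smi` on the PATH, and reports missing packages. It is not the lab's actual check: the function name `check_environment` and the package list are hypothetical, and finding `nvidia-smi` is only a hint that an NVIDIA driver is installed.

```python
import importlib.util
import shutil
import sys

# Hypothetical package list; replace with the libraries the lab actually uses.
REQUIRED_PACKAGES = ["json", "torch"]


def check_environment(packages, min_python=(3, 8)):
    """Return a small report on whether basic prerequisites appear to be met."""
    return {
        # The page lists Python 3.8+ as the software requirement.
        "python_ok": sys.version_info >= min_python,
        # Only a rough hint: the tool being on PATH does not prove a usable GPU.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # find_spec locates a top-level package without importing it.
        "missing_packages": [
            name for name in packages if importlib.util.find_spec(name) is None
        ],
    }


if __name__ == "__main__":
    for key, value in check_environment(REQUIRED_PACKAGES).items():
        print(f"{key}: {value}")
```

Running the script prints one line per check, so a participant can see at a glance which prerequisite still needs attention.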

🔍 Hands-on: Extracting Reasoning Data

⏱️ 30 Minutes

In this lab, participants will learn how to automatically generate long Chain-of-Thought data that encapsulates reasoning processes from advanced reasoning models, such as DeepSeek-R1. We'll demonstrate how to filter low-quality data and prepare it for distillation into smaller models using NeMo's data processing tools.
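
To make the filtering idea concrete, here is a minimal sketch in plain Python. It does not use NeMo's data processing tools. It assumes each sample is a dict with hypothetical `response` and `reference_answer` fields, that the teacher wraps its reasoning in `<think>...</think>` tags followed by a final answer, and that exact string matching is an acceptable stand-in for a real answer checker.

```python
import re

# Assumed trace layout: <think>reasoning</think> followed by the final answer.
THINK_PATTERN = re.compile(r"^\s*<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def split_trace(response):
    """Return (reasoning, answer), or None if the response is malformed."""
    match = THINK_PATTERN.match(response)
    if match is None:
        return None
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
    if not reasoning or not answer:
        return None
    return reasoning, answer


def keep_sample(sample, min_words=5, max_words=4000):
    """Keep a sample only if it is well formed, reasonably sized, and correct."""
    parts = split_trace(sample["response"])
    if parts is None:
        return False
    reasoning, answer = parts
    n_words = len(reasoning.split())
    if not (min_words <= n_words <= max_words):
        return False
    # Simplification: exact match against the reference answer.
    return answer == sample["reference_answer"].strip()


def filter_samples(samples):
    """Return the subset of samples that pass keep_sample."""
    return [s for s in samples if keep_sample(s)]
```

The word-count thresholds are placeholders. In practice they would be tuned to the dataset, and the exact-match check would likely be replaced by a task-specific verifier.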

🧪 Hands-on: Distilling Reasoning Capability into Smaller Models

⏱️ 60 Minutes

This lab will walk participants through implementing knowledge distillation in open-source models like Qwen and Llama using NVIDIA's NeMo framework. We'll cover the technical details of setting up a distillation training experiment in NeMo, monitoring training progress effectively, and evaluating the results.
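
One common way to distill reasoning is supervised fine-tuning of the student on teacher-generated traces. The outline does not state that the lab does exactly this, so treat it as an assumption. Under that assumption, the sketch below shows how filtered traces might be serialized into JSONL training records. The `prompt` and `completion` field names and the `<think>` layout are hypothetical, not NeMo's required schema, and no NeMo API is called.

```python
import json


def to_sft_record(question, reasoning, answer):
    """Build one prompt/completion pair that keeps the teacher's reasoning."""
    completion = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {"prompt": question, "completion": completion}


def to_jsonl(samples):
    """Serialize samples to a JSONL string, one record per line."""
    lines = [json.dumps(to_sft_record(**s), ensure_ascii=False) for s in samples]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        {
            "question": "What is 2 + 2?",
            "reasoning": "Two plus two is four.",
            "answer": "4",
        }
    ]
    print(to_jsonl(demo))
```

Keeping the reasoning inside the completion means the student is trained to produce the chain of thought as well as the final answer.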

🚀 Hands-on: Post-training using Reinforcement Learning

⏱️ 60 Minutes

This lab will teach participants how to set up a reinforcement learning environment for post-training LLMs to further enhance reasoning capabilities. We'll cover how to use RL frameworks and how to design reward functions, and we'll guide participants through the whole RL fine-tuning process.
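
As one illustration of reward design, the sketch below combines a format reward with an accuracy reward using simple rules. It is not the reward used in the lab. The function names, the `<think>` output layout, the exact-match comparison, and the `format_weight` value are all assumptions made for this example.

```python
import re

# Assumed output layout: <think>reasoning</think> followed by the final answer.
RESPONSE_PATTERN = re.compile(r"^\s*<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def extract_answer(response):
    """Return the text after </think>, or None if the layout is not followed."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        return None
    return match.group(2).strip()


def format_reward(response):
    """1.0 if the response follows the expected layout, else 0.0."""
    return 1.0 if extract_answer(response) is not None else 0.0


def accuracy_reward(response, reference):
    """1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def total_reward(response, reference, format_weight=0.2):
    """Weighted sum of the format reward and the accuracy reward."""
    return format_weight * format_reward(response) + accuracy_reward(response, reference)
```

With this design, a well-formatted but wrong answer still earns a small reward, which nudges the model toward the expected output structure while correctness remains the dominant signal.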

🎓 Conclusion and Resources

⏱️ 20 Minutes

In conclusion, this tutorial will revisit the two post-training approaches discussed earlier, summarizing their suitable applications, limitations, and unresolved challenges. Participants will also receive a comprehensive set of resources, including online Jupyter Notebook tutorials, curated "awesome lists," and best practices for distillation and reinforcement learning (RL) training.

👨‍🏫 Meet Our Instructors

Zhaopeng Qiu

NVIDIA Solution Architect

Currently working at NVIDIA as a Deep Learning Solution Architect. He graduated from Peking University in 2018. His research focuses on large language models (LLMs), recommender systems, and natural language processing (NLP). He has authored over 20 research papers published in leading journals and conferences, including KDD, AAAI, WWW, TKDE, NAACL, and COLING.

Jingqi Zhang

NVIDIA Solution Architect

Currently a Solution Architect at NVIDIA, specializing in large language models (LLMs). His work focuses on various aspects of LLMs including training methodologies, practical applications, and reasoning models. Prior to joining NVIDIA, Jingqi obtained both his Bachelor's and Master's degrees in Computer Science and Technology from Xi'an Jiaotong University.

Shuang Yu

NVIDIA Solution Architect

A Solution Architect at NVIDIA focusing on LLMs. She holds a Bachelor's Degree in Automation and a Master's Degree in Computer Science from Tsinghua University. Before joining NVIDIA, Shuang worked as a software architect at IBM, where she led the development of an enterprise-level machine learning platform.

🛠️ Technical Requirements

Technical Resources

Jupyter notebooks
APIs for accessing various reasoning models (free tiers)
NeMo framework (open-source)
Sample datasets for distillation and RL training
Open-source models (Qwen, Llama)

Software Environment

Python 3.8+
CUDA-compatible GPU (recommended)
Docker (optional)

🎯 Learning Outcomes

By the end of this session, participants will be equipped with:

Theoretical Understanding: Core concepts of knowledge distillation and reinforcement learning for reasoning
Practical Skills: Hands-on experience in data preparation and processing techniques
Technical Implementation: Ability to use NeMo framework for model distillation experiments
Advanced Techniques: Knowledge of applying RL for post-training reasoning enhancement
Real-world Application: Practical experience that can be applied to their own projects

🔗 Important Links

🌐 Tutorial Website: zpqiu.github.io/reasoning-model-tutorial-kdd2025
📚 KDD 2025 Official: kdd2025.kdd.org
🛠️ NeMo Framework: NVIDIA NeMo