Master GPU Programming & Architectures | CUDA & OpenCL Course

Course Details

Exam Registration	757
Course Status	Ongoing
Course Type	Elective
Language	English
Duration	12 weeks
Categories	Computer Science and Engineering, Systems
Credit Points	3
Level	Undergraduate/Postgraduate
Start Date	19 Jan 2026
End Date	10 Apr 2026
Enrollment Ends	02 Feb 2026
Exam Registration Ends	20 Feb 2026
Exam Date	25 Apr 2026 IST
NCrF Level	4.5 — 8.0

Unlock the Power of Parallel Computing: A Deep Dive into GPU Architectures and Programming

In today's data-driven world, the demand for high-performance computing is skyrocketing. From training complex neural networks to simulating physical phenomena, the Graphics Processing Unit (GPU) has emerged as the workhorse of modern computational tasks. If you're a student or professional looking to master this critical technology, a comprehensive course on GPU Architectures and Programming is your gateway to the future of computing.

Course Overview: Bridging Theory and Practice

This meticulously designed 12-week course, offered at the undergraduate and postgraduate level, provides a holistic understanding of how GPUs work and how to program them efficiently. It begins with a solid foundation in traditional CPU architecture before progressively introducing the paradigms that make GPUs uniquely powerful for parallel tasks.

The journey starts with a review of RISC pipelines, cache memory, and SIMD (Single Instruction, Multiple Data) instructions. This sets the stage for understanding how the GPU execution model generalizes SIMD into SIMT (Single Instruction, Multiple Threads). You'll delve into the heart of GPU architecture (Streaming Multiprocessors (SMs), cache hierarchies, and the graphics pipeline) before moving to hands-on programming.
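To make the SIMT idea concrete, here is a minimal sketch in plain C++ that emulates on a CPU how every logical thread runs the same function on its own data element. It is not CUDA code and is not taken from the course material: `ThreadContext`, `vectorAddKernel`, and `launchVectorAdd` are hypothetical names that mirror CUDA's `blockIdx`, `blockDim`, and `threadIdx` values, and the nested loops stand in for threads that a GPU would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in blockIdx/blockDim/threadIdx values.
struct ThreadContext {
    std::size_t blockIdx;   // which block this logical thread belongs to
    std::size_t blockDim;   // number of threads per block
    std::size_t threadIdx;  // position of the thread inside its block
};

// The "kernel": every logical thread runs this same code on its own element.
void vectorAddKernel(const ThreadContext& t, const std::vector<float>& a,
                     const std::vector<float>& b, std::vector<float>& out) {
    std::size_t i = t.blockIdx * t.blockDim + t.threadIdx;  // global index
    if (i < out.size()) {  // guard: the last block may extend past the data
        out[i] = a[i] + b[i];
    }
}

// CPU emulation of a kernel launch: loops replace the parallel hardware threads.
std::vector<float> launchVectorAdd(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t blockDim) {
    assert(a.size() == b.size() && blockDim > 0);
    std::vector<float> out(a.size(), 0.0f);
    std::size_t gridDim = (a.size() + blockDim - 1) / blockDim;  // ceiling division
    for (std::size_t blk = 0; blk < gridDim; ++blk) {
        for (std::size_t thr = 0; thr < blockDim; ++thr) {
            vectorAddKernel(ThreadContext{blk, blockDim, thr}, a, b, out);
        }
    }
    return out;
}
```

Calling `launchVectorAdd(a, b, 4)` on two five-element vectors emulates two blocks of four threads; the bounds check makes the three surplus threads in the second block do nothing, which is the same guard a real kernel typically needs.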

Your Expert Instructor: Prof. Soumyajit Dey

The course is led by Prof. Soumyajit Dey of the Department of Computer Science and Engineering at IIT Kharagpur. With a PhD from IIT Kharagpur and extensive teaching and research experience, Prof. Dey brings deep expertise to the classroom. His research focuses on runtime systems for heterogeneous platforms, including building scheduling frameworks for GPGPU applications in embedded systems and large-scale clusters. He has previously run the popular "High Performance Parallel Programming" course at IIT Kharagpur, experience that informs this curriculum's balance of rigor and practice.

Detailed Course Curriculum: A 12-Week Learning Path

The course is structured to build your knowledge step-by-step, from fundamental concepts to advanced application development.

Week	Topics Covered
1-2	Review of CPU Architecture, SIMD, GPU Architecture Basics (SMs, Caches)
3-4	Introduction to CUDA Programming, Multi-dimensional Data Mapping, Synchronization
5-6	Advanced CUDA: Warp Scheduling, Thread Divergence, Memory Access Coalescing
7-8	Performance Optimization: Reduction Kernels, Kernel Fusion, Thread/Block Tuning
9-10	OpenCL Basics and Programming for Heterogeneous Platforms (CPUs & GPUs)
11-12	Capstone Application Design: Efficient Neural Network Training and Inference

What You Will Learn: Key Skills and Outcomes

Architectural Insights: Understand the fundamental differences between CPU and GPU architectures and the SIMT execution model.
Proficient Programming: Gain hands-on experience with two widely used GPU programming models: NVIDIA's CUDA and the open-standard OpenCL.
Performance Optimization: Master critical optimization techniques like memory coalescing, shared memory usage, and warp-level programming to write efficient, high-speed GPU code.
Real-World Application: Apply your knowledge to design and optimize applications in cutting-edge fields like machine learning, specifically for neural network workloads.
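As a rough illustration of why memory coalescing matters, the following C++ sketch counts how many memory segments one warp would touch for a given access stride. The warp size of 32 and the 128-byte segment size are illustrative assumptions rather than figures from the course; actual transaction sizes depend on the GPU generation, and `segmentsTouched` is a hypothetical helper, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumed, illustrative constants; real values depend on the GPU generation.
constexpr std::size_t kWarpSize = 32;       // threads that issue one instruction together
constexpr std::size_t kSegmentBytes = 128;  // assumed size of one memory transaction

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Counts the distinct segments touched when thread t of a warp reads the
// float at element index t * strideElems from a segment-aligned array.
std::size_t segmentsTouched(std::size_t strideElems) {
    assert(strideElems >= 1);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        std::size_t byteOffset = t * strideElems * sizeof(float);
        segments.insert(byteOffset / kSegmentBytes);
    }
    return segments.size();
}
```

Under these assumptions, `segmentsTouched(1)` is 1 because the 32 consecutive floats fit in a single segment, while `segmentsTouched(32)` is 32 because every thread lands in a different segment. Fewer segments per warp means less wasted memory bandwidth, which is the intuition behind coalesced access.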

Who Should Take This Course?

This course is ideally suited for:

Students in Computer Science, Electronics, and Electrical Engineering.
Programmers and engineers aiming to enter fields like Artificial Intelligence, Scientific Computing, Data Analytics, or Game Development.
Anyone with prerequisites in Programming & Data Structures, Digital Logic, and basic Computer Architecture who wants to leverage parallel hardware.

Industry Relevance and Support

Skills in GPU programming are in high demand across the tech industry. This course aligns with the needs of leading companies including:

NVIDIA & AMD: The primary hardware manufacturers driving GPU innovation.
Google & Amazon: Run large GPU clusters for cloud AI services (Google Cloud offers GPUs alongside its own TPU accelerators; AWS offers GPU-backed EC2 instances).
Big-Data and HPC Companies: Use GPUs for simulation and modeling across the finance, biotechnology, and automotive sectors.

Essential Learning Resources

To complement the lectures, the course references well-known texts in the field:

"Computer Architecture: A Quantitative Approach" by Hennessy and Patterson (for foundational architecture).
"Programming Massively Parallel Processors" by Kirk and Hwu (a standard CUDA textbook).
"Heterogeneous Computing with OpenCL" by Gaster, Howes, Kaeli, Mistry, and Schaa (for OpenCL).

Embark on this 12-week journey to transform from a sequential programmer into a parallel computing expert. By understanding GPU architectures and mastering programming models like CUDA and OpenCL, you'll equip yourself with the skills to tackle the most computationally intensive challenges of tomorrow.
