Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new ROCm project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

ROCm API

  • Understanding the role of the ROCm API in the host program
  • Using the ROCm API to query device information and capabilities
  • Using the ROCm API to allocate and deallocate device memory
  • Using the ROCm API to copy data between host and device
  • Using the ROCm API to launch kernels and synchronize the host with the device
  • Using the ROCm API to handle errors and exceptions

HIP Language

  • Understanding the role of the HIP language in the device program
  • Using the HIP language to write kernels that execute on the GPU and manipulate data
  • Using HIP data types, qualifiers, operators, and expressions
  • Using HIP built-in functions, variables, and libraries to perform common tasks and operations

ROCm and HIP Memory Model

  • Understanding the difference between host and device memory models
  • Using ROCm and HIP memory spaces, such as global, shared, constant, and local
  • Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
  • Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, etc.
  • Using the ROCm and HIP memory consistency model and synchronization mechanisms
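Shared memory and barrier synchronization come together in a block-level reduction. The sketch below is a sequential C++ model of that pattern, not HIP device code: the vector stands in for a __shared__ array, the inner loop stands in for the threads of one block, and the end of each outer iteration marks where __syncthreads() would sit. The function name block_reduce_sum and the power-of-two block size are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Sequential model of a block-level tree reduction in shared memory.
// `shared` plays the role of a __shared__ array with one value per thread;
// the inner loop over `tid` stands in for the threads of one block.
// Assumes the block size is a power of two (a common simplification).
float block_reduce_sum(std::vector<float> shared) {
    const unsigned block_dim = static_cast<unsigned>(shared.size());
    assert(block_dim > 0 && (block_dim & (block_dim - 1)) == 0);
    for (unsigned stride = block_dim / 2; stride > 0; stride /= 2) {
        // Threads with tid < stride each add one partner element. Reads come
        // from indices >= stride and writes go to indices < stride, so no two
        // "threads" touch the same slot within one step.
        for (unsigned tid = 0; tid < stride; ++tid) {
            shared[tid] += shared[tid + stride];
        }
        // On the device, __syncthreads() goes here: every thread must finish
        // this step before any thread starts the next one.
    }
    return shared[0];  // thread 0 would write the block's partial sum out
}
```

On a real GPU, leaving out the barrier lets a thread read its partner's slot before the previous step has updated it, which is a classic source of wrong results in this kind of kernel.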

ROCm and HIP Execution Model

  • Understanding the difference between host and device execution models
  • Using ROCm and HIP threads, blocks, and grids to define the parallelism
  • Using HIP thread built-ins, such as threadIdx.x, blockIdx.x, and blockDim.x (legacy macros: hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x)
  • Using ROCm and HIP block-level functions, such as __syncthreads() and __threadfence_block()
  • Using ROCm and HIP grid-level features, such as gridDim.x (legacy macro: hipGridDim_x) and grid-wide synchronization with cooperative groups
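The indexing arithmetic behind threads, blocks, and grids can be checked on the host. This is a minimal C++ model, not HIP code: plain loops stand in for the GPU scheduler, and num_blocks, global_index, and simulate_launch are hypothetical helper names. Inside a kernel the same formula is written blockIdx.x * blockDim.x + threadIdx.x, and the bounds check handles the last, partially filled block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of a 1D HIP launch. On the GPU, the inputs to
// global_index are the built-ins blockIdx.x, blockDim.x and threadIdx.x.

// Number of blocks needed so that gridDim * blockDim >= n (ceiling division).
unsigned num_blocks(std::size_t n, unsigned block_dim) {
    assert(block_dim > 0);
    return static_cast<unsigned>((n + block_dim - 1) / block_dim);
}

// Global index of one thread, as a kernel would compute it.
std::size_t global_index(unsigned block_idx, unsigned block_dim, unsigned thread_idx) {
    return static_cast<std::size_t>(block_idx) * block_dim + thread_idx;
}

// Simulate every thread of the launch, each touching at most one element
// behind the usual `if (i < n)` guard. Returns how often each element was hit.
std::vector<int> simulate_launch(std::size_t n, unsigned block_dim) {
    std::vector<int> hits(n, 0);
    const unsigned grid_dim = num_blocks(n, block_dim);
    for (unsigned b = 0; b < grid_dim; ++b) {
        for (unsigned t = 0; t < block_dim; ++t) {
            const std::size_t i = global_index(b, block_dim, t);
            if (i < n) {  // threads past the end of the data must do nothing
                ++hits[i];
            }
        }
    }
    return hits;
}
```

For n = 1000 and 256-thread blocks, num_blocks returns 4, so the launch has 1024 threads; the last 24 are masked out by the bounds check and every element is processed exactly once.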

Debugging

  • Understanding the common errors and bugs in ROCm and HIP programs
  • Using the Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
  • Using the ROCm Debugger (ROCgdb) to debug ROCm and HIP programs on AMD devices
  • Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

  • Understanding the factors that affect the performance of ROCm and HIP programs
  • Using ROCm and HIP coalescing techniques to improve memory throughput
  • Using ROCm and HIP caching and prefetching techniques to reduce memory latency
  • Using ROCm and HIP shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using ROCm and HIP profiling tools to measure and improve execution time and resource utilization
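The effect of coalescing can be estimated by counting how many aligned memory segments one wavefront touches. This is a simplified host-side C++ model, assuming each lane reads the element at index lane * stride from a segment-aligned array; the function name segments_touched and the 128-byte segment size used below are illustrative assumptions, not a description of any specific AMD GPU's caches or memory controller. Wavefronts are 64 lanes wide on GCN and CDNA GPUs, while RDNA GPUs also run 32-wide wavefronts, so the width is a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Count the distinct segments touched when lane k of one wavefront reads
// element k * stride of an array of `elem_size`-byte values.
// Fewer segments roughly means fewer memory transactions.
std::size_t segments_touched(unsigned wavefront_size, std::size_t stride,
                             std::size_t elem_size, std::size_t segment_bytes) {
    assert(segment_bytes > 0);
    std::set<std::size_t> segments;
    for (unsigned lane = 0; lane < wavefront_size; ++lane) {
        // Byte offset of this lane's element; the array is assumed to start
        // on a segment boundary.
        const std::size_t byte_addr = lane * stride * elem_size;
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

With 64 lanes reading consecutive 4-byte floats (stride 1) and 128-byte segments, the wavefront touches 2 segments; with stride 32 every lane lands in its own segment, 64 in total, which is why strided access patterns waste bandwidth.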

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different AMD devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
28 Hours