HON’s Wiki # CUDA

CUDA was introduced by NVIDIA in 2006. While GPU compute was hackishly possible before CUDA through the fixed graphics pipeline, CUDA and CUDA-capable GPUs provided a somewhat more generalized GPU architecture and a programming model for GPU compute.

TODO

Resources

Setup

Resources

Linux Installation

The toolkit on Linux can be installed in different ways:

If an NVIDIA driver is already installed, it must be compatible with the CUDA version (each toolkit release requires a minimum driver version).

Downloads: CUDA Toolkit Download (NVIDIA)

Ubuntu w/ NVIDIA’s CUDA Repo

  1. Follow the steps to add the NVIDIA CUDA repo: CUDA Toolkit Download (NVIDIA)
    • Use the “deb (network)” method, which will show instructions for adding the repo.
    • But don’t install cuda yet.
  2. (Optional) Remove anything NVIDIA or CUDA from the system to avoid conflicts: apt purge --autoremove 'cuda' 'cuda-*' 'nvidia-*' 'libnvidia-*'
    • Warning: May break your PC. There may be better ways to do this.
    • This is sometimes required to fix broken CUDA updates etc.
  3. Install CUDA from the new repo (includes the NVIDIA driver): apt install cuda
  4. Set up PATH: echo 'export PATH=$PATH:/usr/local/cuda/bin' | sudo tee -a /etc/profile.d/cuda.sh
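After the steps above, a quick sanity check can confirm that the PATH entry from step 4 is active and that the toolkit and driver tools respond. This is a minimal sketch, assuming the default install prefix /usr/local/cuda/bin used in step 4; the helper name path_contains is hypothetical and only exists for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of CUDA): succeeds if directory $1 is an
# entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the install prefix from step 4.
CUDA_BIN=/usr/local/cuda/bin

if path_contains "$CUDA_BIN" "$PATH"; then
    echo "PATH contains $CUDA_BIN"
else
    echo "PATH is missing $CUDA_BIN (log in again or source /etc/profile.d/cuda.sh)"
fi

# Print the toolkit version if the compiler is reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found"
fi

# Print driver and GPU status if the driver tools are reachable.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found"
fi
```

Files in /etc/profile.d are only read by login shells, so a new login (or sourcing the file manually) is usually needed before the PATH change shows up.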

Docker Containers

Docker containers may run NVIDIA applications using the NVIDIA runtime for Docker.

See Docker.

DCGM

Usage

Tools

CUDA-GDB

TODO

CUDA-MEMCHECK

nvprof

NVIDIA Visual Profiler (nvvp)

Nsight (Suite)

Nsight Compute

Info

Installation (Ubuntu)

Usage

Troubleshooting

“Driver/library version mismatch” and similar:

Other related error messages from various tools:

Caused by the NVIDIA driver being updated without the kernel module being reloaded.

Solution: Reboot.
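To check for this condition before rebooting, one option is to compare the version of the currently loaded kernel module with the version of the module installed on disk. This is a rough sketch, assuming the module is named nvidia and exposes its version under /sys/module/nvidia/version; the helper compare_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "match" if both version strings are equal,
# otherwise "mismatch".
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Assumption: the kernel module is named "nvidia".
if [ -r /sys/module/nvidia/version ] && command -v modinfo >/dev/null 2>&1; then
    # Version of the module currently loaded into the kernel.
    loaded=$(cat /sys/module/nvidia/version)
    # Version of the module file installed on disk.
    installed=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded: $loaded, installed: $installed ($(compare_versions "$loaded" "$installed"))"
else
    echo "NVIDIA kernel module not loaded or modinfo unavailable"
fi
```

A "mismatch" result suggests the driver was updated while the old module is still loaded, which is the situation the reboot resolves.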

Hardware Architecture (Info)

SMs and Blocks

Warp Schedulers and Warps

Memories

GPUDirect

GPUDirect Peer to Peer (P2P)

GPUDirect RDMA

GPUDirect Async

GPUDirect Storage

Programming (Info)

General

Thread Hierarchy

Synchronization

Contexts

Streams

Memories

Memory Hierarchy

Register Memory

Local Memory

Shared Memory

Global Memory

Constant Memory

Texture Memory

TODO

Data Alignment

Managed Data

Unified Virtual Addressing (UVA)

Unified Memory

Peer-to-Peer (P2P) Communication

CUDA-Aware MPI

Miscellanea

Performance Measurements (Info)

Time Measurements

Memory Throughput Measurements

Computational Throughput Measurements

Metrics
