Prep CUDA

C/C++ competency

CUDA Kernels

Managing Memory between Host and Device

Concurrency Strategy

GPU programming is usually a 3-step process:

  • Copy (transfer data from host to device)
  • Compute in parallel on the GPU device
  • Copy (transfer data back to host)
    When the steps run one after another, the total runtime is the sum of the three.
  • (CUDA streams) Overlap memory transfers with computation
    The total runtime is then no longer than in the non-overlapped case.

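The two runtime claims above can be written as a simple idealized model. T_H2D, T_kernel and T_D2H are hypothetical names for the time spent in each step; the model ignores per-chunk launch and synchronization overhead and assumes that splitting the work into chunks does not change each step's total time.

```latex
T_{\text{seq}} = T_{\text{H2D}} + T_{\text{kernel}} + T_{\text{D2H}}

\max\left(T_{\text{H2D}},\, T_{\text{kernel}},\, T_{\text{D2H}}\right) \;\le\; T_{\text{overlap}} \;\le\; T_{\text{seq}}
```

With made-up step times of 4 ms, 6 ms and 4 ms, the sequential total is 4 + 6 + 4 = 14 ms, and even perfect overlap cannot go below 6 ms, the time of the slowest step. On a GPU with a single copy engine, host-to-device and device-to-host transfers cannot overlap each other, so the achievable total sits closer to the upper bound.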
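A minimal CUDA sketch of both strategies, assuming a CUDA-capable GPU and the `nvcc` toolchain. The kernel `scale`, the helpers `run_sequential`, `run_streamed` and `demo_passes`, the `CHECK` macro, the block size and the stream count are all hypothetical, chosen only for illustration; the calls themselves (`cudaMalloc`, `cudaMemcpy`, `cudaMemcpyAsync`, `cudaStreamCreate`, `cudaMallocHost`) are the standard CUDA runtime API. `run_sequential` does copy, compute, copy back to back; `run_streamed` splits the array into chunks and issues copy-in, kernel and copy-out for each chunk on its own stream.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                   cudaGetErrorString(err_));                        \
      std::exit(EXIT_FAILURE);                                       \
    }                                                                \
  } while (0)

// Hypothetical example kernel: multiply every element by `factor`.
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Sequential version: the three steps run one after another.
void run_sequential(float* h, int n) {
  size_t bytes = static_cast<size_t>(n) * sizeof(float);
  float* d = nullptr;
  CHECK(cudaMalloc(&d, bytes));
  CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));  // 1. host -> device
  scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);             // 2. compute on device
  CHECK(cudaGetLastError());
  CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));  // 3. device -> host (waits for kernel)
  CHECK(cudaFree(d));
}

// Streamed version: each stream copies in, computes and copies out its own
// chunk, so one chunk's transfer can overlap another chunk's kernel.
// For real overlap the host buffer must be pinned (page-locked).
void run_streamed(float* h_pinned, int n, int num_streams) {
  float* d = nullptr;
  CHECK(cudaMalloc(&d, static_cast<size_t>(n) * sizeof(float)));
  std::vector<cudaStream_t> streams(num_streams);
  for (auto& st : streams) CHECK(cudaStreamCreate(&st));

  int chunk = (n + num_streams - 1) / num_streams;
  for (int s = 0; s < num_streams; ++s) {
    int offset = s * chunk;
    int len = std::min(chunk, n - offset);
    if (len <= 0) break;  // more streams than chunks: nothing left to do
    size_t bytes = static_cast<size_t>(len) * sizeof(float);
    CHECK(cudaMemcpyAsync(d + offset, h_pinned + offset, bytes,
                          cudaMemcpyHostToDevice, streams[s]));
    scale<<<(len + 255) / 256, 256, 0, streams[s]>>>(d + offset, len, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_pinned + offset, d + offset, bytes,
                          cudaMemcpyDeviceToHost, streams[s]));
  }
  CHECK(cudaDeviceSynchronize());  // wait for every stream to finish
  for (auto& st : streams) CHECK(cudaStreamDestroy(st));
  CHECK(cudaFree(d));
}

// Run both versions on one pinned buffer. Each run doubles every element,
// so afterwards h[i] should equal 4 * its initial value.
bool demo_passes(int n, int num_streams) {
  float* h = nullptr;
  CHECK(cudaMallocHost(reinterpret_cast<void**>(&h),
                       static_cast<size_t>(n) * sizeof(float)));
  for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i % 100);
  run_sequential(h, n);
  run_streamed(h, n, num_streams);
  bool ok = true;
  for (int i = 0; i < n; ++i)
    ok = ok && (h[i] == 4.0f * static_cast<float>(i % 100));
  CHECK(cudaFreeHost(h));
  return ok;
}

int main() {
  bool ok = demo_passes(1 << 20, 4);
  assert(ok);
  std::printf(ok ? "ok\n" : "mismatch\n");
  return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Built with something like `nvcc streams_demo.cu -o streams_demo` (the file name is arbitrary), the program prints `ok` when every element matches. The host buffer comes from `cudaMallocHost` because asynchronous copies from ordinary pageable memory generally do not overlap with kernel execution. How much overlap you actually get also depends on the GPU's copy engines, so the streamed version is only expected to be no slower, not necessarily faster, at a given size.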
Author: Eva W.
Posted on: 2021-08-12
Updated on: 2021-08-12
