- `default(firstprivate)`: The “default default” is `shared`, which may be inefficient, whereas `firstprivate` copies the initial value and then uses a local variable instead.

TODO Cleanup.
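A minimal host-side sketch of `default(firstprivate)`, assuming a compiler that accepts this clause in C (the C/C++ form was added in OpenMP 5.1). The function name `scale_and_shift` is hypothetical. It also shows one detail worth knowing: privatizing a pointer copies the pointer, not the array behind it, so every thread still writes into the same output buffer.

```c
#include <stddef.h>

/* Computes out[i] = a * x[i] + b for i in [0, n) using host threads.
 *
 * default(firstprivate): each thread starts with its own copy of n, a, b,
 * x and out, initialised from their values before the region.
 * x and out are pointers, so only the pointers are copied; every thread
 * still reads the caller's x array and writes into the caller's out array.
 * The loop variable i is private to each thread, as for any omp for loop. */
void scale_and_shift(size_t n, float a, float b, const float *x, float *out)
{
    #pragma omp parallel for default(firstprivate)
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + b;
}
```

Writing to a scalar such as `a` inside the loop would only change that thread's copy; the caller's value would be untouched.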
- `distribute`, `target data`
- `target`: Run the region on a device (GPU etc.). Only a single thread will run if nothing more is specified. Often combined directly with `teams` and `parallel`.
- `teams`: Spawn a league of teams (like CUDA blocks).
- `parallel` (with `target`): Spawn threads within the teams (like CUDA threads within the blocks).
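- Use `parallel` regions inside a `target teams` region to control which parts should run with all threads and which should only be run by the initial threads.
- `barrier` within parallel regions to synchronize the threads.
- `target update ...` to update variables to/from the device between target regions, e.g. inside a `target data` region.
- Put `#pragma omp begin declare target` before a function and `#pragma omp end declare target` after it. The function can then be used by both host and target.
- `nowait`: Let the host thread continue without waiting for the target region to finish.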
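- `depend(in/out: <var>)` may be used to declare variable dependencies for regions, mainly for use with tasks (like `nowait` target regions).

```c
// CUDA-equivalent: compute_stuff<<<1, 4>>>(args)
// before_stuff(), compute_stuff(), after_stuff() and args are placeholders.
#pragma omp target teams num_teams(1)
{
    before_stuff();  // Run only by the initial thread of the single team
    #pragma omp parallel num_threads(4) default(firstprivate)
    {
        compute_stuff(args);  // Run by each of the 4 threads
    }
    after_stuff();  // Back to the initial thread only
}
```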
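The example above cannot be compiled on its own. A runnable variant with the same shape (one team, four threads) might look like the following sketch. Assumptions: the names `square`, `square_all` and `N` are hypothetical; the `map` clauses (not covered in these notes) copy the arrays to and from the device; and the older OpenMP 4.x spelling `#pragma omp declare target` is used because older compilers accept it too. If no device is available, the region falls back to running on the host.

```c
#define N 1000

/* Helper compiled for both host and device (OpenMP 4.x spelling of
 * begin/end declare target). */
#pragma omp declare target
static float square(float v)
{
    return v * v;
}
#pragma omp end declare target

/* out[i] = in[i] * in[i], using one team (like one CUDA block) with four
 * threads (like four CUDA threads in that block).
 * map(to:) copies `in` to the device before the region; map(from:) copies
 * `out` back to the host afterwards. */
void square_all(const float in[N], float out[N])
{
    #pragma omp target teams num_teams(1) map(to: in[0:N]) map(from: out[0:N])
    {
        /* Code placed here would run on the team's initial thread only. */
        #pragma omp parallel num_threads(4) default(firstprivate)
        {
            /* `in` and `out` are pointers, so the firstprivate copies
               still point at the same mapped arrays. */
            #pragma omp for
            for (int i = 0; i < N; i++)
                out[i] = square(in[i]);
        }
    }
}
```

Because the arrays are accessed through pointers, no explicit `shared` clause is needed here. A fixed-size array declared locally inside the region would instead be copied per thread under `default(firstprivate)`.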
- Clang flags for offloading: `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86` (NVIDIA RTX 3090) or `-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx1030` (AMD RX 6900 XT).
- Show optimization remarks: `-Rpass=openmp-opt -Rpass-missed=openmp-opt`. Use `-Rpass-analysis=openmp-opt` too for even more info.
- Set the environment variable `LIBOMPTARGET_INFO=1` to show runtime info, like when kernels are executed on the devices.
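By default the OpenMP runtime falls back to the host when no usable device is found, so it can be useful to check where a target region actually ran. A minimal sketch, assuming the hypothetical function name `ran_on_device`; `omp_is_initial_device()` is the standard OpenMP runtime routine that returns true on the host. The `#ifdef _OPENMP` guards keep it compilable without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if a target region really ran on an offload device, and 0 if
 * it ran on the host (no usable device, or built without OpenMP). */
int ran_on_device(void)
{
    int on_device = 0;
#ifdef _OPENMP
    /* Scalars are firstprivate on `target` by default, so map the result
       back to the host explicitly. */
    #pragma omp target map(from: on_device)
    {
        on_device = !omp_is_initial_device();
    }
#endif
    return on_device;
}
```

Built with one of the flag sets above and run with `LIBOMPTARGET_INFO=1` in the environment, the runtime should additionally report the kernel launch, which helps confirm a result of 1.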