- `default(firstprivate)`: The “default default” is shared, which may be inefficient, whereas `firstprivate` copies the initial value into each thread and then uses that local variable instead.

TODO: Cleanup.
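A minimal sketch of the data-sharing difference on a plain host `parallel for`. The function `scale_into` is hypothetical. It lists the variables explicitly with `default(none) firstprivate(...)` instead of writing `default(firstprivate)`, because C/C++ only accept the latter from OpenMP 5.1 on. For these local variables the effect is the same.

```c
#include <assert.h>

/* Hypothetical helper: out[i] = in[i] * scale.
   default(none) forces every variable to be listed; firstprivate gives each
   thread its own copy, initialized from the value before the region. */
void scale_into(const int *in, int *out, int n, int scale) {
    assert(n >= 0);
    /* `scale` and `n` are copied per thread. The pointers `in` and `out` are
       copied too, but every copy still points at the same (shared) arrays. */
    #pragma omp parallel for default(none) firstprivate(in, out, n, scale)
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * scale;
    }
}
```

Calling `scale_into(in, out, 5, 3)` on `{1, 2, 3, 4, 5}` fills `out` with `{3, 6, 9, 12, 15}`, with or without OpenMP enabled.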
distribute, target data

- `target`: Run the region on a device (GPU etc.). Only a single thread will run if nothing more is specified. Often combined directly with `teams` and/or `parallel` (e.g. `target teams`, `target parallel`).
- `teams`: Spawn a league of teams (like CUDA blocks).
- `parallel` (inside `target`): Spawn threads within the teams (like CUDA threads within the blocks).
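As a rough sketch of how these constructs stack, a SAXPY loop (`y = a*x + y`, a hypothetical example not taken from these notes) can use the combined `target teams distribute parallel for` construct. `teams` creates the league (CUDA blocks), `distribute` splits the iterations across the teams, and `parallel for` splits each team's share across its threads. The `map` clauses copy `x` to the device and `y` both ways. If no device is available, the region runs on the host instead (unless offloading is made mandatory).

```c
#include <assert.h>

/* Hypothetical SAXPY kernel: y[i] = a * x[i] + y[i]. */
void saxpy(int n, float a, const float *x, float *y) {
    assert(n >= 0);
    /* target            -> run on the default device
       teams distribute  -> spread iterations over a league of teams (~CUDA blocks)
       parallel for      -> spread each team's iterations over its threads (~CUDA threads)
       The scalars n and a are firstprivate in the target region by default. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

With `x = {1, 2, 3, 4}`, `y = {1, 1, 1, 1}` and `a = 2`, `y` ends up as `{3, 5, 7, 9}`.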
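- Use `parallel` inside a `target teams` region to control which parts should run with all threads and which should only be run by the initial thread of each team.
- Use `#pragma omp barrier` within parallel regions to synchronize the threads of a team.
- Use `#pragma omp target update ...` to update variables to/from the device while inside a `target data` region.
- To use a function on the device, put `#pragma omp begin declare target` before it and `#pragma omp end declare target` after it. It can then be used by both host and target.
- Target regions can be made asynchronous with `nowait`.
- `depend(in/out: <var>)` may be used to declare variable dependencies for regions, mainly for use with tasks (like `nowait` target regions).

```c
// CUDA-equivalent: compute_stuff<<<1, 4>>>(args)
#pragma omp target teams num_teams(1)
{
    before_stuff();  // runs only on the initial thread of the single team
    #pragma omp parallel num_threads(4) default(firstprivate)
    {
        compute_stuff(args);  // runs on all 4 threads
    }  // implicit barrier: all threads finish before continuing
    after_stuff();   // runs only on the initial thread again
}
```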
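The remaining tips (declare target, `target update` inside `target data`, and `nowait` with `depend`) can be sketched together in one file. All names (`square`, `square_bias_double`, `add_then_scale`) are hypothetical. The sketch uses the older delimited `declare target` / `end declare target` form rather than the OpenMP 5.1 `begin declare target` spelling, since more compilers accept it. The results are the same whether the regions run on a device, fall back to the host, or OpenMP is disabled.

```c
#include <assert.h>

/* Compiled for both host and device. */
#pragma omp declare target
static int square(int v) { return v * v; }
#pragma omp end declare target

/* Hypothetical pipeline: square on the device, add `bias` on the host,
   then double on the device, while buf stays mapped the whole time. */
void square_bias_double(int *buf, int n, int bias) {
    assert(n >= 0);
    #pragma omp target data map(tofrom: buf[0:n])
    {
        /* buf is already present, so this map causes no extra transfer. */
        #pragma omp target teams distribute parallel for map(tofrom: buf[0:n])
        for (int i = 0; i < n; i++)
            buf[i] = square(buf[i]);

        /* Copy the device results to the host without ending the data region. */
        #pragma omp target update from(buf[0:n])
        for (int i = 0; i < n; i++)
            buf[i] += bias;  /* host-side edit */
        /* Push the host edit back to the device copy. */
        #pragma omp target update to(buf[0:n])

        #pragma omp target teams distribute parallel for map(tofrom: buf[0:n])
        for (int i = 0; i < n; i++)
            buf[i] *= 2;
    }  /* end of the data region: buf is copied back (tofrom) */
}

/* Two asynchronous (deferred) target tasks ordered by depend(). */
void add_then_scale(int *a, int n) {
    assert(n >= 0);
    #pragma omp target teams distribute parallel for nowait \
        depend(out: a[0:n]) map(tofrom: a[0:n])
    for (int i = 0; i < n; i++)
        a[i] += 1;

    /* inout on the same section: waits for the first task to finish. */
    #pragma omp target teams distribute parallel for nowait \
        depend(inout: a[0:n]) map(tofrom: a[0:n])
    for (int i = 0; i < n; i++)
        a[i] *= 10;

    /* Wait for both deferred target tasks before returning. */
    #pragma omp taskwait
}
```

For example, `{1, 2, 3}` passed to `square_bias_double` with `bias = 1` becomes `{4, 10, 20}`, and `{1, 2}` passed to `add_then_scale` becomes `{20, 30}`.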
- Clang offloading flags: `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_86` (NVIDIA RTX 3090) or `-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx1030` (AMD RX 6900 XT).
- Show OpenMP optimization remarks with `-Rpass=openmp-opt -Rpass-missed=openmp-opt`. Add `-Rpass-analysis=openmp-opt` for even more info.
- Set the environment variable `LIBOMPTARGET_INFO=1` to show runtime info, such as when kernels are executed on the devices.
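To check that a build with the flags above really offloads, a small probe such as the hypothetical `ran_on_device` below can ask the OpenMP runtime where a `target` region executed. `omp_get_num_devices()` and `omp_is_initial_device()` are standard OpenMP API routines. Running the program with `LIBOMPTARGET_INFO=1` should also print a message when the kernel is entered on a device.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if a target region actually ran on an accelerator, 0 if it ran
   on the host (no device, host fallback, or OpenMP disabled). */
int ran_on_device(void) {
    int on_device = 0;
#ifdef _OPENMP
    /* Scalars are firstprivate in target regions by default, so map the
       result explicitly to get the device-side write back. */
    #pragma omp target map(tofrom: on_device)
    {
        on_device = !omp_is_initial_device();
    }
#endif
    return on_device;
}

/* Prints how many devices the runtime sees and where the probe ran. */
void report_offload(void) {
#ifdef _OPENMP
    int devices = omp_get_num_devices();
    assert(devices >= 0);
    printf("devices visible: %d\n", devices);
#else
    printf("built without OpenMP\n");
#endif
    printf("target region ran on %s\n", ran_on_device() ? "a device" : "the host");
}
```

If this reports "the host" even though a GPU is present, the offload flags or the device runtime are the first things to check.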