Abstract: In this article, we extend the CUDAMPILIB framework, which facilitates the programming of parallel applications for multi-node systems with one or more graphics processing units (GPUs) per ...