Improving the GPU space of computation under triangular domain problems

Abstract

One stage of the GPU computing pipeline maps a grid of thread blocks to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem regardless of its shape. For problems with non-square geometry, this is wasteful because part of the space of computation is executed without doing any useful work. Two-dimensional triangular domain problems (td-problems) are a particular case of interest. Problems such as the Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems, and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(lambda) that maps each block index lambda to a unique location (i, j) in the triangular domain. The mapping is based on the properties of the lower triangular matrix and works at the block level, so it does not compromise thread organization within a block. The theoretical improvement I from using g(lambda) is bounded above by I < 2, and the number of wasted blocks is reduced from O(n^2) to O(n). Our experimental results on Nvidia’s Kepler GPU architecture show that g(lambda) is between 12% and 15% faster than the bounding box strategy.
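
The abstract does not give the closed form of g(lambda), so the following is only a minimal sketch of one way such a block-level mapping could look. It assumes that blocks are numbered row by row along the lower triangle, and it inverts the triangular numbers to recover (i, j). The function names, the row-major numbering, and the use of n as the number of blocks per side are illustrative assumptions, not the authors' implementation. The second function compares how many blocks a bounding box and a triangular grid would launch, which is where a ratio below 2 comes from.

```python
import math


def g(lam: int) -> tuple[int, int]:
    """Map a linear block index lam to (i, j) with 0 <= j <= i.

    Assumption: blocks are numbered row by row over the lower triangle,
    so row i starts at index i * (i + 1) / 2.
    """
    # Largest i with i * (i + 1) / 2 <= lam; isqrt keeps the arithmetic exact.
    i = (math.isqrt(8 * lam + 1) - 1) // 2
    # Offset of lam inside row i.
    j = lam - i * (i + 1) // 2
    return i, j


def block_counts(n: int) -> tuple[int, int]:
    """Blocks launched for n blocks per side: (bounding box, triangular grid)."""
    return n * n, n * (n + 1) // 2


if __name__ == "__main__":
    for lam in range(6):
        print(lam, g(lam))
    box, tri = block_counts(1024)
    # The ratio equals 2n / (n + 1), which approaches but never reaches 2.
    print(box / tri)
```

Under these assumptions, the bounding box launches n^2 blocks while the triangular grid launches n(n + 1)/2, so the ratio 2n/(n + 1) stays strictly below 2 for every n. In a real kernel the same index arithmetic would run once per block, leaving the thread layout inside each block untouched.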

Publication
arXiv
Cristobal A Navarro
Professor at the Universidad Austral de Chile


Nancy Hitschfeld Kahler
+Lab founder | Full Professor Universidad de Chile

Full Professor at the Department of Computer Science, University of Chile. Her main research interests include geometric modeling, geometric meshes, and parallel algorithms (GPU computing), with a focus on computational science and engineering applications.