One stage of the GPU computing pipeline maps a grid of thread blocks onto the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem regardless of its shape. For problems whose geometry does not fill the bounding box, this is inefficient because part of the grid lies outside the domain and its blocks do no useful work. Two-dimensional triangular domain problems (td-problems) are a particular case of interest. The Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems, and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(lambda) that maps any block lambda to a unique location (i, j) in the triangular domain. The mapping is based on the properties of the lower triangular matrix and works at the block level, so it does not compromise thread organization within a block. The theoretical improvement I from using g(lambda) is upper bounded as I < 2, and the number of wasted blocks is reduced from O(n^2) to O(n). Our experimental results on Nvidia’s Kepler GPU architecture show that g(lambda) is between 12% and 15% faster than the bounding box strategy.
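To make the idea of a block-level triangular mapping concrete, the following is a minimal sketch in Python rather than GPU code. The abstract does not state the formula for g(lambda), so the sketch assumes one plausible form: blocks are enumerated row by row over a lower triangular matrix, and the row index is recovered by inverting the triangular number i(i+1)/2. The function names `g` and `blocks_launched`, and the choice of integer square root, are illustrative assumptions and not necessarily the authors' implementation.

```python
import math

def g(lam):
    """Map a linear block index lam >= 0 to (i, j) with 0 <= j <= i.

    Assumed enumeration: row-major over the lower triangular matrix,
    i.e. lam = i*(i+1)/2 + j.
    """
    # Largest i such that i*(i+1)/2 <= lam; isqrt keeps the arithmetic exact.
    i = (math.isqrt(8 * lam + 1) - 1) // 2
    # Offset of lam within row i.
    j = lam - i * (i + 1) // 2
    return i, j

def blocks_launched(n):
    """Blocks in an n x n bounding box vs. blocks in the n-row triangle."""
    return n * n, n * (n + 1) // 2

if __name__ == "__main__":
    print([g(lam) for lam in range(6)])
    box, tri = blocks_launched(1024)
    print(box / tri)
```

Under this assumed enumeration, the bounding box holds n^2 blocks while the triangle holds n(n+1)/2, so their ratio is 2n/(n+1). That ratio stays below 2 for every n, which is consistent with the bound I < 2 stated above, although the abstract itself does not say how that bound is derived.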