Did you know that you can use a GPU in different ways to speed up your computations? Let’s have a closer look.

High-performance computer architectures are developing quickly, with more and faster cores in both CPUs (Central Processing Units) and GPUs (Graphics Processing Units). Recently, a new generation of GPUs appeared, offering teraFLOPS-level performance on a single card.

CPU versus GPU

The GPU and CPU architectures have their own advantages and disadvantages.

CPUs are optimized for sequential performance and are good at instruction-level parallelism, pipelining, etc. With a powerful hierarchical cache and a sophisticated scheduling mechanism, their sequential performance is very good.

In contrast, GPUs are designed for high instruction throughput, with much weaker caches and scheduling abilities. In GPU programming, users have to spend more time ensuring good scheduling, load balancing and memory access, tasks that are largely handled automatically on a CPU. As a result, GPU kernels tend to be simple and computationally intensive.

The GPU was originally designed to accelerate the manipulation of images in a frame buffer that was mapped to an output display. GPUs were used as part of a so-called graphics pipeline, meaning that the graphics data was sent through a sequence of stages that were implemented as a combination of CPU software and GPU hardware. Nowadays, GPUs are increasingly used as GPGPUs (General-Purpose GPUs) to speed up computations.

Two ways of using a GPU

A GPU can be used in two different ways:

  • as an independent compute node replacing the CPU or
  • as an accelerator.

In the first case, the algorithm is split into a number of independent sub-problems that are transferred to the GPU and computed separately (with little or no communication). To achieve the best performance, the data is kept on the GPU whenever possible. As GPUs generally have much less memory available than CPUs, this significantly limits the size of the problem that can be solved.

In the second case, the GPU is used as an accelerator: the problem is solved on the CPU, while some computationally intensive parts of the algorithm are off-loaded to the GPU. Here, the data is transferred to and from the GPU for each new task.
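The difference in data movement between the two modes can be illustrated with a small sketch. It does not use a real GPU: `FakeGPU`, `run_as_replacement` and `run_as_accelerator` are hypothetical names introduced here, and the "kernel" is a placeholder that doubles each value. The sketch only counts host-to-device and device-to-host copies.

```python
class FakeGPU:
    """Hypothetical stand-in for a GPU; it only counts host<->device copies."""

    def __init__(self):
        self.transfers = 0
        self.memory = None

    def upload(self, data):
        # Host -> device copy.
        self.transfers += 1
        self.memory = list(data)

    def download(self):
        # Device -> host copy.
        self.transfers += 1
        return list(self.memory)

    def run_kernel(self):
        # Placeholder computation executed "on the device".
        self.memory = [2.0 * v for v in self.memory]


def run_as_replacement(data, n_tasks):
    """Data is uploaded once and stays on the device between tasks."""
    gpu = FakeGPU()
    gpu.upload(data)
    for _ in range(n_tasks):
        gpu.run_kernel()
    return gpu.download(), gpu.transfers


def run_as_accelerator(data, n_tasks):
    """Data travels to and from the device for every task."""
    gpu = FakeGPU()
    for _ in range(n_tasks):
        gpu.upload(data)
        gpu.run_kernel()
        data = gpu.download()  # the CPU would continue its own work here
    return data, gpu.transfers
```

Both functions return the same result, but in this toy model the accelerator variant performs two copies per task, while the replacement variant performs two copies in total.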

Let’s take the wave equation as an example. The wave equation can be formulated in the time domain or in the frequency domain. In the time domain, it is usually solved with a time-stepping scheme, which does not require the solution of a linear system of equations. In the frequency domain (the Helmholtz equation), in contrast, it requires the solution of a linear system of equations in matrix form, typically with an iterative method.
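For reference, the two formulations can be written as follows. This is a sketch for the constant-density acoustic case; the symbols (wavefield u, velocity c, source s, angular frequency ω, discrete Laplacian ∇_h², matrix A) and the Fourier convention u = û e^{-iωt} are assumptions made here, not taken from the text above.

```latex
% Time domain: wave equation and an explicit second-order time-stepping update
\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} - \nabla^2 u = s,
\qquad
u^{n+1} = 2u^{n} - u^{n-1} + c^2 \Delta t^2 \left( \nabla_h^2 u^{n} + s^{n} \right)

% Frequency domain: Helmholtz equation with wavenumber k = \omega / c,
% which after discretization becomes a linear system
-\nabla^2 \hat{u} - k^2 \hat{u} = \hat{s}
\quad \longrightarrow \quad
A \hat{u} = \hat{s}
```

The update for u^{n+1} only needs values from the two previous time steps, whereas the discretized Helmholtz equation couples all unknowns through the matrix A.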

The simplicity of time-stepping algorithms makes it easy to use GPUs of modest size as accelerators to speed up the computations.
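To show how simple such a time-stepping algorithm can be, here is a minimal sketch of an explicit scheme for the 1D wave equation in plain Python. The function names, the grid size, the zero boundary values and the initial pulse are illustrative assumptions; a real GPU implementation would run the inner loop as a kernel with one thread per grid point.

```python
import math


def step_wave_1d(u_prev, u_curr, courant):
    """Advance a 1D wavefield by one time step (explicit leapfrog scheme).

    courant = c * dt / dx. Boundary values are assumed fixed at zero.
    """
    c2 = courant * courant
    n = len(u_curr)
    u_next = [0.0] * n
    # Each interior point depends only on its neighbours at the current step,
    # so all points can be updated independently of each other.
    for i in range(1, n - 1):
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


def simulate(n_points=101, n_steps=50, courant=0.5):
    """Propagate a smooth pulse that starts in the middle of the grid."""
    mid = n_points // 2
    u_curr = [math.exp(-0.05 * (i - mid) ** 2) for i in range(n_points)]
    u_curr[0] = u_curr[-1] = 0.0
    # Setting u_prev equal to u_curr approximates a field initially at rest.
    u_prev = list(u_curr)
    for _ in range(n_steps):
        u_prev, u_curr = u_curr, step_wave_1d(u_prev, u_curr, courant)
    return u_curr
```

No linear system is solved anywhere: each step is a fixed, local update, which is why this kind of algorithm maps well onto a GPU.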

However, it is not trivial to use GPUs as accelerators for iterative methods that require the solution of a linear system of equations. The main reason is that most iterative methods consist of matrix-vector and vector-vector operations (e.g. matrix-vector multiplication). When the GPU is used as an accelerator, the matrices need to be distributed across GPUs. The vectors “live” on the CPU and are transferred, when needed, to the relevant GPU to execute the matrix-vector multiplications.
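The structure described above can be sketched as follows. The conjugate gradient method is used here only as a familiar example of an iterative method built from matrix-vector and vector-vector operations; it is an assumption for illustration, requires a symmetric positive definite matrix, and is not a suitable solver for the Helmholtz equation itself. The "devices" are simulated by a plain loop over row blocks, and all function names are hypothetical.

```python
def split_rows(matrix, n_devices):
    """Split a dense matrix (list of rows) into contiguous row blocks."""
    n = len(matrix)
    size = (n + n_devices - 1) // n_devices
    return [matrix[i:i + size] for i in range(0, n, size)]


def distributed_matvec(row_blocks, x):
    """Matrix-vector product with one row block per (simulated) device.

    In a real setup, x would be copied from the CPU to each GPU and every
    partial result copied back; here this is only mimicked by the loop.
    """
    y = []
    for block in row_blocks:  # one iteration per device
        y.extend(sum(a * xi for a, xi in zip(row, x)) for row in block)
    return y


def dot(a, b):
    """Vector-vector operation, performed on the CPU."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(row_blocks, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A stored in row blocks."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = distributed_matvec(row_blocks, p)  # the only step that uses the devices
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient(split_rows([[4.0, 1.0], [1.0, 3.0]], 2), [1.0, 2.0])` solves a small 2x2 system with one row per simulated device. Note that every iteration sends a vector to the devices and brings one back, which is exactly the transfer cost that makes this mode non-trivial.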

Accelerator or replacement?

Ideally, GPUs would be used as a replacement, but their limited memory makes this difficult for large numerical problems. There seems to be a trend in which CPUs and GPUs are merging, so that the same memory can be accessed equally fast from the GPU and the CPU. In that case, the question “accelerator or replacement?” would become irrelevant, as one could alternate between the two types of hardware without taking the data location into account.

How do you use your GPU: as a replacement or as an accelerator? Let us know in the comment box.

Related posts

Wave propagation simulation in 2D
Parallel wave modelling on a 8-GPU monster machine
