There is a great option for speeding up your Matlab code: use your graphics card. If you have an Nvidia graphics card, there is a whole universe of optimized code for these cards. The underlying technology is called CUDA, and many of the functions required for transparent use from Matlab already exist. There are three important collections: GPUMat from GP-you, Jacket from AccelerEyes, and the Matlab Parallel Computing Toolbox from The MathWorks. These toolboxes make GPU programming in Matlab very simple. Which one is the best?
Concepts of Matlab GPU programming
There are basically two concepts for using GPUs in Matlab: use a GPU data type and either leave the program execution in the Matlab interpreter, or compile the execution sequence into an intermediate language (mostly CUDA) and execute the result directly on the GPU. If the Matlab interpreter manages the program execution, each command is sent separately to the GPU. This often dramatically slows down the workflow and kills the performance. Thus, a compilation of the execution sequence is required in most cases.
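To illustrate the interpreter-managed style, here is a minimal sketch using the `gpuArray` type from the Parallel Computing Toolbox (the data values and operations are arbitrary examples, not the benchmark code). Each operator below is dispatched to the GPU as a separate call, which is exactly the per-command overhead described above:

```matlab
% Interpreter-managed GPU execution: every operation is a
% separate round-trip to the GPU.
A = gpuArray(rand(1e6, 1));   % copy the data to the GPU
B = exp(A) .* sqrt(A);        % separate GPU calls per operator
C = gather(B);                % copy the result back to host memory
```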
All three toolboxes support this compilation. Jacket from AccelerEyes does it most transparently, which makes it easy for the user. GPUMat starts the compilation most explicitly, calling a C++ compiler and generating mex files. The Matlab Parallel Computing Toolbox does not really compile but suggests using a special version of “arrayfun()”.
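The Parallel Computing Toolbox approach can be sketched as follows: `arrayfun()` applied to a `gpuArray` turns the element-wise function into a single GPU kernel, so the whole expression runs in one launch instead of one launch per operator (again an illustrative example, not the benchmark code):

```matlab
% "Compiled" path in the Parallel Computing Toolbox:
% arrayfun fuses the element-wise function into one GPU kernel.
f = @(a) exp(a) .* sqrt(a);   % element-wise function
A = gpuArray(rand(1e6, 1));   % data on the GPU
B = arrayfun(f, A);           % one fused kernel launch
C = gather(B);                % result back on the host
```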
Not every computation available in Matlab is also available on the GPU.
Matlab Parallel Computing Toolbox (The MathWorks)
- 171 functions (List of functions for GPU programming – Parallel Computing Toolbox )
- Besides GPU programming, this toolbox supports multiple cores and multiple CPUs for any Matlab m-code.
Jacket (AccelerEyes)
- 589 functions (List of functions for GPU programming – Jacket)
- 11 functions for OpenGL based GPU data plotting
GPUMat (GP-you)
- 179 functions (List of functions in GPUmat_User_Guide.pdf in the zip archive)
Test setup
- Intel Core 2 Quad, 3 GHz
- 8 GB RAM
- Nvidia GeForce GTX 275 / Nvidia GeForce GTX 520 Ti
- Windows 7 Ultimate SP1, 64 Bit
- CUDA 4.1
- Matlab 2012a
- Parallel Computing Toolbox 6.0
- GPUMat 0.280, 64 Bit
- Jacket 2.1
Monte Carlo Option Pricing on GTX 275
The figure above presents the results of Monte Carlo option pricing in double precision. Jacket with compilation is by far the best choice for more than 100,000 paths. The Matlab Parallel Computing Toolbox (PCT) does not perform well in most cases, with a speed-up of at most 2x, compared to Jacket's speed-up of up to 14x. Even worse is GPUMat, which never reached a speed-up above 1x and crashed for more than 1,000,000 paths. Interestingly, the Matlab Parallel Computing Toolbox with 4 CPU workers (CPU parallel) did not perform well either: the best speed-up factor was 1.5x. Note: these performance figures depend strongly on the hardware. On server-grade hardware, I have seen PCT with 4 workers achieve a speed-up of about 4x.
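A sweep over path counts like the one plotted above can be timed with a simple harness; this is a hypothetical reconstruction (the exact path counts and repetitions used for the figure are an assumption), calling the CPU reference function `bench_CPU_European` from the appendix:

```matlab
% Hypothetical timing harness for the path-count sweep.
% bench_CPU_European is the CPU reference from the appendix;
% the path counts here are assumed, not the exact ones plotted.
pathCounts = 10.^(3:6);              % 1,000 ... 1,000,000 paths
t = zeros(size(pathCounts));
for k = 1:numel(pathCounts)
    tic;
    bench_CPU_European(pathCounts(k));
    t(k) = toc;                      % wall-clock time in seconds
end
disp([pathCounts(:) t(:)])           % paths vs. runtime
```

Speed-up factors are then simply the ratio of these runtimes to those of the GPU variants.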
Update (2012-07-29): Monte Carlo Option Pricing on GTX 520 Ti
Using a newer GPU (Nvidia GTX 520 Ti), the results change. GPUMat in particular performs much better on the GTX 520 Ti than on the GTX 275; on large datasets it now performs almost as well as Jacket. The Matlab Parallel Computing Toolbox also consistently performs better on the GTX 520 Ti than on the GTX 275. In contrast, Jacket from AccelerEyes is worse on the GTX 520 Ti: its speed-up drops from 14x to 12x. Still, Jacket performs best among the GPU toolboxes.
All toolboxes work only in the context of Matlab, i.e. you need a valid Matlab license.
Matlab Parallel Computing Toolbox (The MathWorks):
- Proprietary license, about 1000 € (commercial)
- Student and academic discounts available
Jacket (AccelerEyes):
- Proprietary license, about 1000 US$ (commercial)
- Academic discount available (price: 350 US$)
GPUMat (GP-you):
- Free open source, GNU GPLv3
We saw only a single test case for benchmarking the GPU toolboxes. The results will differ for other test cases and on other hardware, but in my experience the tendency stays the same.
GPUMat from GP-you is the free entry into GPU programming with Matlab. It allows you to learn the basic principles without license costs. But on the tested hardware, GPUMat did not deliver any advantage.
The Matlab Parallel Computing Toolbox delivers both multi-core CPU and GPU programming. Surprisingly, the multi-core run with 4 cores only speeds up the computation by a factor of 1.5x. On the GPU, the best speed-up is about 2x, which again is not good.
Jacket delivers the best performance: a factor of 14x is about the speed-up one can expect in theory from a GPU computation in double precision on this hardware. This is impressive.
If anyone creates better implementations, drop me a line and I will update this post.
Appendix: Links and Benchmarking Code
http://www.mathworks.com/products/parallel-computing/: Matlab Parallel Toolbox
function V = bench_CPU_European(numPaths)
% Simple European put option, Monte Carlo on the CPU
steps = 250;          % time steps
r     = 0.05;         % risk-free rate
sigma = 0.4;          % volatility
T     = 1;            % maturity in years
dt    = T/steps;
K     = 100;          % strike
S = 100 * ones(numPaths,1);   % all paths start at spot 100
for i = 1:steps
    rnd = randn(numPaths,1);
    S = S .* exp((r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*rnd);
end
V = mean( exp(-r*T) * max(K-S, 0) );   % discounted put payoff
function V = bench_CPUP_European(numPaths)
% Parallel on 4 CPU workers via parfor
paths = ceil(numPaths/4);     % paths per worker
Payoff = zeros(4,1);
parfor iterP = 1:4
    Payoff(iterP) = bench_CPU_European(paths);  % already a mean per worker
end
V = mean( Payoff );
end
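For comparison, a `gpuArray` variant of the benchmark for the Parallel Computing Toolbox could look like the sketch below. This is my own hedged reconstruction, not the exact code used for the plots; it uses the `parallel.gpu.GPUArray` static constructors available in the tested PCT version so that the random numbers are generated directly on the GPU:

```matlab
function V = bench_GPU_European(numPaths)
% Sketch of a gpuArray variant (Parallel Computing Toolbox);
% illustrative reconstruction, not the exact benchmarked code.
steps = 250; r = 0.05; sigma = 0.4; T = 1; dt = T/steps; K = 100;
S = 100 * parallel.gpu.GPUArray.ones(numPaths,1);  % paths on the GPU
for i = 1:steps
    rnd = parallel.gpu.GPUArray.randn(numPaths,1); % GPU random numbers
    S = S .* exp((r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*rnd);
end
V = gather( mean( exp(-r*T) * max(K-S, 0) ) );     % result to the host
end
```

Note that the loop body is still dispatched operation by operation; wrapping it in `arrayfun()` is what the toolbox suggests to reduce the launch overhead.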
other codes on request