Optimizing HPC programs


Optimizing code is a full-time job. This page gathers a few tips and the main compiler options for getting good performance. Many tutorials are available on the Internet for going further. Before attempting aggressive optimization, developers should track down common mistakes such as bad memory management, bad I/O, independent (loop-invariant) operations left inside loops, stripping, etc. The algorithms used should also be checked.

I try to keep these pages up to date, but some flags may be deprecated.

A few words

Optimization order

A good order for optimization is:
Algorithm optimization (with future parallelization in mind) ⇒ Code optimization ⇒ Parallelization

External resources

http://wiki.gentoo.org/wiki/GCC_optimization/en
https://software.intel.com/en-us/articles/step-by-step-optimizing-with-intel-c-compiler

Basics

To get the most performance out of the compiler, and assuming you compile on the same hardware architecture (CPU, motherboard, etc.) as the one that will run the calculations, use the following options:

GNU gfortran

Standard:

-O3 -march=native -mtune=native

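Aggressive optimization (use with care: it may slow the code down or give wrong results):

-O4 -ffast-math -fforce-addr -fstrength-reduce -frerun-cse-after-loop -fexpensive-optimizations -fcaller-saves -funroll-loops -funroll-all-loops -fno-rerun-loop-opt

Note: GCC treats any optimization level above -O3 as -O3, so -O4 adds nothing over -O3. Some of the other flags (for example -fforce-addr, -fstrength-reduce and -fno-rerun-loop-opt) date from old GCC releases and appear to be accepted but ignored by recent versions; check the manual of the GCC version you use.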

Intel ifort

-O3 -fast -xHost

If -fast causes problems (probably because static libraries are missing), replace it with -O3 -xHost -no-prec-div -ipo. If you still have problems (at link time), replace -ipo with -ip. In the end, -O3 -xHost is enough if none of the other flags work properly.

Warning: if you need precision (for example in long computations such as CFD), do not use -no-prec-div; keep only -O3 -xHost. -no-prec-div may reduce the precision of division operations.

Vectorize

GCC (gcc/gfortran)

To get information about what the compiler vectorizes, add:

-ftree-vectorizer-verbose=2

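To change the verbosity, change the number at the end (0 to 6).

Note: recent GCC versions deprecate this flag in favor of -fopt-info-vec (loops that were vectorized) and -fopt-info-vec-missed (loops that were not).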

Intel Compiler (icc/ifort)

To get information about what the compiler vectorizes, add the flag below.

Note: -vec-report is now deprecated; use -qopt-report instead. The report is saved in a .optrpt file.

-qopt-report=1

To change the verbosity, change the number at the end (0 to 5). For the most detailed report, use:

-qopt-report=5

To get more information on the SIMD instructions used, add -fcode-asm -Faasm.s (that is, -Fa followed by the output file name asm.s) to get the assembly generated by the compiler:

ifort -O3 -qopt-report=1 test.f90 -fcode-asm -Faasm.s

Then take a look at asm.s. For example, SSE2 instructions look like this:

009a5 f2 44 0f 58 c7   addsd %xmm7, %xmm8                     #test.f90:35.83
009aa f2 44 0f 59 c0   mulsd %xmm0, %xmm8                     #test.f90:35.83

If you used -xHost or a similar optimization option, AVX and FMA instructions may show up:

00942 c4 42 e9 a9 d8   vfmadd213sd %xmm8, %xmm2, %xmm11       #test.f90:34.25
00991 c5 4d 58 0b      vaddpd (%rbx), %ymm6, %ymm9            #test.f90:35.61