Debugging and Optimizing

BIG CHANGE IN PROGRESS…

This page is dedicated to debugging methods for HPC codes. These tools and tips come from my own experience; HPC developers should know these basic options to save time in their work. All the methods here provide a way to trace a bug, which means finding the exact line of code that generates it.

Compilers used :

  • gcc and gfortran 4.8.2 (from ubuntu 14.04 x86_64)
  • icc and ifort 14.0.3 (from Intel Parallel Studio 2013 SP1 x86_64)

Tools used :

  • valgrind 3.10.0
  • gdb 7.7

Main types of bugs

When developing HPC programs, the bugs encountered are often the same. Here is a list of the most common ones :

  • Floating point exceptions called fpe (Invalid, Overflow, Zero)
  • Uninitialized values reading
  • Allocation/deallocation issues
  • Array out of bound reading/writing
  • IO issues
  • Memory leak
  • Stack overflow
  • Buffer overflow

There are many other types of bugs, but these are the most common and the easiest to solve when using the appropriate tools.

When could there be a bug ?

  • Program returns an error message
  • Program returns an error exit code (other than 0)
  • Program finishes with NaN or +Inf values
  • Program ends unexpectedly
  • Other cases : many scenarios are possible

How to get the exit code of a program ?

  • $? gives you the exit code of the last executed command.
  • A value other than 0 means something went wrong, and the code itself may help you understand why.
~$ gfortran myokprog.f90
~$ ./a.out
Hello world !
~$ echo $?
0
~$ gfortran mybugprog.f90
~$ ./a.out
Program received signal SIGSEGV: Segmentation
fault - invalid memory reference.
Backtrace for this error:
#0 0x7FFC993C87D7
#1 0x7FFC993C8DDE
#2 0x7FFC9901FC2F
Segmentation fault (core dumped)
~$ echo $?
139
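
The value 139 above is not arbitrary: shells such as bash report 128 + N when a program is killed by signal N, and SIGSEGV is signal 11 on Linux. The sketch below decodes an exit code under that convention; the function name `signal_from_exit_code` is only illustrative and does not come from any library.

```c
#include <signal.h>
#include <stdio.h>

/* Returns the signal number encoded in a shell exit code, or 0 when the
 * code does not follow the "128 + signal number" convention. */
int signal_from_exit_code(int code)
{
    return (code > 128 && code < 256) ? code - 128 : 0;
}

/* Prints a one-line interpretation of an exit code. */
void describe_exit_code(int code)
{
    int sig = signal_from_exit_code(code);

    if (code == 0)
        printf("%d: success\n", code);
    else if (sig == SIGSEGV)
        printf("%d: killed by SIGSEGV (segmentation fault)\n", code);
    else if (sig == SIGFPE)
        printf("%d: killed by SIGFPE (floating point exception)\n", code);
    else if (sig != 0)
        printf("%d: killed by signal %d\n", code, sig);
    else
        printf("%d: error code chosen by the program itself\n", code);
}
```

Calling `describe_exit_code(139)` reports a segmentation fault, which matches the transcript above.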

How to find them

Quick massive dirty debug strategy

Most of the time, these compilation options will find your bug :

Compiler   Compiler options
gfortran   -Wuninitialized -O -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid -fbounds-check -fimplicit-none -ftrapv
gcc        -g -Wall
ifort      -g -traceback -fpe0 -check all -ftrapuv -fp-stack-check -warn all -no-ftz
icc        Test 1 : -g -traceback -check=uninit -fp-stack-check -no-ftz
           Test 2 : -g -traceback -check-pointers=rw

For C code, also try the FPE strategy (see below).

If not enough, compile with :

Compiler   Compiler options
gfortran   -g -fbacktrace
gcc        -g
ifort      -g -traceback
icc        -g -traceback

And launch the program with valgrind :

~$ valgrind myprog.exe

Most of the time it will catch the error. If not, then see the chart below.

Overall strategy

Floating Point exception

There are three types of FPE :

  • Zero : when you divide by zero, very common in HPC. For example : A/0.0 = +∞ (for A > 0)
  • Invalid : when the operation is mathematically impossible. For example : acos(10.0) = NaN
  • Overflow/Underflow : when the result is larger (or smaller) in magnitude than what the floating point type can hold. For example : exp(10E15) = A huge number = +Inf

Behavior : by default, an FPE will not generate an error at runtime or at compilation time (GCC/INTEL). The program silently continues with Inf or NaN values.
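
Because the program keeps running after an FPE, a cheap complement to the compiler flags is to scan results for NaN or Inf yourself. The following is a minimal sketch using the standard `isfinite` macro from `<math.h>`; the helper name `first_nonfinite` is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Returns the index of the first NaN or infinite entry of v[0..n-1],
 * or -1 if every entry is finite. */
long first_nonfinite(const double *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(v[i]))
            return (long)i;
    }
    return -1;
}
```

Calling such a check after each major step of a computation narrows down where the first bad value appeared, even in a build without trapping enabled.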

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler flags : -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid.
           The fpe will be explicitly displayed at runtime.
ifort      Compiler flags : -g -traceback -fpe0.
           The fpe will be explicitly displayed at runtime.

Example :

program myprog
 implicit none
 real(8) :: d1,d2,d3
 
 d2 = 10.0d0
 d3 = 0.0d0
 d1 = d2 / d3
 
end program myprog
~$ gfortran -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid myprog.f90
~$ ./a.out 
 
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
 
Backtrace for this error:
#0  0x7FC0FE8877D7
#1  0x7FC0FE887DDE
#2  0x7FC0FE4DEC2F
#3  0x4006DD in myprog at myprog.f90:7
Floating point exception (core dumped)

Bug is at line 7.

Tracing in C

Compiler      Way to trace bug
gcc and icc   Add #define _GNU_SOURCE and #include <fenv.h> at the top of the main source file (feenableexcept is a GNU extension), then call feenableexcept(FE_DIVBYZERO|FE_INVALID|FE_OVERFLOW); at the very beginning of main.
              Compiler flags : -g.
              The fpe will generate a floating point error at runtime. Then use gdb to get information on the code line generating the fpe.

Example :

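#define _GNU_SOURCE /* feenableexcept is a GNU extension */
#include <fenv.h>
int main(int argc, char **argv)
{
 feenableexcept(FE_DIVBYZERO| FE_INVALID|FE_OVERFLOW); /* trap these FPEs */
 double d1,d2,d3;
 d2 = 10.0;
 d3 = 0.0;
 d1 = d2 / d3; /* division by zero : raises SIGFPE */
}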
~$ gcc -g myfile.c -lm
~$ ./a.out 
Floating point exception (core dumped)
~$ gdb a.out
(gdb) run
Starting program: /home/spehn/a.out 
 
Program received signal SIGFPE, Arithmetic exception.
0x0000000000400637 in main (argc=1, argv=0x7fffffffdf78) at myfile.c:9
9	 d1 = d2 / d3;
(gdb) 

Bug is at line 9.

Uninitialized variables

This bug occurs when you read a variable that has not been initialized. The program may not stop, and all following calculations will be based on a random value. This is common with MPI programs (Ghosts, etc).
Three main types of uninitialized variables :

  • Static variable : the uninitialized variable is static
  • Dynamic variable : the uninitialized variable is dynamic
  • Not allocated variable : a dynamic variable is used before being allocated

Behavior :

  • Static variable : no error at runtime
  • Dynamic variable : no error at runtime
  • Not allocated variable : segmentation fault at runtime

Valgrind's Memcheck will let the program run and use uninitialized values, keeping track of these operations. It will only complain when a variable “goes out” of the program (printing in the terminal, writing in a file, etc). The error will be indicated at the line of this print/write. To get more information on the uninitialized variable, use --track-origins=yes as Valgrind flag.

Tracing in Fortran

Compiler   Way to trace bug
gfortran   - static variable : Compiler options : -Wuninitialized -O -g -fbacktrace. Will display a warning at compilation time.
             To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - dynamic variable : Compiler options : -g -fbacktrace.
             Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - not allocated variable : Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime.
ifort      - static variable : Compiler options : -check all. The error will be explicitly displayed at runtime.
             It is possible to replace all uninitialized values by a huge number with -ftrapuv
           - dynamic variable : Compiler options : -g -traceback.
             Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - not allocated variable : Compiler options : -g -traceback. The error will be explicitly displayed at runtime.

Examples :

program myprog
 
 implicit none
 real(8) :: d1,d2
 
 d1 = d2*10.0d0
 
end program myprog
~$ gfortran -Wuninitialized -g -fbacktrace myprog.f90 
myprog.f90: In function ‘myprog’:
myprog.f90:6:0: warning: ‘d2’ is used uninitialized in this function [-Wuninitialized]
  d1 = d2*10.0d0
 ^
~$ ifort -fpp -Duninitstatic myprog.f90 -g -check all
~$ ./a.out 
forrtl: severe (193): Run-Time Check Failure. The variable 'myprog_$D2' is being used without being defined
Image              PC                Routine            Line        Source             
a.out              0000000000402336  Unknown               Unknown  Unknown
libc.so.6          00007F3785537EC5  Unknown               Unknown  Unknown
a.out              0000000000402229  Unknown               Unknown  Unknown

Error is coming from variable D2. Adding -traceback would provide line information.

program myprog
 
 real(8), allocatable, dimension(:) :: d1,d2
 
 allocate(d1(1:10), d2(1:10))
 d1(3) = d2(4)*10.0d0
 print *,d1(3),d2(4)
 deallocate(d1)
 
end program myprog
~$ ifort myprog.f90 -g -traceback
~$ valgrind --track-origins=yes ./a.out
[...]
==21655== Conditional jump or move depends on uninitialised value(s)
==21655==    at 0x448595: cvt_ieee_t_to_text_ex (in /home/sphen/Downloads/a.out)
==21655==    by 0x426F22: for__format_value (in /home/sphen/Downloads/a.out)
==21655==    by 0x40AD5A: for_write_seq_lis_xmit (in /home/sphen/Downloads/a.out)
==21655==    by 0x4025C6: MAIN__ (myprog.f90:7)
==21655==    by 0x402335: main (in /home/sphen/Downloads/a.out)
==21655==  Uninitialised value was created by a heap allocation
==21655==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21655==    by 0x406518: for_alloc_allocatable (in /home/sphen/Downloads/a.out)
==21655==    by 0x4024C5: MAIN__ (myprog.f90:5)
==21655==    by 0x402335: main (in /home/sphen/Downloads/a.out)
[...]

Error is at line 7 and variable was created at line 5.

Tracing in C

Compiler   Way to trace bug
gcc        - static variable : Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time.
             To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - dynamic variable : Compiler options : -g.
             Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - not allocated variable : Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time.
             To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialised value(s)”
             To get more information, use gdb and ask for a backtrace.
icc        - static variable : Compiler options : -Wuninitialized. Will display a warning at compilation time.
             With -g -traceback -check=uninit, the error will be explicitly displayed at runtime.
           - dynamic variable : Compiler options : -g -traceback.
             Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - not allocated variable : Compiler options : -Wuninitialized. Will display a warning at compilation time.
             With -g -traceback -check=uninit, the error will be explicitly displayed at runtime.
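
When running under Valgrind is too slow, a manual variant of the "fill uninitialized values with a recognizable number" idea can be applied to dynamic arrays in C. The sketch below is an assumption of mine, not a compiler feature: the hypothetical helper `alloc_poisoned` fills a fresh array with NaN, so that any element used before being set propagates NaN into the results and becomes visible (or trappable with the FPE strategy).

```c
#include <math.h>
#include <stdlib.h>

/* Allocates n doubles and fills them with NaN, so that an element read
 * before being assigned poisons every result computed from it.
 * Returns NULL if the allocation fails. */
double *alloc_poisoned(size_t n)
{
    double *p = malloc(n * sizeof *p);

    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = NAN;
    return p;
}
```

Keep this for debug builds only: once the array is filled, Memcheck considers it initialized and will no longer report reads from it.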

Allocation/deallocation issues

Tracing in Fortran

Compiler   Way to trace bug
gfortran   - free a non allocated variable : Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime.
           - allocate an already allocated variable : Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime.
           - not freed memory : Compiler options : -g -fbacktrace.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
ifort      - free a non allocated variable : Compiler options : -g -traceback. The error will be explicitly displayed at runtime.
           - allocate an already allocated variable : Compiler options : -g -traceback. The error will be explicitly displayed at runtime.
           - not freed memory : Compiler options : -g -traceback.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.

Tracing in C

Compiler   Way to trace bug
gcc        - free a non allocated variable : Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time.
             To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialised value(s)”
           - allocate an already allocated variable : Compiler options : -g.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
           - not freed memory : Compiler options : -g.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
icc        - free a non allocated variable : Compiler options : -Wuninitialized. Will display a warning at compilation time.
             With -g -traceback -check=uninit, the error will be explicitly displayed at runtime.
           - allocate an already allocated variable : Compiler options : -g -traceback.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
           - not freed memory : Compiler options : -g -traceback.
             Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
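
In C, a common defensive habit against double frees is to reset the pointer as soon as the memory is released. A minimal sketch follows, with a hypothetical helper named `release`; it relies only on the standard guarantee that `free(NULL)` does nothing.

```c
#include <stdlib.h>

/* Frees the array pointed to by *p and resets *p to NULL, so that a
 * second call is harmless and later uses of the pointer fail cleanly
 * instead of touching freed memory. */
void release(double **p)
{
    free(*p);
    *p = NULL;
}
```

This does not replace Valgrind: it only protects the one pointer you pass in, not other copies of the same address.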

Array out of bound reading/writing

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options : -g -fbacktrace -fbounds-check. The error will be explicitly displayed at runtime.
ifort      Compiler options : -g -traceback -check all (or -check bounds). The error will be explicitly displayed at runtime.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options : -g. Use Valgrind, the error will be an “Invalid read/write of size 8/16”.
           Or patch gcc and recompile it with bounds checking (http://sourceforge.net/projects/boundschecking/)
icc        Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime.
           Warning : check-pointers=rw disables all other debugging options when activated, be careful.
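
Since gcc has no equivalent of -fbounds-check for plain C arrays, one option during debugging is to route suspicious accesses through a checked accessor. The sketch below is illustrative only; `checked_get` is a hypothetical helper, not part of any library.

```c
#include <stdio.h>

/* Reads a[i] into *out if i is within [0, n).
 * Returns 0 on success, -1 (with a message on stderr) if i is out of bounds. */
int checked_get(const double *a, size_t n, size_t i, double *out)
{
    if (i >= n) {
        fprintf(stderr, "index %zu out of bounds (size %zu)\n", i, n);
        return -1;
    }
    *out = a[i];
    return 0;
}
```

Remember that C arrays start at 0, so for an array of size 10 the last valid index is 9.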

IO issues

IO errors are often very explicit. No need to use a debugging tool. However, Valgrind and fpe options can detect some related errors (a bad read leads to a badly initialized value or to an fpe, etc.)

Do not forget to set -g -fbacktrace (gfortran) or -g -traceback (icc/ifort) to get useful error information.

Simply be careful to secure every read/write : get the return code and check it.
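
As an illustration of checking return codes in C, the sketch below reads numbers with `fscanf` and stops as soon as a conversion fails, instead of silently leaving the rest of the buffer uninitialized. The helper name `read_values` is hypothetical.

```c
#include <stdio.h>

/* Reads up to n whitespace-separated doubles from fp into buf.
 * Returns how many values were actually read; a result smaller than n
 * means the input ended early or contained something that is not a number. */
size_t read_values(FILE *fp, double *buf, size_t n)
{
    size_t count = 0;

    while (count < n && fscanf(fp, "%lf", &buf[count]) == 1)
        count++;
    return count;
}
```

The caller should compare the returned count with the expected one and stop with a clear message when they differ. The same applies to `fopen`, whose result must be compared with NULL before use.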

Memory leak

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options : -g -fbacktrace. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
ifort      Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.
icc        Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost.

Stack overflow

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options : -g -fbacktrace. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will catch it with backtrace, but gives little information.
ifort      Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will catch it with backtrace, but gives little information.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options : -g. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will catch it with backtrace, but gives little information.
icc        Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will catch it with backtrace, but gives little information.

Buffer overflow

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime.
ifort      Compiler options : -g -traceback. The error will be explicitly displayed at runtime.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options : -g. Use gdb. Ask for a backtrace after the error, it gives a lot of information.
icc        Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime.
           Warning : check-pointers=rw disables all other debugging options when activated, be careful.
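
In C, many buffer overflows come from copying a string into a fixed size buffer without checking its length. Assuming that is the case at hand, a bounded copy based on the standard `snprintf` avoids writing past the end; the helper name `copy_bounded` is hypothetical.

```c
#include <stdio.h>

/* Copies src into dst, never writing more than dstsize bytes
 * (terminating '\0' included).
 * Returns 1 if src had to be truncated, 0 otherwise. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    int needed = snprintf(dst, dstsize, "%s", src);

    return needed < 0 || (size_t)needed >= dstsize;
}
```

Checking the return value tells you that data was cut, which is usually a bug in itself but no longer corrupts neighbouring memory.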





This page is dedicated to debugging tools and options, and to optimizing tools and options. Of course there are other solutions, but I found these extremely useful. Note that a code should be tested with a debugger at each new implementation/modification ! And one should never try to optimize a code without debugging first.


Debugging


Few words


I will start from scratch, assuming you do not know the basics.

There are many ways of debugging. The most common one is to print “hello1”, “hello2”, etc everywhere in the code, see which was the last “hello”, and iterate until you converge to the bug. This “brute force” method can be used in some cases, but it requires a lot of time, and some errors (memory errors most of the time) appear randomly, making them difficult to locate.

First, backup your code. When debugging, you may do bad things.

Then the best approach is to apply methods and tools iteratively :

  1. Analyze the error message and understand it. Sometimes (with I/O, i.e. reading/writing on the hard drive, for example), errors are explicit and line numbers are provided if you compiled with -g (plus -fbacktrace for gfortran, or -traceback for the Intel compilers).
  2. If the error message is not useful (for example: segmentation fault, core dumped, glibc error, etc), use the debugging options of the compiler. This will locate the error in 90 percent of cases and explain what the problem is.
  3. If you couldn't locate the error, or the message is still not sufficient, use a debugger. I suggest Valgrind for gcc based codes, and Intel Inspector for Intel based codes.
  4. If your error is with MPI communications, you will need a top level debugger, like Totalview, which is not free of charge but is usually provided on supercomputers.
  5. If you still can't find the error, well, bad luck. It is time to use brute force, but without hello messages. Truncate your code by removing a part and test it; if it still fails, remove another part, and if it succeeds, add a part back, etc. You will end up locating the problem.

In any case: do not debug for more than 3 hours, and take some rest every hour. Even if you think otherwise, you will not be efficient: you will make the code worse, you will fix your error but add others, and go crazy. Debug in the morning, and do something else in the afternoon.


Compilers options


Before using a debugger or an optimizing tool, the first things to use are compiler options (also called flags). They are sufficient for more than 90 percent of bugs, and the optimization level they provide is enough for most serial (i.e. non multithread/multiprocess) codes.

The compilation flags I often use :


Gnu gfortran


Debug :

-g -Wuninitialized -O -fbacktrace -fbounds-check -ffpe-trap=zero,underflow,overflow,invalid -ftrapv -fimplicit-none -fno-automatic

Preprocessing :

-cpp -DMYPREPRO


Intel ifort


Light Debug :

-g -debug -traceback -fp-stack-check

Hard Debug :

-g -debug -traceback -check all -implicitnone -warn all -fpe0 -fp-stack-check -ftrapuv -heap-arrays -gen-interface -warn interface

Preprocessing :

-fpp -DMYPREPRO

Important : In order to use the debugging tools that we will see later, gcc/g++/gfortran programs need to be compiled using the -g flag (and no other debug flags) and the desired optimization flags. In the same way, icc/icpc/ifort programs need to be compiled using the -g -traceback flags and the desired optimization flags.


Debug using compilation options


Consider this program in Fortran (the approach is very similar in C) :

bug.f90
Program Bug
 
 implicit none
 real, allocatable, dimension(:) :: tab
 
 allocate(tab(1:10))
 tab(:) = 1.0
 call Buggy()
 deallocate(tab)
 
 contains
 
 Subroutine Buggy()
 
 print *, tab(11)
 
 End Subroutine Buggy
 
End Program Bug

If you compile it with no options (or only optimization options) and then execute it :

gfortran bug.f90
./a.out

You get, with no errors or warnings :

1.85398793E-40

The same happens with the ifort compiler. You know this result is absurd, but you want to locate the error. When compiled with debug options :

gfortran bug.f90 -g -Wuninitialized -O -fbacktrace -fbounds-check -ffpe-trap=zero,underflow,overflow,invalid -ftrapv -fimplicit-none -fno-automatic
./a.out

You get :

At line 15 of file bug.f90
Fortran runtime error: Array reference out of bounds for array 'tab', upper bound of dimension 1 exceeded (11 > 10)
 
Backtrace for this error:
+ function buggy (0x400A70)
at line 15 of file bug.f90
+ function bug (0x400BE4)
at line 9 of file bug.f90
+ /lib/libc.so.6(__libc_start_main+0xfd) [0x7fa49f04ac4d]

Which is simple to use: you made an error at line 15 of file bug.f90, where the array tab is accessed with index 11 while its size is only 10 (in Fortran, arrays start at 1 by default).

Now, using ifort :

ifort bug.f90 -g -debug -traceback -check all -implicitnone -warn all -fpe0 -fp-stack-check -ftrapuv -heap-arrays -gen-interface -warn interface
./a.out
forrtl: severe (408): fort: (2): Subscript #1 of the array TAB has value 11 which is greater than the upper bound of 10
 
Image PC Routine Line Source
a.out 000000000046AA2E Unknown Unknown Unknown
a.out 00000000004694C6 Unknown Unknown Unknown
a.out 0000000000422242 Unknown Unknown Unknown
a.out 0000000000404AFB Unknown Unknown Unknown
a.out 0000000000405011 Unknown Unknown Unknown
a.out 000000000040356E bug_IP_buggy_ 15 bug.f90
a.out 0000000000403252 MAIN__ 8 bug.f90
a.out 0000000000402B8C Unknown Unknown Unknown
libc.so.6 00007F9CBFFFBEA5 Unknown Unknown Unknown
a.out 0000000000402A89 Unknown Unknown Unknown

Which is also easy to understand (using the Line and Source columns, you can see that main calls buggy at line 8, and that buggy triggered the error at line 15).

Using these methods, you can locate most bugs.

If it is not enough, or if your bug disappears when using these options (it can happen), then you may need to use a debugger.


Debug using a debugger


Some say gdb is better, others say valgrind is better. In fact, both are good. I am just used to valgrind, so I will present this one. Note that valgrind can also be used to profile the code, check memory leaks, test cache use, etc. We will see that in the optimisation section. Note also that valgrind supports MPI if it was built with it. Last point: valgrind will slow down your execution A LOT and is extremely talkative. If the bug appears after a long run time and you know in which part of the code it occurs, you may use special flags to tell valgrind to monitor only this part (see valgrind documentation).

Let's re-use our previous code. To use valgrind, you have to compile using the -g option, combined with optimisation flags if your code normally uses them.

gfortran bug.f90 -g -O3
valgrind ./a.out
 
==25150== Memcheck, a memory error detector
==25150== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==25150== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25150== Command: ./a.out
==25150==
==25150== Invalid read of size 4
==25150== at 0x4F13EF0: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==25150== by 0x4F15AAE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==25150== by 0x4F165FE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==25150== by 0x40093B: MAIN__ (bug.f90:15)
==25150== by 0x4007AC: main (bug.f90:9)
==25150== Address 0x5c634e8 is 0 bytes after a block of size 40 alloc'd
==25150== at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25150== by 0x4008B1: MAIN__ (bug.f90:6)
==25150== by 0x4007AC: main (bug.f90:9)
==25150==
0.00000000
==25150==
==25150== HEAP SUMMARY:
==25150== in use at exit: 0 bytes in 0 blocks
==25150== total heap usage: 23 allocs, 23 frees, 12,076 bytes allocated
==25150==
==25150== All heap blocks were freed -- no leaks are possible
==25150==
==25150== For counts of detected and suppressed errors, rerun with: -v
==25150== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

OK, this is more difficult to understand, but valgrind locates nearly everything and is made for more advanced users, so you will have to deal with it.

Some errors reported by valgrind are :

  • Invalid read of size 4: you tried to read a value of float/real(4) type (4 bytes) at an invalid address.
  • Invalid read of size 8: you tried to read a value of double/real(8) type (8 bytes) at an invalid address.
  • Invalid write of size 4: you tried to write a value of float/real(4) type (4 bytes) at an invalid address.
  • Invalid write of size 8: you tried to write a value of double/real(8) type (8 bytes) at an invalid address.
  • Conditional jump or move depends on uninitialised value(s): you are reading a value (int/integer, float/real, etc) that has not been initialized (no value)
  • Access not within mapped region at address… Stack Overflow : this error often appears with multithreading. It means your program made a stack overflow, i.e. you tried to allocate too much on the stack. Locate huge arrays, and allocate them on the heap (do not forget that in some multithread implementations like OpenMP, each sub thread allocates its duplicated values on the main program stack. If you use too many threads, your stack will not be large enough).
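
To illustrate the "allocate huge arrays on the heap" advice in C, here is a minimal sketch with a hypothetical function `sum_of_ones`. A local array such as `double work[2000000];` (about 16 MB) would exceed a typical 8 MB default stack limit on Linux, whereas the `malloc` version below does not depend on the stack size.

```c
#include <stdlib.h>

/* Fills a work array of n doubles with 1.0 and returns their sum.
 * The array lives on the heap, so n can be large without overflowing
 * the stack. Returns -1.0 if the allocation fails. */
double sum_of_ones(size_t n)
{
    double *work = malloc(n * sizeof *work);
    double sum = 0.0;

    if (work == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        work[i] = 1.0;
    for (size_t i = 0; i < n; i++)
        sum += work[i];
    free(work);
    return sum;
}
```

In Fortran, the equivalent is to declare the array allocatable, or to compile with -heap-arrays (ifort) as in the hard debug flags of this page.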

Last things on valgrind :

To use it in parallel, using MPI :

mpirun -np 4 valgrind ./myprog.exe

Note that valgrind will display many identical errors, even when there is only one (because the same error may be triggered many times). Try to find the first error, and then use this message as a starting point.

But ! Some libs (like MPI libs, etc) also contain bugs, often at start up, and valgrind will display them. I strongly suggest you add a print at the beginning of your code (at the first line), and then, when analysing valgrind output, ignore the errors reported before this print.


Optimization


Few words

  • Do not use exotic optimization flags you don't understand. Prefer basic flags with a clean code (this will let the compiler make good choices).
  • Be extremely careful with aggressive optimization levels (-O3, -O4): depending on the compiler and on the flags they imply, they may not preserve numerical results. If doing mathematical calculations, prefer -O2.
  • Try to maximize the use of vector instructions (SIMD) by giving the compiler a clean code.
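
One reason aggressive flags can change numerical results is that floating point addition is not associative, and options such as -ffast-math allow the compiler to reorder operations. The sketch below (function names are mine) evaluates the same sum in two orders; `volatile` is used only to keep the compiler from simplifying the expressions.

```c
/* Computes (a + b) + c. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, `sum_left` returns 1.0 while `sum_right` returns 0.0, because 1.0 is lost when added to -1e20 first. This is why results should be compared against a reference build whenever the optimization flags change.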

External resources

Basics

To get full performance from the compilers, and assuming you compile on the same computer architecture (CPU, motherboard, etc) as the one you run calculations on, use the following options :

Gnu gfortran

Standard :

-O2 -march=native -mtune=native

Hard optimization (use carefully, may slow down or give wrong results) :

-O4 -ffast-math -fforce-addr -fstrength-reduce -frerun-cse-after-loop -fexpensive-optimizations -fcaller-saves -funroll-loops -funroll-all-loops -fno-rerun-loop-opt

Intel ifort

-O2 -fast

If you get problems with -fast (probably because of missing static libraries), replace it with -xHost -no-prec-div -ipo. If you still have problems (linking), replace -ipo by -ip.