Debugging HPC programs


This page covers debugging methods for HPC codes. New HPC developers should know these basic options, as they save a lot of time. Every method here provides a way to trace a bug, that is, to find the exact line of code that triggers it.

I try to keep these pages up to date, but some flags may be deprecated.

Beware: some of the things called bugs here may not be bugs at all, only mathematical or physical results. For example, a calculation may end with a result too large to be stored in a 64-bit number. This is not really a bug, just a limitation: the code and the calculations are correct.

If you are new to HPC programming or to debugging, a small tutorial on how to use the following flags is available: see help debug. There are also examples for FPE and uninitialized values debugging. All the other methods follow the same philosophy.

For reference :
Compilers used :

  • gcc and gfortran 4.8.2 (from ubuntu 14.04 x86_64)
  • icc and ifort 14.0.3 (from Intel Parallel Studio 2013 SP1 x86_64)

Tools used :

  • valgrind 3.10.0
  • gdb 7.7

Files used to simulate most of the common bugs: deb_f.f90, deb_c.c.

Main types of bugs

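When developing HPC programs, the bugs encountered are often the same. Here is a list of the most common ones:

  • Floating Point exception
  • Uninitialized variables
  • Allocation/deallocation issues
  • Array out of bound reading/writing
  • IO issues
  • Memory leak
  • Stack overflow
  • Buffer overflow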

There are many other types of bugs, but these are the most common and the easiest to solve with the appropriate tools.

When could there be a bug ?

The first step is to identify the presence of a bug:

  • Program returns an error message
  • Program returns an error exit code (other than 0)
  • Program finishes with NaN or +Inf values
  • Program ends unexpectedly
  • Other cases: many scenarios are possible

How to get the exit code of a program ?

  • $? gives you the exit code of the last executed command.
  • Other than 0 means something went wrong, and this code may help you understand why.
~$ gfortran myokprog.f90
~$ ./a.out
Hello world !
~$ echo $?
0
~$ gfortran mybugprog.f90
~$ ./a.out
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7FFC993C87D7
#1 0x7FFC993C8DDE
#2 0x7FFC9901FC2F
Segmentation fault (core dumped)
~$ echo $?
139

How to find them

Here are the debug flags and tools to use to trace the bugs discussed above. The first part is generic (Quick debug strategy), while the following parts are specific to each type of bug.

Quick debug strategy

Most of the time, these compilation options will find your bug (except with gcc, which has only a few debug options):

Compiler   Compiler options
gfortran   -Wuninitialized -O -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid -fbounds-check -fimplicit-none -ftrapv
gcc        -g -Wall
ifort      -g -traceback -fpe0 -check all -ftrapuv -fp-stack-check -warn all -no-ftz
icc        Test 1: -g -traceback -check=uninit -fp-stack-check -no-ftz
           Test 2: -g -traceback -check-pointers=rw

For C code, also try the FPE strategy (see the Floating Point exception section below).

If this is not enough, compile with:

Compiler   Compiler options
gfortran   -g -fbacktrace
gcc        -g
ifort      -g -traceback
icc        -g -traceback

And launch the program with valgrind :

~$ valgrind ./myprog.exe

Most of the time, Valgrind will catch the error.

Floating Point exception

There are three types of FPE :

  • Zero: division by zero, very common in HPC. For example: A/0.0 = +∞
  • Invalid: the operation is mathematically impossible. For example: acos(10.0) = NaN
  • Overflow/Underflow: the result is too large in magnitude (overflow) or too close to zero (underflow) to be represented. For example: exp(10E15) = a huge number = +Inf

Behavior: by default, an FPE generates no error at compilation time or at runtime (GCC/Intel). The program silently continues with Inf or NaN values.

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler flags: -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid.
           The FPE will be explicitly reported at runtime.
ifort      Compiler flags: -g -traceback -fpe0.
           The FPE will be explicitly reported at runtime.

Tracing in C

Compiler      Way to trace bug
gcc and icc   Add #include <fenv.h> in the main source file, then call feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW); at the very beginning of main.
              feenableexcept is a GNU extension: define _GNU_SOURCE before the include and link with -lm.
              Compiler flags: -g.
              The FPE will raise a floating point error at runtime. Then use gdb to find the line of code generating the FPE.

Uninitialized variables

This happens when the program reads a variable that has never been initialized. The program may not stop, and all subsequent calculations will be based on an arbitrary value. This is common in MPI programs (ghost cells, etc.).
Three main types of uninitialized variables:

  • Static variable: the uninitialized variable is statically allocated
  • Dynamic variable: the uninitialized variable is dynamically allocated
  • Not allocated variable: a dynamic variable is used before being allocated

Behavior :

  • Static variable : no error at runtime
  • Dynamic variable : no error at runtime
  • Not allocated variable : segmentation fault at runtime

Valgrind's Memcheck lets the program run and use uninitialized values while keeping track of these operations. It only complains when such a value “goes out” of the program (printing in the terminal, writing in a file, etc.). The error is reported at the line of this print/write. To get more information on the uninitialized variable, pass the --track-origins=yes flag to Valgrind.

Tracing in Fortran

Compiler   Way to trace bug
gfortran   - static variable: compiler options -Wuninitialized -O -g -fbacktrace. A warning is displayed at compilation time.
             To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - dynamic variable: compiler options -g -fbacktrace.
             Use Valgrind with --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - not allocated variable: compiler options -g -fbacktrace. The error is explicitly reported at runtime.
ifort      - static variable: compiler options -check all. The error is explicitly reported at runtime.
             Uninitialized values can also be replaced by a huge number with -ftrapuv.
           - dynamic variable: compiler options -g -traceback.
             Use Valgrind with --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - not allocated variable: compiler options -g -traceback. The error is explicitly reported at runtime.

Tracing in C

Compiler   Way to trace bug
gcc        - static variable: compiler options -Wuninitialized or -Wall. A warning is displayed at compilation time.
             To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - dynamic variable: compiler options -g.
             Use Valgrind with --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - not allocated variable: compiler options -Wuninitialized or -Wall. A warning is displayed at compilation time.
             To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”.
             You can also use gdb and ask for a backtrace.
icc        - static variable: compiler options -Wuninitialized. A warning is displayed at compilation time.
             With -g -traceback -check=uninit, the error is explicitly reported at runtime.
           - dynamic variable: compiler options -g -traceback.
             Use Valgrind with --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - not allocated variable: compiler options -Wuninitialized. A warning is displayed at compilation time.
             With -g -traceback -check=uninit, the error is explicitly reported at runtime.

Allocation/deallocation issues

Tracing in Fortran

Compiler   Way to trace bug
gfortran   - free a non allocated variable: compiler options -g -fbacktrace. The error is explicitly reported at runtime.
           - allocate an already allocated variable: compiler options -g -fbacktrace. The error is explicitly reported at runtime.
           - not freed memory: compiler options -g -fbacktrace.
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
ifort      - free a non allocated variable: compiler options -g -traceback. The error is explicitly reported at runtime.
           - allocate an already allocated variable: compiler options -g -traceback. The error is explicitly reported at runtime.
           - not freed memory: compiler options -g -traceback.
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.

Tracing in C

Compiler   Way to trace bug
gcc        - free a non allocated variable: compiler options -Wuninitialized or -Wall. A warning is displayed at compilation time.
             To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”.
           - allocate an already allocated variable: compiler options -g (-fbacktrace is a gfortran option and does not apply to C).
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
           - not freed memory: compiler options -g.
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
icc        - free a non allocated variable: compiler options -Wuninitialized. A warning is displayed at compilation time.
             With -g -traceback -check=uninit, the error is explicitly reported at runtime.
           - allocate an already allocated variable: compiler options -g -traceback.
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
           - not freed memory: compiler options -g -traceback.
             Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.

Array out of bound reading/writing

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options: -g -fbacktrace -fbounds-check (deprecated spelling of -fcheck=bounds in recent gfortran versions). The error is explicitly reported at runtime.
ifort      Compiler options: -g -traceback -check all (or -check bounds). The error is explicitly reported at runtime.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options: -g. Use Valgrind: the error will be an “Invalid read of size N” or “Invalid write of size N” (typically 8 or 16).
           Or patch gcc and recompile it with bounds checking (http://sourceforge.net/projects/boundschecking/).
icc        Compiler options: -g -traceback -check-pointers=rw. The error is explicitly reported at runtime.
           Warning: when -check-pointers=rw is enabled, all other debugging options stop working, so be careful.

IO issues

IO errors are often very explicit, so there is usually no need for a debugging tool. However, Valgrind and the FPE options can detect some related errors (a bad read leading to a badly initialized value, to an FPE, etc.).

Do not forget to set -g -fbacktrace (gfortran) or -g -traceback (icc/ifort) to get useful error information.

Simply secure every read/write: retrieve its return code and check it.

Memory leak

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options: -g -fbacktrace. Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
ifort      Compiler options: -g -traceback. Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options: -g. Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.
icc        Compiler options: -g -traceback. Use Valgrind with --leak-check=full. Look for “definitely lost” in the LEAK SUMMARY.

Stack overflow

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options: -g -fbacktrace. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will also catch it, but its backtrace gives little information.
ifort      Compiler options: -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will also catch it, but its backtrace gives little information.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options: -g. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will also catch it, but its backtrace gives little information.
icc        Compiler options: -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”.
           gdb will also catch it, but its backtrace gives little information.

Buffer overflow

Tracing in Fortran

Compiler   Way to trace bug
gfortran   Compiler options: -g -fbacktrace. The error is explicitly reported at runtime.
ifort      Compiler options: -g -traceback. The error is explicitly reported at runtime.

Tracing in C

Compiler   Way to trace bug
gcc        Compiler options: -g. Use gdb and ask for a backtrace after the error: it gives a lot of information.
icc        Compiler options: -g -traceback -check-pointers=rw. The error is explicitly reported at runtime.
           Warning: when -check-pointers=rw is enabled, all other debugging options stop working, so be careful.