This page covers debugging methods for HPC codes. New HPC developers should know these basic options, as they save a lot of time.
Every method here provides a way to trace a bug, that is, to find the exact line of code that triggers it.
I try to keep these pages up to date, but some flags may be deprecated.
Beware: some things called bugs here may not be bugs at all, only mathematical or physical results. For example, a calculation may finish with a result too large to be stored in a 64-bit variable. This is not really a bug, just a limitation: the code and the calculation are correct.
If you are new to HPC programming or to debugging, a small tutorial on how to use the following flags is available. See help debug. There are also examples for FPE and uninitialized-value debugging. All methods follow the same philosophy.
For reference :
Compilers used :
Tools used :
Files used to simulate most of the common bugs : deb_f.f90 , deb_c.c.
When developing HPC programs, the bugs encountered are often the same. Here is a list of the most common bugs :
There are many other types of bugs, but these are the most common and the easiest to solve when using the appropriate tools.
The first step is to identify the presence of a bug :
How to get the exit code of a program ?
~$ gfortran myokprog.f90
~$ ./a.out
Hello world !
~$ echo $?
0

~$ gfortran mybugprog.f90
~$ ./a.out
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7FFC993C87D7
#1 0x7FFC993C8DDE
#2 0x7FFC9901FC2F
Segmentation fault (core dumped)
~$ echo $?
139
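The status 139 printed above is not arbitrary: common shells such as bash report 128 plus the signal number when a program is killed by a signal, and SIGSEGV is signal 11 on Linux. As a rough illustration, the following C sketch decodes a status obtained with `echo $?`. The helper names `signal_from_status` and `describe_status` are hypothetical, and the decoding is only a heuristic, since a program may also call exit() with a value above 128 on its own.

```c
#include <stdio.h>

/* Hypothetical helper: shells report 128 + signal number when a program
   is killed by a signal. Returns that signal number, or 0 when the
   status looks like an ordinary exit code. */
int signal_from_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Hypothetical helper: print a one-line diagnosis for a shell status. */
void describe_status(int status)
{
    int sig = signal_from_status(status);

    if (status == 0)
        printf("status %d: normal termination\n", status);
    else if (sig != 0)
        printf("status %d: probably killed by signal %d\n", status, sig);
    else
        printf("status %d: the program reported an error\n", status);
}
```

Calling `describe_status(139)` reports signal 11, which matches the segmentation fault shown in the transcript.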
Here is the list of debug flags and tools to use to trace the bugs discussed above. The first part is generic (Quick debug strategy), while the second part is specific to each bug.
Most of the time, these compilation options will find your bug (except with gcc, which has only a few debug options) :
Compiler | Compiler options |
---|---|
gfortran | -Wuninitialized -O -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid -fbounds-check -fimplicit-none -ftrapv |
gcc | -g -Wall |
ifort | -g -traceback -fpe0 -check all -ftrapuv -fp-stack-check -warn all -no-ftz |
icc | Test 1 : -g -traceback -check=uninit -fp-stack-check -no-ftz Test 2 : -g -traceback -check-pointers=rw |
For C code, also try the FPE strategy (see below).
If this is not enough, compile with :
Compiler | Compiler options |
---|---|
gfortran | -g -fbacktrace |
gcc | -g |
ifort | -g -traceback |
icc | -g -traceback |
And launch the program with valgrind :
~$ valgrind myprog.exe
Most of the time it will catch the error.
There are three types of FPE :
Behavior : by default, an FPE generates no error at compilation time or at runtime (GCC/Intel).
Compiler | Way to trace bug |
---|---|
gfortran | Compiler flags : -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid. The fpe will be explicitly displayed at runtime. |
ifort | Compiler flags : -g -traceback -fpe0. The fpe will be explicitly displayed at runtime. |
Compiler | Way to trace bug |
---|---|
gcc and icc | Add #include <fenv.h> in the main source file, then call feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW); just after the start of main. Compiler flags : -g. The FPE will raise a floating point exception at runtime. Then use gdb to find the line of code generating the FPE. |
This bug occurs when you read a variable that has not been initialized. The program may not stop, and all following calculations will be based on a random value. This is common with MPI programs (Ghosts, etc).
Three main types of uninitialized variables :
Behavior :
Valgrind's Memcheck will let the program run and use uninitialized values, keeping track of these operations. It will only complain when a variable “goes out” of the program (printing in the terminal, writing to a file, etc). The error will be reported at the line of this print/write. To get more information on the uninitialized variable, use the Valgrind flag --track-origins=yes.
Compiler | Case | Way to trace bug |
---|---|---|
gfortran | static variable | Compiler options : -Wuninitialized -O -g -fbacktrace. Will display a warning at compilation time. To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
gfortran | dynamic variable | Compiler options : -g -fbacktrace. Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
gfortran | not allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
ifort | static variable | Compiler options : -check all. The error will be explicitly displayed at runtime. To replace all uninitialized values by a huge number, use -ftrapuv |
ifort | dynamic variable | Compiler options : -g -traceback. Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
ifort | not allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
Compiler | Case | Way to trace bug |
---|---|---|
gcc | static variable | Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time. To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
gcc | dynamic variable | Compiler options : -g. Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
gcc | not allocated variable | Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time. To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialized value(s)”. For further details, use gdb and ask for a backtrace. |
icc | static variable | Compiler options : -Wuninitialized displays a warning at compilation time; -g -traceback -check=uninit displays the error explicitly at runtime. |
icc | dynamic variable | Compiler options : -g -traceback. Use Valgrind --track-origins=yes. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
icc | not allocated variable | Compiler options : -Wuninitialized displays a warning at compilation time; -g -traceback -check=uninit displays the error explicitly at runtime. |
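To make the "dynamic variable" case concrete, here is a small C sketch. The names `make_cells` and `count_positive` are hypothetical, and the uninitialized path is left in on purpose as the bug to hunt. When built with -g and without optimization, the comparison in `count_positive` is typically where Memcheck reports “Conditional jump or move depends on uninitialized value(s)”, and --track-origins=yes should point back to the malloc call.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n cells. With init == 0 the cells are
   deliberately left uninitialized (the bug); otherwise they are zeroed. */
double *make_cells(size_t n, int init)
{
    if (init)
        return calloc(n, sizeof(double));
    return malloc(n * sizeof(double));
}

/* Hypothetical helper: count strictly positive cells. The branch depends
   on each cell value, so uninitialized cells make the result random. */
size_t count_positive(const double *cells, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (cells[i] > 0.0)
            count++;
    }
    return count;
}
```

Calling `count_positive` on the result of `make_cells(n, 0)` reproduces the bug, while `make_cells(n, 1)` gives a clean Valgrind run to compare against.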
Compiler | Case | Way to trace bug |
---|---|---|
gfortran | free a non allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
gfortran | allocate an already allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
gfortran | not freed memory | Compiler options : -g -fbacktrace. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
ifort | free a non allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
ifort | allocate an already allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
ifort | not freed memory | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Case | Way to trace bug |
---|---|---|
gcc | free a non allocated variable | Compiler options : -Wuninitialized or -Wall. Will display a warning at compilation time. To get more information, use Valgrind. The error will be a “Conditional jump or move depends on uninitialized value(s)” |
gcc | allocate an already allocated variable | Compiler options : -g (-fbacktrace is a gfortran option and does not apply to C). Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
gcc | not freed memory | Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | free a non allocated variable | Compiler options : -Wuninitialized displays a warning at compilation time; -g -traceback -check=uninit displays the error explicitly at runtime. |
icc | allocate an already allocated variable | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | not freed memory | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
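In C, the cases of the table above often come from a pointer whose state is not tracked. One possible defensive convention, shown here as a sketch and not as the method used in deb_c.c, is to keep unallocated pointers at NULL and to route every allocation and release through two small helpers. The names `release` and `allocate` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: free a buffer and reset the pointer, so that
   releasing twice is harmless (free(NULL) is a no-op). */
void release(double **buf)
{
    free(*buf);
    *buf = NULL;
}

/* Hypothetical helper: (re)allocate n doubles. Any previous block is
   freed first, so allocating an already allocated buffer does not leak.
   The pointer must be NULL or a valid block on entry.
   Returns 0 on success, -1 if malloc fails. */
int allocate(double **buf, size_t n)
{
    release(buf);
    *buf = malloc(n * sizeof **buf);
    return (*buf != NULL) ? 0 : -1;
}
```

With this convention a pointer is declared as `double *field = NULL;`, and a run under Valgrind --leak-check=full should show no “definitely lost” block as long as every buffer ends with a call to `release`.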
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace -fbounds-check (spelled -fcheck=bounds in recent gfortran releases). The error will be explicitly displayed at runtime. |
ifort | Compiler options : -g -traceback -check all (or -check bounds). The error will be explicitly displayed at runtime. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind, the error will be an “Invalid read/write of size 8/16”. Or patch gcc and recompile it with bounds checking (http://sourceforge.net/projects/boundschecking/) |
icc | Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime. Warning : when activated, check-pointers=rw disables all other debugging options, so be careful. |
IO errors are often very explicit, so there is no need for a debugging tool. However, Valgrind and the FPE options can detect some related errors (a bad read may lead to a badly initialized value, an FPE, etc.)
Do not forget to set -g -fbacktrace (gfortran) or -g -traceback (icc/ifort) to get useful error information.
Simply be careful and secure every read/write: get its return code and check it.
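As an illustration of checking return codes, here is a minimal C sketch that reads numbers from a text file and tests the result of every I/O call. The function name `read_values` and its error convention (returning -1) are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read up to max doubles from a text file.
   Returns the number of values read, or -1 if the file cannot be
   opened, contains a malformed entry, or an I/O error occurs. */
long read_values(const char *path, double *out, long max)
{
    FILE *f = fopen(path, "r");
    long n = 0;
    int rc = 1;   /* result of the last fscanf call */
    int bad;

    if (f == NULL) {
        perror(path);           /* report why the open failed */
        return -1;
    }
    while (n < max && (rc = fscanf(f, "%lf", &out[n])) == 1)
        n++;

    /* rc == 0: a token was not a number; ferror: the read itself failed */
    bad = (rc == 0) || ferror(f);
    if (fclose(f) != 0)
        bad = 1;
    return bad ? -1 : n;
}
```

A caller then tests the result before using the data, for example `if (read_values("input.dat", v, 100) < 0) return 1;`, where `input.dat` is a placeholder file name.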
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
ifort | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with backtrace, but gives little information. |
ifort | Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with backtrace, but gives little information. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with backtrace, but gives little information. |
icc | Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with backtrace, but gives little information. |
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
ifort | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use gdb. Ask for a backtrace after the error; it gives a lot of information. |
icc | Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime. Warning : when activated, check-pointers=rw disables all other debugging options, so be careful. |