===== Nagios server =====

Nagios monitors servers and detects unexpected behavior, provided a probe has been set up to detect it. Here we will monitor disk space, services, and the InfiniBand (ib) network.

Add the nagios user and the nagcmd group. The nagcmd group allows external commands to be submitted through the web interface, which is why apache is added to it:

<code>
groupadd nagios
useradd -m -g nagios nagios
passwd nagios
groupadd nagcmd
# -a -G appends nagcmd as a supplementary group (-g would replace the primary group)
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
</code>

Install the required packages (built previously, see preparing install):

<code>
yum install nagios nagios-contrib nagios-debuginfo nagios-devel nagios-plugins nagios-plugins-debuginfo
</code>

Edit /usr/local/nagios/etc/objects/contacts.cfg. Set the admin contact name to nagiosadmin, and add your email address so that Nagios knows where to send alerts:

<code>
define contact{
        contact_name    nagiosadmin             ; Short name of user
        use             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias           Nagios Administrator    ; Full name of user
        email           root@localhost          ; Email address for notifications
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
        }
</code>

Then generate a password for the nagiosadmin user, which is used to log in to the web interface, and restart httpd:

<code>
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
service httpd restart
</code>

Now check the configuration. It should report no warnings and no errors:

<code>
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[…]
Total Warnings: 0
Total Errors: 0
</code>

Then start Nagios:

<code>
/etc/init.d/nagios start
</code>

You can now log in to the Nagios web interface at http://localhost/nagios as nagiosadmin.

It is now time to configure Nagios to monitor our servers.
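The configuration check above is repeated after each of the following changes, so the check-then-restart sequence can be wrapped in a small guard that restarts Nagios only when the verification reports zero errors. This is a minimal sketch. verify_output_ok and safe_restart are hypothetical helper names, not part of Nagios. It assumes the paths used on this page and relies only on the "Total Errors:" summary line printed by nagios -v.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Nagios: verify the configuration and
# restart Nagios only when the check reports zero errors.
# Assumption: the paths below match this page's layout; override them if needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Reads "nagios -v" output on stdin. Succeeds only if the summary line
# reports "Total Errors: 0" (any amount of whitespace after the colon).
verify_output_ok() {
    grep -Eq '^Total Errors:[[:space:]]+0[[:space:]]*$'
}

# Runs the verification once. On failure, prints the full check output to
# stderr and returns 1 without touching the daemon. On success, restarts
# Nagios through its init script.
safe_restart() {
    check_out=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1) || true
    if printf '%s\n' "$check_out" | verify_output_ok; then
        /etc/init.d/nagios restart
    else
        printf '%s\n' "$check_out" >&2
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

After sourcing the file, you can run safe_restart in place of a plain restart. Whether this adds anything depends on the init script in use, because some versions already run a configuration check before restarting.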
Create a directory to hold the definitions of the monitored servers:

<code>
mkdir /usr/local/nagios/etc/linux_servers/
</code>

Edit /usr/local/nagios/etc/objects/commands.cfg and add the check_nrpe command:

<code>
###############################################################################
# COMMANDS.CFG - SAMPLE COMMAND DEFINITIONS FOR NAGIOS 4.1.1
#
#
# NOTES: This config file provides you with some example command definitions
#        that you can reference in host, service, and contact definitions.
#
#        You don't need to keep commands in a separate file from your other
#        object definitions.  This has been done just to make things easier to
#        understand.
#
###############################################################################

# 'check_nrpe' command definition
# -u: socket timeouts return UNKNOWN instead of CRITICAL
# -a: the remaining arguments are passed to the remote NRPE command
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$
        }
</code>

Then, in /usr/local/nagios/etc/nagios.cfg, add the linux_servers directory. This is where we will store the configuration files of the monitored servers:

<code>
# You can also tell Nagios to process all config files (with a .cfg
# extension) in a particular directory by using the cfg_dir
# directive as shown below:

#cfg_dir=/usr/local/nagios/etc/servers
#cfg_dir=/usr/local/nagios/etc/printers
#cfg_dir=/usr/local/nagios/etc/switches
#cfg_dir=/usr/local/nagios/etc/routers
cfg_dir=/usr/local/nagios/etc/linux_servers
</code>

Then create a group for the Linux servers in /usr/local/nagios/etc/linux_servers/groupe_linux_servers.cfg:

<code>
# Define a hostgroup for Linux machines
# All hosts that use the linux-server template will automatically be a member of this group
define hostgroup{
        hostgroup_name  linux-servers           ; The name of the hostgroup
        alias           Linux Servers           ; Long name of the group
        members         compute1,compute2       ; Comma-separated list of hosts
        }
</code>

Add the servers and services in /usr/local/nagios/etc/linux_servers/servprod.cfg:

<code>
# Host definitions
define host{
        use             linux-server
        host_name       compute1
        alias           Server compute1
        address         compute1
        }

define host{
        use             linux-server
        host_name       compute2
        alias           Server compute2
        address         compute2
        }

# Check disk space on /
# check_disk percentages refer to FREE space: WARNING below 20% free
# (80% used), CRITICAL below 10% free (90% used)
define service{
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Disk space /
        check_command           check_nrpe!check_disk!20%!10%!/
        }

# Check CPU load
define service{
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     CPU load
        check_command           check_nrpe!check_load!80!90
        }

# Check number of logged-in users
define service{
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Logged-in users
        check_command           check_nrpe!check_users!2!10
        }
</code>

Then comment out the following lines in /usr/local/nagios/etc/objects/localhost.cfg. This avoids a duplicate definition of the linux-servers hostgroup, which is now defined in groupe_linux_servers.cfg:

<code>
# Define an optional hostgroup for Linux machines

#define hostgroup{
#        hostgroup_name  linux-servers ; The name of the hostgroup
#        alias           Linux Servers ; Long name of the group
#        members         localhost     ; Comma separated list of hosts that belong to this group
#        }
</code>

Set ownership, check the configuration, and restart Nagios:

<code>
chown -R nagios:nagios /usr/local/nagios/etc/linux_servers
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
/etc/init.d/nagios restart
</code>

We will add specific probes to Nagios later. For now, Nagios shows which hosts are up, along with their basic information.

===== Nagios =====

On each client node, add the nagios group and user. The account only runs NRPE, so it needs no password. Note that useradd -p expects an already encrypted password, not a plain-text one:

<code>
groupadd nagios && useradd -g nagios nagios
</code>

Then install the Nagios plugins and NRPE:

<code>
yum install nagios-plugins nagios-plugins-debuginfo nrpe nrpe-debuginfo nrpe-plugin
</code>

Edit /etc/xinetd.d/nrpe and add the Nagios server's IP address to the only_from line, so that the server is allowed to query this node:
<code>
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 127.0.0.1 10.1.0.1
}
</code>

Then edit /etc/services and add (or uncomment) the following line in the appropriate place. Entries in that file are ordered by port number:

<code>
nrpe            5666/tcp                # NRPE
</code>

Then restart xinetd and check that it is listening, using netstat (provided by the net-tools package):

<code>
systemctl restart xinetd
yum install net-tools
[root@compute1 ~]# netstat -at | grep nrpe
tcp6       0      0 [::]:nrpe               [::]:*                  LISTEN
[root@compute1 ~]#
</code>

Edit /usr/local/nagios/etc/nrpe.cfg and uncomment the following lines:

<code>
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
</code>

These commands receive their thresholds from the server (through check_nrpe -a), so NRPE must be allowed to accept arguments. Set dont_blame_nrpe=1 in the same file. This also requires NRPE to have been compiled with --enable-command-args.

Now test locally:

<code>
[root@compute1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.15
[root@compute1 ~]# /usr/local/nagios/libexec/check_disk -w 10% -c 5% -u GB
DISK OK - free space: / 2 GB (70% inode=85%); /dev 0 GB (100% inode=99%); /dev/shm 0 GB (100% inode=99%); /run 0 GB (98% inode=99%); /sys/fs/cgroup 0 GB (100% inode=99%); /boot 3 GB (97% inode=99%); /run/user/0 0 GB (100% inode=99%);| /=1GB;2;2;0;3 /dev=0GB;0;0;0;0 /dev/shm=0GB;0;0;0;0 /run=0GB;0;0;0;0 /sys/fs/cgroup=0GB;0;0;0;0 /boot=0GB;2;2;0;3 /run/user/0=0GB;0;0;0;0
[root@compute1 ~]#
</code>

The Nagios server should now be able to reach the client.
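To confirm this from the server side, you can point the server's own check_nrpe at each client. The sketch below is a hypothetical helper: nrpe_probe and state_name are illustrative names, not part of Nagios or NRPE. It assumes check_nrpe is installed on the server at the path used on this page. It relies only on the check_nrpe options already used above (-H, -c, -a) and on the standard plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Nagios or NRPE: query clients through
# check_nrpe from the Nagios server and label each result with its state.
# Assumption: check_nrpe is installed at the path used on this page; set
# CHECK_NRPE to point elsewhere.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Translate a plugin exit code into its Nagios state name.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# nrpe_probe HOST CHECK [ARG...]
# Runs the NRPE command CHECK on HOST. Extra arguments are sent with -a,
# which is why dont_blame_nrpe=1 is needed on the client. Prints
# "HOST: STATE: plugin output" and returns the plugin's exit code.
nrpe_probe() {
    probe_host=$1
    probe_check=$2
    shift 2
    probe_rc=0
    if [ "$#" -gt 0 ]; then
        probe_out=$("$CHECK_NRPE" -H "$probe_host" -c "$probe_check" -a "$@") || probe_rc=$?
    else
        probe_out=$("$CHECK_NRPE" -H "$probe_host" -c "$probe_check") || probe_rc=$?
    fi
    printf '%s: %s: %s\n' "$probe_host" "$(state_name "$probe_rc")" "$probe_out"
    return "$probe_rc"
}
```

For example, after sourcing the file on the server, `for h in compute1 compute2; do nrpe_probe "$h" check_users 2 10; done` should print one line per client. Output that starts with "CHECK_NRPE:" comes from check_nrpe itself rather than from the remote plugin. This usually points at a connection problem, such as the server's IP missing from only_from.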