Installing a basic HPC cluster


Last update : 28/01/2015

The aim here is to install a basic HPC cluster based on open-source software. Instead of Red Hat Enterprise Linux, we use CentOS (a Red Hat clone).
Four types of nodes:

  • Login: minimal (core), 3 interfaces: eth0 to the Internet, eth1 to the nodes, ib0 to the nodes (InfiniBand)
  • Nodes: minimal (core), 2 interfaces: eth0, ib0
  • Admin: minimal (core), 2 interfaces: eth0 to you, eth1 to login/nodes
  • Store: minimal (core), 2 interfaces: eth0, ib0

Users ssh to the login node, and compile, manage data and submit jobs there. The admin node hosts services such as LDAP, the job scheduler and monitoring. The store node hosts the NFS service for data. The InfiniBand subnet manager can run on either the login node or the store node.


                                                                                       +-------------+ 192.168.0.10
                                                                           YOU --------+    admin    +----------+
                                                                                       +-------------+          |
                                                                                                                |
                                                                                                                |
                                                                                                                |
                                                                   eth0 192.168.   1.1         1.2         1.3  |
                                                                          +---------+-----------+-----------+---+----......
                                  +---------+  eth1, ip : 192.168.0.1     |         |           |           |
      eth0, ip : xxx.xxx.xxx.xxx  |         +-----------------------------+    +----+----+ +----+----+ +----+----+
WEB ------------------------------+  login  |                                  |  node1  | |  node2  | |  node3  |
                                  |         +-----------------------------+    +----+----+ +----+----+ +----+----+
                                  +---------+  ib0,  ip : 10.0.0.1        |         |           |           |
                                                                          +---------+-----------+-----------+---+----......
                                                                   ib0 10.0.       1.1         1.2         1.3  |
                                                                                                                |
                                                                                                                |
                                                                                                                |
                                                                                       +-------------+          |
                                                                      to Ethernet  ----+   storage   +----------+
                                                                      192.168.0.20     +-------------+ 10.0.0.20      
           

A few notes before we start:

  • Keep the system as simple as possible, and rely on software from the repositories instead of compiling everything, so that updates stay easy. You will then have spare time to focus on other things.
  • If you plan to use a RAID controller, I strongly suggest you learn how it behaves by simulating failure scenarios and testing its responses. Then write down how to use it properly; it may save you in the future. RAID controllers can be a real source of data loss and lost production time if you do not know how to use them.
  • On some nodes you may want to disable RAID. However, on some motherboards (Dell, for example), it cannot be deactivated. A workaround is to create one RAID 0 virtual drive per disk, which effectively means no RAID.

The system used here is CentOS 7.0.1406. To install the first nodes from a USB key, see: http://wiki.centos.org/HowTos/InstallFromUSBkey. On the machine used here, eth0 and eth1 are named enp6s2f0 and enp6s2f1 respectively.

Login

Install the login node using the minimal base package group. Minimizing the attack surface matters on this very exposed node, so we install only what is strictly required.
Installation help: http://www.tecmint.com/centos-7-installation/
Note: if you are using a VM, give it at least 1 GB of RAM so that the graphical installer is available and you can choose packages.

Change root shell color

This protects the system from you: when you see a red prompt, be careful.
Add the following line to /root/.bashrc (bold and red prompt):

PS1="\[\e[01;31m\]\h:\w#\[\e[00;m\] "
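If you repeat this step on several machines, or re-run it, the line can be appended without creating duplicates. A minimal sketch; append_once is a hypothetical helper, not something shipped with CentOS:

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# Hypothetical helper (not part of CentOS).
append_once() {
    local file=$1 line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex), -- ends options
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Usage, with single quotes so the backslashes of the prompt stay literal: append_once /root/.bashrc 'PS1="\[\e[01;31m\]\h:\w#\[\e[00;m\] "'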

Update

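Then run yum update and answer yes. A few warnings may appear (the output below comes from a CentOS 6 system; on CentOS 7 the key file is /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7):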

warning: rpmts_HdrFromFdno: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
Importing GPG key 0xC105B9DE:
 Userid : CentOS-6 Key (CentOS 6 Official Signing Key) <centos-6-key@centos.org>
 Package: centos-release-6-5.el6.centos.11.1.x86_64 (@anaconda-CentOS-201311272149.x86_64/6.5)
 From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
Is this ok [y/N]: y

Answer yes if the key location is correct.

Then reboot to make sure you are running the latest kernel and that your shell shows the red prompt.

SELinux

Now check that SELinux is enabled and in enforcing mode:

login0:~# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
login0:~# 

OK. It is a good idea to keep SELinux enabled on the login node. We will, however, disable it on the compute nodes for performance reasons.
Important: SELinux enforces security well only if you follow standard practices. If you like to tune many things, install software in non-standard locations, etc., you should disable it, as it can cause serious problems if badly used.
You can try :

One last thing: on the login node, when SELinux blocks something you are working on, do not disable it; instead, set it to permissive temporarily until you finish your changes (in permissive mode SELinux keeps labeling files and logging denials, but does not block anything).
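For the temporary switch described above, setenforce 0 (permissive) and setenforce 1 (enforcing) change the mode at runtime and getenforce shows the current one; none of this survives a reboot. The persistent mode is the SELINUX= line of /etc/selinux/config, which is what will need changing on the compute nodes. A minimal sketch of a helper for that line; the name set_selinux_mode is hypothetical:

```shell
# set_selinux_mode MODE [FILE]: rewrite the SELINUX= line of the SELinux config file.
# Hypothetical helper. MODE: enforcing, permissive or disabled.
# FILE defaults to /etc/selinux/config. The new mode takes effect at next boot.
set_selinux_mode() {
    local mode=$1 file=${2:-/etc/selinux/config}
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    # ^SELINUX= does not match the SELINUXTYPE= line, which is left untouched
    sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$file"
}
```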

Few network adjustments

First, let's secure ssh (I installed vim to edit files: yum install vim). We will add a standard user to log in to the login node, and then use the su command to become root. This way, root login over ssh can be disabled. I prefer adduser over useradd (on CentOS, adduser is a symbolic link to useradd, so both behave the same):

login:~# adduser sphen
login:~# passwd sphen
Changing password for user sphen.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
login:~#

Check that you can su to this user, then exit, log out, and try to log in as this user and become root. If that works, continue with the next instructions. Edit /etc/ssh/sshd_config, uncomment PermitRootLogin and set it to no:

- #PermitRootLogin yes
+ PermitRootLogin no
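The same edit can be made non-interactively, which helps when several machines need it. A sketch using GNU sed; disable_root_ssh is a hypothetical helper. It only rewrites top-level PermitRootLogin lines (indented ones inside Match blocks are left alone):

```shell
# disable_root_ssh [FILE]: make sshd refuse direct root logins.
# Hypothetical helper; FILE defaults to /etc/ssh/sshd_config.
disable_root_ssh() {
    local file=${1:-/etc/ssh/sshd_config}
    if grep -qE '^#?PermitRootLogin[[:space:]]' "$file"; then
        # Rewrite the top-level PermitRootLogin line(s), commented out or not
        sed -i -E 's/^#?PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$file"
    else
        # No such line: append the directive
        # (assumes the file does not end with an active Match block)
        printf 'PermitRootLogin no\n' >> "$file"
    fi
}
```

Before restarting, sshd -t checks the configuration syntax, so a typo does not lock you out: disable_root_ssh && sshd -t && systemctl restart sshd.service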

Then restart the service (on CentOS 7, systemctl restart sshd.service is the native equivalent):

login0:~# service sshd restart
Stopping sshd:                                             [  OK  ]
Starting sshd:                                             [  OK  ]
login0:~# 

Now, because this is a simple local network and for performance reasons, let's disable IPv6. Keep in mind that doing this will generate errors/warnings in the logs (https://bugzilla.redhat.com/show_bug.cgi?id=641836).
Edit /etc/default/grub and add ipv6.disable=1 to GRUB_CMDLINE_LINUX:

GRUB_CMDLINE_LINUX="ipv6.disable=1 vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet"
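If you prefer to script this edit (for example to reuse it on the other nodes), a sketch like the following prepends an argument to GRUB_CMDLINE_LINUX only when it is missing. The helper add_kernel_arg is hypothetical; it assumes the value sits on one line in double quotes and that the argument contains no / or & (they would break the sed expression):

```shell
# add_kernel_arg ARG [FILE]: prepend ARG to GRUB_CMDLINE_LINUX unless already present.
# Hypothetical helper; FILE defaults to /etc/default/grub.
add_kernel_arg() {
    local arg=$1 file=${2:-/etc/default/grub} current
    # Extract the current quoted value of GRUB_CMDLINE_LINUX
    current=$(sed -n -E 's/^GRUB_CMDLINE_LINUX="(.*)"$/\1/p' "$file")
    # Whole-word check: pad with spaces so "ipv6.disable=1" only matches itself
    case " $current " in
        *" $arg "*) return 0 ;;
    esac
    sed -i -E "s/^GRUB_CMDLINE_LINUX=\"/GRUB_CMDLINE_LINUX=\"$arg /" "$file"
}
```

Usage: add_kernel_arg ipv6.disable=1, then regenerate the GRUB configuration as below.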

Then apply changes :

login0:~# grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot to apply the changes. Try to log in as root (the system should now refuse, since we adjusted sshd_config), then log in as sphen, su to root, and check the IPv6 status with:

ip addr show | grep inet6

If nothing appears, IPv6 is disabled.

Since RHEL7/CentOS7, the basic iptables service has been replaced by firewalld. However, the iptables rules generated by firewalld are difficult to read, so let's go back to plain iptables:

yum install iptables-services
systemctl mask firewalld.service
systemctl enable iptables.service
systemctl stop firewalld.service
systemctl start iptables.service

Check the firewall (iptables) rules:

login:~# iptables-save
# Generated by iptables-save v1.4.21 on Wed Jan 28 08:57:40 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [54:5088]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Wed Jan 28 08:57:40 2015

Check that iptables is active and firewalld masked (systemctl is-enabled iptables.service should also print enabled):

login:~# systemctl -a | grep iptables.service
iptables.service                                                                                         loaded active   exited    IPv4 firewall with iptables
login:~# systemctl -a | grep firewalld.service
firewalld.service                                                                                        masked inactive dead      firewalld.service
login:~# 

OK.

Now let's install fail2ban to block basic brute-force attacks. It is not in the CentOS repositories; you can use EPEL or install it manually. Do not forget to run yum install wget first so that wget is available:

wget http://mirrors.ircam.fr/pub/fedora/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
yum --nogpgcheck localinstall epel-release-7-5.noarch.rpm
yum update
yum search fail2ban
yum install fail2ban
systemctl enable fail2ban.service
systemctl start fail2ban.service
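Whether sshd is protected out of the box depends on the packaged fail2ban version: from fail2ban 0.9 on, the upstream jail.conf leaves jails disabled unless they are enabled explicitly, and EPEL 7 packages may also ship a jail.d file selecting a firewalld-based ban action, which cannot work here since firewalld is masked. If the test below does not ban you, a local override along these lines may help. The helper name is hypothetical, and the file /etc/fail2ban/jail.local and its values are assumptions to check against your installed version (jail.local is read after jail.conf and jail.d/*.conf, so it overrides them):

```shell
# write_fail2ban_local [FILE]: write a minimal override enabling the sshd jail.
# Hypothetical helper; FILE defaults to /etc/fail2ban/jail.local.
write_fail2ban_local() {
    local file=${1:-/etc/fail2ban/jail.local}
    cat > "$file" <<'EOF'
[DEFAULT]
# Ban with plain iptables rules, since firewalld is masked on this host
banaction = iptables-multiport

[sshd]
enabled = true
EOF
}
```

After writing the file, restart the service with systemctl restart fail2ban.service; fail2ban-client status sshd then lists the jail and the currently banned addresses.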

Check from another computer (another IP address) by trying to ssh many times with a wrong password. You should be banned quickly.

Internal Network

Now let's configure the second interface, the one connected to the internal network. Note that depending on your kernel, Ethernet cards may not be enumerated in the same order; if this is the case, use MAC addresses to identify each one (by default, CentOS sets the HWADDR value). Set a static IP and all other settings in the interface's configuration file:

login0:/etc/sysconfig/network-scripts# cat ifcfg-enp6s2f1
HWADDR=00:E0:ED:XX:XX:XX
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp6s2f1
UUID=03607fd3-ad47-4741-80ea-cc916c8d2c31
ONBOOT=yes
IPADDR0="192.168.0.1"
PREFIX0="24"

Note: the cluster here is small, so we use a /24 mask (255.255.255.0, i.e. 254 usable addresses).
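To find which interface carries a given MAC address (the HWADDR line above), the one-line output of ip -o link show can be filtered. A minimal sketch, assuming iproute2's usual one-line format; iface_by_mac is a hypothetical helper:

```shell
# iface_by_mac MAC: read "ip -o link show" output on stdin and print the name
# of the interface whose link/ether address is MAC (case-insensitive).
# Hypothetical helper.
iface_by_mac() {
    awk -v mac="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "link/ether" && tolower($(i + 1)) == tolower(mac)) {
                    name = $2
                    sub(/:$/, "", name)    # "enp6s2f1:" -> "enp6s2f1"
                    sub(/@.*/, "", name)   # drop an "@parent" suffix (VLANs, etc.)
                    print name
                }
            }
        }'
}
```

Usage: ip -o link show | iface_by_mac 00:E0:ED:XX:XX:XX (with the real address from the HWADDR line).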

Other nodes will use this one to reach the Internet when needed. Two iptables profiles will be available: a standard one with no Internet access, and one that allows it. We will use the default one (all blocked) at startup and during normal operation, and enable Internet access for the nodes on the fly when needed.
First, enable IPv4 forwarding. To enable it until the next reboot:

echo 1 > /proc/sys/net/ipv4/ip_forward

For information, to make it permanent (I do not recommend it), edit /etc/sysctl.conf so that net.ipv4.ip_forward is set to 1:

# Controls IP packet forwarding
net.ipv4.ip_forward = 1

Now for iptables. Save the current iptables configuration:

mkdir /etc/iptables
chmod -R 700 /etc/iptables
iptables-save > /etc/iptables/iptables.default
chmod 400 /etc/iptables/iptables.default

Now enable masquerading (remember, here eth0 is enp6s2f0 and eth1 is enp6s2f1):

iptables -t nat -A POSTROUTING -o enp6s2f0 -j MASQUERADE
iptables -A FORWARD -i enp6s2f0 -o enp6s2f1 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i enp6s2f1 -o enp6s2f0 -j ACCEPT

Running iptables-save shows the changes compared to the previous rules:

# Generated by iptables-save v1.4.21 on Wed Jan 28 10:09:29 2015
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -o enp6s2f0 -j MASQUERADE
COMMIT
# Completed on Wed Jan 28 10:09:29 2015
# Generated by iptables-save v1.4.21 on Wed Jan 28 10:09:29 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [37:3736]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -i enp6s2f0 -o enp6s2f1 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i enp6s2f1 -o enp6s2f0 -j ACCEPT
COMMIT
# Completed on Wed Jan 28 10:09:29 2015

However, this configuration does not work: rules are evaluated in order of appearance, and the REJECT rule at the top of the FORWARD chain matches every forwarded packet before the two new ACCEPT rules are reached. I suggest saving this configuration and editing it manually, so that it can be re-injected later:

iptables-save > /etc/iptables/iptables.masquerading

Then, in /etc/iptables/iptables.masquerading, replace:

-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -i enp6s2f0 -o enp6s2f1 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i enp6s2f1 -o enp6s2f0 -j ACCEPT

with:

-A FORWARD -i enp6s2f0 -o enp6s2f1 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i enp6s2f1 -o enp6s2f0 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
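The reordering can also be scripted instead of done by hand. This sketch (move_forward_reject is a hypothetical helper) reads iptables-save output and moves the catch-all FORWARD REJECT rule of the filter table just before that table's COMMIT line, i.e. after every other FORWARD rule. Only the order of rules within a chain matters to iptables, so this is equivalent to the manual edit:

```shell
# move_forward_reject: filter iptables-save output (stdin -> stdout), moving the
# "-A FORWARD -j REJECT ..." rule(s) of the *filter table just before its COMMIT.
# Hypothetical helper.
move_forward_reject() {
    awk '
        /^\*/ { table = $0 }                              # "*nat", "*filter", ...
        table == "*filter" && /^-A FORWARD -j REJECT/ {   # hold the catch-all rule
            held = held $0 "\n"; next
        }
        table == "*filter" && /^COMMIT$/ {                # re-emit it last
            printf "%s", held; held = ""
        }
        { print }
    '
}
```

Usage, while the masquerading rules are loaded: iptables-save | move_forward_reject > /etc/iptables/iptables.masquerading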

Secure the file :

chmod 400 /etc/iptables/iptables.masquerading

Then restart iptables, which reloads the default rules from /etc/sysconfig/iptables:

systemctl restart iptables.service

Everything is now ready. To block all Internet access (the default at boot):

echo 0 > /proc/sys/net/ipv4/ip_forward
iptables-restore /etc/iptables/iptables.default

To activate Internet access on the fly :

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables-restore /etc/iptables/iptables.masquerading

Use these two sets of commands to open and close access as needed.
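Both sets of commands can be wrapped in one small function. A minimal sketch; cluster_nat and the override variables IP_FORWARD_FILE, RULES_DIR and IPTABLES_RESTORE are hypothetical names, defaulting to the paths used on this page (overriding them allows a dry run):

```shell
# cluster_nat on|off: give the nodes Internet access through this node, or remove it.
# Hypothetical helper; the variables below can be overridden, e.g. for a dry run.
IP_FORWARD_FILE=${IP_FORWARD_FILE:-/proc/sys/net/ipv4/ip_forward}
RULES_DIR=${RULES_DIR:-/etc/iptables}
IPTABLES_RESTORE=${IPTABLES_RESTORE:-iptables-restore}

cluster_nat() {
    case $1 in
        on)
            # Load the NAT/FORWARD rules first, then start forwarding packets
            $IPTABLES_RESTORE < "$RULES_DIR/iptables.masquerading" || return 1
            echo 1 > "$IP_FORWARD_FILE"
            ;;
        off)
            # Stop forwarding first, then go back to the default (all blocked) rules
            echo 0 > "$IP_FORWARD_FILE"
            $IPTABLES_RESTORE < "$RULES_DIR/iptables.default"
            ;;
        *)
            echo "usage: cluster_nat on|off" >&2
            return 2
            ;;
    esac
}
```

The order differs slightly from the commands above: forwarding is only enabled once the rules are in place, and disabled before they are removed, so there is no window where packets are forwarded with the wrong rule set.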

InfiniBand

The next step is to set up users.