Compute nodes

We will now deploy a compute node and add it to the Slurm pool. Compute nodes should be installed using PXE (see the managing cluster chapter for how to do that).
All commands here must be run on the compute node unless stated otherwise. We also assume a freshly installed node (first reboot after the PXE OS installation) named mycompute1.

Repository

The first thing to do is to set up the repository client, so that packages can be installed.

The simplest way is to copy the files from batman. This is why we updated the repository file on batman after installing the http server: the same file can be used on every node of the cluster, batman included. To copy the files from batman to the freshly installed mycompute1 node, run on batman:

scp /etc/yum.repos.d/os_base.local.repo mycompute1:/etc/yum.repos.d/os_base.local.repo
scp /etc/yum.repos.d/own.local.repo mycompute1:/etc/yum.repos.d/own.local.repo
ssh mycompute1 "rm -f /etc/yum.repos.d/CentOS-*"
ssh mycompute1 "yum clean all; yum update -y"
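When several nodes have to be prepared, the four commands above can be wrapped in a small function run on batman. This is only a sketch: the names push_repo, run and DRY_RUN are assumptions made for illustration and are not part of the original procedure. With DRY_RUN=1 (the default here) the commands are printed instead of executed.

```shell
#!/bin/bash
# Sketch: push the repository files from batman to one compute node.
# push_repo, run and DRY_RUN are illustrative names (assumptions).

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_repo() {
    local node="$1"
    local f
    for f in os_base.local.repo own.local.repo; do
        run scp "/etc/yum.repos.d/$f" "$node:/etc/yum.repos.d/$f"
    done
    run ssh "$node" "rm -f /etc/yum.repos.d/CentOS-*"
    run ssh "$node" "yum clean all; yum update -y"
}

push_repo mycompute1
```

Set DRY_RUN=0 to execute the commands, and call push_repo once per node name.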

Network

The node needs a static IP address. Log in as root and edit the network file /etc/sysconfig/network-scripts/ifcfg-enp0s3 as follows (here, for mycompute1):

DEVICE="enp0s3"
NAME="enp0s3"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="10.0.3.1"
NETMASK=255.255.0.0

Apply using (or reboot):

systemctl restart network
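Since only the address changes from one node to the next, the file can also be generated. The sketch below prints the same content as the example above for a given device, address and netmask. render_ifcfg is a hypothetical helper name, not a system tool; the values used in the call are the ones from the mycompute1 example.

```shell
#!/bin/bash
# Sketch: print an ifcfg file for a static address.
# render_ifcfg is an illustrative helper (assumption), not a system tool.

render_ifcfg() {
    local device="$1" ipaddr="$2" netmask="$3"
    cat <<EOF
DEVICE="$device"
NAME="$device"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="$ipaddr"
NETMASK=$netmask
EOF
}

# Example for mycompute1; redirect the output to
# /etc/sysconfig/network-scripts/ifcfg-enp0s3 to install it.
render_ifcfg enp0s3 10.0.3.1 255.255.0.0
```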

Firewall

Disable NetworkManager and firewalld:

systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl disable firewalld.service
systemctl stop firewalld.service

Dns

As with the repositories, you can copy the same file as the one on batman. To do so, run the following command on batman:

scp /etc/resolv.conf mycompute1:/etc/resolv.conf

Or do it manually: edit /etc/resolv.conf as follows to tell the host where to find the DNS service:

search sphen.local
nameserver 10.0.0.1

Ntp

Add the NTP server IP to the local configuration. First remove the default pool servers. On CentOS, use:

sed -i.bak '/centos.pool.ntp.org/ d' /etc/ntp.conf

On RHEL, use:

sed -i.bak '/rhel.pool.ntp.org/ d' /etc/ntp.conf

Then add the server IP:

echo "server 10.0.0.1 iburst" >> /etc/ntp.conf

Then start ntpd and check that it sees the server:

systemctl start ntpd
systemctl enable ntpd
ntpq -p
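Note that the echo command above appends a new line every time it is run, so repeating the procedure on the same node duplicates the server entry. A minimal sketch of an idempotent variant follows; add_ntp_server is a hypothetical helper name, and the demonstration works on a temporary file rather than on /etc/ntp.conf itself.

```shell
#!/bin/bash
# Sketch: add an NTP server line only if it is not already present.
# add_ntp_server is an illustrative helper (assumption).

add_ntp_server() {
    local conf="$1" ip="$2"
    local line="server $ip iburst"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}

# Demonstration on a temporary file rather than on /etc/ntp.conf.
tmp="$(mktemp)"
add_ntp_server "$tmp" 10.0.0.1
add_ntp_server "$tmp" 10.0.0.1   # second call changes nothing
cat "$tmp"
rm -f "$tmp"
```

On a real node the call would be add_ntp_server /etc/ntp.conf 10.0.0.1.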

Munge and Slurm

To install munge and Slurm, follow the same steps as described in the main server installation.

However, instead of generating a munge key, we copy the one from the master to the compute node, so that both have the same file. First install munge on the compute node:

yum install munge munge-libs

Then, from batman:

scp /etc/munge/munge.key mycompute1:/etc/munge/munge.key

Back on the compute node, do the remaining configuration:

chmod 0400 /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key
mkdir /var/run/munge
chown munge:munge /var/run/munge -R
chmod -R 0755 /var/run/munge
systemctl start munge
systemctl enable munge
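If munge refuses to start, the key permissions are worth checking first. The sketch below only verifies that the key file has mode 0400, as set above; key_mode_ok is a hypothetical helper name, and stat -c assumes GNU stat as shipped with CentOS and RHEL.

```shell
#!/bin/bash
# Sketch: check the mode of the munge key before starting the service.
# key_mode_ok is an illustrative helper (assumption); stat -c is GNU stat.

key_mode_ok() {
    local key="$1"
    [ "$(stat -c '%a' "$key" 2>/dev/null)" = "400" ]
}

key=/etc/munge/munge.key
if key_mode_ok "$key"; then
    echo "$key has mode 0400"
else
    echo "$key is missing or does not have mode 0400"
fi
```

Once the service is running, a credential can be encoded and decoded locally with munge -n | unmunge. Piping the output of munge -n to unmunge on batman over ssh checks that both hosts share the same key, assuming ssh access from the compute node to batman is available.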

The same applies to the slurm.conf file: we will copy it from batman. Also, instead of launching the slurmctld service at the end, launch and enable slurmd.

Create the slurm user, install the rpm, and create all required directories:

groupadd -g 777 slurm
useradd  -m -c "Slurm workload manager" -d /etc/slurm -u 777 -g slurm -s /bin/bash slurm 
yum install slurm slurm-munge
mkdir /var/spool/slurmd
chown -R slurm:slurm /var/spool/slurmd
mkdir /etc/slurm/SLURM
chown -R slurm:slurm /etc/slurm/SLURM
chmod 0755 -R /var/spool/slurmd
mkdir /var/log/slurm/
chown -R slurm:slurm /var/log/slurm/

Copy the slurm.conf file from batman to the compute node (run on batman):

scp /etc/slurm/slurm.conf mycompute1:/etc/slurm/slurm.conf

Then start the slurmd daemon on the compute node:

systemctl start slurmd
systemctl enable slurmd

If slurmd fails to start, run it in the foreground with verbose output (-D -vvvvvv) to see why:

slurmd -D -vvvvvv
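Once slurmd is running, it is worth checking from batman that the node has actually joined the pool. The following sketch, to be run on batman, queries the node state with sinfo and classifies it. state_is_usable and check_node are hypothetical helper names, and the list of states treated as usable is an assumption chosen for this example.

```shell
#!/bin/bash
# Sketch: check from batman that a compute node has joined the pool.
# state_is_usable and check_node are illustrative helpers (assumptions).

# Succeeds for states in which the node accepts or runs jobs.
# A trailing "*" in sinfo output means the node is not responding.
state_is_usable() {
    case "$1" in
        *\*) return 1 ;;
        idle|mixed|allocated|completing) return 0 ;;
        *) return 1 ;;
    esac
}

check_node() {
    local node="$1" state
    # -h: no header, -N: one line per node, -n: restrict to this node,
    # -o "%T": print only the node state
    state="$(sinfo -h -N -n "$node" -o "%T" | head -n 1)"
    if state_is_usable "$state"; then
        echo "$node is usable (state: $state)"
    else
        echo "$node is not usable (state: ${state:-unknown})"
        return 1
    fi
}

# Usage on batman: check_node mycompute1
```

If the node is reported as down although slurmd is running, scontrol update NodeName=mycompute1 State=RESUME on batman usually returns it to service.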

Nfs

On each compute node, mount /hpc-softwares and /home. Install the needed packages:

yum -y install nfs-utils

Then start the needed services:

systemctl start rpcbind
systemctl enable rpcbind

Create the software directory; /home should already exist:

mkdir /hpc-softwares

Then edit /etc/fstab, and add at the end:

10.0.1.1:/hpc-softwares /hpc-softwares nfs ro,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
10.0.1.1:/home /home nfs rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0

And mount the directories:

mount /hpc-softwares
mount /home
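Because of the bg option, a mount that cannot reach the server is retried in the background and the mount command may return without error. A quick check that both directories are really mounted over NFS can be sketched as follows. is_nfs_mounted is a hypothetical helper name; it reads /proc/mounts by default.

```shell
#!/bin/bash
# Sketch: check that a directory is mounted over NFS.
# is_nfs_mounted is an illustrative helper (assumption). It reads a file in
# /proc/mounts format: device, mount point, filesystem type, options, ...

is_nfs_mounted() {
    local dir="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts"
}

for dir in /hpc-softwares /home; do
    if is_nfs_mounted "$dir"; then
        echo "$dir: mounted over NFS"
    else
        echo "$dir: NOT mounted over NFS"
    fi
done
```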

Ldap

LDAP configuration is easy on the client side. First, install the needed packages:

yum -y install openldap-clients nss-pam-ldapd 

Then, tell the client where the server is and which base DN to use:

authconfig --enableldap --enableldapauth --ldapserver=10.0.0.1 --ldapbasedn="dc=sphen,dc=local" --enablemkhomedir --update 

Now, to activate TLS for LDAP exchanges:

echo "TLS_REQCERT allow" >> /etc/openldap/ldap.conf 
echo "tls_reqcert allow" >> /etc/nslcd.conf 
authconfig --enableldaptls --update 

That’s all for a compute node. We will see tools to automate this installation later.