====== Compute nodes ======

We will now deploy the compute nodes and add them to the slurm pool. Compute nodes should be installed using PXE (see the [[system:linux_cluster:managing_cluster|managing cluster]] chapter on how to do that).\\
All commands here must be run on the compute node unless stated otherwise. We also assume that we are working on a freshly installed node (first reboot after the PXE OS installation), mycompute1.

===== Repository =====

The first thing to do is to set up the repository client, so that packages can be installed. The simplest way is to upload the files from batman. This is why we updated the repository file on batman after installing the http server (so that this file is the same on all nodes of the cluster, batman included).

To copy the files from batman to the freshly installed mycompute1 node, __on batman__ use:

  scp /etc/yum.repos.d/os_base.local.repo mycompute1:/etc/yum.repos.d/os_base.local.repo
  scp /etc/yum.repos.d/own.local.repo mycompute1:/etc/yum.repos.d/own.local.repo
  ssh mycompute1 "rm -f /etc/yum.repos.d/CentOS-*"
  ssh mycompute1 "yum clean all; yum update -y"

===== Network =====

The node needs to use a static IP. Log in as root and edit the network file **/etc/sysconfig/network-scripts/ifcfg-enp0s3** as follows (here, for mycompute1):

  DEVICE="enp0s3"
  NAME="enp0s3"
  TYPE="Ethernet"
  NM_CONTROLLED=no
  ONBOOT="yes"
  BOOTPROTO="static"
  IPADDR="10.0.3.1"
  NETMASK=255.255.0.0

Apply using (or reboot):

  systemctl restart network

===== Firewall =====

Disable NetworkManager, and disable firewalld:

  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl disable firewalld.service
  systemctl stop firewalld.service

===== Dns =====

As for the repositories, you can upload the same file as the one on batman. To do so, on batman, use the following command:

  scp /etc/resolv.conf mycompute1:/etc/resolv.conf

Or do it manually.
To do so, edit **/etc/resolv.conf** as follows, so that the host knows where to find the DNS service:

  search sphen.local
  nameserver 10.0.0.1

===== Ntp =====

Add the NTP server IP to the local configuration. First remove the default pool servers. If using CentOS, use:

  sed -i.bak '/centos.pool.ntp.org/ d' /etc/ntp.conf

Else, if using RHEL:

  sed -i.bak '/rhel.pool.ntp.org/ d' /etc/ntp.conf

Then add the IP:

  echo "server 10.0.0.1 iburst" >> /etc/ntp.conf

And start ntpd, enable it at boot, and check that it syncs with the server:

  systemctl start ntpd
  systemctl enable ntpd
  ntpq -p

===== Munge and Slurm =====

To install munge and slurm, follow the same steps as described in the main server installation. However, instead of generating a munge key, we will copy the one from the master to the compute node (so that both have the same file).

  yum install munge munge-libs

Then from //batman//:

  scp /etc/munge/munge.key mycompute1:/etc/munge/munge.key

Now, back on the compute node, do the remaining configuration:

  chmod 0400 /etc/munge/munge.key
  chown munge:munge /etc/munge/munge.key
  mkdir /var/run/munge
  chown munge:munge /var/run/munge -R
  chmod -R 0755 /var/run/munge
  systemctl start munge
  systemctl enable munge

The same applies to the slurm.conf file: we will copy it from batman. Also, instead of launching the slurmctld service at the end, launch and enable slurmd.

Create the slurm user, install the rpm packages, and create all required directories:

  groupadd -g 777 slurm
  useradd -m -c "Slurm workload manager" -d /etc/slurm -u 777 -g slurm -s /bin/bash slurm
  yum install slurm slurm-munge
  mkdir /var/spool/slurmd
  chown -R slurm:slurm /var/spool/slurmd
  mkdir /etc/slurm/SLURM
  chown -R slurm:slurm /etc/slurm/SLURM
  chmod 0755 -R /var/spool/slurmd
  mkdir /var/log/slurm/
  chown -R slurm:slurm /var/log/slurm/

From batman, copy the slurm.conf file to the compute node:

  scp /etc/slurm/slurm.conf mycompute1:/etc/slurm/slurm.conf

Then start the slurm daemon on the compute node:

  systemctl start slurmd
  systemctl enable slurmd

If slurmd fails to start, run it in the foreground with verbose output (-D -vvvvvv) to see what goes wrong:

  slurmd -D -vvvvvv

===== Nfs =====

On each compute node, mount /hpc-softwares and /home.
Install the needed packages:

  yum -y install nfs-utils

Then start the needed services:

  systemctl start rpcbind
  systemctl enable rpcbind

Create the software directory (/home should already be there):

  mkdir /hpc-softwares

Then edit **/etc/fstab**, and add at the end:

  10.0.1.1:/hpc-softwares /hpc-softwares nfs ro,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
  10.0.1.1:/home /home nfs rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0

And mount the directories:

  mount /hpc-softwares
  mount /home

===== Ldap =====

LDAP configuration is easy on the client side. First, install the needed packages:

  yum -y install openldap-clients nss-pam-ldapd

Then, tell the client where the server is and which base domain to use:

  authconfig --enableldap --enableldapauth --ldapserver=10.0.0.1 --ldapbasedn="dc=sphen,dc=local" --enablemkhomedir --update

Now, to activate TLS (encrypted) exchanges:

  echo "TLS_REQCERT allow" >> /etc/openldap/ldap.conf
  echo "tls_reqcert allow" >> /etc/nslcd.conf
  authconfig --enableldaptls --update

That’s all for a compute node. We will see later tools to automate this installation.
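
In the meantime, the copy steps run from batman in the sections above (repository files, resolv.conf, munge key, slurm.conf) can be grouped into one small script. The following is only a minimal sketch and not the automation tooling mentioned above: the names ''push_node_files'', ''run'' and ''DRY_RUN'' are hypothetical, and it assumes that munge is already installed and the slurm user already created on the node, so that the target directories exist. By default it only prints the scp commands instead of executing them.

```shell
#!/bin/sh
# Sketch: push the files shared with batman to one compute node.
# push_node_files, run and DRY_RUN are hypothetical names used for illustration.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

# Files that this chapter copies unchanged from batman to the compute node.
FILES="/etc/yum.repos.d/os_base.local.repo
/etc/yum.repos.d/own.local.repo
/etc/resolv.conf
/etc/munge/munge.key
/etc/slurm/slurm.conf"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Copy every file in FILES to the same path on the given node.
push_node_files() {
    node="$1"
    for f in $FILES; do
        run scp "$f" "$node:$f"
    done
}

push_node_files mycompute1
```

Run it on batman with ''DRY_RUN=0'' once the printed commands look right. The permission and ownership steps for the munge key still have to be done on the compute node afterwards, as described in the Munge and Slurm section.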