====== Cluster Deployment Management Tools ======

{{ :system:linux_cluster:ghot-in-the-shell-end.jpg?500 |}}

There are many deployment and management tools. Here is my recommended list:

  * Big HPC cluster: Salt Stack
  * Small HPC cluster: Ansible
  * Very heterogeneous managed stuff: Puppet

My heart goes to Salt and Ansible. Salt is really flexible and has an important feature for HPC or Cloud: loops. This is a personal choice.

===== Salt Stack =====

==== Basic commands ====

Ensure all minions are up to date:

<code bash>
salt '*' state.highstate -v
</code>

Replace '*' with the hostname or the IP of a minion to apply the state to that minion only.

To see everything that is done on a specific minion, log in to it with ssh, then run:

<code bash>
salt-call -l debug state.highstate
</code>

This is a good way to debug and see what is going on.

==== Tip ====

Use Jinja in top files, and share one file containing all server IPs between the pillar top file and the state top file.

Pillar top file (/srv/pillar/top.sls):

<code yaml>
{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - pkgs
    - services
    - dhcp-server
</code>

State top file (/srv/salt/top.sls):

<code yaml>
{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - repository.client
    - dhcp.server
</code>

Then edit your servers.sls file in /srv/pillar/servers.sls:

<code yaml>
servers:
  repository:
    name: repo0
    ip: 172.16.0.12
  dhcp:
    name: dhcp0
    ip: 172.16.0.12
</code>

To finish, add a symbolic link so that the state top file can also read this file:

<code bash>
ln -s /srv/pillar/servers.sls /srv/salt/servers.sls
</code>

Using this, all Salt is dynamic !!
:-D

==== Jinja2 ====

Basic Jinja2 syntax:

Get a value/string from pillar (here the pillar network, value of subnet):

<code>
{{ salt['pillar.get']('network:subnet') }}
</code>

Loop over a pillar list at the last level:

<code>
{% for rangeip in salt['pillar.get']('network:dhcp:range') %}
range {{rangeip}};
{% endfor %}
</code>

Loop over a pillar list that is not at the last level:

<code>
{% for host, args in salt['pillar.get']('nodes', {}).items() %}
host {{ host }} {
  hardware ethernet {{ args.hwaddr }};
  fixed-address {{ args.ip }};
}
{% endfor %}
</code>

Split a string into a list using a specific character as separator (useful for DNS configuration files!):

<code>
{% set list1 = salt['pillar.get']('network:subnet').split('.') %}
{% if salt['pillar.get']('network:netmask') == '255.255.255.0' %}
option broadcast-address {{ list1[0] }}.{{ list1[1] }}.{{ list1[2] }}.255;
{% elif salt['pillar.get']('network:netmask') == '255.255.0.0' %}
option broadcast-address {{ list1[0] }}.{{ list1[1] }}.255.255;
{% elif salt['pillar.get']('network:netmask') == '255.0.0.0' %}
option broadcast-address {{ list1[0] }}.255.255.255;
{% else %}
option broadcast-address CANNOT UNDERSTAND NETMASK !!! See dhcp/dhcpd.conf.jinja;
{% endif %}
</code>

More: http://jinja.pocoo.org/docs/dev/templates/

===== Ansible =====

==== Install ====

On a machine with Internet access, fetch Ansible and its Python dependencies, then archive them:

<code bash>
git clone https://github.com/ansible/ansible.git --recursive
tar cvzf ansible.tar.gz ansible
mkdir /home/yourlogin/pip
pip install --ignore-installed --target=/home/yourlogin/pip --install-option="--install-purelib=/home/yourlogin/pip" paramiko PyYAML Jinja2 httplib2 six
# -C makes the archive contain "pip/" rather than "home/yourlogin/pip/",
# so that it extracts to /root/pip on the master node
tar cvzf pip.tar.gz -C /home/yourlogin pip
</code>

Then on the master node:

<code bash>
cd /root
tar xvzf ansible.tar.gz
tar xvzf pip.tar.gz
</code>

Add the nodes to the Ansible nodes file (/root/ansible_hosts):

<code>
[repo]
10.0.0.2

[dhcp]
10.0.0.3

[pxe]
10.0.0.4
</code>

Now set up the Ansible environment (must be done each time you use Ansible):

<code bash>
source /root/ansible/hacking/env-setup
export PYTHONPATH=/root/pip/:$PYTHONPATH
</code>

Set the node list file:

<code bash>
export ANSIBLE_INVENTORY=/root/ansible_hosts
</code>

And check that everything is OK (in this example, only 2 of the 3 nodes are online; 10.0.0.4 is offline to show the error):

<code>
ansible all -m ping
10.0.0.2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
10.0.0.3 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
10.0.0.4 | FAILED! => {
    "failed": true,
    "msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue"
}
</code>

==== Playbooks example ====

To execute a playbook on nodes:

<code bash>
ansible-playbook /root/nodes/repo/playbooks/default.pb
</code>

If something goes wrong, you can get debug output using -v (-vv, -vvv, -vvvv, etc. for higher levels of verbosity). Note also that Ansible will stop when an error is detected.
You can ignore errors using ignore_errors:

<code yaml>
- name: Say hello
  command: echo "hello"
  ignore_errors: true
</code>

=== Repository ===

<code yaml>
---
- hosts: repo
  remote_user: root
  tasks:
    ###############################################################
    ########### vsftpd server installation ###
    - name: Installing vsftpd rpm
      command: chdir=/mnt/Packages/ rpm -ivh vsftpd-3.0.2-9.el7.x86_64.rpm
    - name: Enable vsftpd on start
      command: systemctl enable vsftpd
    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Installing libxml2-python rpm
      command: chdir=/mnt/Packages/ rpm -ivh libxml2-python-2.9.1-5.el7_0.1.x86_64.rpm
    - name: Installing deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh deltarpm-3.6-3.el7.x86_64.rpm
    - name: Installing python-deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh python-deltarpm-3.6-3.el7.x86_64.rpm
    - name: Installing createrepo rpm
      command: chdir=/mnt/Packages/ rpm -ivh createrepo-0.9.9-23.el7.noarch.rpm
    ###############################################################
    ########### copy rpm and create repository ###
    - file: path=/var/ftp/pub/localrepo state=directory mode=0755
    - name: Copy packages from DVD/iso to repository and repository configuration file
      shell: cp -ar /mnt/Packages/*.* /var/ftp/pub/localrepo/
    - copy: src=/root/nodes/repo/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - name: Create repository
      command: createrepo -v /var/ftp/pub/localrepo/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp
    ###############################################################
    ########### remove online repositories and update yum ###
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    # -y is needed because no terminal is available to answer the confirmation prompt
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### disable firewall and set selinux to permissive ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy SELinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/repo/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot ###
    - name: Make sure vsftpd is running
      service: name=vsftpd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }} state=started port=22 delay=1 timeout=300
      become: false
</code>

=== DHCP ===

<code yaml>
---
- hosts: dhcp
  remote_user: root
  tasks:
    ###############################################################
    ########### add local repository, remove online repositories and update yum ###
    - copy: src=/root/nodes/dhcp/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### install dhcpd server ###
    - name: Install dhcpd server
      yum: name=dhcp state=latest
    - copy: src=/root/nodes/dhcp/files/default.dhcpd.conf dest=/etc/dhcp/dhcpd.conf owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /etc/dhcp/dhcpd.conf
    - name: Enable dhcpd on start
      command: systemctl enable dhcpd.service
    - name: Start dhcpd
      command: systemctl start dhcpd.service
    ###############################################################
    ########### disable firewall and set selinux to permissive ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy SELinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/dhcp/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot ###
    - name: Make sure dhcpd is running
      service: name=dhcpd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }} state=started port=22 delay=1 timeout=300
      become: false
</code>

=== PXE ===

<code yaml>
---
- hosts: pxe
  remote_user: root
  tasks:
    ###############################################################
    ########### add local repository, remove online repositories and update yum ###
    - copy: src=/root/nodes/pxe/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### install tftp, xinetd and vsftpd ###
    - name: Install tftp
      yum: name=tftp state=latest
    - name: Install tftp-server
      yum: name=tftp-server state=latest
    - name: Install xinetd
      yum: name=xinetd state=latest
    - copy: src=/root/nodes/pxe/files/default.tftp dest=/etc/xinetd.d/tftp owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/xinetd.d/tftp
    - name: Start xinetd
      command: systemctl start xinetd
    - name: Enable xinetd
      command: systemctl enable xinetd
    - name: Install syslinux
      yum: name=syslinux state=latest
    - name: Install wget
      yum: name=wget state=latest
    - name: Install vsftpd
      yum: name=vsftpd state=latest
    ###############################################################
    ########### copy files for pxe boot ###
    - name: Copy pxelinux.0
      command: cp -v /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot
    - name: Copy menu.c32
      command: cp -v /usr/share/syslinux/menu.c32 /var/lib/tftpboot
    - name: Copy memdisk
      command: cp -v /usr/share/syslinux/memdisk /var/lib/tftpboot
    - name: Copy mboot.c32
      command: cp -v /usr/share/syslinux/mboot.c32 /var/lib/tftpboot
    - name: Copy chain.c32
      command: cp -v /usr/share/syslinux/chain.c32 /var/lib/tftpboot
    - file: path=/var/lib/tftpboot/pxelinux.cfg state=directory mode=0755
    - file: path=/var/lib/tftpboot/netboot/ state=directory mode=0755
    - file: path=/var/ftp/pub/iso state=directory mode=0755
    - name: Copy vmlinuz
      command: cp /mnt/images/pxeboot/vmlinuz /var/lib/tftpboot/netboot/
    - name: Copy initrd.img
      command: cp /mnt/images/pxeboot/initrd.img /var/lib/tftpboot/netboot/
    - name: Restore SELinux contexts
      command: restorecon -R /var/lib/tftpboot
    - copy: src=/root/nodes/pxe/files/default.ks.cfg dest=/var/ftp/pub/ks.cfg owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/ftp/pub/ks.cfg
    - copy: src=/root/nodes/pxe/files/default.pxelinux.cfg.default dest=/var/lib/tftpboot/pxelinux.cfg/default owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/lib/tftpboot/pxelinux.cfg/default
    ###############################################################
    ########### copy minimal iso content and start services ###
    - name: Copy minimal iso to /var/ftp/pub/iso/
      shell: cp -Rv /mnt/* /var/ftp/pub/iso/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp/pub/
    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Enable vsftpd
      command: systemctl enable vsftpd
    - name: Restart vsftpd
      command: systemctl restart vsftpd
    - name: Restart xinetd
      command: systemctl restart xinetd
    - name: Set rights on /var/lib/tftpboot
      command: chmod 777 /var/lib/tftpboot
    ###############################################################
    ########### disable firewall and set selinux to permissive ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy SELinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/pxe/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot ###
    - name: Make sure vsftpd is running
      service: name=vsftpd state=started
    - name: Make sure xinetd is running
      service: name=xinetd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }} state=started port=22 delay=1 timeout=300
      become: false
</code>

===== Puppet =====

How to debug YAML files: brute force with http://www.yamllint.com/

==== Errors ====

From: http://makandracards.com/makandra/29365-vague-puppet-error-messages-with-broken-yaml-files

<code>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: (): found character that cannot start any token while scanning for the next token at line 1297 column 3
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
</code>

<code>
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: undefined method `empty?' for nil:NilClass at /etc/puppet/environments/production/manifests/nodes.pp:1 on node example.makandra.de
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
</code>

When a value starts with %{, check that it is enclosed in double quotes.

Bad:

<code yaml>
foo: %{::fqdn}
</code>

Good:

<code yaml>
foo: "%{::fqdn}"
</code>
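
The quoting check above can be automated before a Puppet run. The following is a minimal Python sketch and not part of Puppet or Hiera: the function name find_unquoted_interpolations, the sample data and the regular expression are assumptions made for illustration, and the check is a line-based heuristic rather than a real YAML parser.

```python
import re

# Heuristic: a line is suspicious when its value starts with an unquoted %{
# This covers "key: %{...}" and list items such as "- %{...}".
# Commented lines and quoted values ("%{...}" or '%{...}') are not matched.
UNQUOTED_INTERPOLATION = re.compile(
    r"""^\s*(?:-\s+)?(?:[^\s#'"][^#'"]*?:\s+)?%\{"""
)


def find_unquoted_interpolations(text):
    """Return (line_number, stripped_line) pairs for values starting with an unquoted %{."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if UNQUOTED_INTERPOLATION.match(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample data, in the style of the Bad/Good example above
SAMPLE = (
    'foo: %{::fqdn}\n'
    'bar: "%{::fqdn}"\n'
    'profile::name: %{::hostname}\n'
    '# baz: %{::fqdn}\n'
    'servers:\n'
    '  - %{::domain}\n'
)

if __name__ == "__main__":
    for number, line in find_unquoted_interpolations(SAMPLE):
        print("line %d: quote this value: %s" % (number, line))
```

To check a real file, read its content and pass it to the function, for example find_unquoted_interpolations(open(path).read()), where path is the YAML file to check. Because this is only a pattern match, a full YAML linter remains the more reliable check for other syntax errors.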