Benutzer-Werkzeuge

Webseiten-Werkzeuge


heartbeat

Unterschiede

Hier werden die Unterschiede zwischen zwei Versionen angezeigt.

Link zu dieser Vergleichsansicht

Beide Seiten der vorigen RevisionVorhergehende Überarbeitung
heartbeat [2014-01-16 16:46:14] – [Heartbeat schlägt nicht mehr -> Pacemaker gibt jetzt den Takt an] manfredheartbeat [2016-04-12 22:50:00] (aktuell) – Externe Bearbeitung 127.0.0.1
Zeile 1: Zeile 1:
 +====== Heartbeat ======
 +
 +
 +===== Installation =====
 +
 +  # aptitude update && aptitiude install heartbeat
 +  # mkdir -p /var/run/heartbeat/rsctmp/send_arp
 +
 +
 +===== Die Konfigurationsdateien in der Übersicht =====
 +
 +  * **/etc/ha.d/authkeys:** Diese Datei enthält die preshared Secrets zur Authentifizerung der Heartbeat Kommunikation. Das shared Secret muss auf allen Clusterknoten identisch sein.
 +  * **/etc/ha.d/ha.cf:** Hierbei handelt es sich um die Haupt-Konfigurationsdatei. In ihr werden Clusterknoten, Kommunikationswege, Clustertimeouts, Fallback Optionen und weitere Dinge festgelegt.
 +  * **/etc/ha.d/haresources:** In dieser Konfiguration werden die vom Heartbeart verwalteten Ressourcen festgelegt (ClusterIP, Laufwerke, Dienste,...).
 +  * **/etc/ha.d/resource.d/:** Dieses Verzeichnis beinhaltet die durch die haresources aufgerufenen "INIT-Scripts" zum Start und Stop der verwalteten Ressourcen.
 +
 +  # tar cvf /var/lib/dokuwiki/data/media/openqrm_had.tar /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources /etc/ha.d/resource.d/drbddisk /etc/ha.d/resource.d/fsck_jfs /etc/ha.d/resource.d/myopenqrm /etc/ha.d/resource.d/mydrbd /etc/ha.d/resource.d/killpid
 +
 +
 +===== authkeys =====
 +
 +  (echo -ne "auth 1\n1 sha1 "; dd if=/dev/urandom bs=512 count=1 | openssl sha1) > /etc/ha.d/authkeys
 +  chmod 0600 /etc/ha.d/authkeys
 +
 +Die so erstellte Datei muss auf beide Knoten kopiert werden.
 +
 +FIXME
 +
 +
 +===== /etc/ha.d/haresources =====
 +
 +  rbopenqrm01 \
 +          mac::bond0::00:19:19:19:19:19 \
 +          IPaddr2::192.168.1.100/24/bond0:0 \
 +          mydrbd \
 +          drbddisk::data \
 +          LVM::data \
 +          mount_all_dev \
 +          check_lsof \
 +          mysql \
 +          ntp \
 +          squid \
 +          apache2 \
 +          nagios-nrpe-server
 +
 +
 +===== /opt/pidfile.cfg =====
 +
 +  if [ -z "${PIDDATEI}" ] ; then
 +          SCRIPT="$(basename ${0})"
 +          PIDNAME="$(echo "${SCRIPT}"|rev|sed 's/.*\.//'|rev)"
 +          PIDDATEI="/var/run/${PIDNAME}.pid"
 +  fi
 +  
 +  ##------------------------------------------------------------------
 +  pid_beginn()
 +  {
 +          if  [ -e "${PIDDATEI}" ] ; then
 +                  exit 0
 +          else
 +                  echo "$$" > ${PIDDATEI} && echo "${PIDDATEI} wurde angelegt" || echo "${HOSTNAME}:${SCRIPT} konnte ${PIDDATEI} nicht anlegen..."
 +          fi
 +  }
 +  ##------------------------------------------------------------------
 +  pid_status()
 +  {
 +          if  [ -e "${PIDDATEI}" ] ; then
 +                  echo "${SCRIPT} läuft schon mit der PID $(cat ${PIDDATEI}) ..."
 +          else
 +                  echo "${SCRIPT} ist gestoppt ..."
 +          fi
 +  }
 +  ##------------------------------------------------------------------
 +  pid_ende()
 +  {
 +          if  [ -e "${PIDDATEI}" ] ; then
 +                  rm -fv ${PIDDATEI} || echo "${HOSTNAME}:${SCRIPT} PID-File (${PIDDATEI}) kann nicht geloescht werden..."
 +          fi
 +  }
 +  ##------------------------------------------------------------------
 +
 +
 +===== /etc/ha.d/resource.d/mac =====
 +
 +  #!/bin/bash
 +  
 +  #set -x
 +  
 +  # damit das Script LSB-konform ist
 +  . /opt/pidfile.cfg
 +  
 +  KOMMANDO="$(echo "${@}" | awk '{print $NF}')"
 +  
 +  case ${KOMMANDO} in
 +          start)
 +                  pid_beginn
 +                cat $(find /sys/ -type f | fgrep 'address' | fgrep "${1}") > ${PIDDATEI}
 +                ifdown ${1}
 +                ifconfig ${1} hw ether ${2}
 +                ifup ${1}
 +                  ;;
 +  
 +          status)
 +                  pid_status
 +                  ;;
 +  
 +          stop)
 +                  if [ -e "${PIDDATEI}" ] ; then
 +                          ifdown ${1}
 +                          ifconfig ${1} hw ether $(cat ${PIDDATEI})
 +                          ifup ${1}
 +                  fi
 +                  pid_ende
 +                  ;;
 +  
 +  esac
 +  
 +  exit 0
 +
 +
 +===== /etc/ha.d/resource.d/drbddisk =====
 +
 +  #!/bin/bash
 +  #
 +  # This script is inteded to be used as resource script by heartbeat
 +  #
 +  # Copright 2003-2008 LINBIT Information Technologies
 +  # Philipp Reisner, Lars Ellenberg
 +  #
 +  ###
 +  
 +  DEFAULTFILE="/etc/default/drbd"
 +  DRBDADM="/sbin/drbdadm"
 +  
 +  if [ -f $DEFAULTFILE ]; then
 +    . $DEFAULTFILE
 +  fi
 +  
 +  if [ "$#" -eq 2 ]; then
 +    RES="$1"
 +    CMD="$2"
 +  else
 +    RES="all"
 +    CMD="$1"
 +  fi
 +  
 +  ## EXIT CODES
 +  # since this is a "legacy heartbeat R1 resource agent" script,
 +  # exit codes actually do not matter that much as long as we conform to
 +  #  http://wiki.linux-ha.org/HeartbeatResourceAgent
 +  # but it does not hurt to conform to lsb init-script exit codes,
 +  # where we can.
 +  #  http://refspecs.linux-foundation.org/LSB_3.1.0/
 +  #     LSB-Core-generic/LSB-Core-generic/iniscrptact.html
 +  ####
 +  
 +  case "$CMD" in
 +      start)
 +        # try several times, in case heartbeat deadtime
 +        # was smaller than drbd ping time
 +        try=6
 +        while true; do
 +                $DRBDADM primary $RES && break
 +                let "--try" || exit 1 # LSB generic error
 +                sleep 1
 +        done
 +        ;;
 +      stop)
 +        $DRBDADM secondary $RES
 +        ex=$?
 +        case $ex in
 +        0)
 +                exit 0
 +                ;;
 +        11)
 +                # see drbdadm_main.c adm_generic and m_system
 +                # as well as drbdsetup.c:
 +                # in fact a role change was attempted, but failed.
 +                echo >&2 "$DRBDADM secondary $RES: exit code $ex, mapping to 1"
 +                exit 1 # LSB generic error
 +                ;;
 +        *)
 +                # other error, may be syntax error in config file,
 +                # anything else: to not confuse heartbeat further,
 +                # and avoid reboot due so "failed stop recovery",
 +                # pretend that we succeeded in stopping this.
 +                echo >&2 "$DRBDADM secondary $RES: exit code $ex, mapping to 0"
 +                exit 0
 +                ;;
 +        esac
 +        ;;
 +      status)
 +        if [ "$RES" = "all" ]; then
 +            echo "A resource name is required for status inquiries."
 +            exit 10
 +        fi
 +        ST=$( $DRBDADM role $RES )
 +        STATE=${ST%/*}
 +        case $STATE in
 +                Primary)
 +                        echo "running (Primary)"
 +                        exit 0 # LSB status "service is OK"
 +                        ;;
 +                Secondary|Unconfigured)
 +                        echo "stopped ($STATE)" ;;
 +                "")
 +                        echo "stopped" ;;
 +                *)
 +                        # unexpected. whatever...
 +                        echo "stopped ($ST)" ;;
 +        esac
 +        exit 3 # LSB status "service is not running"
 +        ;;
 +      *)
 +        echo "Usage: drbddisk [resource] {start|stop|status}"
 +        exit 1
 +        ;;
 +  esac
 +  
 +  exit 0
 +
 +
 +===== /etc/ha.d/resource.d/mount_all_dev =====
 +
 +  #!/bin/bash
 +  
 +  VOLGRUPPE="data"
 +  
 +  # damit das Script LSB-konform ist
 +  . /opt/pidfile.cfg
 +  
 +  
 +  case ${1} in
 +          start)
 +                  pid_beginn
 +                  for mountdev in $(fgrep noauto /etc/fstab | awk '{print $1}' | while read L1BEZ REST; do echo "${L1BEZ}" | egrep "^/dev/mapper/${VOLGRUPPE}-" | awk '{gsub("[\"]","");print $1}'; echo "${L1BEZ}" | egrep "^/dev/${VOLGRUPPE}/" | awk '{gsub("[\"]","");print $1}'; echo "${L1BEZ}" | awk -F'=' '/^LABEL=/ {gsub("[\"]","");print $2}' | while read L2BEZ; do blkid -L ${L2BEZ}; done; done)
 +                  do
 +                          for umgebung in $(blkid -o udev ${mountdev})
 +                          do 
 +                                  export ${umgebung}
 +                          done 
 +                          if [ -n "${ID_FS_TYPE}" ] ; then
 +                                  fsck -t ${ID_FS_TYPE} -y ${mountdev} && mount -v ${mountdev}
 +                          fi
 +                  done
 +                  ;;
 +  
 +          status)
 +                  pid_status
 +                  ;;
 +  
 +          stop)
 +                  for mountpoint in $(tac /etc/mtab | egrep "^/dev/mapper/${VOLGRUPPE}-" | awk '{print $2}')
 +                  do
 +                          umount -v ${mountpoint}
 +                  done
 +                  pid_ende
 +                  ;;
 +  
 +  esac
 +  
 +  exit 0
 +
 +
 +
 +
 +===== /etc/ha.d/resource.d/check_lsof =====
 +
 +  #!/bin/sh
 +  
 +  HAVOLGR="lager"         # HA-Volumen-Gruppe
 +  MPOINTS="$(fgrep 'noauto' /etc/fstab | egrep "^LABEL=|^/dev/mapper/${HAVOLGR}-|^/dev/${HAVOLGR}/" | awk '{print $2}' | awk '{print " "$1}' | tr -s '\n' '|' | sed 's/|$//' | head -n1)"
 +  
 +  case $1 in
 +        start)
 +                ;;
 +    
 +          status)
 +                  lsof | egrep "${MPOINTS}";echo
 +                  ;;
 +    
 +          stop)
 +                  while [ -n "$(lsof | egrep "${MPOINTS}")" ]
 +                  do
 +                          echo "Warte bis die noch offenen Datei-Haendler geschlossen sind..."
 +                          sleep 4
 +                          RPIDS="$(lsof | fgrep '/collectd/' | awk '{print $2}' | sort | uniq)";
 +                          if [ -n "${RPIDS}" ] ; then
 +                                  kill ${RPIDS};
 +                          fi;
 +                          sleep 4 ;
 +                  done
 +                ;;
 +  
 +  esac
 +  
 +  exit 0
 +
 +
 +===== Cluster-Schwenk =====
 +
 +  # /etc/init.d/heartbeat standby && tail -f /var/log/syslog
 +  
 +  May 21 11:40:58 rbopenqrm01 heartbeat: [1893]: info: rbopenqrm02 wants to go standby [all]
 +  May 21 11:40:58 rbopenqrm01 ipfail: [2172]: debug: Other side is unstable.
 +  May 21 11:40:58 rbopenqrm01 kernel: [ 5357.069101] block drbd1: peer( Primary -> Secondary ) 
 +  May 21 11:40:59 rbopenqrm01 heartbeat: [1893]: info: standby: acquire [all] resources from rbopenqrm02
 +  May 21 11:40:59 rbopenqrm01 heartbeat: [3830]: info: acquire all HA resources (standby).
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Acquiring resource group: rbopenqrm01 IPaddr::10.10.5.80/24/br0 mydrbd drbddisk::data LVM::data Filesystem::/dev/data/mysql::/var/lib/mysql::jfs Filesystem::/dev/data/etcmysql::/etc/mysql::jfs mysql
 +  May 21 11:40:59 rbopenqrm01 IPaddr[3871]: INFO:  Resource is stopped
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/IPaddr 10.10.5.80/24/br0 start
 +  May 21 11:40:59 rbopenqrm01 IPaddr[3951]: INFO: Using calculated netmask for 10.10.5.80: 255.255.255.0
 +  May 21 11:40:59 rbopenqrm01 IPaddr[3951]: INFO: eval ifconfig br0:0 10.10.5.80 netmask 255.255.255.0 broadcast 10.10.5.255
 +  May 21 11:40:59 rbopenqrm01 IPaddr[3927]: INFO:  Success
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/mydrbd  start
 +  May 21 11:40:59 rbopenqrm01 mountd[3767]: Caught signal 15, un-registering and exiting.
 +  May 21 11:40:59 rbopenqrm01 kernel: [ 5357.466126] nfsd: last server has exited, flushing export cache
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/drbddisk data start
 +  May 21 11:40:59 rbopenqrm01 kernel: [ 5357.492064] block drbd1: role( Secondary -> Primary ) 
 +  May 21 11:40:59 rbopenqrm01 LVM[4119]: INFO: LVM Volume data is offline
 +  May 21 11:40:59 rbopenqrm01 LVM[4113]: INFO:  Resource is stopped
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/LVM data start
 +  May 21 11:40:59 rbopenqrm01 LVM[4171]: INFO: Activating volume group data
 +  May 21 11:40:59 rbopenqrm01 LVM[4171]: INFO: File descriptor 4 (socket:[5467]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 5 (socket:[5468]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 6 (/proc/loadavg) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 7 (socket:[5861]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 8 (socket:[5479]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 9 (socket:[5474]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 10 (socket:[5867]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 11 (socket:[5476]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File descriptor 12 (socket:[5481]) leaked on vgscan invocation. Parent PID 4171: /bin/sh File des criptor 13 (pipe:[9664]) leaked on vgscan invocation. Parent PID 4171: /bin/sh Reading all physical volumes. This may take a while... Found volume group "data" using metadata type lvm2
 +  May 21 11:40:59 rbopenqrm01 LVM[4171]: INFO: File descriptor 4 (socket:[5467]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 5 (socket:[5468]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 6 (/proc/loadavg) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 7 (socket:[5861]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 8 (socket:[5479]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 9 (socket:[5474]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 10 (socket:[5867]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 11 (socket:[5476]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 12 (socket:[5481]) leaked on vgchange invocation. Parent PID 4171: /bin/sh File descriptor 13 (pipe:[9664]) leaked on vgchange invocation. Parent PID 4171: /bin/sh 2 logical volume(s) in volume group "data" now active
 +  May 21 11:40:59 rbopenqrm01 LVM[4165]: INFO:  Success
 +  May 21 11:40:59 rbopenqrm01 Filesystem[4233]: INFO:  Resource is stopped
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/Filesystem /dev/data/mysql /var/lib/mysql jfs start
 +  May 21 11:40:59 rbopenqrm01 Filesystem[4303]: INFO: Running start for /dev/data/mysql on /var/lib/mysql
 +  May 21 11:40:59 rbopenqrm01 Filesystem[4297]: INFO:  Success
 +  May 21 11:40:59 rbopenqrm01 Filesystem[4375]: INFO:  Resource is stopped
 +  May 21 11:40:59 rbopenqrm01 ResourceManager[3844]: info: Running /etc/ha.d/resource.d/Filesystem /dev/data/etcmysql /etc/mysql jfs start
 +  May 21 11:40:59 rbopenqrm01 Filesystem[4444]: INFO: Running start for /dev/data/etcmysql on /etc/mysql
 +  May 21 11:41:00 rbopenqrm01 Filesystem[4438]: INFO:  Success
 +  May 21 11:41:00 rbopenqrm01 ResourceManager[3844]: info: Running /etc/init.d/mysql  start
 +  May 21 11:41:00 rbopenqrm01 kernel: [ 5358.319849] type=1505 audit(1274434860.115:10):  operation="profile_replace" pid=4529 name="/usr/sbin/mysqld"
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4544]: Upgrading MySQL tables if necessary.
 +  May 21 11:41:01 rbopenqrm01 heartbeat: [3830]: info: all HA resource acquisition completed (standby).
 +  May 21 11:41:01 rbopenqrm01 heartbeat: [1893]: info: Standby resource acquisition done [all].
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4547]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4547]: Looking for 'mysql' as: /usr/bin/mysql
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4547]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4547]: This installation of MySQL is already upgraded to 5.1.41, use --force if you still need to run mysql_upgrade
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4554]: Checking for insecure root accounts.
 +  May 21 11:41:01 rbopenqrm01 /etc/mysql/debian-start[4558]: Triggering myisam-recover for all MyISAM tables
 +  May 21 11:41:02 rbopenqrm01 heartbeat: [1893]: info: remote resource transition completed.
 +
 +Bei ein sauberen Schwenk sollte hier **"remote resource transition completed."** stehen.
 +
 +
 +====== Heartbeat schlägt nicht mehr -> Pacemaker gibt jetzt den Takt an ======
 +
 +{{:pacemaker.pdf|Pacemaker}} => [[Pacemaker und Corosync]]
 +
 +Pacemaker, welches ursprünglich aus dem Entwicklungsstrang von HA-Linux (Heartbeat) entkoppelt wurde, erledigt hier einen wesentlich besseren Job als Heartbeat. Vor allem lassen sich nicht nur komplette Knoten, sondern auch einzelne Services überwachen und bei Bedarf auf demselben oder einem anderen Knoten starten.
 +
 +
 +===== Auch in der offiziellen Doku von DRBD wird eher auf Pacemaker gesetzt: =====
 +
 +DRBD is frequently found in system configurations using the Linux-HA cluster manager ("Heartbeat"). Heartbeat **has been superseded by the Pacemaker** cluster manager and the latter **should be used whenever possible** — please see Chapter 8, Integrating DRBD with Pacemaker clusters for more information. Nonetheless, this chapter outlines Heartbeat configurations and is intended for users who must maintain existing legacy Heartbeat systems for policy reasons.
 +