Benutzer-Werkzeuge

Webseiten-Werkzeuge


datenbank:ha-postgresql-cluster

Unterschiede

Hier werden die Unterschiede zwischen zwei Versionen angezeigt.

Link zu dieser Vergleichsansicht

Beide Seiten der vorigen RevisionVorhergehende Überarbeitung
Nächste Überarbeitung
Vorhergehende Überarbeitung
datenbank:ha-postgresql-cluster [2026-05-12 17:55:18] manfreddatenbank:ha-postgresql-cluster [2026-05-29 10:09:43] (aktuell) – [Tests] manfred
Zeile 1: Zeile 1:
 ====== HA-PostgreSQL-Cluster ====== ====== HA-PostgreSQL-Cluster ======
 +
 +siehe auch: **[[::Datenbank:PostgreSQL]]**
  
  
Zeile 6: Zeile 8:
 //Es soll ein hochverfügbarer PostgreSQL‑Cluster mit Patroni, EtcE, HAProxy und KeepaliveD auf drei Systemen installiert werden.// //Es soll ein hochverfügbarer PostgreSQL‑Cluster mit Patroni, EtcE, HAProxy und KeepaliveD auf drei Systemen installiert werden.//
  
-Wichtige Unterschiede zwischen diesem HA-PostgreSQL-Cluster mit 3 Systemen, gegenüber einem HA-PostgreSQL-Cluster mit 5 Systemen sind, dass hier alle Pakete auf allen 3 Systemen installiert werden müssen. Und weil PostgreSQL und HAProxy auf dem selben Host laufen, muß der Datenbank-Zugriff über den Cluster-Zugang (HAProxy) auf einen anderen Port laufen. In diesem Fall laufen die PostgreSQL-Instanzen aus dem Standard-Port (''5432'') und der HAProxy jeweils auf einen neuen Port (''5433'').+Wichtige Unterschiede zwischen diesem HA-PostgreSQL-Cluster mit 3 Systemen, gegenüber einem HA-PostgreSQL-Cluster mit 5 Systemen sind, dass hier alle Pakete auf allen 3 Systemen installiert werden müssen. Und weil PostgreSQL und HAProxy auf dem selben Host laufen, muß der Datenbank-Zugriff über den Cluster-Zugang (HAProxy) auf einen anderen Port laufen. 
 +In diesem Fall laufen die PostgreSQL-Instanzen auf dem Standard-Port (''5432'') und der HAProxy jeweils auf neuen Ports (schreiben: ''5433'' / lesen: ''5443'').
  
 In einem HA-PostgreSQL-Cluster mit 5 Systemen laufen EtcE, HAProxy und KeepaliveD auf 2 separate Systeme und dann können PostgreSQL und HAProxy auf dem gleichen Port laufen. In einem HA-PostgreSQL-Cluster mit 5 Systemen laufen EtcE, HAProxy und KeepaliveD auf 2 separate Systeme und dann können PostgreSQL und HAProxy auf dem gleichen Port laufen.
Zeile 18: Zeile 21:
 > apt install vim screen mc csync2 links python3 python3-psycopg2 jq curl wget net-tools etcd-server etcd-client haproxy keepalived postgresql postgresql-contrib patroni > apt install vim screen mc csync2 links python3 python3-psycopg2 jq curl wget net-tools etcd-server etcd-client haproxy keepalived postgresql postgresql-contrib patroni
 > echo "export EDITOR=/usr/bin/vim" > ~/.bash_profile > echo "export EDITOR=/usr/bin/vim" > ~/.bash_profile
 +</code>
 +
 +
 +==== PG-DataDir mit ZFS ====
 +
 +<code bash DataDir im RAID-1>
 +> apt install zfsutils-linux
 +
 +> zpool create -m /var/lib/postgresql pg_datadir mirror /dev/nvme2n1 /dev/nvme3n1
 +> zpool list
 +NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
 +pg_datadir  2.91T   284K  2.91T        -             0%     0%  1.00x    ONLINE  -
 +
 +> zpool status
 +  pool: pg_datadir
 + state: ONLINE
 +config:
 +
 + NAME         STATE     READ WRITE CKSUM
 + pg_datadir   ONLINE               0
 +   mirror-0   ONLINE               0
 +     nvme2n1  ONLINE               0
 +     nvme3n1  ONLINE               0
 +
 +errors: No known data errors
 +
 +> zfs set compression=lz4    pg_datadir
 +> zfs set recordsize=16K     pg_datadir
 +> zfs set logbias=latency    pg_datadir
 +> zfs set atime=off          pg_datadir
 +
 +> mkdir -p /var/lib/postgresql/16/main
 +> chown -R postgres:postgres /var/lib/postgresql/
 +> chmod 700 /var/lib/postgresql/16/main
 +
 +> mount | grep postgresql
 +
 +> zfs get mountpoint,compression,encryption,logbias,recordsize
 +NAME        PROPERTY     VALUE                SOURCE
 +pg_datadir  mountpoint   /var/lib/postgresql  local
 +pg_datadir  compression  lz4                  local
 +pg_datadir  encryption   off                  default
 +pg_datadir  logbias      latency              local
 +pg_datadir  recordsize   16K                  local
 +
 +> df -h
 +Filesystem      Size  Used Avail Use% Mounted on
 +...
 +pg_datadir      2.9T   40M  2.9T   1% /var/lib/postgresql
 </code> </code>
  
Zeile 28: Zeile 80:
 group Loadbalancer #group name, we can have multiple groups group Loadbalancer #group name, we can have multiple groups
 { {
-    host pg-knoten-01; #master server +    host FRA2A-MEDOS04; #master server 
-    host pg-knoten-02; #slave server +    host FRA2B-MEDOS04; #slave server 
-    host pg-knoten-03; #slave server+    host FRA2C-MEDOS04; #slave server
    
     key /etc/csync2_ssl.key;     key /etc/csync2_ssl.key;
Zeile 137: Zeile 189:
  
 <code text Starten> <code text Starten>
-systemctl restart etcd +systemctl restart etcd 
-systemctl enable etcd+systemctl enable etcd
 </code> </code>
  
Zeile 151: Zeile 203:
 === Hinweis === === Hinweis ===
  
-**//Solange der Cluster unhealthy ist, sind Verwaltungsoperationen wie etcdctl member add absichtlich verboten.//**+**//Solange der Cluster unhealthy ist, sind Verwaltungsoperationen wie ''etcdctl member add'' absichtlich verboten.//**
  
 <code bash> <code bash>
Zeile 239: Zeile 291:
 frontend postgres_write frontend postgres_write
  #bind *:5432  #bind *:5432
- bind *:5433+ bind *:5433 # Port zum schreiben
  default_backend pg_leader  default_backend pg_leader
    
Zeile 255: Zeile 307:
 # LESEZUGRIFFE (alle, lastverteilung) # LESEZUGRIFFE (alle, lastverteilung)
 # ============================ # ============================
-# [~]# curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}'+# [~]# curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replicas' | awk -F',' '{print $2, $18, $56}'
 # pg1 UP -1 # pg1 UP -1
 # pg2 DOWN -1 # pg2 DOWN -1
 # pg3 DOWN -1 # pg3 DOWN -1
 # BACKEND UP -1 # BACKEND UP -1
-# pg1 UP -1+# pg1 DOWN -1
 # pg2 UP -1 # pg2 UP -1
 # pg3 UP -1 # pg3 UP -1
 # BACKEND UP -1 # BACKEND UP -1
 frontend postgres_read frontend postgres_read
- bind *:6543 + bind *:5443 # Port zum lesen 
- default_backend pg_replica + #default_backend pg_replicas bei Ausfall von 2 Knotenkein lesen mehr möglich 
-  +
-backend pg_replica +Wenn mindestens eine Replica gesund ist → Replicas 
- mode tcp + use_backend pg_replicas if { nbsrv(pg_replicasgt 0 
- option httpchk +
- balance leastconn                         # Verteilt nach geringster Last +Andernfalls automatisch zum Leader
- http-check send meth GET uri /liveness ver HTTP/1.1 hdr Host localhost +
- http-check expect status 200 +
- default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions +
- server pg1 pg-knoten-01:5432 check port 8008 maxconn 100 +
- server pg2 pg-knoten-02:5432 check port 8008 maxconn 100 +
- server pg3 pg-knoten-03:5432 check port 8008 maxconn 100 +
-</code> +
- +
-<hidden /etc/haproxy/haproxy.cfg_Version_1> +
-<code text /etc/haproxy/haproxy.cfg_Version_1> +
-global +
- log /dev/log local0 +
- maxconn 5000 +
- stats socket /run/haproxy/admin.sock mode 660 level admin +
-  +
-defaults +
- mode tcp +
- timeout connect 5s +
- timeout client  30s +
- timeout server  30s +
-  +
- +
-listen stats +
- bind 0.0.0.0:8404 +
- #bind 127.0.0.1:8404 +
- mode http +
- stats enable +
- stats uri /stats +
- stats refresh 10s +
- stats auth admin:admin +
- +
-frontend postgresql +
- # Weil HAProxy und PostgreSQL auf dem selben host laufen, +
- muß der HAProxy einen anderen Port verwenden! +
- #bind *:5432 +
- bind *:5433 +
- default_backend pg_patroni_knoten +
-  +
-HAProxy überprüft nicht den PostgreSQL-Port+
-sondern den Patroni-Port! +
-Es genügt also nicht, die DB zu starten (was man auch nicht tun sollte)+
-# es muß Patroni laufen, der startet dann PostgreSQL. +
-# [~]# curl -s "http://admin:admin@localhost:8404/stats;csv" | grep "pg_patroni_knoten" | awk -F',' '{print $2, $18, $56}' +
-pg1 UP 909 +
-pg2 UP 899 +
-# pg3 UP 1211 +
-# BACKEND UP 902 +
-backend pg_patroni_knoten +
- http-check send meth GET /liveness +
- http-check expect status 200 +
- server pg1 pg-knoten-01:5432 check port 8008 +
- server pg2 pg-knoten-02:5432 check port 8008 +
- server pg3 pg-knoten-03:5432 check port 8008 +
-</code> +
-</hidden> +
- +
-<hidden /etc/haproxy/haproxy.cfg_Version_2> +
-<code text /etc/haproxy/haproxy.cfg_Version_2> +
-global +
- log /dev/log local0 +
- maxconn 5000 +
- stats socket /run/haproxy/admin.sock mode 660 level admin +
-  +
-defaults +
- mode tcp +
- timeout connect 5s +
- timeout client  30s +
- timeout server  30s +
- timeout check 5s +
- log global +
- option tcplog +
- retries 3 +
-  +
- +
-listen stats +
- bind 0.0.0.0:8404 +
- mode http +
- stats enable +
- stats uri /stats +
- stats refresh 10s +
- stats auth admin:admin +
- +
-# ============================ +
-# SCHREIBZUGRIFFE (nur Leader/Primary) +
-# ============================ +
-frontend postgres_write +
- #bind *:5432 +
- bind *:5433+
  default_backend pg_leader  default_backend pg_leader
    
-backend pg_leader+backend pg_replicas
  mode tcp  mode tcp
  option httpchk  option httpchk
- http-check send meth GET uri /primary ver HTTP/1.1 hdr Host localhost+ balance leastconn # Verteilt nach geringster Last 
 + http-check send meth GET uri /replica ver HTTP/1.1 hdr Host localhost
  http-check expect status 200  http-check expect status 200
- default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions + default-server inter 3s fall 3 rise 2
- server pg1 pg-knoten-01:5432 check port 8008 maxconn 100 +
- server pg2 pg-knoten-02:5432 check port 8008 maxconn 100 +
- server pg3 pg-knoten-03:5432 check port 8008 maxconn 100 +
- +
-# ============================ +
-# LESEZUGRIFFE (alle, lastverteilung) +
-# ============================ +
-# [~]# curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}' +
-# pg1 UP -1 +
-# pg2 DOWN -1 +
-# pg3 DOWN -1 +
-# BACKEND UP -1 +
-# pg1 UP -1 +
-# pg2 UP -1 +
-# pg3 UP -1 +
-# BACKEND UP -1 +
-frontend postgres_read +
- bind *:6543 +
- default_backend pg_replica +
-  +
-backend pg_replica +
- mode tcp +
- option httpchk +
- balance leastconn                         # Verteilt nach geringster Last +
- http-check send meth GET uri /liveness ver HTTP/1.1 hdr Host localhost +
- http-check expect status 200 +
- default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions +
- server pg1 pg-knoten-01:5432 check port 8008 maxconn 100 +
- server pg2 pg-knoten-02:5432 check port 8008 maxconn 100 +
- server pg3 pg-knoten-03:5432 check port 8008 maxconn 100 +
-</code> +
-</hidden> +
- +
-<hidden /etc/haproxy/haproxy.cfg_Version_3> +
-<code text /etc/haproxy/haproxy.cfg_Version_3> +
-global +
- log /dev/log local0 +
- maxconn 5000 +
- stats socket /run/haproxy/admin.sock mode 660 level admin +
-  +
-defaults +
- mode tcp +
- timeout connect 5s +
- timeout client  30s +
- timeout server  30s +
- timeout check 5s +
- log global +
- option tcplog +
- retries 3 +
-  +
- +
-listen stats +
- bind 0.0.0.0:8404 +
- mode http +
- stats enable +
- stats uri /stats +
- stats refresh 10s +
- stats auth admin:admin +
- +
-# ============================ +
-# SCHREIBZUGRIFFE (nur Leader/Primary) +
-# ============================ +
-frontend postgres_write +
- #bind *:5432 +
- bind *:5433 +
- default_backend pg_leader +
-  +
-backend pg_leader +
- mode tcp +
- option httpchk +
- http-check send meth GET uri /primary ver HTTP/1.1 hdr Host localhost +
- http-check expect status 200 +
- default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions +
- server pg1 pg-knoten-01:5432 check port 8008 maxconn 100 +
- server pg2 pg-knoten-02:5432 check port 8008 maxconn 100 +
- server pg3 pg-knoten-03:5432 check port 8008 maxconn 100 +
- +
-# ============================ +
-# LESEZUGRIFFE (nur Replicas, lastverteilung) +
-# ============================ +
-# [~]# curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}' +
-# pg1 UP -1 +
-# pg2 DOWN -1 +
-# pg3 DOWN -1 +
-# BACKEND UP -1 +
-# pg1 DOWN -1 +
-# pg2 UP -1 +
-# pg3 UP -1 +
-# BACKEND UP -1 +
-frontend postgres_read +
- bind *:6543 +
- default_backend pg_replica +
-  +
-backend pg_replica +
- mode tcp +
- option httpchk +
- balance leastconn                         # Verteilt nach geringster Last +
- http-check send meth GET uri /primary ver HTTP/1.1 hdr Host localhost +
- http-check expect status 503 +
- default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions+
  server pg1 pg-knoten-01:5432 check port 8008 maxconn 100  server pg1 pg-knoten-01:5432 check port 8008 maxconn 100
  server pg2 pg-knoten-02:5432 check port 8008 maxconn 100  server pg2 pg-knoten-02:5432 check port 8008 maxconn 100
  server pg3 pg-knoten-03:5432 check port 8008 maxconn 100  server pg3 pg-knoten-03:5432 check port 8008 maxconn 100
 </code> </code>
-</hidden> 
  
 <code bash> <code bash>
 > service haproxy restart > service haproxy restart
 ... ...
-May 08 17:47:16 pg-knoten-03 systemd[1]: Starting haproxy.service - HAProxy Load Balancer... +May 08 17:47:16 FRA2C-MEDOS04 systemd[1]: Starting haproxy.service - HAProxy Load Balancer... 
-May 08 17:47:16 pg-knoten-03 haproxy[1219172]: [NOTICE]   (1219172) : New worker (1219177) forked +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219172]: [NOTICE]   (1219172) : New worker (1219177) forked 
-May 08 17:47:16 pg-knoten-03 haproxy[1219172]: [NOTICE]   (1219172) : Loading success. +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219172]: [NOTICE]   (1219172) : Loading success. 
-May 08 17:47:16 pg-knoten-03 systemd[1]: Started haproxy.service - HAProxy Load Balancer. +May 08 17:47:16 FRA2C-MEDOS04 systemd[1]: Started haproxy.service - HAProxy Load Balancer. 
-May 08 17:47:16 pg-knoten-03 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 ac> +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 ac> 
-May 08 17:47:17 pg-knoten-03 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 ac>+May 08 17:47:17 FRA2C-MEDOS04 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 ac>
  
 > ss -antp | grep -F LISTEN | grep -F haproxy > ss -antp | grep -F LISTEN | grep -F haproxy
-LISTEN    0      4096              0.0.0.0:5433             0.0.0.0:    users:(("haproxy",pid=995927,fd=9))+LISTEN    0      4096                 0.0.0.0:8404                0.0.0.0:    users:(("haproxy",pid=2513697,fd=9)) 
 +LISTEN    0      4096                 0.0.0.0:5443                0.0.0.0:    users:(("haproxy",pid=2513697,fd=11)) 
 +LISTEN    0      4096                 0.0.0.0:5433                0.0.0.0:    users:(("haproxy",pid=2513697,fd=10))
 </code> </code>
  
Zeile 562: Zeile 428:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2 +        #interface bond2 
-        state BACKUP+        interface vlan203
         virtual_router_id 203         virtual_router_id 203
 +        state BACKUP
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  + 
- unicast_src_ip 10.145.43.31 +        #mcast_src_ip 10.145.43.31 
- unicast_peer { + 
- #10.145.43.31 +        unicast_src_ip 10.145.43.31 
- 10.145.43.32 +        unicast_peer { 
- 10.145.43.33 +                #10.145.43.31 
- }+                10.145.43.32 
 +                10.145.43.33 
 +        }
 } }
 </file> </file>
Zeile 606: Zeile 476:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2 +        #interface bond2 
-        state BACKUP+        interface vlan203
         virtual_router_id 203         virtual_router_id 203
 +        state BACKUP
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  + 
- unicast_src_ip 10.145.43.32 +        #mcast_src_ip 10.145.43.32 
- unicast_peer { + 
- 10.145.43.31 +        unicast_src_ip 10.145.43.32 
- #10.145.43.32 +        unicast_peer { 
- 10.145.43.33 +                10.145.43.31 
- }+                #10.145.43.32 
 +                10.145.43.33 
 +        }
 } }
 </file> </file>
Zeile 650: Zeile 524:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2+        #interface bond2 
 +        interface vlan203 
 +        virtual_router_id 203
         state BACKUP         state BACKUP
-        virtual_router_id 203 
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  
- unicast_src_ip 10.145.43.33 
- unicast_peer { 
- 10.145.43.31 
- 10.145.43.32 
- #10.145.43.33 
- } 
-} 
-</file> 
  
-<file bash /root/bin/check_pg.sh> +        #mcast_src_ip 10.145.43.33
-#!/bin/bash +
-  +
-#==============================================================================# +
-# Dieses Skript prüft den Zugriff auf die einzelnen DB-Knoten. +
-# Es wird erst ein Fehler ausgegeben, wenn kein DB-Knoten mehr erreichbar ist. +
-#==============================================================================# +
-  +
-### Wenn PostgreSQL+Patroni nicht mit ETCD, HAProxy und KeepaliveD auf dem selben Host liegen +
-#AUSGABE="$(for DBIP in $(awk '/[[:space:]][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[:]/{gsub("[:]"," "); print $3}' /etc/haproxy/haproxy.cfg) +
-#do +
-#        echo "\l" | psql -U postgres -h ${DBIP} >/dev/null 2>/dev/null && echo "${DBIP}" +
-#done)" +
-  +
-### Nur wenn PostgreSQL+Patroni, ETCD, HAProxy und KeepaliveD auf dem selben Host liegen +
-AUSGABE="$(sudo -u postgres psql -c '\l' >/dev/null 2>/dev/null && hostname -s)"+
  
-# Test +        unicast_src_ip 10.145.43.33 
-#echo "AUSGABE='${AUSGABE}'" +        unicast_peer 
-  +                10.145.43.31 
-if [ x = "x${AUSGABE}" ] ; then +                10.145.43.32 
-        echo "CRIT: kein DB-Knoten ist erreichbar" +                #10.145.43.33 
-        exit 2 +        } 
-else +}
-        echo "OK: ${AUSGABE}" | tr -s '\n' ' '; echo +
-        exit 0 +
-fi+
 </file> </file>
  
Zeile 720: Zeile 569:
    
 #==============================================================================# #==============================================================================#
-# Dieses Skript prüft den Zugriff auf die einzelnen DB-Knoten. +# Dieses Skript prüft, ob der HAProxy läuft und erreichbar ist.
-# Es wird erst ein Fehler ausgegebenwenn kein DB-Knoten mehr erreichbar ist.+
 #==============================================================================# #==============================================================================#
    
-HA_PORT="$(awk '/^[[:space:]]*bind[[:space:]]/{gsub("[:]"," "); print $3}' /etc/haproxy/haproxy.cfg)" +# Prüfeob HAProxy läuft und erreichbar ist 
-SS_AUSGABE="$(ss -antp | grep -F LISTEN | grep -F "haproxy" | grep -E ":${HA_PORT}[[:space:]]")+HOST="127.0.0.1
-HA_AUSGABE="$(links -ssl.certificates 0 -dump http://admin:admin@localhost:8404/stats | grep -E 'pg1|pg2|pg3' | wc -l)"+PORT=5433
  
-if [ x = "x${SS_AUSGABE}" ] ; then +# timeout wichtig, damit Keepalived nicht hängt 
-        echo "CRIT: Port nicht offen" +timeout 1 bash -c "</dev/tcp/$HOST/$PORT>/dev/null 2>&
-        exit 2+ 
 +if [ $? -eq 0 ]; then 
 +    exit 0   # OK → VIP behalten
 else else
- if [ 3 -eq "${HA_AUSGABE}" ] ; then +    exit 1   # Fehler → VIP abgeben
-        echo "OK" +
-        exit +
- else +
-        echo "CRIT: haproxy antwortet nicht richtig" +
-        exit 2 +
- fi+
 fi fi
 </file> </file>
Zeile 749: Zeile 593:
 <code text VIP testen> <code text VIP testen>
 > ping 10.145.43.30 > ping 10.145.43.30
 +</code>
 +
 +
 +=== VRRP-Kommunikation ===
 +
 +<code bash>
 +> # tcpdump -i vlan203 proto 112 -n | grep -F 'vrid 203'
 +tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
 +listening on vlan203, link-type EN10MB (Ethernet), snapshot length 262144 bytes
 +12:51:40.263101 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:41.263202 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:42.263398 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:43.263547 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 </code> </code>
  
Zeile 763: Zeile 620:
  
 <code bash Netzwerkzugriffe erlauben> <code bash Netzwerkzugriffe erlauben>
-> echo "listen_addresses = '*'" >> /etc/postgresql/16/main/postgresql.conf+> echo "listen_addresses = '*'" >> /etc/postgresql/16/main/pg_hba.conf 
 +> echo "host    replication     all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf 
 +> echo "host    replication     all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
 > echo "host    all             all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf > echo "host    all             all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
 > echo "host    all             all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf > echo "host    all             all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
Zeile 913: Zeile 772:
       username: repl       username: repl
       password: AejohTe6       password: AejohTe6
- +
 pg_hba: pg_hba:
-    - host postgres dbcheck 0.0.0.0/trust +    - host replication all 0.0.0.0/scram-sha-256 
-    - host postgres dbcheck ::/0 trust+    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 971: Zeile 830:
       username: repl       username: repl
       password: AejohTe6       password: AejohTe6
- +
 pg_hba: pg_hba:
-    - host postgres dbcheck 0.0.0.0/trust +    - host replication all 0.0.0.0/scram-sha-256 
-    - host postgres dbcheck ::/0 trust+    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 1029: Zeile 888:
       username: repl       username: repl
       password: AejohTe6       password: AejohTe6
- +
 pg_hba: pg_hba:
-    - host postgres dbcheck 0.0.0.0/trust +    - host replication all 0.0.0.0/scram-sha-256 
-    - host postgres dbcheck ::/0 trust+    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 1042: Zeile 901:
 > systemctl disable postgresql > systemctl disable postgresql
 > cp /etc/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/ > cp /etc/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/
 +> cp /etc/postgresql/16/main/pg_hba.conf /var/lib/postgresql/16/main/
  
 > mkdir /var/lib/postgresql/16/main/conf.d > mkdir /var/lib/postgresql/16/main/conf.d
Zeile 1159: Zeile 1019:
 | Member       | Host         | Role    | State   | TL | Lag in MB | | Member       | Host         | Role    | State   | TL | Lag in MB |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
-| pg-knoten-01 | pg-knoten-01 | Replica | running |    |   unknown |+| pg-knoten-01 | pg-knoten-03 | Replica | running |    |   unknown |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
 </code> </code>
Zeile 1166: Zeile 1026:
 > patronictl -c /etc/patroni/config.yml failover > patronictl -c /etc/patroni/config.yml failover
 Current cluster topology Current cluster topology
-+ Cluster: pgcluster (7637152973093863500) ---------+----+-----------+++ Cluster: pgcluster (7637152973093863500) -------+----+-----------+
 | Member       | Host         | Role    | State   | TL | Lag in MB | | Member       | Host         | Role    | State   | TL | Lag in MB |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
-| pg-knoten-01 | pg-knoten-01 | Replica | running |    |   unknown |+| pg-knoten-01 | pg-knoten-03 | Replica | running |    |   unknown |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
 Candidate ['pg-knoten-01'] []: pg-knoten-01 Candidate ['pg-knoten-01'] []: pg-knoten-01
 Are you sure you want to failover cluster pgcluster? [y/N]: y Are you sure you want to failover cluster pgcluster? [y/N]: y
 +</code>
 +
 +<code bash >
 +> curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}'
 +pg1 UP 1752
 +pg2 DOWN -1
 +pg3 DOWN -1
 +BACKEND UP 1752
 +pg1 DOWN -1
 +pg2 DOWN -1
 +pg3 DOWN -1
 +BACKEND DOWN -1
 +
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running |  3 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | stopped |    |   unknown |
 +| pg-knoten-03 | pg-knoten-03 | Replica | stopped |    |   unknown |
 ++--------------+--------------+---------+---------+----+-----------+
 +
 +
 +Wenn das so aussieht, dann kann von den Replicas nicht gelesen werden.
 +Um das Problem zu beheben, müssen die Replicas (beide) reinitialisiert werden.
 +
 +
 +> patronictl -c /etc/patroni/config.yml reinit pgcluster pg-knoten-02
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running |  3 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | stopped |    |   unknown |
 +| pg-knoten-03 | pg-knoten-03 | Replica | stopped |    |   unknown |
 ++--------------+--------------+---------+---------+----+-----------+
 +Are you sure you want to reinitialize members pg-knoten-02? [y/N]: y
 +Failed: reinitialize for member pg-knoten-02, status code=503, (bootstrap from leader 'pg-knoten-01' already in progress)
 +Do you want to cancel it and reinitialize anyway? [y/N]: y
 +Success: reinitialize for member pg-knoten-02
 +
 +> patronictl -c /etc/patroni/config.yml reinit pgcluster pg-knoten-03
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running |  3 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | stopped |    |   unknown |
 +| pg-knoten-03 | pg-knoten-03 | Replica | stopped |    |   unknown |
 ++--------------+--------------+---------+---------+----+-----------+
 +Are you sure you want to reinitialize members pg-knoten-03? [y/N]: y
 +Failed: reinitialize for member pg-knoten-03, status code=503, (bootstrap from leader 'pg-knoten-01' already in progress)
 +Do you want to cancel it and reinitialize anyway? [y/N]:  
 +
 +
 +Jetzt arbeiten die Replicas ordentliche und man kann von ihnen auch ordentlich lesen:
 +
 +
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) ---------+----+-----------+
 +| Member       | Host         | Role    | State     | TL | Lag in MB |
 ++--------------+--------------+---------+-----------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running    6 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | streaming |  6 |         0 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | streaming |  6 |         0 |
 ++--------------+--------------+---------+-----------+----+-----------+
 +
 +> curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}'
 +pg1 UP 6000
 +pg2 DOWN -1
 +pg3 DOWN -1
 +BACKEND UP 6000
 +pg1 DOWN -1
 +pg2 UP -1
 +pg3 UP -1
 +BACKEND UP -1
 +</code>
 +
 +
 +=== Probleme nach kompletten Cluster-Neustart ===
 +
 +<code bash IST-Zustand>
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Replica | running | 13 |       217 |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +</code>
 +
 +<code bash einen zum Leader bestimmen>
 +> patronictl -c /etc/patroni/config.yml failover pgcluster
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Replica | running | 13 |       217 |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +Candidate ['pg-knoten-01', 'pg-knoten-02', 'pg-knoten-03'] []: pg-knoten-01
 +Are you sure you want to failover cluster pgcluster? [y/N]: y
 +2026-05-13 13:48:00.52027 Successfully failed over to "pg-knoten-01"
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running | 13 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) ---------+----+-----------+
 +| Member       | Host         | Role    | State     | TL | Lag in MB |
 ++--------------+--------------+---------+-----------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running   | 14 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | streaming | 14 |         0 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | streaming | 14 |         0 |
 ++--------------+--------------+---------+-----------+----+-----------+
 </code> </code>
  
Zeile 1340: Zeile 1318:
  
 //Immer erst mit Dirk Hartmann absprechen!// //Immer erst mit Dirk Hartmann absprechen!//
 +
 +<code bash Datenbanken und User zeigen>
 +> echo "\l" | psql -h10.145.43.30 -p5433 -U postgres
 +Password for user postgres: 
 +                                                       List of databases
 +   Name    |  Owner   | Encoding | Locale Provider |   Collate      Ctype    | ICU Locale | ICU Rules |   Access privileges   
 +-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 + postgres  | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           
 + template0 | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           | =c/postgres          +
 +                    |          |                                                    |           | postgres=CTc/postgres
 + template1 | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           | =c/postgres          +
 +                    |          |                                                    |           | postgres=CTc/postgres
 +(3 rows)
 +
 +
 +> echo "\du" | psql -h10.145.43.30 -p5433 -U postgres
 +Password for user postgres: 
 +                             List of roles
 + Role name |                         Attributes                         
 +-----------+------------------------------------------------------------
 + postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS
 + repl      | Replication
 +</code>
 +
 +<code bash>
 +> service keepalived stop
 +> service patroni stop
 +
 +> /root/bin/upgrade.sh ; sync ; /root/bin/upgrade.sh ; sync ; /root/bin/upgrade.sh ; sync
 +> reboot && exit
 +</code>
 +
 +
 +==== Status / Monitoring ====
  
 <file bash /root/bin/Status.sh> <file bash /root/bin/Status.sh>
 #!/bin/bash #!/bin/bash
  
-VERSION="v2026051100" # initiale Erstellung+#VERSION="v2026051100" # initiale Erstellung 
 +VERSION="v2026051200" # erweitert 
 + 
 +uptime
  
 # KeepaliveD prüfen # KeepaliveD prüfen
Zeile 1354: Zeile 1369:
 echo "#------------------------------------------------------------------------------#" echo "#------------------------------------------------------------------------------#"
 # PostgreSQL Commits # PostgreSQL Commits
-sudo -u postgres psql -t -c "SELECT 'commits=' || SUM(xact_commit) || ', rollbacks=' || SUM(xact_rollback) FROM pg_stat_database;"+pg_isready -q && sudo -u postgres psql -t -c "SELECT 'commits=' || SUM(xact_commit) || ', rollbacks=' || SUM(xact_rollback) FROM pg_stat_database;" || echo Replica
  
 # Active Connections # Active Connections
-sudo -u postgres psql -t -c "SELECT COUNT(*) || ' active connections' FROM pg_stat_activity WHERE state='active';"+pg_isready -q && sudo -u postgres psql -t -c "SELECT COUNT(*) || ' active connections' FROM pg_stat_activity WHERE state='active';"
  
 # HAProxy abfragen # HAProxy abfragen
Zeile 1380: Zeile 1395:
  curl -s http://${HIP}:8008/leader | jq . | grep -E 'state|postmaster_start_time|role|scope|name'  curl -s http://${HIP}:8008/leader | jq . | grep -E 'state|postmaster_start_time|role|scope|name'
 done done
 +
 +echo "#------------------------------------------------------------------------------#"
 +pg_isready -q && echo "SELECT name, setting, unit, SOURCE, sourcefile, sourceline
 +FROM pg_settings
 +WHERE name IN (
 +  'autovacuum_work_mem',
 +  'effective_cache_size',
 +  'maintenance_work_mem',
 +  'max_connections',
 +  'shared_buffers',
 +  'work_mem'
 +  );" | sudo -u postgres psql
 </file> </file>
  
-<code bash Netzwerkzugriff per HAProxy prüfen> +<file bash /root/bin/CPU-Nutzung.sh> 
-> echo "\l" | psql -h10.145.43.30 -p5433 -U postgres +#!/bin/bash
-Password for user postgres:  +
-                                                       List of databases +
-   Name    |  Owner   | Encoding | Locale Provider |   Collate      Ctype    | ICU Locale | ICU Rules |   Access privileges    +
------------+----------+----------+-----------------+-------------+-------------+------------+-----------+----------------------- +
- postgres  | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           |  +
- template0 | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           | =c/postgres          + +
-                    |          |                                                    |           | postgres=CTc/postgres +
- template1 | postgres | UTF8     | libc            | de_DE.UTF-8 | de_DE.UTF-8 |            |           | =c/postgres          + +
-           |          |          |                                                    |           | postgres=CTc/postgres +
-(3 rows)+
  
 +# Laufende Parallel‑Worker sehen
  
-> echo "\du" | psql -h10.145.43.30 -p5433 -postgres +echo " 
-Password for user postgres +SELECT pid, backend_type, state 
-                             List of roles +FROM pg_stat_activity 
- Role name                         Attributes                          +WHERE backend_type LIKE '%parallel%'; 
------------+------------------------------------------------------------ + 
- postgres  SuperuserCreate roleCreate DB, Replication, Bypass RLS +SELECT backend_type, count(*) 
- repl      | Replication+FROM pg_stat_activity 
 +GROUP BY backend_type; 
 +" | sudo -u postgres psql 
 +</file> 
 + 
 +<file bash /root/bin/CREATE_VIEW_SHOW_PROCESSLIST.sh> 
 +#!/bin/bash 
 + 
 +echo "# Bonus: Alias wie SHOW PROCESSLIST" 
 +echo " 
 +CREATE VIEW show_processlist AS 
 +SELECT 
 +    pid AS id, 
 +    usename AS "User", 
 +    client_addr AS host, 
 +    datname AS db, 
 +    state, 
 +    now() - query_start AS time, 
 +    query AS info 
 +FROM pg_stat_activity; 
 +" | sudo -u postgres psql 
 +</file> 
 + 
 +<file bash /root/bin/SHOW_PROCESSLIST.sh> 
 +#!/bin/bash 
 + 
 + 
 +echo "# Bonus: Alias wie SHOW PROCESSLIST" 
 +echo " 
 +SELECT * FROM show_processlist; 
 +" | sudo -u postgres psql 
 +</file> 
 + 
 +<file bash /root/bin/DB-Status.sh> 
 +#!/bin/bash 
 + 
 + 
 +#echo "# pg_stat_activity" 
 +#echo " 
 +#SELECT * FROM pg_stat_activity; 
 +#" | sudo -u postgres psql 
 + 
 +#echo "### Praxisnahe Varianten (das, was man wirklich braucht)" 
 +#echo "# Übersicht wie in MySQL" 
 +#echo " 
 +#SELECT 
 +#    pid, 
 +#    usename, 
 +#    datname, 
 +#    state, 
 +#    client_addr, 
 +#    application_name, 
 +#    backend_start, 
 +#    query_start, 
 +#    query 
 +#FROM pg_stat_activity 
 +#ORDER BY query_start DESC; 
 +#" | sudo -u postgres psql 
 + 
 +echo "# Nur aktive Queries (was läuft gerade)" 
 +echo " 
 +SELECT 
 +    pid, 
 +    usename, 
 +    now() - query_start AS runtime, 
 +    state, 
 +    query 
 +FROM pg_stat_activity 
 +WHERE state = 'active' 
 +ORDER BY runtime DESC; 
 +" | sudo -u postgres psql 
 + 
 +echo "# Laufzeitkiller: die schlimmsten Queries" 
 +echo " 
 +SELECT 
 +    pid, 
 +    usename, 
 +    now() - query_start AS runtime, 
 +    query 
 +FROM pg_stat_activity 
 +WHERE state = 'active' 
 +  AND now() - query_start > interval '5 minutes' 
 +ORDER BY runtime DESC; 
 +" | sudo -u postgres psql 
 + 
 +echo "# Wer blockiert wen?" 
 +echo " 
 +SELECT 
 +    a.pid AS blocked_pid, 
 +    a.query AS blocked_query, 
 +    b.pid AS blocking_pid, 
 +    b.query AS blocking_query 
 +FROM pg_locks bl 
 +JOIN pg_stat_activity a ON a.pid = bl.pid 
 +JOIN pg_locks kl 
 +  ON kl.locktype = bl.locktype 
 + AND kl.database IS NOT DISTINCT FROM bl.database 
 + AND kl.relation IS NOT DISTINCT FROM bl.relation 
 + AND kl.page IS NOT DISTINCT FROM bl.page 
 + AND kl.tuple IS NOT DISTINCT FROM bl.tuple 
 + AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid 
 + AND kl.classid IS NOT DISTINCT FROM bl.classid 
 + AND kl.objid IS NOT DISTINCT FROM bl.objid 
 + AND kl.objsubid IS NOT DISTINCT FROM bl.objsubid 
 + AND kl.pid != bl.pid 
 +JOIN pg_stat_activity b ON b.pid = kl.pid 
 +WHERE NOT bl.granted; 
 +" | sudo -u postgres psql 
 + 
 +#echo "### Sessions beenden (KILL QUERY / KILL CONNECTION)" 
 +#echo "# Query abbrechen (sanft)" 
 +#echo " 
 +#SELECT pg_cancel_backend(pid) 
 +#FROM pg_stat_activity 
 +#WHERE pid = 12345; 
 +#" | sudo -postgres psql 
 + 
 +#echo "# Session hart beenden" 
 +#echo " 
 +#SELECT pg_terminate_backend(pid) 
 +#FROM pg_stat_activity 
 +#WHERE pid = 12345; 
 +#" | sudo -u postgres psql 
 + 
 +echo "# Idle in transaction (sehr wichtig!)" 
 +echo " 
 +SELECT 
 +    pid, 
 +    usename, 
 +    now() - query_start AS idle_time, 
 +    query 
 +FROM pg_stat_activity 
 +WHERE state = 'idle in transaction' 
 +ORDER BY idle_time DESC; 
 +sudo -u postgres psql 
 +</file> 
 + 
 +<file bash /root/bin/DB-Status_-_nur_eigenes.sh> 
 +#!/bin/bash 
 + 
 + 
 +if [ x = "x${1}" ] ; then 
 + DBUSER="postgres" 
 +else 
 + DBUSER="${1}" 
 +fi 
 + 
 +echo "# Nur eigene DB / eigener User" 
 +#echo " 
 +#SELECT * 
 +#FROM pg_stat_activity 
 +#WHERE datname = current_database(); 
 +#" | sudo -u postgres psql 
 + 
 +#echo "# Oder" 
 +echo " 
 +SELECT * 
 +FROM pg_stat_activity 
 +WHERE usename = '${DBUSER}'; 
 +" | sudo -u postgres psql 
 +</file> 
 + 
 + 
 +==== Replica reinitialisieren ==== 
 + 
 +<code bash pg-knoten-01 reparieren - neu syncen> 
 +> patronictl -c /etc/patroni/config.yml reinit pgcluster pg-knoten-02 
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------
 +| Member       | Host         | Role    | State   | TL | Lag in MB | 
 ++--------------+--------------+---------+---------+----+-----------+ 
 +| pg-knoten-01 | pg-knoten-01 | Leader  running |  3 |           | 
 +| pg-knoten-02 | pg-knoten-02 | Replica | stopped |    |   unknown | 
 +| pg-knoten-03 | pg-knoten-03 | Replica | stopped |    |   unknown | 
 ++--------------+--------------+---------+---------+----+-----------+ 
 +Are you sure you want to reinitialize members pg-knoten-02? [y/N]: y 
 +Failed: reinitialize for member pg-knoten-02status code=503(bootstrap from leader 'pg-knoten-01' already in progress) 
 +Do you want to cancel it and reinitialize anyway? [y/N]: y 
 +Success: reinitialize for member pg-knoten-02
 </code> </code>
  
-//Wie das beste Vorgehen bei Wartungsarbeiten ist, muß erst noch ermittelt ausgelotet werden.//+ 
 +==== failover ====
  
 <code bash> <code bash>
-> service keepalived stop +[root@FRA2C-MEDOS04]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:12] 
-> service patroni stop+[~]# patronictl -c /etc/patroni/config.yml failover 
 +Current cluster topology 
 ++ Cluster: pgcluster (7637598528541987072) ---------+----+-----------+ 
 +| Member       | Host         | Role    | State     | TL | Lag in MB | 
 ++--------------+--------------+---------+-----------+----+-----------+ 
 +| pg-knoten-01 | pg-knoten-01 | Replica | streaming |  8 |         0 | 
 +| pg-knoten-02 | pg-knoten-02 | Leader  | running    8 |           | 
 +| pg-knoten-03 | pg-knoten-03 | Replica | streaming |  8 |         0 | 
 ++--------------+--------------+---------+-----------+----+-----------+ 
 +Candidate ['pg-knoten-01', 'pg-knoten-03'] []: pg-knoten-01 
 +Are you sure you want to failover cluster pgcluster, demoting current leader pg-knoten-02? [y/N]: y 
 +2026-05-12 14:26:55.23394 Successfully failed over to "pg-knoten-01" 
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+ 
 +| Member       | Host         | Role    | State   | TL | Lag in MB | 
 ++--------------+--------------+---------+---------+----+-----------+ 
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running |  8 |           | 
 +| pg-knoten-02 | pg-knoten-02 | Replica | stopped |    |   unknown | 
 +| pg-knoten-03 | pg-knoten-03 | Replica | running |  8 |         0 | 
 ++--------------+--------------+---------+---------+----+-----------+
  
-> /root/bin/upgrade.sh ; sync ; /root/bin/upgrade.sh ; sync ; /root/bin/upgrade.sh ; sync +[root@FRA2C-MEDOS04]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:55] 
-> reboot && exit+[~]# patronictl -c /etc/patroni/config.yml list 
 ++ Cluster: pgcluster (7637598528541987072) ---------+----+-----------+ 
 +| Member       | Host         | Role    | State     | TL | Lag in MB | 
 ++--------------+--------------+---------+-----------+----+-----------+ 
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running    9 |           | 
 +| pg-knoten-02 | pg-knoten-02 | Replica | streaming |  9 |         0 | 
 +| pg-knoten-03 | pg-knoten-03 | Replica | streaming |  9 |         0 | 
 ++--------------+--------------+---------+-----------+----+-----------+
 </code> </code>
  
Zeile 1447: Zeile 1668:
     Write‑Traffic NUR zum Leader schicken     Write‑Traffic NUR zum Leader schicken
     Read‑Traffic auf Replicas verteilen     Read‑Traffic auf Replicas verteilen
 +</code>
 +
 +
 +===== PostgreSQL-Tuning =====
 +
 +<code bash Daten asynchron einspielen, das geht etwas schneller>
 +> psql -c "SET synchronous_commit=off;"
 +</code>
 +
 +<code text>
 +Kurzfassung:
 +
 +autovacuum_work_mem  => 3% bis 10% des RAM
 +effective_cache_size => 50% bis 75% der RAM-Größe
 +maintenance_work_mem => 3% bis 10% des RAM
 +max_connections      => nur soviel, wie man wirklich braucht
 +shared_buffers       => 10% bis 25% der RAM-Größe
 +
 +                      frei verfügbarer RAM
 +work_mem ~ --------------------------------------------
 +           max_connections * Sort/Hash-Knoten pro Query
 +
 +work_mem ~           => ca. 1% der RAM-Größe; Vorsicht, kann zu OOM-Kill führen!
 +</code>
 +
 +<file bash /root/bin/PostgreSQL-Tuning.sh>
 +#!/bin/bash
 +
 +#VERSION="v2026051100" # initiale Erstellung
 +VERSION="v2026051300" # jetzt werden die neu berechneten Werte, automatisch in die DB geschrieben
 +
 +#CLUSTER_NAME="pgcluster"
 +CLUSTER_NAME="$(awk '/^scope: /{print $NF}' /etc/patroni/config.yml)"
 +
 +### RAM: 1547643 MB
 +# > echo "SELECT name, setting, unit, SOURCE, sourcefile, sourceline FROM pg_settings WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem', 'max_connections', 'maintenance_work_mem', 'autovacuum_work_mem', 'shared_memory_size');" | sudo -u postgres psql
 +#          name          setting  | unit |       source                       sourcefile                  | sourceline 
 +# ----------------------+-----------+------+--------------------+---------------------------------------------+------------
 +#  autovacuum_work_mem  | -1        | kB   | default            |                                                       
 +#  effective_cache_size | 148635648 | 8kB  | configuration file | /var/lib/postgresql/16/main/postgresql.conf |          6
 +#  maintenance_work_mem | 78643200  | kB   | configuration file | /var/lib/postgresql/16/main/postgresql.conf |          9
 +#  max_connections      | 100            | command line                                                             
 +#  shared_buffers       | 49545216  | 8kB  | configuration file | /var/lib/postgresql/16/main/postgresql.conf |         17
 +#  shared_memory_size   | 394457    | MB   | default            |                                                       
 +#  work_mem             | 8388608   | kB   | configuration file | /var/lib/postgresql/16/main/postgresql.conf |         22
 +# (7 rows)
 +
 +RAM_IN_MB="$(free -m | tail -n+2 | head -n1 | awk '{printf "%.0f\n", $2}')"
 +echo "RAM: ${RAM_IN_MB} MB"
 +
 +S_PG="$(echo "SELECT name, setting, unit
 +FROM pg_settings
 +WHERE name IN (
 +  'autovacuum_work_mem',
 +  'effective_cache_size',
 +  'maintenance_work_mem',
 +  'max_connections',
 +  'shared_buffers',
 +  'work_mem'
 +);" | sudo -u postgres psql -t | head -n-1 | awk '{print $1,$3,$5}' | grep -Fv -- '-1' | awk '{a=$2; if ($3=="8kB") {a=$2*8}; b=a/1024" MB"; if (a>10485760) b=a/1024/1024" GB"; if ($1=="max_connections") b=$2; print $1"\t\t"b}')"
 +echo "
 +# IST-Werte (ggf. autovacuum_work_mem = maintenance_work_mem):
 +${S_PG}
 +"
 +
 +AUTOVACUUM_WORK_MEM="$(echo "${RAM_IN_MB}"  | awk '{a=$1/10;   b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +EFFECTIVE_CACHE_SIZE="$(echo "${RAM_IN_MB}" | awk '{a=$1*3/4;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +MAINTENANCE_WORK_MEM="$(echo "${RAM_IN_MB}" | awk '{a=$1/100;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +MAX_CONNECTIONS="$(awk '/^processor[[:space:]]+: /{printf "%.0f\n", $NF*11/10}' /proc/cpuinfo | sort -n | tail -n1)"
 +SHARED_BUFFERS="$(echo "${RAM_IN_MB}"       | awk '{a=$1/4;    b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +WORK_MEM="$(echo "${RAM_IN_MB}"             | awk '{a=$1/10000; b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +MAX_WAL_SIZE="$(echo "${RAM_IN_MB}"  | awk '{a=$1/10;   b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
 +
 +echo "# empfohlene Werte, die meistens passen:
 +autovacuum_work_mem:  ${AUTOVACUUM_WORK_MEM} (ca. 10% vom RAM)
 +effective_cache_size: ${EFFECTIVE_CACHE_SIZE} (ca. 50-75% vom RAM)
 +maintenance_work_mem: ${MAINTENANCE_WORK_MEM} (ca. 1-10% vom RAM)
 +max_connections:      ${MAX_CONNECTIONS} (ca. CPU-Kerne + 10%)
 +shared_buffers:       ${SHARED_BUFFERS} (ca. 10-25% vom RAM)
 +max_wal_size:         ${MAX_WAL_SIZE} (ca. 10% vom RAM / eigentlich kein Bezug zum RAM, 50-200GB)
 +work_mem:             ${WORK_MEM} (ca. 0,1-1% vom RAM; Vorsicht, kann zu OOM-Kill führen!)
 +"
 +
 +### neu berechnete Werte, direkt setzen
 +### patronictl -c /etc/patroni/config.yml edit-config pgcluster
 +echo "#==============================================================================#"
 +echo
 +jq -n \
 +  --arg ecs "${EFFECTIVE_CACHE_SIZE}" \
 +  --arg mwm "${MAINTENANCE_WORK_MEM}" \
 +  --arg mc  "${MAX_CONNECTIONS}" \
 +  --arg sb  "${SHARED_BUFFERS}" \
 +  --arg mw  "${MAX_WAL_SIZE}" \
 +  --arg wm  "${WORK_MEM}" \
 +'{
 +  postgresql: {
 +    parameters: {
 +      effective_cache_size: $ecs,
 +      maintenance_work_mem: $mwm,
 +      max_connections: $mc,
 +      shared_buffers: $sb,
 +      max_wal_size: $mw,
 +      wal_buffers: "1GB",
 +      checkpoint_timeout: "30min",
 +      work_mem: $wm
 +    }
 +  }
 +}' | curl -s -X PATCH http://localhost:8008/config \
 +        -H "Content-Type: application/json" \
 +        --data-binary @-
 +echo
 +echo "#==============================================================================#"
 +
 +### aktuelle Einstellungen anzeigen
 +patronictl -c /etc/patroni/config.yml show-config ${CLUSTER_NAME}
 +
 +### neue Einstellungen aktivieren
 +### shared_buffers benötigt leider einen Restart
 +service patroni restart
 +</file>
 +
 +<code bash RAM: 1547643 MB>
 +         name          setting  | unit |       source                       sourcefile                  | sourceline 
 +----------------------+-----------+------+--------------------+---------------------------------------------+------------
 + autovacuum_work_mem  | -1        | kB   | default            |                                                       
 + effective_cache_size | 148635648 | 8kB  | configuration file | /var/lib/postgresql/16/main/postgresql.conf |          6
 + maintenance_work_mem | 78643200  | kB   | configuration file | /var/lib/postgresql/16/main/postgresql.conf |          9
 + max_connections      | 100            | command line                                                             
 + shared_buffers       | 49545216  | 8kB  | configuration file | /var/lib/postgresql/16/main/postgresql.conf |         17
 + shared_memory_size   | 394457    | MB   | default            |                                                       
 + work_mem             | 8388608   | kB   | configuration file | /var/lib/postgresql/16/main/postgresql.conf |         22
 +(7 rows)
 +</code>
 +
 +<code bash Konfigurationsvorschlag nachher ansehen>
 +> /root/bin/PostgreSQL-Tuning.sh
 +RAM: 1547643 MB
 +
 +# IST-Werte (ggf. autovacuum_work_mem = maintenance_work_mem):
 +effective_cache_size 1134 GB
 +maintenance_work_mem 151 GB
 +max_connections 150
 +shared_buffers 378 GB
 +work_mem 15 GB
 +
 +# empfohlene Werte, die meistens passen:
 +autovacuum_work_mem:  151GB (ca. 10% vom RAM)
 +effective_cache_size: 1134GB (ca. 50-75% vom RAM)
 +maintenance_work_mem: 151GB (ca. 3-10% vom RAM)
 +max_connections:      140 (ca. CPU-Kerne + 10%)
 +shared_buffers:       378GB (ca. 10-25% vom RAM)
 +work_mem:             15GB (ca. 1% vom RAM; Vorsicht, kann zu OOM-Kill führen!)
 +
 +#==============================================================================#
 +
 +{"loop_wait": 10, "maximum_lag_on_failover": 1048576, "postgresql": {"parameters": {"effective_cache_size": "1134GB", "hot_standby": "on", "maintenance_work_mem": "151GB", "max_replication_slots": 10, "max_wal_senders": 10, "shared_buffers": "378GB", "wal_level": "replica", "work_mem": "15GB", "max_connections": "140"}, "use_pg_rewind": true}, "retry_timeout": 10, "ttl": 30}
 +#==============================================================================#
 +loop_wait: 10
 +maximum_lag_on_failover: 1048576
 +postgresql:
 +  parameters:
 +    effective_cache_size: 1134GB
 +    hot_standby: 'on'
 +    maintenance_work_mem: 151GB
 +    max_connections: '140'
 +    max_replication_slots: 10
 +    max_wal_senders: 10
 +    shared_buffers: 378GB
 +    wal_level: replica
 +    work_mem: 15GB
 +  use_pg_rewind: true
 +retry_timeout: 10
 +ttl: 30
 +</code>
 +
 +<file bash /root/bin/CPU-Nutzung.sh>
 +#!/bin/bash
 +
 +
 +# Laufende Parallel‑Worker sehen
 +
 +echo "
 +SELECT pid, backend_type, state
 +FROM pg_stat_activity
 +WHERE backend_type LIKE '%parallel%';
 +
 +SELECT backend_type, count(*)
 +FROM pg_stat_activity
 +GROUP BY backend_type;
 +" | sudo -u postgres psql
 +</file>
 +
 +<code bash>
 +> /root/bin/CPU-Nutzung.sh
 + pid | backend_type | state 
 +-----+--------------+-------
 +(0 rows)
 +
 +   backend_type    | count 
 +-------------------+-------
 + client backend    |     3
 + startup               1
 + background writer |     1
 + walreceiver           1
 + checkpointer      |     1
 +(5 rows)
 +</code>
 +
 +
 +===== Tests =====
 +
 +<code bash>
 +> time sudo -u postgres pgbench -i -s 1000 testdb
 +dropping old tables...
 +creating tables...
 +generating data (client-side)...
 +100000000 of 100000000 tuples (100%) done (elapsed 85.93 s, remaining 0.00 s)
 +vacuuming...
 +creating primary keys...
 +done in 125.30 s (drop tables 4.04 s, create tables 0.01 s, client-side generate 86.85 s, vacuum 0.84 s, primary keys 33.56 s).
 +
 +real 2m5.365s
 +user 0m0.038s
 +sys 0m0.077s
 +
 +> sudo -u postgres pgbench -c 128 -j 128 -T 120 -N testdb
 +pgbench (16.13 (Ubuntu 16.13-0ubuntu0.24.04.1))
 +starting vacuum...end.
 +transaction type: <builtin: simple update>
 +scaling factor: 1000
 +query mode: simple
 +number of clients: 128
 +number of threads: 128
 +maximum number of tries: 1
 +duration: 120 s
 +number of transactions actually processed: 5956017
 +number of failed transactions: 0 (0.000%)
 +latency average = 2.578 ms
 +initial connection time = 66.949 ms
 +tps = 49656.884083 (without initial connection time)
 +
 +> sudo -u postgres pgbench -c 250 -j 250 -T 120 -N testdb
 +pgbench (16.13 (Ubuntu 16.13-0ubuntu0.24.04.1))
 +starting vacuum...end.
 +transaction type: <builtin: simple update>
 +scaling factor: 1000
 +query mode: simple
 +number of clients: 250
 +number of threads: 250
 +maximum number of tries: 1
 +duration: 120 s
 +number of transactions actually processed: 6463558
 +number of failed transactions: 0 (0.000%)
 +latency average = 4.637 ms
 +initial connection time = 154.766 ms
 +tps = 53915.406631 (without initial connection time)
 +</code>
 +
 +<code bash WAL>
 +> echo "SELECT wal_write_time, wal_sync_time FROM pg_stat_wal;" | sudo -u postgres psql -d testdb
 + wal_write_time | wal_sync_time 
 +----------------+---------------
 +              0 |             0
 +(1 row)
 +</code>
 +
 +<code bash ZFS I/O>
 +> zpool iostat -v
 +               capacity     operations     bandwidth 
 +pool         alloc   free   read  write   read  write
 +-----------  -----  -----  -----  -----  -----  -----
 +pg_datadir   9.12G  2.90T      0  1.32K    276  29.3M
 +  mirror-0   9.12G  2.90T      0  1.32K    276  29.3M
 +    nvme2n1      -      -      0    673    138  14.6M
 +    nvme3n1      -      -      0    676    138  14.6M
 +-----------  -----  -----  -----  -----  -----  -----
 +</code>
 +
 +
 +==== Testdaten einspielen ====
 +
 +<code bash>
 +> echo "COPY tabelle FROM '/tmp/export_mit_header.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER MATCH, NULL '');" | psql -U dbuser -d testdb
 +
 +oder
 +
 +> cat /tmp/export_mit_header.tsv | psql -d testdb -U dbuser -c "COPY tabelle FROM STDIN WITH (FORMAT csv, DELIMITER E'\t', HEADER MATCH, NULL '');"
 +</code>
 +
 +<code bash Datensätze zählen>
 +> echo "SELECT COUNT(*) AS Tabellenname FROM tabelle;" | psql -d testdb -U dbuser
 +</code>
 +
 +
 +===== Verbindungen SSL/TLS-Verschlüsselt =====
 +
 +
 +==== Verschlüsselung konfigurieren ====
 +
 +<code bash postgresql.conf>
 +> patronictl -c /etc/patroni/config.yml edit-config pgcluster
 +loop_wait: 10
 +maximum_lag_on_failover: 1048576
 +postgresql:
 +  parameters:
 +    hot_standby: 'on'
 +    max_replication_slots: 10
 +    max_wal_senders: 10
 +    wal_level: replica
 +    ssl: on
 +    ssl_ca_file: '/etc/ssl/server-ca.pem'
 +    ssl_cert_file: '/etc/ssl/server-cert.pem'
 +    ssl_key_file: '/etc/ssl/server-key.pem'
 +    ssl_min_protocol_version: 'TLSv1.2'
 +    ssl_ciphers: 'HIGH:!aNULL:!3DES:!MD5:!DES:!MEDIUM'
 +    ssl_prefer_server_ciphers: on
 +  use_pg_rewind: true
 +retry_timeout: 10
 +ttl: 30
 +</code>
 +
 +
 +==== Verschlüsselung erzwingen ====
 +
 +<code bash pg_hba.conf>
 +> vim /etc/patroni/config.yml
 +...
 +pg_hba:
 +    - hostnossl all all 0.0.0.0/0 reject
 +    - hostnossl all all ::/0 reject
 +    - hostssl replication all 0.0.0.0/0 scram-sha-256
 +    - hostssl replication all ::/0 scram-sha-256
 +    - hostssl all all 0.0.0.0/0 scram-sha-256
 +    - hostssl all all ::/0 scram-sha-256
 +</code>
 +
 +
 +==== Verschlüsselung aktivieren ====
 +
 +<code bash kompletten Cluster restarten>
 +> patronictl -c /etc/patroni/config.yml restart pgcluster
 +</code>
 +
 +
 +===== temp_tablespaces =====
 +
 +<code bash temp_tablespaces-Verzeichnis anlegen>
 +> mkdir -p /var/lib/postgresql/temp_tablespaces
 +> chown postgres:postgres /var/lib/postgresql/temp_tablespaces
 +> chmod 0700 /var/lib/postgresql/temp_tablespaces
 +</code>
 +
 +<code bash temp_tablespaces konfigurieren>
 +> echo "SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace WHERE spcname = 'fasttmp';" | sudo -u postgres psql
 +> echo "CREATE TABLESPACE fasttmp LOCATION '/var/lib/postgresql/temp_tablespaces';" | sudo -u postgres psql
 +> echo "SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace WHERE spcname = 'fasttmp';" | sudo -u postgres psql
 +
 +> patronictl -c /etc/patroni/config.yml edit-config
 +...
 +postgresql:
 +  parameters:
 +    temp_file_limit: '-1'
 +    temp_tablespaces: fasttmp
 +    ...
 </code> </code>
  
  
/home/http/wiki/data/attic/datenbank/ha-postgresql-cluster.1778608518.txt · Zuletzt geändert: von manfred