Benutzer-Werkzeuge

Webseiten-Werkzeuge


datenbank:ha-postgresql-cluster

Unterschiede

Hier werden die Unterschiede zwischen zwei Versionen angezeigt.

Link zu dieser Vergleichsansicht

Beide Seiten der vorigen RevisionVorhergehende Überarbeitung
Nächste Überarbeitung
Vorhergehende Überarbeitung
datenbank:ha-postgresql-cluster [2026-05-12 18:49:02] manfreddatenbank:ha-postgresql-cluster [2026-05-29 10:09:43] (aktuell) – [Tests] manfred
Zeile 1: Zeile 1:
 ====== HA-PostgreSQL-Cluster ====== ====== HA-PostgreSQL-Cluster ======
 +
 +siehe auch: **[[::Datenbank:PostgreSQL]]**
  
  
Zeile 6: Zeile 8:
 //Es soll ein hochverfügbarer PostgreSQL‑Cluster mit Patroni, EtcE, HAProxy und KeepaliveD auf drei Systemen installiert werden.// //Es soll ein hochverfügbarer PostgreSQL‑Cluster mit Patroni, EtcE, HAProxy und KeepaliveD auf drei Systemen installiert werden.//
  
-Wichtige Unterschiede zwischen diesem HA-PostgreSQL-Cluster mit 3 Systemen, gegenüber einem HA-PostgreSQL-Cluster mit 5 Systemen sind, dass hier alle Pakete auf allen 3 Systemen installiert werden müssen. Und weil PostgreSQL und HAProxy auf dem selben Host laufen, muß der Datenbank-Zugriff über den Cluster-Zugang (HAProxy) auf einen anderen Port laufen. In diesem Fall laufen die PostgreSQL-Instanzen aus dem Standard-Port (''5432'') und der HAProxy jeweils auf einen neuen Port (''5433''). +Wichtige Unterschiede zwischen diesem HA-PostgreSQL-Cluster mit 3 Systemen, gegenüber einem HA-PostgreSQL-Cluster mit 5 Systemen sind, dass hier alle Pakete auf allen 3 Systemen installiert werden müssen. Und weil PostgreSQL und HAProxy auf dem selben Host laufen, muß der Datenbank-Zugriff über den Cluster-Zugang (HAProxy) auf einen anderen Port laufen. 
-Weiterhin gibt es in diesem Aufbau noch einen schreibgeschützten Port (''5443''), über den man auf den man nicht in die DB schreiben kann und von den Replicas liest, um den Leader zu entlasten.+In diesem Fall laufen die PostgreSQL-Instanzen auf dem Standard-Port (''5432'') und der HAProxy jeweils auf neuen Ports (schreiben: ''5433'' / lesen: ''5443'').
  
 In einem HA-PostgreSQL-Cluster mit 5 Systemen laufen EtcE, HAProxy und KeepaliveD auf 2 separate Systeme und dann können PostgreSQL und HAProxy auf dem gleichen Port laufen. In einem HA-PostgreSQL-Cluster mit 5 Systemen laufen EtcE, HAProxy und KeepaliveD auf 2 separate Systeme und dann können PostgreSQL und HAProxy auf dem gleichen Port laufen.
Zeile 19: Zeile 21:
 > apt install vim screen mc csync2 links python3 python3-psycopg2 jq curl wget net-tools etcd-server etcd-client haproxy keepalived postgresql postgresql-contrib patroni > apt install vim screen mc csync2 links python3 python3-psycopg2 jq curl wget net-tools etcd-server etcd-client haproxy keepalived postgresql postgresql-contrib patroni
 > echo "export EDITOR=/usr/bin/vim" > ~/.bash_profile > echo "export EDITOR=/usr/bin/vim" > ~/.bash_profile
 +</code>
 +
 +
 +==== PG-DataDir mit ZFS ====
 +
 +<code bash DataDir im RAID-1>
 +> apt install zfsutils-linux
 +
 +> zpool create -m /var/lib/postgresql pg_datadir mirror /dev/nvme2n1 /dev/nvme3n1
 +> zpool list
 +NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
 +pg_datadir  2.91T   284K  2.91T        -             0%     0%  1.00x    ONLINE  -
 +
 +> zpool status
 +  pool: pg_datadir
 + state: ONLINE
 +config:
 +
 + NAME         STATE     READ WRITE CKSUM
 + pg_datadir   ONLINE               0
 +   mirror-0   ONLINE               0
 +     nvme2n1  ONLINE               0
 +     nvme3n1  ONLINE               0
 +
 +errors: No known data errors
 +
 +> zfs set compression=lz4    pg_datadir
 +> zfs set recordsize=16K     pg_datadir
 +> zfs set logbias=latency    pg_datadir
 +> zfs set atime=off          pg_datadir
 +
 +> mkdir -p /var/lib/postgresql/16/main
 +> chown -R postgres:postgres /var/lib/postgresql/
 +> chmod 700 /var/lib/postgresql/16/main
 +
 +> mount | grep postgresql
 +
 +> zfs get mountpoint,compression,encryption,logbias,recordsize
 +NAME        PROPERTY     VALUE                SOURCE
 +pg_datadir  mountpoint   /var/lib/postgresql  local
 +pg_datadir  compression  lz4                  local
 +pg_datadir  encryption   off                  default
 +pg_datadir  logbias      latency              local
 +pg_datadir  recordsize   16K                  local
 +
 +> df -h
 +Filesystem      Size  Used Avail Use% Mounted on
 +...
 +pg_datadir      2.9T   40M  2.9T   1% /var/lib/postgresql
 </code> </code>
  
Zeile 29: Zeile 80:
 group Loadbalancer #group name, we can have multiple groups group Loadbalancer #group name, we can have multiple groups
 { {
-    host pg-knoten-01; #master server +    host FRA2A-MEDOS04; #master server 
-    host pg-knoten-02; #slave server +    host FRA2B-MEDOS04; #slave server 
-    host pg-knoten-03; #slave server+    host FRA2C-MEDOS04; #slave server
    
     key /etc/csync2_ssl.key;     key /etc/csync2_ssl.key;
Zeile 138: Zeile 189:
  
 <code text Starten> <code text Starten>
-systemctl restart etcd +systemctl restart etcd 
-systemctl enable etcd+systemctl enable etcd
 </code> </code>
  
Zeile 152: Zeile 203:
 === Hinweis === === Hinweis ===
  
-**//Solange der Cluster unhealthy ist, sind Verwaltungsoperationen wie etcdctl member add absichtlich verboten.//**+**//Solange der Cluster unhealthy ist, sind Verwaltungsoperationen wie ''etcdctl member add'' absichtlich verboten.//**
  
 <code bash> <code bash>
Zeile 290: Zeile 341:
 > service haproxy restart > service haproxy restart
 ... ...
-May 08 17:47:16 pg-knoten-03 systemd[1]: Starting haproxy.service - HAProxy Load Balancer... +May 08 17:47:16 FRA2C-MEDOS04 systemd[1]: Starting haproxy.service - HAProxy Load Balancer... 
-May 08 17:47:16 pg-knoten-03 haproxy[1219172]: [NOTICE]   (1219172) : New worker (1219177) forked +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219172]: [NOTICE]   (1219172) : New worker (1219177) forked 
-May 08 17:47:16 pg-knoten-03 haproxy[1219172]: [NOTICE]   (1219172) : Loading success. +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219172]: [NOTICE]   (1219172) : Loading success. 
-May 08 17:47:16 pg-knoten-03 systemd[1]: Started haproxy.service - HAProxy Load Balancer. +May 08 17:47:16 FRA2C-MEDOS04 systemd[1]: Started haproxy.service - HAProxy Load Balancer. 
-May 08 17:47:16 pg-knoten-03 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 ac> +May 08 17:47:16 FRA2C-MEDOS04 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 ac> 
-May 08 17:47:17 pg-knoten-03 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 ac>+May 08 17:47:17 FRA2C-MEDOS04 haproxy[1219177]: [WARNING]  (1219177) : Server pg_patroni_knoten/pg2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 ac>
  
 > ss -antp | grep -F LISTEN | grep -F haproxy > ss -antp | grep -F LISTEN | grep -F haproxy
Zeile 377: Zeile 428:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2 +        #interface bond2 
-        state BACKUP+        interface vlan203
         virtual_router_id 203         virtual_router_id 203
 +        state BACKUP
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  + 
- unicast_src_ip 10.145.43.31 +        #mcast_src_ip 10.145.43.31 
- unicast_peer { + 
- #10.145.43.31 +        unicast_src_ip 10.145.43.31 
- 10.145.43.32 +        unicast_peer { 
- 10.145.43.33 +                #10.145.43.31 
- }+                10.145.43.32 
 +                10.145.43.33 
 +        }
 } }
 </file> </file>
Zeile 421: Zeile 476:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2 +        #interface bond2 
-        state BACKUP+        interface vlan203
         virtual_router_id 203         virtual_router_id 203
 +        state BACKUP
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  + 
- unicast_src_ip 10.145.43.32 +        #mcast_src_ip 10.145.43.32 
- unicast_peer { + 
- 10.145.43.31 +        unicast_src_ip 10.145.43.32 
- #10.145.43.32 +        unicast_peer { 
- 10.145.43.33 +                10.145.43.31 
- }+                #10.145.43.32 
 +                10.145.43.33 
 +        }
 } }
 </file> </file>
Zeile 465: Zeile 524:
         vrrp_no_swap         vrrp_no_swap
 } }
- +
 vrrp_script chk_dienst_vip { vrrp_script chk_dienst_vip {
- script "/root/bin/check_haproxy.sh"+        script "/root/bin/check_haproxy.sh"
         interval 2         interval 2
-        weight 2 +        weight -50 
- fall 2+        fall 2
 } }
- +
 vrrp_instance VIP { vrrp_instance VIP {
-        interface bond2+        #interface bond2 
 +        interface vlan203 
 +        virtual_router_id 203
         state BACKUP         state BACKUP
-        virtual_router_id 203 
         priority 100         priority 100
 +        #nopreempt
         advert_int 1         advert_int 1
- +
         track_script {         track_script {
                 chk_dienst_vip                 chk_dienst_vip
         }         }
- +
         authentication {         authentication {
                 auth_type PASS                 auth_type PASS
                 auth_pass RaBoo9as                 auth_pass RaBoo9as
         }         }
- +
         virtual_ipaddress {         virtual_ipaddress {
                 10.145.43.30/24 dev vlan203                 10.145.43.30/24 dev vlan203
         }         }
-  
- unicast_src_ip 10.145.43.33 
- unicast_peer { 
- 10.145.43.31 
- 10.145.43.32 
- #10.145.43.33 
- } 
-} 
-</file> 
  
-<file bash /root/bin/check_pg.sh> +        #mcast_src_ip 10.145.43.33
-#!/bin/bash +
-  +
-#==============================================================================# +
-# Dieses Skript prüft den Zugriff auf die einzelnen DB-Knoten. +
-# Es wird erst ein Fehler ausgegeben, wenn kein DB-Knoten mehr erreichbar ist. +
-#==============================================================================# +
-  +
-### Wenn PostgreSQL+Patroni nicht mit ETCD, HAProxy und KeepaliveD auf dem selben Host liegen +
-#AUSGABE="$(for DBIP in $(awk '/[[:space:]][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[:]/{gsub("[:]"," "); print $3}' /etc/haproxy/haproxy.cfg) +
-#do +
-#        echo "\l" | psql -U postgres -h ${DBIP} >/dev/null 2>/dev/null && echo "${DBIP}" +
-#done)" +
-  +
-### Nur wenn PostgreSQL+Patroni, ETCD, HAProxy und KeepaliveD auf dem selben Host liegen +
-AUSGABE="$(sudo -u postgres psql -c '\l' >/dev/null 2>/dev/null && hostname -s)"+
  
-# Test +        unicast_src_ip 10.145.43.33 
-#echo "AUSGABE='${AUSGABE}'" +        unicast_peer 
-  +                10.145.43.31 
-if [ x = "x${AUSGABE}" ] ; then +                10.145.43.32 
-        echo "CRIT: kein DB-Knoten ist erreichbar" +                #10.145.43.33 
-        exit 2 +        } 
-else +}
-        echo "OK: ${AUSGABE}" | tr -s '\n' ' '; echo +
-        exit 0 +
-fi+
 </file> </file>
  
Zeile 535: Zeile 569:
    
 #==============================================================================# #==============================================================================#
-# Dieses Skript prüft den Zugriff auf die einzelnen DB-Knoten. +# Dieses Skript prüft, ob der HAProxy läuft und erreichbar ist.
-# Es wird erst ein Fehler ausgegebenwenn kein DB-Knoten mehr erreichbar ist.+
 #==============================================================================# #==============================================================================#
    
-HA_PORT="$(awk '/^[[:space:]]*bind[[:space:]]/{gsub("[:]"," "); print $3}' /etc/haproxy/haproxy.cfg)" +# Prüfeob HAProxy läuft und erreichbar ist 
-SS_AUSGABE="$(ss -antp | grep -F LISTEN | grep -F "haproxy" | grep -E ":${HA_PORT}[[:space:]]")+HOST="127.0.0.1
-HA_AUSGABE="$(links -ssl.certificates 0 -dump http://admin:admin@localhost:8404/stats | grep -E 'pg1|pg2|pg3' | wc -l)"+PORT=5433
  
-if [ x = "x${SS_AUSGABE}" ] ; then +# timeout wichtig, damit Keepalived nicht hängt 
-        echo "CRIT: Port nicht offen" +timeout 1 bash -c "</dev/tcp/$HOST/$PORT>/dev/null 2>&
-        exit 2+ 
 +if [ $? -eq 0 ]; then 
 +    exit 0   # OK → VIP behalten
 else else
- if [ 3 -eq "${HA_AUSGABE}" ] ; then +    exit 1   # Fehler → VIP abgeben
-        echo "OK" +
-        exit +
- else +
-        echo "CRIT: haproxy antwortet nicht richtig" +
-        exit 2 +
- fi+
 fi fi
 </file> </file>
Zeile 564: Zeile 593:
 <code text VIP testen> <code text VIP testen>
 > ping 10.145.43.30 > ping 10.145.43.30
 +</code>
 +
 +
 +=== VRRP-Kommunikation ===
 +
 +<code bash>
 +> # tcpdump -i vlan203 proto 112 -n | grep -F 'vrid 203'
 +tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
 +listening on vlan203, link-type EN10MB (Ethernet), snapshot length 262144 bytes
 +12:51:40.263101 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:41.263202 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:42.263398 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 +12:51:43.263547 IP 10.145.43.31 > 10.145.43.33: VRRPv2, Advertisement, vrid 203, prio 102, authtype simple, intvl 1s, length 20
 </code> </code>
  
Zeile 578: Zeile 620:
  
 <code bash Netzwerkzugriffe erlauben> <code bash Netzwerkzugriffe erlauben>
-> echo "listen_addresses = '*'" >> /etc/postgresql/16/main/postgresql.conf+> echo "listen_addresses = '*'" >> /etc/postgresql/16/main/pg_hba.conf 
 +> echo "host    replication     all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf 
 +> echo "host    replication     all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
 > echo "host    all             all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf > echo "host    all             all             0.0.0.0/              scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
 > echo "host    all             all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf > echo "host    all             all             ::/                   scram-sha-256" >> /etc/postgresql/16/main/pg_hba.conf
Zeile 724: Zeile 768:
     superuser:     superuser:
       username: postgres       username: postgres
-      password: geheim01+      password: my_pg_super_pass
     replication:     replication:
       username: repl       username: repl
-      password: geheim02+      password: AejohTe6
  
 pg_hba: pg_hba:
     - host replication all 0.0.0.0/0 scram-sha-256     - host replication all 0.0.0.0/0 scram-sha-256
 +    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 781: Zeile 826:
     superuser:     superuser:
       username: postgres       username: postgres
-      password: geheim01+      password: my_pg_super_pass
     replication:     replication:
       username: repl       username: repl
-      password: geheim02+      password: AejohTe6
  
 pg_hba: pg_hba:
     - host replication all 0.0.0.0/0 scram-sha-256     - host replication all 0.0.0.0/0 scram-sha-256
 +    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 838: Zeile 884:
     superuser:     superuser:
       username: postgres       username: postgres
-      password: geheim01+      password: my_pg_super_pass
     replication:     replication:
       username: repl       username: repl
-      password: geheim02+      password: AejohTe6
  
 pg_hba: pg_hba:
     - host replication all 0.0.0.0/0 scram-sha-256     - host replication all 0.0.0.0/0 scram-sha-256
 +    - host replication all ::/0 scram-sha-256
     - host all all 0.0.0.0/0 scram-sha-256     - host all all 0.0.0.0/0 scram-sha-256
     - host all all ::/0 scram-sha-256     - host all all ::/0 scram-sha-256
Zeile 854: Zeile 901:
 > systemctl disable postgresql > systemctl disable postgresql
 > cp /etc/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/ > cp /etc/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/
 +> cp /etc/postgresql/16/main/pg_hba.conf /var/lib/postgresql/16/main/
  
 > mkdir /var/lib/postgresql/16/main/conf.d > mkdir /var/lib/postgresql/16/main/conf.d
Zeile 971: Zeile 1019:
 | Member       | Host         | Role    | State   | TL | Lag in MB | | Member       | Host         | Role    | State   | TL | Lag in MB |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
-| pg-knoten-01 | pg-knoten-01 | Replica | running |    |   unknown |+| pg-knoten-01 | pg-knoten-03 | Replica | running |    |   unknown |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
 </code> </code>
Zeile 978: Zeile 1026:
 > patronictl -c /etc/patroni/config.yml failover > patronictl -c /etc/patroni/config.yml failover
 Current cluster topology Current cluster topology
-+ Cluster: pgcluster (7637152973093863500) ---------+----+-----------+++ Cluster: pgcluster (7637152973093863500) -------+----+-----------+
 | Member       | Host         | Role    | State   | TL | Lag in MB | | Member       | Host         | Role    | State   | TL | Lag in MB |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
-| pg-knoten-01 | pg-knoten-01 | Replica | running |    |   unknown |+| pg-knoten-01 | pg-knoten-03 | Replica | running |    |   unknown |
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
 Candidate ['pg-knoten-01'] []: pg-knoten-01 Candidate ['pg-knoten-01'] []: pg-knoten-01
Zeile 1009: Zeile 1057:
  
 Wenn das so aussieht, dann kann von den Replicas nicht gelesen werden. Wenn das so aussieht, dann kann von den Replicas nicht gelesen werden.
-Um Problem das zu beheben, müssen die Replicas (beide) reinitialisiert werden.+Um das Problem zu beheben, müssen die Replicas (beide) reinitialisiert werden.
  
  
Zeile 1059: Zeile 1107:
 pg3 UP -1 pg3 UP -1
 BACKEND UP -1 BACKEND UP -1
 +</code>
 +
 +
 +=== Probleme nach kompletten Cluster-Neustart ===
 +
 +<code bash IST-Zustand>
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Replica | running | 13 |       217 |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +</code>
 +
 +<code bash einen zum Leader bestimmen>
 +> patronictl -c /etc/patroni/config.yml failover pgcluster
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Replica | running | 13 |       217 |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +Candidate ['pg-knoten-01', 'pg-knoten-02', 'pg-knoten-03'] []: pg-knoten-01
 +Are you sure you want to failover cluster pgcluster? [y/N]: y
 +2026-05-13 13:48:00.52027 Successfully failed over to "pg-knoten-01"
 ++ Cluster: pgcluster (7637598528541987072) -------+----+-----------+
 +| Member       | Host         | Role    | State   | TL | Lag in MB |
 ++--------------+--------------+---------+---------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running | 13 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | running | 13 |       217 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | running | 13 |       217 |
 ++--------------+--------------+---------+---------+----+-----------+
 +
 +> patronictl -c /etc/patroni/config.yml list
 ++ Cluster: pgcluster (7637598528541987072) ---------+----+-----------+
 +| Member       | Host         | Role    | State     | TL | Lag in MB |
 ++--------------+--------------+---------+-----------+----+-----------+
 +| pg-knoten-01 | pg-knoten-01 | Leader  | running   | 14 |           |
 +| pg-knoten-02 | pg-knoten-02 | Replica | streaming | 14 |         0 |
 +| pg-knoten-03 | pg-knoten-03 | Replica | streaming | 14 |         0 |
 ++--------------+--------------+---------+-----------+----+-----------+
 </code> </code>
  
Zeile 1227: Zeile 1319:
 //Immer erst mit Dirk Hartmann absprechen!// //Immer erst mit Dirk Hartmann absprechen!//
  
-<file bash /root/bin/Status.sh> +<code bash Datenbanken und User zeigen>
-#!/bin/bash +
- +
-VERSION="v2026051100" # initiale Erstellung +
- +
-# KeepaliveD prüfen +
-service keepalived status >/dev/null && echo "OK: KeepaliveD läuft" || echo "CRIT: KeepaliveD ist tot" +
- +
-# IP + VIP +
-ip a | grep -F 'inet 10.145.43.3' +
- +
-echo "#------------------------------------------------------------------------------#" +
-# PostgreSQL Commits +
-sudo -u postgres psql -t -c "SELECT 'commits=' || SUM(xact_commit) || ', rollbacks=' || SUM(xact_rollback) FROM pg_stat_database;" +
- +
-# Active Connections +
-sudo -u postgres psql -t -c "SELECT COUNT(*) || ' active connections' FROM pg_stat_activity WHERE state='active';" +
- +
-# HAProxy abfragen +
-echo "#------------------------------------------------------------------------------#" +
-#links -ssl.certificates 0 -dump http://admin:admin@localhost:8404/stats +
-curl -s "http://admin:admin@localhost:8404/stats;csv" | grep -E 'pg_leader|pg_replica' | awk -F',' '{print $2, $18, $56}' +
- +
-# Patroni abfragen +
-patronictl -c /etc/patroni/config.yml list +
- +
-# EtcD abfragen +
-etcdctl --endpoints="http://pg-knoten-01:2379,http://pg-knoten-02:2379,http://pg-knoten-03:2379" endpoint health +
- +
-# Patroni abfragen +
-#echo "#------------------------------------------------------------------------------#" +
-#curl -i http://localhost:8008/leader ; echo +
-+
-for HIP in 10.145.43.31 10.145.43.32 10.145.43.33 +
-do +
- echo "#------------------------------------------------------------------------------#" +
- #curl -i http://${HIP}:8008/leader +
- curl -s http://${HIP}:8008/leader | jq . | grep -E 'state|postmaster_start_time|role|scope|name' +
-done +
-</file> +
- +
-<code bash Netzwerkzugriff per HAProxy prüfen>+
 > echo "\l" | psql -h10.145.43.30 -p5433 -U postgres > echo "\l" | psql -h10.145.43.30 -p5433 -U postgres
 Password for user postgres:  Password for user postgres: 
Zeile 1290: Zeile 1341:
  repl      | Replication  repl      | Replication
 </code> </code>
- 
-//Wie das beste Vorgehen bei Wartungsarbeiten ist, muß erst noch ermittelt ausgelotet werden.// 
  
 <code bash> <code bash>
Zeile 1300: Zeile 1349:
 > reboot && exit > reboot && exit
 </code> </code>
 +
  
 ==== Status / Monitoring ==== ==== Status / Monitoring ====
Zeile 1554: Zeile 1604:
  
 <code bash> <code bash>
-[root@pg-knoten-03]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:12]+[root@FRA2C-MEDOS04]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:12]
 [~]# patronictl -c /etc/patroni/config.yml failover [~]# patronictl -c /etc/patroni/config.yml failover
 Current cluster topology Current cluster topology
Zeile 1575: Zeile 1625:
 +--------------+--------------+---------+---------+----+-----------+ +--------------+--------------+---------+---------+----+-----------+
  
-[root@pg-knoten-03]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:55]+[root@FRA2C-MEDOS04]---------------------------------------------------------------------[0]-[0]-[2026-05-12 14:26:55]
 [~]# patronictl -c /etc/patroni/config.yml list [~]# patronictl -c /etc/patroni/config.yml list
 + Cluster: pgcluster (7637598528541987072) ---------+----+-----------+ + Cluster: pgcluster (7637598528541987072) ---------+----+-----------+
Zeile 1622: Zeile 1672:
  
 ===== PostgreSQL-Tuning ===== ===== PostgreSQL-Tuning =====
 +
 +<code bash Daten asynchron einspielen, das geht etwas schneller>
 +> psql -c "SET synchronous_commit=off;"
 +</code>
  
 <code text> <code text>
Zeile 1641: Zeile 1695:
 <file bash /root/bin/PostgreSQL-Tuning.sh> <file bash /root/bin/PostgreSQL-Tuning.sh>
 #!/bin/bash #!/bin/bash
 +
 +#VERSION="v2026051100" # initiale Erstellung
 +VERSION="v2026051300" # jetzt werden die neu berechneten Werte, automatisch in die DB geschrieben
 +
 +#CLUSTER_NAME="pgcluster"
 +CLUSTER_NAME="$(awk '/^scope: /{print $NF}' /etc/patroni/config.yml)"
  
 ### RAM: 1547643 MB ### RAM: 1547643 MB
Zeile 1673: Zeile 1733:
 " "
  
-AUTOVACUUM_WORK_MEM="$(echo "${RAM_IN_MB}"  | awk '{a=$1/10;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "\t\t%.0f %s\n", e,f}')" +AUTOVACUUM_WORK_MEM="$(echo "${RAM_IN_MB}"  | awk '{a=$1/10;   b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')" 
-EFFECTIVE_CACHE_SIZE="$(echo "${RAM_IN_MB}" | awk '{a=$1*3/4; b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "\t\t%.0f %s\n", e,f}')" +EFFECTIVE_CACHE_SIZE="$(echo "${RAM_IN_MB}" | awk '{a=$1*3/4;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')" 
-MAINTENANCE_WORK_MEM="$(echo "${RAM_IN_MB}" | awk '{a=$1/10;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "\t\t%.0f %s\n", e,f}')" +MAINTENANCE_WORK_MEM="$(echo "${RAM_IN_MB}" | awk '{a=$1/100;  b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')" 
-SHARED_BUFFERS="$(echo "${RAM_IN_MB}"       | awk '{a=$1/4;   b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "\t\t%.0f %s\n", e,f}')" +MAX_CONNECTIONS="$(awk '/^processor[[:space:]]+: /{printf "%.0f\n", $NF*11/10}' /proc/cpuinfo | sort -n | tail -n1)" 
-WORK_MEM="$(echo "${RAM_IN_MB}"             | awk '{a=$1/100; b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "\t\t%.0f %s\n", e,f}')"+SHARED_BUFFERS="$(echo "${RAM_IN_MB}"       | awk '{a=$1/4;    b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')" 
 +WORK_MEM="$(echo "${RAM_IN_MB}"             | awk '{a=$1/10000; b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')" 
 +MAX_WAL_SIZE="$(echo "${RAM_IN_MB}"  | awk '{a=$1/10;   b="MB"; c=a/1024; d="GB"; e=a; f=b; if (a>10240) {e=c; f=d} printf "%.0f%s\n", e,f}')"
  
 echo "# empfohlene Werte, die meistens passen: echo "# empfohlene Werte, die meistens passen:
-autovacuum_work_mem: ${AUTOVACUUM_WORK_MEM} (ca. 10% vom RAM)+autovacuum_work_mem:  ${AUTOVACUUM_WORK_MEM} (ca. 10% vom RAM)
 effective_cache_size: ${EFFECTIVE_CACHE_SIZE} (ca. 50-75% vom RAM) effective_cache_size: ${EFFECTIVE_CACHE_SIZE} (ca. 50-75% vom RAM)
-maintenance_work_mem: ${MAINTENANCE_WORK_MEM} (ca. 3-10% vom RAM) +maintenance_work_mem: ${MAINTENANCE_WORK_MEM} (ca. 1-10% vom RAM
-shared_buffers: ${SHARED_BUFFERS} (ca. 10-25% vom RAM) +max_connections:      ${MAX_CONNECTIONS} (ca. CPU-Kerne + 10%
-work_mem: ${WORK_MEM} (ca. 1% vom RAM; Vorsicht, kann zu OOM-Kill führen!)+shared_buffers:       ${SHARED_BUFFERS} (ca. 10-25% vom RAM
 +max_wal_size:         ${MAX_WAL_SIZE} (ca. 10% vom RAM / eigentlich kein Bezug zum RAM, 50-200GB
 +work_mem:             ${WORK_MEM} (ca. 0,1-1% vom RAM; Vorsicht, kann zu OOM-Kill führen!)
 " "
-</file> 
  
-<code bash Konfigurationsvorschlag vorher ansehen> +### neu berechnete Werte, direkt setzen 
-/root/bin/PostgreSQL-Tuning.sh +### patronictl -c /etc/patroni/config.yml edit-config pgcluster 
-RAM1547643 MB+echo "#==============================================================================#" 
 +echo 
 +jq -n \ 
 +  --arg ecs "${EFFECTIVE_CACHE_SIZE}"
 +  --arg mwm "${MAINTENANCE_WORK_MEM}"
 +  --arg mc  "${MAX_CONNECTIONS}"
 +  --arg sb  "${SHARED_BUFFERS}"
 +  --arg mw  "${MAX_WAL_SIZE}"
 +  --arg wm  "${WORK_MEM}"
 +'{ 
 +  postgresql
 +    parameters: { 
 +      effective_cache_size: $ecs, 
 +      maintenance_work_mem: $mwm, 
 +      max_connections: $mc, 
 +      shared_buffers: $sb, 
 +      max_wal_size: $mw, 
 +      wal_buffers: "1GB", 
 +      checkpoint_timeout: "30min", 
 +      work_mem: $wm 
 +    } 
 +  } 
 +}' | curl -s -X PATCH http://localhost:8008/config \ 
 +        -H "Content-Type: application/json"
 +        --data-binary @- 
 +echo 
 +echo "#==============================================================================#"
  
-IST-Werte (ggfautovacuum_work_mem = maintenance_work_mem): +### aktuelle Einstellungen anzeigen 
-effective_cache_size 4096 MB +patronictl -c /etc/patroni/config.yml show-config ${CLUSTER_NAME}
-maintenance_work_mem 64 MB +
-max_connections 100 +
-shared_buffers 128 MB +
-work_mem 4 MB+
  
-empfohlene Werte, die meistens passen: +### neue Einstellungen aktivieren 
-autovacuum_work_mem: 151 GB (ca. 10% vom RAM) +### shared_buffers benötigt leider einen Restart 
-effective_cache_size: 1134 GB (ca. 50-75% vom RAM) +service patroni restart 
-maintenance_work_mem: 151 GB (ca. 3-10% vom RAM) +</file>
-shared_buffers: 378 GB (ca. 10-25% vom RAM) +
-work_mem: 15 GB (ca. 1% vom RAM; Vorsicht, kann zu OOM-Kill führen!) +
-</code> +
- +
-<code bash Patroni konfigurieren> +
-> patronictl -c /etc/patroni/config.yml edit-config pgcluster +
-loop_wait: 10 +
-maximum_lag_on_failover: 1048576 +
-postgresql: +
-  parameters: +
-    hot_standby: 'on' +
-    max_replication_slots: 10 +
-    max_wal_senders: 10 +
-    wal_level: replica +
-    effective_cache_size: 1134GB +
-    maintenance_work_mem: 75GB +
-    shared_buffers: 378GB +
-    work_mem: 8GB +
-  use_pg_rewind: true +
-retry_timeout: 10 +
-ttl: 30 +
- +
-Apply these changes? [y/N]: y +
-Configuration changed +
- +
- +
-service patroni restart +
-</code>+
  
 <code bash RAM: 1547643 MB> <code bash RAM: 1547643 MB>
Zeile 1751: Zeile 1808:
 # IST-Werte (ggf. autovacuum_work_mem = maintenance_work_mem): # IST-Werte (ggf. autovacuum_work_mem = maintenance_work_mem):
 effective_cache_size 1134 GB effective_cache_size 1134 GB
-maintenance_work_mem 75 GB +maintenance_work_mem 151 GB 
-max_connections 100+max_connections 150
 shared_buffers 378 GB shared_buffers 378 GB
-work_mem 8192 MB+work_mem 15 GB
  
 # empfohlene Werte, die meistens passen: # empfohlene Werte, die meistens passen:
-autovacuum_work_mem:  151 GB (ca. 10% vom RAM) +autovacuum_work_mem:  151GB (ca. 10% vom RAM) 
-effective_cache_size:  1134 GB (ca. 50-75% vom RAM) +effective_cache_size: 1134GB (ca. 50-75% vom RAM) 
-maintenance_work_mem:  151 GB (ca. 3-10% vom RAM) +maintenance_work_mem: 151GB (ca. 3-10% vom RAM
-shared_buffers:  378 GB (ca. 10-25% vom RAM) +max_connections:      140 (ca. CPU-Kerne + 10%
-work_mem:  15 GB (ca. 1% vom RAM; Vorsicht, kann zu OOM-Kill führen!)+shared_buffers:       378GB (ca. 10-25% vom RAM) 
 +work_mem:             15GB (ca. 1% vom RAM; Vorsicht, kann zu OOM-Kill führen!) 
 + 
 +#==============================================================================# 
 + 
 +{"loop_wait": 10, "maximum_lag_on_failover": 1048576, "postgresql": {"parameters": {"effective_cache_size": "1134GB", "hot_standby": "on", "maintenance_work_mem": "151GB", "max_replication_slots": 10, "max_wal_senders": 10, "shared_buffers": "378GB", "wal_level": "replica", "work_mem": "15GB", "max_connections": "140"}, "use_pg_rewind": true}, "retry_timeout": 10, "ttl": 30} 
 +#==============================================================================# 
 +loop_wait: 10 
 +maximum_lag_on_failover: 1048576 
 +postgresql: 
 +  parameters: 
 +    effective_cache_size: 1134GB 
 +    hot_standby: 'on' 
 +    maintenance_work_mem: 151GB 
 +    max_connections: '140' 
 +    max_replication_slots: 10 
 +    max_wal_senders: 10 
 +    shared_buffers: 378GB 
 +    wal_level: replica 
 +    work_mem: 15GB 
 +  use_pg_rewind: true 
 +retry_timeout: 10 
 +ttl: 30
 </code> </code>
  
Zeile 1780: Zeile 1859:
 " | sudo -u postgres psql " | sudo -u postgres psql
 </file> </file>
 +
 +<code bash>
 +> /root/bin/CPU-Nutzung.sh
 + pid | backend_type | state 
 +-----+--------------+-------
 +(0 rows)
 +
 +   backend_type    | count 
 +-------------------+-------
 + client backend    |     3
 + startup               1
 + background writer |     1
 + walreceiver           1
 + checkpointer      |     1
 +(5 rows)
 +</code>
 +
 +
 +===== Tests =====
 +
 +<code bash>
 +> time sudo -u postgres pgbench -i -s 1000 testdb
 +dropping old tables...
 +creating tables...
 +generating data (client-side)...
 +100000000 of 100000000 tuples (100%) done (elapsed 85.93 s, remaining 0.00 s)
 +vacuuming...
 +creating primary keys...
 +done in 125.30 s (drop tables 4.04 s, create tables 0.01 s, client-side generate 86.85 s, vacuum 0.84 s, primary keys 33.56 s).
 +
 +real 2m5.365s
 +user 0m0.038s
 +sys 0m0.077s
 +
 +> sudo -u postgres pgbench -c 128 -j 128 -T 120 -N testdb
 +pgbench (16.13 (Ubuntu 16.13-0ubuntu0.24.04.1))
 +starting vacuum...end.
 +transaction type: <builtin: simple update>
 +scaling factor: 1000
 +query mode: simple
 +number of clients: 128
 +number of threads: 128
 +maximum number of tries: 1
 +duration: 120 s
 +number of transactions actually processed: 5956017
 +number of failed transactions: 0 (0.000%)
 +latency average = 2.578 ms
 +initial connection time = 66.949 ms
 +tps = 49656.884083 (without initial connection time)
 +
 +> sudo -u postgres pgbench -c 250 -j 250 -T 120 -N testdb
 +pgbench (16.13 (Ubuntu 16.13-0ubuntu0.24.04.1))
 +starting vacuum...end.
 +transaction type: <builtin: simple update>
 +scaling factor: 1000
 +query mode: simple
 +number of clients: 250
 +number of threads: 250
 +maximum number of tries: 1
 +duration: 120 s
 +number of transactions actually processed: 6463558
 +number of failed transactions: 0 (0.000%)
 +latency average = 4.637 ms
 +initial connection time = 154.766 ms
 +tps = 53915.406631 (without initial connection time)
 +</code>
 +
 +<code bash WAL>
 +> echo "SELECT wal_write_time, wal_sync_time FROM pg_stat_wal;" | sudo -u postgres psql -d testdb
 + wal_write_time | wal_sync_time 
 +----------------+---------------
 +              0 |             0
 +(1 row)
 +</code>
 +
 +<code bash ZFS I/O>
 +> zpool iostat -v
 +               capacity     operations     bandwidth 
 +pool         alloc   free   read  write   read  write
 +-----------  -----  -----  -----  -----  -----  -----
 +pg_datadir   9.12G  2.90T      0  1.32K    276  29.3M
 +  mirror-0   9.12G  2.90T      0  1.32K    276  29.3M
 +    nvme2n1      -      -      0    673    138  14.6M
 +    nvme3n1      -      -      0    676    138  14.6M
 +-----------  -----  -----  -----  -----  -----  -----
 +</code>
 +
 +
 +==== Testdaten einspielen ====
 +
 +<code bash>
 +> echo "COPY tabelle FROM '/tmp/export_mit_header.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER MATCH, NULL '');" | psql -U dbuser -d testdb
 +
 +oder
 +
 +> cat /tmp/export_mit_header.tsv | psql -d testdb -U dbuser -c "COPY tabelle FROM STDIN WITH (FORMAT csv, DELIMITER E'\t', HEADER MATCH, NULL '');"
 +</code>
 +
 +<code bash Datensätze zählen>
 +> echo "SELECT COUNT(*) AS Tabellenname FROM tabelle;" | psql -d testdb -U dbuser
 +</code>
 +
 +
 +===== Verbindungen SSL/TLS-Verschlüsselt =====
 +
 +
 +==== Verschlüsselung konfigurieren ====
 +
 +<code bash postgresql.conf>
 +> patronictl -c /etc/patroni/config.yml edit-config pgcluster
 +loop_wait: 10
 +maximum_lag_on_failover: 1048576
 +postgresql:
 +  parameters:
 +    hot_standby: 'on'
 +    max_replication_slots: 10
 +    max_wal_senders: 10
 +    wal_level: replica
 +    ssl: on
 +    ssl_ca_file: '/etc/ssl/server-ca.pem'
 +    ssl_cert_file: '/etc/ssl/server-cert.pem'
 +    ssl_key_file: '/etc/ssl/server-key.pem'
 +    ssl_min_protocol_version: 'TLSv1.2'
 +    ssl_ciphers: 'HIGH:!aNULL:!3DES:!MD5:!DES:!MEDIUM'
 +    ssl_prefer_server_ciphers: on
 +  use_pg_rewind: true
 +retry_timeout: 10
 +ttl: 30
 +</code>
 +
 +
 +==== Verschlüsselung erzwingen ====
 +
 +<code bash pg_hba.conf>
 +> vim /etc/patroni/config.yml
 +...
 +pg_hba:
 +    - hostnossl all all 0.0.0.0/0 reject
 +    - hostnossl all all ::/0 reject
 +    - hostssl replication all 0.0.0.0/0 scram-sha-256
 +    - hostssl replication all ::/0 scram-sha-256
 +    - hostssl all all 0.0.0.0/0 scram-sha-256
 +    - hostssl all all ::/0 scram-sha-256
 +</code>
 +
 +
 +==== Verschlüsselung aktivieren ====
 +
 +<code bash kompletten Cluster restarten>
 +> patronictl -c /etc/patroni/config.yml restart pgcluster
 +</code>
 +
 +
 +===== temp_tablespaces =====
 +
 +<code bash temp_tablespaces-Verzeichnis anlegen>
 +> mkdir -p /var/lib/postgresql/temp_tablespaces
 +> chown postgres:postgres /var/lib/postgresql/temp_tablespaces
 +> chmod 0700 /var/lib/postgresql/temp_tablespaces
 +</code>
 +
 +<code bash temp_tablespaces konfigurieren>
 +> echo "SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace WHERE spcname = 'fasttmp';" | sudo -u postgres psql
 +> echo "CREATE TABLESPACE fasttmp LOCATION '/var/lib/postgresql/temp_tablespaces';" | sudo -u postgres psql
 +> echo "SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace WHERE spcname = 'fasttmp';" | sudo -u postgres psql
 +
 +> patronictl -c /etc/patroni/config.yml edit-config
 +...
 +postgresql:
 +  parameters:
 +    temp_file_limit: '-1'
 +    temp_tablespaces: fasttmp
 +    ...
 +</code>
  
  
/home/http/wiki/data/attic/datenbank/ha-postgresql-cluster.1778611742.txt · Zuletzt geändert: von manfred