Scenario: “Manhattan”: can’t write data into database.
Level: Medium
Type: Fix
Tags: disk volumes postgres realistic-interviews
Description: Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don’t need to know details about it (although it may help).
Helpful Postgres information: it’s a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf
. In our case Postgres is managed by systemd as a unit with name postgresql.
Test: (from default admin user) sudo -u postgres psql -c "insert into persons(name) values ('jane smith');" -d dt
Should return: INSERT 0 1
Time to Solve: 20 minutes.
On entre ici dans les scénarios de niveau intermédiaire proposés sur sadservers.com.
On a une commande qui doit fonctionner pour résoudre le challenge, on va donc la lancer pour voir pourquoi ça ne passe pas.
1
2
3
root@i-0b8ce19730a071b11:/# sudo -u postgres psql -c "insert into persons(name) values ('jane smith');" -d dt
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
Le client ne parvient pas à se connecter au socket serveur qui est de type Unix.
Je regarde le fichier de configuration de Postgresql. Ce dernier est plein de lignes commentées, mais avec l’aide de grep
je peux faire le tri.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
root@i-0b8ce19730a071b11:/# cat /etc/postgresql/14/main/postgresql.conf | grep -v "^\s*#" | grep -v "^$"
data_directory = '/opt/pgdata/main' # use data in another directory
hba_file = '/etc/postgresql/14/main/pg_hba.conf' # host-based authentication file
ident_file = '/etc/postgresql/14/main/pg_ident.conf' # ident configuration file
external_pid_file = '/var/run/postgresql/14-main.pid' # write an extra PID file
port = 5432 # (change requires restart)
max_connections = 100 # (change requires restart)
unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories
ssl = on
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'
shared_buffers = 128MB # min 128kB
dynamic_shared_memory_type = posix # the default is the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_line_prefix = '%m [%p] %q%u@%d ' # special values:
log_timezone = 'Etc/UTC'
cluster_name = '14/main' # added to process titles if nonempty
stats_temp_directory = '/var/run/postgresql/14-main.pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'Etc/UTC'
lc_messages = 'C.UTF-8' # locale for system error message
lc_monetary = 'C.UTF-8' # locale for monetary formatting
lc_numeric = 'C.UTF-8' # locale for number formatting
lc_time = 'C.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
include_dir = 'conf.d' # include files ending in '.conf' from
On voit un numéro de port et une mention des sockets Unix. Ça semble raccord avec le nom de fichier auquel tente d’accéder le client.
A tout hasard on peut vérifier les ports TCP :
1
2
3
4
5
6
root@i-0b8ce19730a071b11:/# ss -lntp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=616,fd=3))
LISTEN 0 128 *:6767 *:* users:(("sadagent",pid=590,fd=7))
LISTEN 0 128 *:8080 *:* users:(("gotty",pid=582,fd=6))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=616,fd=4))
Rien du tout ici. Et comme on s’y attend, rien de plus pour le socket Unix :
1
2
root@i-0b8ce19730a071b11:/# ls /var/run/postgresql/
14-main.pg_stat_tmp
En fin de compte, est-ce que PostgreSQL tourne ?
1
2
root@i-0b8ce19730a071b11:/# ps aux | grep -i postgr
root 890 0.0 0.1 4964 820 pts/0 S+ 09:09 0:00 grep -i postgr
Non. Voyons voir la liste des unités systemd avec la commande systemctl list-units
:
1
2
postgresql.service loaded active exited PostgreSQL RDBMS
● postgresql@14-main.service loaded failed failed PostgreSQL Cluster 14-main
Il y a deux services dont l’un qui est en échec. On va se renseigner sur le status de chacun.
1
2
3
4
5
6
7
8
9
root@i-0b8ce19730a071b11:/# systemctl status postgresql.service
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Sat 2024-03-02 09:02:30 UTC; 9min ago
Process: 670 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 670 (code=exited, status=0/SUCCESS)
Mar 02 09:02:30 i-0b8ce19730a071b11 systemd[1]: Starting PostgreSQL RDBMS...
Mar 02 09:02:30 i-0b8ce19730a071b11 systemd[1]: Started PostgreSQL RDBMS.
On voit sur le second service que la création d’un fichier échoue en raison d’un disque plein :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
root@i-0b8ce19730a071b11:/# systemctl status postgresql@14-main.service
● postgresql@14-main.service - PostgreSQL Cluster 14-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: failed (Result: protocol) since Sat 2024-03-02 09:12:49 UTC; 1min 45s ago
Process: 901 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 14-main start (code=exited, status=1/FAILURE)
Mar 02 09:12:49 i-0b8ce19730a071b11 systemd[1]: Starting PostgreSQL Cluster 14-main...
Mar 02 09:12:49 i-0b8ce19730a071b11 postgresql@14-main[901]: Error: /usr/lib/postgresql/14/bin/pg_ctl /usr/lib/postgresql/14/bin/pg_ctl start -D /opt/pgdata/main -l /var/log/postgresql/postgresql-14-main.log -s
Mar 02 09:12:49 i-0b8ce19730a071b11 postgresql@14-main[901]: 2024-03-02 09:12:49.122 UTC [906] FATAL: could not create lock file "postmaster.pid": No space left on device
Mar 02 09:12:49 i-0b8ce19730a071b11 postgresql@14-main[901]: pg_ctl: could not start server
Mar 02 09:12:49 i-0b8ce19730a071b11 postgresql@14-main[901]: Examine the log output.
Mar 02 09:12:49 i-0b8ce19730a071b11 systemd[1]: postgresql@14-main.service: Can't open PID file /run/postgresql/14-main.pid (yet?) after start: No such file or directory
Mar 02 09:12:49 i-0b8ce19730a071b11 systemd[1]: postgresql@14-main.service: Failed with result 'protocol'.
Mar 02 09:12:49 i-0b8ce19730a071b11 systemd[1]: Failed to start PostgreSQL Cluster 14-main.
J’ai pu récupérer la commande exacte que le service tente de lancer :
1
2
3
/usr/lib/postgresql/14/bin/pg_ctl start -D /opt/pgdata/main \
-l /var/log/postgresql/postgresql-14-main.log -s -o \
-c 'config_file="/etc/postgresql/14/main/postgresql.conf"'
Voici l’aide pour l’explication des options :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
postgres@i-0b8ce19730a071b11:/$ /usr/lib/postgresql/14/bin/pg_ctl --help
pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
Usage:
pg_ctl init[db] [-D DATADIR] [-s] [-o OPTIONS]
pg_ctl start [-D DATADIR] [-l FILENAME] [-W] [-t SECS] [-s]
[-o OPTIONS] [-p PATH] [-c]
pg_ctl stop [-D DATADIR] [-m SHUTDOWN-MODE] [-W] [-t SECS] [-s]
pg_ctl restart [-D DATADIR] [-m SHUTDOWN-MODE] [-W] [-t SECS] [-s]
[-o OPTIONS] [-c]
pg_ctl reload [-D DATADIR] [-s]
pg_ctl status [-D DATADIR]
pg_ctl promote [-D DATADIR] [-W] [-t SECS] [-s]
pg_ctl logrotate [-D DATADIR] [-s]
pg_ctl kill SIGNALNAME PID
Common options:
-D, --pgdata=DATADIR location of the database storage area
-s, --silent only print errors, no informational messages
-t, --timeout=SECS seconds to wait when using -w option
-V, --version output version information, then exit
-w, --wait wait until operation completes (default)
-W, --no-wait do not wait until operation completes
-?, --help show this help, then exit
If the -D option is omitted, the environment variable PGDATA is used.
Options for start or restart:
-c, --core-files allow postgres to produce core files
-l, --log=FILENAME write (or append) server log to FILENAME
-o, --options=OPTIONS command line options to pass to postgres
(PostgreSQL server executable) or initdb
-p PATH-TO-POSTGRES normally not necessary
Options for stop or restart:
-m, --mode=MODE MODE can be "smart", "fast", or "immediate"
Shutdown modes are:
smart quit after all clients have disconnected
fast quit directly, with proper shutdown (default)
immediate quit without complete shutdown; will lead to recovery on restart
Allowed signal names for kill:
ABRT HUP INT KILL QUIT TERM USR1 USR2
Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>
Lançons la commande directement pour voir si on reproduit puis jetons un œil aux disques :
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
postgres@i-0ae5a99cc5d8a1de9:/$ /usr/lib/postgresql/14/bin/pg_ctl start -D /opt/pgdata/main -l /var/log/postgresql/postgresql-14-main.log -s -o '-c config_file="/etc/postgresql/14/main/postgresql.conf"'
pg_ctl: could not start server
Examine the log output.
postgres@i-0ae5a99cc5d8a1de9:/$ tail /var/log/postgresql/postgresql-14-main.log
2024-03-02 09:34:19.689 UTC [904] FATAL: could not create lock file "postmaster.pid": No space left on device
postgres@i-0ae5a99cc5d8a1de9:/$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 224M 0 224M 0% /dev
tmpfs 47M 1.5M 46M 4% /run
/dev/nvme1n1p1 7.7G 1.2G 6.1G 17% /
tmpfs 233M 0 233M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 233M 0 233M 0% /sys/fs/cgroup
/dev/nvme1n1p15 124M 278K 124M 1% /boot/efi
/dev/nvme0n1 8.0G 8.0G 28K 100% /opt/pgdata
tmpfs 47M 0 47M 0% /run/user/108
Où se trouve normalement ce fichier postmaster.pid
? D’après cette documentation :
Tant que le serveur est lancé, son pid est stocké dans le fichier
postmaster.pid
du répertoire de données. C’est utilisé pour empêcher plusieurs instances du serveur d’être exécutées dans le même répertoire de données et peut aussi être utilisé pour arrêter le processus le serveur.
Par conséquent, ça correspond bien au dossier /opt/pgdata
qui est plein.
1
2
3
4
5
6
7
8
9
10
11
12
13
postgres@i-0ae5a99cc5d8a1de9:/$ cd /opt/pgdata
postgres@i-0ae5a99cc5d8a1de9:/opt/pgdata$ du -h --max-depth 1
50M ./main
8.0G .
postgres@i-0ae5a99cc5d8a1de9:/opt/pgdata$ ls -alh
total 8.0G
drwxr-xr-x 3 postgres postgres 82 May 21 2022 .
drwxr-xr-x 3 root root 4.0K May 21 2022 ..
-rw-r--r-- 1 root root 69 May 21 2022 deleteme
-rw-r--r-- 1 root root 7.0G May 21 2022 file1.bk
-rw-r--r-- 1 root root 923M May 21 2022 file2.bk
-rw-r--r-- 1 root root 488K May 21 2022 file3.bk
drwx------ 19 postgres postgres 4.0K May 21 2022 main
Il y a trois fichiers .bk
qui prennent de la place. Il suffit de les supprimer. Après ça le service fonctionne normalement :
1
2
3
root@i-0ae5a99cc5d8a1de9:/# systemctl restart postgresql.service
root@i-0ae5a99cc5d8a1de9:/# sudo -u postgres psql -c "insert into persons(name) values ('jane smith');" -d dt
INSERT 0 1