On 09. 04. 2018 15:42, Tomas Vondra wrote:
> On 04/09/2018 12:29 AM, Bruce Momjian wrote:
>> An crazy idea would be to have a daemon that checks the logs and
>> stops Postgres when it seems something wrong.
>>
> That doesn't seem like a very practical way. It's better than nothing,
> of course, but I wonder how would that work with containers (where I
> think you may not have access to the kernel log at all). Also, I'm
> pretty sure the messages do change based on kernel version (and possibly
> filesystem) so parsing it reliably seems rather difficult. And we
> probably don't want to PANIC after I/O error on an unrelated device, so
> we'd need to understand which devices are related to PostgreSQL.
>
> regards
>

For a slightly less (or more) crazy idea: I'd imagine it shouldn't be
that hard to write a Linux kernel module that uses a kprobe/kretprobe to
capture the file passed to fsync, or even the byte range within the
file, along with the corresponding return value. Kprobes have been part
of the Linux kernel for a really long time, and at first glance it seems
this could be backported to 2.6 too.

Then you could have stable log messages, or implement some kind of
"fsync error log notification" via whatever is the sanest way to get
this out of the kernel.

If the kernel is new enough and has eBPF support (seems like >=4.4),
using bcc-tools[1] should let you write a quick script that gets exactly
that info via perf events[2].

Obviously, that's a stopgap solution ...

Kind regards,
Gasper

[1] https://github.com/iovisor/bcc
[2] https://blog.yadutaf.fr/2016/03/30/turn-any-syscall-into-event-introducing-ebpf-kernel-probes/
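
To make the bcc idea above a bit more concrete, here is a rough, untested
sketch of such a script. It attaches a kretprobe to the fsync syscall and
reports only the calls that returned an error, through a perf buffer. The
names trace_fsync_ret, event_t, format_event and main are made up for this
illustration, and the message format is just an assumption about what a
"stable" log line could look like. It only sees the return value; mapping
the failure back to a file would need an additional entry kprobe that
records the fd, which is left out here.

```python
import errno

# eBPF program (C), compiled by bcc at load time.
# All names in it are hypothetical and chosen for this sketch.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct event_t {
    u32 pid;
    int ret;
    char comm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);

int trace_fsync_ret(struct pt_regs *ctx)
{
    int ret = PT_REGS_RC(ctx);
    if (ret >= 0)
        return 0;               /* fsync succeeded, nothing to report */

    struct event_t ev = {};
    ev.pid = bpf_get_current_pid_tgid() >> 32;
    ev.ret = ret;
    bpf_get_current_comm(&ev.comm, sizeof(ev.comm));
    events.perf_submit(ctx, &ev, sizeof(ev));
    return 0;
}
"""


def format_event(pid, comm, ret):
    """Render one failed fsync as a fixed-format line (assumed format)."""
    err = -ret
    name = errno.errorcode.get(err, "UNKNOWN")
    return "fsync failed: pid=%d comm=%s errno=%d (%s)" % (pid, comm, err, name)


def main():
    # Imported here so the formatting helper works without bcc installed.
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)
    # get_syscall_fnname resolves the kernel-specific symbol for the syscall.
    b.attach_kretprobe(event=b.get_syscall_fnname("fsync"),
                       fn_name="trace_fsync_ret")

    def on_event(cpu, data, size):
        ev = b["events"].event(data)
        print(format_event(ev.pid, ev.comm.decode("utf-8", "replace"), ev.ret))

    b["events"].open_perf_buffer(on_event)
    while True:
        try:
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            break
```

Calling main() needs root and an installed bcc. A real tool would also
have to cover fdatasync, sync_file_range and friends, and filter by
process or device so that errors on unrelated devices are ignored.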