I've tried to rule out a possible data corruption issue, server is using ECC RAM modules, checksums are enabled on database. I've tried rebuilding indexes, running VACUUM FULL, switching to replica etc.
Anyway the issues was probably caused by a faulty NVMe adapter: kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) kernel: pcieport 0000:00:01.1: device [8086:1905] error status/mask=00001000/00002000 kernel: pcieport 0000:00:01.1: [12] Timeout kernel: pcieport 0000:00:01.1: Error of this Agent is reported first kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID) kernel: nvme 0000:02:00.0: device [144d:a808] error status/mask=000000c1/0000e000 kernel: nvme 0000:02:00.0: [ 0] RxErr (First) somehow the database on disk got corrupted and it can't be fixed without losing some data. You can close the ticket, sorry for any inconvenience caused. On Tue, 7 Sept 2021 at 17:57, Tomas Barton <[email protected]> wrote: > Here's an attempt to make the bug reproducible. Unfortunately I'm not able > to reproduce the issue with generated data. > > dropdb testdb || true > createdb -E UTF8 testdb > cat <<EOF > stress.sql > CREATE TABLE "public".downloaded_images ( > itemid text NOT NULL, > property text NOT NULL, > image_number integer DEFAULT 0 NOT NULL, > filename text, > download_time timestamp without time zone DEFAULT CURRENT_TIMESTAMP, > failures_count integer DEFAULT 0 > ); > > > INSERT INTO "public".downloaded_images(itemid, property, filename, > download_time) > SELECT md5(RANDOM()::TEXT), 'categoryPagePhotoUrl', md5(RANDOM()::TEXT), > NOW() > FROM generate_series(1, 100000); > EOF > > cat <<EOF > evil.sql > BEGIN; > > CREATE TABLE IF NOT EXISTS "vgg16_fc1" > (itemId TEXT, > embedding_number INT DEFAULT 0, > embedding JSONB, > weight NUMERIC DEFAULT 1, > last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP, > additional_data JSON, > PRIMARY KEY(itemId, embedding_number) > ); > > CREATE INDEX IF NOT EXISTS "last_update_vgg16_fc1" ON > "vgg16_fc1" > USING btree ("last_update"); > > > DECLARE "test-cursor-vgg16_fc1" CURSOR WITH HOLD FOR > SELECT di.itemId, image_number, filename FROM (SELECT * > FROM "public".downloaded_images > WHERE property='categoryPagePhotoUrl' AND filename IS NOT NULL) > di > LEFT JOIN (SELECT itemId, MIN(last_update) as last_update FROM > "vgg16_fc1" GROUP BY itemId) computed ON di.itemId=computed.itemId > WHERE COALESCE(last_update, '1970-01-01') < download_time; > > > FETCH 10000 IN "test-cursor-vgg16_fc1"; > > > COMMIT; > EOF > psql -d testdb -f stress.sql > psql -d testdb -f evil.sql > > Anytime I run the evil.sql, it crashes the server. > > BEGIN > CREATE TABLE > CREATE INDEX > DECLARE CURSOR > psql:evil.sql:28: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql:evil.sql:28: fatal: connection to server was lost > > > On Tue, 7 Sept 2021 at 14:09, Christoph Berg <[email protected]> wrote: > >> Re: Tomas Barton >> > a slightly sophisticated SELECT query with a CURSOR can lead to >> > postgresql server segmentation fault. >> > >> > LOG: server process (PID 10722) was terminated by signal 11: >> Segmentation >> > fault >> > DETAIL: Failed process was running: COMMIT >> >> > I'll try to make an reproducable code, let me known if you need more >> > information. >> > >> > The query might be a bit nasty, but it shouldn't crash whole server. >> >> Can you share the query and the schema? >> >> Christoph >> >

