Hi, I previously complained about analyze keeping a snapshot while running in: http://archives.postgresql.org/message-id/20141018174909.GA5790%40alap3.anarazel.de
since then I've been bitten by that, and I've seen other people being bitten by it. on a scale 400 database (so analyze actually takes a while), using unlogged tables (for fewer unrelated effects) I see this: latency average: 2.665 ms latency stddev: 2.628 ms tps = 18002.712356 (including connections establishing) tps = 18004.177513 (excluding connections establishing) and there's many slumps like: progress: 104.0 s, 8161.9 tps, lat 5.878 ms stddev 3.431 progress: 105.0 s, 7936.3 tps, lat 5.990 ms stddev 2.978 progress: 106.0 s, 8003.9 tps, lat 6.047 ms stddev 5.269 progress: 107.0 s, 18609.6 tps, lat 2.582 ms stddev 2.858 progress: 108.0 s, 19227.0 tps, lat 2.496 ms stddev 2.715 these slumps coincide with moments where SELECT max(age(backend_xmin)) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL; is high: 198551 hackishly pop'ing the snapshot during the bulk of the work in acquire_sample_rows results in: latency average: 2.325 ms latency stddev: 2.522 ms tps = 20634.532401 (including connections establishing) tps = 20636.481548 (excluding connections establishing) with the slowest 1 second interval being 19936.8 tps. So that's a pretty clear improvement. It's trivial to make this much more extreme by using bigger scales and/or different vacuum settings. I just don't want to do so on my poor laptop. It's not surprising that a long running analyze with a snapshot held causes problems for an update heavy OLTP workload. It's trivially reproducible without involving analyze by simply holding up a transaction with a snapshot. acquire_sample_rows just uses HeapTupleSatisfiesVacuum() to determine visibility, so it itself doesn't actually need (or use) a snapshot while acquiring the sample. But I think externally toasted tuples might prevent optimizing this, we'll potentially detoast them when computing the stats... Does anybody have a good idea how to handle toasted tuples? A simple approach would be to acquire a snapshot and re-check visibility for after each row with external datums after acquiring the sample. Our sample is already skewed over the runtime of acquire_sample_rows due to our usage of HTSV so I'm not particularly concerned about that part - but it'd also reduce the size of the sample which is a bit worrysome. Another angle would be trying to reduce the effects of longrunning transaction. Right now holding a snapshot open for 100 seconds results in profiles like this: After a 100 seconds of holding a snapshot a profile looks like: + 22.13% postgres postgres [.] heap_hot_search_buffer + 11.58% postgres postgres [.] hash_search_with_hash_value + 11.58% postgres postgres [.] XidInMVCCSnapshot + 4.58% postgres postgres [.] heap_page_prune_opt + 4.43% postgres postgres [.] PinBuffer + 3.95% postgres postgres [.] LWLockAcquire + 2.49% postgres postgres [.] heap_hot_search + 1.72% postgres postgres [.] HeapTupleSatisfiesMVCC + 1.61% postgres postgres [.] tbm_iterate + 0.99% postgres postgres [.] pg_qsort + 0.90% postgres postgres [.] LWLockRelease which is pretty extreme. It's not such a seldom thing to hold a snapshot (e.g. pg_dump...) open for a while. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers