On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <and...@anarazel.de> wrote:
> > >
> > > A separate issue is that TransactionIdDidAbort() can end up being very
> > > slow if a lot of transactions are in progress concurrently. As soon as
> > > the clog buffers are extended, all time is spent copying pages from the
> > > kernel pagecache. I'd not at all be surprised if this change causes a
> > > substantial slowdown in workloads with lots of small transactions,
> > > where most transactions commit.
> > >
> >
> > Indeed. So it should check the transaction status less frequently. It
> > doesn't benefit much even if we can skip collecting decoded changes of
> > small transactions. Another idea is that we check the status of only
> > large transactions. That is, when the size of decoded changes of an
> > aborted transaction exceeds logical_decoding_work_mem, we mark it as
> > aborted, free its changes decoded so far, and skip further collection.
> >
>
> Your idea might work for large transactions but I have not come across
> reports where this is reported as a problem. Do you see any such
> reports, and can we see how much is the benefit with large
> transactions? Because we do have the handling of concurrent aborts
> during sys table scans and that might help sometimes for large
> transactions.
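To sketch the idea of checking the status of only large transactions, below
is a rough, untested illustration of what such a check could look like inside
reorderbuffer.c, called from the path that queues changes. The helper name and
the RBTXN_IS_ABORTED flag are placeholders, not existing code:

#include "postgres.h"
#include "access/transam.h"
#include "replication/reorderbuffer.h"

/*
 * Hypothetical helper (untested sketch): once a transaction's decoded
 * changes exceed logical_decoding_work_mem, check whether it has already
 * aborted and, if so, free the changes collected so far so that further
 * changes for it can be skipped instead of accumulated.
 */
static void
ReorderBufferCheckLargeAbortedTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
	/* Nothing to do if we have already detected the abort. */
	if (txn->txn_flags & RBTXN_IS_ABORTED)	/* hypothetical new flag */
		return;

	/* Only consider transactions that already exceeded the memory limit. */
	if (txn->size < logical_decoding_work_mem * 1024L)
		return;

	/*
	 * TransactionIdDidAbort() can be expensive when many transactions are
	 * in progress, which is why it is reached only for large transactions.
	 */
	if (TransactionIdDidAbort(txn->xid))
	{
		txn->txn_flags |= RBTXN_IS_ABORTED;
		ReorderBufferTruncateTXN(rb, txn, false);	/* free decoded changes */
	}
}

That way the abort check would be paid only for transactions that have
already crossed logical_decoding_work_mem, rather than for every change.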
As for such reports, I've heard of a case where a user had 29 million deletes
in a single transaction, each one wrapped in a savepoint, and then rolled it
back, which led to 11TB of spill files. If decoding such a large transaction
fails for some reason (e.g. a disk full error), it would try decoding the
same transaction again and again.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com