Re: strange perf regression with data checksums

2025-06-11 Thread Peter Geoghegan
On Tue, Jun 10, 2025 at 3:00 PM Peter Geoghegan wrote: > I have confirmed that this flavor of the problem has existed for a > long time. We'll need to backpatch the fix to all supported branches. > Current plan: Commit 0001 on all supported branches in the next couple > of days. Unless I hear obj

Re: strange perf regression with data checksums

2025-06-10 Thread Peter Geoghegan
On Mon, Jun 9, 2025 at 10:47 AM Peter Geoghegan wrote: > As I said, I think that this is actually an old bug. After all, > _bt_killitems is required to test so->currPos.lsn against the same > page's current LSN if the page pin was dropped since _bt_readpage was > first called -- regardless of any

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
On Mon, Jun 9, 2025 at 10:47 AM Peter Geoghegan wrote: > I've always thought that the whole idiom of testing > BTScanPosIsPinned() in a bunch of places was very brittle. I wonder if > it makes sense to totally replace those calls/tests with similar tests > of the new so->dropPin field. It's defin

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
On Mon, Jun 9, 2025 at 8:48 AM Mihail Nikalayeu wrote: > I was rebasing [0] and noticed dropPin is not initialized in > btbeginscan, which seems to be suspicious for me. > Is it intended? That's pretty normal. We don't have access to the scan descriptor within btbeginscan. This is also why we do

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
Hi Alexander, On Mon, Jun 9, 2025 at 2:00 AM Alexander Lakhin wrote: > Please look at the following script, which falsifies an assertion > TRAP: failed Assert("!BTScanPosIsPinned(so->currPos)"), File: "nbtutils.c", > Line: 3379, PID: 1621028 I can reproduce this. I think that it's an old bug.

Re: strange perf regression with data checksums

2025-06-09 Thread Mihail Nikalayeu
Hello, Peter! > > I also relocated the code that sets so.drop_pin from > _bt_preprocess_keys to btrescan. That seems like the more logical > place for it. > I was rebasing [0] and noticed dropPin is not initialized in btbeginscan, which seems to be suspicious for me. Is it intended? [0]: https:/

Re: strange perf regression with data checksums

2025-06-08 Thread Alexander Lakhin
Hello Peter, 06.06.2025 19:33, Peter Geoghegan wrote: On Wed, Jun 4, 2025 at 1:39 PM Peter Geoghegan wrote: My current plan is to commit this in the next couple of days. Pushed. Please look at the following script, which falsifies an assertion introduced with e6eed40e4: create user u; grant

Re: strange perf regression with data checksums

2025-06-06 Thread Peter Geoghegan
On Wed, Jun 4, 2025 at 1:39 PM Peter Geoghegan wrote: > My current plan is to commit this in the next couple of days. Pushed. It would be great if we could also teach BufferGetLSNAtomic() to just call PageGetLSN() (at least on PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY platforms), but my sense is that

Re: strange perf regression with data checksums

2025-06-04 Thread Peter Geoghegan
On Wed, Jun 4, 2025 at 10:21 AM Peter Geoghegan wrote: > On Wed, Jun 4, 2025 at 7:33 AM Tomas Vondra wrote: > > So better to get this in now, otherwise we may have to wait until PG19, > > because of ABI (the patch adds a field into BTScanPosData, but maybe > > it'd be possible to add it into padd

Re: strange perf regression with data checksums

2025-06-04 Thread Peter Geoghegan
On Wed, Jun 4, 2025 at 7:33 AM Tomas Vondra wrote: > So better to get this in now, otherwise we may have to wait until PG19, > because of ABI (the patch adds a field into BTScanPosData, but maybe > it'd be possible to add it into padding, not sure). I agree. I can get this into shape for commit t

Re: strange perf regression with data checksums

2025-06-04 Thread Tomas Vondra
Hi, We had a RMT meeting yesterday, and we briefly discussed whether the v2 patch should go into PG18. And we concluded we probably should try to do that. On the one hand, it's not a PG18-specific bug, in the sense that older releases are affected the same way. But it requires data checksums to b

Re: strange perf regression with data checksums

2025-05-21 Thread Peter Geoghegan
On Wed, May 21, 2025 at 12:29 PM Andres Freund wrote: > > The relevant nbtree code seems more elegant if we avoid calling > > BufferGetLSNAtomic() unless and until its return value might actually > > be useful. It's quite a lot easier to understand the true purpose of > > so->currPos.lsn with this

Re: strange perf regression with data checksums

2025-05-21 Thread Andres Freund
Hi, On 2025-05-19 15:45:01 -0400, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 3:37 PM Andres Freund wrote: > > I think we can do better - something like > > > > #ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY > > lsn = PageGetLSN(page); > > #else > > buf_state = LockBufHdr(bufHdr); > >

Re: strange perf regression with data checksums

2025-05-20 Thread Peter Geoghegan
On Tue, May 20, 2025 at 8:43 AM Peter Geoghegan wrote: > I imagine bitmap index scans will be similar to plain index scans. To be clear, I meant that the magnitude of the problem will be similar. I do not mean that they aren't fixed by my v2. They should be fully fixed by v2. -- Peter Geoghegan

Re: strange perf regression with data checksums

2025-05-20 Thread Peter Geoghegan
On Mon, May 19, 2025 at 8:21 PM Tomas Vondra wrote: > With v2 the regression disappears, both for index-only scans and regular > index scans. I haven't tried anything with bitmap scans - I hit the > regression mostly by accident, on a workload that does IOS only. I may > try constructing something

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote: >> Same effect as v1 for IOS, with regular index scans I see this: >> >> 64 clients: 0.7M tps >> 96 clients: 1.5M tps >> >> So very similar improvement as for IOS. > > Just to be clear, you're saying my

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote: > Same effect as v1 for IOS, with regular index scans I see this: > > 64 clients: 0.7M tps > 96 clients: 1.5M tps > > So very similar improvement as for IOS. Just to be clear, you're saying my v2 appears to fix the problem fully, for both plain

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 20:44, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote: >> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: >>> The regular index scan however still have this issue, although it's not >>> as visible as for IOS. >> >> We can do somewhat better with plai

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Mon, May 19, 2025 at 3:37 PM Andres Freund wrote: > I think we can do better - something like > > #ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY > lsn = PageGetLSN(page); > #else > buf_state = LockBufHdr(bufHdr); > lsn = PageGetLSN(page); > UnlockBufHdr(bufHdr, buf_state); > #endif

Re: strange perf regression with data checksums

2025-05-19 Thread Andres Freund
Hi, On 2025-05-19 18:13:02 +0200, Tomas Vondra wrote: > I believe enabling data checksums simply makes it more severe, because > the BufferGetLSNAtomic() has to obtain header lock, which uses the same > "state" field, with exactly the same retry logic. It can probably happen > even without it, but

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote: > On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: > > The regular index scan however still have this issue, although it's not > > as visible as for IOS. > > We can do somewhat better with plain index scans than my initial v1 > prototype,

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: > For index-only scans, yes. Great. > The regular index scan however still have this issue, although it's not > as visible as for IOS. We can do somewhat better with plain index scans than my initial v1 prototype, without any major difficultie

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 19:20, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 12:42 PM Peter Geoghegan wrote: >> We don't actually need to call BufferGetLSNAtomic() from _bt_readpage >> during index-only scans (nor during bitmap index scans). We can just >> not call BufferGetLSNAtomic() at all (except dur

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Mon, May 19, 2025 at 12:42 PM Peter Geoghegan wrote: > We don't actually need to call BufferGetLSNAtomic() from _bt_readpage > during index-only scans (nor during bitmap index scans). We can just > not call BufferGetLSNAtomic() at all (except during plain index > scans), with no possible downsi

Re: strange perf regression with data checksums

2025-05-19 Thread Peter Geoghegan
On Fri, May 9, 2025 at 9:06 AM Tomas Vondra wrote: > Good question. I haven't checked that explicitly, but it's a tiny data > set (15MB) and I observed this even on long benchmarks with tens of > millions of queries. So the hint bits should have been set. > > Also, I should have mentioned the quer

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
Hi, I looked at this again, and I think the reason is mostly obvious. Both why it's trashing, and why it happens with checksums=on ... The reason why it happens is that PinBuffer does this: old_buf_state = pg_atomic_read_u32(&buf->state); for (;;) { if (old_buf_state & BM_LOC

Re: strange perf regression with data checksums

2025-05-09 Thread Aleksander Alekseev
Hi Tomas, > While running some benchmarks comparing 17 and 18, I ran into a simple > workload where 18 throughput drops by ~80%. After pulling my hair for a > couple hours I realized the change that triggered this is 04bec894a04c, > which set checksums on by default. Which is very bizarre, because

Re: strange perf regression with data checksums

2025-05-09 Thread Aleksander Alekseev
Hi, > > I'm not claiming that hint bits are necessarily the reason for the > > observed behavior but when something is off with presumably read-only > > queries this is the first reason that comes to mind. At least we > > should make sure hint bits are excluded from the equation. If memory > > ser

Re: strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
On 5/9/25 14:53, Aleksander Alekseev wrote: > Hi Tomas, > >> While running some benchmarks comparing 17 and 18, I ran into a simple >> workload where 18 throughput drops by ~80%. After pulling my hair for a >> couple hours I realized the change that triggered this is 04bec894a04c, >> which set che

strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
Hi, While running some benchmarks comparing 17 and 18, I ran into a simple workload where 18 throughput drops by ~80%. After pulling my hair for a couple hours I realized the change that triggered this is 04bec894a04c, which set checksums on by default. Which is very bizarre, because the workload