On Thu, Apr 4, 2024 at 3:02 AM David Rowley <dgrowle...@gmail.com> wrote: > > On Thu, 4 Apr 2024 at 16:45, David Rowley <dgrowle...@gmail.com> wrote: > > I've pushed the v9-0001 with that rename done. > > I've now just pushed the 0002 patch with some revisions:
Thanks! > 1. The function declarations you added for heapgettup_advance_block() > and heapgettup_initial_block() didn't match the properties of their > definitions. You'd declared both of these static inline but neither > of these were. Ah, they needed to be defined static but I intentionally left off the inline in the definition and only put it in the forward declaration because I thought that was correct. Anyway, I'm fine with how you did it in the end. > 2. I felt inclined to rename heapfetchbuf() to heapfetchnextbuf() as > that's effectively what it does with v8-0002, however, that's just too > many words all shoved together, so I renamed it to > heap_fetch_next_buffer(). Sounds good. > 3. I changed heapgettup_initial_block() to pg_noinline as it both > makes more sense to have this out of line and it also fixed a small > performance regression. Ah, I guess it is in the unlikely path. I often forget that noinline exists. It's interesting that that made a noticeable difference since it is a pretty short function. Thanks for taking such a close look! > Looks like I also failed to grep for all the remaining instances of > "heapgetpage" in 44086b097. Those are now fixed by 3a4a3537a. > > I also rebased the 0003 patch which I've attached as a raw patch. Thanks! > FWIW, using Heikki's test in [1] with a pg_prewarm each time after > restarting the instance. No parallel aggregate. > > drowley@amd3990x:~$ cat bench.sql > select count(*) from giga; > > drowley@amd3990x:~$ pgbench -n -f bench.sql -T 120 postgres | grep latency > > 44086b097~1 > latency average = 34.323 ms > latency average = 34.332 ms > > 44086b097 > latency average = 34.811 ms > latency average = 34.862 ms > > 3a4a3537a > latency average = 34.497 ms > latency average = 34.538 ms > > 3a4a3537a + read_stream_for_seqscans.patch > latency average = 40.923 ms > latency average = 41.415 ms > > i.e. no meaningful change from the refactor, but a regression from a > cached workload that changes the page often without doing much work in > between with the read stread patch. Cool. Thanks for testing this out. Sounds like Thomas did some analysis of how to resolve this for the streaming read user, and I will do some too. - Melanie