Hi,

The numbers presented in this thread seem very promising - clearly there's significant potential for improvements. I'll run similar benchmarks too, to get a better understanding of this.

Can you share some basic details about the hardware you used? Particularly the CPU model - I guess this might explain some of the results, e.g. if the CPU caches are ~1MB, that'd explain why setting tup_queue_size to 1MB improves things, but 4MB is a bit slower. Similarly, the number of cores might explain why 4 workers perform better than 8 or 16 workers.

Now, this is mostly expected, but the consequence is that maybe things like queue size should be tunable/dynamic, not hard-coded?

As for the patches, I think the proposed changes are sensible, but I wonder what queries might get slower. For example, with the batching (updating the counter only once every 4kB), that pretty much transfers data in larger chunks with higher latency. So what if the query needs only a small chunk, like a LIMIT query? Similarly, this might mean the upper parts of the plan have to wait for the data for longer, and thus can't start some async operation (like sending the data to an FDW, or something like that). I do admit those are theoretical queries, I haven't tried creating such a query.

FWIW I've tried applying both patches at the same time, but there's a conflict in shm_mq_sendv - not a complex one, but I'm not sure what's the correct solution. Can you share a "combined" patch?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company