On 12/8/21 16:51, Ronan Dunklau wrote:
> On Thursday, September 9, 2021 15:37:59 CET, Tomas Vondra wrote:
>> And now comes the funny part - if I run it in the same backend as the
>> "full" benchmark, I get roughly the same results:
>>
>>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
>> ------------+------------+---------------+----------+---------
>>       32768 |        512 |     806256640 |    37159 |   76669
>>
>> but if I reconnect and run it in the new backend, I get this:
>>
>>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
>> ------------+------------+---------------+----------+---------
>>       32768 |        512 |     806158336 |   233909 |  100785
>> (1 row)
>>
>> It does not matter if I wait a bit before running the query, if I run it
>> repeatedly, etc. The machine is not doing anything else, the CPU is set
>> to use the "performance" governor, etc.
>
> I've reproduced the behaviour you mention.
> I also noticed asm_exc_page_fault showing up in the perf report in that
> case.
>
> Running an strace on it shows that in one case we have a lot of brk calls,
> while when we run in the same process as the previous tests, we don't.
>
> My suspicion is that the previous workload makes glibc malloc change its
> trim_threshold and possibly other dynamic options, which leads to
> constantly moving the brk pointer in one case and not the other.
>
> Running your fifo test with absurd malloc options shows that this might
> indeed be the case (I needed to change several, because changing one
> disables the dynamic adjustment for every single one of them, and malloc
> would fall back to using mmap and freeing it on each iteration):
>
> mallopt(M_TOP_PAD, 1024 * 1024 * 1024);
> mallopt(M_TRIM_THRESHOLD, 256 * 1024 * 1024);
> mallopt(M_MMAP_THRESHOLD, 4 * 1024 * 1024 * sizeof(long));
>
> I get the following results for your self-contained test. I ran the query
> twice in each case, to see the difference between the first run and the
> subsequent ones.
>
> With default malloc options:
>
>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> ------------+------------+---------------+----------+---------
>       32768 |        512 |     795836416 |   300156 |  207557
>
>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> ------------+------------+---------------+----------+---------
>       32768 |        512 |     795836416 |   211942 |   77207
>
> With the oversized values above:
>
>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> ------------+------------+---------------+----------+---------
>       32768 |        512 |     795836416 |   219000 |   36223
>
>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> ------------+------------+---------------+----------+---------
>       32768 |        512 |     795836416 |    75761 |   78082
> (1 row)
>
> I can't tell how representative your benchmark extension would be of
> real-life allocation / free patterns, but there is probably something we
> can improve here.
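For reference, a minimal standalone sketch of the tuning Ronan describes
above. The mallopt() values are the ones from his mail; the surrounding
FIFO-style allocation loop is only an illustration, not the actual fifo
benchmark extension that produced the numbers quoted in this thread.

    /*
     * malloc_tune.c - compare glibc brk/mmap traffic with and without
     * oversized malloc thresholds.
     *
     * Build: gcc -O2 malloc_tune.c -o malloc_tune
     * Run:   strace -c -e brk,mmap,munmap ./malloc_tune [tune]
     */
    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NCHUNKS     4096
    #define CHUNK_SIZE  512
    #define NROUNDS     1000

    int
    main(int argc, char **argv)
    {
        char   *chunks[NCHUNKS];
        int     i, r;

        if (argc > 1 && strcmp(argv[1], "tune") == 0)
        {
            /*
             * Values from Ronan's mail. Setting one of these disables the
             * dynamic adjustment of all of them, hence all three are set.
             */
            mallopt(M_TOP_PAD, 1024 * 1024 * 1024);
            mallopt(M_TRIM_THRESHOLD, 256 * 1024 * 1024);
            mallopt(M_MMAP_THRESHOLD, 4 * 1024 * 1024 * sizeof(long));
        }

        /* FIFO-style churn: allocate a batch, then free it oldest-first. */
        for (r = 0; r < NROUNDS; r++)
        {
            for (i = 0; i < NCHUNKS; i++)
                chunks[i] = malloc(CHUNK_SIZE);
            for (i = 0; i < NCHUNKS; i++)
                free(chunks[i]);
        }

        printf("done\n");
        return 0;
    }

Running it with and without the "tune" argument makes the difference in
brk traffic easy to see in the strace summary.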
Thanks for looking at this. I think those allocation / free patterns are
fairly extreme, and there probably are no workloads doing exactly this. The
idea is that actual workloads are likely some combination of these extreme
cases.

> I'll try to see if I can understand more precisely what is happening.

Thanks, that'd be helpful. Maybe we can learn something about tuning malloc
parameters to get significantly better performance.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company