On Wed, 13/07/2022 at 10:09 -0700, Nathan Bossart wrote:
> Hi hackers,
>
> A few years ago, there was a proposal to create hash tables for long
> [sub]xip arrays in snapshots [0], but the thread seems to have fizzled out.
> I was curious whether this idea still showed measurable benefits, so I
> revamped the patch and ran the same test as before [1]. Here are the
> results for 60-second runs on an r5d.24xlarge with the data directory on
> the local NVMe storage:
>
>     writers   HEAD   patch   diff
>     -----------------------------
>          16    659     664    +1%
>          32    645     663    +3%
>          64    659     692    +5%
>         128    641     716   +12%
>         256    619     610    -1%
>         512    530     702   +32%
>         768    469     582   +24%
>        1000    367     577   +57%
>
> As before, the hash table approach seems to provide a decent benefit at
> higher client counts, so I felt it was worth reviving the idea.
>
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table. Also, I've bumped up the threshold for creating hash tables to
> 128 based on the results of my testing. The attached patch waits until a
> lookup of [sub]xip before generating the hash table, so we only need to
> allocate enough space for the current elements in the [sub]xip array, and
> we avoid allocating extra memory for workloads that do not need the hash
> tables. I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.
>
> Thoughts?
>
> [0] https://postgr.es/m/35960b8af917e9268881cd8df3f88320%40postgrespro.ru
> [1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com
>
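To make the lazy-construction idea concrete, here is a minimal plain-C sketch, assuming a hypothetical XidSet wrapper around a [sub]xip array. It uses a small open-addressing table with linear probing as a stand-in for simplehash, so XidSet, xid_set_init, xid_in_set, build_table and xid_set_free are illustrative names, not PostgreSQL's API; only the threshold of 128 and the "build on first lookup, sized for the current elements" behavior come from the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PostgreSQL's types; not the real snapshot code. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Threshold taken from the mail: hash only arrays with >= 128 entries. */
#define XIP_HASH_THRESHOLD 128

typedef struct XidSet
{
    const TransactionId *xip;   /* the snapshot's [sub]xip array */
    uint32_t    xcnt;           /* number of valid entries in xip */
    TransactionId *table;       /* open-addressing table, built lazily */
    uint32_t    mask;           /* table size - 1 (size is a power of two) */
} XidSet;

static void
xid_set_init(XidSet *set, const TransactionId *xip, uint32_t xcnt)
{
    assert(xip != NULL || xcnt == 0);
    set->xip = xip;
    set->xcnt = xcnt;
    set->table = NULL;          /* nothing allocated until a lookup needs it */
    set->mask = 0;
}

/* Multiplicative hash with an odd constant (a stand-in for simplehash's). */
static uint32_t
hash_xid(TransactionId xid)
{
    return xid * 2654435761u;
}

/*
 * Allocate a table sized for the current elements only (load factor <= 0.5,
 * so probing always reaches an empty slot). Assumes xip never contains
 * InvalidTransactionId, which serves as the empty-slot marker.
 */
static bool
build_table(XidSet *set)
{
    uint32_t    size = 1;

    while (size < set->xcnt * 2)
        size <<= 1;
    set->table = calloc(size, sizeof(TransactionId));
    if (set->table == NULL)
        return false;
    set->mask = size - 1;
    for (uint32_t i = 0; i < set->xcnt; i++)
    {
        uint32_t    pos = hash_xid(set->xip[i]) & set->mask;

        while (set->table[pos] != InvalidTransactionId &&
               set->table[pos] != set->xip[i])
            pos = (pos + 1) & set->mask;    /* linear probing */
        set->table[pos] = set->xip[i];
    }
    return true;
}

static bool
xid_in_set(XidSet *set, TransactionId xid)
{
    /* Large array: build the table on the first lookup, then probe it. */
    if (set->xcnt >= XIP_HASH_THRESHOLD &&
        (set->table != NULL || build_table(set)))
    {
        uint32_t    pos = hash_xid(xid) & set->mask;

        while (set->table[pos] != InvalidTransactionId)
        {
            if (set->table[pos] == xid)
                return true;
            pos = (pos + 1) & set->mask;
        }
        return false;
    }
    /* Small array (or allocation failure): plain linear search. */
    for (uint32_t i = 0; i < set->xcnt; i++)
        if (set->xip[i] == xid)
            return true;
    return false;
}

static void
xid_set_free(XidSet *set)
{
    free(set->table);
    set->table = NULL;
}
```

In this sketch a snapshot that is never searched, or whose array stays below 128 entries, pays no extra allocation; the cost of one allocation plus a full insert pass is paid only when a large array is actually looked up, which is the memory and CPU trade-off discussed in this thread.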
I'm glad my idea has been reborn. Well, maybe simplehash is not a bad idea, although it certainly consumes more memory and CPU instructions. I'll try to review it.

regards,
Yura Sokolov