https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #31 from Dmitriy Ovdienko <dmitriy.ovdienko at gmail dot com> ---
Created attachment 49202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49202&action=edit
Modified solution with 2-level memory pools

I believe I'm done with this task. Attached is a solution based on our conversation:

1. I allocate memory in 4k blocks. Since PMR adds 11, the actual allocated block size will be 4096.
2. I use `std::pmr::unsynchronized_pool_resource`, as it does not release memory in its `deallocate()` member function.
3. I parallelize the `iterations` loop rather than the `depth` loop.
4. I do not use std::condition_variable + std::mutex, as it is in fact veeeeery slow. std::atomic_int is much faster for this purpose.

The results are as follows:

| CPU counter          |       Rust |    C++ now | C++ malloc |   C++ pool |
|----------------------|-----------:|-----------:|-----------:|-----------:|
| cache-references, 1k |     29,949 |     60,495 |     56,695 |     54,608 |
| cache-misses, 1k     |     12,378 |     25,741 |     17,430 |     17,963 |
| cycles, 1k           | 24,700,616 | 23,714,062 | 18,304,126 | 22,599,042 |
| instructions, 1k     | 31,819,068 | 30,455,923 | 20,827,116 | 29,420,285 |
| branches, 1k         |  4,835,891 |  4,832,915 |  4,083,330 |  4,607,526 |
| branch-misses, 1k    |     10,768 |     11,643 |      8,868 |      9,379 |
| faults, 1k           |         82 |        239 |         92 |         82 |
| migrations           |          2 |         21 |         13 |         12 |
| Time-elapsed         |     2.0779 |     2.0061 |     1.6051 |     1.9445 |
| Seconds-user         |     7.1599 |     6.5808 |     5.2582 |     6.5315 |
| Seconds-sys          |     0.1803 |     0.4595 |     0.1872 |     0.1838 |
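
For readers without the attachment at hand, here is a minimal sketch of points 2 to 4. It is not the attached code. `Node`, `CountingResource`, `build`, `count_and_free`, `measure_pool` and `run_iterations` are hypothetical names made up for illustration, the tree workload is an assumption, and the 4k block sizing from point 1 is not reproduced. It assumes C++17 with `<memory_resource>` available. Note that the standard leaves it unspecified when a pool resource's `deallocate()` reaches the upstream resource, so `bytes_held_after_free` is reported but may differ between implementations.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <thread>
#include <vector>

// Hypothetical workload: a plain binary tree node.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Upstream wrapper that counts calls so the pooling behaviour can be observed.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;
    std::size_t outstanding_bytes = 0;
private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        outstanding_bytes += bytes;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        outstanding_bytes -= bytes;
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Builds a complete tree of the given depth out of the given resource.
Node* build(std::pmr::memory_resource& mr, int depth) {
    assert(depth >= 0);
    Node* n = new (mr.allocate(sizeof(Node), alignof(Node))) Node{};
    if (depth > 0) {
        n->left = build(mr, depth - 1);
        n->right = build(mr, depth - 1);
    }
    return n;
}

// Returns the node count and hands every node back to the resource.
long count_and_free(std::pmr::memory_resource& mr, Node* n) {
    if (!n) return 0;
    long c = 1 + count_and_free(mr, n->left) + count_and_free(mr, n->right);
    mr.deallocate(n, sizeof(Node), alignof(Node));
    return c;
}

struct PoolStats {
    long nodes;
    std::size_t upstream_allocations;
    std::size_t bytes_held_after_free;  // implementation-dependent
    std::size_t bytes_after_release;
};

// Single-threaded measurement of how often the pool goes upstream.
PoolStats measure_pool(int depth) {
    CountingResource upstream;
    PoolStats s{};
    {
        std::pmr::unsynchronized_pool_resource pool(&upstream);
        s.nodes = count_and_free(pool, build(pool, depth));
        s.bytes_held_after_free = upstream.outstanding_bytes;
    }  // the pool's destructor returns everything to upstream
    s.upstream_allocations = upstream.allocations;
    s.bytes_after_release = upstream.outstanding_bytes;
    return s;
}

// Workers claim iteration indices from a shared atomic counter;
// no mutex or condition variable is involved.
long run_iterations(int iterations, int depth, unsigned workers) {
    std::atomic<int> next{0};
    std::atomic<long> total{0};
    auto work = [&] {
        // One pool per thread: unsynchronized_pool_resource is not thread-safe.
        std::pmr::unsynchronized_pool_resource pool;
        long local = 0;
        for (int i = next.fetch_add(1); i < iterations; i = next.fetch_add(1))
            local += count_and_free(pool, build(pool, depth));
        total += local;
    };
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) threads.emplace_back(work);
    for (auto& t : threads) t.join();
    return total.load();
}
```

Calling `measure_pool(9)` builds and frees 1023 nodes; comparing `upstream_allocations` against that node count gives a rough idea of how much traffic the pool absorbs. `run_iterations(8, 4, 3)` spreads 8 iterations over 3 threads, each with its own pool, so no locking is needed on the allocation path.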