https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #31 from Dmitriy Ovdienko <dmitriy.ovdienko at gmail dot com> ---
Created attachment 49202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49202&action=edit
Modified solution with 2-level memory pools

I believe I'm done with this task. Attached is a solution based on our
conversation:

1. I allocate memory in 4k blocks. Since PMR adds 11 bytes, the actual
allocated block size will be 4096

2. I use `std::pmr::unsynchronized_pool_resource` as it does not release memory
in its `deallocate()` member function

3. I parallelize the `iterations` loop rather than the `depth` loop

4. I do not use std::condition_variable + std::mutex as in fact it is veeeeery
slow. std::atomic_int is much faster for this purpose.
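
The four points above can be sketched roughly as follows. This is not the code from the attachment, only a minimal illustration under stated assumptions: the `Node` type, the helper names (`make_tree`, `check_and_free`, `run`) and the choice of a `std::pmr::monotonic_buffer_resource` as the upstream level are hypothetical, and the `- 11` simply mirrors the overhead mentioned in point 1, which is an implementation detail that may differ between standard libraries.

```cpp
#include <atomic>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <thread>
#include <vector>

// Hypothetical node type; the real benchmark types are not shown in the comment.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a complete binary tree of the given depth out of `pool`.
Node* make_tree(std::pmr::memory_resource& pool, int depth) {
    void* mem = pool.allocate(sizeof(Node), alignof(Node));
    Node* n = new (mem) Node{};
    if (depth > 0) {
        n->left = make_tree(pool, depth - 1);
        n->right = make_tree(pool, depth - 1);
    }
    return n;
}

// Count the nodes and hand each one back to the pool for reuse.
std::size_t check_and_free(std::pmr::memory_resource& pool, Node* n) {
    if (n == nullptr) return 0;
    std::size_t count = 1 + check_and_free(pool, n->left)
                          + check_and_free(pool, n->right);
    pool.deallocate(n, sizeof(Node), alignof(Node));  // Node is trivially destructible
    return count;
}

// Run `iterations` tree builds spread over `workers` threads.
std::size_t run(int iterations, int depth, unsigned workers) {
    std::atomic_int next{0};             // shared work counter, no mutex or condvar
    std::atomic<std::size_t> total{0};
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&] {
            // Level 1 (assumption): upstream that grows in roughly 4k steps.
            std::pmr::monotonic_buffer_resource upstream(4096 - 11);
            // Level 2: per-thread pool that recycles freed nodes.
            std::pmr::unsynchronized_pool_resource pool(&upstream);
            std::size_t local = 0;
            // Each thread claims the next iteration index until none are left.
            while (next.fetch_add(1, std::memory_order_relaxed) < iterations) {
                Node* root = make_tree(pool, depth);
                local += check_and_free(pool, root);
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return total.load();
}
```

Each thread owns its own pool, so the unsynchronized resource is never shared, and the only cross-thread traffic is the atomic counter. For example, `run(8, 4, 2)` builds eight depth-4 trees of 31 nodes each and returns 248.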

The results are as follows:


| CPU counter          |       Rust |    C++ now | C++ malloc |   C++ pool |
|----------------------|-----------:|-----------:|-----------:|-----------:|
| cache-references, 1k |     29,949 |     60,495 |     56,695 |     54,608 |
| cache-misses, 1k     |     12,378 |     25,741 |     17,430 |     17,963 |
| cycles, 1k           | 24,700,616 | 23,714,062 | 18,304,126 | 22,599,042 |
| instructions, 1k     | 31,819,068 | 30,455,923 | 20,827,116 | 29,420,285 |
| branches, 1k         |  4,835,891 |  4,832,915 |  4,083,330 |  4,607,526 |
| branch-misses, 1k    |     10,768 |     11,643 |      8,868 |      9,379 |
| faults, 1k           |         82 |        239 |         92 |         82 |
| migrations           |          2 |         21 |         13 |         12 |
| Time-elapsed         |     2.0779 |     2.0061 |     1.6051 |     1.9445 |
| Seconds-user         |     7.1599 |     6.5808 |     5.2582 |     6.5315 |
| Seconds-sys          |     0.1803 |     0.4595 |     0.1872 |     0.1838 |
