imay opened a new issue #1776: create chunk allocator for memory pool
URL: https://github.com/apache/incubator-doris/issues/1776
 
 
   ## Motivation
   
   In high-concurrency tests, many threads are blocked waiting to allocate or 
release memory, and a large part of that traffic comes from Chunks in MemPool. 
One reason is that MemPool is used everywhere in the code. Another is that 
these chunks are relatively large, from 4K to 512K. Allocations of this size 
easily exceed the free memory that TCMalloc reserves for each thread, so the 
requests have to go to the central memory.
   
   Therefore, I implemented a demo ChunkAllocator that keeps released Chunks 
instead of frequently allocating from and releasing to TCMalloc. In the same 
high-concurrency test, this demo more than doubled throughput, from 280 QPS to 
650 QPS. Based on this result, I want to implement a ChunkAllocator that 
reduces Chunk allocation and release operations against the system allocator, 
thus improving the performance of the system.
   
   ## Design
   
   How to manage free Chunks? Chunk sizes are powers of two, so we can 
maintain a separate free chunk list for each size. When a Chunk is no longer 
used, it is placed in the free list of the corresponding size. When allocating 
a new Chunk, we first look in the free list of the corresponding size. If no 
Chunk is found there, we allocate a new one from the system allocator.
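
   A minimal sketch of the per-size free lists described above. The names 
`size_class` and `FreeLists` are hypothetical and not part of the proposal, and 
the sketch leaves out locking entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: index of the free list for a power-of-two size,
// i.e. log2(size). For example 4096 maps to 12.
inline int size_class(size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);  // must be a power of two
    int idx = 0;
    while (size > 1) {
        size >>= 1;
        ++idx;
    }
    return idx;
}

// One free list per size class. Not thread-safe in this sketch.
class FreeLists {
public:
    // Returns a cached block of exactly `size` bytes, or nullptr on a miss.
    // On a miss the caller would fall back to the system allocator.
    uint8_t* pop(size_t size) {
        std::vector<uint8_t*>& list = _lists[size_class(size)];
        if (list.empty()) return nullptr;
        uint8_t* ptr = list.back();
        list.pop_back();
        return ptr;
    }

    // Keeps a block that is no longer used for later reuse.
    void push(uint8_t* ptr, size_t size) {
        _lists[size_class(size)].push_back(ptr);
    }

private:
    std::vector<uint8_t*> _lists[64];  // enough for any 64-bit size
};
```

   Because every size is a power of two, a lookup only has to compute the 
index and never has to search for a best fit.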
   
   To keep lock contention in the Chunk Allocator from hurting system 
performance, we need to narrow the scope of contention. The idea here is to 
maintain a Chunk Arena for each CPU core. When allocating, we first try to 
allocate memory from the Arena of the current core.
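
   One possible way to map a thread to its arena is sketched below. The names 
`current_arena_index`, `Arena` and `ArenaSet` are hypothetical. The sketch 
assumes Linux, where glibc provides `sched_getcpu()`, and falls back to hashing 
the thread id elsewhere:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: pick an arena index for the calling thread.
inline size_t current_arena_index(size_t num_arenas) {
    assert(num_arenas > 0);
#if defined(__linux__)
    int cpu = sched_getcpu();  // returns -1 on failure
    if (cpu >= 0) return static_cast<size_t>(cpu) % num_arenas;
#endif
    // Fallback (assumption): spread threads by hashing the thread id.
    return std::hash<std::thread::id>()(std::this_thread::get_id()) % num_arenas;
}

// Each arena has its own lock, so threads running on different cores
// usually do not contend with each other.
struct Arena {
    std::mutex lock;
    std::vector<void*> free_chunks;  // a single size class, for brevity
};

class ArenaSet {
public:
    explicit ArenaSet(size_t num_arenas) : _arenas(num_arenas) {}

    // Arena of the core the caller is running on right now. The thread may
    // migrate afterwards, which is harmless because the arena is still locked.
    Arena& local() { return _arenas[current_arena_index(_arenas.size())]; }

    size_t size() const { return _arenas.size(); }

private:
    std::vector<Arena> _arenas;
};
```

   A caller would size the set from the number of cores, for example from 
`std::thread::hardware_concurrency()`, keeping in mind that this call may 
return 0 when the value is unknown.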
   
   For memory limits, there are two options. One is to limit the total amount 
of memory that can be allocated; the other is to limit the maximum amount of 
free memory that is kept in reserve. To stay compatible with the current system 
behavior, I intend to limit only the amount of reserved free memory. With this 
choice, allocation fails only when system memory is completely exhausted, which 
is consistent with the current behavior. A larger reserved free memory limit 
gives a better cache hit rate, but it also leaves more memory idle, making it 
harder for other modules to allocate memory.
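
   The reserved-memory limit could be tracked with a single atomic counter, as 
in the sketch below. The class name `ReservedBytes` and its methods are 
hypothetical; only the idea of capping cached free memory comes from the 
proposal:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Tracks how many bytes of free chunks are currently cached.
class ReservedBytes {
public:
    explicit ReservedBytes(size_t limit) : _limit(limit), _reserved(0) {}

    // Try to account for caching `size` more bytes.
    // Returns false if that would push the cache over the limit, in which
    // case the caller releases the chunk to the system allocator instead.
    bool try_reserve(size_t size) {
        size_t cur = _reserved.load(std::memory_order_relaxed);
        do {
            // Written this way to avoid overflow in cur + size.
            if (size > _limit || cur > _limit - size) return false;
        } while (!_reserved.compare_exchange_weak(cur, cur + size,
                                                  std::memory_order_relaxed));
        return true;
    }

    // Called when a cached chunk is handed out again.
    void release(size_t size) {
        _reserved.fetch_sub(size, std::memory_order_relaxed);
    }

    size_t reserved() const { return _reserved.load(std::memory_order_relaxed); }

private:
    const size_t _limit;
    std::atomic<size_t> _reserved;
};
```

   Note that nothing here limits allocation itself: a request that misses the 
cache still goes to the system allocator, which matches the intent of limiting 
only the reserved memory.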
   
   Which system allocator should be used, malloc or mmap? Currently, malloc is 
used. If we switch to mmap without changing the system parameter 
vm.max_map_count, memory allocation may fail even when memory is available. We 
can implement both types of system allocator and provide a configuration option 
to choose between them, with malloc as the default.
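
   A rough sketch of a switchable system allocator, assuming a POSIX system 
with anonymous `mmap`. The enum `SystemAllocKind` and the functions 
`system_allocate` and `system_free` are hypothetical names; in practice the 
kind would come from the configuration option:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical configuration value selecting the backing allocator.
enum class SystemAllocKind { kMalloc, kMmap };

// Returns nullptr on failure for both kinds.
inline uint8_t* system_allocate(SystemAllocKind kind, size_t size) {
    if (kind == SystemAllocKind::kMmap) {
        // Every successful call creates a new mapping, which is why
        // vm.max_map_count matters for this variant.
        void* ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return ptr == MAP_FAILED ? nullptr : static_cast<uint8_t*>(ptr);
    }
    return static_cast<uint8_t*>(std::malloc(size));
}

// `size` must be the size passed to system_allocate; munmap needs it.
inline void system_free(SystemAllocKind kind, uint8_t* ptr, size_t size) {
    if (ptr == nullptr) return;
    if (kind == SystemAllocKind::kMmap) {
        int rc = munmap(ptr, size);
        assert(rc == 0);
        (void)rc;  // silence unused warning when asserts are disabled
    } else {
        std::free(ptr);
    }
}
```

   Since the Chunk already records its size, the size needed by `munmap` is 
available at release time without extra bookkeeping.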
   
   Future work:
   All large memory allocations in the system could go through the Chunk 
Allocator, so that its limit can change from a limit on reserved memory to a 
limit on allocated memory.
   
   ## Structure
   
   ```
   
   #include <cstddef>
   #include <cstdint>
   
   struct Chunk {
       uint8_t* data;
       size_t size;
       // core id from which this chunk was allocated
       int core_id;
   };
   
   // Keeps the free chunks of one CPU core
   class ChunkArena {
   public:
       // Pop a free chunk from the corresponding free list.
       // Returns true on success, with a valid chunk saved in "chunk".
       bool pop_free_chunk(size_t size, Chunk* chunk);
       
       // Push a free chunk into this arena for later use
       void push_free_chunk(const Chunk& chunk);
   };
   
   class ChunkAllocator {
   public:
       // Allocate memory of the given size; size must be a power of two.
       // Returns Status::OK() on success, with the allocated chunk info
       // saved in "chunk".
       Status allocate(size_t size, Chunk* chunk);
       
       // Return a chunk to the allocator
       void free(const Chunk& chunk);
   };
   ```
   
   Allocate process:
   
   1. Get the current core_id.
   2. Try to get a free Chunk from the Arena of the current core. If 
successful, return that Chunk.
   3. Try to get a free Chunk from the Arenas of other cores. If successful, 
return that Chunk.
   4. Allocate a Chunk from the system allocator.
   
   Release process:
   
   1. Check whether there is enough cache capacity; if so, place the chunk 
in the free list of the corresponding Arena.
   2. Otherwise, call the system release function to release the memory.
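
   The allocate and release processes above can be sketched end to end as 
follows. This is an illustration under several assumptions, not the proposed 
implementation: `allocate` returns `bool` instead of `Status`, the core id is 
passed in by the caller so the example stays deterministic, `malloc` is the 
system allocator, and the free lists are kept in a `std::map` keyed by size. 
The `reserved_bytes` method and the constructor parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <mutex>
#include <vector>

struct Chunk {
    uint8_t* data = nullptr;
    size_t size = 0;
    int core_id = -1;  // core on which this chunk was handed out
};

// Free chunks of one core, keyed by chunk size.
class ChunkArena {
public:
    ~ChunkArena() {
        for (auto& entry : _free) {
            for (uint8_t* ptr : entry.second) std::free(ptr);
        }
    }
    bool pop_free_chunk(size_t size, Chunk* chunk) {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _free.find(size);
        if (it == _free.end() || it->second.empty()) return false;
        chunk->data = it->second.back();
        chunk->size = size;
        it->second.pop_back();
        return true;
    }
    void push_free_chunk(const Chunk& chunk) {
        std::lock_guard<std::mutex> guard(_lock);
        _free[chunk.size].push_back(chunk.data);
    }

private:
    std::mutex _lock;
    std::map<size_t, std::vector<uint8_t*>> _free;
};

class ChunkAllocator {
public:
    ChunkAllocator(size_t num_cores, size_t reserve_limit)
            : _arenas(num_cores), _reserve_limit(reserve_limit) {}

    // Allocate steps 2 to 4; step 1 (finding core_id) is left to the caller.
    bool allocate(int core_id, size_t size, Chunk* chunk) {
        assert(core_id >= 0 && static_cast<size_t>(core_id) < _arenas.size());
        chunk->core_id = core_id;
        // i == 0 is the local arena; i > 0 visits the other cores' arenas.
        for (size_t i = 0; i < _arenas.size(); ++i) {
            size_t idx = (static_cast<size_t>(core_id) + i) % _arenas.size();
            if (_arenas[idx].pop_free_chunk(size, chunk)) {
                std::lock_guard<std::mutex> guard(_lock);
                _reserved -= size;
                return true;
            }
        }
        chunk->data = static_cast<uint8_t*>(std::malloc(size));
        chunk->size = size;
        return chunk->data != nullptr;
    }

    // Release steps 1 and 2: cache the chunk if the cap allows, else free it.
    void free(const Chunk& chunk) {
        bool cache = false;
        {
            std::lock_guard<std::mutex> guard(_lock);
            if (chunk.size <= _reserve_limit &&
                _reserved <= _reserve_limit - chunk.size) {
                _reserved += chunk.size;
                cache = true;
            }
        }
        if (cache) {
            _arenas[static_cast<size_t>(chunk.core_id)].push_free_chunk(chunk);
        } else {
            std::free(chunk.data);
        }
    }

    size_t reserved_bytes() {
        std::lock_guard<std::mutex> guard(_lock);
        return _reserved;
    }

private:
    std::vector<ChunkArena> _arenas;
    size_t _reserve_limit;
    std::mutex _lock;       // guards _reserved only
    size_t _reserved = 0;   // bytes currently cached across all arenas
};
```

   In this sketch a chunk freed on one core can be picked up by an allocation 
on another core, which corresponds to step 3 of the allocate process.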
