Hi all,

I started this new thread from another thread[1] where we are discussing a new storage for TIDs, TidStore, because we ran into a difficulty with the memory usage limit for TidStores on DSA.
TidStore is a new data structure to efficiently store TIDs, backed by a radix tree. In the patch series proposed on that thread, in addition to the radix tree and TidStore, there is another patch for lazy (parallel) vacuum to replace the array of dead tuple TIDs with a TidStore. To support parallel vacuum, the radix tree (and TidStore) can be created in local memory as well as on DSA. It also has memory usage limit functionality; we can pass a memory limit (e.g., maintenance_work_mem) to TidStoreCreate(). Once the total DSA segment size (area->control->total_segment_size) exceeds the limit, TidStoreIsFull() returns true. Lazy vacuum can continue scanning heap blocks to collect dead tuple TIDs until TidStoreIsFull() returns true. Currently lazy vacuum is the sole user of TidStore, but it could also be used by other code such as tidbitmap.c, where the limit would be work_mem.

During development, we found that DSA memory growth is unpredictable, which makes enforcing the memory limit inefficient. DSA is built on top of DSM segments and manages a set of them, adding new segments as required and detaching them when they are no longer needed. The first DSA segment is 1MB in size, and each new segment is at least big enough to follow a geometric series that approximately doubles the total storage each time a new segment is created. Because of this, it's not efficient to simply compare the memory limit to the total segment size. For example, if maintenance_work_mem is 512MB, the total segment size grows like:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) = 510MB -> less than the limit, continue heap scan.
2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) + 256 = 766MB -> stop (exceeds the limit by 254MB).

One might think we could use dsa_set_size_limit(), but that doesn't work: lazy vacuum would end up with an error. If we set DSA_ALLOC_NO_OOM instead, we might end up stopping the insertion halfway.

Besides allocating memory excessively, since the initial DSM segment size is fixed at 1MB, memory usage of a shared TidStore starts from 1MB+. This is higher than the minimum values of both work_mem and maintenance_work_mem, 64kB and 1MB respectively. Increasing the minimum m_w_m to 2MB might be acceptable, but that is not an option for work_mem.

Researching possible solutions, we found that aset.c has a similar characteristic: it allocates an 8kB block (by default) upon the first allocation in a context and doubles that size for each successive block request, but the caller can specify both the initial and the maximum block size. This led to the idea of letting the caller specify both sizes for DSA as well, with both values calculated from m_w_m. I've attached a patch for this idea. The changes to dsa.c are straightforward since dsa.c already uses the macros DSA_INITIAL_SEGMENT_SIZE and DSA_MAX_SEGMENT_SIZE; I just made these values configurable.

FYI, with this patch we can create a DSA in parallel_vacuum_init() with initial and maximum block sizes as follows:

initial block size = min(m_w_m / 4, 1MB)
max block size = max(m_w_m / 8, 8MB)

In most cases we can start with a 1MB initial segment, the same as before. For larger memory, the heap scan stops after DSA has allocated about 1.25 times as much memory as m_w_m at most. For example, if m_w_m = 512MB, the initial and maximum segment sizes are 1MB and 64MB respectively, and DSA allocates segments as follows until heap scanning stops (a sketch of this size calculation follows the examples):

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 4) = 510MB -> less than the limit, continue heap scan.
2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 5) = 574MB -> stop (62MB allocated beyond the limit).
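For illustration, here is a rough sketch of how a caller such as parallel_vacuum_init() could derive the two sizes from the memory limit following the formulas above. This is not part of the attached patch; the helper name and the clamping against the dsa.c limits are my assumptions, and only dsa_create_extended() and the DSA_* macros come from the patch.

/*
 * Sketch only, not in the attached patch: derive the initial and maximum
 * DSA segment sizes from a memory limit given in kilobytes (e.g.,
 * maintenance_work_mem).
 */
static void
choose_dsa_segment_sizes(int limit_kb,
                         size_t *init_segment_size,
                         size_t *max_segment_size)
{
    size_t      limit = (size_t) limit_kb * 1024;

    /* initial block size = min(m_w_m / 4, 1MB) */
    *init_segment_size = Min(limit / 4, (size_t) (1024 * 1024));

    /* max block size = max(m_w_m / 8, 8MB) */
    *max_segment_size = Max(limit / 8, (size_t) (8 * 1024 * 1024));

    /* keep the values within what create_internal() asserts */
    *init_segment_size = Max(*init_segment_size, (size_t) 1024);
    *max_segment_size = Min(*max_segment_size, DSA_MAX_SEGMENT_SIZE);
    *max_segment_size = Max(*max_segment_size, *init_segment_size);
}

With m_w_m = 512MB this gives 1MB and 64MB; with a 1MB limit it gives 256kB and 8MB, matching the examples above and below.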
It also works with smaller memory: if the limit is 1MB, we start with a 256kB initial segment, and heap scanning stops after DSA has allocated 1.5MB (= 256kB + 256kB + 512kB + 512kB).

There is room to consider better formulas for the initial and maximum block sizes, but making both values configurable seems like a promising idea, and behavior analogous to aset could be a good thing for readability and maintainability. There is another test result where I used this idea on top of a radix tree[2].

We need to keep an eye on the total number of allocated DSA segments, as the total number of DSM segments available on the system is fixed[3]. But it seems not to be a problem even with this patch, since we allocate only a few additional segments (in the above examples, 17 vs. 19 segments). There was also no big difference in performance[2].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDBmD5q%3DeO%2BK%3DgyuVt53XvwpJ2dgxPwrtZ-eVOjVmtJjg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoDKr%3D4YHphy6cRojE5eyT6E2ao8xb44E309eTrUEOC6xw%40mail.gmail.com
[3] From dsm.c, the total number of DSM segments available on the system is calculated as:

#define PG_DYNSHMEM_FIXED_SLOTS 64
#define PG_DYNSHMEM_SLOTS_PER_BACKEND 5

maxitems = PG_DYNSHMEM_FIXED_SLOTS + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index f5a62061a3..2cf1c0356d 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -60,14 +60,6 @@
 #include "utils/freepage.h"
 #include "utils/memutils.h"
 
-/*
- * The size of the initial DSM segment that backs a dsa_area created by
- * dsa_create. After creating some number of segments of this size we'll
- * double this size, and so on. Larger segments may be created if necessary
- * to satisfy large requests.
- */
-#define DSA_INITIAL_SEGMENT_SIZE ((size_t) (1 * 1024 * 1024))
-
 /*
  * How many segments to create before we double the segment size. If this is
  * low, then there is likely to be a lot of wasted space in the largest
@@ -77,17 +69,6 @@
  */
 #define DSA_NUM_SEGMENTS_AT_EACH_SIZE 2
 
-/*
- * The number of bits used to represent the offset part of a dsa_pointer.
- * This controls the maximum size of a segment, the maximum possible
- * allocation size and also the maximum number of segments per area.
- */
-#if SIZEOF_DSA_POINTER == 4
-#define DSA_OFFSET_WIDTH 27     /* 32 segments of size up to 128MB */
-#else
-#define DSA_OFFSET_WIDTH 40     /* 1024 segments of size up to 1TB */
-#endif
-
 /*
  * The maximum number of DSM segments that an area can own, determined by
  * the number of bits remaining (but capped at 1024).
@@ -98,9 +79,6 @@
 /* The bitmask for extracting the offset from a dsa_pointer. */
 #define DSA_OFFSET_BITMASK (((dsa_pointer) 1 << DSA_OFFSET_WIDTH) - 1)
 
-/* The maximum size of a DSM segment. */
-#define DSA_MAX_SEGMENT_SIZE ((size_t) 1 << DSA_OFFSET_WIDTH)
-
 /* Number of pages (see FPM_PAGE_SIZE) per regular superblock. */
 #define DSA_PAGES_PER_SUPERBLOCK 16
 
@@ -319,6 +297,10 @@ typedef struct
     dsa_segment_index segment_bins[DSA_NUM_SEGMENT_BINS];
     /* The object pools for each size class. */
     dsa_area_pool pools[DSA_NUM_SIZE_CLASSES];
+    /* initial allocation segment size */
+    size_t      init_segment_size;
+    /* maximum allocation segment size */
+    size_t      max_segment_size;
     /* The total size of all active segments. */
     size_t      total_segment_size;
     /* The maximum total size of backing storage we are allowed. */
@@ -413,7 +395,9 @@ static dsa_segment_map *make_new_segment(dsa_area *area,
                                          size_t requested_pages);
 static dsa_area *create_internal(void *place, size_t size,
                                  int tranche_id, dsm_handle control_handle,
-                                 dsm_segment *control_segment);
+                                 dsm_segment *control_segment,
+                                 size_t init_segment_size,
+                                 size_t max_segment_size);
 static dsa_area *attach_internal(void *place, dsm_segment *segment,
                                  dsa_handle handle);
 static void check_for_freed_segments(dsa_area *area);
@@ -429,7 +413,8 @@ static void check_for_freed_segments_locked(dsa_area *area);
  * we require the caller to provide one.
  */
 dsa_area *
-dsa_create(int tranche_id)
+dsa_create_extended(int tranche_id, size_t init_segment_size,
+                    size_t max_segment_size)
 {
     dsm_segment *segment;
     dsa_area   *area;
@@ -438,7 +423,7 @@ dsa_create(int tranche_id)
      * Create the DSM segment that will hold the shared control object and the
      * first segment of usable space.
      */
-    segment = dsm_create(DSA_INITIAL_SEGMENT_SIZE, 0);
+    segment = dsm_create(init_segment_size, 0);
 
     /*
      * All segments backing this area are pinned, so that DSA can explicitly
      */
     /* Create a new DSA area with the control object in this segment. */
     area = create_internal(dsm_segment_address(segment),
-                           DSA_INITIAL_SEGMENT_SIZE,
+                           init_segment_size,
                            tranche_id,
-                           dsm_segment_handle(segment), segment);
+                           dsm_segment_handle(segment), segment,
+                           init_segment_size, max_segment_size);
 
     /* Clean up when the control segment detaches. */
     on_dsm_detach(segment, &dsa_on_dsm_detach_release_in_place,
@@ -478,13 +464,15 @@
  * See dsa_create() for a note about the tranche arguments.
  */
 dsa_area *
-dsa_create_in_place(void *place, size_t size,
-                    int tranche_id, dsm_segment *segment)
+dsa_create_in_place_extended(void *place, size_t size,
+                             int tranche_id, dsm_segment *segment,
+                             size_t init_segment_size, size_t max_segment_size)
 {
     dsa_area   *area;
 
     area = create_internal(place, size, tranche_id,
-                           DSM_HANDLE_INVALID, NULL);
+                           DSM_HANDLE_INVALID, NULL,
+                           init_segment_size, max_segment_size);
 
     /*
      * Clean up when the control segment detaches, if a containing DSM segment
@@ -1203,7 +1191,8 @@ static dsa_area *
 create_internal(void *place, size_t size,
                 int tranche_id,
                 dsm_handle control_handle,
-                dsm_segment *control_segment)
+                dsm_segment *control_segment,
+                size_t init_segment_size, size_t max_segment_size)
 {
     dsa_area_control *control;
     dsa_area   *area;
@@ -1213,6 +1202,11 @@ create_internal(void *place, size_t size,
     size_t      metadata_bytes;
     int         i;
 
+    /* Validate the initial and maximum block sizes */
+    Assert(init_segment_size >= 1024);
+    Assert(max_segment_size >= init_segment_size);
+    Assert(max_segment_size <= DSA_MAX_SEGMENT_SIZE);
+
     /* Sanity check on the space we have to work in. */
     if (size < dsa_minimum_size())
         elog(ERROR, "dsa_area space must be at least %zu, but %zu provided",
@@ -1242,8 +1236,10 @@ create_internal(void *place, size_t size,
     control->segment_header.prev = DSA_SEGMENT_INDEX_NONE;
     control->segment_header.usable_pages = usable_pages;
     control->segment_header.freed = false;
-    control->segment_header.size = DSA_INITIAL_SEGMENT_SIZE;
+    control->segment_header.size = size;
     control->handle = control_handle;
+    control->init_segment_size = init_segment_size;
+    control->max_segment_size = max_segment_size;
     control->max_total_segment_size = (size_t) -1;
     control->total_segment_size = size;
     control->segment_handles[0] = control_handle;
@@ -2112,9 +2108,9 @@ make_new_segment(dsa_area *area, size_t requested_pages)
      * move to huge pages in the future. Then we work back to the number of
      * pages we can fit.
      */
-    total_size = DSA_INITIAL_SEGMENT_SIZE *
+    total_size = area->control->init_segment_size *
         ((size_t) 1 << (new_index / DSA_NUM_SEGMENTS_AT_EACH_SIZE));
-    total_size = Min(total_size, DSA_MAX_SEGMENT_SIZE);
+    total_size = Min(total_size, area->control->max_segment_size);
     total_size = Min(total_size,
                      area->control->max_total_segment_size -
                      area->control->total_segment_size);
diff --git a/src/include/utils/dsa.h b/src/include/utils/dsa.h
index 3ce4ee300a..6f1144f956 100644
--- a/src/include/utils/dsa.h
+++ b/src/include/utils/dsa.h
@@ -77,6 +77,28 @@ typedef pg_atomic_uint64 dsa_pointer_atomic;
 /* A sentinel value for dsa_pointer used to indicate failure to allocate. */
 #define InvalidDsaPointer ((dsa_pointer) 0)
 
+/*
+ * The default size of the initial DSM segment that backs a dsa_area created
+ * by dsa_create. After creating some number of segments of this size we'll
+ * double this size, and so on. Larger segments may be created if necessary
+ * to satisfy large requests.
+ */
+#define DSA_INITIAL_SEGMENT_SIZE ((size_t) (1 * 1024 * 1024))
+
+/*
+ * The number of bits used to represent the offset part of a dsa_pointer.
+ * This controls the maximum size of a segment, the maximum possible
+ * allocation size and also the maximum number of segments per area.
+ */
+#if SIZEOF_DSA_POINTER == 4
+#define DSA_OFFSET_WIDTH 27     /* 32 segments of size up to 128MB */
+#else
+#define DSA_OFFSET_WIDTH 40     /* 1024 segments of size up to 1TB */
+#endif
+
+/* The maximum size of a DSM segment. */
+#define DSA_MAX_SEGMENT_SIZE ((size_t) 1 << DSA_OFFSET_WIDTH)
+
 /* Check if a dsa_pointer value is valid. */
 #define DsaPointerIsValid(x) ((x) != InvalidDsaPointer)
 
@@ -88,6 +110,19 @@ typedef pg_atomic_uint64 dsa_pointer_atomic;
 #define dsa_allocate0(area, size) \
     dsa_allocate_extended(area, size, DSA_ALLOC_ZERO)
 
+/* Create dsa_area with default segment sizes */
+#define dsa_create(tranch_id) \
+    dsa_create_extended(tranch_id, DSA_INITIAL_SEGMENT_SIZE, \
+                        DSA_MAX_SEGMENT_SIZE)
+
+/*
+ * Create dsa_area with default segment sizes in an existing share memory
+ * space.
+ */
+#define dsa_create_in_place(place, size, tranch_id, segment) \
+    dsa_create_in_place_extended(place, size, tranch_id, segment, \
+                                 DSA_INITIAL_SEGMENT_SIZE, DSA_MAX_SEGMENT_SIZE)
+
 /*
  * The type used for dsa_area handles. dsa_handle values can be shared with
  * other processes, so that they can attach to them. This provides a way to
@@ -102,10 +137,12 @@ typedef dsm_handle dsa_handle;
 
 /* Sentinel value to use for invalid dsa_handles. */
 #define DSA_HANDLE_INVALID ((dsa_handle) DSM_HANDLE_INVALID)
-
-extern dsa_area *dsa_create(int tranche_id);
-extern dsa_area *dsa_create_in_place(void *place, size_t size,
-                                     int tranche_id, dsm_segment *segment);
+extern dsa_area *dsa_create_extended(int tranche_id, size_t init_segment_size,
+                                     size_t max_segment_size);
+extern dsa_area *dsa_create_in_place_extended(void *place, size_t size,
+                                              int tranche_id, dsm_segment *segment,
+                                              size_t init_segment_size,
+                                              size_t max_segment_size);
 extern dsa_area *dsa_attach(dsa_handle handle);
 extern dsa_area *dsa_attach_in_place(void *place, dsm_segment *segment);
 extern void dsa_release_in_place(void *place);
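For reference, a caller could then use the new function roughly like this. Again this is just a sketch, not part of the patch: choose_dsa_segment_sizes() is the hypothetical helper sketched earlier, and create_dead_items_dsa() and tranche_id are placeholders for whatever the caller actually uses.

/*
 * Sketch of a caller using the new API. Existing dsa_create() and
 * dsa_create_in_place() callers are unchanged because the macros pass
 * DSA_INITIAL_SEGMENT_SIZE and DSA_MAX_SEGMENT_SIZE.
 */
static dsa_area *
create_dead_items_dsa(int tranche_id, int limit_kb)
{
    size_t      init_segment_size;
    size_t      max_segment_size;

    /* derive the sizes from the memory limit, e.g. maintenance_work_mem */
    choose_dsa_segment_sizes(limit_kb,
                             &init_segment_size, &max_segment_size);

    return dsa_create_extended(tranche_id,
                               init_segment_size, max_segment_size);
}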