On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby <pry...@telsasoft.com> wrote:
>                 elog(FATAL,
>                          "dsa_allocate could not find %zu free pages", npages);
> +               abort()
If anyone can reproduce this problem with a debugger, it'd be interesting to
see the output of dsa_dump(area) and FreePageManagerDump(segment_map->fpm).

This error condition means that get_best_segment() selected a segment from a
segment bin that holds segments with a certain minimum number of contiguous
free pages >= the requested number npages, but then FreePageManagerGet()
found that it didn't have npages of contiguous free memory after all when it
consulted the segment's btree of free space.  Possible explanations include:

1.  The segment bin lists are somehow messed up.
2.  The FPM in the segment was corrupted by someone scribbling on free pages
    (which hold the btree).
3.  The btree was corrupted by an incorrect sequence of allocate/free calls
    (for example double frees, or allocating from one area and freeing to
    another).
4.  freepage.c fails to track its largest contiguous run of free pages
    correctly.

There is a macro FPM_EXTRA_ASSERTS that can be defined to double-check the
largest contiguous page tracking.  I have also been wondering about a debug
mode that would mprotect(PROT_READ) free pages while they aren't being
modified, to detect unexpected writes; that should work on systems that have
4k pages (a rough sketch of what I have in mind appears below).

One thing I noticed is that it is failing on a "large" allocation, where we
go straight to the btree of 4k pages, but the equivalent code where we
allocate a superblock for "small" allocations doesn't report the same kind
of FATAL this-can't-happen error: it just fails the allocation via the
regular error path without explanation.  I also spotted a path that doesn't
respect the DSA_ALLOC_NO_OOM flag (you get a null pointer instead of an
error; the sketch at the end shows the contract callers expect).  I should
fix those inconsistencies (draft patch attached), but those are incidental
problems AFAIK.

-- 
Thomas Munro
http://www.enterprisedb.com
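P.S. For what it's worth, a rough sketch of the mprotect()-based debug mode
mentioned above might look like the following.  The FPM_DEBUG_MPROTECT symbol
and the helper names are invented for illustration; it assumes free page runs
are FPM_PAGE_SIZE-aligned and that FPM_PAGE_SIZE matches the OS page size, so
mprotect() can be applied to them directly.

#include "postgres.h"

#include <sys/mman.h>

#include "utils/freepage.h"

#ifdef FPM_DEBUG_MPROTECT
/*
 * Sketch only: make free pages read-only while they sit in the free page
 * manager, so stray writes into freed memory fault immediately.  Since the
 * btree nodes live in the free pages themselves, freepage.c would have to
 * call the unprotect helper around its own btree updates.
 */
static void
fpm_protect_free_run(char *fpm_base, Size first_page, Size npages)
{
	/* Revoke write access for the whole freed run. */
	if (mprotect(fpm_base + first_page * FPM_PAGE_SIZE,
				 npages * FPM_PAGE_SIZE, PROT_READ) != 0)
		elog(WARNING, "mprotect(PROT_READ) failed: %m");
}

static void
fpm_unprotect_free_run(char *fpm_base, Size first_page, Size npages)
{
	/* Restore write access before the pages are allocated or modified. */
	if (mprotect(fpm_base + first_page * FPM_PAGE_SIZE,
				 npages * FPM_PAGE_SIZE, PROT_READ | PROT_WRITE) != 0)
		elog(WARNING, "mprotect(PROT_READ | PROT_WRITE) failed: %m");
}
#endif							/* FPM_DEBUG_MPROTECT */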
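And to make the DSA_ALLOC_NO_OOM point concrete, this is the contract a
caller relies on (an illustration only, not the attached patch; the caller
function is made up):

#include "postgres.h"

#include "utils/dsa.h"

/*
 * With DSA_ALLOC_NO_OOM, an allocation failure must come back as
 * InvalidDsaPointer; without the flag, failure must raise an ERROR, so the
 * caller never sees an invalid pointer.  The point of the fix is to make
 * every path in dsa.c honour that contract.
 */
static bool
try_allocate_chunk(dsa_area *area, size_t size, dsa_pointer *result)
{
	*result = dsa_allocate_extended(area, size, DSA_ALLOC_NO_OOM);

	if (!DsaPointerIsValid(*result))
		return false;			/* out of memory; caller handles it */

	return true;
}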
fix-dsa-area-handling.patch
Description: Binary data