> On Dec 5, 2017, at 4:07 PM, Thomas Munro <thomas.mu...@enterprisedb.com> 
> wrote:
> 
> On Wed, Dec 6, 2017 at 9:35 AM, Mark Dilger <hornschnor...@gmail.com> wrote:
>>> On Dec 5, 2017, at 11:25 AM, Thomas Munro <thomas.mu...@enterprisedb.com> 
>>> wrote:
>>> Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?
>> 
>> This was encountered and logged by a java client.  The only data I got was:
>> 
>> org.postgresql.util.PSQLException: ERROR: dsa_allocate could not find 4 free pages
>>  Where: parallel worker
> 
> This means that the DSA area is corrupted.  Presumably
> get_best_segment(area, 4) returned a segment that wasn't actually good
> for 4 pages, either because it was incorrectly binned or because its
> free space btree was corrupted.  Another path would be that
> make_new_segment(area, 4) returned a segment that couldn't find 4
> pages, but that seems unlikely.
> 
>> [query plan with one Gather and no Parallel Bitmap Heap Scan]
> 
> I'm not sure why this plan would ever call dsa_allocate().
> 
>> [query plan with no Gather but plenty of Bitmap Heap Scans]
> 
> And this one certainly can't.  I guess you must sometimes get a
> different variation that has Gather nodes and uses Parallel Bitmap
> Heap Scan.

Yes, I can believe that the plan is sometimes different.  This error has
occurred several times now, but it is still rather infrequent, so either the
plan that triggers it is rare, or the bug is intermittent even with the same
plan being chosen, or perhaps both.

> Then the question is whether the es_query_dsa multiple
> Gather bug can explain this: for example, if dsa_free(wrong_dsa_area,
> p) was called, perhaps it could produce this type of corruption.
> Otherwise we have a different bug.  Any clues on how to reproduce the
> problem would be very welcome.

I have written (and rewritten, and rewritten) a TAP test in the hopes of
getting a test case that reproduces this reliably (or even once), but
without luck so far.  I will keep trying.

mark