Please find attached latest patch which fix the review point as well as additional clean-up.
- Get rid of funnel_slot as its not needed for the Gather Merge - renamed fill_tuple_array to form_tuple_array - Fix possible infinite loop into gather_merge_init (Reported by Amit Kaplia) - Fix tqueue.c memory leak, by calling TupleQueueReaderNext() with per-tuple context. - Code cleanup into ExecGatherMerge. - Added new function gather_merge_clear_slots(), which clear out all gather merge slots and also free tuple array at end of execution. I did the performance testing again with the latest patch and I haven't observed any regression. Some of TPC-H queries showing additional benefit with the latest patch, but its just under 5%. Do let me know if I missed anything. On Mon, Oct 24, 2016 at 11:55 AM, Rushabh Lathia <rushabh.lat...@gmail.com> wrote: > > > On Thu, Oct 20, 2016 at 1:12 PM, Amit Kapila <amit.kapil...@gmail.com> > wrote: > >> On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia >> <rushabh.lat...@gmail.com> wrote: >> > On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila <amit.kapil...@gmail.com> >> > wrote: >> >> >> >> There is lot of common code between ExecGatherMerge and ExecGather. >> >> Do you think it makes sense to have a common function to avoid the >> >> duplicity? >> >> >> >> I see there are small discrepancies in both the codes like I don't see >> >> the use of single_copy flag, as it is present in gather node. >> >> >> > >> > Yes, even I thought to centrilize some things of ExecGather and >> > ExecGatherMerge, >> > but its really not something that is fixed. And I thought it might >> change >> > particularly >> > for the Gather Merge. And as explained by Robert single_copy doesn't >> make >> > sense >> > for the Gather Merge. I will still look into this to see if something >> can be >> > make >> > centralize. >> > >> >> Okay, I haven't thought about it, but do let me know if you couldn't >> find any way to merge the code. >> >> > Sure, I will look into this. > > >> >> >> >> 3. >> >> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool >> >> force) >> >> { >> >> .. >> >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL); >> >> + >> >> + /* >> >> + >> >> * try to read more tuple into nowait mode and store it into the tuple >> >> + * array. >> >> + >> >> */ >> >> + if (HeapTupleIsValid(tup)) >> >> + fill_tuple_array(gm_state, reader); >> >> >> >> How the above read tuple is stored in array? In anycase the above >> >> interface seems slightly awkward to me. Basically, I think what you >> >> are trying to do here is after reading first tuple in waitmode, you >> >> are trying to fill the array by reading more tuples. So, can't we >> >> push reading of this fist tuple into that function and name it as >> >> form_tuple_array(). >> >> >> > >> > Yes, you are right. >> > >> >> You have not answered my first question. I will try to ask again, how >> the tuple read by below code is stored in the array: >> >> > Tuple directly get stored into related TupleTableSlot. > In gather_merge_readnext() > at the end of function it build the TupleTableSlot for the given tuple. So > tuple > read by above code is directly stored into TupleTableSlot. > > >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL); >> >> > First its trying to read tuple into wait-mode, and once >> > it >> > find tuple then its try to fill the tuple array (which basically try to >> read >> > tuple >> > into nowait-mode). Reason I keep it separate is because in case of >> > initializing >> > the gather merge, if we unable to read tuple from all the worker - while >> > trying >> > re-read, we again try to fill the tuple array for the worker who already >> > produced >> > atleast a single tuple (see gather_merge_init() for more details). >> > >> >> Whenever any worker produced one tuple, you already try to fill the >> array in gather_merge_readnext(), so does the above code help much? >> >> > Also I >> > thought >> > filling tuple array (which basically read tuple into nowait mode) and >> > reading tuple >> > (into wait-mode) are two separate task - and if its into separate >> function >> > that code >> > look more clear. >> >> To me that looks slightly confusing. >> >> > If you have any suggestion for the function name >> > (fill_tuple_array) >> > then I am open to change that. >> > >> >> form_tuple_array (form_tuple is used at many places in code, so it >> should look okay). > > > Ok, I rename it with next patch. > > >> I think you might want to consider forming array >> even for leader, although it might not be as beneficial as for >> workers, OTOH, the code will get simplified if we do that way. >> > > Yes, I did that earlier - and as you guessed its not be any beneficial > so to avoided extra memory allocation for the tuple array, I am not > forming array for leader. > > Today, I observed another issue in code: >> >> +gather_merge_init(GatherMergeState *gm_state) >> { >> .. >> +reread: >> + for (i = 0; i < nreaders + 1; i++) >> + { >> + if (TupIsNull(gm_state->gm_slots[i]) || >> + gm_state->gm_slots[i]->tts_isempty) >> + { >> + if (gather_merge_readnext(gm_state, i, initialize ? false : true)) >> + { >> + binaryheap_add_unordered(gm_state->gm_heap, >> + Int32GetDatum(i)); >> + } >> + } >> + else >> + fill_tuple_array(gm_state, i); >> + } >> + initialize = false; >> + >> + for (i = 0; i < nreaders; i++) >> + if (TupIsNull(gm_state->gm_slots[i]) || gm_state->gm_slots[i]->tts_ise >> mpty) >> + goto reread; >> .. >> } >> >> This code can cause infinite loop. Consider a case where one of the >> worker doesn't get any tuple because by the time it starts all the >> tuples are consumed by all other workers. The above code will keep on >> looping to fetch the tuple from that worker whereas that worker can't >> return any tuple. I think you can fix it by checking if the >> particular queue has been exhausted. >> >> > Oh yes. I will work on the fix and soon submit the next set of patch. > > >> >> > >> >> > Open Issue: >> >> > >> >> > - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak >> into >> >> > tqueue.c by >> >> > calling gather_readnext() into per-tuple context. Now for gather >> merge >> >> > that >> >> > is >> >> > not possible, as we storing the tuple into Tuple array and we want >> tuple >> >> > to >> >> > be >> >> > free only its get pass through the merge sort algorithm. One idea >> is, we >> >> > can >> >> > also call gm_readnext_tuple() under per-tuple context (which will fix >> >> > the >> >> > leak >> >> > into tqueue.c) and then store the copy of tuple into tuple array. >> >> > >> >> >> >> Won't extra copy per tuple impact performance? Is the fix in >> >> mentioned commit was for record or composite types, if so, does >> >> GatherMerge support such types and if it support, does it provide any >> >> benefit over Gather? >> >> >> > >> > I don't think was specificially for the record or composite types - but >> I >> > might be >> > wrong. As per my understanding commit fix leak into tqueue.c. >> > >> >> Hmm, in tqueue.c, I think the memory leak was remapping logic, refer >> mail [1] of Tom (Just to add insult to injury, the backend's memory >> consumption bloats to something over 5.5G during that last query). >> >> > Fix was to add >> > standard to call TupleQueueReaderNext() with shorter memory context - so >> > that >> > tqueue.c doesn't leak the memory. >> > >> > I have idea to fix this by calling the TupleQueueReaderNext() with >> per-tuple >> > context, >> > and then copy the tuple and store it to the tuple array and later with >> the >> > next run of >> > ExecStoreTuple() will free the earlier tuple. I will work on that. >> > >> >> Okay, if you think that is viable, then you can do it, but do check >> the performance impact of same, because extra copy per fetched tuple >> can impact performance. >> >> > Sure, I will check the performance impact for the same. > > > >> >> [1] - https://www.postgresql.org/message-id/32763.1469821037%40sss >> .pgh.pa.us >> >> -- >> With Regards, >> Amit Kapila. >> EnterpriseDB: http://www.enterprisedb.com >> > > > > Thanks, > Rushabh Lathia > www.EnterpriseDB.com > -- Rushabh Lathia
gather_merge_v3.patch
Description: application/download
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers