On Thu, Oct 20, 2016 at 1:12 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
> <rushabh.lat...@gmail.com> wrote:
> > On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >> There is a lot of common code between ExecGatherMerge and ExecGather.
> >> Do you think it makes sense to have a common function to avoid the
> >> duplication?
> >>
> >> I also see small discrepancies between the two, e.g. I don't see the
> >> single_copy flag being used, although it is present in the Gather node.
> >>
> >
> > Yes, I too thought about centralizing some things between ExecGather and
> > ExecGatherMerge, but that code is not really settled yet, and I expect
> > it to change, particularly for Gather Merge.  And as explained by
> > Robert, single_copy doesn't make sense for Gather Merge.  I will still
> > look into whether something can be centralized.
> >
>
> Okay, I haven't thought about it, but do let me know if you can't
> find any way to merge the code.
>

Sure, I will look into this.

> >>
> >> 3.
> >> +gather_merge_readnext(GatherMergeState *gm_state, int reader, bool force)
> >> {
> >> ..
> >> +	tup = gm_readnext_tuple(gm_state, reader, force, NULL);
> >> +
> >> +	/*
> >> +	 * Try to read more tuples in nowait mode and store them into the
> >> +	 * tuple array.
> >> +	 */
> >> +	if (HeapTupleIsValid(tup))
> >> +		fill_tuple_array(gm_state, reader);
> >>
> >> How is the above read tuple stored in the array?  In any case, the
> >> above interface seems slightly awkward to me.  Basically, I think what
> >> you are trying to do here is: after reading the first tuple in wait
> >> mode, you try to fill the array by reading more tuples.  So, can't we
> >> push the reading of this first tuple into that function and name it
> >> form_tuple_array()?
> >>
> >
> > Yes, you are right.
> >
>
> You have not answered my first question.  I will try to ask again: how
> is the tuple read by the below code stored in the array?
>
> >> +	tup = gm_readnext_tuple(gm_state, reader, force, NULL);

The tuple gets stored directly into the related TupleTableSlot: at the
end of gather_merge_readnext() it builds the TupleTableSlot for the
given tuple.  So the tuple read by the above code goes into the
TupleTableSlot, not into the tuple array.

> > First it tries to read a tuple in wait mode, and once it finds a tuple
> > it tries to fill the tuple array (which reads tuples in nowait mode).
> > The reason I kept them separate is that while initializing the gather
> > merge, if we are unable to read a tuple from all the workers, then on
> > the re-read attempt we again try to fill the tuple array for any worker
> > that has already produced at least a single tuple (see
> > gather_merge_init() for more details).
> >
>
> Whenever any worker has produced one tuple, you already try to fill the
> array in gather_merge_readnext(), so does the above code help much?
>
> > Also, I thought that filling the tuple array (which reads tuples in
> > nowait mode) and reading a tuple (in wait mode) are two separate tasks,
> > and that keeping them in separate functions makes the code clearer.
>
> To me that looks slightly confusing.
>
> > If you have any suggestion for the function name (fill_tuple_array),
> > then I am open to changing it.
> >
>
> form_tuple_array (form_tuple is used in many places in the code, so it
> should look okay).

OK, I will rename it in the next patch.
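To make the intended behaviour a bit more concrete, here is a rough
sketch of what the renamed function might look like once the first
wait-mode read is pushed into it.  This is a sketch only, not the
actual patch code: GMReaderTupleBuffer, MAX_TUPLE_STORE, and the
gm_tuple_buffers field are assumed names.

/*
 * Sketch only: read one tuple from the given reader in wait mode, then
 * opportunistically batch more tuples in nowait mode into this reader's
 * tuple array.  GMReaderTupleBuffer, MAX_TUPLE_STORE and the gm_* field
 * names are assumptions, not final patch code.
 */
static bool
form_tuple_array(GatherMergeState *gm_state, int reader, bool nowait)
{
	GMReaderTupleBuffer *buf = &gm_state->gm_tuple_buffers[reader];
	HeapTuple	tup;

	/* Read the first tuple, waiting for it unless told otherwise. */
	tup = gm_readnext_tuple(gm_state, reader, nowait, NULL);
	if (!HeapTupleIsValid(tup))
		return false;

	/* Store it into this reader's slot so the merge heap can compare it. */
	ExecStoreTuple(tup, gm_state->gm_slots[reader], InvalidBuffer, true);

	/* Batch further tuples in nowait mode while the queue has them. */
	while (buf->nTuples < MAX_TUPLE_STORE)
	{
		tup = gm_readnext_tuple(gm_state, reader, true, NULL);
		if (!HeapTupleIsValid(tup))
			break;
		buf->tuple[buf->nTuples++] = tup;
	}

	return true;
}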
> I think you might want to consider forming the array even for the
> leader; although it might not be as beneficial as for workers, OTOH the
> code will get simplified if we do it that way.
>

Yes, I did that earlier, and as you guessed it is not really beneficial;
so to avoid the extra memory allocation for the tuple array, I am not
forming the array for the leader.

> Today, I observed another issue in the code:
>
> +gather_merge_init(GatherMergeState *gm_state)
> {
> ..
> +reread:
> +	for (i = 0; i < nreaders + 1; i++)
> +	{
> +		if (TupIsNull(gm_state->gm_slots[i]) ||
> +			gm_state->gm_slots[i]->tts_isempty)
> +		{
> +			if (gather_merge_readnext(gm_state, i, initialize ? false : true))
> +			{
> +				binaryheap_add_unordered(gm_state->gm_heap,
> +										 Int32GetDatum(i));
> +			}
> +		}
> +		else
> +			fill_tuple_array(gm_state, i);
> +	}
> +	initialize = false;
> +
> +	for (i = 0; i < nreaders; i++)
> +		if (TupIsNull(gm_state->gm_slots[i]) ||
> +			gm_state->gm_slots[i]->tts_isempty)
> +			goto reread;
> ..
> }
>
> This code can cause an infinite loop.  Consider a case where one of the
> workers doesn't get any tuple because, by the time it starts, all the
> tuples have been consumed by the other workers.  The above code will
> keep looping to fetch a tuple from that worker, whereas that worker
> can't return any tuple.  I think you can fix it by checking whether the
> particular queue has been exhausted.
>

Oh yes.  I will work on the fix and soon submit the next set of patches
(a rough sketch of the direction is at the end of this mail).

> >> >
> >> > Open Issue:
> >> >
> >> > - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak in
> >> > tqueue.c by calling gather_readnext() in a per-tuple context.  For
> >> > Gather Merge that is not possible, as we store the tuple into the
> >> > tuple array and we want the tuple to be freed only once it has passed
> >> > through the merge-sort algorithm.  One idea is to also call
> >> > gm_readnext_tuple() under the per-tuple context (which will fix the
> >> > leak in tqueue.c) and then store a copy of the tuple into the tuple
> >> > array.
> >> >
> >>
> >> Won't an extra copy per tuple impact performance?  Was the fix in the
> >> mentioned commit for record or composite types?  If so, does
> >> GatherMerge support such types, and if it does, does it provide any
> >> benefit over Gather?
> >>
> >
> > I don't think it was specifically for record or composite types, but I
> > might be wrong.  As per my understanding, the commit fixed a leak in
> > tqueue.c.
> >
>
> Hmm, in tqueue.c, I think the memory leak was in the remapping logic;
> refer to mail [1] from Tom ("Just to add insult to injury, the
> backend's memory consumption bloats to something over 5.5G during that
> last query").
>
> > The fix was to establish the standard of calling TupleQueueReaderNext()
> > in a shorter-lived memory context, so that tqueue.c doesn't leak memory.
> >
> > My idea for fixing this is to call TupleQueueReaderNext() in the
> > per-tuple context, then copy the tuple and store it into the tuple
> > array; later, the next run of ExecStoreTuple() will free the earlier
> > tuple.  I will work on that.
> >
>
> Okay, if you think that is viable, then you can do it, but do check the
> performance impact, because an extra copy per fetched tuple can impact
> performance.
>

Sure, I will check the performance impact (the second sketch at the end
of this mail shows the idea).

> [1] - https://www.postgresql.org/message-id/32763.1469821037%40sss.pgh.pa.us
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Thanks,
Rushabh Lathia
www.EnterpriseDB.com
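PS: For the infinite-loop issue above, the direction I have in mind is
roughly the following: mark a reader as done once its queue is
exhausted, and skip such readers in the reread loop.  Sketch only; the
per-reader "done" flag is an assumed field, set by gm_readnext_tuple()
once the worker's queue is detached and empty.

/*
 * Sketch of gather_merge_init()'s reread loop with an exhausted-queue
 * check, so a worker that will never return a tuple cannot keep us
 * looping forever.
 */
reread:
	for (i = 0; i < nreaders + 1; i++)
	{
		/* Never retry a reader whose queue is already exhausted. */
		if (gm_state->gm_tuple_buffers[i].done)
			continue;

		if (TupIsNull(gm_state->gm_slots[i]) ||
			gm_state->gm_slots[i]->tts_isempty)
		{
			if (gather_merge_readnext(gm_state, i, initialize ? false : true))
				binaryheap_add_unordered(gm_state->gm_heap,
										 Int32GetDatum(i));
		}
		else
			form_tuple_array(gm_state, i, true);
	}
	initialize = false;

	for (i = 0; i < nreaders; i++)
		if (!gm_state->gm_tuple_buffers[i].done &&
			(TupIsNull(gm_state->gm_slots[i]) ||
			 gm_state->gm_slots[i]->tts_isempty))
			goto reread;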
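PS (2): For the tqueue.c leak, here is a minimal sketch of the
per-tuple-context idea: do the queue read under the per-tuple context
and return a copy made in the caller's longer-lived context.
TupleQueueReaderNext() and heap_copytuple() are existing functions; the
reader array and its indexing are simplified placeholders.

/*
 * Sketch only: read the next tuple under the per-tuple memory context,
 * so anything tqueue.c allocates there is released when that context is
 * reset, and return a copy made in the caller's context.  The copy is
 * what goes into the tuple array, and the next ExecStoreTuple() with
 * shouldFree = true releases the earlier one.
 */
static HeapTuple
gm_readnext_tuple(GatherMergeState *gm_state, int nreader, bool nowait,
				  bool *done)
{
	TupleQueueReader *reader = gm_state->reader[nreader];
	ExprContext *econtext = gm_state->ps.ps_ExprContext;
	MemoryContext oldContext;
	HeapTuple	tup;

	/* Do the queue read under the short-lived per-tuple context. */
	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
	tup = TupleQueueReaderNext(reader, nowait, done);
	MemoryContextSwitchTo(oldContext);

	/*
	 * Copy the tuple into the longer-lived context before the per-tuple
	 * context gets reset.
	 */
	if (HeapTupleIsValid(tup))
		tup = heap_copytuple(tup);

	return tup;
}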