Hi, Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote: > I found couple of places where the zheap is using some extra logic in > verifying > whether it is zheap AM or not, based on that it used to took some extra > decisions. > I am analyzing all the extra code that is done, whether any callbacks can > handle it > or not? and how? I can come back with more details later. Yea, I think some of them will need to stay (particularly around integrating undo) and som other ones we'll need to abstract. > >> And then: > >> - lotsa cleanups > >> - rebasing onto a newer version of the abstract slot patchset > >> - splitting out smaller patches > >> > >> > >> You'd moved the bulk insert into tableam callbacks - I don't quite get > >> why? There's not really anything AM specific in that code? > >> > > > > The main reason of adding them to AM is just to provide a control to > > the specific AM to decide whether they can support the bulk insert or > > not? > > > > Current framework doesn't support AM specific bulk insert state to be > > passed from one function to another and it's structure is fixed. This needs > > to be enhanced to add AM specific private members also. > > > > Do you want me to work on it to make it generic to AM methods to extend > the structure? I think the best thing here would be to *remove* all AM abstraction for bulk insert, until it's actuallly needed. The likelihood of us getting the interface right and useful without an actual user seems low. Also, this already is a huge patch... > @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState > *estate, > CommandId mycid, int hi_options, > ResultRelInfo *resultRelInfo, > BulkInsertState bistate, > - int nBufferedTuples, TupleTableSlot > **bufferedSlots, > + int nBufferedSlots, TupleTableSlot > **bufferedSlots, > uint64 firstBufferedLineNo); > static bool CopyReadLine(CopyState cstate); > static bool CopyReadLineText(CopyState cstate); > @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate) > void *bistate; > uint64 processed = 0; > bool useHeapMultiInsert; > - int nBufferedTuples = 0; > + int nBufferedSlots = 0; > int prev_leaf_part_index = -1; > -#define MAX_BUFFERED_TUPLES 1000 > +#define MAX_BUFFERED_SLOTS 1000 What's the point of these renames? We're still dealing in tuples. Just seems to make the patch larger. > if (useHeapMultiInsert) > { > + int tup_size; > + > /* Add this tuple to the tuple buffer */ > - if (nBufferedTuples == 0) > + if (nBufferedSlots == 0) > + { > firstBufferedLineNo = > cstate->cur_lineno; > - Assert(bufferedSlots[nBufferedTuples] > == myslot); > - nBufferedTuples++; > + > + /* > + * Find out the Tuple size of > the first tuple in a batch and > + * use it for the rest tuples > in a batch. There may be scenarios > + * where the first tuple is > very small and rest can be large, but > + * that's rare and this should > work for majority of the scenarios. > + */ > + tup_size = > heap_compute_data_size(myslot->tts_tupleDescriptor, > + > myslot->tts_values, > + > myslot->tts_isnull); > + } This seems too exensive to me. I think it'd be better if we instead used the amount of input data consumed for the tuple as a proxy. Does that sound reasonable? Greetings, Andres Freund