On 1/21/19 11:15 PM, Tomas Vondra wrote:
>
>
> On 1/21/19 7:51 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
>>>
>>>
>>> On 1/21/19 4:33 AM, Tomas Vondra wrote:
>>>>
>>>>
>>>> On 1/21/19 3:12 AM, Andres Freund wrote:
>>>>> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>>>>>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>>>>>> <tomas.von...@2ndquadrant.com
>>>>>>>>>>> <mailto:tomas.von...@2ndquadrant.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Can you also update the docs to mention that the functions
>>>>>>>>>>>     called from
>>>>>>>>>>>     the WHERE clause does not see effects of the COPY itself?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> /Of course, i also add same comment to insertion method selection
>>>>>>>>>>> /
>>>>>>>>>>
>>>>>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pushed, thanks for the patch.
>>>>>>>>
>>>>>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>>>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>>>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>>>>>
>>>>>>>>         /* Triggers and stuff need to be invoked in query
>>>>>>>>         context. */
>>>>>>>>         MemoryContextSwitchTo(oldcontext);
>>>>>>>>
>>>>>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but
>>>>>>> I'll
>>>>>>> fix that tomorrow.
>>>>>>
>>>>>> NP.
>>>>>> On second thought, the problem is probably smaller than I thought at
>>>>>> first, because ExecQual() switches to the econtext's per-tuple memory
>>>>>> context. But it's only reset once for each batch, so there's some
>>>>>> wastage. At least worth a comment.
>>>>>
>>>>> I'm tired, but perhaps it's actually worse - what's being reset currently
>>>>> is the EState's per-tuple context:
>>>>>
>>>>>     if (nBufferedTuples == 0)
>>>>>     {
>>>>>         /*
>>>>>          * Reset the per-tuple exprcontext. We can only do this if the
>>>>>          * tuple buffer is empty. (Calling the context the per-tuple
>>>>>          * memory context is a bit of a misnomer now.)
>>>>>          */
>>>>>         ResetPerTupleExprContext(estate);
>>>>>     }
>>>>>
>>>>> but the quals are evaluated in the ExprContext's:
>>>>>
>>>>> ExecQual(ExprState *state, ExprContext *econtext)
>>>>> ...
>>>>>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>>>>>
>>>>>
>>>>> which is created with:
>>>>>
>>>>> /* Get an EState's per-output-tuple exprcontext, making it if first use */
>>>>> #define GetPerTupleExprContext(estate) \
>>>>>     ((estate)->es_per_tuple_exprcontext ? \
>>>>>      (estate)->es_per_tuple_exprcontext : \
>>>>>      MakePerTupleExprContext(estate))
>>>>>
>>>>> and creates its own context:
>>>>>
>>>>>     /*
>>>>>      * Create working memory for expression evaluation in this context.
>>>>>      */
>>>>>     econtext->ecxt_per_tuple_memory =
>>>>>         AllocSetContextCreate(estate->es_query_cxt,
>>>>>                               "ExprContext",
>>>>>                               ALLOCSET_DEFAULT_SIZES);
>>>>>
>>>>> so this is currently just never reset.
>>>>
>>>> Actually, no. The ResetPerTupleExprContext boils down to
>>>>
>>>>     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
>>>>
>>>> and ExecEvalExprSwitchContext does this
>>>>
>>>>     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>>>>
>>>> So it's resetting the right context, although only on batch boundary.
>>
>>>>> Seems just using ExecQualAndReset() ought to be sufficient?
>>>>>
>>>>
>>>> That may still be the right thing to do.
>>>>
>>>
>>> Actually, no, because that would reset the context far too early (and
>>> it's easy to trigger segfaults). So the reset would have to happen after
>>> processing the row, not this early.
>>
>> Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
>> up the pluggable storage patch into individual pieces...
>>
>>
>>> But I think the current behavior is actually OK, as it matches what we
>>> do for defexprs. And the comment before ResetPerTupleExprContext says this:
>>>
>>>     /*
>>>      * Reset the per-tuple exprcontext. We can only do this if the
>>>      * tuple buffer is empty. (Calling the context the per-tuple
>>>      * memory context is a bit of a misnomer now.)
>>>      */
>>>
>>> So the per-tuple context is not quite per-tuple anyway. Sure, we might
>>> rework that but I don't think that's an issue in this patch.
>>
>> I'm *not* convinced by this. I think it's bad enough that we do this for
>> normal COPY, but for WHEN, we could end up *never* resetting before the
>> end. Consider a case where a single tuple is inserted, and then *all*
>> rows are filtered. I think this needs a separate econtext that's reset
>> every round. Or alternatively you could fix the code not to rely on
>> per-tuple not being reset when tuples are buffered - that actually ought
>> to be fairly simple.
>>
>
> I think separating the per-tuple and per-batch contexts is the right
> thing to do, here. It seems the batching was added somewhat later and
> using the per-tuple context is rather confusing.
>
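The scenario Andres describes (one tuple buffered, every later row filtered out, so the "reset only when the buffer is empty" condition never fires again) can be reproduced with a toy byte-counting arena. All names here (Region, peak_scratch_old_rule, etc.) are hypothetical stand-ins for illustration only, not PostgreSQL's real MemoryContext API:

```c
#include <assert.h>   /* for the assertions in the usage example */
#include <stddef.h>

/* Hypothetical stand-in for a memory context: we only track bytes used. */
typedef struct Region { size_t used; } Region;

static void region_reset(Region *r) { r->used = 0; }
static void region_alloc(Region *r, size_t n) { r->used += n; }

/*
 * Old rule: reset the per-row scratch context only when the tuple buffer
 * is empty.  One row is buffered up front, then every later row is
 * filtered out by the WHERE qual, so nbuffered stays at 1 and the reset
 * never fires again -- scratch usage grows with the number of rows.
 */
size_t peak_scratch_old_rule(int nrows, size_t per_row)
{
    Region scratch = {0};
    int nbuffered = 0;
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        if (nbuffered == 0)
            region_reset(&scratch);       /* only on an empty buffer */
        region_alloc(&scratch, per_row);  /* qual evaluation scratch */
        if (i == 0)
            nbuffered++;                  /* only the first row passes */
        if (scratch.used > peak)
            peak = scratch.used;
    }
    return peak;
}

/* Proposed rule: reset the per-row scratch on every iteration. */
size_t peak_scratch_per_row(int nrows, size_t per_row)
{
    Region scratch = {0};
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        region_reset(&scratch);
        region_alloc(&scratch, per_row);
        if (scratch.used > peak)
            peak = scratch.used;
    }
    return peak;
}
```

Under the old rule the peak grows linearly with the row count; with a per-row reset it stays at one row's worth of scratch, which is the behavior the separate econtext buys.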
OK, here is a WIP patch doing that. It creates a new "batch" context and allocates tuples in it (instead of the per-tuple context). The per-tuple context is now always reset, irrespective of nBufferedTuples, and the batch context is reset every time the batch is emptied.

It turned out to be a tad more complex due to partitioning, because when we find the partitions do not match, the tuple is already allocated in the "current" context (be it per-tuple or batch), so we can't just free the whole context at that point. The old code worked around this by alternating two contexts, but that seems a bit too cumbersome to me, so the patch simply copies the tuple to the new context. That allows us to always reset the batch context right after emptying the buffer. I need to do some benchmarking to see if the extra copy causes any regression.

Overall, separating the contexts makes things quite a bit clearer. I'm not entirely happy about the per-tuple context being "implicit" (hidden in the executor state) while the batch context is explicitly created, but there's not much I can do about that.

The patch also includes the fix correcting the volatility check on the WHERE clause, although that should be committed separately.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
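The copy-out / reset / copy-back dance described above can be sketched with two toy bump arenas (hypothetical Arena helpers, not PostgreSQL's real MemoryContext or heap_copytuple API): the pending tuple lives in the batch arena, so before that arena is reset wholesale it is copied into the per-tuple arena, and afterwards copied back so it ends up in the now-empty batch arena with the rest of the next batch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bump arena; a reset invalidates everything allocated in it. */
typedef struct Arena { char buf[4096]; size_t used; } Arena;

static void *arena_alloc(Arena *a, size_t n)
{
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

static void arena_reset(Arena *a)
{
    memset(a->buf, 0, sizeof(a->buf));  /* poison, to catch stale pointers */
    a->used = 0;
}

/* A string stands in for a heap tuple here. */
static char *arena_strdup(Arena *a, const char *s)
{
    char *p = arena_alloc(a, strlen(s) + 1);
    strcpy(p, s);
    return p;
}

/*
 * Flush-time handling of the one tuple that belongs to the *next* batch:
 * copy it out of the batch arena, reset the arena, copy it back in.
 */
char *carry_tuple_across_flush(Arena *batch, Arena *per_tuple, char *tuple)
{
    /* 1. copy the tuple out of the batch arena, into per-tuple memory */
    tuple = arena_strdup(per_tuple, tuple);

    /* 2. now the whole batch arena can be reset at once */
    arena_reset(batch);

    /* 3. copy the tuple back into the (fresh) batch arena */
    return arena_strdup(batch, tuple);
}
```

This is the extra copy the patch mentions benchmarking: two short memcpy-sized copies per partition change, in exchange for being able to reset the batch context unconditionally.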
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c410e0a0dd..68d9409aef 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2323,9 +2323,9 @@ CopyFrom(CopyState cstate)
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
+	MemoryContext batchcxt;
 	PartitionTupleRouting *proute = NULL;
-	ExprContext *secondaryExprContext = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0;	/* start with default heap_insert options */
@@ -2612,8 +2612,7 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (cstate->whereClause != NULL ||
-			 contain_volatile_functions(cstate->whereClause))
+	else if (contain_volatile_functions(cstate->whereClause))
 	{
 		/*
 		 * Can't support multi-inserts if there are any volatile function
@@ -2640,20 +2639,10 @@ CopyFrom(CopyState cstate)
 		 * Normally, when performing bulk inserts we just flush the insert
 		 * buffer whenever it becomes full, but for the partitioned table
 		 * case, we flush it whenever the current tuple does not belong to the
-		 * same partition as the previous tuple, and since we flush the
-		 * previous partition's buffer once the new tuple has already been
-		 * built, we're unable to reset the estate since we'd free the memory
-		 * in which the new tuple is stored.  To work around this we maintain
-		 * a secondary expression context and alternate between these when the
-		 * partition changes.  This does mean we do store the first new tuple
-		 * in a different context than subsequent tuples, but that does not
-		 * matter, providing we don't free anything while it's still needed.
+		 * same partition as the previous tuple.
 		 */
 		if (proute)
-		{
 			insertMethod = CIM_MULTI_CONDITIONAL;
-			secondaryExprContext = CreateExprContext(estate);
-		}
 		else
 			insertMethod = CIM_MULTI;
@@ -2686,6 +2675,14 @@ CopyFrom(CopyState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
+	/*
+	 * Set up memory context for batches (in CIM_SINGLE mode this is equal
+	 * to per-tuple context, effectively).
+	 */
+	batchcxt = AllocSetContextCreate(CurrentMemoryContext,
+									 "copy batch context",
+									 ALLOCSET_DEFAULT_SIZES);
+
 	for (;;)
 	{
 		TupleTableSlot *slot;
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (nBufferedTuples == 0)
-		{
-			/*
-			 * Reset the per-tuple exprcontext. We can only do this if the
-			 * tuple buffer is empty. (Calling the context the per-tuple
-			 * memory context is a bit of a misnomer now.)
-			 */
-			ResetPerTupleExprContext(estate);
-		}
+		/* Reset the per-tuple exprcontext. */
+		ResetPerTupleExprContext(estate);
 
 		/* Switch into its memory context */
-		MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		MemoryContextSwitchTo(batchcxt);
 
 		if (!NextCopyFrom(cstate, econtext, values, nulls))
 			break;
@@ -2757,7 +2747,7 @@ CopyFrom(CopyState cstate)
 					 */
 					if (nBufferedTuples > 0)
 					{
-						ExprContext *swapcontext;
+						MemoryContext oldcontext;
 
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 											prevResultRelInfo, myslot, bistate,
 											nBufferedTuples, bufferedTuples,
 											firstBufferedLineNo);
 						nBufferedTuples = 0;
 						bufferedTuplesSize = 0;
 
-						Assert(secondaryExprContext);
-
 						/*
-						 * Normally we reset the per-tuple context whenever
-						 * the bufferedTuples array is empty at the beginning
-						 * of the loop, however, it is possible since we flush
-						 * the buffer here that the buffer is never empty at
-						 * the start of the loop.  To prevent the per-tuple
-						 * context from never being reset we maintain a second
-						 * context and alternate between them when the
-						 * partition changes.  We can now reset
-						 * secondaryExprContext as this is no longer needed,
-						 * since we just flushed any tuples stored in it.  We
-						 * also now switch over to the other context.  This
-						 * does mean that the first tuple in the buffer won't
-						 * be in the same context as the others, but that does
-						 * not matter since we only reset it after the flush.
+						 * The tuple is allocated in the batch context, but we
+						 * want to reset that (and keep the tuple). So we copy
+						 * the tuple into the per-tuple context, do the reset
+						 * and then copy the tuple back.
 						 */
-						ReScanExprContext(secondaryExprContext);
+						oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+						tuple = heap_copytuple(tuple);
+						MemoryContextSwitchTo(oldcontext);
+
+						/* free tuples from the batch we just processed */
+						MemoryContextReset(batchcxt);
 
-						swapcontext = secondaryExprContext;
-						secondaryExprContext = estate->es_per_tuple_exprcontext;
-						estate->es_per_tuple_exprcontext = swapcontext;
+						/* copy the tuple back into the batch context */
+						oldcontext = MemoryContextSwitchTo(batchcxt);
+						tuple = heap_copytuple(tuple);
+						MemoryContextSwitchTo(oldcontext);
+
+						/* and also store the copied tuple into the slot */
+						ExecStoreHeapTuple(tuple, slot, false);
 					}
 
 					nPartitionChanges++;
@@ -2894,10 +2881,10 @@ CopyFrom(CopyState cstate)
 					slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
 
 				/*
-				 * Get the tuple in the per-tuple context, so that it will be
+				 * Get the tuple in the per-batch context, so that it will be
 				 * freed after each batch insert.
 				 */
-				oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+				oldcontext = MemoryContextSwitchTo(batchcxt);
 				tuple = ExecCopySlotHeapTuple(slot);
 				MemoryContextSwitchTo(oldcontext);
 			}
@@ -2973,6 +2960,9 @@ CopyFrom(CopyState cstate)
 										firstBufferedLineNo);
 					nBufferedTuples = 0;
 					bufferedTuplesSize = 0;
+
+					/* free memory occupied by tuples from the batch */
+					MemoryContextReset(batchcxt);
 				}
 			}
 			else
@@ -3054,6 +3044,8 @@ CopyFrom(CopyState cstate)
 
 	MemoryContextSwitchTo(oldcontext);
 
+	MemoryContextDelete(batchcxt);
+
 	/*
 	 * In the old protocol, tell pqcomm that we can process normal protocol
 	 * messages again.
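Taken together, the patched loop's memory discipline looks like this in miniature (again with hypothetical byte-counting stand-ins rather than real MemoryContexts): the per-tuple scratch is reset on every row, buffered tuples accumulate in the batch context, and the batch context is reset whenever the buffer is flushed, so both stay bounded regardless of how many rows are copied:

```c
#include <assert.h>   /* for the assertions in the usage example */
#include <stddef.h>

/* Hypothetical stand-in for a memory context: tracks bytes used only. */
typedef struct Ctx { size_t used; } Ctx;

static void ctx_reset(Ctx *c) { c->used = 0; }
static void ctx_alloc(Ctx *c, size_t n) { c->used += n; }

/*
 * Simulate the patched CopyFrom loop and return the peak number of bytes
 * ever live in the batch context.  With the per-flush reset, the peak is
 * batch_size * tuple_size no matter how large nrows gets.
 */
size_t copy_loop_peak_batch(int nrows, int batch_size, size_t tuple_size)
{
    Ctx per_tuple = {0};
    Ctx batch = {0};
    int nbuffered = 0;
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        ctx_reset(&per_tuple);             /* reset always, once per row */
        ctx_alloc(&per_tuple, tuple_size); /* parse / qual scratch */
        ctx_alloc(&batch, tuple_size);     /* buffered tuple copy */

        if (batch.used > peak)
            peak = batch.used;

        if (++nbuffered == batch_size)
        {
            /* flush the buffer, then free all its tuples at once */
            nbuffered = 0;
            ctx_reset(&batch);
        }
    }
    return peak;
}
```

The peak stays fixed at one batch's worth of tuples even as the row count grows, which is the property the old alternating-context scheme achieved less directly.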