On 1/21/19 11:15 PM, Tomas Vondra wrote:
>
>
> On 1/21/19 7:51 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
>>>
>>>
>>> On 1/21/19 4:33 AM, Tomas Vondra wrote:
>>>>
>>>>
>>>> On 1/21/19 3:12 AM, Andres Freund wrote:
>>>>> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>>>>>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>>>>>> <tomas.von...@2ndquadrant.com
>>>>>>>>>>> <mailto:tomas.von...@2ndquadrant.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Can you also update the docs to mention that the functions
>>>>>>>>>>>     called from
>>>>>>>>>>>     the WHERE clause does not see effects of the COPY itself?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> /Of course, i also add same comment to insertion method selection
>>>>>>>>>>> /
>>>>>>>>>>
>>>>>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pushed, thanks for the patch.
>>>>>>>>
>>>>>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>>>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>>>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>>>>>
>>>>>>>>         /* Triggers and stuff need to be invoked in query
>>>>>>>>         context. */
>>>>>>>>         MemoryContextSwitchTo(oldcontext);
>>>>>>>>
>>>>>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but
>>>>>>> I'll
>>>>>>> fix that tomorrow.
>>>>>>
>>>>>> NP.
>>>>>> On second thought, the problem is probably smaller than I thought at
>>>>>> first, because ExecQual() switches to the econtext's per-tuple memory
>>>>>> context. But it's only reset once for each batch, so there's some
>>>>>> wastage. At least worth a comment.
>>>>>
>>>>> I'm tired, but perhaps it's actually worse - what's being reset currently
>>>>> is the EState's per-tuple context:
>>>>>
>>>>>     if (nBufferedTuples == 0)
>>>>>     {
>>>>>         /*
>>>>>          * Reset the per-tuple exprcontext. We can only do this if the
>>>>>          * tuple buffer is empty. (Calling the context the per-tuple
>>>>>          * memory context is a bit of a misnomer now.)
>>>>>          */
>>>>>         ResetPerTupleExprContext(estate);
>>>>>     }
>>>>>
>>>>> but the quals are evaluated in the ExprContext's:
>>>>>
>>>>> ExecQual(ExprState *state, ExprContext *econtext)
>>>>> ...
>>>>>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>>>>>
>>>>>
>>>>> which is created with:
>>>>>
>>>>> /* Get an EState's per-output-tuple exprcontext, making it if first use */
>>>>> #define GetPerTupleExprContext(estate) \
>>>>>     ((estate)->es_per_tuple_exprcontext ? \
>>>>>      (estate)->es_per_tuple_exprcontext : \
>>>>>      MakePerTupleExprContext(estate))
>>>>>
>>>>> and creates its own context:
>>>>>
>>>>>     /*
>>>>>      * Create working memory for expression evaluation in this context.
>>>>>      */
>>>>>     econtext->ecxt_per_tuple_memory =
>>>>>         AllocSetContextCreate(estate->es_query_cxt,
>>>>>                               "ExprContext",
>>>>>                               ALLOCSET_DEFAULT_SIZES);
>>>>>
>>>>> so this is currently just never reset.
>>>>
>>>> Actually, no. The ResetPerTupleExprContext boils down to
>>>>
>>>>     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
>>>>
>>>> and ExecEvalExprSwitchContext does this
>>>>
>>>>     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>>>>
>>>> So it's resetting the right context, although only on batch boundary.
>>
>>>>> Seems just using ExecQualAndReset() ought to be sufficient?
>>>>>
>>>>
>>>> That may still be the right thing to do.
>>>>
>>>
>>> Actually, no, because that would reset the context far too early (and
>>> it's easy to trigger segfaults). So the reset would have to happen after
>>> processing the row, not this early.
>>
>> Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
>> up the pluggable storage patch into individual pieces...
>>
>>
>>> But I think the current behavior is actually OK, as it matches what we
>>> do for defexprs. And the comment before ResetPerTupleExprContext says this:
>>>
>>>     /*
>>>      * Reset the per-tuple exprcontext. We can only do this if the
>>>      * tuple buffer is empty. (Calling the context the per-tuple
>>>      * memory context is a bit of a misnomer now.)
>>>      */
>>>
>>> So the per-tuple context is not quite per-tuple anyway. Sure, we might
>>> rework that but I don't think that's an issue in this patch.
>>
>> I'm *not* convinced by this. I think it's bad enough that we do this for
>> normal COPY, but for WHEN, we could end up *never* resetting before the
>> end. Consider a case where a single tuple is inserted, and then *all*
>> rows are filtered. I think this needs a separate econtext that's reset
>> every round. Or alternatively you could fix the code not to rely on
>> per-tuple not being reset when tuples are buffered - that actually ought
>> to be fairly simple.
>>
>
> I think separating the per-tuple and per-batch contexts is the right
> thing to do, here. It seems the batching was added somewhat later and
> using the per-tuple context is rather confusing.
>
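The scenario Andres describes (one tuple buffered, every later row filtered out, so the "reset only when the buffer is empty" condition never fires again) can be reproduced with a toy byte-counting arena. All names here (Region, peak_scratch_old_rule, etc.) are hypothetical stand-ins for illustration only, not PostgreSQL's real MemoryContext API:

```c
#include <assert.h>   /* for the assertions in the usage example */
#include <stddef.h>

/* Hypothetical stand-in for a memory context: we only track bytes used. */
typedef struct Region { size_t used; } Region;

static void region_reset(Region *r) { r->used = 0; }
static void region_alloc(Region *r, size_t n) { r->used += n; }

/*
 * Old rule: reset the per-row scratch context only when the tuple buffer
 * is empty.  One row is buffered up front, then every later row is
 * filtered out by the WHERE qual, so nbuffered stays at 1 and the reset
 * never fires again -- scratch usage grows with the number of rows.
 */
size_t peak_scratch_old_rule(int nrows, size_t per_row)
{
    Region scratch = {0};
    int nbuffered = 0;
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        if (nbuffered == 0)
            region_reset(&scratch);       /* only on an empty buffer */
        region_alloc(&scratch, per_row);  /* qual evaluation scratch */
        if (i == 0)
            nbuffered++;                  /* only the first row passes */
        if (scratch.used > peak)
            peak = scratch.used;
    }
    return peak;
}

/* Proposed rule: reset the per-row scratch on every iteration. */
size_t peak_scratch_per_row(int nrows, size_t per_row)
{
    Region scratch = {0};
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        region_reset(&scratch);
        region_alloc(&scratch, per_row);
        if (scratch.used > peak)
            peak = scratch.used;
    }
    return peak;
}
```

Under the old rule the peak grows linearly with the row count; with a per-row reset it stays at one row's worth of scratch, which is the behavior the separate econtext buys.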
OK, here is a WIP patch doing that. It creates a new "batch" context and allocates tuples in it (instead of the per-tuple context). The per-tuple context is now always reset, irrespective of nBufferedTuples, and the batch context is reset every time the batch is emptied.

It turned out to be a tad more complex due to partitioning, because when we find the partitions do not match, the tuple is already allocated in the "current" context (be it per-tuple or batch), so we can't just free the whole context at that point. The old code worked around this by alternating two contexts, but that seems a bit too cumbersome to me, so the patch simply copies the tuple to the new context. That allows us to always reset the batch context right after emptying the buffer. I need to do some benchmarking to see if the extra copy causes any regression.

Overall, separating the contexts makes things quite a bit clearer. I'm not entirely happy about the per-tuple context being "implicit" (hidden in the executor state) while the batch context is explicitly created, but there's not much I can do about that.

The patch also includes the fix correcting the volatility check on the WHERE clause, although that should be committed separately.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
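The copy-out / reset / copy-back dance described above can be sketched with two toy bump arenas (hypothetical Arena helpers, not PostgreSQL's real MemoryContext or heap_copytuple API): the pending tuple lives in the batch arena, so before that arena is reset wholesale it is copied into the per-tuple arena, and afterwards copied back so it ends up in the now-empty batch arena with the rest of the next batch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bump arena; a reset invalidates everything allocated in it. */
typedef struct Arena { char buf[4096]; size_t used; } Arena;

static void *arena_alloc(Arena *a, size_t n)
{
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

static void arena_reset(Arena *a)
{
    memset(a->buf, 0, sizeof(a->buf));  /* poison, to catch stale pointers */
    a->used = 0;
}

/* A string stands in for a heap tuple here. */
static char *arena_strdup(Arena *a, const char *s)
{
    char *p = arena_alloc(a, strlen(s) + 1);
    strcpy(p, s);
    return p;
}

/*
 * Flush-time handling of the one tuple that belongs to the *next* batch:
 * copy it out of the batch arena, reset the arena, copy it back in.
 */
char *carry_tuple_across_flush(Arena *batch, Arena *per_tuple, char *tuple)
{
    /* 1. copy the tuple out of the batch arena, into per-tuple memory */
    tuple = arena_strdup(per_tuple, tuple);

    /* 2. now the whole batch arena can be reset at once */
    arena_reset(batch);

    /* 3. copy the tuple back into the (fresh) batch arena */
    return arena_strdup(batch, tuple);
}
```

This is the extra copy the patch mentions benchmarking: two short memcpy-sized copies per partition change, in exchange for being able to reset the batch context unconditionally.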
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c410e0a0dd..68d9409aef 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2323,9 +2323,9 @@ CopyFrom(CopyState cstate)
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
+	MemoryContext batchcxt;
 	PartitionTupleRouting *proute = NULL;
-	ExprContext *secondaryExprContext = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0;	/* start with default heap_insert options */
@@ -2612,8 +2612,7 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
-	else if (cstate->whereClause != NULL ||
-			 contain_volatile_functions(cstate->whereClause))
+	else if (contain_volatile_functions(cstate->whereClause))
 	{
 		/*
 		 * Can't support multi-inserts if there are any volatile function
@@ -2640,20 +2639,10 @@ CopyFrom(CopyState cstate)
 		 * Normally, when performing bulk inserts we just flush the insert
 		 * buffer whenever it becomes full, but for the partitioned table
 		 * case, we flush it whenever the current tuple does not belong to the
-		 * same partition as the previous tuple, and since we flush the
-		 * previous partition's buffer once the new tuple has already been
-		 * built, we're unable to reset the estate since we'd free the memory
-		 * in which the new tuple is stored.  To work around this we maintain
-		 * a secondary expression context and alternate between these when the
-		 * partition changes.  This does mean we do store the first new tuple
-		 * in a different context than subsequent tuples, but that does not
-		 * matter, providing we don't free anything while it's still needed.
+		 * same partition as the previous tuple.
 		 */
 		if (proute)
-		{
 			insertMethod = CIM_MULTI_CONDITIONAL;
-			secondaryExprContext = CreateExprContext(estate);
-		}
 		else
 			insertMethod = CIM_MULTI;
@@ -2686,6 +2675,14 @@ CopyFrom(CopyState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
+	/*
+	 * Set up memory context for batches (in CIM_SINGLE mode this is equal
+	 * to per-tuple context, effectively).
+	 */
+	batchcxt = AllocSetContextCreate(CurrentMemoryContext,
+									 "copy batch context",
+									 ALLOCSET_DEFAULT_SIZES);
+
 	for (;;)
 	{
 		TupleTableSlot *slot;
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (nBufferedTuples == 0)
-		{
-			/*
-			 * Reset the per-tuple exprcontext. We can only do this if the
-			 * tuple buffer is empty. (Calling the context the per-tuple
-			 * memory context is a bit of a misnomer now.)
-			 */
-			ResetPerTupleExprContext(estate);
-		}
+		/* Reset the per-tuple exprcontext. */
+		ResetPerTupleExprContext(estate);
 
 		/* Switch into its memory context */
-		MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+		MemoryContextSwitchTo(batchcxt);
 
 		if (!NextCopyFrom(cstate, econtext, values, nulls))
 			break;
@@ -2757,7 +2747,7 @@ CopyFrom(CopyState cstate)
 					 */
 					if (nBufferedTuples > 0)
 					{
-						ExprContext *swapcontext;
+						MemoryContext oldcontext;
 
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 											prevResultRelInfo, myslot, bistate,
 											nBufferedTuples, bufferedTuples,
 											firstBufferedLineNo);
 						nBufferedTuples = 0;
 						bufferedTuplesSize = 0;
 
-						Assert(secondaryExprContext);
-
 						/*
-						 * Normally we reset the per-tuple context whenever
-						 * the bufferedTuples array is empty at the beginning
-						 * of the loop, however, it is possible since we flush
-						 * the buffer here that the buffer is never empty at
-						 * the start of the loop.  To prevent the per-tuple
-						 * context from never being reset we maintain a second
-						 * context and alternate between them when the
-						 * partition changes.  We can now reset
-						 * secondaryExprContext as this is no longer needed,
-						 * since we just flushed any tuples stored in it.  We
-						 * also now switch over to the other context.  This
-						 * does mean that the first tuple in the buffer won't
-						 * be in the same context as the others, but that does
-						 * not matter since we only reset it after the flush.
+						 * The tuple is allocated in the batch context, but we
+						 * want to reset that (and keep the tuple). So we copy
+						 * the tuple into the per-tuple context, do the reset
+						 * and then copy the tuple back.
 						 */
-						ReScanExprContext(secondaryExprContext);
+						oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+						tuple = heap_copytuple(tuple);
+						MemoryContextSwitchTo(oldcontext);
+
+						/* free tuples from the batch we just processed */
+						MemoryContextReset(batchcxt);
 
-						swapcontext = secondaryExprContext;
-						secondaryExprContext = estate->es_per_tuple_exprcontext;
-						estate->es_per_tuple_exprcontext = swapcontext;
+						/* copy the tuple back into the batch context */
+						oldcontext = MemoryContextSwitchTo(batchcxt);
+						tuple = heap_copytuple(tuple);
+						MemoryContextSwitchTo(oldcontext);
+
+						/* and also store the copied tuple into the slot */
+						ExecStoreHeapTuple(tuple, slot, false);
 					}
 
 					nPartitionChanges++;
@@ -2894,10 +2881,10 @@ CopyFrom(CopyState cstate)
 					slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
 
 				/*
-				 * Get the tuple in the per-tuple context, so that it will be
+				 * Get the tuple in the per-batch context, so that it will be
 				 * freed after each batch insert.
 				 */
-				oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+				oldcontext = MemoryContextSwitchTo(batchcxt);
 				tuple = ExecCopySlotHeapTuple(slot);
 				MemoryContextSwitchTo(oldcontext);
 			}
@@ -2973,6 +2960,9 @@ CopyFrom(CopyState cstate)
 										firstBufferedLineNo);
 					nBufferedTuples = 0;
 					bufferedTuplesSize = 0;
+
+					/* free memory occupied by tuples from the batch */
+					MemoryContextReset(batchcxt);
 				}
 			}
 			else
@@ -3054,6 +3044,8 @@ CopyFrom(CopyState cstate)
 
 	MemoryContextSwitchTo(oldcontext);
 
+	MemoryContextDelete(batchcxt);
+
 	/*
 	 * In the old protocol, tell pqcomm that we can process normal protocol
 	 * messages again.
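Taken together, the patched loop's memory discipline looks like this in miniature (again with hypothetical byte-counting stand-ins rather than real MemoryContexts): the per-tuple scratch is reset on every row, buffered tuples accumulate in the batch context, and the batch context is reset whenever the buffer is flushed, so both stay bounded regardless of how many rows are copied:

```c
#include <assert.h>   /* for the assertions in the usage example */
#include <stddef.h>

/* Hypothetical stand-in for a memory context: tracks bytes used only. */
typedef struct Ctx { size_t used; } Ctx;

static void ctx_reset(Ctx *c) { c->used = 0; }
static void ctx_alloc(Ctx *c, size_t n) { c->used += n; }

/*
 * Simulate the patched CopyFrom loop and return the peak number of bytes
 * ever live in the batch context.  With the per-flush reset, the peak is
 * batch_size * tuple_size no matter how large nrows gets.
 */
size_t copy_loop_peak_batch(int nrows, int batch_size, size_t tuple_size)
{
    Ctx per_tuple = {0};
    Ctx batch = {0};
    int nbuffered = 0;
    size_t peak = 0;

    for (int i = 0; i < nrows; i++)
    {
        ctx_reset(&per_tuple);             /* reset always, once per row */
        ctx_alloc(&per_tuple, tuple_size); /* parse / qual scratch */
        ctx_alloc(&batch, tuple_size);     /* buffered tuple copy */

        if (batch.used > peak)
            peak = batch.used;

        if (++nbuffered == batch_size)
        {
            /* flush the buffer, then free all its tuples at once */
            nbuffered = 0;
            ctx_reset(&batch);
        }
    }
    return peak;
}
```

The peak stays fixed at one batch's worth of tuples even as the row count grows, which is the property the old alternating-context scheme achieved less directly.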