On 06/03/2019 22:06, Paul Guo wrote:
The patch also modifies heap_multi_insert() slightly, as a further code-level optimization: it uses static memory instead of a memory context and dynamic allocation.

If toasting is required, heap_prepare_insert() creates a palloc'd tuple. That is still leaked to the current memory context.

Leaking into the current memory context is not a bad thing, because resetting a memory context is faster than doing a lot of pfree() calls. The callers just need to be prepared for that, and use a short-lived memory context.

By the way, while looking at the code, I noticed that there are 9 large local arrays in toast_insert_or_update(), which seems to be a stack-overflow risk. Maybe we should make them static or global.

Hmm. We currently reserve 512 kB between the kernel's limit and the limit we check in check_stack_depth(). See STACK_DEPTH_SLOP. Those arrays add up to 52800 bytes on a 64-bit machine, if I did my math right. So there's still a lot of headroom. I agree that it nevertheless seems a bit excessive, though.

With the patch,

Time: 4728.142 ms (00:04.728)
Time: 14203.983 ms (00:14.204)
Time: 1008.669 ms (00:01.009)

Baseline,
Time: 11096.146 ms (00:11.096)
Time: 13106.741 ms (00:13.107)
Time: 1100.174 ms (00:01.100)

Nice speedup!

With TOAST and large column sizes there is a regression of less than 10%, but for small column sizes the improvement is very good. If I hard-code the batch count as 4, all test cases improve, but the gain for small column sizes is smaller than with the current patch. The number 4 is quite case-specific, so I cannot hard-code it in the patch. Of course we could tune it further, but the current value seems like a good trade-off?

Have you done any profiling, on why the multi-insert is slower with large tuples? In principle, I don't see why it should be slower.

- Heikki
