On 06/03/2019 22:06, Paul Guo wrote:
The patch also modifies heap_multi_insert() slightly, as a further code-level optimization: it uses static memory instead of a memory context and dynamic allocation.

If toasting is required, heap_prepare_insert() creates a palloc'd tuple. That is still leaked to the current memory context.

Leaking into the current memory context is not a bad thing, because resetting a memory context is faster than doing a lot of pfree() calls. The callers just need to be prepared for that, and use a short-lived memory context.

By the way, while looking at the code, I noticed that there are 9 large local arrays in toast_insert_or_update(), which seems to be a stack-overflow risk. Maybe we should make them static or global.

Hmm. We currently reserve 512 kB between the kernel's limit and the limit we check in check_stack_depth(). See STACK_DEPTH_SLOP. Those arrays add up to 52800 bytes on a 64-bit machine, if I did my math right. So there's still a lot of headroom. I agree that it nevertheless seems a bit excessive, though.

With the patch,

Time: 4728.142 ms (00:04.728)
Time: 14203.983 ms (00:14.204)
Time: 1008.669 ms (00:01.009)

Baseline,
Time: 11096.146 ms (00:11.096)
Time: 13106.741 ms (00:13.107)
Time: 1100.174 ms (00:01.100)

Nice speedup!

With TOAST and large column sizes there is a regression of less than 10%, but for small column sizes the improvement is very good. If I hard-code the batch count as 4, all test cases improve, but the gain for small column sizes is smaller than with the current patch. The number 4 is quite case-specific, so I cannot hard-code it in the patch. Of course we could tune it further, but the current value seems like a good trade-off?

Have you done any profiling, on why the multi-insert is slower with large tuples? In principle, I don't see why it should be slower.

- Heikki
