On Fri, Nov 27, 2020 at 12:22 PM Luc Vlaming <l...@swarm64.com> wrote:
>
> > I wonder, how can the do_multi_insert() API decide on when to flush, I
> > mean, based on the number of pages to flush? Do we need to pass the
> > maximum number of pages the buffered tuples can occupy and track the
> > pages currently buffered tuples occupy to decide when to flush? Or is
> > it something that the existing table AM infrastructure already
> > supports? If we use the number of pages to decide on when to flush,
> > how well it works with parallel inserts?
>
> I was assuming each tableam to use its own logic, based on its needs and
> the tradeoffs a storage engine might want to provide. This does not mean
> it should not consider outside parameters, like the aforementioned
> memory usage.
> I think it would imply that each tableam implements its own tracking
> mechanism for how much has accumulated, how, and when to flush, because
> they might track different statistics. IMHO given that each tableam
> anyway would want to implement its own logic on how to store a slot into
> a page, tracking the logic for tracking these statistics seemed minor to
> me. Maybe I missed some parts that should be extracted out to a generic
> interface however?

Agree with you that tracking page level or some other info is dependent
on table am implementations.
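To make that concrete, below is a rough sketch of how I imagine a heap-like
do_multi_insert() callback could make its own flush decision. To be clear,
the structure members, the thresholds and the function name here are made
up purely for illustration; they are not taken from the patch:

#include "postgres.h"

#include "access/heapam.h"			/* BulkInsertState */
#include "access/htup_details.h"	/* heap_freetuple() */
#include "access/tableam.h"			/* table_multi_insert() */
#include "access/xact.h"			/* GetCurrentCommandId() */
#include "executor/tuptable.h"		/* slot routines */
#include "utils/memutils.h"			/* MemoryContextReset() */
#include "utils/rel.h"				/* Relation */

/* Purely illustrative thresholds; a real AM would pick its own. */
#define MAX_BUFFERED_SLOTS		1000
#define MAX_BUFFERED_PAGES		64

/*
 * Illustrative state only; the members here are made up and need not
 * match what the patch's MultiInsertStateData actually contains.
 */
typedef struct MultiInsertStateData
{
	Relation	rel;
	BulkInsertState bistate;
	TupleTableSlot *buffered_slots[MAX_BUFFERED_SLOTS];	/* allocated once in
														 * begin_multi_insert() */
	int			nbuffered;
	Size		buffered_bytes;
	MemoryContext flush_context;	/* reset after every flush */
} MultiInsertStateData;

static void
heapam_do_multi_insert(MultiInsertStateData *mistate, TupleTableSlot *slot)
{
	TupleTableSlot *batchslot = mistate->buffered_slots[mistate->nbuffered];
	HeapTuple	tuple;
	bool		shouldfree;

	/* Reuse the pre-allocated buffer slot; no re-allocation after a flush. */
	ExecClearTuple(batchslot);
	ExecCopySlot(batchslot, slot);
	mistate->nbuffered++;

	/* Track roughly how much tuple data has accumulated so far. */
	tuple = ExecFetchSlotHeapTuple(batchslot, true, &shouldfree);
	mistate->buffered_bytes += tuple->t_len;
	if (shouldfree)
		heap_freetuple(tuple);

	/*
	 * AM-specific flush decision: flush once roughly MAX_BUFFERED_PAGES
	 * worth of tuple data has accumulated, or the slot array is full.
	 */
	if (mistate->nbuffered == MAX_BUFFERED_SLOTS ||
		mistate->buffered_bytes >= (Size) MAX_BUFFERED_PAGES * BLCKSZ)
	{
		MemoryContext oldcontext;

		/*
		 * table_multi_insert() may leak memory, so call it in a short-lived
		 * context that is reset after every batch.
		 */
		oldcontext = MemoryContextSwitchTo(mistate->flush_context);
		table_multi_insert(mistate->rel, mistate->buffered_slots,
						   mistate->nbuffered, GetCurrentCommandId(true),
						   0 /* options */ , mistate->bistate);
		MemoryContextSwitchTo(oldcontext);
		MemoryContextReset(mistate->flush_context);

		mistate->nbuffered = 0;
		mistate->buffered_bytes = 0;
	}
}

The point being that which statistics get tracked (tuple count, bytes,
pages or anything else) and the thresholds for flushing stay entirely
inside the AM's callback; the caller just hands over the slots one at a
time.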
> Some examples of why a tableam could decide on its own on when to flush:
> - the current heap implementation could accumulate a few pages (say up
> to 64) and thereby limit the amount of calls to write() and limit the
> accompanying blocks/context switches. This would also then make the
> writes more sequential wrt the processes which can help with the
> flushing I presume, like how the sequential scan was optimized to
> process a consecutive set of blocks per worker (see
> table_block_parallelscan_nextpage).
> - something like zheap could accumulate data based on the amount of
> columns so that a page with column data is completely filled, thereby
> limiting the write amplification.
> - something that would implement an lsm storage might accumulate a full
> in-memory level before flushing it out.

Thanks for the details.

> >> One thing I'm wondering is in which memory context the slots end up
> >> being allocated. I'd assume we would want to keep the slots around
> >> between flushes. If they are in the temporary context this might prove
> >> problematic however?
> >>
> >
> > I should not have used the word temporary; it actually is not
> > temporary. This memory context will be created in begin_multi_insert(),
> > all the buffered tuples are copied using this context, it will be
> > reset at the end of each flush and reused. It can get destroyed at the
> > end in end_multi_insert(). I think we should even do this with the new
> > APIs implementation.
> >
>
> Okay. But in which context are the slots themselves allocated then?
> Because if we allocate the slots themselves in the context and then
> reset with each flush we would have to also re-allocate the slots every
> flush, which seems wasteful to me?

The buffer slots are allocated in the memory context in which the new
APIs get called. We don't have to re-allocate the slots after every
flush; we only have to clear them with ExecClearTuple() and reuse them.

The memory context I specified in the MultiInsertStateData structure is
for calling table_multi_insert() inside the new do_multi_insert API once
we decide to flush. The existing table_multi_insert() call site in the
copy code has a comment saying that table_multi_insert() may leak
memory; that is why we need that temporary memory context, which is
switched to just before table_multi_insert() and reset right after it,
once per batch of tuples. In the end, this context can be deleted in the
end_multi_insert API.

Hope this helps. I'm planning to summarize and post the new APIs
description here again for other opinions.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com