On Tue, Mar 26, 2024 at 9:07 PM Jeff Davis <pg...@j-davis.com> wrote: > > On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote: > > I'm thinking > > of dropping the COPY FROM patch using the new multi insert API for > > the > > following reasons: ... > > I agree with all of this. We do want COPY ... FROM support, but there > are some details to work out and we don't want to make a big code > change at this point in the cycle.
Right. > > Please see the attached v14 patches. > > * No need for a 'kind' field in TableModifyState. The state should be > aware of the kinds of changes that it has received and that may need to > be flushed later -- for now, only inserts, but possibly updates/deletes > in the future. Removed 'kind' field with lazy initialization of required AM specific modify (insert in this case) state. Since we don't have 'kind', I chose the callback approach to cleanup the modify (insert in this case) specific state at the end. > * If the AM doesn't support the bulk methods, fall back to retail > inserts instead of throwing an error. For instance, CREATE MATERIALIZED VIEW foo_mv AS SELECT * FROM foo USING bar_tam; doesn't work if bar_tam doesn't have the table_tuple_insert implemented. Similarly, with this new AM, the onus lies on the table AM implementers to provide an implementation for these new AMs even if they just do single inserts. But, I do agree that we must catch this ahead during parse analysis itself, so I've added assertions in GetTableAmRoutine(). > * It seems like this API will eventually replace table_multi_insert and > table_finish_bulk_insert completely. Do those APIs have any advantage > remaining over the new one proposed here? table_multi_insert needs to be there for sure as COPY ... FROM uses it. Not sure if we need to remove the optional callback table_finish_bulk_insert though. Heap AM doesn't implement one, but some other AM might. Having said that, with this new AM, whatever the logic that used to be there in table_finish_bulk_insert previously, table AM implementers will have to move them to table_modify_end. FWIW, I can try writing a test table AM that uses this new AM but just does single inserts, IOW, equivalent to table_tuple_insert(). Thoughts? > * Right now I don't any important use of the flush method. It seems > that could be accomplished in the finish method, and flush could just > be an internal detail when the memory is exhausted. If we find a use > for it later, we can always add it, but right now it seems unnecessary. Firstly, we are not storing CommandId and options in TableModifyState, because we expect CommandId to be changing (per Andres comment). Secondly, we don't want to pass just the CommandId and options to table_modify_end(). Thirdly, one just has to call the table_modify_buffer_flush before the table_modify_end. Do you have any other thoughts here? > * We need to be careful about cases where the command can be successful > but the writes are not flushed. I don't tihnk that's a problem with the > current patch, but we will need to do something here when we expand to > INSERT INTO ... SELECT. You mean, writes are not flushed to the disk? Can you please elaborate why it's different for INSERT INTO ... SELECT and not others? Can't the new flush AM be helpful here to implement any flush related things? Please find the attached v15 patches with the above review comments addressed. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
v15-0001-Introduce-new-table-modify-access-methods.patch
Description: Binary data
v15-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch
Description: Binary data
v15-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch
Description: Binary data