From: Tomas Vondra <tomas.von...@enterprisedb.com>
> Unfortunately, this does not compile for me, because nodeModifyTable calls
> ExecGetTouchedPartitions, which is not defined anywhere. Not sure what that's
> about, so I simply commented it out. That probably fails the partitioned
> cases, but it allowed me to do some review and testing.

Ouch, sorry.  I'm ashamed to have forgotten to include execPartition.c.


> There are a couple of other smaller changes. E.g. it undoes the changes to
> finish_foreign_modify, and instead calls separate functions to prepare the
> bulk statement. It also adds list_make5/list_make6 macros, so as to not have
> to do strange stuff with the parameter lists.

Thanks, I'll gladly take them!  I wonder why I didn't think of separating
deallocate_query() from finish_foreign_modify() ... perhaps my brain was dying.
As for list_make5/6(), I saw your first patch avoided adding them, so I thought
you found them ugly (and I felt so, too.)  But thinking about it now, there's
no reason to hesitate.
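
For reference, here is a rough sketch of what the list_make5() addition could
look like, simply following the existing list_make1() .. list_make4() pattern
in pg_list.h and list.c (your patch may of course shape it differently, and
list_make6() would be analogous):

/* pg_list.h: new constructor, mirroring the existing list_make4() macro */
#define list_make5(x1, x2, x3, x4, x5) \
	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
					list_make_ptr_cell(x5))

/* list.c: the corresponding implementation function */
List *
list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
				ListCell datum3, ListCell datum4, ListCell datum5)
{
	List	   *list = new_list(t, 5);

	list->elements[0] = datum1;
	list->elements[1] = datum2;
	list->elements[2] = datum3;
	list->elements[3] = datum4;
	list->elements[4] = datum5;
	check_list_invariants(list);
	return list;
}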


> A finally, this should probably add a bunch of regression tests.

Sure.


> 1) As I mentioned before, I really don't think we should be doing deparsing in
> execute_foreign_modify - that's something that should happen earlier, and
> should be in a deparse.c function.
...
> The attached patch tries to address both of these points.
> 
> Firstly, it adds a new deparseBulkInsertSql function, that builds a query for
> the "full" batch, and then uses those two queries - when we get a full batch
> we use the bulk query, otherwise we use the single-row query in a loop. IMO
> this is cleaner than deparsing queries ad hoc in the execute_foreign_modify.
...
> Of course, this might be worse when we don't have a full batch, e.g. for a
> query that inserts only 50 rows with batch_size=100. If this case is common,
> one option would be lowering the batch_size accordingly. If we really want to
> improve this case too, I suggest we pass more info than just a position of
> the VALUES clause - that seems a bit too hackish.
...
> Secondly, it adds the batch_size option to server/foreign table, and uses
> that. This is not complete, though. postgresPlanForeignModify passes a
> hard-coded value at the moment; it needs to look up the correct value for the
> server/table from RelOptInfo or something. And I suppose the ModifyTable
> infrastructure will need to determine the value in order to pass the correct
> number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct the query 
and prepare it in execute_foreign_modify() because:

* Some of our customers use bulk insert in ECPG (INSERT ... VALUES (record1),
(record2), ...) to insert a variable number of records per query.  (Oracle's
Pro*C has such a feature.)  So, I want to be prepared to enable such a thing
with FDW.

* The number of records to insert is not known during planning (in general), so
it feels natural to prepare the statement during the execution phase, or at
least not unnatural.

* I wanted to avoid the overhead of building the full query string for a
100-record insert statement during query planning, because it may be a bit
costly for the usual 1-record inserts.  (The overhead may be hidden behind the
high communication cost of postgres_fdw, though.)

So, in terms of code cleanliness, how about moving my code for rebuilding the
query string from execute_foreign_modify() to some new function in deparse.c?
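
Concretely, I am thinking of something along these lines; the function name,
placement and exact signature are only a sketch of the idea, not what the
patch will necessarily end up with:

/*
 * Sketch of a possible deparse.c helper (names and signature are
 * illustrative only): extend the single-row "INSERT ... VALUES ($1, $2)"
 * built at plan time with additional parameter lists, one per extra row in
 * the batch, so that all SQL construction stays in deparse.c.
 */
#include "postgres.h"
#include "lib/stringinfo.h"

static void
rebuild_insert_sql(StringInfo buf,		/* initialized by the caller */
				   const char *orig_query,
				   int values_end_len,	/* offset just past the first (...) list */
				   int num_params,		/* parameters per row */
				   int num_rows)		/* rows in this batch */
{
	int			pindex = num_params + 1;	/* next parameter number to use */
	int			i;
	int			j;

	/* Copy the original statement up to and including the first VALUES list. */
	appendBinaryStringInfo(buf, orig_query, values_end_len);

	/* Append one parameter list per additional row. */
	for (i = 1; i < num_rows; i++)
	{
		appendStringInfoString(buf, ", (");
		for (j = 0; j < num_params; j++)
			appendStringInfo(buf, "%s$%d", (j > 0) ? ", " : "", pindex++);
		appendStringInfoChar(buf, ')');
	}

	/* Keep whatever followed the first VALUES list, e.g. a RETURNING clause. */
	appendStringInfoString(buf, orig_query + values_end_len);
}

execute_foreign_modify() would then only notice that the number of incoming
slots differs from what was prepared, call this helper, and prepare the
rebuilt statement.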


> 2) I think the GUC should be replaced with a server/table option, similar to
> fetch_size.

Hmm, batch_size differs from fetch_size.  fetch_size is a postgres_fdw-specific 
feature with no relevant FDW routine, while batch_size is a configuration 
parameter for all FDWs that implement ExecForeignBulkInsert().  The ideas I can 
think of are:

1. Follow JDBC/ODBC and add standard FDW properties.  For example, the JDBC
standard defines standard connection pool properties such as maxPoolSize and
minPoolSize, and JDBC drivers have to provide them under those defined names.
Likewise, the FDW interface would require FDW implementors to handle the
foreign server option name "max_bulk_insert_tuples" if they want to provide
the bulk insert feature and implement ExecForeignBulkInsert().  The core
executor gets that setting from the FDW by calling a new FDW routine like
GetMaxBulkInsertTuples().  Sigh...

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN TABLE.
The executor gets the value from the Relation and uses it.  (But is this a
table-specific configuration?  I don't think so, sigh...)

3. Adopt the current USERSET GUC max_bulk_insert_tuples.  I think this is 
enough because the user can change the setting per session, application, and 
database.
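
That said, if we end up with your fetch_size-style server/table option,
postgres_fdw could resolve the value much like it already resolves fetch_size.
A rough sketch (the option name batch_size and the helper below are
assumptions; the option validator would also have to be taught to accept it):

/*
 * Rough sketch only: read a hypothetical "batch_size" option from the
 * foreign table's OPTIONS, falling back to the server's OPTIONS, the way
 * postgres_fdw already resolves fetch_size.
 */
#include "postgres.h"
#include "commands/defrem.h"
#include "foreign/foreign.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

static int
get_batch_size(Oid foreigntableid, int default_batch_size)
{
	ForeignTable *table = GetForeignTable(foreigntableid);
	ForeignServer *server = GetForeignServer(table->serverid);
	List	   *options;
	ListCell   *lc;

	/* Table-level option takes precedence over the server-level one. */
	options = list_concat(list_copy(table->options), server->options);

	foreach(lc, options)
	{
		DefElem    *def = (DefElem *) lfirst(lc);

		if (strcmp(def->defname, "batch_size") == 0)
			return strtol(defGetString(def), NULL, 10);
	}

	return default_batch_size;
}

Such a helper would be consulted once per statement, e.g. in
postgresBeginForeignModify(), rather than once per batch.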



Regards
Takayuki Tsunakawa
