Attached is a diff containing the (working) patch derived from my (incomplete) patch, plus regression test changes and documentation changes.
While it's easy to regression-test the persistence of the fetch_size options, I am confounded as to how we would show that the fetch_size setting was respected. I've seen it with my own eyes by viewing the query log on Redshift, but I see no way to automate that. Suggestions welcome.

On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> > Redshift has a table, stv_inflight, which serves about the same purpose
> > as pg_stat_activity. Redshift seems to perform better with very high
> > fetch sizes (10,000 is a good start), so the default foreign data
> > wrapper didn't perform so well.
>
> I agree with you.
>
> > I was able to confirm that the first query showed "FETCH 150 FROM c1"
> > as the query, which is normally highly unhelpful, but in this case it
> > confirms that tables created in redshift_server do by default use the
> > fetch_size option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM
> > c1" as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently.
>
> This is enough in its own way, of course.
>
> > With this limited patch, one could estimate the number of rows that
> > would fit into the desired memory block based on the row width and set
> > fetch_size accordingly.
>
> The users, including me, will be happy with it once they know how to
> determine the fetch size, especially when the remote tables are very few
> or the configuration is stable enough.
>
> On widely distributed systems, it would be far more difficult to tune the
> fetch size manually for every foreign table, so it would ultimately be
> left at the default, safe size, which is 100 or so.
>
> This is similar to the discussion about max_wal_size on another thread.
> Calculating a fetch size is far tougher for users than setting an
> expected memory usage, I think.
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*, not
> > query time.
>
> We cannot know the real length of text-type data in advance. Beyond that,
> even for a column defined as char(n), n is the theoretical (in-design)
> maximum size for the field, but in most cases the mean length of the real
> data will be far smaller than that. For that reason, calculating the
> ratio at table creation time seems difficult.
>
> However, I agree with Tom's suggestion that the changes to the FETCH
> statement are definitely ugly; the "overhead" argument in particular is
> prohibitive even for me :(
>
> > Thanks to Kyotaro and Bruce Momjian for their help.
>
> Not at all.
>
> regards,
>
> At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey.huin...@gmail.com>
> wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=p6mxmg7s86c...@mail.gmail.com>
> > I applied this patch to REL9_4_STABLE, and I was able to connect to a
> > foreign database (Redshift, actually).
> >
> > The basic outline of the test is below, names changed to protect my
> > employment.
> >
> > create extension if not exists postgres_fdw;
> >
> > create server redshift_server foreign data wrapper postgres_fdw
> > options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
> > fetch_size '150' );
> >
> > create user mapping for public server redshift_server options ( user
> > 'redacted_user', password 'comeonyouarekiddingright' );
> >
> > create foreign table redshift_tab150 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table',
> > schema_name 'redacted_schema' );
> >
> > create foreign table redshift_tab151 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table',
> > schema_name 'redacted_schema', fetch_size '151' );
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test
> > -- to see what query shows up in stv_inflight.
> > select count(*) from redshift_tab150;
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test
> > -- to see what query shows up in stv_inflight.
> > select count(*) from redshift_tab151;
> >
> > For those of you who aren't familiar with Redshift, it's a columnar
> > database that seems to be a fork of postgres 8.cough. You can connect
> > to it with modern libpq programs (psql, psycopg2, etc).
> > Redshift has a table, stv_inflight, which serves about the same purpose
> > as pg_stat_activity. Redshift seems to perform better with very high
> > fetch sizes (10,000 is a good start), so the default foreign data
> > wrapper didn't perform so well.
> >
> > I was able to confirm that the first query showed "FETCH 150 FROM c1"
> > as the query, which is normally highly unhelpful, but in this case it
> > confirms that tables created in redshift_server do by default use the
> > fetch_size option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM
> > c1" as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently. With
> > this limited patch, one could estimate the number of rows that would
> > fit into the desired memory block based on the row width and set
> > fetch_size accordingly.
> >
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*,
> > not query time.
> >
> > Thanks to Kyotaro and Bruce Momjian for their help.
> >
> > On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > > Hmm, somehow I removed some recipients, especially the list. Sorry
> > > for the duplicate.
> > >
> > > -----
> > > Sorry, I was away. Thank you for the comment.
> > >
> > > > Do you have any insight into where I would pass the custom row
> > > > fetches from the table struct to the scan struct?
> > >
> > > Yeah, it's one simple way to tune it, if the user knows the
> > > appropriate value.
> > >
> > > > Last year I was working on a patch to postgres_fdw where the
> > > > fetch_size could be set at the table level and the server level.
> > > >
> > > > I was able to get the settings parsed and they would show up in
> > > > pg_foreign_table and pg_foreign_server. Unfortunately, I'm not very
> > > > familiar with how foreign data wrappers work, so I wasn't able to
> > > > figure out how to get these custom values passed from the
> > > > PgFdwRelationInfo struct into the query's PgFdwScanState struct.
> > >
> > > Directly answering, state that needs to be shared among several
> > > stages is held within fdw_private. Your new variable
> > > fpinfo->fetch_size can be read in postgresGetForeignPlan, which newly
> > > creates another fdw_private. You can pass your values on to the
> > > ForeignPlan by making it hold the value there.
> > > Finally, you will get it in postgresBeginForeignScan and can set it
> > > into PgFdwScanState.
> > >
> > > > I bring this up only because it might be a simpler solution, in
> > > > that the table designer could set the fetch size very high for
> > > > narrow tables, and lower or default for wider tables. It's also a
> > > > very clean syntax, just another option on the table and/or server
> > > > creation.
> > > >
> > > > My incomplete patch is attached.
> > >
> > > However, the fetch_size is not needed by the planner (so far), so we
> > > can simply read the options in postgresBeginForeignScan() and set
> > > them in PgFdwScanState. This runs once per execution.
> > >
> > > Finally, the attached patch will work as you intended.
> > >
> > > What do you think about this?
> > >
> > > regards,
> > >
> > > --
> > > Kyotaro Horiguchi
> > > NTT Open Source Software Center
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..84334e6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -106,7 +106,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -114,18 +115,34 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1 OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                              List of foreign tables
- Schema | Table |  Server  |              FDW Options              | Description
---------+-------+----------+---------------------------------------+-------------
- public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') |
- public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') |
+                                       List of foreign tables
+ Schema | Table |  Server  |                       FDW Options                       | Description
+--------+-------+----------+---------------------------------------------------------+-------------
+ public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1', fetch_size '102') |
+ public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1')                   |
 (2 rows)
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+                                                                                      srvoptions
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {use_remote_estimate=false,updatable=true,fdw_startup_cost=123.456,fdw_tuple_cost=0.123,service=value,connect_timeout=value,dbname=value,host=value,hostaddr=value,port=value,application_name=value,keepalives=value,keepalives_idle=value,keepalives_interval=value,sslcompression=value,sslmode=value,sslcert=value,sslkey=value,sslrootcert=value,sslcrl=value,fetch_size=101}
+(1 row)
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+                      ftoptions                      
+-----------------------------------------------------
+ {"schema_name=S 1","table_name=T 1",fetch_size=102}
+(1 row)
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..2a3ab7d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -153,6 +153,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..733ffb6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -134,6 +134,7 @@ typedef struct PgFdwScanState
 	/* extracted fdw_private data */
 	char	   *query;			/* text of SELECT command */
 	List	   *retrieved_attrs;	/* list of retrieved attribute numbers */
+	int			fetch_size;		/* number of tuples per fetch */
 
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
@@ -871,6 +872,22 @@ postgresGetForeignPlan(PlannerInfo *root,
 							fdw_private);
 }
 
+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell   *lc;
+
+	foreach(lc, options)
+	{
+		DefElem    *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}
+
+
 /*
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
@@ -889,6 +906,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	int			numParams;
 	int			i;
 	ListCell   *lc;
+	DefElem    *def;
 
 	/*
 	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(userid, server->serverid);
 
+	/* Reading table options */
+	fsstate->fetch_size = -1;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+		def = get_option(server->options, "fetch_size");
+
+	if (def)
+	{
+		fsstate->fetch_size = strtod(defGetString(def), NULL);
+		if (fsstate->fetch_size < 0)
+			elog(ERROR, "invalid fetch size for foreign table \"%s\"",
+				 get_rel_name(table->relid));
+	}
+	else
+		fsstate->fetch_size = 100;
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -2031,7 +2066,7 @@ fetch_more_data(ForeignScanState *node)
 		int			i;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
+		fetch_size = fsstate->fetch_size;
 
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
 				 fetch_size, fsstate->cursor_number);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83e8fa7..d12925a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -115,7 +115,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -123,12 +124,20 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1 OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 43adb61..02b004d 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -294,6 +294,34 @@
   </sect3>
 
   <sect3>
+   <title>Fetch Size Options</title>
+
+   <para>
+    By default, rows are fetched from the remote server 100 at a time.
+    This may be overridden using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</>
+       should get in each fetch operation.  It can be specified for a foreign
+       table or a foreign server.  An option specified for a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
+
+
+  <sect3>
    <title>Importing Options</title>
 
    <para>
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers