postgres_fdw: wrong results with self join + enable_nestloop off
n pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp; QUERY PLAN - Foreign Scan (cost=100.00..49310.40 rows=2183680 width=16) (actual time=0.652..0.652 rows=0 loops=1) Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2 Filter: (now() < '2020-02-23 00:00:00'::timestamp without time zone) Rows Removed by Filter: 9 Relations: (public.pg_tbl_foreign tbl1) INNER JOIN (public.pg_tbl_foreign tbl2) Remote SQL: SELECT r1.id1, r1.id2, r2.id1, r2.id2 FROM (public.pg_tbl r1 INNER JOIN public.pg_tbl r2 ON (((r1.id1 < 5 Planning Time: 0.133 ms Execution Time: 1.127 ms (8 rows) postgres@78754=#set enable_nestloop=on; SET postgres@78754=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp; id1 | id2 | id1 | id2 -+-+-+- (0 rows) postgres@78754=#explain (analyze, verbose) select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp; QUERY PLAN --- Result (cost=200.00..27644.00 rows=2183680 width=16) (actual time=0.004..0.005 rows=0 loops=1) Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2 One-Time Filter: (now() < '2020-02-23 00:00:00'::timestamp without time zone) -> Nested Loop (cost=200.00..27644.00 rows=2183680 width=16) (never executed) Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2 -> Foreign Scan on public.pg_tbl_foreign tbl2 (cost=100.00..186.80 rows=2560 width=8) (never executed) Output: tbl2.id1, tbl2.id2 Remote SQL: SELECT id1, id2 FROM public.pg_tbl -> Materialize (cost=100.00..163.32 rows=853 width=8) (never executed) Output: tbl1.id1, tbl1.id2 -> Foreign Scan on public.pg_tbl_foreign tbl1 (cost=100.00..159.06 rows=853 width=8) (never executed) Output: tbl1.id1, tbl1.id2 Remote SQL: SELECT id1, id2 FROM public.pg_tbl WHERE ((id1 < 5)) Planning Time: 0.134 ms Execution Time: 0.347 ms (15 rows) || Kindly please comment if I am in the correct direction or not? Regards, Nishant Sharma. Developer at EnterpriseDB, Pune, India. P.S Steps that I used to create local postgres FDW setup ( followed link - https://www.postgresql.org/docs/current/postgres-fdw.html <https://www.postgresql.org/docs/current/postgres-fdw.html):-> ) 1) ./configure --prefix=/home/edb/POSTGRES_INSTALL/MASTER --with-pgport=9996 --with-openssl --with-libxml --with-zlib --with-tcl --with-perl --with-libxslt --with-ossp-uuid --with-ldap --with-pam --enable-nls --enable-debug --enable-depend --enable-dtrace --with-selinux --with-icu --enable-tap-tests --enable-cassert CFLAGS="-g -O0" 2) make 3) make install 4) cd contrib/postgres_fdw/ 5) make install 6) Start the server 7) [edb@localhost MASTER]$ bin/psql postgres edb; psql (16devel) Type "help" for help. postgres@70613=#create database remote_db; CREATE DATABASE postgres@70613=#quit [edb@localhost MASTER]$ bin/psql remote_db edb; psql (16devel) Type "help" for help. remote_db@70613=#CREATE USER fdw_user; CREATE ROLE remote_db@70613=#GRANT ALL ON SCHEMA public TO fdw_user; GRANT remote_db@70613=#quit [edb@localhost MASTER]$ bin/psql remote_db fdw_user; psql (16devel) Type "help" for help. remote_db@70613=#create table pg_tbl(id1 int, id2 int); CREATE TABLE remote_db@70613=#insert into pg_tbl values(1, 10); INSERT 0 1 remote_db@70613=#insert into pg_tbl values(2, 20); INSERT 0 1 remote_db@70613=#insert into pg_tbl values(3, 30); INSERT 0 1 8) New terminal/Tab:- [edb@localhost MASTER]$ bin/psql postgres edb; postgres@71609=#create extension postgres_fdw; CREATE EXTENSION postgres@71609=#CREATE SERVER localhost_fdw FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'remote_db', host 'localhost', port '9996'); CREATE SERVER postgres@71609=#CREATE USER MAPPING for edb SERVER localhost_fdw OPTIONS (user 'fdw_user', password ''); CREATE USER MAPPING postgres@71609=#GRANT ALL ON FOREIGN SERVER localhost_fdw TO edb; GRANT postgres@71609=#CREATE FOREIGN TABLE pg_tbl_foreign(id1 int, id2 int) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'pg_tbl'); CREATE FOREIGN TABLE postgres@71609=#select * from pg_tbl_foreign; id1 | id2 -+- 1 | 10 2 | 20 3 | 30 (3 rows) PG_PATCH_set_nestloop_off_issue_fix.patch Description: Binary data
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Thanks Etsuro for your response! One small typo correction in my answer to "What is the technical issue?" "it is *not* considered a pseudo constant" --> "it is considered a pseudo constant" Regards, Nishant. On Fri, Apr 14, 2023 at 6:21 PM Etsuro Fujita wrote: > Hi Nishant, > > On Fri, Apr 14, 2023 at 8:39 PM Nishant Sharma > wrote: > > I debugged this issue and was able to find a fix for the same. Kindly > please refer to the attached fix. With the fix I am able to resolve the > issue. > > Thanks for the report and patch! > > > What is the technical issue? > > The problem here is the use of extract_actual_clauses. Because of which > the plan creation misses adding the second condition of AND i.e "now() < > '23-Feb-2020'::timestamp" in the plan. Because it is not considered a > pseudo constant and extract_actual_clause is passed with false as the > second parameter and it gets skipped from the list. As a result that > condition is never taken into consideration as either one-time filter > (before or after) or part of SQL remote execution > > > > Why do I think the fix is correct? > > The fix is simple, where we have created a new function similar to > extract_actual_clause which just extracts all the conditions from the list > with no checks and returns the list to the caller. As a result all > conditions would be taken into consideration in the query plan. > > I think that the root cause for this issue would be in the > create_scan_plan handling of pseudoconstant quals when creating a > foreign-join (or custom-join) plan. Anyway, I will look at your patch > closely, first. > > Best regards, > Etsuro Fujita >
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi Etsuro Fujita, Any updates? -- did you get a chance to look into this? Regards, Nishant. On Mon, Apr 17, 2023 at 11:00 AM Nishant Sharma < nishant.sha...@enterprisedb.com> wrote: > Thanks Etsuro for your response! > > One small typo correction in my answer to "What is the technical issue?" > "it is *not* considered a pseudo constant" --> "it is considered a pseudo > constant" > > > Regards, > Nishant. > > On Fri, Apr 14, 2023 at 6:21 PM Etsuro Fujita > wrote: > >> Hi Nishant, >> >> On Fri, Apr 14, 2023 at 8:39 PM Nishant Sharma >> wrote: >> > I debugged this issue and was able to find a fix for the same. Kindly >> please refer to the attached fix. With the fix I am able to resolve the >> issue. >> >> Thanks for the report and patch! >> >> > What is the technical issue? >> > The problem here is the use of extract_actual_clauses. Because of which >> the plan creation misses adding the second condition of AND i.e "now() < >> '23-Feb-2020'::timestamp" in the plan. Because it is not considered a >> pseudo constant and extract_actual_clause is passed with false as the >> second parameter and it gets skipped from the list. As a result that >> condition is never taken into consideration as either one-time filter >> (before or after) or part of SQL remote execution >> > >> > Why do I think the fix is correct? >> > The fix is simple, where we have created a new function similar to >> extract_actual_clause which just extracts all the conditions from the list >> with no checks and returns the list to the caller. As a result all >> conditions would be taken into consideration in the query plan. >> >> I think that the root cause for this issue would be in the >> create_scan_plan handling of pseudoconstant quals when creating a >> foreign-join (or custom-join) plan. Anyway, I will look at your patch >> closely, first. >> >> Best regards, >> Etsuro Fujita >> >
Re: pg_basebackup: Always return valid temporary slot names
Hi Jelte, Please find my reviews below:- *1)* With what I have understood from above, the pgbouncer fills up be_pid (int, 32 bits) with random bits as it does not have an associated server connection yet. With this, I was thinking, isn't this a problem of pgbouncer filling be_pid with random bits? Maybe it should have filled the be_pid with a random positive integer instead of any integer as it represents a pid? -- If this makes sense here, then maybe the fix should be in pgbouncer instead of how the be_pid is processed in pg_basebackup? *2)* Rest, the patch looks straightforward, with these two changes : "%d" --> "%u" and "(int)" --> "(unsigned)". Regards, Nishant. On Thu, Aug 31, 2023 at 2:43 PM Jelte Fennema wrote: > With PgBouncer in the middle PQbackendPID can return negative values > due to it filling all 32 bits of the be_pid with random bits. > > When this happens it results in pg_basebackup generating an invalid slot > name (when no specific slot name is passed in) and thus throwing an > error like this: > > pg_basebackup: error: could not send replication command > "CREATE_REPLICATION_SLOT "pg_basebackup_-1201966863" TEMPORARY > PHYSICAL ( RESERVE_WAL)": ERROR: replication slot name > "pg_basebackup_-1201966863" contains invalid character > HINT: Replication slot names may only contain lower case letters, > numbers, and the underscore character. > > This patch fixes that problem by formatting the result from PQbackendPID > as an unsigned integer when creating the temporary replication slot > name. > > I think it would be good to backport this fix too. > > Replication connection support for PgBouncer is not merged yet, but > it's pretty much ready: > https://github.com/pgbouncer/pgbouncer/pull/876 > > The reason PgBouncer does not pass on the actual Postgres backend PID > is that it doesn't have an associated server connection yet when it > needs to send the startup message to the client. It also cannot use > it's own PID, because that would be the same for all clients, since > pgbouncer is a single process. >
Re: pg_basebackup: Always return valid temporary slot names
On Tue, Sep 5, 2023 at 4:40 PM Jelte Fennema wrote: > I modified the PgBouncer PR to always set the sign bit to 0. But I > still I think it makes sense to also address this in pg_basebackup. Sounds good to me. Thank you! On Tue, Sep 5, 2023 at 5:36 PM Daniel Gustafsson wrote: > Since the value in the temporary slotname isn't used to convey meaning, but > merely to ensure uniqueness, I don't think it's unreasonable to guard > aginst > malformed input (ie negative integer). > Ok. In this case, I also agree. +1 to the patch from my side. Thank you! Regards, Nishant.
[Code Cleanup] : Small code cleanup in twophase.sql
Hi, PFA small code cleanup in twophase.sql. Which contains a drop table statement for 'test_prepared_savepoint'. Which, to me, appears to be missing in the cleanup section of that file. To support it I have below points:- 1) Grepping this table 'test_prepared_savepoint' shows occurrences only in twophase.out & twophase.sql files. This means that table is local to that sql test file and not used in any other test file. 2) I don't see any comment on why this was not added in the cleanup section of twophase.sql, but drop for other two test tables are done. 3) I ran "make check-world" with the patch and I don't see any failures. Kindly correct, if I missed anything. Regards, Nishant (EDB). v1-0001-Small-code-cleanup-in-twophase.sql.patch Description: Binary data
Re: [Code Cleanup] : Small code cleanup in twophase.sql
Hi, Any taker or rejector for above? -- It's a very small 'good to have' change patch for cleanup. Thanks, Nishant (EDB). On Tue, Sep 26, 2023 at 6:31 PM Nishant Sharma < nishant.sha...@enterprisedb.com> wrote: > Hi, > > > PFA small code cleanup in twophase.sql. Which contains a drop table > statement for 'test_prepared_savepoint'. Which, to me, appears to be > missing in the cleanup section of that file. > > To support it I have below points:- > > 1) Grepping this table 'test_prepared_savepoint' shows occurrences > only in twophase.out & twophase.sql files. This means that table is > local to that sql test file and not used in any other test file. > > 2) I don't see any comment on why this was not added in the cleanup > section of twophase.sql, but drop for other two test tables are done. > > 3) I ran "make check-world" with the patch and I don't see any failures. > > Kindly correct, if I missed anything. > > > Regards, > Nishant (EDB). >
[PROPOSAL] : Use of ORDER BY clause in insert.sql
Hi, We would like to share a proposal of a patch, where we have added order by clause in two select statements in src/test/regress/sql/insert.sql file and respective changes in src/test/regress/expected/insert.out output file. This would help in generating output in consistent sequence, as sometimes we have observed change in sequence in output. Please find the patch attached Regards, Nishant Sharma EDB: http://www.enterprisedb.com Proposal_OrderBy_insert.sql.out.patch Description: Binary data
Re: on_error table, saving error info to a table
Thanks for the patch! I was not able to understand why we need a special error table for COPY? Can't we just log it in a new log error file for COPY in a proper format? Like using table approach, PG would be unnecessarily be utilising its resources like extra CPU to format the data in pages to store in its table format, writing and keeping the table in its store (which may or may not be required), the user would be required to make sure it creates the error table with proper columns to be used in COPY, etc.. Meanwhile, please find some quick review comments:- 1) Patch needs re-base. 2) If the columns of the error table are fixed, then why not create it internally using some function or something instead of making the user create the table correctly with all the columns? 3) I think, error messages can be improved, which looks big to me. 4) I think no need of below pstrdup, As CStringGetTextDatum creates a text copy for the same:- err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode)); t_values[9] = CStringGetTextDatum(err_code); 5) Should 'on_error_rel' as not NULL be checked along with below, as I can see it is passed as NULL from two locations? + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE) + { 6) Below declarations can be shifted to the if block, where they are used. Instead of keeping them as global in function? + HeapTuple on_error_tup; + TupleDesc on_error_tupDesc; + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE) + { 7) Below comment going beyond 80 char width:- * if on_error is specified with 'table', then on_error_rel is the error saving table 8) Need space after 'false' err_detail ? false: true; 9) Below call can fit in a single line. No need to keep the 2nd and 3rd parameter in nextlines. + on_error_tup = heap_form_tuple(on_error_tupDesc, + t_values, + t_isnull); 10) Variable declarations Tab Spacing issue at multiple places. 11) Comments in the patch are not matched to PG comment style. Kindly note I have not tested the patch properly yet. Only checked it with a positive test case. As I will wait for a conclusion on my opinion of the proposed patch. Regards, Nishant Sharma. EnterpriseDB, Pune. On Sat, Feb 3, 2024 at 11:52 AM jian he wrote: > Hi. > I previously did some work in COPY FROM save error information to a table. > still based on this suggestion: > https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us > Now I refactored it. > > the syntax: > ON_ERROR 'table', TABLE 'error_saving_tbl' > > if ON_ERROR is not specified with 'table', TABLE is specified, then error. > if ON_ERROR is specified with 'table', TABLE is not specified or > error_saving_tbl does not exist, then error. > > In BeginCopyFrom, we check the data definition of error_saving_table, > we also check if the user has INSERT privilege to error_saving_table > (all the columns). > We also did a preliminary check of the lock condition of > error_saving_table. > > if it does not meet these conditions, we quickly error out. > error_saving_table will be the same schema as the copy from table. > > Because "table" is a keyword, I have to add the following changes to > gram.y. > --- a/src/backend/parser/gram.y > +++ b/src/backend/parser/gram.y > @@ -3420,6 +3420,10 @@ copy_opt_item: > { > $$ = makeDefElem("null", (Node *) makeString($3), @1); > } > + | TABLE opt_as Sconst > + { > + $$ = makeDefElem("table", (Node *) makeString($3), @1); > + } > > since "table" is already a keyword, so there is no influence on the > parsing speed? > > demo: > > create table err_tbl( > userid oid, -- the user oid while copy generated this entry > copy_tbl oid, --copy table > filename text, > lineno int8, > linetext, > colname text, > raw_field_value text, > err_message text, > err_detail text, > errorcode text > ); > create table t_copy_tbl(a int, b int, c int); > COPY t_copy_tbl FROM STDIN WITH (delimiter ',', on_error 'table', > table err_tbl); > 1,2,a > \. > > table err_tbl \gx > -[ RECORD 1 ]---+--- > userid | 10 > copy_tbl| 17920 > filename| STDIN > lineno | 1 > line| 1,2,a > colname | c > raw_field_value | a > err_message | invalid input syntax for type integer: "a" > err_detail | > errorcode | 22P02 >
Re: postgres_fdw: wrong results with self join + enable_nestloop off
I also agree that Richard's patch is better. As it fixes the issue at the backend and does not treat pseudoconstant as local condition. I have tested Richard's patch and observe that it is resolving the problem. Patch looks good to me as well. *I only had a minor comment on below change:-* *- gating_clauses = get_gating_quals(root, scan_clauses);+ if (best_path->pathtype == T_ForeignScan && IS_JOIN_REL(rel))+ gating_clauses = get_gating_quals(root, ((ForeignPath *) best_path)->joinrestrictinfo);+ else+ gating_clauses = get_gating_quals(root, scan_clauses);* >> Instead of using 'if' and creating a special case here can't we do something in the above switch? Regards, Nishant. P.S I tried something quickly but I am seeing a crash:- *case T_IndexOnlyScan:scan_clauses = castNode(IndexPath, best_path)->indexinfo->indrestrictinfo; break;+ case T_ForeignScan:+ /*+* Note that for scans of foreign joins, we do not have restriction clauses+* stored in baserestrictinfo and we do not consider parameterization.+ * Instead we need to check against joinrestrictinfo stored in ForeignPath.+*/+ if (IS_JOIN_REL(rel))+ scan_clauses = ((ForeignPath *) best_path)->joinrestrictinfo;+ else+ scan_clauses = rel->baserestrictinfo;+ break;default: scan_clauses = rel->baserestrictinfo;break;* On Fri, Jun 2, 2023 at 9:00 AM Suraj Kharage wrote: > +1 for fixing this in the backend code rather than FDW code. > > Thanks, Richard, for working on this. The patch looks good to me at > a glance. > > On Tue, Apr 25, 2023 at 3:36 PM Richard Guo > wrote: > >> >> On Fri, Apr 14, 2023 at 8:51 PM Etsuro Fujita >> wrote: >> >>> I think that the root cause for this issue would be in the >>> create_scan_plan handling of pseudoconstant quals when creating a >>> foreign-join (or custom-join) plan. >> >> >> Yes exactly. In create_scan_plan, we are supposed to extract all the >> pseudoconstant clauses and use them as one-time quals in a gating Result >> node. Currently we check against rel->baserestrictinfo and ppi_clauses >> for the pseudoconstant clauses. But for scans of foreign joins, we do >> not have any restriction clauses in these places and thus the gating >> Result node as well as the pseudoconstant clauses would just be lost. >> >> I looked at Nishant's patch. IIUC it treats the pseudoconstant clauses >> as local conditions. While it can fix the wrong results issue, I think >> maybe it's better to still treat the pseudoconstant clauses as one-time >> quals in a gating node. So I wonder if we can store the restriction >> clauses for foreign joins in ForeignPath, just as what we do for normal >> JoinPath, and then check against them for pseudoconstant clauses in >> create_scan_plan, something like attached. >> >> BTW, while going through the codes I noticed one place in >> add_foreign_final_paths that uses NULL for List *. I changed it to NIL. >> >> Thanks >> Richard >> > > > -- > -- > > Thanks & Regards, > Suraj kharage, > > > > edbpostgres.com >
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi, Etsuro's patch is also showing the correct output for "set enable_nestloop=off". Looks good to me for back branches due to backport issues. But below are a few observations for the same:- 1) I looked into the query plan for both "set enable_nestloop" on & off case and observe that they are the same. That is, what we see with "set enable_nestloop=on". 2) In back branches for "set enable_nestloop" on & off value, at least this type of query execution won't make any difference. No comparison of plans to be selected based on total cost of two plans old (Nested Loop with Foreign Scans) & new (Only Foreign Scan) will be done, because we are avoiding the call to "postgresGetForeignJoinPaths()" up front when we have pseudo constants. Regards, Nishant. On Tue, Jun 6, 2023 at 8:50 AM Richard Guo wrote: > > On Mon, Jun 5, 2023 at 9:19 PM Etsuro Fujita > wrote: > >> If the patch is intended for HEAD only, I also think it goes in the >> right direction. But if it is intended for back branches as well, I >> do not think so, because it would cause ABI breakage due to changes >> made to the ForeignPath struct and the create_foreign_join_path() API. >> (For the former, I think we could avoid doing so by adding the new >> member at the end of the struct, not in the middle, though.) > > > Thanks for pointing this out. You're right. The patch has backport > issue because of the ABI breakage. So it can only be applied on HEAD. > > >> To avoid this issue, I am wondering if we should modify >> add_paths_to_joinrel() in back branches so that it just disallows the >> FDW to consider pushing down joins when the restrictlist has >> pseudoconstant clauses. Attached is a patch for that. > > > I think we can do that in back branches. But I'm a little concerned > that we'd miss a better plan if FDW cannot push down joins in such > cases. I may be worrying over nothing though if it's not common that > the restrictlist has pseudoconstant clauses. > > Thanks > Richard >
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Looks good to me. Tested on master and it works. New patch used a bool flag to avoid calls for both FDW and custom hook's call. And a slight change in comment of "has_pseudoconstant_clauses" function. Regards, Nishant. On Wed, Jun 14, 2023 at 12:19 PM Etsuro Fujita wrote: > On Mon, Jun 5, 2023 at 10:19 PM Etsuro Fujita > wrote: > > To avoid this issue, I am wondering if we should modify > > add_paths_to_joinrel() in back branches so that it just disallows the > > FDW to consider pushing down joins when the restrictlist has > > pseudoconstant clauses. Attached is a patch for that. > > I think that custom scans have the same issue, so I modified the patch > further so that it also disallows custom-scan providers to consider > join pushdown in add_paths_to_joinrel() if necessary. Attached is a > new version of the patch. > > Best regards, > Etsuro Fujita >
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
Thanks Tom and Ashutosh for your responses! I also agree that, v1 patch set was applying SQL syntax restrictions to all FDWs, which is not reasonable. PFA v2 patch set. This is based on the suggestion given by Ashutosh to have the check in postgres_fdw validator. As it fits to apply the SQL syntax restriction only to FDWs which follow SQL syntax restrictions. With this change, it gives hint/idea to any user using PostgresSQL with their own FDWs to add this check in their equivalent fdw validator. Regarding, 'empty' vs 'misspelled' remote column name, from my point of view, I see some differences between them:- 1. SYNTAX wise - SQL syntax has restrictions for not allowing creating column names as empty. And it does not bother about, whether user used a misspelled word or not for the column name. 2. IMPLEMENTATION wise - we don't need any extra info to decide that the column name received from command is empty, but we do need column name infos from remote server to decide whether user misspelled the remote column name, which not only applies to the column_name options, but maybe to the column names used while creating the table syntax for foreign tables if the fdw column_name option is not added. Ex:- "CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) " - to 'name' and 'id' here. I may be wrong, but just wanted to share my thoughts on the differences. So, it can be considered a different issue/mistake and can be handled separately in another email thread. I ran "make check world" and do not see any failure related to patches. But, I do see an existing failure "t/001_pgbench_with_server.pl". Regards, Nishant Sharma EnterpriseDB, Pune. On Mon, Aug 19, 2024 at 4:27 PM Ashutosh Bapat wrote: > On Fri, Aug 16, 2024 at 8:26 PM Tom Lane wrote: > > > > Nishant Sharma writes: > > > Actual column names used while creation of foreign table are not > allowed to > > > be an > > > empty string, but when we use column_name as an empty string in OPTIONS > > > during > > > CREATE or ALTER of foreign tables, it is allowed. > > > > Is this really a bug? The valid remote names are determined by > > whatever underlies the FDW, and I doubt we should assume that > > SQL syntax restrictions apply to every FDW. Perhaps it would > > be reasonable to apply such checks locally in SQL-based FDWs, > > but I object to assuming such things at the level of > > ATExecAlterColumnGenericOptions. > > I agree. > > > > > More generally, I don't see any meaningful difference between > > this mistake and the more common one of misspelling the remote > > column name, which is something we're not going to be able > > to check for (at least not in anything like this way). If > > you wanted to move the ease-of-use goalposts materially, > > you should be looking for a way to do that. > > I think this check should be delegated to an FDW validator. > > -- > Best Wishes, > Ashutosh Bapat > v2-0001-Disallow-empty-Foreign-Table-column-option-name-i.patch Description: Binary data v2-0002-Test-Cases-Changes.patch Description: Binary data
[PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
Hi, -- Actual column names used while creation of foreign table are not allowed to be an empty string, but when we use column_name as an empty string in OPTIONS during CREATE or ALTER of foreign tables, it is allowed. *EXAMPLES:-* 1) CREATE FOREIGN TABLE test_fdw(*"" *VARCHAR(15), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); ERROR: zero-length delimited identifier at or near LINE 1: CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5))... 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) *OPTIONS* *(column_name '')*, id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); CREATE FOREIGN TABLE postgres@43832=#\d test_fdw Foreign table "public.test_fdw" Column | Type | Collation | Nullable | Default | FDW options +---+---+--+-+-- name | character varying(15) | | | | *(column_name '')* id | character varying(5) | | | | Server: localhost_fdw FDW options: (schema_name 'public', table_name 'test') -- Due to the above, when we try to simply select a remote table, the remote query uses the empty column name from the FDW column option and the select fails. *EXAMPLES:-* 1) select * from test_fdw; ERROR: zero-length delimited identifier at or near CONTEXT: remote SQL command: SELECT "", id FROM public.test 2) explain verbose select * from test_fdw; QUERY PLAN -- Foreign Scan on public.test_fdw (cost=100.00..297.66 rows=853 width=72) Output: name, id Remote SQL: SELECT "", id FROM public.test (3 rows) -- We can fix this issue either during fetching of FDW column option names while building remote query or we do not allow at CREATE or ALTER of foreign tables itself. We think it would be better to disallow adding the column_name option as empty in CREATE or ALTER itself as we do not allow empty actual column names for a foreign table. Unless I missed to understand the purpose of allowing column_name as empty. *THE PROPOSED SOLUTION OUTPUT:-* 1) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS *(column_name '')*, id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); ERROR: column generic option name cannot be empty 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); CREATE FOREIGN TABLE ALTER FOREIGN TABLE test_fdw ALTER COLUMN id OPTIONS *(column_name '')*; ERROR: column generic option name cannot be empty -- PFA, the fix and test cases patches attached. I ran "make check world" and do not see any failure related to patches. But, I do see an existing failure t/001_pgbench_with_server.pl Regards, Nishant. P.S Thanks to Jeevan Chalke and Suraj Kharage for their inputs for the proposal.
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
Oops... I forgot to attach the patch. Thanks to Amul Sul for pointing that out. :) On Fri, Aug 16, 2024 at 2:37 PM Nishant Sharma < nishant.sha...@enterprisedb.com> wrote: > Hi, > > > > -- > Actual column names used while creation of foreign table are not allowed > to be an > empty string, but when we use column_name as an empty string in OPTIONS > during > CREATE or ALTER of foreign tables, it is allowed. > > *EXAMPLES:-* > 1) CREATE FOREIGN TABLE test_fdw(*"" *VARCHAR(15), id VARCHAR(5)) SERVER > localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); > ERROR: zero-length delimited identifier at or near """" > LINE 1: CREATE FOREIGN TABLE test_fdw("" VARCHAR(15), id VARCHAR(5))... > > 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) *OPTIONS* *(column_name > '')*, id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', > table_name 'test'); > CREATE FOREIGN TABLE > > postgres@43832=#\d test_fdw > Foreign table "public.test_fdw" > Column | Type | Collation | Nullable | Default | FDW > options > > +---+---+--+-+-- > name | character varying(15) | | | | > *(column_name > '')* > id | character varying(5) | | | | > Server: localhost_fdw > FDW options: (schema_name 'public', table_name 'test') > > > -- > > Due to the above, when we try to simply select a remote table, the remote > query uses > the empty column name from the FDW column option and the select fails. > > *EXAMPLES:-* > 1) select * from test_fdw; > ERROR: zero-length delimited identifier at or near """" > CONTEXT: remote SQL command: SELECT "", id FROM public.test > > 2) explain verbose select * from test_fdw; > QUERY PLAN > -- > Foreign Scan on public.test_fdw (cost=100.00..297.66 rows=853 width=72) >Output: name, id >Remote SQL: SELECT "", id FROM public.test > (3 rows) > > > -- > > We can fix this issue either during fetching of FDW column option names > while > building remote query or we do not allow at CREATE or ALTER of foreign > tables itself. > We think it would be better to disallow adding the column_name option as > empty in > CREATE or ALTER itself as we do not allow empty actual column names for a > foreign > table. Unless I missed to understand the purpose of allowing column_name > as empty. > > *THE PROPOSED SOLUTION OUTPUT:-* > 1) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15) OPTIONS *(column_name > '')*, id VARCHAR(5)) SERVER localhost_fdw OPTIONS (schema_name 'public', > table_name 'test'); > ERROR: column generic option name cannot be empty > > 2) CREATE FOREIGN TABLE test_fdw(name VARCHAR(15), id VARCHAR(5)) SERVER > localhost_fdw OPTIONS (schema_name 'public', table_name 'test'); > CREATE FOREIGN TABLE > > ALTER FOREIGN TABLE test_fdw ALTER COLUMN id OPTIONS *(column_name '')*; > ERROR: column generic option name cannot be empty > > > -- > > PFA, the fix and test cases patches attached. I ran "make check world" and > do > not see any failure related to patches. But, I do see an existing failure > t/001_pgbench_with_server.pl > > > Regards, > Nishant. > > P.S > Thanks to Jeevan Chalke and Suraj Kharage for their inputs for the > proposal. > v1-0001-Disallow-empty-Foreign-Table-column-option-name-i.patch Description: Binary data v1-0002-Test-Cases-Changes.patch Description: Binary data
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
On Mon, Oct 7, 2024 at 10:16 AM Michael Paquier wrote: > On Thu, Aug 22, 2024 at 04:00:13PM +0530, Nishant Sharma wrote: > > I may be wrong, but just wanted to share my thoughts on the differences. > > So, it > > can be considered a different issue/mistake and can be handled > separately in > > another email thread. > > +else if (strcmp(def->defname, "column_name") == 0) > +{ > +char *col_name_opt = defGetString(def); > + > +/* > + * PostgresSQL follows SQL syntax, so we do not allow empty > + * column_name option. > + */ > +if (col_name_opt && col_name_opt[0] == '\0') > +ereport(ERROR, > +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("colum_name option cannot be empty for > postgres_fdw"))); > +} > > If we begin to care about empty names in column_name in the FDW > command, shouldn't we also care about empry values in schema_name and > table_name? > > Typos: PostgresSQL -> PostgreSQL and colum_name -> column_name. > > Once you associate table_name and schema_name, you can save in > translation by rewording the errmsg like that (no need to mention > postgres_fdw, note the quotes around the option name): > errmsg("cannot use empty value for option \"%s\"", >"column_name"); > -- > Michael > Thanks Michael for your review comments and response! I have included them in v3. v3 does not allow empty schema_name & table_name options along with column_name. Thanks, Nishant. v3-0002-Test-Cases-Changes.patch Description: Binary data v3-0001-Disallow-empty-Foreign-Table-column-option-name-i.patch Description: Binary data
Re: on_error table, saving error info to a table
Thanks for the v2 patch! I see v1 review comments got addressed in v2 along with some further improvements. 1) v2 Patch again needs re-base. 2) I think we need not worry whether table name is unique or not, table name can be provided by user and we can check if it does not exists then simply we can create it with appropriate columns, if it exists we use it to check if its correct on_error table and proceed. 3) Using #define in between the code? I don't see that style in copyfromparse.c file. I do see such style in other src file. So, not sure if committer would allow it or not. #define ERROR_TBL_COLUMNS 10 4) Below appears redundant to me, it was not the case in v1 patch set, where it had only one return and one increment of error as new added code was at the end of the block:- + cstate->num_errors++; + return true; + } cstate->num_errors++; I was not able to test the v2 due to conflicts in v2, I did attempt to resolve but I saw many failures in make world. Regards, Nishant. On Tue, Aug 20, 2024 at 5:31 AM jian he wrote: > On Mon, Jul 15, 2024 at 1:42 PM Nishant Sharma > wrote: > > > > Thanks for the patch! > > > > I was not able to understand why we need a special error table for COPY? > > Can't we just log it in a new log error file for COPY in a proper > format? Like > > using table approach, PG would be unnecessarily be utilising its > resources like > > extra CPU to format the data in pages to store in its table format, > writing and > > keeping the table in its store (which may or may not be required), the > user > > would be required to make sure it creates the error table with proper > columns > > to be used in COPY, etc.. > > > > > > Meanwhile, please find some quick review comments:- > > > > 1) Patch needs re-base. > > > > 2) If the columns of the error table are fixed, then why not create it > internally using > > some function or something instead of making the user create the table > correctly > > with all the columns? > > I'll think about it more. > previously, i tried to use SPI to create tables, but at that time, i > thought that's kind of excessive. > you need to create the table, check whether the table name is unique, > check the privilege. > now we quickly error out if the error saving table definition does not > meet. I guess that's less work to do. > > > > 3) I think, error messages can be improved, which looks big to me. > > > > 4) I think no need of below pstrdup, As CStringGetTextDatum creates a > text copy for > > the same:- > > err_code = > pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode)); > > > > t_values[9] = CStringGetTextDatum(err_code); > > > > 5) Should 'on_error_rel' as not NULL be checked along with below, as I > can see it is > > passed as NULL from two locations? > > + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE) > > + { > > > > 6) Below declarations can be shifted to the if block, where they are > used. Instead of > > keeping them as global in function? > > + HeapTuple on_error_tup; > > + TupleDesc on_error_tupDesc; > > > > + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE) > > + { > > > > 7) Below comment going beyond 80 char width:- > > * if on_error is specified with 'table', then on_error_rel is the error > saving table > > > > 8) Need space after 'false' > > err_detail ? false: true; > > > > 9) Below call can fit in a single line. No need to keep the 2nd and 3rd > parameter in > > nextlines. > > + on_error_tup = heap_form_tuple(on_error_tupDesc, > > + t_values, > > + t_isnull); > > > > 10) Variable declarations Tab Spacing issue at multiple places. > > > > 11) Comments in the patch are not matched to PG comment style. > > > > > > Kindly note I have not tested the patch properly yet. Only checked it > with a positive test > > case. As I will wait for a conclusion on my opinion of the proposed > patch. > > > almost all these issues have been addressed. > The error messages are still not so good. I need to polish it. >
Re: on_error table, saving error info to a table
Thanks for the v3 patch! Please find review comments on v3:- 1) I think no need to change the below if condition, we can keep it the way it was before i.e with "cstate->opts.on_error != COPY_ON_ERROR_STOP" and we add a new error ereport the way v3 has. Because for cstate->opts.on_error as COPY_ON_ERROR_STOP cases we can avoid two if conditions inside upper if. +if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) 2) No need for the below "if" check for maxattnum. We can simply increment it with "++maxattnum" and later check if we have exactly 10 attributes for the error table. Because even if we drop any attribute and maxattnum is 10 in pg_attribute for that rel, we should still error out. Maybe we can rename it to "totalatts"? + if (maxattnum <= attForm->attnum) + maxattnum = attForm->attnum; 3) #define would be better, also as mentioned by Kirill switch condition with proper #define would be better. + if (maxattnum != 10) + on_error_tbl_ok = false; 4) > > that would be more work. > so i stick to if there is a table can use to > error saving then use it, otherwise error out. > YES. but that would lead to a better design with an error table. Also, I think Krill mentions the same. That is to auto create, if it does not exist. Regards, Nishant Sharma (EDB). On Tue, Dec 3, 2024 at 3:58 PM Kirill Reshke wrote: > On Tue, 3 Dec 2024 at 09:29, jian he wrote: > > > > On Tue, Nov 5, 2024 at 6:30 PM Nishant Sharma > > wrote: > > > > > > Thanks for the v2 patch! > > > > > > I see v1 review comments got addressed in v2 along with some > > > further improvements. > > > > > > 1) v2 Patch again needs re-base. > > > > > > 2) I think we need not worry whether table name is unique or not, > > > table name can be provided by user and we can check if it does > > > not exists then simply we can create it with appropriate columns, > > > if it exists we use it to check if its correct on_error table and > > > proceed. > > > > "simply we can create it with appropriate columns," > > that would be more work. > > so i stick to if there is a table can use to > > error saving then use it, otherwise error out. > > > > > > > > > > 3) Using #define in between the code? I don't see that style in > > > copyfromparse.c file. I do see such style in other src file. So, not > > > sure if committer would allow it or not. > > > #define ERROR_TBL_COLUMNS 10 > > > > > let's wait and see. > > > > > 4) Below appears redundant to me, it was not the case in v1 patch > > > set, where it had only one return and one increment of error as new > > > added code was at the end of the block:- > > > + cstate->num_errors++; > > > + return true; > > > + } > > > cstate->num_errors++; > > > > > changed per your advice. > > > > > I was not able to test the v2 due to conflicts in v2, I did attempt to > > > resolve but I saw many failures in make world. > > > > > I get rid of all the SPI code. > > > > Instead, now I iterate through AttributeRelationId to check if the > > error saving table is ok or not, > > using DirectFunctionCall3 to do the privilege check. > > removed gram.y change, turns out it is not necessary. > > and other kinds of refactoring. > > > > please check attached. > > > Hi! > > 1) > > + switch (attForm->attnum) > > + { > > + case 1: > > + (.) > > + case 2: > > case 1,2,3 ... Is too random. Other parts of core tend to use `#define > Anum__ `. Can we follow this style? > > 2) > >+ /* > > + * similar to commit a9cf48a > > + * ( > https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.ca...@cybertec.at > ) > > + * in COPY FROM keep error saving table locks until the transaction end. > > + */ > > I can rarely see other comments referencing commits, and even few > referencing a mail archive thread. > Can we just write proper comment explaining the reasons? > > > = overall > > Patch design is a little dubious for me. We give users some really > incomprehensible API. To use on_error *relation* feature user must > create tables with proper schema. > Maybe a better design will be to auto-create on_error table if this > table does not exist. > > > Thoughts? > > -- > Best regards, > Kirill Reshke >
Re: on_error table, saving error info to a table
On Fri, Dec 13, 2024 at 1:57 PM jian he wrote: > On Wed, Dec 11, 2024 at 7:41 PM Nishant Sharma > wrote: > > > > Thanks for the v3 patch! > > > > Please find review comments on v3:- > > > > 1) I think no need to change the below if condition, we can keep > > it the way it was before i.e with > > "cstate->opts.on_error != COPY_ON_ERROR_STOP" and we > > add a new error ereport the way v3 has. Because for > > cstate->opts.on_error as COPY_ON_ERROR_STOP cases we > > can avoid two if conditions inside upper if. > > > > +if (cstate->num_errors > 0 && > > cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) > > > 2) No need for the below "if" check for maxattnum. We can simply > > increment it with "++maxattnum" and later check if we have exactly > > 10 attributes for the error table. Because even if we drop any > > attribute and maxattnum is 10 in pg_attribute for that rel, we should > > still error out. Maybe we can rename it to "totalatts"? > > > > + if (maxattnum <= attForm->attnum) > > + maxattnum = attForm->attnum; > > > > 3) #define would be better, also as mentioned by Kirill switch > > condition with proper #define would be better. > > > > + if (maxattnum != 10) > > + on_error_tbl_ok = false; > > > > 4) > > hi. Thanks for the review. > The attached v4 patch addressed these two issues. > > > > that would be more work. > > > so i stick to if there is a table can use to > > > error saving then use it, otherwise error out. > > > > > YES. but that would lead to a better design with an error table. > > Also, I think Krill mentions the same. That is to auto create, if it > > does not exist. > > > I decided not to auto-create the table. > main reason not to do it: > 1. utility COPY command with another SPI utility CREATE TABLE command may > work. > but there is no precedent. > > 2. if we auto-create the on_error table with BeginCopyFrom. > then later we have to use get_relname_relid to get the newly created table > Oid, > I think it somehow counts as repeating name lookups, see relevant > linke [1], [2]. > > [1] https://postgr.es/m/20240808171351.a9.nmi...@google.com > [2] > https://postgr.es/m/CA+TgmobHYix=nn8d4ruha6fhuvpr88kgamq1pbfngfofejr...@mail.gmail.com Thanks for the v4 patch! Review comment on v4:- 1) The new switch logic does not look correct to me. It will pass for a failing scenario. I think you can use v3's logic instead with below changes:- a) while (HeapTupleIsValid(atup = systable_getnext(ascan))) --> while (HeapTupleIsValid(atup = systable_getnext(ascan)) && on_error_tbl_ok) b) attcnt++; --> just before the "switch (attForm->attnum)". Thats it. Also, I think Andrew's suggestion can resolve the concern me and Krill had on forcing users to create tables with correct column names and numbers. Also, will make error table checking simpler. No need for the above kind of checks. Regards, Nishant.
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
On Tue, Dec 17, 2024 at 10:12 PM Robert Haas wrote: > On Wed, Oct 9, 2024 at 7:12 AM Nishant Sharma > wrote: > > I have included them in v3. v3 does not allow empty schema_name & > > table_name options along with column_name. > > I was looking at these patches today and trying to understand whether > the proposed error message is consistent with what we have done > elsewhere. > > The proposed error message was "cannot use empty value for option > \"%s\". I looked for error messages that contained the phrase "empty" > or "zero". I did not find a case exactly like this one. The closet > analogues I found were things like this: > > invalid cursor name: must not be empty > output cannot be empty string > DSM segment name cannot be empty > row path filter must not be empty string > > These messages aren't quite as consistent as one might wish, so it's a > little hard to judge here. Nonetheless, I propose that the error > message we use here should end with either "cannot" or "must not" > followed by either "be empty" or "be empty string". I think my > preferred variant would be "value for option \"%s\" must not be empty > string". But there's certainly oodles of room for bikeshedding. > > Apart from that, I see very little to complain about here. Once we > agree on the error message text I think something can be committed. > For everyone's convenience, I suggest merging the two patches into > one. I highly approve of separating patches by topic, but there's > usually no point in separating them by directory. > > -- > Robert Haas > EDB: http://www.enterprisedb.com Thanks Robert for your response and the review comments! I have addressed both your suggestions, by changing the error message and merging both the patches to one. PFA, patch set v4. Regards, Nishant. v4-0001-Disallow-empty-Foreign-Table-column_name-schema_n.patch Description: Binary data
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
Hi, Summary of this thread- Tom had the following concern:- Regarding location of check in ATExecAlterColumnGenericOptions. And spell check issue comparison. Which I believe was addressed by v2 and its response. Ashutosh had the suggestion:- Check should be delegated to an FDW validator. Which I believe was addressed in v2. Michael had the following concern:- We should also care about empty values in schema_name and table_name. Which I believe I have addressed in v3 patch. Robert had the following concern:- Regarding error message and single patch for this. Which I believe I have addressed in v4 patch. Tom, Ashutosh, Michael, Robert please let me know if I was able to address your concerns or if you feel differently. Assuming Tom, Ashutosh, Michael and Robert feel as though I have addressed the concerns mentioned above, does anybody have any additional feedback or concerns with this patch? If not, I would request we move to commit phase. Thank you! Regards, Nishant Sharma (EDB). >