RE: Added schema level support for publication.

2021-09-23 Thread houzj.f...@fujitsu.com
On Thu, Sep 23, 2021 11:06 AM Greg Nancarrow wrote:
> On Wed, Sep 22, 2021 at 9:33 PM houzj.f...@fujitsu.com 
>  wrote:
> >
> > >
> > > How do you suggest changing it?
> >
> > Personally, I think we'd better move the code for changing the
> > publication's table list into AlterPublicationTables and the code for
> > changing the publication's schema list into AlterPublicationSchemas. It's
> > similar to what the v29 patchset did; the difference is the SET
> > action: I suggest we drop all the tables in AlterPublicationTables
> > when the user only sets schemas, and drop all the schemas in
> > AlterPublicationSchemas when the user only sets tables. With this
> > approach, we can keep the schema and relation code separate and don't
> > need to worry about the locking order.
> >
> > Attach a top-up patch which refactor the code like above.
> > Thoughts ?
> >
> 
> Sounds like a good idea.
> Is it possible to incorporate the existing
> CheckPublicationAlterTables() and CheckPublicationAlterSchemas() functions
> into your suggested update?
> I think it might tidy up the error-checking a bit.

I agree we can put the check for ALL TABLES and superuser into a function
like the v30 patchset did. But I have some hesitation about the code
related to CheckObjSchemaNotAlreadyInPublication(). Currently, we need to open
and lock the table before invoking the CheckObjxxx function, so ISTM we'd better
open the table in AlterPublicationTables. Maybe we can wait for the
author's (Vignesh's) opinion.

Best regards,
Hou zj


Compile fail on macos big sur

2021-09-23 Thread zhang listar
Hi guys, I encountered a problem compiling PostgreSQL; the environment is:
os: macos big sur version 11.5.2 (20G95)
compiler:  gcc-11 (Homebrew GCC 11.2.0) 11.2.0
error message:
/usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2  zic.o -L../../src/port
-L../../src/common -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
-L/usr/local/opt/binutils/lib  -Wl,-dead_strip_dylibs   -lpgcommon -lpgport
-lz -lreadline -lm  -o zic
ld: warning: ignoring file ../../src/common/libpgcommon.a, building for
macOS-x86_64 but attempting to link with file built for macOS-x86_64
ld: warning: ignoring file ../../src/port/libpgport.a, building for
macOS-x86_64 but attempting to link with file built for macOS-x86_64
Undefined symbols for architecture x86_64:
  "_pg_fprintf", referenced from:
  _close_file in zic.o
  _usage in zic.o
  _memory_exhausted in zic.o
  _verror in zic.o
  _warning in zic.o
  _dolink in zic.o
  _writezone in zic.o
  ...
  "_pg_printf", referenced from:
  _main in zic.o
  "_pg_qsort", referenced from:
  _writezone in zic.o
  _main in zic.o
  "_pg_sprintf", referenced from:
  _stringoffset in zic.o
  _stringrule in zic.o
  _doabbr in zic.o
  "_pg_strerror", referenced from:
  _close_file in zic.o
  _memcheck.part.0 in zic.o
  _mkdirs in zic.o
  _dolink in zic.o
  _writezone in zic.o
  _infile in zic.o
  _main in zic.o
  ...
  "_pg_vfprintf", referenced from:
  _verror in zic.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[2]: *** [zic] Error 1
make[1]: *** [all-timezone-recurse] Error 2
make: *** [all-src-recurse] Error 2

Need help, thanks in advance.


Re: [Proposal] Global temporary tables

2021-09-23 Thread wenjing
On Sun, Mar 28, 2021 at 9:07 PM Andrew Dunstan wrote:

>
> On 3/17/21 7:59 AM, wenjing wrote:
> > ok
> >
> > The cause of the problem is that the name of the dependent function
> > (readNextTransactionID) has changed. I fixed it.
> >
> > This patch (V43) is based on 9fd2952cf4920d563e9cea51634c5b364d57f71a
> >
> > Wenjing
> >
> >
>
> I have fixed this patch so that
>
> a) it applies cleanly
>
> b) it uses project best practice for catalog Oid assignment.
>
> However, as noted elsewhere it fails the recovery TAP test.
>
> I also note this:
>
>
> diff --git a/src/test/regress/parallel_schedule
> b/src/test/regress/parallel_schedule
> index 312c11a4bd..d44fa62f4e 100644
> --- a/src/test/regress/parallel_schedule
> +++ b/src/test/regress/parallel_schedule
> @@ -129,3 +129,10 @@ test: fast_default
>
>  # run stats by itself because its delay may be insufficient under heavy
> load
>  test: stats
> +
> +# global temp table test
> +test: gtt_stats
> +test: gtt_function
> +test: gtt_prepare
> +test: gtt_parallel_1 gtt_parallel_2
> +test: gtt_clean
>
>
> Tests that need to run in parallel should use either the isolation
> tester framework (which is explicitly for testing things concurrently)
> or the TAP test framework.
>
> Adding six test files to the regression test suite for this one feature
> is not a good idea. You should have one regression test script ideally,
> and it should be added as appropriate to both the parallel and serial
> schedules (and not at the end). Any further tests should be added using
> the other frameworks mentioned.
>
Thank you for your advice.
I have simplified the regression test case and moved the further tests into
the isolation tester framework, based on your suggestion.
In doing so, I found and fixed a few bugs.

Please review the GTT v52 and give me feedback.
https://commitfest.postgresql.org/31/2349/


Wenjing



>
>
> cheers
>
>
> andrew
>
>
> --
>
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: Compile fail on macos big sur

2021-09-23 Thread Sergey Shinderuk
Hi,

On 23.09.2021 10:09, zhang listar wrote:
> Hi guys, I encountered a problem compiling PostgreSQL; the environment is:
> os: macos big sur version 11.5.2 (20G95)
> compiler:  gcc-11 (Homebrew GCC 11.2.0) 11.2.0

I've just tried building with gcc-11 on Catalina (yes, it's time to
upgrade) and it went fine, no warnings.

Maybe `git clean -fdx` would help? (Be careful, though, it removes any
changes you may have made.)


> /usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -Wno-format-truncation
> -Wno-stringop-truncation -O2  zic.o -L../../src/port -L../../src/common
> -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
> -L/usr/local/opt/binutils/lib  -Wl,-dead_strip_dylibs   -lpgcommon
> -lpgport -lz -lreadline -lm  -o zic

Here is what I have:

/usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -g -O2  zic.o -L../../src/port
-L../../src/common -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
-Wl,-dead_strip_dylibs   -lpgcommon -lpgport -lz -lreadline -lm  -o zic

Looks the same, and gives no warnings.

Just in case, I configured like that:

./configure --prefix=$(cd ..;pwd)/install-gcc-11 --enable-cassert
--enable-debug --enable-tap-tests CC=/usr/local/bin/gcc-11

Hope that helps.

-- 
Sergey Shinderuk                https://postgrespro.com/




Re: Compile fail on macos big sur

2021-09-23 Thread zhang listar
Thanks for your reply. I did make distclean and git clean -fdx, but it did
not help.

The code: master, c7aeb775df895db240dcd6f47242f7e08899adfb
It looks like a macOS issue: some library is being ignored, which causes
the build error.

On Thu, Sep 23, 2021 at 3:35 PM Sergey Shinderuk wrote:

> Hi,
>
> On 23.09.2021 10:09, zhang listar wrote:
> > Hi guys, I encountered a problem compiling PostgreSQL; the environment is:
> > os: macos big sur version 11.5.2 (20G95)
> > compiler:  gcc-11 (Homebrew GCC 11.2.0) 11.2.0
>
> I've just tried building with gcc-11 on Catalina (yes, it's time to
> upgrade) and it went fine, no warnings.
>
> Maybe `git clean -fdx` would help? (Be careful, though, it removes any
> changes you may have made.)
>
>
> > /usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> > -Wformat-security -fno-strict-aliasing -fwrapv
> > -fexcess-precision=standard -Wno-format-truncation
> > -Wno-stringop-truncation -O2  zic.o -L../../src/port -L../../src/common
> > -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
> > -L/usr/local/opt/binutils/lib  -Wl,-dead_strip_dylibs   -lpgcommon
> > -lpgport -lz -lreadline -lm  -o zic
>
> Here is what I have:
>
> /usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -Wno-format-truncation
> -Wno-stringop-truncation -g -O2  zic.o -L../../src/port
> -L../../src/common -isysroot
> /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
> -Wl,-dead_strip_dylibs   -lpgcommon -lpgport -lz -lreadline -lm  -o zic
>
> Looks the same, and gives no warnings.
>
> Just in case, I configured like that:
>
> ./configure --prefix=$(cd ..;pwd)/install-gcc-11 --enable-cassert
> --enable-debug --enable-tap-tests CC=/usr/local/bin/gcc-11
>
> Hope that helps.
>
> --
> Sergey Shinderuk                https://postgrespro.com/
>


Re: Added schema level support for publication.

2021-09-23 Thread Greg Nancarrow
On Thu, Sep 23, 2021 at 5:02 PM houzj.f...@fujitsu.com
 wrote:
>
> > Sounds like a good idea.
> > Is it possible to incorporate the existing
> > CheckPublicationAlterTables() and CheckPublicationAlterSchemas() functions
> > into your suggested update?
> > I think it might tidy up the error-checking a bit.
>
> I agree we can put the check for ALL TABLES and superuser into a function
> like the v30 patchset did. But I have some hesitation about the code
> related to CheckObjSchemaNotAlreadyInPublication(). Currently, we need to open
> and lock the table before invoking the CheckObjxxx function, so ISTM we'd better
> open the table in AlterPublicationTables. Maybe we can wait for the
> author's (Vignesh's) opinion.
>

Yes, I think you're right, the code related to
CheckObjSchemaNotAlreadyInPublication() should be left where it is
(according to your schema refactor patch).

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Compile fail on macos big sur

2021-09-23 Thread Sergey Shinderuk
On 23.09.2021 10:50, zhang listar wrote:
> Thanks for your reply. I did make distclean and git clean -fdx, but it
> did not help.
>
> The code: master, c7aeb775df895db240dcd6f47242f7e08899adfb
> It looks like a macOS issue: some library is being ignored, which causes
> the build error.

Maybe you could try adding -v to the problematic gcc command to see what
really goes on.

I see that gcc calls /usr/bin/ld, not binutils ld installed with
Homebrew.  I saw advice to `brew unlink binutils` somewhere.




Re: mark the timestamptz variant of date_bin() as stable

2021-09-23 Thread Aleksander Alekseev
Hi John,

> Any thoughts on the doc patch?

It so happened that I implemented a similar feature in TimescaleDB [1].

I discovered that it's difficult from both developer's and user's
perspectives to think about the behavior of the function in the
context of given TZ and its complicated rules, as you are trying to do
in the doc patch. So what we did instead is saying: for timestamptz
the function works as if it was timestamp. E.g. time_bucket_ng("3
month", "2021 Oct 03 12:34:56 TZ") = "2021 Jan 01 00:00:00 TZ" no
matter what TZ it is and what rules (DST, corrections, etc) it has. It
seems to be not only logical and easy to understand, but also easy to
implement [2].

Do you think it would be possible to adopt a similar approach in
respect of documenting for date_bin()? To be honest, I didn't try to
figure out what is the actual implementation of date_bin() for TZ
case.

[1]: https://docs.timescale.com/api/latest/hyperfunctions/time_bucket_ng/
[2]: https://github.com/timescale/timescaledb/blob/master/src/time_bucket.c#L470

-- 
Best regards,
Aleksander Alekseev




Re: Compile fail on macos big sur

2021-09-23 Thread zhang listar
Thanks. It was the binutils problem. I did "brew uninstall binutils" and
compiled successfully.
Actually it is the lib $(which ranlib) -V problem.
A similar issue is here: https://github.com/bitcoin/bitcoin/issues/20825

On Thu, Sep 23, 2021 at 4:03 PM Sergey Shinderuk wrote:

> On 23.09.2021 10:50, zhang listar wrote:
> > Thanks for your reply. I did make distclean and git clean -fdx, but it
> > did not help.
> >
> > The code: master, c7aeb775df895db240dcd6f47242f7e08899adfb
> > It looks like a macOS issue: some library is being ignored, which causes
> > the build error.
>
> Maybe you could try adding -v to the problematic gcc command to see what
> really goes on.
>
> I see that gcc calls /usr/bin/ld, not binutils ld installed with
> Homebrew.  I saw advice to `brew unlink binutils` somewhere.
>


Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-23 Thread Juan José Santamaría Flecha
On Thu, Sep 23, 2021 at 4:58 AM Thomas Munro  wrote:

>
> One small detail I'd like to draw attention to is this bit, which
> differs slightly from the [non-working] previous attempts by mapping
> to two different errors:
>
> + * If there's no O_CREAT flag, then we'll pretend the file is
> + * invisible.  With O_CREAT, we have no choice but to report that
> + * there's a file in the way (which wouldn't happen on Unix).
>
> ...
>
> +if (fileFlags & O_CREAT)
> +err = ERROR_FILE_EXISTS;
> +else
> +err = ERROR_FILE_NOT_FOUND;
>

When GetTempFileName() finds a duplicate file name and the file is pending
deletion, it fails with an "ERROR_ACCESS_DENIED" error code. That may
describe the situation better than "ERROR_FILE_EXISTS".

Regards,

Juan José Santamaría Flecha


Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

2021-09-23 Thread Joe Wildish
Hi Tom,

On Wed, 22 Sep 2021, at 17:09, Tom Lane wrote:
> > The main change is a switch to using SPI for expression evaluation.  The 
> > plans are also cached along the same lines as the RI trigger plans.
> 
> I really dislike this implementation technique.  Aside from the likely
> performance hit for existing triggers, I think it opens serious security
> holes, because we can't fully guarantee that deparse-and-reparse doesn't
> change the semantics.  For comparison, the RI trigger code goes to
> ridiculous lengths to force exact parsing of the queries it makes,
> and succeeds only because it needs just a very stylized subset of SQL.
> With a generic user-written expression, we'd be at risk for several
> inherently-ambiguous SQL constructs such as IS DISTINCT FROM (see
> relevant reading at [1]).

Where do you consider the performance hit to be? I just read the code again. 
The only time the new code paths are hit is when a FOR EACH STATEMENT trigger 
fires that has a WHEN condition. Given the existing restrictions for such a 
scenario, that can only mean a WHEN condition that is a simple function call; 
so, "SELECT foo()" vs "foo()"? Or have I misunderstood?

Regarding the deparse-and-reparse --- if I understand correctly, the core 
problem is that we have no way of going from a node tree to a string, such that 
the string is guaranteed to have the same meaning as the node tree? (I did try 
just now to produce such a scenario with the patch but I couldn't get ruleutils 
to emit the wrong thing).  Moreover, we couldn't store the string for use with 
SPI, as the string would be subject to trigger-time search path lookups.  That 
pretty much rules out SPI for this then.  Do you have a suggestion for an 
alternative? I guess it would be to go to the planner/executor directly with the 
node tree?

>  In general, users may expect that
> once those are parsed by the accepting DDL command, they'll hold still,
> not get re-interpreted at runtime.

I agree entirely. I wasn't aware of the deparsing/reparsing problem.

> ...
> (Relevant to that, I wonder why this patch is only concerned with
> WHEN clauses and not all the other places where we forbid subqueries
> for implementation simplicity.)

I don't know how many other places WHEN clauses are used. Rules, perhaps? The 
short answer is that this patch was written to solve a specific problem I had, 
rather than being a more general attempt to remove all "subquery forbidden" 
restrictions.

> 
> > b. If a WHEN expression is defined as "n = (SELECT ...)", there is the 
> > possibility that a user gets the error "more than one row returned by a 
> > subquery used as an expression" when performing DML, which would be rather 
> > cryptic if they didn't know there was a trigger involved.  To avoid this, 
> > we could disallow scalar expressions, with a hint to use the ANY/ALL 
> > quantifiers.
> 
> How is that any more cryptic than any other error that can occur
> in a WHEN expression?

It isn't.  What *is* different about it is that -- AFAIK -- it is the only 
cryptic message that can come about due to the syntactic structure of an 
expression.  Yes, someone could have a function that does "RAISE ERROR 'foo'".  
There's not a lot that can be done about that.  But scalar subqueries are 
detectable and they have an obvious rewrite using the quantifiers, hence the 
suggestion. However, it was just that; a suggestion.
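
As a stand-alone illustration of both the cryptic error and the suggested 
quantifier rewrite (plain SQL, outside any trigger; the values are made up):

  -- scalar subquery: errors out as soon as the subquery returns more than one row
  SELECT 1 = (SELECT g FROM generate_series(1, 2) AS g);
  -- ERROR:  more than one row returned by a subquery used as an expression

  -- the quantified form is always well-defined
  SELECT 1 = ANY (SELECT g FROM generate_series(1, 2) AS g);  -- true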

-Joe


Re: Column Filtering in Logical Replication

2021-09-23 Thread Amit Kapila
On Fri, Sep 17, 2021 at 9:36 AM vignesh C  wrote:
>
> On Thu, Sep 16, 2021 at 7:20 PM Alvaro Herrera  
> wrote:
> >
> > On 2021-Sep-16, vignesh C wrote:
> >
> > > diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> > > index e3068a374e..c50bb570ea 100644
> > > --- a/src/backend/parser/gram.y
> > > +++ b/src/backend/parser/gram.y
> >
> > Yeah, on a quick glance this looks all wrong.  Your PublicationObjSpec
> > production should return a node with tag PublicationObjSpec, and
> > pubobj_expr should not exist at all -- that stuff is just making it all
> > more confusing.
> >
> > I think it'd be something like this:
> >
> > PublicationObjSpec:
> >         ALL TABLES
> >             {
> >                 $$ = makeNode(PublicationObjSpec);
> >                 $$->pubobjtype = PUBLICATIONOBJ_ALL_TABLES;
> >                 $$->location = @1;
> >             }
> >         | TABLE qualified_name
> >             {
> >                 $$ = makeNode(PublicationObjSpec);
> >                 $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> >                 $$->pubobj = $2;
> >                 $$->location = @1;
> >             }
> >         | ALL TABLES IN_P SCHEMA name
> >             {
> >                 $$ = makeNode(PublicationObjSpec);
> >                 $$->pubobjtype = PUBLICATIONOBJ_ALL_TABLES_IN_SCHEMA;
> >                 $$->pubobj = makeRangeVar( ... $5 ... );
> >                 $$->location = @1;
> >             }
> >         | qualified_name
> >             {
> >                 $$ = makeNode(PublicationObjSpec);
> >                 $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> >                 $$->pubobj = $1;
> >                 $$->location = @1;
> >             };
> >
> > You need a single object name under TABLE, not a list -- this was Tom's
> > point about needing post-processing to determine how to assign a type to
> > an object -- that's what I named PUBLICATIONOBJ_CONTINUATION here.
>
> In the above, we will not be able to use qualified_name, as
> qualified_name will not support the following syntaxes:
> create publication pub1 for table t1 *;
> create publication pub1 for table ONLY t1 *;
> create publication pub1 for table ONLY (t1);
>
> To solve this problem we can change qualified_name to relation_expr
> but the problem with doing that is that the user will be able to
> provide the following syntaxes:
> create publication pub1 for all tables in schema sch1 *;
> create publication pub1 for all tables in schema ONLY sch1 *;
> create publication pub1 for all tables in schema ONLY (sch1);
>
> To handle this we will need a special flag to differentiate these cases
> and throw errors at post-processing time. We need to define an
> expression similar to relation_expr, say pub_expr, which handles all
> variants of qualified_name, and then use a special flag so that we can
> throw an error if somebody uses the above kind of syntax for schema
> names. And then, if we have to distinguish between the schema name and
> relation name variants, we need a few other things.
>
> We proposed the solution below, which handles all these problems and
> also uses a Node type that need not store the schema name in a RangeVar:
>

Alvaro, do you have any thoughts on these proposed grammar changes?

-- 
With Regards,
Amit Kapila.




Re: logical replication restrictions

2021-09-23 Thread Marcos Pegoraro
>
> What kind of reasons do you see where users prefer to delay except to
> avoid data loss in the case where users unintentionally removed some
> data from the primary?
>
>
Debugging. Suppose I have a problem, but it occurs only once a week or once
a month. When the problem occurs again, a monitoring system sends me a
message: "Hey, that problem occurred again." Then, because I configured my
replica with Delay = '30 min', I have time to connect to it and watch,
record by record, what comes in, and see exactly what caused the mistake.


Re: logical decoding and replication of sequences

2021-09-23 Thread Peter Eisentraut

On 30.07.21 20:26, Tomas Vondra wrote:
> Here's an updated version of this patch - rebased to current master,
> and fixing some of the issues raised in Peter's review.


This patch needs an update, as various conflicts have arisen now.

As was discussed before, it might be better to present a separate patch 
for just the logical decoding part for now, since the replication and 
DDL stuff has the potential to conflict heavily with other patches being 
discussed right now.  It looks like cutting this patch in two should be 
doable easily.


I looked through the test cases in test_decoding again.  It all looks 
pretty sensible.  If anyone can think of any other tricky or dubious 
cases, we can add them there.  It's easiest to discuss these things with 
concrete test cases rather than in theory.


One slightly curious issue is that this can make sequence values go 
backwards, when seen by the logical decoding consumer, like in the test 
case:


+ BEGIN
+ sequence: public.test_sequence transactional: 1 created: 1 last_value: 1, log_cnt: 0 is_called: 0
+ COMMIT
+ sequence: public.test_sequence transactional: 0 created: 0 last_value: 33, log_cnt: 0 is_called: 1
+ BEGIN
+ sequence: public.test_sequence transactional: 1 created: 1 last_value: 4, log_cnt: 0 is_called: 1
+ COMMIT
+ sequence: public.test_sequence transactional: 0 created: 0 last_value: 334, log_cnt: 0 is_called: 1


I suppose that's okay, since it's not really the intention that someone 
is concurrently consuming sequence values on the subscriber.  Maybe 
something for the future.  Fixing that would require changing the way 
transactional sequence DDL updates these values, so it's not directly 
the job of the decoding to address this.


A small thing I found: Maybe the text that test_decoding produces for 
sequences can be made to look more consistent with the one for tables. 
For example, in


+ BEGIN
+ sequence: public.test_table_a_seq transactional: 1 created: 1 last_value: 1, log_cnt: 0 is_called: 0
+ sequence: public.test_table_a_seq transactional: 1 created: 0 last_value: 33, log_cnt: 0 is_called: 1
+ table public.test_table: INSERT: a[integer]:1 b[integer]:100
+ table public.test_table: INSERT: a[integer]:2 b[integer]:200
+ COMMIT

note how the punctuation is different.




Re: Proposal: Save user's original authenticated identity for logging

2021-09-23 Thread Michael Paquier
On Wed, Sep 22, 2021 at 03:18:43PM +0000, Jacob Champion wrote:
> On Wed, 2021-09-22 at 10:20 +0200, Peter Eisentraut wrote:
>> This should be added to each level of a function call that represents a 
>> test.  This ensures that when a test fails, the line number points to 
>> the top-level location of the test_role() call.  Otherwise it would 
>> point to the connect_ok() call inside test_role().
> 
> Patch LGTM, sorry about that. Thanks!

For the places of the patch, that seems fine then.  Thanks!

Do we need to care about that in other places?  We have tests in
src/bin/ using subroutines that call things from PostgresNode.pm or
TestLib.pm, like pg_checksums, pg_ctl or pg_verifybackup, just to name
three.
--
Michael




Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-09-23 Thread Ranier Vilela
On Tue, Sep 21, 2021 at 8:12 PM Ranier Vilela wrote:

> On Tue, Sep 21, 2021 at 7:21 PM Tom Lane wrote:
>
>> Andres Freund  writes:
>> > On 2021-09-21 15:09:11 -0300, Ranier Vilela wrote:
>> >> Currently when determining where CoerceToDomainValue can be read,
>> >> it evaluates every step in a loop.
>> >> But, I think that the expression is immutable and should be solved only
>> >> once.
>>
>> > What is immutable here?
>>
>> I think Ranier has a point here.  The clear intent of this bit:
>>
>> /*
>>  * If first time through, determine where
>> CoerceToDomainValue
>>  * nodes should read from.
>>  */
>> if (domainval == NULL)
>> {
>>
>> is that we only need to emit the EEOP_MAKE_READONLY once when there are
>> multiple CHECK constraints.  But because domainval has the wrong lifespan,
>> that test is constant-true, and we'll do it over each time to little
>> purpose.
>>
> Exactly, thanks for the clear explanation.
>
>
>> > And it has to, the allocation intentionally is separate for each
>> > constraint. As the comment even explicitly says:
>> > /*
>> >  * Since value might be read multiple times, force
>> to R/O
>> >  * - but only if it could be an expanded datum.
>> >  */
>>
>> No, what that's on about is that each constraint might contain multiple
>> VALUE symbols.  But once we've R/O-ified the datum, we can keep using
>> it across VALUE symbols in different CHECK expressions, not just one.
>>
>> (AFAICS anyway)
>>
>> I'm unexcited by the proposed move of the save_innermost_domainval/null
>> variables, though.  It adds no correctness and it forces an additional
>> level of indentation of a good deal of code, as the patch fails to show.
>>
> OK, but I think there is still value in reducing the scope.
> save_innermost_domainval and save_innermost_domainnull
> are only needed for DOM_CONSTRAINT_CHECK expressions,
> and both are declared even when they will not be used.
>
> Anyway, the v1 patch fixes only the expression eval.
>
Created a new entry at next CF.

https://commitfest.postgresql.org/35/3327/

regards,
Ranier Vilela


Re: Atomic rename feature for Windows.

2021-09-23 Thread Juan José Santamaría Flecha
On Tue, Sep 7, 2021 at 11:40 PM Victor Spirin 
wrote:

>
> I checked the pgrename_windows_posix_semantics() function on Windows 7.
> It returns error 87: Parameter is incorrect. Hence, it is necessary to
> check the Windows version and call the old pgrename function for old
> Windows.
>
The FILE_RENAME_FLAGs are available starting from Windows 10 release 1607
(NTDDI_WIN10_RS1). The check should be done using something like
IsWindowsVersionOrGreater(10, 0, 1607). Or you could test this
using RtlGetVersion(), loading it from the ntdll infrastructure coming from
the stat() patch [1], which doesn't need a manifest.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BoLqfBVJ_j3C03QgoshrX1KxYq0LB1vJV0OXPOcZZfhA%40mail.gmail.com#bfcc256e4eda369e369275f5b4e38185

Regards,

Juan José Santamaría Flecha


Re: Atomic rename feature for Windows.

2021-09-23 Thread Thomas Munro
On Wed, Sep 8, 2021 at 9:40 AM Victor Spirin  wrote:
> Is this code better? Maybe there is another correct method?

Hmm, if we want to use the system header's struct definition, add some
space for a path at the end, and avoid heap allocation, perhaps we
could do something like:

struct {
FILE_RENAME_INFO fri;
WCHAR extra_space[MAX_PATH];
} x;




Re: windows build slow due to windows.h includes

2021-09-23 Thread Michael Paquier
On Wed, Sep 22, 2021 at 02:14:59PM +0530, Amit Kapila wrote:
> On Wed, Sep 22, 2021 at 11:14 AM Noah Misch  wrote:
> > +1, great win for a one-liner.
> 
> +1. It reduced the build time of Postgres from "Time Elapsed
> 00:01:57.60" to "Time Elapsed 00:01:38.11" in my Windows env. (Win 10,
> MSVC 2019).

That's nice.  Great find!
--
Michael




Re: mark the timestamptz variant of date_bin() as stable

2021-09-23 Thread Aleksander Alekseev
Hi hackers,

> the function works as if it was timestamp. E.g. time_bucket_ng("3
> month", "2021 Oct 03 12:34:56 TZ") = "2021 Jan 01 00:00:00 TZ" no

That was a typo. What I meant was:

time_bucket_ng("3 month", "2021 Feb 03 12:34:56 TZ")

February, not October. Sorry for the confusion.

-- 
Best regards,
Aleksander Alekseev




Re: row filtering for logical replication

2021-09-23 Thread Tomas Vondra

Hi,

I finally had time to take a closer look at the patch again, so here are 
some review comments. The thread is moving fast, so chances are some of 
the comments are obsolete or were already raised in the past.



1) I wonder if we should use WHERE or WHEN to specify the expression. 
WHERE is not wrong, but WHEN (as used in triggers) might be better.



2) create_publication.sgml says:

   A NULL value causes the expression to evaluate
   to false; avoid using columns without not-null constraints in the
   WHERE clause.

That's not quite correct, I think - doesn't the expression evaluate to 
NULL (which is not TRUE, so it counts as a mismatch)?
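
For illustration, in plain SQL (nothing to do with the patch itself):

  -- a NULL column value makes the filter expression NULL, which is
  -- "not true" rather than false
  SELECT (NULL::int < 100)         AS filter_value,       -- NULL
         (NULL::int < 100) IS TRUE AS treated_as_match;   -- f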


I suspect this whole paragraph (talking about NULL in old/new rows) 
might be a bit too detailed / low-level for user docs.



3) create_subscription.sgml

WHERE clauses, rows must satisfy all expressions
to be copied. If the subscriber is a

I'm rather skeptical about the principle that all expressions have to 
match - I'd have expected exactly the opposite behavior, actually.


I see a subscription as "a union of all publications". Imagine for 
example you have a data set for all customers, and you create 
publications for different parts of the world, like


  CREATE PUBLICATION customers_france
 FOR TABLE customers WHERE (country = 'France');

  CREATE PUBLICATION customers_germany
 FOR TABLE customers WHERE (country = 'Germany');

  CREATE PUBLICATION customers_usa
 FOR TABLE customers WHERE (country = 'USA');

and now you want to subscribe to multiple publications, because you want 
to replicate data for multiple countries (e.g. you want EU countries). 
But if you do


  CREATE SUBSCRIPTION customers_eu
 PUBLICATION customers_france, customers_germany;

then you won't get anything, because each customer belongs to just a 
single country. Yes, I could create multiple individual subscriptions, 
one for each country, but that's inefficient and may have a different 
set of issues (e.g. keeping them in sync when a customer moves between 
countries).


I might have missed something, but I haven't found any explanation why 
the requirement to satisfy all expressions is the right choice.


IMHO this should be 'satisfies at least one expression' i.e. we should 
connect the expressions by OR, not AND.
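
In other words, I'd expect subscribing to those two publications to behave 
as if the filters were OR-ed together -- roughly equivalent to this single 
publication (just a sketch of the semantics I have in mind, not something 
the patch needs to spell this way):

  CREATE PUBLICATION customers_eu_equiv
     FOR TABLE customers WHERE (country = 'France' OR country = 'Germany');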



4) pg_publication.c

It's a bit suspicious we're adding includes for parser to a place where 
there were none before. I wonder if this might indicate some layering 
issue, i.e. doing something in the wrong place ...



5) publicationcmds.c

I mentioned this in my last review [1] already, but I really dislike the 
fact that OpenTableList accepts a list containing one of two entirely 
separate node types (PublicationTable or Relation). It was modified to 
use IsA() instead of a flag, but I still find it ugly, confusing and 
possibly error-prone.


Also, not sure mentioning the two different callers explicitly in the 
OpenTableList comment is a great idea - it's likely to get stale if 
someone adds another caller.



6) parse_oper.c

I'm having some second thoughts about (not) allowing UDFs ...

Yes, I get that if the function starts failing, e.g. because querying a 
dropped table or something, that breaks the replication and can't be 
fixed without a resync.


That's pretty annoying, but maybe disallowing anything user-defined 
(functions and operators) is overly anxious? Also, extensibility 
is one of the hallmarks of Postgres, and disallowing all custom UDFs and 
operators seems to contradict that ...


Perhaps just explaining what the expression can / can't do in the docs, 
with clear warnings of the risks, would be acceptable.



7) exprstate_list

I'd just call the field / variable "exprstates", without indicating the 
data type. I don't think we do that anywhere.



8) RfCol

Do we actually need this struct? Why not track just the name or attnum, 
and look up the other value in syscache when needed?



9)  rowfilter_expr_checker

   * Walk the parse-tree to decide if the row-filter is valid or not.

I don't see any clear explanation of what "valid" means.


10) WHERE expression vs. data type

Seems ATExecAlterColumnType might need some changes, because changing the 
data type of a column referenced by the expression triggers this:


  test=# alter table t alter COLUMN c type text;
  ERROR:  unexpected object depending on column: publication of
  table t in publication p


11) extra (unnecessary) parens in the deparsed expression

test=# alter publication p add table t where ((b < 100) and (c < 100));
ALTER PUBLICATION
test=# \dRp+ p
                              Publication p
 Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
 user  | f          | t       | t       | t       | t         | f
Tables:
    "public.t" WHERE (((b < 100) AND (c < 100)))


12) WHERE expression vs. changing replica identity

Peter Smith already mentioned this

Re: Release 14 Schedule

2021-09-23 Thread Magnus Hagander
On Wed, Sep 22, 2021 at 11:01 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > On Wed, Sep 22, 2021 at 6:30 PM Tom Lane  wrote:
> >> I thought the point about FDWs was important because actual work (by
> >> FDW authors) is needed to make anything happen.  The extra parallelism
> >> inside plpgsql functions doesn't require user effort, so I don't see
> >> that it needs to be called out separately.
>
> > True, but I'm willing to guess we have a lot more people who are using
> > stored procs with return query and who are going to be very happy
> > about them now being much faster in cases where parallelism worked,
> > than we have people who are writing FDWs..
>
> Certainly.  But my sentence about "Numerous performance improvements"
> already mashes down dozens of other it-just-works-better-now
> performance improvements.  Why call out that one in particular?

Just my guesstimate that that one is going to be one of the more
popular ones. But it is a guess. And I'm not feeling strongly enough
about it to argue that one - if you feel the one there now is more
important, then we go with it.


> If I had to pick out just one, I might actually lean towards mentioning
> 86dc90056, which might change users' calculus about how many partitions
> they can use.  (But I may be biased about that.)

Ah, so you posting that led me to re-read the whole thing again, this
time directly after caffeine.

It starts mentioning parallel query and it finishes with parallel
query. At a quick glance that gave me the impression the whole
paragraph was about "things related to improvements of parallel
query", given how it started.

Knowing that, I'd leave the "numerous improvements for parallel
queries" and just drop the specific mention of FDWs personally. We we
want to call out something in particular at the end, I agree that
86dc90056 is probably a better choice than either of the other two for
being the called-out one.


> > My take on that is audience. It's an important feature for existing
> > users of PostgreSQL, and an important change over how the system
> > behaved before. They are more likely to read the release notes. The
> > press release is more about reaching people who are not already using
> > postgres, or who are so but more tangentially.
>
> Perhaps, but on those grounds, the business about reducing B-tree bloat
> doesn't belong in the press release either.  Anyway I'm not sure the
> audiences are so different --- if I thought they were, I'd not have
> started from the press release.

True on the btree point.


> My feeling is that the initial summary in the release notes is meant
> to be a 10,000-meter overview of what's new in the release.  As soon
> as you get past that bullet list, you find yourself right down in the
> weeds, so an overview is good to have.  The press release can afford
> to fly a little lower than 10,000 meters, though not by all that much.

I'm not really sure of the value of having two different sets of
summaries if they're basically targeting the same group. But now is
not the time to bikeshed about the overall structure I think, so I'm
fine keeping it that way. And if that is the target audience then yes,
it makes sense not to have something in the release notes summary
that's not in the press release. I would then argue for *including*
the emergency mode vacuum in the press release.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




DOC: Progress Reporting page

2021-09-23 Thread Marcos Pegoraro
The page explaining the progress reporting views, for all versions, has the
expression "The tables" several times when it points to a single table. So,
singular expressions should be used, right?

"The tables below describe the information that will be reported and
provide information about how to interpret it."

regards,
Marcos


Re: DOC: Progress Reporting page

2021-09-23 Thread Ranier Vilela
On Thu, Sep 23, 2021 at 9:41 AM Marcos Pegoraro wrote:

> The page explaining the progress reporting views, for all versions, has the
> expression "The tables" several times when it points to a single table. So,
> singular expressions should be used, right?
>
> "The tables below describe the information that will be reported and
> provide information about how to interpret it."
>
I think the documentation refers to HTML tables, not Postgres tables.
Maybe it could use other terms to be clearer.

regards,
Ranier Vilela


Re: Hook for extensible parsing.

2021-09-23 Thread Julien Rouhaud
On Thu, Sep 23, 2021 at 07:37:27AM +0100, Simon Riggs wrote:
> On Thu, 16 Sept 2021 at 05:33, Julien Rouhaud  wrote:
> 
> > Would any of that be a reasonable approach?
> 
> The way I summarize all of the above is that
> 1) nobody is fundamentally opposed to the idea
> 2) we just need to find real-world example(s) and show that any
> associated in-core patch provides all that is needed in a clean way,
> since that point is currently in-doubt by senior committers.
> 
> So what is needed is some actual prototypes that explore this. I guess
> that means they have to be open source, but those examples could be
> under a different licence, as long as the in-core patch is clearly a
> project submission to PostgreSQL.
> 
> I presume a few real-world examples could be:
> * Grammar extensions to support additional syntax for Greenplum, Citus, XL
> * A grammar that adds commands for an extension, such as pglogical
> (Jim's example)
> * A strict SQL Standard grammar/parser
> * GQL implementation

As I mentioned, there's at least one use case that would work with that
approach that I will be happy to code in hypopg, which is an open source
project.  As a quick prototype, here's a basic overview of how I can use this
hook to implement a CREATE HYPOTHETICAL INDEX command:

rjuju=# LOAD 'hypopg';
LOAD
rjuju=# create hypothetical index meh on t1(id);
CREATE INDEX
rjuju=# explain select * from t1 where id = 1;
   QUERY PLAN

 Index Scan using "<13543>btree_t1_id" on t1  (cost=0.04..8.05 rows=1 width=13)
   Index Cond: (id = 1)
(2 rows)

rjuju=# \d t1
 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 id | integer |   |  |
 val| text|   |  |


My POC's grammar is only like:

CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
{
IndexStmt *n = makeNode(IndexStmt);
n->idxname = $4;
n->relation = $6;
n->accessMethod = DEFAULT_INDEX_TYPE;
n->indexParams = $8;
n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
$$ = (Node *) n;
}

as I'm not willing to import the whole CREATE INDEX grammar for now for a patch
that may be rejected.  I can however publish this POC if that helps.  Note
that once my parser returns this parse tree, all I need to do is intercept
IndexStmt containing this option in a utility_hook and run my code rather than
a plain DefineIndex(), which works as intended as I showed above.

One could easily imagine similar usage to extend existing commands, like
implementing a new syntax on top of CREATE TABLE to implement an automatic
partition creation grammar (which would return multiple CreateStmt),
or even a partition manager.
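
For example (purely hypothetical syntax, only to illustrate the kind of thing
such an extension grammar could accept -- nothing like this is in the patch):

CREATE TABLE measurement (ts timestamptz, val numeric)
    PARTITION BY RANGE (ts)
    CONFIGURATION (FOR VALUES FROM ('2021-01-01') TO ('2022-01-01')
                   EVERY ('1 month'));

The extension's parser would return a plain CreateStmt for the parent plus one
CreateStmt per generated partition.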

I'm not an expert in other RDBMS syntax, but maybe you could use such a
hook to implement SQL Server or mysql syntax, which use at least different
quoting rules.  Maybe Amazon people could confirm that as it looks like they
implemented an SQL Server parser using a similar hook?

So yes you can't create new commands or implement grammars that require
additional semantic analysis with this hook, but I think that there are still
real use cases that can be implemented using only a different parser.




Re: DOC: Progress Reporting page

2021-09-23 Thread Marcos Pegoraro
>
>
>> The page explaining the progress reporting views, for all versions, has the
>> expression "The tables" several times when it points to a single table. So,
>> singular expressions should be used, right?
>>
>> "The tables below describe the information that will be reported and
>> provide information about how to interpret it."
>>
> I think the documentation refers to HTML tables, not Postgres tables.
> Maybe it could use other terms to be clearer.
>

Sure, the tables are HTML tables, but it points to one table, so why is it
using the plural form?


Re: DOC: Progress Reporting page

2021-09-23 Thread Ranier Vilela
On Thu, Sep 23, 2021 at 10:37 AM Marcos Pegoraro wrote:

>
>>> The page explaining the progress reporting views, for all versions, has the
>>> expression "The tables" several times when it points to a single table. So,
>>> singular expressions should be used, right?
>>>
>>> "The tables below describe the information that will be reported and
>>> provide information about how to interpret it."
>>>
>> I think the documentation refers to HTML tables, not Postgres tables.
>> Maybe it could use other terms to be clearer.
>>
>
> Sure, the tables are HTML tables, but it points to one table, so why is it
> using the plural form?
>
Because there are two HTML tables:
*Table 27.32. pg_stat_progress_analyze View*
and
*Table 27.33. ANALYZE phases*

regards,
Ranier Vilela


Re: Logical replication timeout problem

2021-09-23 Thread Amit Kapila
On Wed, Sep 22, 2021 at 9:46 PM Fabrice Chapuis  wrote:
>
> If you would like I can test the patch you send to me.
>

Okay, please find an attached patch for additional logs. I would like
to see the logs during the time when walsender appears to be writing
to files. We might need to add more logs to find the exact problem but
let's start with this.

-- 
With Regards,
Amit Kapila.


log_keep_alive_1.patch
Description: Binary data


Re: DOC: Progress Reporting page

2021-09-23 Thread Marcos Pegoraro
> *Table 27.32. pg_stat_progress_analyze View*
> and
> *Table 27.33. ANALYZE phases*
>

You're right, I didn't notice that there is always a phases table afterwards. Sorry...


Re: [Proposal] Global temporary tables

2021-09-23 Thread Tony Zhu
Hi Wenjing,

We have reviewed the code and run the regression tests; all tests pass.
We believe the feature's code quality is ready for production, and I will
change the status to "Ready for commit".

Re: Hook for extensible parsing.

2021-09-23 Thread Tom Lane
Julien Rouhaud  writes:
> My POC's grammar is only like:

> CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
>   {
>   IndexStmt *n = makeNode(IndexStmt);
>   n->idxname = $4;
>   n->relation = $6;
>   n->accessMethod = DEFAULT_INDEX_TYPE;
>   n->indexParams = $8;
>   n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
>   $$ = (Node *) n;
>   }

I'm not too impressed by this example, because there seems little
reason why you couldn't just define "hypothetical" as an index_param
option, and not need to touch the grammar at all.
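
For illustration, one reading of that is to spell the flag with the WITH
option list that CREATE INDEX already accepts -- a sketch only, since the
extension would still have to intercept the statement in its utility hook
before core rejects the unrecognized parameter:

CREATE INDEX meh ON t1 (id) WITH (hypothetical = true);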

> as I'm not willing to import the whole CREATE INDEX grammar for now for a 
> patch
> that may be rejected.

The fact that that's so daunting seems to me to be a perfect illustration
of the problems with this concept.  Doing anything interesting with a
hook like this will create a maintenance nightmare, because you'll have
to duplicate (and track every change in) large swaths of gram.y.  To the
extent that you fail to, say, match every detail of the core's expression
grammar, you'll be creating a crappy user experience.

> that once my parser returns this parse tree, all I need to do is intercept
> IndexStmt containing this option in a utility_hook and run my code rather than
> a plain DefineIndex(), which works as intended as I showed above.

And I'm even less impressed by the idea of half a dozen extensions
each adding its own overhead to the parser and also to ProcessUtility
so that they can process statements in this klugy, highly-restricted
way.

I do have sympathy for the idea that extensions would like to define
their own statement types.  I just don't see a practical way to do it
in our existing parser infrastructure.  This patch certainly doesn't
offer that.

regards, tom lane




Re: mark the timestamptz variant of date_bin() as stable

2021-09-23 Thread John Naylor
On Thu, Sep 23, 2021 at 4:13 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:
>
> Hi John,
>
> > Any thoughts on the doc patch?
>
> It so happened that I implemented a similar feature in TimescaleDB [1].
>
> I discovered that it's difficult from both developer's and user's
> perspectives to think about the behavior of the function in the
> context of given TZ and its complicated rules, as you are trying to do
> in the doc patch. So what we did instead is saying: for timestamptz
> the function works as if it was timestamp. E.g. time_bucket_ng("3
> month", "2021 Oct 03 12:34:56 TZ") = "2021 Jan 01 00:00:00 TZ" no
> matter what TZ it is and what rules (DST, corrections, etc) it has. It
> seems to be not only logical and easy to understand, but also easy to
> implement [2].
>
> Do you think it would be possible to adopt a similar approach in
> respect of documenting for date_bin()? To be honest, I didn't try to
> figure out what is the actual implementation of date_bin() for TZ
> case.

I think you have a point that it could be stated more simply and generally.
I'll try to move in that direction.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: extensible options syntax for replication parser?

2021-09-23 Thread Robert Haas
On Thu, Sep 23, 2021 at 2:55 AM tushar  wrote:
> I checked and it looks like the issue is still not fixed with the v7-* patches -
>
> postgres=#  create subscription test CONNECTION 'host=127.0.0.1 user=edb 
> dbname=postgres' PUBLICATION p with (create_slot = true);
> ERROR:  could not create replication slot "test": ERROR:  syntax error

Thanks. Looks like that version had some stupid mistakes. Here's a new one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v8-0001-Flexible-options-for-BASE_BACKUP.patch
Description: Binary data


v8-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-23 Thread Robert Haas
On Wed, Sep 22, 2021 at 12:41 PM Jeevan Ladhe
 wrote:
> If I set prefs->autoFlush to 0, then LZ4F_compressUpdate() returns an
> error: ERROR_dstMaxSize_tooSmall after a few iterations.
>
> After digging a bit in the source of LZ4F_compressUpdate() in the LZ4
> repository, I see that it throws this error when the destination buffer
> capacity, which in our case is mysink->base.bbs_next->bbs_buffer_length,
> is less than the compress bound that it calculates internally by calling
> LZ4F_compressBound() for buffered_bytes + the input buffer (CHUNK_SIZE in
> this case). Not sure how we can control this.

Uggh. My guess had been that the reason why
LZ4F_compressBound() was returning such a large value was that it
had to allow for the possibility of bytes inside of its internal
buffers. But, if the amount of internally buffered data counts against
the argument that you have to pass to LZ4F_compressBound(), then that
makes it more complicated.

Still, there's got to be a simple way to make this work, and it can't
involve setting autoFlush. Like, look at this:

https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c

That uses the same APIs that we're using, and a fixed-size input buffer
and a fixed-size output buffer, just as we have here, to compress a
file. And it probably works, because otherwise it likely wouldn't be
in the "examples" directory. And it sets autoFlush to 0.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Logical replication timeout problem

2021-09-23 Thread Fabrice Chapuis
Thanks for your patch, we are going to set up a lab in order to debug the
function.
Regards
Fabrice

On Thu, Sep 23, 2021 at 3:50 PM Amit Kapila  wrote:

> On Wed, Sep 22, 2021 at 9:46 PM Fabrice Chapuis 
> wrote:
> >
> > If you would like I can test the patch you send to me.
> >
>
> Okay, please find an attached patch for additional logs. I would like
> to see the logs during the time when walsender appears to be writing
> to files. We might need to add more logs to find the exact problem but
> let's start with this.
>
> --
> With Regards,
> Amit Kapila.
>


Re: when the startup process doesn't (logging startup delays)

2021-09-23 Thread Robert Haas
On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
 wrote:
> Thanks Justin for the detailed explanation.  Done the necessary changes.

Not really. The documentation here does not make a ton of sense:

+ Sets the time interval between each progress update of the operations
+ performed by the startup process. This produces the log messages for
+ those operations which take longer than the specified
duration. The unit
+ used to specify the value is milliseconds. For example, if
you set it to
+  10s , then every  10s
, a log is
+ emitted indicating which operation is ongoing, and the
elapsed time from
+ the beginning of the operation. If the value is set to
 0 ,
+ then all messages for such operations are logged. 
-1 
+ disables the feature. The default value is  10s 

I really don't know what to say about this. You say that the time is
measured in milliseconds, and then immediately turn around and say
"For example, if you set it to 10s". Now we do expect that most people
will set it to intervals that are measured in seconds rather than
milliseconds, but saying that setting it to a value measured in
seconds is an example of setting it in milliseconds is not logical. It
also looks pretty silly to say that if you set the value to 10s,
something will happen every 10s. What else would anyone expect to
happen? You really need to give some thought to how to explain the
behavior in a way that is clear and logical but not overly wordy.
Also, please note that you've got spaces around the literals, which
does not match the surrounding style and does not render properly in
HTML.

It is also not logical to define 0 as meaning that "all messages for
such operations are logged". What does that even mean? It makes sense
for something like log_autovacuum_min_duration, because there we are
talking about logging one message per operation, and we could log
messages for all operations or just some of them. Here we are talking
about the time between one message and the next, so talking about "all
messages" does not really seem to make a lot of sense. Experimentally,
what 0 actually does is cause the system to spam log lines in a tight
loop, which cannot be what anyone wants. I think you should make 0
mean disabled, and a positive value mean log at that interval, and
disallow -1 altogether.

And on that note, I tested your patch with
log_startup_progress_interval=-1 and found that -1 behaves just like
0. In other words, contrary to what the documentation says, -1 does
not disable the feature. It instead behaves just like 0. It's really
confusing to me how you write documentation that says -1 has a certain
behavior without thinking about the fact that you haven't written any
code that would make -1 behave that way. And apparently you didn't
test it, either. It took me approximately 1 minute of testing to find
that this is broken, which really is not very much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: mark the timestamptz variant of date_bin() as stable

2021-09-23 Thread John Naylor
I wrote:

> I think you have a point that it could be stated more simply and
> generally. I'll try to move in that direction.

On second thought, I don't think this is helpful. Concrete examples are
easier to reason about.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: prevent immature WAL streaming

2021-09-23 Thread Alvaro Herrera
This took some time because backpatching the tests was more work than I
anticipated -- function name changes, operators that don't exist,
definition of the WAL segment size in pg_settings.  I had to remove the
second test in branches 13 and earlier due to lack of LSN+bytes
operator.  Fortunately, the first test (which is not as clean) still
works all the way back.

However, I notice now that the pg_rewind tests reproducibly fail in
branch 14 for reasons I haven't yet understood.  It's strange that no
other branch fails, even when run quite a few times.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)




Re: Estimating HugePages Requirements?

2021-09-23 Thread Robert Haas
On Tue, Sep 21, 2021 at 11:53 PM Michael Paquier  wrote:
> I am not sure either as that's a tradeoff between an underestimation
> and no information.  The argument of consistency indeed matters.
> Let's see if others have any opinion to share on this point.

Well, if we think the information won't be safe to use, it's better to
report nothing than a wrong value, I think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: mem context is not reset between extended stats

2021-09-23 Thread Tomas Vondra

Hi,

I've pushed both of these patches, with some minor tweaks (freeing the 
statistics list, and deleting the new context), and backpatched them all 
the way to 10.


Thanks for the report & patch, Justin!


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Release 14 Schedule

2021-09-23 Thread Robert Haas
On Wed, Sep 22, 2021 at 12:00 PM Jonathan S. Katz  wrote:
> - Numerous performance ...
> - B-tree...
> - Subscripting ...
> - Range types ...
> - Stored ...
> - Extended ...
> - SEARCH / CYCLE ...
> - libpq ...
> - TOAST ...
> (- emergency mode vacuum ...)

My opinion is that this is awfully long for a list of major features.
But Tom said 10 or so was typical, so perhaps I am all wet.

Still, this kind of seems like a laundry list to me. I'd argue for
cutting range types, extended statistics, SEARCH / CYCLE, TOAST, and
emergency mode vacuum. They're all nice, and I'm glad we have them,
but they're also things that only people who are deeply steeped in
PostgreSQL already seem likely to appreciate. Better scalability, less
bloat, working OUT parameters, and query pipelining have benefits
anyone can understand.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




missing indexes in indexlist with partitioned tables

2021-09-23 Thread Arne Roland
Hi,


I did a bit of testing today and noticed that we don't set indexlist properly 
at the right time in some cases when using partitioned tables.


I attached a simple case where the indexlist doesn't seem to be set at the
right time. get_relation_info in plancat.c seems to process it only after
analyzejoins.c has checked for it.


Can someone explain to me why it is deferred at all?


Regards

Arne



indexlist_to_late.sql
Description: indexlist_to_late.sql


Re: CLUSTER on partitioned index

2021-09-23 Thread Matthias van de Meent
On Sun, 12 Sept 2021 at 22:10, Justin Pryzby  wrote:
>
> On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> > I have to wonder if there really *is* a use case for CLUSTER in the
> > first place on regular tables, let alone on partitioned tables, which
> > are likely to be large and thus take a lot of time.
>
> The cluster now is done one partition at a time, so it might take a long time,
> but doesn't lock the whole partition heirarchy.  Same as VACUUM (since v10) 
> and
> (since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.
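
For illustration only -- this is not code from the patch, and the function
name below is made up -- the kind of recheck-under-lock pattern I have in
mind for OIDs that were collected without a lock looks roughly like this,
using only existing PostgreSQL APIs:

#include "postgres.h"
#include "access/relation.h"
#include "catalog/pg_class.h"
#include "nodes/pg_list.h"
#include "utils/rel.h"

static void
process_collected_partitions(List *reloids, LOCKMODE lockmode)
{
    ListCell   *lc;

    foreach(lc, reloids)
    {
        Oid         reloid = lfirst_oid(lc);
        Relation    rel;

        /* Returns NULL if the relation was concurrently dropped. */
        rel = try_relation_open(reloid, lockmode);
        if (rel == NULL)
            continue;

        /*
         * Re-verify whatever was assumed when the OID was collected
         * (relkind, index membership, ...), since the OID could in
         * principle have been reused by a concurrent drop+create.
         */
        if (rel->rd_rel->relkind != RELKIND_RELATION)
        {
            relation_close(rel, lockmode);
            continue;
        }

        /* ... process (e.g. cluster) this partition ... */

        relation_close(rel, NoLock);    /* keep the lock until commit */
    }
}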

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

> [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

> diff --git a/src/test/regress/expected/cluster.out 
> b/src/test/regress/expected/cluster.out
> ...
> +ALTER TABLE clstrpart SET WITHOUT CLUSTER;
> +ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

>  ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
>  ERROR:  cannot mark index clustered in partitioned table

A "HINT: use the CLUSTER command to cluster partitioned tables" (or
equivalent) should be added if we decide to keep the clustering APIs
of ALTER TABLE disabled for partitioned tables, as CLUSTER is now
implemented for partitioned tables.

> -DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

Kind regards,

Matthias van de Meent




Re: relation OID in ReorderBufferToastReplace error message

2021-09-23 Thread Alvaro Herrera
On 2021-Sep-23, Jeremy Schneider wrote:

> On 9/22/21 20:11, Amit Kapila wrote:
> > 
> > On Thu, Sep 23, 2021 at 3:06 AM Jeremy Schneider  
> > wrote:
> >>
> >> Any chance of back-patching this?
> > 
> > Normally, we don't back-patch code improvements unless they fix some
> > bug or avoid future back-patch efforts. So, I am not inclined to
> > back-patch this but if others also feel strongly about this then we
> > can consider it.
> 
> The original thread about the logical replication bugs spawned a few
> different threads and code changes. The other code changes coming out of
> those threads were all back-patched, but I guess I can see arguments
> both ways on this one.

I think that for patches that are simple debugging aids we do
backpatch, with the intent to get them deployed in users' systems as
soon and as widely as possible.  I did that in this one, for example

Author: Alvaro Herrera 
Branch: master [961dd7565] 2021-08-30 16:29:12 -0400
Branch: REL_14_STABLE [eae08e216] 2021-08-30 16:29:12 -0400
Branch: REL_13_STABLE [6197d7b53] 2021-08-30 16:29:12 -0400
Branch: REL_12_STABLE [fa8ae19be] 2021-08-30 16:29:12 -0400
Branch: REL_11_STABLE [0105b7aaa] 2021-08-30 16:29:12 -0400
Branch: REL_10_STABLE [02797ffa9] 2021-08-30 16:29:12 -0400
Branch: REL9_6_STABLE [37e468252] 2021-08-30 16:29:12 -0400

Report tuple address in data-corruption error message

Most data-corruption reports mention the location of the problem, but
this one failed to.  Add it.

Backpatch all the way back.  In 12 and older, also assign the
ERRCODE_DATA_CORRUPTED error code as was done in commit fd6ec93bf890 for
13 and later.

Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg0.php




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-23 Thread Robert Haas
On Mon, Sep 20, 2021 at 11:20 AM Amul Sul  wrote:
> Ok, understood, I have separated my changes into 0001 and 0002 patch,
> and the refactoring patches start from 0003.

I think it would be better in the other order, with the refactoring
patches at the beginning of the series.

> In the 0001 patch, I have copied ArchiveRecoveryRequested to shared
> memory as said previously. Coping ArchiveRecoveryRequested value to
> shared memory is not really interesting, and I think somehow we should
> reuse existing variable, (perhaps, with some modification of the
> information it can store, e.g. adding one more enum value for
> SharedRecoveryState or something else), thinking on the same.
>
> In addition to that, I tried to turn down the scope of
> ArchiveRecoveryRequested global variable. Now, this is a static
> variable, and the scope is limited to xlog.c file like
> LocalXLogInsertAllowed and can be accessed through the newly added
> function ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let
> me know what you think about the approach.

I'm not sure yet whether I like this or not, but it doesn't seem like
a terrible idea. You spelled UNKNOWN wrong, though, which does seem
like a terrible idea. :-) "acccsed" is not correct either.

Also, the new comments for ArchiveRecoveryRequested /
ARCHIVE_RECOVERY_REQUEST_* are really not very clear. All you did is
substitute the new terminology into the existing comment, but that
means that the purpose of the new "unknown" value is not at all clear.

Consider the following two patch fragments:

+ * SharedArchiveRecoveryRequested indicates whether an archive recovery is
+ * requested. Protected by info_lck.
...
+ * Remember archive recovery request in shared memory state.  A lock is not
+ * needed since we are the only ones who updating this.

These two comments directly contradict each other.

+ SpinLockAcquire(&XLogCtl->info_lck);
+ XLogCtl->SharedArchiveRecoveryRequested = false;
+ ArchiveRecoveryRequested = ARCHIVE_RECOVERY_REQUEST_UNKOWN;
+ SpinLockRelease(&XLogCtl->info_lck);

This seems odd to me. In the first place, there doesn't seem to be any
value in clearing this -- we're just expending extra CPU cycles to get
rid of a value that wouldn't be used anyway. In the second place, if
somehow someone checked the value after this point, with this code,
they might get the wrong answer, whereas if you just deleted this,
they would get the right answer.

> In 0002 patch is a mixed one where I tried to remove the dependencies
> on global variables and local variables belonging to StartupXLOG(). I
> am still worried about the InRecovery value that needs to be deduced
> afterward inside XLogAcceptWrites().  Currently, relying on
> ControlFile->state != DB_SHUTDOWNED check but I think that will not be
> good for ASRO where we plan to skip XLogAcceptWrites() work only and
> let the StartupXLOG() do the rest of the work as it is where it will
> going to update ControlFile's DBState to DB_IN_PRODUCTION, then we
> might need some ugly kludge to call PerformRecoveryXLogAction() in
> checkpointer irrespective of DBState, which makes me a bit
> uncomfortable.

I think that replacing the if (InRecovery) test with if
(ControlFile->state != DB_SHUTDOWNED) is probably just plain wrong. I
mean, there are three separate places where we set InRecovery = true.
One of those executes if ControlFile->state != DB_SHUTDOWNED, matching
what you have here, but it also can happen if checkPoint.redo <
RecPtr, or if read_backup_label is true and ReadCheckpointRecord
returns non-NULL. Now maybe you're going to tell me that in those
other two cases we can't reach here anyway, but I don't see off-hand
why that should be true, and even if it is true, it seems like kind of
a fragile thing to rely on. I think we need to rely on something in
shared memory that is more explicitly connected to the question of
whether we are in recovery.

The other part of this patch has to do with whether we can use the
return value of GetLastSegSwitchData as a substitute for relying on
EndOfLog. Now as you have it, you end up creating a local variable
called EndOfLog that shadows another such variable in an outer scope,
which probably would not make anyone who noticed things in such a
state very happy. However, that will naturally get fixed if you
reorder the patches as per above, so let's turn to the central
question: is this a good way of getting EndOfLog? The value that would
be in effect at the time this code is executed is set here:

XLogBeginRead(xlogreader, LastRec);
record = ReadRecord(xlogreader, PANIC, false);
EndOfLog = EndRecPtr;

Subsequently we do this:

/* start the archive_timeout timer and LSN running */
XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
XLogCtl->lastSegSwitchLSN = EndOfLog;

So at that point the value that GetLastSegSwitchData() would return
has to match what's in the existing variable. But later XLogWrite()
will change the value. So the question boils down 

Recent cpluspluscheck failures

2021-09-23 Thread Tom Lane
On HEAD, I see these headers failing to compile standalone:

$ src/tools/pginclude/cpluspluscheck 
In file included from /tmp/cpluspluscheck.XxTv1i/test.cpp:3:
./src/include/common/unicode_east_asian_fw_table.h:3:32: error: elements of 
array 'const mbinterval east_asian_fw []' have incomplete type
 static const struct mbinterval east_asian_fw[] = {
^
./src/include/common/unicode_east_asian_fw_table.h:3:32: error: storage size of 
'east_asian_fw' isn't known
In file included from /tmp/cpluspluscheck.XxTv1i/test.cpp:3:
./src/include/replication/worker_internal.h:60:2: error: 'FileSet' does not 
name a type
  FileSet*stream_fileset;
  ^~~

The first of these is evidently the fault of bab982161 (Update display
widths as part of updating Unicode), which introduced that header.
The second seems to have been introduced by 31c389d8d (Optimize fileset
usage in apply worker).

Please fix.

regards, tom lane




Re: OpenSSL 3.0.0 compatibility

2021-09-23 Thread Daniel Gustafsson
> On 22 Sep 2021, at 10:06, Daniel Gustafsson  wrote:

> Agreed, I will go ahead and prep backpatches for 318df8 and 72bbff4cd.

These commits are enough to keep 14 happy, and I intend to apply them tomorrow
after another round of testing and caffeine.

For the 13- backbranches we also need to backport 22e1943f1 ("pgcrypto: Check
for error return of px_cipher_decrypt()" by Peter E) in order to avoid
incorrect results for decrypt tests on disallowed ciphers.  Does anyone have
any concerns about applying this to backbranches?

13 and older will, when compiled against OpenSSL 3.0.0, produce a fair amount
of compiler warnings about usage of deprecated functionality, but there isn't
really anything we can do, as suppressing that is beyond the scope of a
backpatchable fix IMHO.

--
Daniel Gustafsson   https://vmware.com/





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-09-23 Thread Robert Haas
On Wed, Sep 22, 2021 at 3:07 PM Shruthi Gowda  wrote:
> > - The comment in binary_upgrade_set_pg_class_oids() is still not
> > accurate. You removed the sentence which says "Indexes cannot have
> > toast tables, so we need not make this probe in the index code path"
> > but the immediately preceding sentence is still inaccurate in at least
> > two ways. First, it only talks about tables, but the code now applies
> > to indexes. Second, it only talks about OIDs, but now also deals with
> > refilenodes. It's really important to fully update every comment that
> > might be affected by your changes!
>
> The comment is updated.

Looks good.

> The SQL query will not result in duplicate rows because the first join
> filters the duplicate rows if any with the on clause ' i.indisvalid'
> on it. The result of the first join is further left joined with pg_class and
> pg_class will not have duplicate rows for a given oid.

Oh, you're right. My mistake.

> As per your suggestion, reloid and relfilenode are absorbed in the same place.
> An additional parameter called 'suppress_storage' is passed to heap_create()
> which indicates whether or not to create the storage when the caller
> passed a valid relfilenode.

I find it confusing to have both suppress_storage and create_storage
with one basically as the negation of the other. To avoid that sort of
thing I generally have a policy that variables and options should say
whether something should happen, rather than whether it should be
prevented from happening. Sometimes there are good reasons - such as
strong existing precedent - to deviate from this practice but I think
it's good to follow when possible. So my proposal is to always have
create_storage and never suppress_storage, and if some function needs
to adjust the value of create_storage that was passed to it then OK.

> I did not make the changes to set the oid and relfilenode in the same call.
> I feel the uniformity w.r.t the other function signatures in
> pg_upgrade_support.c will be lost because currently each function sets
> only one attribute.
> Also, renaming the applicable function names to represent that they
> set both oid and relfilenode will make the function name even longer.
> We may opt to not include the relfilenode in the function name instead
> use a generic name like binary_upgrade_set_next_xxx_pg_class_oid() but
> then
> we will end up with some functions that set two attributes and some
> functions that set one attribute.

OK.

> I have also attached the patch to preserve the DB oid. As discussed,
> template0 will be created with a fixed oid during initdb. I am using
> OID 2 for template0. Even though oid 2 is already in use for the
> 'pg_am' catalog I see no harm in using it for template0 DB because oid
> doesn’t have to be unique across the database - it has to be unique
> for the particular catalog table. Kindly let me know if I am missing
> something?
> Apparently, if we did decide to pick an unused oid for template0 then
> I see a challenge in removing that oid from the unused oid list. I
> could not come up with a feasible solution for handling it.

You are correct that there is no intrinsic reason why the same OID
can't be used in various different catalogs. We have a policy of not
doing that, though; I'm not clear on the reason. Maybe it'd be OK to
deviate from that policy here, but another option would be to simply
change the unused_oids script (and maybe some of the others). It
already has:

my $FirstGenbkiObjectId =
  Catalog::FindDefinedSymbol('access/transam.h', '..', 'FirstGenbkiObjectId');
push @{$oids}, $FirstGenbkiObjectId;

Presumably it could be easily adapted to push the value of some other
defined symbol into @{$oids} also, thus making that OID in effect
used.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Recent cpluspluscheck failures

2021-09-23 Thread John Naylor
On Thu, Sep 23, 2021 at 2:37 PM Tom Lane  wrote:
>
> On HEAD, I see these headers failing to compile standalone:
>
> $ src/tools/pginclude/cpluspluscheck
> In file included from /tmp/cpluspluscheck.XxTv1i/test.cpp:3:
> ./src/include/common/unicode_east_asian_fw_table.h:3:32: error: elements
of array 'const mbinterval east_asian_fw []' have incomplete type
>  static const struct mbinterval east_asian_fw[] = {
> ^
> ./src/include/common/unicode_east_asian_fw_table.h:3:32: error: storage
size of 'east_asian_fw' isn't known

Okay, this file is used similarly to
src/include/common/unicode_combining_table.h, which has an exception in the
check script, so I'll add another exception.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Column Filtering in Logical Replication

2021-09-23 Thread Tomas Vondra

Hi,

I wanted to do a review of this patch, but I'm a bit confused about 
which patch(es) to review. There's the v5 patch, and then these two 
patches - which seem to somewhat duplicate each other, though.


Can anyone explain what's the "current" patch version, or perhaps tell 
me which of the patches to combine?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Recent cpluspluscheck failures

2021-09-23 Thread Tom Lane
John Naylor  writes:
> On Thu, Sep 23, 2021 at 2:37 PM Tom Lane  wrote:
>> On HEAD, I see these headers failing to compile standalone:
>> $ src/tools/pginclude/cpluspluscheck
>> In file included from /tmp/cpluspluscheck.XxTv1i/test.cpp:3:
>> ./src/include/common/unicode_east_asian_fw_table.h:3:32: error: elements
> of array 'const mbinterval east_asian_fw []' have incomplete type

> Okay, this file is used similarly to
> src/include/common/unicode_combining_table.h, which has an exception in the
> check script, so I'll add another exception.

OK, but see also src/tools/pginclude/headerscheck.

regards, tom lane




Re: very long record lines in expanded psql output

2021-09-23 Thread Andrew Dunstan


On 8/23/21 2:00 PM, Platon Pronko wrote:
> Hi!
>
> Apparently I did forget something, and that's the patch itself :)
> Thanks for Justin Pryzby for pointing this out.
>
> Attaching the patch now.
>
>

(Please avoid top-posting on PostgreSQL lists)


This patch seems basically sound. A couple of things:


1. It's not following project indentation style (BSD brace placement)

2. It would possibly be better to pass the relevant parts of the options
to print_aligned_vertical_line() rather than the whole options
structure. It feels odd to pass both that and opt_border.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Recent cpluspluscheck failures

2021-09-23 Thread John Naylor
On Thu, Sep 23, 2021 at 3:24 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > On Thu, Sep 23, 2021 at 2:37 PM Tom Lane  wrote:
> >> On HEAD, I see these headers failing to compile standalone:
> >> $ src/tools/pginclude/cpluspluscheck
> >> In file included from /tmp/cpluspluscheck.XxTv1i/test.cpp:3:
> >> ./src/include/common/unicode_east_asian_fw_table.h:3:32: error:
elements
> > of array 'const mbinterval east_asian_fw []' have incomplete type
>
> > Okay, this file is used similarly to
> > src/include/common/unicode_combining_table.h, which has an exception in
the
> > check script, so I'll add another exception.
>
> OK, but see also src/tools/pginclude/headerscheck.
>
> regards, tom lane

Oh, I didn't know there was another one, will add it there also.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Gather performance analysis

2021-09-23 Thread Robert Haas
On Wed, Sep 8, 2021 at 2:06 AM Dilip Kumar  wrote:
> But I am attaching both the patches in case you want to play around.

I don't really see any reason not to commit 0001. Perhaps some very
minor grammatical nitpicking is in order here, but apart from that I
can't really see anything to criticize with this approach. It seems
safe enough, it's not invasive in any way that matters, and we have
benchmark results showing that it works well. If someone comes up with
something even better, no harm done, we can always change it again.

Objections?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

2021-09-23 Thread Tom Lane
"Joe Wildish"  writes:
> On Wed, 22 Sep 2021, at 17:09, Tom Lane wrote:
> The main change is a switch to using SPI for expression evaluation.  The 
> plans are also cached along the same lines as the RI trigger plans.
>> 
>> I really dislike this implementation technique.  Aside from the likely
>> performance hit for existing triggers, I think it opens serious security
>> holes, because we can't fully guarantee that deparse-and-reparse doesn't
>> change the semantics.

> Where do you consider the performance hit to be?

The deparse/reparse cost might not be negligible, and putting SPI into
the equation where it was not before is certainly going to add overhead.
Now maybe those things are negligible in context, but I wouldn't believe
it without seeing some performance numbers.

> Do you have a suggestion for an alternative? I guess it would be go to the 
> planner/executor directly with the node tree?

What I'd be thinking about is what it'd take to extend expression_planner
and related infrastructure to allow expressions containing SubLinks.
I fear there are a lot of moving parts there though, since the restriction
has been in place so long.

>> (Relevant to that, I wonder why this patch is only concerned with
>> WHEN clauses and not all the other places where we forbid subqueries
>> for implementation simplicity.)

> I don't know how many other places WHEN clauses are used. Rules, perhaps?

I'm thinking of things like CHECK constraints.  Grepping for calls to
expression_planner would give you a clearer idea of the scope.

> The short answer is this patch was written to solve a specific problem I had 
> rather than it being a more general attempt to remove all "subquery 
> forbidden" restrictions.

I won't carp too much if the initial patch only removes the restriction
for WHEN; but I'd like to see it lay the groundwork to remove the
restriction elsewhere as well.

regards, tom lane




Re: OpenSSL 3.0.0 compatibility

2021-09-23 Thread Robert Haas
On Thu, Sep 23, 2021 at 2:51 PM Daniel Gustafsson  wrote:
> For the 13- backbranches we also need to backport 22e1943f1 ("pgcrypto: Check
> for error return of px_cipher_decrypt()" by Peter E) in order to avoid
> incorrect results for decrypt tests on disallowed ciphers.  Does anyone have
> any concerns about applying this to backbranches?

To me it looks like it would be more concerning if we did not apply it
to back-branches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Gather performance analysis

2021-09-23 Thread Tomas Vondra

On 9/23/21 9:31 PM, Robert Haas wrote:

On Wed, Sep 8, 2021 at 2:06 AM Dilip Kumar  wrote:

But I am attaching both the patches in case you want to play around.


I don't really see any reason not to commit 0001. Perhaps some very
minor grammatical nitpicking is in order here, but apart from that I
can't really see anything to criticize with this approach. It seems
safe enough, it's not invasive in any way that matters, and we have
benchmark results showing that it works well. If someone comes up with
something even better, no harm done, we can always change it again.

Objections?


Yeah, it seems like a fairly clear win, according to the benchmarks.

I did find some suspicious behavior on the bigger box I have available 
(with 2x xeon e5-2620v3), see the attached spreadsheet. But it seems 
pretty weird because the worst affected case is with no parallel workers 
(so the queue changes should not affect it). Not sure how to explain it, but 
the behavior seems consistent.


Anyway, the other machine with a single CPU seems perfectly clean.



regards


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


queue-results.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

2021-09-23 Thread Robert Haas
On Thu, Sep 23, 2021 at 5:34 AM Joe Wildish  wrote:
> Regarding the deparse-and-reparse --- if I understand correctly, the core 
> problem is that we have no way of going from a node tree to a string, such 
> that the string is guaranteed to have the same meaning as the node tree? (I 
> did try just now to produce such a scenario with the patch but I couldn't get 
> ruleutils to emit the wrong thing).  Moreover, we couldn't store the string 
> for use with SPI, as the string would be subject to trigger-time search path 
> lookups.  That pretty much rules out SPI for this then.  Do you have a 
> suggestion for an alternative? I guess it would be go to the planner/executor 
> directly with the node tree?

I think hoping that you can ever make deparse and reparse reliably
produce the same result is a hopeless endeavor. Tom mentioned hazards
related to ambiguous constructs, but there's also often the risk of
concurrent DDL. Commit 5f173040e324f6c2eebb90d86cf1b0cdb5890f0a is a
cautionary tale, demonstrating that you can't even count on
schema_name.table_name to resolve to the same OID for the entire
duration of a single DDL command. The same hazard exists for
functions, operators, and anything else that gets looked up in a
system catalog.

I don't know what all of that means for your patch, but just wanted to
get my $0.02 in on the general topic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Gather performance analysis

2021-09-23 Thread Robert Haas
On Thu, Sep 23, 2021 at 4:00 PM Tomas Vondra
 wrote:
> I did find some suspicious behavior on the bigger box I have available
> (with 2x xeon e5-2620v3), see the attached spreadsheet. But it seems
> pretty weird because the worst affected case is with no parallel workers
> (so the queue changes should affect it). Not sure how to explain it, but
> the behavior seems consistent.

That is pretty odd. I'm inclined to mostly discount the runs with
10,000 tuples because sending such a tiny number of tuples doesn't
really take any significant amount of time, and it seems possible that
variations in the runtime of other code due to code movement effects
could end up mattering more than the changes to the performance of
shm_mq. However, the results with a million tuples seem like they're
probably delivering statistically significant results ... and I guess
maybe what's happening is that the patch hurts when the tuples are too
big relative to the queue size.

I guess your columns are an md5 value each, which is 32 bytes +
overhead, so a 20-column tuple is ~1kB. Since Dilip's patch flushes
the value to shared memory when more than a quarter of the queue has
been filled, that probably means we flush every 4-5 tuples. I wonder
if that means we need a smaller threshold, like 1/8 of the queue size?
Or maybe the behavior should be adaptive somehow, depending on whether
the receiver ends up waiting for data? Or ... perhaps only small
tuples are worth batching, so that the threshold for posting to shared
memory should be a constant rather than a fraction of the queue size?
I guess we need to know why we see the time spike up in those cases,
if we want to improve them.
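
To make that last idea concrete -- purely a sketch with invented names, not
the actual shm_mq code or interface -- the sender-side policy could cap the
fraction-of-queue threshold with a fixed byte budget, so large tuples exceed
the threshold on their own and are posted immediately, while small tuples
still get batched:

#include <stdbool.h>
#include <stddef.h>

#define MQ_FLUSH_BYTES 8192     /* assumed constant cap; would need tuning */

static inline bool
mq_should_flush(size_t pending_bytes, size_t ring_size)
{
    size_t      threshold = ring_size / 8;  /* fraction-of-queue part */

    if (threshold > MQ_FLUSH_BYTES)
        threshold = MQ_FLUSH_BYTES;         /* constant cap */

    return pending_bytes >= threshold;
}

An adaptive variant could additionally force a flush whenever the receiver
has been seen waiting for data.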

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-23 Thread Melanie Plageman
On Tue, Sep 14, 2021 at 9:30 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-13, Melanie Plageman wrote:
>
> > I also think it makes sense to rename the pg_stat_buffer_actions view to
> > pg_stat_buffers and to name the columns using both the buffer action
> > type and buffer type -- e.g. shared, strategy, local. This leaves open
> > the possibility of counting buffer actions done on other non-shared
> > buffers -- like those done while building indexes or those using local
> > buffers. The third patch in the set does this (I wanted to see if it
> > made sense before fixing it up into the first patch in the set).
>
> What do you think of the idea of having the "shared/strategy/local"
> attribute be a column?  So you'd have up to three rows per buffer action
> type.  Users wishing to see an aggregate can just aggregate them, just
> like they'd do with pg_buffercache.  I think that leads to an easy
> decision with regards to this point:

I have rewritten the code to implement this.

>
>
> (It's weird to have enum values that are there just to indicate what's
> the maximum value.  I think that sort of thing is better done by having
> a "#define LAST_THING" that takes the last valid value from the enum.
> That would free you from having to handle the last value in switch
> blocks, for example.  LAST_OCLASS in dependency.h is a precedent on this.)
>

I have made this change.
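
(To illustrate that pattern with made-up names, rather than the identifiers
actually used in the patch:)

typedef enum BufferAccessKind
{
    BUFFER_ACCESS_ALLOC,
    BUFFER_ACCESS_WRITE,
    BUFFER_ACCESS_EXTEND
} BufferAccessKind;

/* Last valid value, instead of a sentinel enum member, so switch statements
 * over BufferAccessKind can stay exhaustive. */
#define LAST_BUFFER_ACCESS BUFFER_ACCESS_EXTEND

/* e.g. sizing and iterating over per-kind counters: */
/* int counts[LAST_BUFFER_ACCESS + 1]; */
/* for (int i = 0; i <= LAST_BUFFER_ACCESS; i++) ... */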

The attached v8 patchset is rewritten to add in an additional dimension
-- buffer type. Now, a backend keeps track of how many buffers of a
particular type (e.g. shared, local) it has accessed in a particular way
(e.g. alloc, write). It also changes the naming of various structures
and the view members.

Previously, stats reset did not work since it did not consider live
backends' counters. Now, the reset message includes the current live
backends' counters to be tracked by the stats collector and used when
the view is queried.

The reset message is one of the areas in which I still need to do some
work -- I shoved the array of PgBufferAccesses into the existing reset
message used for checkpointer, bgwriter, etc. Before making a new type
of message, I would like feedback from a reviewer about the approach.

There are various TODOs in the code which are actually questions for the
reviewer. Once I have some feedback, it will be easier to address these
items.

There are a few other items, which may be material for other commits, that
I would also like to do:
1) write wrapper functions for smgr* functions which count buffer
accesses of the appropriate type (a rough sketch of what I mean follows
this list). I wasn't sure if these should literally just take all the
parameters that the smgr* functions take + buffer type. Once these exist,
there will be less possibility for regressions in which new code is added
using smgr* functions without counting this buffer activity. Once I add
these, I was going to go through and replace existing calls to smgr*
functions and thereby start counting currently uncounted buffer type
accesses (direct, local, etc).

2) Separate checkpointer and bgwriter into two views and add additional
stats to the bgwriter view.

3) Consider adding a helper function to pgstatfuncs.c to help create the
tuplestore. These functions all have quite a few lines which are exactly
the same, and I thought it might be nice to do something about that:
  pg_stat_get_progress_info(PG_FUNCTION_ARGS)
  pg_stat_get_activity(PG_FUNCTION_ARGS)
  pg_stat_get_buffers_accesses(PG_FUNCTION_ARGS)
  pg_stat_get_slru(PG_FUNCTION_ARGS)
  pg_stat_get_progress_info(PG_FUNCTION_ARGS)
I can imagine a function that takes a Datums array, a nulls array, and a
ResultSetInfo and then makes the tuplestore -- though I think that will
use more memory. Perhaps we could make a macro which does the initial
error checking (checking if caller supports returning a tuplestore)? I'm
not sure if there is something meaningful here, but I thought I would
ask.
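
As a rough sketch of the wrapper idea in item 1 above (the buffer-type enum
and the counting function are placeholders here, not the names the patch
actually uses):

#include "postgres.h"
#include "storage/smgr.h"

typedef enum PgBufferType       /* placeholder, for illustration only */
{
    BUFTYPE_SHARED,
    BUFTYPE_LOCAL,
    BUFTYPE_DIRECT
} PgBufferType;

/* placeholder for the per-backend counter bump the patch would provide */
extern void pgstat_count_buffer_write(PgBufferType buftype);

static inline void
pgstat_smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                 char *buffer, bool skipFsync, PgBufferType buftype)
{
    pgstat_count_buffer_write(buftype);
    smgrwrite(reln, forknum, blocknum, buffer, skipFsync);
}

Call sites would then use the wrapper instead of calling smgrwrite()
directly, so new code can't forget to count the write.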

Finally, I haven't removed the test in pg_stats and haven't done a final
pass for comment clarity, alphabetization, etc on this version.

- Melanie
From 479fdfc53d1a6ee9943bdc580884866e99497673 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 2 Sep 2021 11:47:41 -0400
Subject: [PATCH v8 2/2] Remove superfluous bgwriter stats

Remove stats from pg_stat_bgwriter which are now more clearly expressed
in pg_stat_buffers.

TODO:
- make pg_stat_checkpointer view and move relevant stats into it
- add additional stats to pg_stat_bgwriter
---
 doc/src/sgml/monitoring.sgml  | 47 ---
 src/backend/catalog/system_views.sql  |  6 +---
 src/backend/postmaster/checkpointer.c | 26 ---
 src/backend/postmaster/pgstat.c   |  5 ---
 src/backend/storage/buffer/bufmgr.c   |  6 
 src/backend/utils/adt/pgstatfuncs.c   | 30 -
 src/include/catalog/pg_proc.dat   | 22 -
 src/include/pgstat.h  | 10 --
 src/test/regress/expected/rules.out   |  5 ---
 9 files changed, 1 insertion(+), 156 deletions(-)

Re: Proposal: Save user's original authenticated identity for logging

2021-09-23 Thread Peter Eisentraut

On 23.09.21 12:34, Michael Paquier wrote:

On Wed, Sep 22, 2021 at 03:18:43PM +, Jacob Champion wrote:

On Wed, 2021-09-22 at 10:20 +0200, Peter Eisentraut wrote:

This should be added to each level of a function call that represents a
test.  This ensures that when a test fails, the line number points to
the top-level location of the test_role() call.  Otherwise it would
point to the connect_ok() call inside test_role().


Patch LGTM, sorry about that. Thanks!


For the places of the patch, that seems fine then.  Thanks!


committed


Do we need to care about that in other places?  We have tests in
src/bin/ using subroutines that call things from PostgresNode.pm or
TestLib.pm, like pg_checksums, pg_ctl or pg_verifybackup, just to name
three.


Yeah, at first glance, there is probably more that could be done.  Here, 
I was just looking at a place where it was already present and was 
accidentally removed.





extended stats on partitioned tables

2021-09-23 Thread Justin Pryzby
extended stats objects are allowed on partitioned tables since v10.
https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritance parents (not partitioned tables).

Preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
|Don't build extended statistics on inheritance trees
...
|Moreover, the current selectivity estimation code only works with individual
|relations, so building statistics on inheritance trees would be pointless
|anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That raises the question of whether the current behavior for inheritance
parents is correct.

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate  (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)

Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table hierarchy or (since the catalog only
supports one) only the table itself.  I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again).  Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.

Note that for partitioned tables, if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that also confuses
the question at hand).




Re: OpenSSL 3.0.0 compatibility

2021-09-23 Thread Peter Eisentraut

On 23.09.21 20:51, Daniel Gustafsson wrote:

For the 13- backbranches we also need to backport 22e1943f1 ("pgcrypto: Check
for error return of px_cipher_decrypt()" by Peter E) in order to avoid
incorrect results for decrypt tests on disallowed ciphers.  Does anyone have
any concerns about applying this to backbranches?


This should be backpatched as a bug fix.


13 and older will, when compiled against OpenSSL 3.0.0, produce a fair amount
of compiler warnings about usage of deprecated functionality, but there isn't
really anything we can do, as suppressing that is beyond the scope of a
backpatchable fix IMHO.


Right, that's just a matter of adjusting the compiler warnings.

Earlier in this thread, I had suggested backpatching the 
OPENSSL_API_COMPAT definition to PG13, but now I'm thinking I wouldn't 
bother, since that still wouldn't help with anything older.






Re: Gather performance analysis

2021-09-23 Thread Tomas Vondra




On 9/23/21 10:31 PM, Robert Haas wrote:

On Thu, Sep 23, 2021 at 4:00 PM Tomas Vondra
 wrote:

I did find some suspicious behavior on the bigger box I have available
(with 2x xeon e5-2620v3), see the attached spreadsheet. But it seems
pretty weird because the worst affected case is with no parallel workers
(so the queue changes should affect it). Not sure how to explain it, but
the behavior seems consistent.


That is pretty odd. I'm inclined to mostly discount the runs with
10,000 tuples because sending such a tiny number of tuples doesn't
really take any significant amount of time, and it seems possible that
variations in the runtime of other code due to code movement effects
could end up mattering more than the changes to the performance of
shm_mq. However, the results with a million tuples seem like they're
probably delivering statistically significant results ... and I guess
maybe what's happening is that the patch hurts when the tuples are too
big relative to the queue size.



Agreed on 10k rows being too small, we can ignore that. And yes, binary 
layout might make a difference, of course. My rule of thumb is that 5% (in 
both directions) is about the difference that it might make, and most 
results are within that range.



I guess your columns are an md5 value each, which is 32 bytes +
overhead, so a 20-column tuple is ~1kB. Since Dilip's patch flushes
the value to shared memory when more than a quarter of the queue has
been filled, that probably means we flush every 4-5 tuples. I wonder
if that means we need a smaller threshold, like 1/8 of the queue size?
Or maybe the behavior should be adaptive somehow, depending on whether
the receiver ends up waiting for data? Or ... perhaps only small
tuples are worth batching, so that the threshold for posting to shared
memory should be a constant rather than a fraction of the queue size?
I guess we need to know why we see the time spike up in those cases,
if we want to improve them.



Not sure about this, because

(a) That should affect both CPUs, I think, but i5-2500k does not have 
any such issue.


(b) One thing I haven't mentioned is that I tried with larger queue sizes 
too (that's the 16kB, 64kB, 256kB and 1MB columns). It's true, though, that 
a larger queue improves the situation a bit.


(c) This can't explain the slowdown for cases without any Gather nodes 
(and it's ~17%, so unlikely due to binary layout).



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: OpenSSL 3.0.0 compatibility

2021-09-23 Thread Daniel Gustafsson
> On 23 Sep 2021, at 23:26, Peter Eisentraut 
>  wrote:
> 
> On 23.09.21 20:51, Daniel Gustafsson wrote:
>> For the 13- backbranches we also need to backport 22e1943f1 ("pgcrypto: Check
>> for error return of px_cipher_decrypt()" by Peter E) in order to avoid
>> incorrect results for decrypt tests on disallowed ciphers.  Does anyone have
>> any concerns about applying this to backbranches?
> 
> This should be backpatched as a bug fix.

Thanks for confirming, I will go ahead and do that.

--
Daniel Gustafsson   https://vmware.com/





Re: pgbench bug candidate: negative "initial connection time"

2021-09-23 Thread Yugo NAGATA
Hello Fujii-san,

On Tue, 7 Sep 2021 02:34:17 +0900
Fujii Masao  wrote:

> On 2021/07/29 13:23, Yugo NAGATA wrote:
> > Hello,
> > 
> > On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
> > Fabien COELHO  wrote:
> > 
> >> Attached Yugo-san patch with some updates discussed in the previous mails,
> >> so as to move things along.
> > 
> > I attached the patch rebased to a change due to 856de3b39cf.
> 
> + pg_log_fatal("connection for initialization failed");
> + pg_log_fatal("setup connection failed");
> + pg_log_fatal("cannot create connection for 
> client %d",
> 
> These fatal messages output when doConnect() fails should be a bit more 
> consistent each other? For example,
> 
>  could not create connection for initialization
>  could not create connection for setup
>  could not create connection for client %d

Ok. I fixed as your suggestion.

> > Exit status 1 indicates static problems such as invalid command-line 
> > options.
> > Errors during the run such as database errors or problems in the script will
> > result in exit status 2.
> 
> While reading the code and docs related to the patch, I found
> these descriptions in pgbench docs. The first description needs to be
> updated? Because even database error (e.g., failure of connection for setup)
> can result in exit status 1 if it happens before the benchmark actually runs.

That makes sense. Failures of the setup connection or the initial connection
don't seem to be 'static problems'. I rewrote this description to explain that
exit status 1 also indicates internal errors and early errors.

  Exit status 1 indicates static problems such as invalid command-line options
  or internal errors which are supposed to never occur.  Early errors that occur
  when starting benchmark such as initial connection failures also exit with
  status 1.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0f432767c2..c71dab644c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -904,10 +904,12 @@ pgbench  options  d
 
   
A successful run will exit with status 0.  Exit status 1 indicates static
-   problems such as invalid command-line options.  Errors during the run such
-   as database errors or problems in the script will result in exit status 2.
-   In the latter case, pgbench will print partial
-   results.
+   problems such as invalid command-line options or internal errors which
+   are supposed to never occur.  Early errors that occur when starting
+   benchmark such as initial connection failures also exit with status 1.
+   Errors during the run such as database errors or problems in the script
+   will result in exit status 2. In the latter case,
+   pgbench will print partial results.
   
  
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 433abd954b..8d44399c16 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3181,6 +3181,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort the process */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4456,7 +4457,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer(&stats);
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("could not create connection for initialization");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6399,7 +6403,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("could not create connection for setup");
 		exit(1);
+	}
 
 	/* report pgbench and server versions */
 	printVersion(con);
@@ -6553,7 +6560,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6625,7 +6632,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6650,16 +6657,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT(&barrier);
-goto done;
+/* coldly abort on initial connection failure */
+pg_log_fatal("could not create connection for client %d",
+			 state[i].id);
+exit(1);
 			}
 		}

Re: PostgreSQL 14 press release draft

2021-09-23 Thread Jonathan S. Katz
On 9/22/21 12:57 PM, Greg Sabino Mullane wrote:
> Some super quick nitpicks; feel free to ignore/apply/laugh off.

Thanks. I incorporated many of the suggestions.

Here is the press release as it stands. As we are past the deadline for
feedback, we are going to start the translation effort for the press kit.

Of course, critical changes will still be applied but this is the press
release as it stands.

Thank you for your feedback!

Jonathan
The PostgreSQL Global Development Group today announced the release of
[PostgreSQL 14](https://www.postgresql.org/docs/14/release-14.html), the latest
version of the world’s [most advanced open source 
database](https://www.postgresql.org/).

PostgreSQL 14 brings a variety of features that help developers and
administrators deploy their data-backed applications. PostgreSQL continues to
add innovations on complex data types, including more convenient access for
JSON and support for noncontiguous ranges of data. This latest release adds
to PostgreSQL's trend on improving high performance and distributed
data workloads, with advances in connection concurrency, high-write
workloads, query parallelism and logical replication.

"This latest release of PostgreSQL advances our users' ability to manage data
workloads at scale, enhances observability, and contains new features that help
application developers," said Magnus Hagander, a PostgreSQL Core Team member.
"PostgreSQL 14 is a testament to the dedication of the global PostgreSQL
community in addressing feedback and continuing to deliver innovative database
software that is deployed by organizations large and small."

[PostgreSQL](https://www.postgresql.org), an innovative data management system
known for its reliability and robustness, benefits from over 25 years of open
source development from a [global developer 
community](https://www.postgresql.org/community/)
and has become the preferred open source relational database for organizations
of all sizes.

### JSON Conveniences and Multiranges

PostgreSQL has supported manipulating 
[JSON](https://www.postgresql.org/docs/14/datatype-json.html)
data since the release of PostgreSQL 9.2, though retrieval of values used a
unique syntax. PostgreSQL 14 now lets you [access JSON data using 
subscripts](https://www.postgresql.org/docs/14/datatype-json.html#JSONB-SUBSCRIPTING),
 e.g. a query like `SELECT ('{ "postgres": { "release": 14 
}}'::jsonb)['postgres']['release'];`
now works. This aligns PostgreSQL with syntax that is commonly recognized for
retrieving information from JSON data. The subscripting framework added to
PostgreSQL 14 can be generally extended to other nested data structures, and is
also applied to the [`hstore`](https://www.postgresql.org/docs/14/hstore.html)
data type in this release.

[Range types](https://www.postgresql.org/docs/14/rangetypes.html), also first
released in PostgreSQL 9.2, now have support for noncontiguous ranges through
the introduction of the 
"[multirange](https://www.postgresql.org/docs/14/rangetypes.html#RANGETYPES-BUILTIN)"
data type. A multirange is an ordered list of ranges that are nonoverlapping,
which lets developers write simpler queries for dealing with complex sequences
of ranges. The range types native to PostgreSQL (dates, times, numbers) support
multiranges, and other data types can be extended to use multirange support.

### Performance Improvements for Heavy Workloads

PostgreSQL 14 provides a significant throughput boost on workloads that use many
connections, with some benchmarks showing a 2x speedup. This release continues
on the recent improvements to the management of B-tree indexes by reducing index
bloat on tables with [frequently updated 
indexes](https://www.postgresql.org/docs/14/btree-implementation.html#BTREE-DELETION).

PostgreSQL 14 introduces the ability to [pipeline 
queries](https://www.postgresql.org/docs/14/libpq-pipeline-mode.html)
to a database, which can significantly improve performance over high latency
connections or for workloads with many small write (`INSERT`/`UPDATE`/`DELETE`)
operations. As this is a client-side feature, you can use pipeline mode with any
modern PostgreSQL database with the version 14 client
or [a client driver built with version 14 of 
libpq](https://wiki.postgresql.org/wiki/List_of_drivers).

### Enhancements for Distributed Workloads

Distributed PostgreSQL databases stand to benefit from PostgreSQL 14. When using
[logical 
replication](https://www.postgresql.org/docs/current/logical-replication.html),
PostgreSQL can now stream in-progress transactions to subscribers, with
significant performance benefits for applying large transactions on subscribers.
PostgreSQL 14 also adds several other performance enhancements to the logical
decoding system that powers logical replication.

[Foreign data 
wrappers](https://www.postgresql.org/docs/14/sql-createforeigndatawrapper.html),
which are used for working with federated workloads across PostgreSQL and other
databases, can now leverage

Re: Gather performance analysis

2021-09-23 Thread Robert Haas
On Thu, Sep 23, 2021 at 5:36 PM Tomas Vondra
 wrote:
> (c) This can't explain the slowdown for cases without any Gather nodes
> (and it's ~17%, so unlikely due to binary layout).

Yeah, but none of the modified code would even execute in those cases,
so it's either binary layout, something wrong in your test
environment, or gremlins.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CLUSTER on partitioned index

2021-09-23 Thread Justin Pryzby
On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:
> On Sun, 12 Sept 2021 at 22:10, Justin Pryzby  wrote:
> >
> > On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> > > I have to wonder if there really *is* a use case for CLUSTER in the
> > > first place on regular tables, let alone on partitioned tables, which
> > > are likely to be large and thus take a lot of time.
> >
> > The cluster now is done one partition at a time, so it might take a long 
> > time,
> > but doesn't lock the whole partition heirarchy.  Same as VACUUM (since v10) 
> > and
> > (since v14) REINDEX.
> 
> Note: The following review is based on the assumption that this v11
> revision was meant to contain only one patch. I put this up as a note,
> because it seemed quite limited when compared to earlier versions of
> the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature.  My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

> I noticed that you store the result of find_all_inheritors(...,
> NoLock) in get_tables_to_cluster_partitioned, without taking care of
> potential concurrent partition hierarchy changes that the comment on
> find_all_inheritors warns against or documenting why it is safe, which
> sounds dangerous in case someone wants to adapt the code. One problem
> I can think of is that only storing reloid and indoid is not
> necessarily safe, as they might be reused by drop+create table running
> parallel to the CLUSTER command.

The parallel code in vacuum is expand_vacuum_rel(), which is where the
corresponding thing happens for vacuum full.  This patch is to make cluster()
do all the same stuff before calling cluster_rel().

What VACUUM tries to do is to avoid erroring if a partition is dropped while
cluster is running.  cluster_rel() does the same thing by calling
cluster_multiple_rels(), which uses CLUOPT_RECHECK.

If the OIDs wrapped around, I think existing vacuum could accidentally process
a new table with the same OID as a dropped partition.  I think cluster would
*normally* catch that case and error in check_index_is_clusterable():
| Check that index is in fact an index on the given relation

Arguably VACUUM FULL could call cluster() (not cluster_rel()) and pass the
partitioned table rather than first expanding it.  But non-full vacuum needs to 
expand partitioned tables anyway.

> Apart from that, I think it would be useful (though not strictly
> necessary for this patch) if you could adapt the current CLUSTER
> progress reporting to report the progress for the whole partition
> hierarchy, instead of a new progress report for each relation, as was
> the case in earlier versions of the patchset.

Yea, but this is already true for VACUUM FULL (which uses CLUSTER and supports
partitioned tables since v10) and REINDEX.
See also https://postgr.es/m/20210216064214.gi28...@telsasoft.com

My goal is to present a minimal patch and avoid any nonessential complexity.

> The v11 patch seems quite incomplete when compared to previous
> discussions, or at the very least is very limited (no ALTER TABLE
> clustering commands for partitioned tables, but `CLUSTER ptable USING
> pindex` is supported). If v11 is the new proposed direction for ptable
> clustering, could you also document these limitations in the
> cluster.sgml and alter_table.sgml docs?

You said it's less complete, but that is due to a deliberate reduction in scope.
cluster.sgml says:
+Clustering a partitioned table clusters each of its partitions using the
+partition of the specified partitioned index (which must be specified).

The ALTER restriction hasn't changed, so I didn't touch the documentation.

I am still curious myself to know if this is the direction the patch should
move.

> > [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]
> 
> > diff --git a/src/test/regress/expected/cluster.out 
> > b/src/test/regress/expected/cluster.out
> > ...
> > +ALTER TABLE clstrpart SET WITHOUT CLUSTER;
> > +ERROR:  cannot mark index clustered in partitioned table
> 
> This error message does not seem to match my expectation as a user: I
> am not trying to mark an index as clustered, and for a normal table
> "SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
> that behaviour of normal unclustered tables should be shared here as
> well. At the very least, the error message should be changed.

This is the pre-existing behavior.

> > -DROP TABLE clstrpart;
> 
> I believe that this cleanup should not be fully removed, but moved to
> before '-- Test CLUSTER with external tuplesorting', as the table is
> not used after that line.

You're right - this was from when the patchset handled CLUSTER ON.
Leaving the index allows testing in pg_dump - a large part of the complexity of
the elided patches is to handle restoring a partitioned index, without
violating the rule that partitions of an c

Re: Spelling change in LLVM 14 API

2021-09-23 Thread Thomas Munro
On Mon, Aug 30, 2021 at 5:41 AM Andres Freund  wrote:
> -   if (F.getAttributes().hasFnAttribute(llvm::Attribute::NoInline))
> +   if (F.getAttributes().hasAttribute(llvm::Attribute::FunctionIndex, 
> llvm::Attribute::NoInline))

Yeah, that's already stopped working since you wrote it a few weeks
ago... There's also been a second breakage site in our code due to
LLVM commit 85b732b5.  New fix attached.  Tested on LLVM 7, 9, 13, 14
(= LLVM main branch commit 945df8bc4cf3 built 2021-09-15).

Even though the macro approach is ugly, we're already handling every
other API change that way, and at least it's very clear which
lines to delete in a few years when older LLVMs fall off the conveyor
belt of time.  Admittedly, renaming an identifier is a new kludge, but
I didn't want to duplicate the whole if block or confuse pgindent.
From 8678db35ef27b7f197bebcd5da1f91f55f583f8c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 24 Sep 2021 12:45:22 +1200
Subject: [PATCH] Track LLVM 14 API changes.

Discussion: https://postgr.es/m/CA%2BhUKGL%3Dyg6qqgg6W6SAuvRQejditeoDNy-X3b9H_6Fnw8j5Wg%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit_inline.cpp | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index 6f03595db5..9bb4b672a7 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -594,7 +594,11 @@ function_inlinable(llvm::Function &F,
 	if (F.materialize())
 		elog(FATAL, "failed to materialize metadata");
 
-	if (F.getAttributes().hasFnAttribute(llvm::Attribute::NoInline))
+#if LLVM_VERSION_MAJOR < 14
+#define hasFnAttr hasFnAttribute
+#endif
+
+	if (F.getAttributes().hasFnAttr(llvm::Attribute::NoInline))
 	{
 		ilog(DEBUG1, "ineligibile to import %s due to noinline",
 			 F.getName().data());
@@ -871,7 +875,9 @@ create_redirection_function(std::unique_ptr &importMod,
 	llvm::Function *AF;
 	llvm::BasicBlock *BB;
 	llvm::CallInst *fwdcall;
+#if LLVM_VERSION_MAJOR < 14
 	llvm::Attribute inlineAttribute;
+#endif
 
 	AF = llvm::Function::Create(F->getFunctionType(),
 LinkageTypes::AvailableExternallyLinkage,
@@ -880,9 +886,13 @@ create_redirection_function(std::unique_ptr &importMod,
 
 	Builder.SetInsertPoint(BB);
 	fwdcall = Builder.CreateCall(F, &*AF->arg_begin());
+#if LLVM_VERSION_MAJOR < 14
 	inlineAttribute = llvm::Attribute::get(Context,
 		   llvm::Attribute::AlwaysInline);
 	fwdcall->addAttribute(~0U, inlineAttribute);
+#else
+	fwdcall->addFnAttr(llvm::Attribute::AlwaysInline);
+#endif
 	Builder.CreateRet(fwdcall);
 
 	return AF;
-- 
2.30.2



Re: Skipping logical replication transactions on subscriber side

2021-09-23 Thread Masahiko Sawada
Hi,

On Fri, Sep 3, 2021 at 4:33 AM Mark Dilger  wrote:
>
>
>
> > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  wrote:
> >
> > I've attached rebased patches.
>
> Thanks for these patches, Sawada-san!

Sorry for the very late response.

Thank you for the suggestions and the patch!

>
> The first patch in your series, v12-0001, seems useful to me even before 
> committing any of the rest.  I would like to integrate the new 
> pg_stat_subscription_errors view it creates into regression tests for other 
> logical replication features under development.
>
> In particular, it can be hard to write TAP tests that need to wait for 
> subscriptions to catch up or fail.  With your view committed, a new 
> PostgresNode function to wait for catchup or for failure can be added, and 
> then developers of different projects can all use that.

I like the idea of creating a common function that waits for the
subscription to be ready (i.e., all relations are in either 'r' or 's'
state). There are many places in the existing TAP tests where we wait
for all subscription relations to be ready, and we would be able to
replace that code with the function. But I'm not sure that it's
useful to have a function that waits for the subscription to either
be ready or raise an error. In TAP tests, I think that if we wait for
the subscription to raise an error, we should wait only for the error
and not for the subscription to become ready. Thoughts?

> I am attaching a version of such a function, plus some tests of your patch 
> (since it does not appear to have any).  Would you mind reviewing these and 
> giving comments or including them in your next patch version?
>

I've looked at the patch and here are some comments:

+
+-- no errors should be reported
+SELECT * FROM pg_stat_subscription_errors;
+

+
+-- Test that the subscription errors view exists, and has the right columns
+-- If we expected any rows to exist, we would need to filter out unstable
+-- columns.  But since there should be no errors, we just select them all.
+select * from pg_stat_subscription_errors;

The patch adds checks of pg_stat_subscription_errors in order to test
that the subscription doesn't have any errors. But since the subscription
errors are updated in an asynchronous manner, we cannot conclude that the
subscription is working fine by checking the view only once.

---
The TAP tests newly added by 025_errors.pl have two subscribers raise
a table sync error, which seems very similar to the tests that
024_skip_xact.pl adds. So I'm not sure we need these tests in a
separate TAP test file.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: pgbench bug candidate: negative "initial connection time"

2021-09-23 Thread Fujii Masao




On 2021/09/24 7:26, Yugo NAGATA wrote:

That makes sense. Failures of the setup connection or of the initial connections don't
seem to be 'static problems'. I rewrote this description to explain that exit status 1
also indicates internal errors and early errors.

   Exit status 1 indicates static problems such as invalid command-line options
   or internal errors which are supposed to never occur.  Early errors that occur
   when starting a benchmark, such as initial connection failures, also exit with
   status 1.


LGTM. Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: prevent immature WAL streaming

2021-09-23 Thread Alvaro Herrera
On 2021-Sep-23, Alvaro Herrera wrote:

> However, I notice now that the pg_rewind tests reproducibly fail in
> branch 14 for reasons I haven't yet understood.  It's strange that no
> other branch fails, even when run quite a few times.

Turns out that this is a real bug (setting EndOfLog seems insufficient).
I'm looking into it.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)




Re: Column Filtering in Logical Replication

2021-09-23 Thread Amit Kapila
On Fri, Sep 24, 2021 at 12:45 AM Tomas Vondra
 wrote:
>
> Hi,
>
> I wanted to do a review of this patch, but I'm a bit confused about
> which patch(es) to review. There's the v5 patch, and then these two
> patches - which seem to be somewhat duplicate, though.
>
> Can anyone explain what's the "current" patch version, or perhaps tell
> me which of the patches to combine?
>

I think v5 won't work atop a common grammar patch; some adjustments
to v5 are needed. I think it would be good if we can first get the
common grammar patch reviewed/committed and then build this on top of
it. The common grammar and the corresponding implementation are being
accomplished in the Schema support patch, the latest version of which
is at [1]. Now, Vignesh seems to have extracted just the grammar
portion of that work in his patch
Generic_object_type_parser_002_table_schema_publication [2] (there are
some changes after that, but nothing fundamentally different so far),
and then he seems to have prepared a patch
(Generic_object_type_parser_001_table_publication [2]) on similar
lines only for tables.

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB571844A87B6A83B7C10F9D6B94A39%40OS3PR01MB5718.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/CALDaNm1YoxJCs%3DuiyPM%3DtFDDc2qn0ja01nb2TCPqrjZH2jR0sQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Atomic rename feature for Windows.

2021-09-23 Thread Thomas Munro
On Wed, Sep 8, 2021 at 10:13 PM Juan José Santamaría Flecha
 wrote:
> On Thu, Jul 8, 2021 at 12:32 AM Victor Spirin  wrote:
>> >   #if defined(_MSC_VER) && _MSC_VER >= 1900
>> > -#define MIN_WINNT 0x0600
>> > +#define MIN_WINNT 0x0A00
>> >   #else
>> >   #define MIN_WINNT 0x0501
>> >   #endif
>> > This is a large bump for Studio >= 2015 I am afraid.  That does not
>> > seem acceptable, as it means losing support for GetLocaleInfoEx()
>> > across older versions.
>> >
>> It seems that the MIN_WINNT value 0x0600 or 0x0A00 does not affect the
>> use of the GetLocaleInfoEx () function
>>
>  Anything below Windows Server 2012 (_WIN32_WINNT = 0x0602) is no longer 
> supported. A patch with a bump on MIN_WINNT might be due.

+1




Re: Re: Queries that should be canceled will get stuck on secure_write function

2021-09-23 Thread 蔡梦娟(玊于)
Yes, it is more appropriate to set a duration after which we decide that
secure_write() is stuck, but it is difficult to define how long that duration
should be. In my first patch, I added a GUC to allow the user to set the time;
alternatively, it could be hardcoded if a reasonable value can be agreed on.



I agree that something like the patch (i.e., the introduction of promotion
from a cancel request to a terminate one) is necessary for the fix. One concern
about the patch is that the cancel request can be promoted to a terminate one
even when secure_write() doesn't actually get stuck. Is this acceptable?
Maybe we should set a duration until the promotion happens,
or should we introduce a dedicated timer for communication on the socket?
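
For illustration only, here is a minimal sketch of the kind of promotion being
discussed, written against the backend's timestamp helpers.  The GUC name
(client_write_timeout), the pending-flag plumbing, and the call site are all
hypothetical; this is not the proposed patch.

#include "postgres.h"
#include "utils/timestamp.h"

/* Hypothetical GUC: seconds a blocked write may hold off a cancel; 0 disables. */
static int  client_write_timeout = 30;

/*
 * Would be called periodically while secure_write() is blocked.  If a cancel
 * request has been pending longer than the timeout, escalate it to a
 * termination so the stuck backend can exit even though it cannot report the
 * error to the unresponsive client.
 */
static void
maybe_promote_cancel_to_terminate(TimestampTz write_started_at,
                                  volatile bool *cancel_pending,
                                  volatile bool *terminate_pending)
{
    if (client_write_timeout > 0 &&
        *cancel_pending &&
        TimestampDifferenceExceeds(write_started_at, GetCurrentTimestamp(),
                                   client_write_timeout * 1000))
    {
        *cancel_pending = false;
        *terminate_pending = true;
    }
}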



Re: proposal: possibility to read dumped table's name from file

2021-09-23 Thread Pavel Stehule
Hi


> The above comments are fixed in the attached, as well as a pass over the
> docs
> and extended tests to actually test matching a foreign server.  What do
> think
> about this version?  I'm still not convinced that there aren't more quoting
> bugs in the parser, but I've left that intact for now.
>

This patch is based on the version that you sent on 21.9. I only modified the
string comparison used in keyword detection. If we don't support
abbreviations of keywords (and I dislike them), then the size check is
necessary. Everything else is unchanged.
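
To illustrate the size check mentioned above, here is a minimal sketch of
exact (non-abbreviated) keyword matching; the helper name is made up for the
example and this is not the patch's actual code.

#include <stdbool.h>
#include <string.h>

/*
 * Return true only if "token" (of length "len", not necessarily
 * NUL-terminated) matches "keyword" exactly.  Without the length comparison,
 * strncmp() alone would also accept abbreviations such as "tab" for "table".
 */
static bool
is_keyword_str(const char *keyword, const char *token, size_t len)
{
    return strlen(keyword) == len && strncmp(keyword, token, len) == 0;
}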

Regards

Pavel




>
> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..6e9a56f68c 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,67 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+  Filtering rules for which objects to dump are read from the specified
+  file.  Specify - to read from
+STDIN.  The file has the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the object is to be included or
+excluded, and the second keyword specifies the type of object to be
+filtered:
+
+ 
+   table: table
+ 
+ 
+   schema: schema
+ 
+ 
+   foreign_data: foreign server
+ 
+ 
+   data: table data
+ 
+
+   
+
+   
+Example:
+
+include table mytable*
+exclude table mytable2
+
+With this filter file, the dump would include all tables with 
+names starting with mytable, but exclude table
+mytable2.
+   
+
+   
+Lines starting with # are considered comments and
+are ignored. Comments can be placed after a filter as well. Blank lines
+are also ignored.
+   
+
+   
+The --filter option works just like the other
+options to include or exclude tables (-t or
+--table), schemas (-n or
+--schema), table data (--exclude-table-data),
+or foreign tables (--include-foreign-data).
+It isn't possible to exclude a specific foreign table or
+to include a specific table's data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a485fb2d07..ec979b9f6f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -90,6 +92,28 @@ typedef enum OidOptions
 	zeroAsNone = 4
 } OidOptions;
 
+/*
+ * State data for reading filter items from stream.
+ */
+typedef struct
+{
+	FILE	   *fp;
+	char	   *filename;
+	int			lineno;
+} FilterStateData;
+
+/*
+ * List of objects that can be specified in filter file
+ */
+typedef enum
+{
+	FILTER_OBJECT_TYPE_NONE,
+	FILTER_OBJECT_TYPE_TABLE,
+	FILTER_OBJECT_TYPE_SCHEMA,
+	FILTER_OBJECT_TYPE_FOREIGN_DATA,
+	FILTER_OBJECT_TYPE_DATA
+} FilterObjectType;
+
 /* global decls */
 static bool dosync = true;		/* Issue fsync() to make dump durable on disk. */
 
@@ -308,6 +332,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static void read_filters_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -380,6 +405,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -613,6 +639,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_filters_from_file(optarg, &dopt);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1038,6 +1068,8 @@ help(const char *progname)
 			 "   access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAMEdump objects and data based on the filter expressi

Re: extensible options syntax for replication parser?

2021-09-23 Thread Fujii Masao




On 2021/09/24 0:05, Robert Haas wrote:

On Thu, Sep 23, 2021 at 2:55 AM tushar  wrote:

I checked and it looks like the issue is still not fixed against the v7-* patches -

postgres=#  create subscription test CONNECTION 'host=127.0.0.1 user=edb 
dbname=postgres' PUBLICATION p with (create_slot = true);
ERROR:  could not create replication slot "test": ERROR:  syntax error


Thanks. Looks like that version had some stupid mistakes. Here's a new one.


- BASE_BACKUP
+BASE_BACKUP [ ( option [, ...] ) ]

You seem to have accidentally removed the index term for BASE_BACKUP.

+ident_or_keyword:
+   IDENT   
{ $$ = $1; }

ident_or_keyword seems to be used only for generic options,
but it includes the keywords for legacy options like "FAST".
Isn't it better to remove the keywords for legacy options from
ident_or_keyword?

OTOH, should the keywords for newly-introduced generic options like
CHECKPOINT be defined in repl_scanner.l and repl_gram.y?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Skipping logical replication transactions on subscriber side

2021-09-23 Thread Greg Nancarrow
On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada  wrote:
>
> I've attached the updated version patches. Please review them.
>

Some comments on the v14-0001 patch:

(1)
Patch comment

The existing patch comment doesn't read well. I suggest the following updates:

BEFORE:
Add pg_stat_subscription_errors statistics view.

This commits adds new system view pg_stat_logical_replication_error,
showing errors happening during applying logical replication changes
as well as during performing initial table synchronization.

The subscription error entries are removed by autovacuum workers when
the table synchronization competed in table sync worker cases and when
dropping the subscription in apply worker cases.

It also adds SQL function pg_stat_reset_subscription_error() to
reset the single subscription error.

AFTER:
Add a subscription errors statistics view "pg_stat_subscription_errors".

This commit adds a new system view pg_stat_logical_replication_errors,
that shows information about any errors which occur during application
of logical replication changes as well as during performing initial table
synchronization.

The subscription error entries are removed by autovacuum workers after
table synchronization completes in table sync worker cases and after
dropping the subscription in apply worker cases.

It also adds an SQL function pg_stat_reset_subscription_error() to
reset a single subscription error.


src/backend/postmaster/pgstat.c
(2)
In pgstat_read_db_statsfile_timestamp(), you've added the following
code for case 'S':

+ case 'S':
+ {
+ PgStat_StatSubEntry subbuf;
+ PgStat_StatSubErrEntry errbuf;
+ int32 nerrors;
+
+ if (fread(&subbuf, 1, sizeof(PgStat_StatSubEntry), fpin)
+ != sizeof(PgStat_StatSubEntry))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ FreeFile(fpin);
+ return false;
+ }
+
+ if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ goto done;
+ }
+
+ for (int i = 0; i < nerrors; i++)
+ {
+ if (fread(&errbuf, 1, sizeof(PgStat_StatSubErrEntry), fpin) !=
+ sizeof(PgStat_StatSubErrEntry))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ goto done;
+ }
+ }
+ }
+
+ break;
+

Why in the 2nd and 3rd instances of calling fread() and detecting a
corrupted statistics file, does it:
goto done;
instead of:
FreeFile(fpin);
return false;

?
(so ends up returning true for these instances)

It looks like a mistake, but if it's intentional then comments need to
be added to explain it.
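
For reference, a consistent treatment of the later short reads would look like
the following sketch, reusing only elements that already appear in the quoted
code and returning false in every corruption case:

	if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))
	{
		ereport(pgStatRunningInCollector ? LOG : WARNING,
				(errmsg("corrupted statistics file \"%s\"", statfile)));
		FreeFile(fpin);
		return false;
	}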

(3)
In pgstat_get_subscription_error_entry(), there seems to be a bad comment.

Shouldn't:

+ /* Return the apply error worker */
+ return &(subent->apply_error);

be:

+ /* Return the apply worker error */
+ return &(subent->apply_error);


src/tools/pgindent/typedefs.list
(4)

"PgStat_MsgSubscriptionErrReset" is missing from the list.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: very long record lines in expanded psql output

2021-09-23 Thread Platon Pronko

On 2021-09-23 22:28, Andrew Dunstan wrote:


2. It would possibly be better to pass the relevant parts of the options
to print_aligned_vertical_line() rather than the whole options
structure. It feels odd to pass both that and opt_border.


What do you think about doing it the other way around - passing only the whole
options structure? That way we roll four parameters (opt_border, the
printTextFormat, and the two xheader ones) into a single argument.
This increases code coupling a bit, but I'm not sure whether that matters here.

Best regards,
Platon Pronko




Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Tue, Sep 21, 2021 at 6:05 PM Greg Nancarrow  wrote:
>
> On Tue, Sep 21, 2021 at 4:12 PM vignesh C  wrote:
> >
> > > (1)
> > > In get_object_address_publication_schema(), the error message:
> > >
> > > + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> > > does not exist",
> > >
> > > isn't grammatically correct. It should probably be:
> > >
> > > + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> > > not exist",
> >
> > "does not exist" is used across the file. Should we keep it like that
> > to maintain consistency. Thoughts?
> >
>
> When it's singular, "does not exist" is correct.
> I think currently only this case exists in the publication code.
> e.g.
> "publication \"%s\" does not exist"
> "publication relation \"%s\" in publication \"%s\" does not exist"
>
> But "publication tables" is plural, so it needs to say "do not exist"
> rather than "does not exist".
>
> > >
> > > In the case of "if (!(*nspname))", the following line should probably
> > > be added before returning false:
> > >
> > >*pubname = NULL;
> >
> > In case of failure we return false and don't access it. I felt we
> > could keep it as it is. Thoughts?
> >
>
> OK then, I might be being a bit pedantic.
> (I was just thinking, strictly speaking, we probably shouldn't be
> writing into the caller's pass-by-reference parameters in the case
> false is returned)
>
> > > (5)
> > > I'm wondering, in CheckObjSchemaNotAlreadyInPublication(), instead of
> > > checking "checkobjtype" each loop iteration, wouldn't it be better to
> > > just use the same for-loop in each IF block?
> >
> > I will be changing it to:
> > static void
> > CheckObjSchemaNotAlreadyInPublication(List *rels, List *schemaidlist,
> > PublicationObjSpecType checkobjtype)
> > {
> >   ListCell   *lc;
> >
> >   foreach(lc, rels)
> >   {
> > Relation  rel = (Relation) lfirst(lc);
> > Oid relSchemaId = RelationGetNamespace(rel);
> >
> > if (list_member_oid(schemaidlist, relSchemaId))
> > {
> >   if (checkobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
> > ereport(ERROR,
> > errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > errmsg("cannot add schema \"%s\" to publication",
> >  get_namespace_name(relSchemaId)),
> > errdetail("Table \"%s\" in schema \"%s\" is already part
> > of the publication, adding the same schema is not supported.",
> >   RelationGetRelationName(rel),
> >   get_namespace_name(relSchemaId)));
> >   else if (checkobjtype == PUBLICATIONOBJ_TABLE)
> > ereport(ERROR,
> > errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > errmsg("cannot add relation \"%s.%s\" to publication",
> >  get_namespace_name(relSchemaId),
> >  RelationGetRelationName(rel)),
> > errdetail("Table's schema \"%s\" is already part of the
> > publication.",
> >   get_namespace_name(relSchemaId)));
> > }
> >   }
> > }
> > After the change checkobjtype will be checked only once in case of error.
> >
>
> OK.
>
> One thing related to this code is the following:
>
> i)
> postgres=# create publication pub1 for all tables in schema sch1,
> table sch1.test;
> ERROR:  cannot add relation "sch1.test" to publication
> DETAIL:  Table's schema "sch1" is already part of the publication.
>
> ii)
> postgres=# create publication pub1 for table sch1.test, all tables in
> schema sch1;
> ERROR:  cannot add relation "sch1.test" to publication
> DETAIL:  Table's schema "sch1" is already part of the publication.
>
> Notice that in case (ii), the same error message is used, but the
> order of items to be "added" to the publication is the reverse of case
> (i), and really implies the table "sch1.test" was added first, but
> this is not reflected by the error message. So it seems slightly odd
> to say the schema is already part of the publication, when the table
> was actually listed first.
> I'm wondering if this can be improved?
>
> One idea I had was the following more generic type of message, but I'm
> not 100% happy with the wording:
>
> DETAIL:  Schema "sch1" and one of its tables can't separately be
> part of the publication.

This is common code applicable for the following scenarios:
i) create publication pub1 for all tables in schema sch1, table sch1.test;
ii) create publication pub1 for table sch1.test, all tables in schema sch1;
iii) create publication pub1 for table sch1.test;
alter publication pub1 add all tables in schema sch1;

I have changed it to make it suitable for all the cases.

ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("cannot add relation \"%s.%s\" to publication",
get_namespace_name(relSchemaId),
RelationGetRelationName(rel)),
errdetail("Table's schema \"%s\" is already part of the publication or
part of the specified schema list.",
get_namespace_name(relSchemaId)));

The v32 patch attached at [1] handles the abov

Re: Column Filtering in Logical Replication

2021-09-23 Thread vignesh C
On Fri, Sep 24, 2021 at 8:40 AM Amit Kapila  wrote:
>
> On Fri, Sep 24, 2021 at 12:45 AM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > I wanted to do a review of this patch, but I'm a bit confused about
> > which patch(es) to review. There's the v5 patch, and then these two
> > patches - which seem to be somewhat duplicate, though.
> >
> > Can anyone explain what's the "current" patch version, or perhaps tell
> > me which of the patches to combine?
> >
>
> I think v5 won't work atop a common grammar patch. There need some
> adjustments in v5. I think it would be good if we can first get the
> common grammar patch reviewed/committed and then build this on top of
> it. The common grammar and the corresponding implementation are being
> accomplished in the Schema support patch, the latest version of which
> is at [1].

I have posted an updated patch with the fixes at [1], please review
the updated patch.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards,
Vignesh




Re: Allow escape in application_name

2021-09-23 Thread Fujii Masao




On 2021/09/21 15:08, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san, Horiguchi-san,

Based on your advice, I made a patch
that combines the two parsing functions into one.
A new internal function parse_format_string() was added.
(This name may be too generic...)
log_line_prefix() and parse_pgfdw_appname() become just wrapper functions.
My preliminary checks passed for the server and postgres_fdw,
but could you review it?


Yes. Thanks for updating the patch!

---
elog.c:2785:14: warning: expression which evaluates to zero treated as a null 
pointer constant of type 'char *' [-Wnon-literal-null-conversion]
*endptr = '\0';
  ^~~~
1 warning generated.
---

I got the above compiler warning.


+ * Note: StringInfo datatype cannot be accepted
+ * because elog.h should not include postgres-original header file.

How about moving the function to guc.c from elog.c because it's for
the parameters, i.e., log_line_prefix and postgres_fdw.application_name?
This allows us to use StringInfo in the function?
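
As a rough sketch of the kind of shared %-escape expansion being discussed
here, using StringInfo and the usual backend globals (application_name,
MyProcPid); the function name and the small set of escapes handled are
illustrative, not the patch's actual API:

#include "postgres.h"

#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "utils/guc.h"

static void
expand_format_string(StringInfo buf, const char *format)
{
    for (const char *p = format; *p != '\0'; p++)
    {
        if (*p != '%')
        {
            appendStringInfoChar(buf, *p);
            continue;
        }

        p++;
        if (*p == '\0')
            break;              /* trailing '%': ignore it */

        switch (*p)
        {
            case 'a':           /* application name of this backend */
                if (application_name && *application_name)
                    appendStringInfoString(buf, application_name);
                break;
            case 'p':           /* backend PID */
                appendStringInfo(buf, "%d", MyProcPid);
                break;
            case '%':
                appendStringInfoChar(buf, '%');
                break;
            default:
                /* unrecognized escape: silently dropped in this sketch */
                break;
        }
    }
}

How the default branch should behave for an unrecognized escape is exactly the
open question discussed below for postgres_fdw.application_name.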


+   parse_pgfdw_appname(buf, values[i]);
+   /*
+* Note that appname may becomes an empty string
+* if an input string has wrong format.
+*/
+   values[i] = *buf;

If postgres_fdw.application_name contains only invalid escape characters like
"%b", parse_pgfdw_appname() returns an empty string. We discussed
there are four options to handle this case and we concluded (4) is better.
Right? But ISTM that the patch uses (2).


(1) Use the original postgres_fdw.application_name like "%b"
(2) Use the application_name of the server object (if set)
(3) Use fallback_application_name
(4) Use empty string as application_name


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Gather performance analysis

2021-09-23 Thread Dilip Kumar
On Fri, Sep 24, 2021 at 2:01 AM Robert Haas  wrote:
>
> On Thu, Sep 23, 2021 at 4:00 PM Tomas Vondra
>  wrote:
> > I did find some suspicious behavior on the bigger box I have available
> > (with 2x xeon e5-2620v3), see the attached spreadsheet. But it seems
> > pretty weird because the worst affected case is with no parallel workers
> > (so the queue changes should not affect it). Not sure how to explain it, but
> > the behavior seems consistent.
>
> That is pretty odd. I'm inclined to mostly discount the runs with
> 1 tuples because sending such a tiny number of tuples doesn't
> really take any significant amount of time, and it seems possible that
> variations in the runtime of other code due to code movement effects
> could end up mattering more than the changes to the performance of
> shm_mq. However, the results with a million tuples seem like they're
> probably delivering statistically significant results ... and I guess
> maybe what's happening is that the patch hurts when the tuples are too
> big relative to the queue size.

I am looking at the "query-results.ods" file shared by Tomas. With a
million tuples I do not really see where the patch hurts, because in
most of the cases the time taken by the patch is 60-80% of that of the
head.  And the worst case with a million tuples is 100.32%; are we
pointing to that 0.32%, or is there something else that I am missing
here?

>
> I guess your columns are an md5 value each, which is 32 bytes +
> overhead, so a 20-columns tuple is ~1kB. Since Dilip's patch flushes
> the value to shared memory when more than a quarter of the queue has
> been filled, that probably means we flush every 4-5 tuples. I wonder
> if that means we need a smaller threshold, like 1/8 of the queue size?
> Or maybe the behavior should be adaptive somehow, depending on whether
> the receiver ends up waiting for data? Or ... perhaps only small
> tuples are worth batching, so that the threshold for posting to shared
> memory should be a constant rather than a fraction of the queue size?
> I guess we need to know why we see the time spike up in those cases,
> if we want to improve them.

I will test with larger tuple sizes and see the behavior with
different thresholds.  With a 250-byte tuple size, I have tested
different thresholds and it appeared that 1/4 of the queue size works
best.  But I will do more detailed testing and share the results.
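
To make the knob being discussed concrete, here is a small sketch of the flush
decision; the function, its parameters, and the 1/4 constant are illustrative
and are not the actual shm_mq changes:

#include "postgres.h"

/*
 * Decide whether locally batched tuple data should be flushed to the shared
 * queue.  "queue_size" is the ring size, "pending_bytes" is what has been
 * written locally but not yet made visible to the receiver.
 */
static inline bool
should_flush_pending_data(Size pending_bytes, Size tuple_size, Size queue_size)
{
    Size        threshold = queue_size / 4;     /* the fraction under discussion */

    /* A tuple that alone exceeds the threshold is not worth batching. */
    if (tuple_size >= threshold)
        return true;

    return pending_bytes >= threshold;
}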

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2021-09-23 Thread Amit Kapila
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
 wrote:
>
> 6) parse_oper.c
>
> I'm having some second thoughts about (not) allowing UDFs ...
>
> Yes, I get that if the function starts failing, e.g. because querying a
> dropped table or something, that breaks the replication and can't be
> fixed without a resync.
>

The other problem is that users can access/query any table inside the
function, and that also won't work in a logical decoding environment,
because we use historic snapshots, with which we can access only catalog
tables.

> That's pretty annoying, but maybe disallowing anything user-defined
> (functions and operators) is maybe overly anxious? Also, extensibility
> is one of the hallmarks of Postgres, and disallowing all custom UDF and
> operators seems to contradict that ...
>
> Perhaps just explaining that the expression can / can't do in the docs,
> with clear warnings of the risks, would be acceptable.
>

I think the right way to support functions is by explicit marking
of functions, and in one of the emails above Jeff Davis also agreed
with that. I think we should probably introduce a new marking for
this. I feel this is important because without it, it won't be safe
to access even some of the built-in functions that can access/update
the database (non-immutable functions), due to the restrictions of the
logical decoding environment.

>
> 12) misuse of REPLICA IDENTITY
>
> The more I think about this, the more I think we're actually misusing
> REPLICA IDENTITY for something entirely different. The whole purpose of
> RI was to provide a row identifier for the subscriber.
>
> But now we're using it to ensure we have all the necessary columns,
> which is entirely orthogonal to the original purpose. I predict this
> will have rather negative consequences.
>
> People will either switch everything to REPLICA IDENTITY FULL, or create
> bogus unique indexes with extra columns. Which is really silly, because
> it wastes network bandwidth (transfers more data) or local resources
> (CPU and disk space to maintain extra indexes).
>
> IMHO this needs more infrastructure to request extra columns to decode
> (e.g. for the filter expression), and then remove them before sending
> the data to the subscriber.
>

Yeah, but that would put an additional load on write operations. I
am not sure at this stage, but maybe there could be other ways to
extend the current infrastructure, wherein we build snapshots with
which we can access user tables instead of only catalog tables.
Such enhancements, if feasible, would be useful not only for allowing
additional column access in row filters but also for other purposes, like
allowing access to functions that access user tables. I feel we can
extend this later as well, depending on usage and requests. For the first
version, this doesn't sound too limiting to me.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-23 Thread Dilip Kumar
On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila  wrote:

> > 12) misuse of REPLICA IDENTITY
> >
> > The more I think about this, the more I think we're actually misusing
> > REPLICA IDENTITY for something entirely different. The whole purpose of
> > RI was to provide a row identifier for the subscriber.
> >
> > But now we're using it to ensure we have all the necessary columns,
> > which is entirely orthogonal to the original purpose. I predict this
> > will have rather negative consequences.
> >
> > People will either switch everything to REPLICA IDENTITY FULL, or create
> > bogus unique indexes with extra columns. Which is really silly, because
> > it wastes network bandwidth (transfers more data) or local resources
> > (CPU and disk space to maintain extra indexes).
> >
> > IMHO this needs more infrastructure to request extra columns to decode
> > (e.g. for the filter expression), and then remove them before sending
> > the data to the subscriber.
> >
>
> Yeah, but that would have an additional load on write operations and I
> am not sure at this stage but maybe there could be other ways to
> extend the current infrastructure wherein we build the snapshots using
> which we can access the user tables instead of only catalog tables.
> Such enhancements if feasible would be useful not only for allowing
> additional column access in row filters but for other purposes like
> allowing access to functions that access user tables. I feel we can
> extend this later as well seeing the usage and requests. For the first
> version, this doesn't sound too limiting to me.

I agree with one point from Tomas: if we bind the row filter to
the RI, then if the user wants to use the row filter on any column, 1)
they have to add an unnecessary column to the index, 2) since they have
to add it to the RI, we will have to send it over the network as
well, and 3) we anyway have to WAL-log it if it is modified, because we
forced users to add some columns to the RI just so they could use
the row filter on them.  Now suppose we remove that limitation and
somehow make these changes orthogonal to the RI, i.e. if we have a row
filter on some column then we WAL-log it; now the only extra cost
we are paying is to WAL-log that column, but the user is not
forced to add it to an index, nor forced to send it over the network.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: decoupling table and index vacuum

2021-09-23 Thread Masahiko Sawada
On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan  wrote:
>
> On Wed, Apr 21, 2021 at 8:21 AM Robert Haas  wrote:
> > Now, the reason for this is that when we discover dead TIDs, we only
> > record them in memory, not on disk. So, as soon as VACUUM ends, we
> > lose all knowledge of whether those TIDs were dead and must rediscover
> > them. Suppose we didn't do this, and instead had a "dead TID" fork for
> > each table. Suppose further that this worked like a conveyor belt,
> > similar to WAL, where every dead TID we store into the fork is
> > assigned an identifying 64-bit number that is never reused.
>
> Enabling index-only scans is a good enough reason to pursue this
> project, even on its own.

+1

> I attached a POC autovacuum logging instrumentation patch that shows
> how VACUUM uses *and* sets VM bits.

Logging how vacuum uses and sets VM bits seems a good idea.

I've read the proposed PoC patch. It's probably better to start a new
thread for this patch and write the comments for it there, but let me
leave one comment on the patch here:

With the patch, we increment allfrozen_pages counter, which is used to
determine whether or not we advance relfrozenxid and relminmxid, at
two places:

@@ -1141,7 +1201,9 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
  * in this case an approximate answer is OK.
  */
 if (aggressive ||
VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-vacrel->frozenskipped_pages++;
+vacrel->allfrozen_pages++;
+else
+vacrel->allvisible_pages++;
 continue;

@@ -1338,6 +1400,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
  */
 if (!PageIsAllVisible(page))
 {
+vacrel->allfrozen_pages++;
+

I think that we will end up double-counting the page, as both scanned_pages
and allfrozen_pages, due to the newly added latter change. This seems
wrong to me because we calculate as follows:

@@ -644,7 +656,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
  * NB: We need to check this before truncating the relation,
because that
  * will change ->rel_pages.
  */
-if ((vacrel->scanned_pages + vacrel->frozenskipped_pages)
+if ((vacrel->scanned_pages + vacrel->allfrozen_pages)
 < vacrel->rel_pages)
 {
 Assert(!aggressive);

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Wed, Sep 22, 2021 at 8:52 AM Masahiko Sawada  wrote:
>
> On Wed, Sep 22, 2021 at 3:02 AM vignesh C  wrote:
> >
> >
> > Attached v30 patch has the fixes for the same.
> >
>
> Thank you for updating the patches.
>
> Here are random comments on v30-0002 patch:
>
> +
> +   if (stmt->action == DEFELEM_SET &&
> !list_length(schemaidlist))
> +   {
> +   delschemas =
> GetPublicationSchemas(pubform->oid);
> +   LockSchemaList(delschemas);
> +   }
>
> I think "list_length(schemaidlist) > 0" would be more readable.

We have refactored the patch, which has removed these changes.

> ---
> This patch introduces some new static functions to publicationcmds.c
> but some have function prototypes but some don't. As far as I checked,
>
> ObjectsInPublicationToOids()
> CheckObjSchemaNotAlreadyInPublication()
> GetAlterPublicationDelRelations()
> AlterPublicationSchemas()
> CheckPublicationAlterTables()
> CheckPublicationAlterSchemas()
> LockSchemaList()
> OpenReliIdList()
> PublicationAddSchemas()
> PublicationDropSchemas()
>
> are newly introduced but only four functions:
>
> OpenReliIdList()
> LockSchemaList()
> PublicationAddSchemas()
> PublicationDropSchemas()
>
> have function prototypes. Actually, there already are functions that
> don't have their prototype in publicationcmds.c. But it seems better
> to clarify what kind of functions don't need to have a prototype at
> least in this file.

My thoughts are the same as what Amit replied at [1].

> ---
> ISTM we can inline the contents of three functions:
> GetAlterPublicationDelRelations(), CheckPublicationAlterTables(), and
> CheckPublicationAlterSchemas(). These have only one caller and ISTM
> makes the readability worse. I think it's not necessary to make
> functions for them.

We have refactored the patch, which has removed these changes.

> ---
> + * This is dispatcher function for AlterPublicationOptions,
> + * AlterPublicationSchemas and AlterPublicationTables.
>
> As this comment mentioned, AlterPublication() calls
> AlterPublicationTables() and AlterPublicationSchemas() but this
> function also a lot of pre-processing such as building the list and
> some checks, depending on stmt->action before calling these two
> functions. And those two functions also perform some operation
> depending on stmt->action. So ISTM it's better to move those
> pre-processing to these two functions and have AlterPublication() just
> call these two functions. What do you think?

We have refactored the patch, which has removed these changes.

> ---
> +List *
> +GetAllSchemasPublicationRelations(Oid puboid, PublicationPartOpt pub_partopt)
>
> Since this function gets all relations in the schema publication, I
> think GetAllSchemaPublicationRelations() would be better as a function
> name (removing 's' before 'P').

Modified

> ---
> +   if (!IsA(node, String))
> +   ereport(ERROR,
> +   errcode(ERRCODE_SYNTAX_ERROR),
> +   errmsg("invalid schema
> name at or near"),
> +
> parser_errposition(pstate, pubobj->location));
>
> The error message should mention where the invalid schema name is at
> or near. Also, In the following example, the error position in the
> error message seems not to be where the invalid schemaname s.s is:

> postgres(1:47707)=# create publication p for all tables in schema s.s;
> ERROR:  invalid schema name at or near
> LINE 1: create publication p for all tables in schema s.s;
>  ^

Modified

> ---
> +   if (pubobj->pubobjtype == PUBLICATIONOBJ_TABLE)
> +   {
> +   if (IsA(node, RangeVar))
> +   *rels = lappend(*rels, (RangeVar *) node);
> +   else if (IsA(node, String))
> +   {
> +   RangeVar   *rel = makeRangeVar(NULL,
> strVal(node),
> +
> pubobj->location);
> +
> +   *rels = lappend(*rels, rel);
> +   }
> +   }
> +   else if (pubobj->pubobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
> +   {
> (snip)
> +   /* Filter out duplicates if user specifies
> "sch1, sch1" */
> +   *schemas = list_append_unique_oid(*schemas, schemaid);
> +   }
>
> Do we need to filter out duplicates also in PUBLICATIONOBJ_TABLE case
> since users can specify things like "TABLE tbl, tbl, tbl"?

Currently the handling of tables is taken care of in OpenTableList:
.
/*
 * Filter out duplicates if user specifies "foo, foo".
 *
 * Note that this algorithm is known to not be very efficient (O(N^2))
 * but given that it only works on list of tables given to us by user
 * it's deemed acceptable.
 */

Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Wed, Sep 22, 2021 at 11:27 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Wednesday, September 22, 2021 11:22 AM Masahiko Sawada 
>  wrote:
> >
> > ---
> > +   if (!IsA(node, String))
> > +   ereport(ERROR,
> > +   
> > errcode(ERRCODE_SYNTAX_ERROR),
> > +   errmsg("invalid schema
> > name at or near"),
> > +
> > parser_errposition(pstate, pubobj->location));
> >
> > The error message should mention where the invalid schema name is at
> > or near. Also, In the following example, the error position in the
> > error message seems not to be where the invalid schemaname s.s is:
> >
> > postgres(1:47707)=# create publication p for all tables in schema s.s;
> > ERROR:  invalid schema name at or near
> > LINE 1: create publication p for all tables in schema s.s;
> >  ^
> >
>
> I noticed this, too. And I think it could be fixed by the following change, 
> thoughts?

I fixed it by updating the location at pubobj_expr

> Besides, about this change in tab-complete.c:
>
> +   else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD|DROP|SET", 
> "SCHEMA"))
> +   COMPLETE_WITH_QUERY(Query_for_list_of_schemas
> +   " UNION SELECT 
> 'CURRENT_SCHEMA'");
>
> It should be "ALL TABLES IN SCHEMA" not "SCHEMA" at the first line, right?

Modified.

The v32 patch attached at [1] handles the above.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards.
Vignesh




Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Wed, Sep 22, 2021 at 11:31 AM Amit Kapila  wrote:
>
> On Tue, Sep 21, 2021 at 11:39 PM vignesh C  wrote:
> >
> > On Tue, Sep 21, 2021 at 9:03 AM Greg Nancarrow  wrote:
> > >
> > > On Fri, Sep 17, 2021 at 10:09 PM vignesh C  wrote:
> > > >
> > > > Attached v29 patch has the fixes for the same.
> > > >
> > >
> > > Some minor comments on the v29-0002 patch:
> > >
> > > (1)
> > > In get_object_address_publication_schema(), the error message:
> > >
> > > + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> > > does not exist",
> > >
> > > isn't grammatically correct. It should probably be:
> > >
> > > + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> > > not exist",
> >
> > Modified
> >
>
> I still see the old message in v30. But I have a different suggestion
> for this message. How about changing it to: "publication schema \"%s\"
> in publication \"%s\" does not exist"? This will make it similar to
> other messages and I don't see the need here to add 'tables' as we
> have it in grammar.

Modified

The v32 patch attached at [1] handles the above.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Thu, Sep 23, 2021 at 12:22 PM houzj.f...@fujitsu.com
 wrote:
>
> From Thurs, Sep 23, 2021 12:09 PM Amit Kapila  wrote:
> > On Wed, Sep 22, 2021 at 5:03 PM Hou Zhijie  wrote:
> > >
> > > Personally, I think we'd better move the code about changing publication's
> > > tablelist into AlterPublicationTables and the code about changing
> > publication's
> > > schemalist into AlterPublicationSchemas. It's similar to what the
> > v29-patchset
> > > did, the difference is the SET action, I suggest we drop all the tables in
> > > function AlterPublicationTables when user only set schemas and drop all 
> > > the
> > > schema in AlterPublicationSchemas when user only set tables. In this
> > approach,
> > > we can keep schema and relation code separate and don't need to worry
> > > about the locking order.
> > >
> > > Attach a top-up patch which refactor the code like above.
> > >
> >
> > Good suggestion. I think it would still be better if we can move the
> > checks related to superuser and puballtables into a separate function
> > that gets called before taking a lock on publication.
>
> I agreed.
>
> I noticed v30-0001 has been committed with some minor changes, and the 
> V30-0002
> patchset need to be rebased accordingly. Attach a rebased version patch set to
> make cfbot happy. Also Attach the two top-up patches which refactor the code 
> as
> suggested. (top-up patch 1 is to keep schema and table code separate, top-up
> patch 2 is to move some cheap check into a function and invoke it before
> locking.)

Thanks for the patches. The changes simplify the AlterPublication code
and handle the dropping of objects in a better way. I have merged them into
the 0001 patch at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm1R-xbQvz4LU5OXu3KKwbWOz3uDcT_YjRU6V0R5FZDYDg%40mail.gmail.com

Regards,
Vignesh




Re: row filtering for logical replication

2021-09-23 Thread Amit Kapila
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
 wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems a somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> fuure (e.g. it might be reused for handling other conflicts).
>

Apart from the additional traffic, I am not sure how we would handle all
the conditions on subscribers; say the new row doesn't match, how
will subscribers know about this unless we pass the row filter or some
additional information along with the tuple? Previously, I did some
research and shared in one of the emails above that IBM's InfoSphere
Data Replication [1] performs filtering in this way, which also
suggests that we won't be off here.

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this was suggested by me; I thought it was best
to do any expensive computation only the first time it is required. I have
shared a few cases in [2] where the eager approach would lead to additional
cost without any gain. Unless I am missing something, I don't see any
downside to doing it in a delayed fashion.

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - 
https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com
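
A small sketch of the delayed-initialization pattern being described; the
struct, its fields, and build_row_filter_for_relation() are illustrative
stand-ins, not the patch's actual code:

#include "postgres.h"
#include "nodes/pg_list.h"

typedef struct RowFilterCacheEntry
{
    Oid         relid;
    bool        filter_built;   /* has the (possibly expensive) setup run? */
    List       *row_filter;     /* prepared filter expressions, or NIL */
} RowFilterCacheEntry;

/* Hypothetical: looks up the publication catalogs and prepares expressions. */
extern List *build_row_filter_for_relation(Oid relid);

/*
 * Return the filter for this relation, building it only on first use so that
 * relations whose changes never need filtering pay no setup cost.
 */
static List *
get_row_filter(RowFilterCacheEntry *entry)
{
    if (!entry->filter_built)
    {
        entry->row_filter = build_row_filter_for_relation(entry->relid);
        entry->filter_built = true;
    }
    return entry->row_filter;
}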

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-23 Thread Masahiko Sawada
On Thu, Sep 23, 2021 at 1:56 PM Amit Kapila  wrote:
>
> On Wed, Sep 22, 2021 at 8:52 AM Masahiko Sawada  wrote:
> >
> > On Wed, Sep 22, 2021 at 3:02 AM vignesh C  wrote:
> > >
> > ---
> > This patch introduces some new static functions to publicationcmds.c
> > but some have function prototypes but some don't. As far as I checked,
> >
> > ObjectsInPublicationToOids()
> > CheckObjSchemaNotAlreadyInPublication()
> > GetAlterPublicationDelRelations()
> > AlterPublicationSchemas()
> > CheckPublicationAlterTables()
> > CheckPublicationAlterSchemas()
> > LockSchemaList()
> > OpenReliIdList()
> > PublicationAddSchemas()
> > PublicationDropSchemas()
> >
> > are newly introduced but only four functions:
> >
> > OpenReliIdList()
> > LockSchemaList()
> > PublicationAddSchemas()
> > PublicationDropSchemas()
> >
> > have function prototypes. Actually, there already are functions that
> > don't have their prototype in publicationcmds.c. But it seems better
> > to clarify what kind of functions don't need to have a prototype at
> > least in this file.
> >
>
> I think if the function is defined after its use then we declare it at
> the top. Do you prefer to declare all static functions to allow ease
> of usage? Do you have something else in mind?

I prefer to declare all static functions: if every function already has a
prototype, then a future (unrelated) commit that adds a new function calling
an existing one defined later in the file doesn't need to also add a
declaration for it. But it seems to me that the policy varies per file.
For instance, all functions in vacuumlazy.c have function prototypes but the
functions in publicationcmds.c seem not to. I'm not going to insist on that,
so please ignore this comment.

>
> >
> > ---
> > +   if ((action == DEFELEM_ADD || action == DEFELEM_SET) && 
> > !superuser())
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +errmsg("must be superuser to add or
> > set schemas")));
> >
> > Why do we require the superuser privilege only for ADD and SET but not for 
> > DROP?
> >
>
> Add/Set for all tables in schema has a requirement similar to that of an
> all-tables publication. For Drop, I don't think it is mandatory to
> restrict it to superusers. The same applies to the Alter Publication ...
> Drop table case, where you don't need to be the table owner whereas, for
> Add, you need to be. We had a discussion on these points in this
> thread. See [1] and some emails prior to it.

Thank you for sharing the link. That makes sense.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Added schema level support for publication.

2021-09-23 Thread vignesh C
On Thu, Sep 23, 2021 at 12:32 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thur, Sep 23, 2021 11:06 AM Greg Nancarrow  wrote:
> > On Wed, Sep 22, 2021 at 9:33 PM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > >
> > > > How do you suggest changing it?
> > >
> > > Personally, I think we'd better move the code about changing
> > > publication's tablelist into AlterPublicationTables and the code about
> > > changing publication's schemalist into AlterPublicationSchemas. It's
> > > similar to what the v29-patchset did, the difference is the SET
> > > action, I suggest we drop all the tables in function
> > > AlterPublicationTables when user only set schemas and drop all the
> > > schema in AlterPublicationSchemas when user only set tables. In this
> > > approach, we can keep schema and relation code separate and don't need to
> > worry about the locking order.
> > >
> > > Attach a top-up patch which refactor the code like above.
> > > Thoughts ?
> > >
> >
> > Sounds like a good idea.
> > Is it possible to incorporate the existing
> > CheckPublicationAlterTables() and CheckPublicationAlterSchemas() functions
> > into your suggested update?
> > I think it might tidy up the error-checking a bit.
>
> I agreed we can put the check about ALL TABLE and superuser into a function
> like what the v30-patchset did. But I have some hesitations about the code
> related to CheckObjSchemaNotAlreadyInPublication(). Currently, we need to open
> and lock the table before invoking the CheckObjxxx function, ISTM we'd better
> open the table in function AlterPublicationTables. Maybe we can wait for the
> author's(Vignesh) opinion.

I felt it is better to keep the code related to
CheckObjSchemaNotAlreadyInPublication as it is, in
AlterPublicationTables and AlterPublicationSchemas.

Regards,
Vignesh




Re: row filtering for logical replication

2021-09-23 Thread Amit Kapila
On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar  wrote:
>
> On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila  wrote:
>
> > > 12) misuse of REPLICA IDENTITY
> > >
> > > The more I think about this, the more I think we're actually misusing
> > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > RI was to provide a row identifier for the subscriber.
> > >
> > > But now we're using it to ensure we have all the necessary columns,
> > > which is entirely orthogonal to the original purpose. I predict this
> > > will have rather negative consequences.
> > >
> > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > bogus unique indexes with extra columns. Which is really silly, because
> > > it wastes network bandwidth (transfers more data) or local resources
> > > (CPU and disk space to maintain extra indexes).
> > >
> > > IMHO this needs more infrastructure to request extra columns to decode
> > > (e.g. for the filter expression), and then remove them before sending
> > > the data to the subscriber.
> > >
> >
> > Yeah, but that would have an additional load on write operations and I
> > am not sure at this stage but maybe there could be other ways to
> > extend the current infrastructure wherein we build the snapshots using
> > which we can access the user tables instead of only catalog tables.
> > Such enhancements if feasible would be useful not only for allowing
> > additional column access in row filters but for other purposes like
> > allowing access to functions that access user tables. I feel we can
> > extend this later as well seeing the usage and requests. For the first
> > version, this doesn't sound too limiting to me.
>
> I agree with one point from Tomas, that if we bind the row filter with
> the RI, then if the user has to use the row filter on any column 1)
> they have to add an unnecessary column to the index 2) Since they have
> to add it to RI so now we will have to send it over the network as
> well.  3). We anyway have to WAL log it if it is modified because now
> we forced users to add some columns to RI because they wanted to use
> the row filter on that.   Now suppose we remove that limitation and we
> somehow make these changes orthogonal to RI, i.e. if we have a row
> filter on some column then we WAL log it, so now the only extra cost
> we are paying is to just WAL log that column, but the user is not
> forced to add it to index, not forced to send it over the network.
>

I am not suggesting adding additional columns to the RI just for using
filter expressions. If most users that intend to publish deletes/updates
want to use filter conditions on columns apart from the replica identity,
then we can later extend this functionality, but I am not sure the only way
to accomplish that is to log additional data in WAL. I am just trying to
see if we can provide meaningful functionality without extending the
scope of this work too much.

-- 
With Regards,
Amit Kapila.




Re: Hook for extensible parsing.

2021-09-23 Thread Julien Rouhaud
On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
> 
> I do have sympathy for the idea that extensions would like to define
> their own statement types.  I just don't see a practical way to do it
> in our existing parser infrastructure.  This patch certainly doesn't
> offer that.

Allowing extensions to define their own (utility) statement types is just a
matter of allowing ExtensibleNode as a top-level statement.  As far as I can
see, the only change required for that is to give those a specific command tag
in CreateCommandTag(), since transformStmt() defaults to emitting a utility
command.  You can then easily intercept such statements in the utility hook and
fetch your custom struct.
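
As a rough sketch of that interception step (the node name "MyCustomStmt" is
hypothetical; a real utility hook would take the parse tree from
pstmt->utilityStmt and fall through to standard_ProcessUtility() for
everything else):

#include "postgres.h"
#include "nodes/nodes.h"
#include "nodes/extensible.h"

/*
 * Recognize a statement that our extension's parser emitted as an
 * ExtensibleNode, so the utility hook can handle it itself.
 */
static bool
is_my_custom_stmt(Node *parsetree)
{
    if (IsA(parsetree, ExtensibleNode))
    {
        ExtensibleNode *n = (ExtensibleNode *) parsetree;

        return strcmp(n->extnodename, "MyCustomStmt") == 0;
    }

    return false;
}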

I could do that, but I'm assuming that you still wouldn't be satisfied, as a
custom parser would still be needed, which may or may not require
copying and pasting chunks of the core grammar?

If so, do you have any suggestion for an approach you would accept?




a comment in joinrel.c: compute_partition_bounds()

2021-09-23 Thread Amit Langote
Hi,

I think there's a word missing in the following comment:

/*
 * See if the partition bounds for inputs are exactly the same, in
 * which case we don't need to work hard: the join rel have the same
 * partition bounds as inputs, and the partitions with the same
 * cardinal positions form the pairs.

": the join rel have the same..." seems to be missing a "will".

Attached is a patch to fix it.
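
For reference, the corrected comment would presumably read:

    /*
     * See if the partition bounds for inputs are exactly the same, in
     * which case we don't need to work hard: the join rel will have the
     * same partition bounds as inputs, and the partitions with the same
     * cardinal positions form the pairs.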

-- 
Amit Langote
EDB: http://www.enterprisedb.com


comment-missing-word.patch
Description: Binary data


Re: row filtering for logical replication

2021-09-23 Thread Amit Kapila
On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila  wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar  wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila  
> > wrote:
> >
> > > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would put additional load on write operations. I am
> > > not sure at this stage, but maybe there are other ways to extend the
> > > current infrastructure, wherein we build snapshots through which we
> > > can access user tables instead of only catalog tables. Such
> > > enhancements, if feasible, would be useful not only for allowing
> > > access to additional columns in row filters but also for other
> > > purposes, like allowing access to functions that read user tables. I
> > > feel we can extend this later based on usage and requests. For the
> > > first version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas: if we bind the row filter to the
> > RI, then if the user wants to use the row filter on any column, 1)
> > they have to add an unnecessary column to the index; 2) since they
> > have to add it to the RI, we now also have to send it over the
> > network; 3) we anyway have to WAL-log it if it is modified, because we
> > forced users to add columns to the RI just so they could use the row
> > filter on them.  Now suppose we remove that limitation and make these
> > changes orthogonal to the RI, i.e. if we have a row filter on some
> > column then we WAL-log it; the only extra cost we pay is WAL-logging
> > that column, but the user is not forced to add it to an index, nor
> > forced to send it over the network.
> >
>
> I am not suggesting adding additional columns to the RI just to use
> filter expressions. If most users who intend to publish deletes and
> updates want to use filter conditions on columns outside the replica
> identity, we can extend this functionality later, but I am not sure
> that logging additional data in WAL is the only way to accomplish
> that.
>

One possibility in this regard could be to enhance Replica Identity ..
Include (column_list), where the columns in the include list would not
be sent, but I think it is better to postpone such enhancements to a
later version. Like I suggested above, we might want to extend our
infrastructure in a way that accommodates not only this request for
extra columns but also UDFs (which can access user tables) and probably
sub-queries as well.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-23 Thread Dilip Kumar
On Fri, Sep 24, 2021 at 12:04 PM Amit Kapila  wrote:
>
> One possibility in this regard could be to enhance Replica Identity ..
> Include (column_list), where the columns in the include list would not
> be sent

Instead of the RI's include column list, why can we not think of the
row filter's column list?  I mean, just as we log the old RI columns,
can't we do something similar for the row filter columns?  With that,
we don't have to log all the columns; we only log the columns that are
in the row filter. Or is this too hard to identify during a write
operation?  The WAL-logging requirements for the RI and the row filter
would then be orthogonal, and if some columns are common then we can
log them only once.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com