Re: .ready and .done files considered harmful

2021-09-20 Thread Dipesh Pandit
Hi,

> 1. I've removed several calls to PgArchForceDirScan() in favor of
> calling it at the top of pgarch_ArchiverCopyLoop().  I believe
> there is some disagreement about this change, but I don't think
> we gain enough to justify the complexity.  The main reason we
> exit pgarch_ArchiverCopyLoop() should ordinarily be that we've
> run out of files to archive, so incurring a directory scan the
> next time it is called doesn't seem like it would normally be too
> bad.  I'm sure there are exceptions (e.g., lots of .done files,
> archive failures), but the patch is still not making things any
> worse than they presently are for these cases.

Yes, I think that when the archiver is lagging behind, a call to force a
directory scan at the top of pgarch_ArchiverCopyLoop() does not have any
impact. It may result in a directory scan in the next cycle only when the
archiver is ahead or in sync, but in that case the directory scan should
not incur much cost either, precisely because the archiver is ahead. I
agree that we can replace the separate calls that force a directory scan
in failure scenarios with a single call to PgArchForceDirScan() at the
top of pgarch_ArchiverCopyLoop().
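
To make the shape of that simplification concrete, here is a rough sketch;
apart from PgArchForceDirScan(), the helper names are placeholders rather
than the actual pgarch.c functions:

#include <stdbool.h>

/* Declarations for illustration only; these stand in for pgarch.c internals. */
extern void PgArchForceDirScan(void);
extern bool next_ready_file(char *xlog);        /* placeholder for pgarch_readyXlog() */
extern bool archive_one_file(const char *xlog); /* placeholder for pgarch_archiveXlog() */
extern void mark_file_done(const char *xlog);   /* placeholder for .done marking */

static void
ArchiverCopyLoopSketch(void)
{
    char        xlog[64];               /* placeholder buffer for a WAL file name */

    PgArchForceDirScan();               /* single call at the top, per the proposal */

    while (next_ready_file(xlog))
    {
        if (!archive_one_file(xlog))
            break;                      /* the next invocation starts with a fresh scan */
        mark_file_done(xlog);
    }
}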

> 2. I removed all the logic that attempted to catch out-of-order
> .ready files.  Instead, XLogArchiveNotify() only forces a
> directory scan for files other than regular WAL files, and we
> depend on our periodic directory scans to pick up anything that's
> been left behind.
> 3. I moved the logic that forces directory scans every once in a
> while.  We were doing that in the checkpoint/restartpoint logic,
> which, upon further thought, might not be the best idea.  The
> checkpoint interval can vary widely, and IIRC we won't bother
> creating checkpoints at all if database activity stops.  Instead,
> I've added logic in pgarch_readyXlog() that forces a directory
> scan if one hasn't happened in a few minutes.
> 4. Finally, I've tried to ensure comments are clear and that the
> logic is generally easy to reason about.
>
> What do you think?

I agree. If we force a periodic directory scan then we do not require any
special logic in XLogArchiveNotify() to handle scenarios where a .ready
file is created out of order; we only need to force a directory scan there
for non-regular WAL files.

Overall I think the periodic directory scan simplifies the patch and
makes sure that any missed file gets archived within a few minutes.

Thanks,
Dipesh


Re: psql: \dl+ to list large objects privileges

2021-09-20 Thread Pavel Luzanov

Hi,

Thank you for testing.
As far as I understand, for the patch to move forward, someone has to become
a reviewer and change the status in the commitfest app.

Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company

On 18.09.2021 05:41, Neil Chen wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi, I think this is an interesting patch. +1
I tested it for the latest version, and it works well.




RE: Improve logging when using Huge Pages

2021-09-20 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi,

Thank you for your comment.

> I was afraid that logging the message like "could not ..." every time when 
> the server starts up would surprise users unnecessarily.
> Because the message sounds like it reports a server error.

Fujii-san, 
I was worried about the same thing as you.
So the attached patch gets the value of the kernel parameter vm.nr_hugepages,
and if it is at its default value of zero, the log level stays the same as before.
On the other hand, if any value is set, the level is raised to LOG.
I hope a better message, or a better approach than this kind of implementation, can be found.
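
For illustration, the behaviour described above could look roughly like the
sketch below. This is not the attached patch; reading /proc/sys/vm/nr_hugepages
directly and using DEBUG1 as the quieter pre-patch level are assumptions:

#include "postgres.h"

/*
 * Illustrative sketch only: choose the log level for the huge-pages
 * fallback message based on vm.nr_hugepages.
 */
static int
huge_page_fallback_elevel(void)
{
    FILE       *f = fopen("/proc/sys/vm/nr_hugepages", "r");
    long        nr = 0;

    if (f != NULL)
    {
        if (fscanf(f, "%ld", &nr) != 1)
            nr = 0;
        fclose(f);
    }

    /* default (0) => keep the quieter level; otherwise report at LOG */
    return (nr > 0) ? LOG : DEBUG1;
}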

Regards,
Noriyoshi Shinoda

-Original Message-
From: Kyotaro Horiguchi [mailto:horikyota@gmail.com] 
Sent: Friday, September 17, 2021 1:15 PM
To: masao.fu...@oss.nttdata.com
Cc: pry...@telsasoft.com; Shinoda, Noriyoshi (PN Japan FSIP) 
; pgsql-hack...@postgresql.org; rjuju...@gmail.com; 
t...@sss.pgh.pa.us
Subject: Re: Improve logging when using Huge Pages

At Fri, 17 Sep 2021 00:13:41 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/09/08 11:17, Kyotaro Horiguchi wrote:
> > I don't dislike the message, but I'm not sure I like the message is 
> > too verbose, especially about it has DETAILS. It seems to me 
> > something like the following is sufficient, or I'd like see it even 
> > more concise.
> > "fall back anonymous shared memory to non-huge pages: required %zu 
> > bytes for huge pages"
> > If we emit an error message for other than mmap failure, it would be 
> > like the following.
> > "fall back anonymous shared memory to non-huge pages: huge pages not 
> > available"
> 
> How about simpler message like "disabling huge pages" or "disabling 
> huge pages due to lack of huge pages available"?

Honestly, I cannot have confidence in my wording, but "disabling huge pages" 
sounds like something that happens at the OS layer.  "did not use/gave up using 
huge pages for anonymous shared memory" might work well, I'm not sure, 
though...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


huge_pages_log_v5.diff
Description: huge_pages_log_v5.diff


Re: WIP: System Versioned Temporal Table

2021-09-20 Thread Hannu Krosing
On Mon, Sep 20, 2021 at 7:09 AM Corey Huinker  wrote:
>
> On Sun, Sep 19, 2021 at 3:12 PM Hannu Krosing  wrote:
>>
>> A side table has the nice additional benefit that we can very easily
>> version the *table structure* so when we ALTER TABLE and the table
>> structure changes we just make a new side table with now-currents
>> structure.
>
>
> It's true that would allow for perfect capture of changes to the table 
> structure, but how would you query the thing?
>
> If a system versioned table was created with a column foo that is type float, 
> and then we dropped that column, how would we ever query what the value of 
> foo was in the past?


We can query that value only when querying the table AS OF a time when
the column was still there.

We could probably get away with pretending that dropped columns are NULL
in newer versions (and in the versions before the column was added).

An even trickier case would be changing the column's data type.

>
> Would the columns returned from SELECT * change based on the timeframe 
> requested?


If we want to emulate real table history, then it should.

But the * thing was not really specified well even for original
PostgreSQL inheritance.

Maybe we could do SELECT (* AS OF 'yesterday afternoon'::timestamp) FROM ... :)

> If we then later added another column that happened to also be named foo but 
> now was type JSONB, would we change the datatype returned based on the time 
> period being queried?

Many databases do allow returning multiple result sets, and actually
the PostgreSQL wire *protocol* also theoretically supports this, just
that it is not supported by any current client library.

With current libraries it would be possible to return a dynamic
version of row_to_json(t.*) which changes based on the actual historical
table structure.

> Is the change in structure a system versioning which itself must be captured?

We do capture it (kind of) for logical decoding. That is, we decode
according to the structure in effect at the time of row creation,
though we currently miss the actual DDL itself.


So there is a lot to figure out and define, but at least storing the
history in a separate table gives a good foundation to build upon.



-
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.




Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2021-09-20 Thread Antonin Houska
Michail Nikolaev  wrote:

> Hello.
> 
> Added a check for standby promotion with the long transaction to the
> test (code and docs are unchanged).

I'm trying to continue the review, sorry for the delay. Following are a few
questions about the code:

* Does the masking need to happen in the AM code, e.g. _bt_killitems()? I'd
  expect that the RmgrData.rm_fpi_mask can do all the work.

  Maybe you're concerned about clearing the "LP-safe-on-standby" bits after
  promotion, but I wouldn't consider this a problem: once the standby is
  allowed to set the hint bits (i.e. minRecoveryPoint is high enough, see
  IsIndexLpDeadAllowed() -> XLogNeedsFlush()), promotion shouldn't break
  anything because it should not allow minRecoveryPoint to go backwards.

* How about modifying rm_mask() instead of introducing rm_fpi_mask()?  Perhaps
  a boolean argument can be added to distinguish the purpose of the masking.

* Are you sure it's OK to use mask_lp_flags() here? It sets the item flags
  to LP_UNUSED unconditionally, which IMO should only be done by VACUUM. I
  think you only need to revert the effect of a prior ItemIdMarkDead(), so you
  only need to change the status from LP_DEAD to LP_NORMAL if the tuple still
  has storage (see the sketch after this list of comments). (And maybe add an
  assertion to ItemIdMarkDead() confirming that it's only used for LP_NORMAL
  items?)

  As far as I understand, the current code only uses mask_lp_flags() during
  WAL consistency check on copies of pages which don't eventually get written
  to disk.

* IsIndexLpDeadAllowed()

  ** is bufmgr.c the best location for this function?

  ** the header comment should explain the minLsn argument.

  ** comment

/* It is always allowed on primary if *all_dead. */

should probably be

/* It is always allowed on primary if ->all_dead. */

* comment: XLOG_HEAP2_CLEAN has been renamed to XLOG_HEAP2_PRUNE in PG14.
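
To make the suggestion about mask_lp_flags() above concrete, here is a rough
sketch of the kind of masking being described; this is not the patch's code,
it merely reverts a prior ItemIdMarkDead() for items that still have storage:

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"
#include "storage/off.h"

/*
 * Sketch only: during FPI masking, turn LP_DEAD items that still have
 * storage back into LP_NORMAL instead of clearing them to LP_UNUSED.
 */
static void
mask_lp_dead_to_normal(Page page)
{
    OffsetNumber offnum,
                maxoff = PageGetMaxOffsetNumber(page);

    for (offnum = FirstOffsetNumber; offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        if (ItemIdIsDead(itemid) && ItemIdHasStorage(itemid))
            itemid->lp_flags = LP_NORMAL;
    }
}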


On regression tests:

* Is the purpose of the repeatable read (RR) snapshot to test that
  heap_hot_search_buffer() does not set deadness->all_dead if some transaction
  can still see a tuple of the chain? If so, I think the RR snapshot does not
  have to be used in the tests because this patch does not really affect the
  logic: heap_hot_search_buffer() only sets deadness->all_dead to false, just
  like it sets *all_dead in the current code. Besides that,
  IsIndexLpDeadAllowed() too can avoid setting of the LP_DEAD flag on an index
  tuple (at least until the commit record of the deleting/updating transaction
  gets flushed to disk), so it can hide the behaviour of
  heap_hot_search_buffer().

* Unless I miss something, the tests check that the hint bits are not
  propagated from primary (or they are propagated but marked non-safe),
  however there's no test to check that standby does set the hint bits itself.

* I'm also not sure if promotion needs to be tested. What's specific about the
  promoted cluster from the point of view of this feature? The only thing I
  can think of is clearing of the "LP-safe-on-standby" bits, but, as I said
  above, I'm not sure if the tests ever let the standby set those bits before
  the promotion.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Added schema level support for publication.

2021-09-20 Thread Amit Kapila
On Fri, Sep 17, 2021 at 5:39 PM vignesh C  wrote:
>
> On Wed, Sep 15, 2021 at 12:30 PM Amit Kapila  wrote:
> >
>
> > 4.
> > AlterPublicationSchemas()
> > {
> > ..
> > + /*
> > + * If the table option was not specified remove the existing tables
> > + * from the publication.
> > + */
> > + if (!tables)
> > + {
> > + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> > + PublicationDropTables(pubform->oid, rels, false, true);
> > + }
> > +
> > + /* Identify which schemas should be dropped */
> > + delschemas = list_difference_oid(oldschemaids, schemaidlist);
> > +
> > + /* And drop them */
> > + PublicationDropSchemas(pubform->oid, delschemas, true);
> >
> > Here, you have neither locked tables to be dropped nor schemas. I
> > think both need to be locked as we do for tables in similar code in
> > AlterPublicationTables(). Can you please test via debugger what
> > happens if we try to drop without taking lock here and concurrently
> > try to drop the actual object? It should give some error. If we decide
> > to lock here then we should be able to pass the list of relations to
> > PublicationDropTables() instead of Oids which would then obviate the
> > need for any change to that function.
> >
> > Similarly don't we need to lock schemas before dropping them in
> > AlterPublicationTables()?
>
> we will get the following error, if concurrently dropped from another
> session during debugging:
> postgres=# alter publication pub1 set all tables in schema sch2;
> ERROR:  cache lookup failed for publication table 16418
> Modified to add locking
>

But you haven't followed my other suggestion related to
PublicationDropTables(). I don't think that, after doing this, you need
to pass 'true' as the last parameter to PublicationDropTables(). In fact,
you can remove that parameter altogether; in other words, we don't need
any change in PublicationDropTables() for this patch. Is there a reason
why we shouldn't make this change?

Few other comments:
===
1. The ordering of lock acquisition for schema and relation in
AlterPublicationSchemas() and AlterPublicationTables() is opposite,
which would generally lead to a deadlock, but it won't here because we
acquire only a share lock on the schema. Still, I think it may be better
to keep the locking order the same, and it might also help us to keep the
schema and relation code separate.

2. One more thing: I think one can concurrently add a relation belonging
to a particular schema and that particular schema itself. To protect
against that, AlterPublication should acquire an exclusive lock, similar
to what we do in AlterSubscription.

3.
+ /*
+ * If the table option was not specified remove the existing tables
+ * from the publication.
+ */
+ if (!relsSpecified)
+ {
+ List*relations = NIL;
+ List*tables = NIL;
+
+ rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
+ tables = RelationOidsToRangevars(rels);
+ relations = OpenTableList(tables);

One problem with using OpenTableList here is that it might try to lock
inherited children twice. Also, you don't need to first convert to
rangevar for locking relations, you can directly use table_open here.

4.
+ | extended_relation_expr
+ {
+ $$ = makeNode(PublicationObjSpec);
+ $$->object = (Node *)$1;
+ }
+ | CURRENT_SCHEMA
+ {
+ $$ = makeNode(PublicationObjSpec);
+ $$->object = (Node *)makeString("CURRENT_SCHEMA");
+ }
  ;

-publication_for_tables:
- FOR TABLE publication_table_list
+/* This can be either a schema or relation name. */
+pubobj_name:
+ ColId
  {
- $$ = (Node *) $3;
+ $$ = (Node *) makeString($1);
  }
- | FOR ALL TABLES
+ | ColId indirection
  {
- $$ = (Node *) makeInteger(true);
+ $$ = (Node *) makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
  }

In some places, you have given space after (Node *) and at other
places, there is no space. Isn't it better to be consistent?

5.
+/* This can be either a schema or relation name. */
+pubobj_name:

Here, we can modify the comment as "This can be either a schema or
relation name. For relations, the inheritance will be implicit." And
then remove the inheritance related comment from code below:

+ /* inheritance query, implicitly */
+ $$ = makeNode(PublicationObjSpec);
+ $$->object = $1;

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-20 Thread Amit Kapila
On Fri, Sep 17, 2021 at 5:40 PM vignesh C  wrote:
>
> On Thu, Sep 16, 2021 at 9:54 AM Amit Kapila  wrote:
> >
> > I think there is one more similar locking problem.
> > AlterPublicationSchemas()
> > {
> > ..
> > + if (stmt->action == DEFELEM_ADD)
> > + {
> > + List *rels;
> > +
> > + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> > + RelSchemaIsMemberOfSchemaList(rels, schemaidlist, true);
> > ...
> > ...
> > }
> >
> > Here, we don't have a lock on the relation. So, what if the relation
> > is concurrently dropped after you get the rel list by
> > GetPublicationRelations?
>
> This works fine without locking even after concurrent drop, I felt
> this works because of MVCC.
>

Can you share the exact scenario you have tested? I think here it can
give a wrong error because it might access invalid cache entry, so I
think a lock is required here. Also, as said before, this might help
to make the rel list consistent in function
CheckObjSchemaNotAlreadyInPublication().

-- 
With Regards,
Amit Kapila.




Coding guidelines for braces + spaces - link 404's

2021-09-20 Thread Kevin Burke
Hi,
I have been working on a patch for Postgres. I'm curious about the
suggested style for braces around if statements - some places don't include
braces around an if statement body, if the if statement body is a single
line.

The "Coding Conventions" don't contain any advice here (although maybe they
should link to the "Developer FAQ"?)
https://www.postgresql.org/docs/devel/source.html

The Postgres Wiki has a bit that says to "See also the Formatting section
 in the
documentation," but that link 404's, so I'm not sure where it is supposed
to go.
https://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F

Thanks,
Kevin

--
Kevin Burke
phone: 925-271-7005 | kevin.burke.dev


Re: Logical replication timeout problem

2021-09-20 Thread Fabrice Chapuis
Hi Amit,

We can reproduce the problem: we load a table of several GB in the publisher's
schema; this triggers the worker's timeout one minute after the end of the
load. The table being loaded is not replicated.

2021-09-16 12:06:50 CEST [24881]: [1-1]
user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms
statement: COPY db.table (col1, col2) FROM stdin;

2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic
analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55
s, elapsed: 9.58 s

2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout

Before increasing the values of wal_sender_timeout and wal_receiver_timeout,
I wanted to further investigate the mechanisms leading to this timeout.

Thanks for your help

Fabrice



On Sun, Sep 19, 2021 at 6:25 AM Amit Kapila  wrote:

> On Fri, Sep 17, 2021 at 8:08 PM Fabrice Chapuis 
> wrote:
> >
> > the publisher and the subscriber run on the same postgres instance.
> >
>
> Okay, but there is no log corresponding to operations being performed
> by the publisher. By looking at current logs it is not very clear to
> me what might have caused this. Did you try increasing
> wal_sender_timeout and wal_receiver_timeout?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Added schema level support for publication.

2021-09-20 Thread vignesh C
On Mon, Sep 20, 2021 at 3:57 PM Amit Kapila  wrote:
>
> On Fri, Sep 17, 2021 at 5:40 PM vignesh C  wrote:
> >
> > On Thu, Sep 16, 2021 at 9:54 AM Amit Kapila  wrote:
> > >
> > > I think there is one more similar locking problem.
> > > AlterPublicationSchemas()
> > > {
> > > ..
> > > + if (stmt->action == DEFELEM_ADD)
> > > + {
> > > + List *rels;
> > > +
> > > + rels = GetPublicationRelations(pubform->oid, PUBLICATION_PART_ROOT);
> > > + RelSchemaIsMemberOfSchemaList(rels, schemaidlist, true);
> > > ...
> > > ...
> > > }
> > >
> > > Here, we don't have a lock on the relation. So, what if the relation
> > > is concurrently dropped after you get the rel list by
> > > GetPublicationRelations?
> >
> > This works fine without locking even after concurrent drop, I felt
> > this works because of MVCC.
> >
>
> Can you share the exact scenario you have tested? I think here it can
> give a wrong error because it might access invalid cache entry, so I
> think a lock is required here. Also, as said before, this might help
> to make the rel list consistent in function
> CheckObjSchemaNotAlreadyInPublication().

This is the scenario I tried:
create schema sch1;
create table sch1.t1 (c1 int);
create publication pub1 for table sch1.t1;
alter publication pub1 add all tables in schema sch1;  -- concurrently
drop table sch1.t1 from another session.

I will add the locking and changing of
CheckObjSchemaNotAlreadyInPublication in the next version.

Regards,
Vignesh




Re: mark the timestamptz variant of date_bin() as stable

2021-09-20 Thread John Naylor
> On Wed, Sep 1, 2021 at 3:25 PM Tom Lane  wrote:
> >
> > John Naylor  writes:
> > > On Wed, Sep 1, 2021 at 2:44 PM Tom Lane  wrote:
> > >> I see that these two answers are both exactly multiples of 24 hours away
> > >> from the given origin.  But if I'm binning on the basis of "days" or
> > >> larger units, I would sort of expect to get local midnight, and I'm not
> > >> getting that once I cross a DST boundary.
> >
> > > Hmm, that seems like a reasonable expectation. I can get local midnight
> > > if I recast to timestamp:
> >
> > > # select date_bin('1 day', '2021-11-10 00:00 +00'::timestamptz::timestamp,
> > > '2021-09-01 00:00 -04'::timestamptz::timestamp);
> > >       date_bin
> > > ---------------------
> > >  2021-11-09 00:00:00
> > > (1 row)
> >
> > Yeah, and then back to timestamptz if that's what you really need :-(
> >
> > > It's a bit unintuitive, though.
> >
> > Agreed.  If we keep it like this, adding some documentation around
> > the point would be a good idea I think.
>
> Attached is a draft doc patch using the above examples. Is there anything
> else that would be useful to mention?

Any thoughts on the doc patch?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Coding guidelines for braces + spaces - link 404's

2021-09-20 Thread Peter Eisentraut

On 20.09.21 05:37, Kevin Burke wrote:
I have been working on a patch for Postgres. I'm curious about the 
suggested style for braces around if statements - some places don't 
include braces around an if statement body, if the if statement body is 
a single line.


Generally, the braces should be omitted if the body is only a single
line.  An exception is sometimes made for symmetry if another branch
uses more than one line.  So

if (foo)
    bar();

but

if (foo)
{
    bar();
}
else
{
    baz();
    qux();
}




Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis  wrote:
>
> Hi Amit,
>
> We can replay the problem: we load a table of several Gb in the schema of the 
> publisher, this generates the worker's timeout after one minute from the end 
> of this load. The table on which this load is executed is not replicated.
>
> 2021-09-16 12:06:50 CEST [24881]: [1-1] 
> user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms  
> statement: COPY db.table (col1, col2) FROM stdin;
>
> 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 s, 
> elapsed: 9.58 s
>
> 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  terminating 
> logical replication worker due to timeout
>
> Before increasing value for wal_sender_timeout and wal_receiver_timeout I 
> thought to further investigate the mechanisms leading to this timeout.
>

The basic problem here seems to be that the WAL sender is not able to send
a keepalive or any other message within the configured
wal_receiver_timeout. I am not sure how that can happen, but can you try
once with autovacuum = off? I want to make sure that the WAL sender is not
blocked because of the autovacuum background process.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 3:17 PM Ajin Cherian  wrote:
>
> On Wed, Sep 8, 2021 at 7:59 PM Ajin Cherian  wrote:
> >
> > On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira  wrote:
> > >
>
> > Somehow this approach of either new_tuple or old_tuple doesn't seem to
> > be very fruitful if the user requires that his replica is up-to-date
> > based on the filter condition. For that, I think you will need to
> > convert UPDATES to either INSERTS or DELETES if only new_tuple or
> > old_tuple matches the filter condition but not both matches the filter
> > condition.
> >
> > UPDATE
> > old-row (match)   new-row (no match)  -> DELETE
> > old-row (no match)  new row (match)   -> INSERT
> > old-row (match)   new row (match)   -> UPDATE
> > old-row (no match)  new-row (no match)  -> (drop change)
> >
>
> Adding a patch that strives to do the logic that I described above.
> For updates, the row filter is applied on both old_tuple
> and new_tuple. This patch assumes that the row filter only uses
> columns that are part of the REPLICA IDENTITY. (the current patch-set
> only
> restricts this for row-filters that are delete only)
> The old_tuple only has columns that are part of the old_tuple and have
> been changed, which is a problem while applying the row-filter. Since
> unchanged REPLICA IDENTITY columns
> are not present in the old_tuple, this patch creates a temporary
> old_tuple by getting such column values from the new_tuple and then
> applies the filter on this hand-created temp old_tuple. The way the
> old_tuple is created can be better optimised in future versions.
>

Yeah, this is the kind of idea that can work. One thing you might want
to check is the overhead of the additional deform/form cycle; you might
want to use Peter's tests above for that. I think you only need to form
the old/new tuples when something in them has changed, but on a quick
look it seems you are always re-forming both tuples.
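
For reference, the old-row/new-row decision table quoted above boils down to
something like the sketch below; the type and function names are illustrative,
not taken from the patch:

#include <stdbool.h>

/* Illustrative names only; not the patch's actual types or functions. */
typedef enum
{
    PUBLISH_AS_INSERT,          /* row enters the filtered set */
    PUBLISH_AS_UPDATE,          /* row stays in the filtered set */
    PUBLISH_AS_DELETE,          /* row leaves the filtered set */
    PUBLISH_SKIP                /* row was never in the filtered set */
} PublishAction;

static PublishAction
decide_update_action(bool old_matches, bool new_matches)
{
    if (old_matches && new_matches)
        return PUBLISH_AS_UPDATE;
    if (old_matches && !new_matches)
        return PUBLISH_AS_DELETE;
    if (!old_matches && new_matches)
        return PUBLISH_AS_INSERT;
    return PUBLISH_SKIP;
}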

-- 
With Regards,
Amit Kapila.




Re: proposal: possibility to read dumped table's name from file

2021-09-20 Thread Daniel Gustafsson
> Will do a closer review on the patch shortly.

Had a read through, and tested, the latest posted version today:

+Read objects filters from the specified file. Specify "-" to read from
+stdin. Lines of this file must have the following format:
I think this should be - and STDIN to
match the rest of the docs.


+   
+With the following filter file, the dump would include table
+mytable1 and data from foreign tables of
+some_foreign_server foreign server, but exclude data
+from table mytable2.
+
+include table mytable1
+include foreign_data some_foreign_server
+exclude table mytable2
+
+   
This example highlights the issue I've previously raised with the UX/docs
of this feature.  The "exclude table mytable2" is totally pointless in the
above, since the exact match on "mytable1" already excludes all other objects.
What we should be doing instead is to use the pattern-matching aspect, along
the lines of the below:

include table mytable*
exclude table mytable2

+The --filter option works just like the other
+options to include or exclude tables, schemas, table data, or foreign
This should refer to the actual options by name to make it clear which we are
talking about.


+   printf(_("  --filter=FILENAME    dump objects and data based on the filter expressions\n"
+            "                       from the filter file\n"));
Before we settle on --filter I think we need to conclude whether this file is
intended to be included from a config file, or used on its own.  If we go with
the former then we might not want a separate option for just --filter.


+if (filter_is_keyword(keyword, size, "include"))
I would prefer if this function call were replaced by just the pg_strcasecmp()
call from filter_is_keyword(), with the strlen optimization there removed.  This
is not a hot path, so we can afford the string comparison in case of errors.
Having the string comparison done inline here will improve readability, saving
the reader from jumping to another function to see what it does.


+initStringInfo(&line);
Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
does?


+typedef struct
I think these should be at the top of the file with the other typedefs.


When testing strange object names, I was unable to express this name in the 
filter file:

$ ./bin/psql
psql (15devel)
Type "help" for help.

danielg=# create table "
danielg"# t
danielg"# t
danielg"# " (a integer);
CREATE TABLE
danielg=# select relname from pg_class order by oid desc limit 1;
 relname
---------
         +
 t       +
 t       +

(1 row)

--
Daniel Gustafsson   https://vmware.com/





Re: Timeout failure in 019_replslot_limit.pl

2021-09-20 Thread Michael Paquier
On Sat, Sep 18, 2021 at 05:19:04PM -0300, Alvaro Herrera wrote:
> Hmm, sounds a possibly useful idea to explore, but I would only do so if
> the other ideas prove fruitless, because it sounds like it'd have more
> moving parts.  Can you please first test if the idea of sending the signal
> twice is enough?

This idea does not work.  I got one failure after 5 tries.

> If that doesn't work, let's try Horiguchi-san's idea
> of using some `ps` flags to find the process.

Tried this one as well, to see the same failure.  I was just looking
at the state of the test while it was querying pg_replication_slots
and that was the expected state after the WAL sender received SIGCONT:
USER    PID    %CPU %MEM  VSZ       RSS    TT  STAT STARTED  TIME    COMMAND
toto  12663   0.0  0.0  5014468   3384   ??  Ss    8:30PM   0:00.00 postgres: primary3: walsender toto [local] streaming 0/72
toto  12662   0.0  0.0  4753092   3936   ??  Ts    8:30PM   0:00.01 postgres: standby_3: walreceiver streaming 0/7000D8

The test gets the right PIDs, as the logs showed:
ok 17 - have walsender pid 12663
ok 18 - have walreceiver pid 12662

So it does not seem that this is an issue with the signals.
Perhaps we'd better wait for a checkpoint to complete, for example by
scanning the logs, before running the query on pg_replication_slots, to
make sure that the slot is invalidated?
--
Michael


signature.asc
Description: PGP signature


Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 5:21 PM Amit Kapila  wrote:
>
> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis  
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema of 
> > the publisher, this generates the worker's timeout after one minute from 
> > the end of this load. The table on which this load is executed is not 
> > replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1] 
> > user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms  
> > statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic 
> > analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 
> > s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  
> > terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout I 
> > thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>

The other thing we can try is to check the data in pg_locks on the
publisher during the minute after the large copy is finished. We can try
this both with and without autovacuum.

-- 
With Regards,
Amit Kapila.




Re: Timeout failure in 019_replslot_limit.pl

2021-09-20 Thread Alvaro Herrera
On 2021-Sep-20, Michael Paquier wrote:

> > Can you please first test if the idea of sending the signal twice is
> > enough?
> 
> This idea does not work.  I got one failure after 5 tries.

OK, thanks for taking the time to test it.

> > If that doesn't work, let's try Horiguchi-san's idea of using some
> > `ps` flags to find the process.
> 
> Tried this one as well, to see the same failure.

Hmm, do you mean that you used Horiguchi-san's patch in [1] and the
failure still occurred?
[1] 
https://postgr.es/m/20210907.120106.1483239898065111540.horikyota@gmail.com

> I was just looking at the state of the test while it was querying
> pg_replication_slots and that was the expected state after the WAL
> sender received SIGCONT:
> USER    PID    %CPU %MEM  VSZ       RSS    TT  STAT STARTED  TIME    COMMAND
> toto  12663   0.0  0.0  5014468   3384   ??  Ss    8:30PM   0:00.00 postgres: primary3: walsender toto [local] streaming 0/72
> toto  12662   0.0  0.0  4753092   3936   ??  Ts    8:30PM   0:00.01 postgres: standby_3: walreceiver streaming 0/7000D8
> 
> The test gets the right PIDs, as the logs showed:
> ok 17 - have walsender pid 12663
> ok 18 - have walreceiver pid 12662

As I understood, Horiguchi-san's point isn't that the PIDs might be
wrong -- the point is to make sure that the process is in state T before
moving on to the next step in the test.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




psql: tab completion differs on semicolon placement

2021-09-20 Thread Daniel Gustafsson
While testing a patch I fat-fingered a CREATE DATABASE statement by tab
completing *after* the semicolon, with no space between the objname and
semicolon.  The below options were presented, which at this point aren't really
applicable:

db=# create database foo;
ALLOW_CONNECTIONS ENCODING  LC_COLLATELOCALE
TABLESPACE
CONNECTION LIMIT  IS_TEMPLATE   LC_CTYPE  OWNER TEMPLATE

DROP DATABASE has a similar tab completion which makes about as much sense:

db=# drop database foo;WITH (

Checking prev_wd for not ending with ';' as per the attached makes "objname;"
behave like "objname ;".  Is there a reason for not doing that which I'm
missing?  I didn't check for others, but if this seems reasonable I'll go
through to find any other similar cases.

--
Daniel Gustafsson   https://vmware.com/



psql_createdb-tabcomplete.diff
Description: Binary data


Re: Timeout failure in 019_replslot_limit.pl

2021-09-20 Thread Michael Paquier
On Mon, Sep 20, 2021 at 09:38:29AM -0300, Alvaro Herrera wrote:
> On 2021-Sep-20, Michael Paquier wrote:
>>> If that doesn't work, let's try Horiguchi-san's idea of using some
>>> `ps` flags to find the process.
>> 
>> Tried this one as well, to see the same failure.
> 
> Hmm, do you mean that you used Horiguchi-san's patch in [1] and the
> failure still occurred?
> [1] 
> https://postgr.es/m/20210907.120106.1483239898065111540.horikyota@gmail.com

Yes, that's what I mean.
--
Michael


signature.asc
Description: PGP signature


Re: row filtering for logical replication

2021-09-20 Thread Dilip Kumar
On Mon, Sep 20, 2021 at 5:37 PM Amit Kapila  wrote:
> > >
> >
> > Adding a patch that strives to do the logic that I described above.
> > For updates, the row filter is applied on both old_tuple
> > and new_tuple. This patch assumes that the row filter only uses
> > columns that are part of the REPLICA IDENTITY. (the current patch-set
> > only
> > restricts this for row-filters that are delete only)
> > The old_tuple only has columns that are part of the old_tuple and have
> > been changed, which is a problem while applying the row-filter. Since
> > unchanged REPLICA IDENTITY columns
> > are not present in the old_tuple, this patch creates a temporary
> > old_tuple by getting such column values from the new_tuple and then
> > applies the filter on this hand-created temp old_tuple. The way the
> > old_tuple is created can be better optimised in future versions.

I understand why this is done, but I have two concerns here: 1) we incur
an extra deform and copy fields from the new tuple to the old one for
unchanged replica identity columns; 2) the same unchanged attribute
values get evaluated against the filter in the old tuple as well as in
the new tuple.  What really needs to happen is that only the updated
fields are validated as part of both the old and the new tuple; there is
no point in redundantly validating the unchanged fields.  For that we
would have to change the filter for the old tuple to validate only the
attributes that were actually modified; the unchanged and new values
will be validated in the new tuple anyway.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Coding guidelines for braces + spaces - link 404's

2021-09-20 Thread Tom Lane
Kevin Burke  writes:
> The Postgres Wiki has a bit that says to "See also the Formatting section
>  in the
> documentation," but that link 404's, so I'm not sure where it is supposed
> to go.

Obsolete link, evidently.  It should point to

https://www.postgresql.org/docs/devel/source-format.html

Will fix.

regards, tom lane




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-20 Thread Amul Sul
On Wed, Sep 15, 2021 at 9:38 PM Robert Haas  wrote:
>
> On Wed, Sep 15, 2021 at 10:32 AM Robert Haas  wrote:
> > Putting these changes into 0001 seems to make no sense. It seems like
> > they should be part of 0003, or maybe a new 0004 patch.
>
> After looking at this a little bit more, I think it's really necessary
> to separate out all of your changes into separate patches at least for
> initial review. It's particularly important to separate code movement
> changes from other kinds of changes. 0001 was just moving code before,
> and so was 0002, but now both are making other changes, which is not
> easy to see from looking at the 'git diff' output. For that reason
> it's not so easy to understand exactly what you've changed here and
> analyze it.
>

Ok, understood. I have separated my changes into the 0001 and 0002
patches, and the refactoring patches start from 0003.

In the 0001 patch, I have copied ArchiveRecoveryRequested to shared
memory as discussed previously. Copying the ArchiveRecoveryRequested
value to shared memory is not really interesting, and I think we should
somehow reuse an existing variable instead (perhaps with some
modification of the information it can store, e.g. adding one more enum
value to SharedRecoveryState, or something else); I am thinking along
those lines.

In addition to that, I tried to narrow the scope of the
ArchiveRecoveryRequested global variable. It is now a static variable
whose scope is limited to xlog.c, like LocalXLogInsertAllowed, and it
can be accessed through the newly added function
ArchiveRecoveryIsRequested() (like PromoteIsTriggered()). Let me know
what you think about the approach.

The 0002 patch is a mixed one, where I tried to remove the dependencies
on global variables and on local variables belonging to StartupXLOG(). I
am still worried about the InRecovery value that needs to be deduced
afterward inside XLogAcceptWrites().  Currently it relies on a
ControlFile->state != DB_SHUTDOWNED check, but I think that will not be
good for ASRO, where we plan to skip only the XLogAcceptWrites() work
and let StartupXLOG() do the rest of the work as it is, including
updating ControlFile's DBState to DB_IN_PRODUCTION. We might then need
some ugly kludge to call PerformRecoveryXLogAction() in the checkpointer
irrespective of DBState, which makes me a bit uncomfortable.

> I poked around a little bit at these patches, looking for
> perhaps-interesting global variables upon which the code called from
> XLogAcceptWrites() would depend with your patches applied. The most
> interesting ones seem to be (1) ThisTimeLineID, which you mentioned
> and which may be fine but I'm not totally convinced yet, (2)
> LocalXLogInsertAllowed, which is probably not broken but I'm thinking
> we may want to redesign that mechanism somehow to make it cleaner, and

Thanks for the off-list detailed explanation on this.

For somebody else who might be reading this: the concern here (not
really a concern, more something that is good to improve) is that the
LocalSetXLogInsertAllowed() function call is a kind of hack that enables
WAL writes irrespective of RecoveryInProgress() for a short period.
E.g. see the following code in StartupXLOG():

"
  LocalSetXLogInsertAllowed();
  UpdateFullPageWrites();
  LocalXLogInsertAllowed = -1;


  /*
   * If any of the critical GUCs have changed, log them before we allow
   * backends to write WAL.
   */
  LocalSetXLogInsertAllowed();
  XLogReportParameters();
"

Instead of explicitly enabling WAL insertion, it could somehow be
allowed implicitly for the startup process and/or for the checkpointer
performing the first checkpoint and/or WAL writes after recovery. The
current LocalSetXLogInsertAllowed() mechanism is not really harmful and
does not necessarily need to change, but it would be nice if we could
come up with a cleaner, bug-free design.

(Hope I am not missing anything from the discussion).

> (3) CheckpointStats, which is called from RemoveXlogFile which is
> called from RemoveNonParentXlogFiles which is called from
> CleanupAfterArchiveRecovery which is called from XLogAcceptWrites.
> This last one is actually pretty weird already in the existing code.
> It sort of looks like RemoveXlogFile() only expects to be called from
> the checkpointer (or a standalone backend) so that it can update
> CheckpointStats and have that just work, but actually it's also called
> from the startup process when a timeline switch happens. I don't know
> whether the fact that the increments to ckpt_segs_recycled get lost in
> that case should be considered an intentional behavior that should be
> preserved or an inadvertent mistake.
>

I could be wrong, but I think that is intentional.  It removes
pre-allocated or bogus files of the old timeline, which are not
supposed to be counted in the stats. The comments for
CheckpointStatsData might not make this clear, but the comment at the
RemoveNonParentXlogFiles() call site inside StartupXLOG() hints at the
same:

"
/*
 * Be

Re: What are exactly bootstrap processes, auxiliary processes, standalone backends, normal backends(user sessions)?

2021-09-20 Thread Alvaro Herrera
On 2021-Sep-13, Alvaro Herrera wrote:

> Thanks Bharath and Justin -- I think I took all the suggestions and made
> a few other changes of my own.  Here's the result.

Pushed this with very minor additional changes, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: So, about that cast-to-typmod-minus-one business

2021-09-20 Thread Tom Lane
Dean Rasheed  writes:
> On Sat, 18 Sep 2021, 18:06 Tom Lane,  wrote:
>> I'm inclined to back-patch

> +1

Done.

regards, tom lane




Re: Logical replication timeout problem

2021-09-20 Thread Fabrice Chapuis
With autovacuum set to off, the problem did not occur right after loading
the table as in our previous tests. However, the timeout still occurred
later. We have seen an accumulation of .snap files amounting to several GB.

...
-rw---. 1 postgres postgres 16791226 Sep 20 15:26
xid-1238444701-lsn-2D2B-F500.snap
-rw---. 1 postgres postgres 16973268 Sep 20 15:26
xid-1238444701-lsn-2D2B-F600.snap
-rw---. 1 postgres postgres 16790984 Sep 20 15:26
xid-1238444701-lsn-2D2B-F700.snap
-rw---. 1 postgres postgres 16988112 Sep 20 15:26
xid-1238444701-lsn-2D2B-F800.snap
-rw---. 1 postgres postgres 16864593 Sep 20 15:26
xid-1238444701-lsn-2D2B-F900.snap
-rw---. 1 postgres postgres 16902167 Sep 20 15:26
xid-1238444701-lsn-2D2B-FA00.snap
-rw---. 1 postgres postgres 16914638 Sep 20 15:26
xid-1238444701-lsn-2D2B-FB00.snap
-rw---. 1 postgres postgres 16782471 Sep 20 15:26
xid-1238444701-lsn-2D2B-FC00.snap
-rw---. 1 postgres postgres 16963667 Sep 20 15:27
xid-1238444701-lsn-2D2B-FD00.snap
...



2021-09-20 17:11:29 CEST [12687]: [1283-1] user=,db=,client= LOG:
checkpoint starting: time
2021-09-20 17:11:31 CEST [12687]: [1284-1] user=,db=,client= LOG:
checkpoint complete: wrote 13 buffers (0.0%); 0 WAL file(s) added, 0
removed, 0 recycled; write=1.713 s, sync=0.001 s, total=1.718 s
; sync files=12, longest=0.001 s, average=0.001 s; distance=29 kB,
estimate=352191 kB
2021-09-20 17:12:43 CEST [59986]: [2-1] user=,db=,client= ERROR:
terminating logical replication worker due to timeout
2021-09-20 17:12:43 CEST [12546]: [1068-1] user=,db=,client= LOG:  worker
process: logical replication worker for subscription 24215702 (PID 59986)
exited with exit code 1
2021-09-20 17:12:43 CEST [39945]: [1-1] user=,db=,client= LOG:  logical
replication apply worker for subscription "sub" has started
2021-09-20 17:12:43 CEST [39946]: [1-1] user=repuser,db=db,client=127.0.0.1
LOG:  received replication command: IDENTIFY_SYSTEM

Regards,

Fabrice



On Mon, Sep 20, 2021 at 1:51 PM Amit Kapila  wrote:

> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis 
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema
> of the publisher, this generates the worker's timeout after one minute from
> the end of this load. The table on which this load is executed is not
> replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms
> statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:
> automatic analyze of table "db.table " system usage: CPU: user: 4.13 s,
> system: 0.55 s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout
> I thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>
> --
> With Regards,
> Amit Kapila.
>


logical replication restrictions

2021-09-20 Thread Marcos Pegoraro
One thing that is needed and not solved yet is delayed replication for
logical replication. It would be interesting to document that on the
Restrictions page, right?

regards,
Marcos


Re: psql: tab completion differs on semicolon placement

2021-09-20 Thread Dagfinn Ilmari Mannsåker
Daniel Gustafsson  writes:

> While testing a patch I fat-fingered a CREATE DATABASE statement by tab
> completing *after* the semicolon, with no space between the objname and
> semicolon.  The below options were presented, which at this point aren't 
> really
> applicable:
>
> db=# create database foo;
> ALLOW_CONNECTIONS ENCODING  LC_COLLATELOCALE
> TABLESPACE
> CONNECTION LIMIT  IS_TEMPLATE   LC_CTYPE  OWNER 
> TEMPLATE
>
> DROP DATABASE has a similar tab completion which makes about as much sense:
>
> db=# drop database foo;WITH (
>
> Checking prev_wd for not ending with ';' as per the attached makes "objname;"
> behave like "objname ;".  Is there a reason for not doing that which I'm
> missing?  I didn't check for others, but if this seems reasonable I'll go
> through to find any other similar cases.

The same applies to any completion after a MatchAny that ends in any
of the WORD_BREAKS characters (except whitespace and (), which are
handled specially).

#define WORD_BREAKS "\t\n@$><=;|&{() "

IMO a fix should be more principled than just special-casing semicolon
and CREATE TABLE.  Maybe get_previous_words() should stop when it sees
an unquoted semicolon?
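
Something in that direction could be as small as the sketch below. It is only
an illustration of "stop at the last unquoted semicolon"; the real
get_previous_words() would still have to deal with the other WORD_BREAKS
characters and its existing quote handling:

#include <stdbool.h>

/*
 * Illustration only: return a pointer to the text after the last semicolon
 * that is not inside double quotes, i.e. the start of the statement that
 * word collection should be limited to.
 */
static const char *
start_of_current_statement(const char *buf)
{
    const char *start = buf;
    const char *p;
    bool        in_quotes = false;

    for (p = buf; *p != '\0'; p++)
    {
        if (*p == '"')
            in_quotes = !in_quotes;     /* a doubled "" toggles twice, net no change */
        else if (*p == ';' && !in_quotes)
            start = p + 1;
    }
    return start;
}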

- ilmari




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-20 Thread Andres Freund
On 2021-09-19 18:45:45 +0300, Alexander Korotkov wrote:
> Any objections to pushing this?

lgtm

I assume you're planning to backpatch this?




Re: .ready and .done files considered harmful

2021-09-20 Thread Robert Haas
On Thu, Sep 16, 2021 at 7:26 PM Bossart, Nathan  wrote:
>   1. I've removed several calls to PgArchForceDirScan() in favor of
>  calling it at the top of pgarch_ArchiverCopyLoop().  I believe
>  there is some disagreement about this change, but I don't think
>  we gain enough to justify the complexity.  The main reason we
>  exit pgarch_ArchiverCopyLoop() should ordinarily be that we've
>  run out of files to archive, so incurring a directory scan the
>  next time it is called doesn't seem like it would normally be too
>  bad.  I'm sure there are exceptions (e.g., lots of .done files,
>  archive failures), but the patch is still not making things any
>  worse than they presently are for these cases.

I was thinking that this might increase the number of directory scans
by a pretty large amount when we repeatedly catch up, then 1 new file
gets added, then we catch up, etc.

But I guess your thought process is that such directory scans, even if
they happen many times per second, can't really be that expensive,
since the directory can't have much in it. Which seems like a fair
point. I wonder if there are any situations in which there's not much
to archive but the archive_status directory still contains tons of
files.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: .ready and .done files considered harmful

2021-09-20 Thread Alvaro Herrera
On 2021-Sep-20, Robert Haas wrote:

> I was thinking that this might increase the number of directory scans
> by a pretty large amount when we repeatedly catch up, then 1 new file
> gets added, then we catch up, etc.

I was going to say that perhaps we can avoid repeated scans by having a
bitmap of future files that were found by a scan: if we need to do one
scan, we keep track of the presence of the next (say) 64 files on our
timeline, and then we only have to do another scan when we need to
archive a file that wasn't present the last time we scanned (see the
sketch at the end of this message).  However:

> But I guess your thought process is that such directory scans, even if
> they happen many times per second, can't really be that expensive,
> since the directory can't have much in it. Which seems like a fair
> point. I wonder if there are any situations in which there's not much
> to archive but the archive_status directory still contains tons of
> files.

(If we take this stance, which seems reasonable to me, then we don't
need to optimize.)  But perhaps we should complain if we find extraneous
files in archive_status -- then it'd be on the users' heads not to leave
tons of files around that would slow down the scan.
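
For what it's worth, the 64-file bitmap idea sketched above could be as small
as the following (purely illustrative; the names and layout are not from any
patch):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch of remembering which of the next 64 segments had a
 * .ready file at the last directory scan.
 */
typedef struct ReadySegWindow
{
    uint64_t    base_segno;     /* segment number represented by bit 0 */
    uint64_t    present;        /* bit i set => segment base_segno + i seen */
} ReadySegWindow;

static bool
ready_window_contains(const ReadySegWindow *w, uint64_t segno)
{
    if (segno < w->base_segno || segno >= w->base_segno + 64)
        return false;           /* outside the remembered window: re-scan */
    return (w->present & (UINT64_C(1) << (segno - w->base_segno))) != 0;
}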

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)




Re: psql: tab completion differs on semicolon placement

2021-09-20 Thread Daniel Gustafsson
> On 20 Sep 2021, at 21:26, Dagfinn Ilmari Mannsåker  wrote:

> IMO a fix should be more principled than just special-casing semicolon
> and CREATE TABLE.  Maybe get_previous_words() should stop when it sees
> an unquoted semicolon?

Agreed, something along those lines makes sense.  I will familiarize myself
with this file (which until today has been a blank spot) and will see what I
can come up with.

--
Daniel Gustafsson   https://vmware.com/





Re: Parallel Full Hash Join

2021-09-20 Thread Jaime Casanova
On Fri, Jul 30, 2021 at 04:34:34PM -0400, Melanie Plageman wrote:
> On Sat, Jul 10, 2021 at 9:13 AM vignesh C  wrote:
> >
> > On Mon, May 31, 2021 at 10:47 AM Greg Nancarrow  wrote:
> > >
> > > On Sat, Mar 6, 2021 at 12:31 PM Thomas Munro  
> > > wrote:
> > > >
> > > > On Tue, Mar 2, 2021 at 11:27 PM Thomas Munro  
> > > > wrote:
> > > > > On Fri, Feb 12, 2021 at 11:02 AM Melanie Plageman
> > > > >  wrote:
> > > > > > I just attached the diff.
> > > > >
> > > > > Squashed into one patch for the cfbot to chew on, with a few minor
> > > > > adjustments to a few comments.
> > > >
> > > > I did some more minor tidying of comments and naming.  It's been on my
> > > > to-do-list to update some phase names after commit 3048898e, and while
> > > > doing that I couldn't resist the opportunity to change DONE to FREE,
> > > > which somehow hurts my brain less, and makes much more obvious sense
> > > > after the bugfix in CF #3031 that splits DONE into two separate
> > > > phases.  It also pairs obviously with ALLOCATE.  I include a copy of
> > > > that bugix here too as 0001, because I'll likely commit that first, so


Hi Thomas,

Do you intend to commit 0001 soon? Especially if this applies to 14, it
should be committed in the next few days.

> > > > I rebased the stack of patches that way.  0002 includes the renaming I
> > > > propose (master only).  Then 0003 is Melanie's patch, using the name
> > > > SCAN for the new match bit scan phase.  I've attached an updated
> > > > version of my "phase diagram" finger painting, to show how it looks
> > > > with these three patches.  "scan*" is new.
> > >

0002: my only concern is that this will cause unnecessary pain when
backpatching future code... but since I won't be doing that myself, I
will leave that to the experts.

0003: I'm testing this now, not at a big scale, just trying to find
problems.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: psql: tab completion differs on semicolon placement

2021-09-20 Thread David Fetter
On Mon, Sep 20, 2021 at 08:26:51PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Daniel Gustafsson  writes:
> 
> > While testing a patch I fat-fingered a CREATE DATABASE statement by tab
> > completing *after* the semicolon, with no space between the objname and
> > semicolon.  The below options were presented, which at this point aren't 
> > really
> > applicable:
> >
> > db=# create database foo;
> > ALLOW_CONNECTIONS ENCODING  LC_COLLATELOCALE
> > TABLESPACE
> > CONNECTION LIMIT  IS_TEMPLATE   LC_CTYPE  OWNER 
> > TEMPLATE
> >
> > DROP DATABASE has a similar tab completion which makes about as much sense:
> >
> > db=# drop database foo;WITH (
> >
> > Checking prev_wd for not ending with ';' as per the attached makes 
> > "objname;"
> > behave like "objname ;".  Is there a reason for not doing that which I'm
> > missing?  I didn't check for others, but if this seems reasonable I'll go
> > through to find any other similar cases.
> 
> The same applies to any completion after a MatchAny that ends in a any
> of the WORD_BREAKS characters (except whitespace and () which are
> handled specially).
> 
> #define WORD_BREAKS "\t\n@$><=;|&{() "
> 
> IMO a fix should be more principled than just special-casing semicolon
> and CREATE TABLE.  Maybe get_previous_words() should stop when it sees
> an unquoted semicolon?

Is there some reason get_previous_words() shouldn't stop for
everything that's WORD_BREAKS? If not, making that the test might make the
general rule a little simpler to write, and if WORD_BREAKS ever
changed, for example to include all space, or all breaking space, or
similar, the consequences would at least not propagate through
seemingly unrelated code.

At the moment, get_previous_words() does look for everything in
WORD_BREAKS, accounts for double quotes ("), and then does something
clever to account for the quoting behavior that doubling them ("")
accomplishes. Anyhow, that looks like it should work in this case, but
clearly it does not.

Would it be less error prone to do these checks and maybe push or pop
one or more stacks holding state as each character came in? I suspect
the overhead would be unnoticeable even on the slowest* client.

Best,
David.

* One possible exception would be a gigantic paste, a case where psql
  can be prevented from attempting tab completion, although the
  prevention measures involve a pretty obscure terminal setting:
  https://cirw.in/blog/bracketed-paste

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-20 Thread Alexander Korotkov
On Mon, Sep 20, 2021 at 10:48 PM Andres Freund  wrote:
> On 2021-09-19 18:45:45 +0300, Alexander Korotkov wrote:
> > Any objections to pushing this?
>
> lgtm

Thanks!

> I assume you're planning to backpatch this?

Yes.

--
Regards,
Alexander Korotkov




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-20 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Sep 20, 2021 at 10:48 PM Andres Freund  wrote:
>> I assume you're planning to backpatch this?

> Yes.

Probably good to wait 24 hours until 14rc1 has been tagged.

regards, tom lane




Re: Deduplicate code updating ControleFile's DBState.

2021-09-20 Thread Bossart, Nathan
On 9/19/21, 11:07 PM, "Amul Sul"  wrote:
> +1, since skipping ControlFileLock for the DBState update is not the
> right thing, let's have two different functions as per your suggestion
> -- did the same in the attached version, thanks.

I see that the attached patch reorders the call to UpdateControlFile()
to before SharedRecoveryState is updated, which seems to go against
the intent of ebdf5bf.  I'm not sure if this really creates that much
of a problem in practice, but it is a behavior change.

Also, I still think it might be better to include this patch in the
patch set where the exported function is needed.  On its own, this is
a very small amount of refactoring that might not be totally
necessary.

> I have one additional concern about the way we update the control
> file, at every place where doing the update, we need to set control
> file update time explicitly, why can't the time update line be moved
> to UpdateControlFile() so that time gets automatically updated?

I see a few places where UpdateControlFile() is called without
updating ControlFile->time.  I haven't found any obvious reason for
that, so perhaps it would be okay to move it to update_controlfile().

Nathan



Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-09-20 Thread Masahiko Sawada
On Wed, Sep 15, 2021 at 4:58 PM Daniel Gustafsson  wrote:
>
> > On 14 Sep 2021, at 14:35, Daniel Gustafsson  wrote:
> >
> >> On 14 Sep 2021, at 11:57, Amit Kapila  wrote:
> >
> >> LGTM as well. Peter E., Daniel, does any one of you is intending to
> >> push this? If not, I can take care of this.
> >
> > No worries, I can pick it up.

Sorry for the late reply. I was on vacation.

> And done, thanks!

Thanks!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Support for NSS as a libpq TLS backend

2021-09-20 Thread Jacob Champion
On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote:
> > On 19 Jul 2021, at 21:33, Jacob Champion  wrote:
> > ..client connections will crash if
> > hostaddr is provided rather than host, because SSL_SetURL can't handle
> > a NULL argument. I'm running with 0002 to fix it for the moment, but
> > I'm not sure yet if it does the right thing for IP addresses, which the
> > OpenSSL side has a special case for.
> 
> AFAICT the idea is to handle it in the cert auth callback, so I've added some
> PoC code to check for sslsni there and updated the TODO comment to reflect
> that.

I dug a bit deeper into the SNI stuff:

> + server_hostname = SSL_RevealURL(conn->pr_fd);
> + if (!server_hostname || server_hostname[0] == '\0')
> + {
> + /* If SNI is enabled we must have a hostname set */
> + if (conn->sslsni && conn->sslsni[0])
> + status = SECFailure;

conn->sslsni can be explicitly set to "0" to disable it, so this should
probably be changed to a check for "1", but I'm not sure that would be
correct either. If the user has the default sslsni="1" and supplies an
IP address for the host parameter, I don't think we should fail the
connection.

> + if (host && host[0] &&
> + !(strspn(host, "0123456789.") == strlen(host) ||
> +   strchr(host, ':')))
> + SSL_SetURL(conn->pr_fd, host);

It looks like NSS may already have some code that prevents SNI from
being sent for IP addresses, so that part of the guard might not be
necessary. (And potentially counterproductive, because it looks like
NSS can perform verification against the certificate's SANs if you pass
an IP address to SSL_SetURL().)

Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
backend can handle those. That's a separate conversation, but I might
take a look at a patch for next commitfest.

--Jacob


Re: Estimating HugePages Requirements?

2021-09-20 Thread Bossart, Nathan
Should we also initialize the shared memory GUCs in bootstrap and
single-user mode?  I think I missed this in bd17880.

Nathan

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..4c4cf44871 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -324,6 +324,12 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)

 InitializeMaxBackends();

+/*
+ * Initialize runtime-computed GUCs that depend on the amount of shared
+ * memory required.
+ */
+InitializeShmemGUCs();
+
 CreateSharedMemoryAndSemaphores();

 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0775abe35d..cae0b079b9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3978,6 +3978,12 @@ PostgresSingleUserMain(int argc, char *argv[],
 /* Initialize MaxBackends */
 InitializeMaxBackends();

+/*
+ * Initialize runtime-computed GUCs that depend on the amount of shared
+ * memory required.
+ */
+InitializeShmemGUCs();
+
 CreateSharedMemoryAndSemaphores();

 /*



Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-20 Thread Alexander Korotkov
On Tue, Sep 21, 2021 at 2:07 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Mon, Sep 20, 2021 at 10:48 PM Andres Freund  wrote:
> >> I assume you're planning to backpatch this?
>
> > Yes.
>
> Probably good to wait 24 hours until 14rc1 has been tagged.

OK, NP!

--
Regards,
Alexander Korotkov




Re: mem context is not reset between extended stats

2021-09-20 Thread Tomas Vondra


On 9/15/21 10:09 PM, Justin Pryzby wrote:

> Memory allocation appeared be O(1) WRT the number of statistics objects, which
> was not expected to me.  This is true in v13 (and probably back to v10).
> 
> It seems to work fine to reset the memory context within the loop, so long as
> the statslist is allocated in the parent context.



Yeah, and I agree this fix seems reasonable. Thanks for looking!

In principle we don't expect too many extended statistics on a single 
table, but building a single statistics may use quite a bit of memory, 
so it makes sense to release it early ...


But while playing with this a bit more, I discovered a much worse issue. 
Consider this:


  create table t (a text, b text, c text, d text,
  e text, f text, g text, h text);

  insert into t select x, x, x, x, x, x, x, x from (
select md5(mod(i,100)::text) as x
  from generate_series(1,3) s(i)) foo;


  create statistics s (dependencies) on a, b, c, d, e, f, g, h from t;

  analyze t;

This ends up eating insane amounts of memory - on my laptop it eats 
~2.5GB and then crashes with OOM. This happens because each call to 
dependency_degree does build_sorted_items, which detoasts the values. 
And resetting the context can't fix that, because this happens while 
building a single statistics object.
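
If you want to see where the memory actually goes while the ANALYZE is 
running, one option on 14+ is to ask that backend to dump its memory 
contexts into the server log from another session, e.g. something like 
this (the query filter is just an example, and it requires a superuser):

  SELECT pg_log_backend_memory_contexts(pid)
    FROM pg_stat_activity
   WHERE query ILIKE 'analyze%';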


IMHO the right fix is to run dependency_degree in a separate context, 
and reset it after each dependency. This releases the detoasted values, 
which are otherwise hard to deal with.


This does not mean we should not do what your patch does too. That does 
address various other "leaks" (for example MCV calls build_sorted_items 
too, but only once so it does not have this same issue).


These issues exist pretty much since PG10, which is where extended stats 
were introduced, so we'll have to backpatch it. But there's no rush and 
I don't want to interfere with rc1 at the moment.


Attached are two patches - 0001 is your patch (seems fine, but I looked 
only very briefly) and 0002 is the context reset I proposed.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 204f4602b218ec13ac1e3fa501a7f94adc8a4ea1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 21 Sep 2021 01:14:11 +0200
Subject: [PATCH 1/3] reset context

---
 src/backend/statistics/extended_stats.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 2e55913bc8..5c18da02c7 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -125,14 +125,14 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	if (!natts)
 		return;
 
+	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
+	statslist = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 "BuildRelationExtStatistics",
 ALLOCSET_DEFAULT_SIZES);
 	oldcxt = MemoryContextSwitchTo(cxt);
 
-	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
-	statslist = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
-
 	/* report this phase */
 	if (statslist != NIL)
 	{
@@ -234,14 +234,12 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_COMPUTED,
 	 ++ext_cnt);
 
-		/* free the build data (allocated as a single chunk) */
-		pfree(data);
+		MemoryContextReset(cxt);
 	}
 
-	table_close(pg_stext, RowExclusiveLock);
-
 	MemoryContextSwitchTo(oldcxt);
 	MemoryContextDelete(cxt);
+	table_close(pg_stext, RowExclusiveLock);
 }
 
 /*
-- 
2.31.1

From cd0a3398fd6cb586d373bba3d988e564ee7e8c5a Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 21 Sep 2021 01:13:11 +0200
Subject: [PATCH 2/3] build dependencies in context

---
 src/backend/statistics/dependencies.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index d703e9b9ba..4efba06fd6 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -30,6 +30,7 @@
 #include "utils/fmgroids.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/selfuncs.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
@@ -325,12 +326,6 @@ dependency_degree(StatsBuildData *data, int k, AttrNumber *dependency)
 		group_size++;
 	}
 
-	if (items)
-		pfree(items);
-
-	pfree(mss);
-	pfree(attnums_dep);
-
 	/* Compute the 'degree of validity' as (supporting/total). */
 	return (n_supporting_rows * 1.0 / data->numrows);
 }
@@ -359,9 +354,14 @@ statext_dependencies_build(StatsBuildData *data)
 
 	/* result */
 	MVDependencies *dependencies = NULL;
+	MemoryContext	cxt;
 
 	Assert(data->nattnums >= 2);
 
+	cxt = AllocSetContextCreate(Current

PostgreSQL High Precision Support Extension.

2021-09-20 Thread A Z
I have been trying to get a reply or interest in either updating PostgreSQL to 
support the following, or in a public, free-for-any-use extension that would 
support it. Could someone able and interested please respond to the following 
project specification, which I am very keen to see happen:

###
# High Precision Numeric and Elementary Functions Support. #
###

-Integer (HPZ) Z, or Rational Decimal Q (HPQ) numbers support.

-A library like GMP, written in C, is an appropriate basis to start from and to 
include, for all OS platforms involved.

-Real-number support covers both recurring rational numbers and recurring 
irrational numbers.  Either can be truncated to a chosen precision to obtain 
an approximating value, which is a finite rational value, possibly with both 
an integer and a decimal part.  Values may be positive, negative, or zero, 
anywhere on the standard number line.

-Accuracy of forward and inverse operations, despite truncation, can be 
maintained by storing and normalising the expression behind a value (or by 
just including pointers to the value(s)) and displaying that value. Such a 
system can uphold any precision, certainly within a very large range limit.

-A default number of significant figures (precision), held in a per-connection 
field in memory and applied as a filter to all relevant HPZ and HPQ numbers. 
For example, 20 significant figures would be a sensible default to start with.

-A function that varies the precision filter for every HPZ and HPQ number at 
once.

-Value assignment to a typed variable by =.

-Operators.  Base 10 Arithmetic and comparisons support on Base 10 Integer and 
Rational Decimal numbers, and casting:

+,-,*,/,%,^,=,!=,<>,>,<,>=,<=, ::

These include full finite division, integer-only division with no remainder, 
and remainder (modulo) division, plus the defaulting ability for values to be 
automatically cast up to HPZ or HPQ, where specified and appropriate in 
PostgreSQL code.

-Reified support with broader syntax and operations within PostgreSQL, in all 
the obvious and less obvious places.  Tables and related phenomena, HPZ 
arrays, indexing, the window type, the record type, direct compatibility with 
aggregate and window functions, and the RECURSIVE keyword are all parts of a 
larger subset that should interact with HPZ or HPQ.

-Ease-of-installation support, particularly for Windows and Linux: *.exe, *.msi 
or *.rpm, *.deb, *.bin installers on top of a standard PostgreSQL installation, 
with installation and activation instructions included.  That is presuming the 
HPZ and HPQ support is an extension and not added as native, default types in 
the PostgreSQL baseline.

##

-Mathematical and Operational functions support:

precision(BIGINT input)

cast(HPZ as HPQ) returns HPQ;
cast(HPQ as HPZ) returns HPZ;
cast(TEXT as HPZ) returns HPZ;
cast(TEXT as HPQ) returns HPQ;
cast(HPQ as TEXT) returns TEXT;
cast(HPZ as TEXT) returns TEXT;
cast(HPZ as SMALLINT) returns SMALLINT;
cast(SMALLINT as HPZ) returns HPZ;
cast(HPZ as INTEGER) returns INTEGER;
cast(INTEGER as HPZ) returns HPZ;
cast(HPZ as BIGINT) returns BIGINT;
cast(BIGINT as HPZ) returns HPZ;
cast(HPQ as REAL) returns REAL;
cast(REAL as HPQ) returns HPQ
cast(DOUBLE PRECISION as HPQ) returns HPQ;
cast(HPQ as DOUBLE PRECISION) returns DOUBLE PRECISION;
cast(HPQ as DECIMAL) returns DECIMAL;
cast(DECIMAL as HPQ) returns HPQ;
cast(HPQ as NUMERIC) returns NUMERIC;
cast(NUMERIC as HPQ) returns HPQ;

sign(HPQ input) returns HPQ;
abs(HPQ input) returns HPQ;
ceil(HPQ input) returns HPQ;
floor(HPQ input) returns HPQ;
round(HPQ input) returns HPZ;
recip(HPQ input) returns HPQ;
pi() returns HPQ;
e() returns HPQ;
power(HPQ base, HPQ exponent) returns HPQ;
sqrt(HPQ input) returns HPQ
nroot(HPZ theroot, HPQ input) returns HPQ;
log10(HPQ input) returns HPQ;
loge(HPQ input) returns HPQ;
log2(HPQ input) returns HPQ;
factorial(HPZ input) returns HPZ;
nCr(HPZ objects, HPZ selectionSize) returns HPZ
nPr(HPZ objects, HPZ selectionSize) returns HPZ

degrees(HPQ input) returns HPQ;
radians(HPQ input) returns HPQ;
sind(HPQ input) returns HPQ;
cosd(HPQ input) returns HPQ;
tand(HPQ input) returns HPQ;
asind(HPQ input) returns HPQ;
acosd(HPQ input) returns HPQ;
atand(HPQ input) returns HPQ;
sinr(HPQ input) returns HPQ;
cosr(HPQ input) returns HPQ;
tanr(HPQ input) returns HPQ;
asinr(HPQ input) returns HPQ;
acosr(HPQ input) returns HPQ;
atanr(HPQ input) returns HPQ;

##

-Informative articles on all these things exist at:
Comparison Operators: https://en.wikipedia.org/wiki/Relational_operator
Floor and Ceiling Functions: 
https://en.wikipedia.org/wiki/Floor_and_ceilin

Re: mem context is not reset between extended stats

2021-09-20 Thread Justin Pryzby
On Tue, Sep 21, 2021 at 02:15:45AM +0200, Tomas Vondra wrote:
> On 9/15/21 10:09 PM, Justin Pryzby wrote:
> > Memory allocation appeared be O(1) WRT the number of statistics objects, 
> > which
> > was not expected to me.  This is true in v13 (and probably back to v10).

Of course I meant to say that it's O(N) and not O(1) :)

> In principle we don't expect too many extended statistics on a single table,

Yes, but note that expression statistics make it more reasonable to have
multiple extended stats objects.  I noticed this while testing a patch to build
(I think) 7 stats objects on each of our current month's partitions.
Autovacuum was repeatedly killed on this vm after using 2+GB RAM,
probably in part because there were multiple autovacuum workers handling the
most recent batch of inserted tables.
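
For concreteness, each partition had roughly this shape of setup (the table
and expressions here are made up, just to show the pattern):

  CREATE TABLE part_202109 (site int, ts timestamptz, a text, b text, amount numeric);
  CREATE STATISTICS part_202109_s1 ON (date_trunc('day', ts)), site FROM part_202109;
  CREATE STATISTICS part_202109_s2 ON (lower(a)), (lower(b)) FROM part_202109;
  CREATE STATISTICS part_202109_s3 (dependencies) ON site, a, b FROM part_202109;
  ANALYZE part_202109;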

First, I tried to determine what specifically was leaking so badly, and
eventually converged on this patch.  Maybe there are additional subcontexts
which would be useful, but the minimum is to reset between objects.

> These issues exist pretty much since PG10, which is where extended stats
> were introduced, so we'll have to backpatch it. But there's no rush and I
> don't want to interfere with rc1 at the moment.

Ack that.  It'd be *nice* if the fix were included in v14.0, but I don't
know the rules about what can change after rc1.

> Attached are two patches - 0001 is your patch (seems fine, but I looked only
> very briefly) and 0002 is the context reset I proposed.

I noticed there seems to be a 3rd patch available, which might either be junk
for testing or a cool new feature I'll hear about later ;)

> From 204f4602b218ec13ac1e3fa501a7f94adc8a4ea1 Mon Sep 17 00:00:00 2001
> From: Tomas Vondra 
> Date: Tue, 21 Sep 2021 01:14:11 +0200
> Subject: [PATCH 1/3] reset context

cheers,
-- 
Justin




Re: Estimating HugePages Requirements?

2021-09-20 Thread Michael Paquier
On Fri, Sep 17, 2021 at 04:31:44PM +, Bossart, Nathan wrote:
> Done.

Thanks.  I have gone through the last patch this morning, did some
tests on all the platforms I have at hand (including Linux) and
finished by applying it after doing some small tweaks.  First, I
extended GetHugePageSize() to accept NULL for its first argument,
hugepagesize.  A second thing was in the docs, where it is still
useful IMO to keep the references to /proc/meminfo and
/sys/kernel/mm/hugepages to let users know how the system impacts the
calculation of the new GUC.

Let's see what the buildfarm thinks about it.
--
Michael




PostgreSQL 14 press release draft

2021-09-20 Thread Jonathan S. Katz
Hi,

Attached is a copy of the first draft for the PG14 press release.

This brings together highlights of many of the features in the upcoming
PostgreSQL 14 release while providing context on their significance.
With the plethora of new features coming in PostgreSQL 14, it is
challenging to highlight them all, but the idea is to give a glimpse of
what to expect in the new release.

Feedback on the release is welcome. However, please provide your
feedback on the release no later than **Thu, Sep 23, 2021 @ 18:00 UTC**.

Thanks,

Jonathan
The PostgreSQL Global Development Group today announced the release of
[PostgreSQL 14](https://www.postgresql.org/docs/14/release-14.html), the latest
version of the world’s [most advanced open source 
database](https://www.postgresql.org/).

PostgreSQL 14 brings a variety of features that help developers and
administrators to deploy their data-backed applications. PostgreSQL continues to
add innovations on complex data types, including more conveniences for accessing
JSON and support for noncontiguous ranges of data. This latest release also adds
to PostgreSQL's trend on improvements for high performance and distributed
data workloads, with advances in support for connection concurrency, high-write
workloads, query parallelism and logical replication.

_Hold for quote_

[PostgreSQL](https://www.postgresql.org), an innovative data management system
known for its reliability and robustness, benefits from over 25 years of open
source development from a [global developer 
community](https://www.postgresql.org/community/)
and has become the preferred open source relational database for organizations
of all sizes.


### JSON Conveniences and Multiranges

PostgreSQL has supported manipulating 
[JSON](https://www.postgresql.org/docs/14/datatype-json.html)
data since the release of PostgreSQL 9.2, though retrieval of values used a
unique syntax. PostgreSQL 14 now lets you [access JSON data using 
subscripts](https://www.postgresql.org/docs/14/datatype-json.html#JSONB-SUBSCRIPTING),
 e.g. a query like `SELECT ('{ "postgres": { "release": 14 
}}'::jsonb)['postgres']['release'];`
now works. This aligns PostgreSQL with commonly recognized syntax for
retrieving information from JSON data. The subscripting framework added to
PostgreSQL 14 can be generally extended to other nested data structures, and is
also applied to the `hstore` data type in this release.

[Range types](https://www.postgresql.org/docs/14/rangetypes.html), also first
released in PostgreSQL 9.2, now have support for noncontiguous ranges through
the introduction of the 
"[multirange](https://www.postgresql.org/docs/14/rangetypes.html#RANGETYPES-BUILTIN)".
A multirange is an ordered list of ranges that are nonoverlapping, which allows
for developers to write simpler queries for dealing with complex sequences of
ranges. The range types native to PostgreSQL (dates, times, numbers) support
multiranges, and other data types can be extended to use multirange support.

### Performance Improvements for Heavy Workloads

PostgreSQL 14 provides a significant throughput boost on workloads that use many
connections, with some benchmarks showing a 2x speedup. This release continues
on the recent improvements the overall management of B-tree indexes by reducing
index bloat on tables with [frequently updated 
indexes](https://www.postgresql.org/docs/14/btree-implementation.html#BTREE-DELETION).

PostgreSQL 14 introduces the ability to [pipeline 
queries](https://www.postgresql.org/docs/14/libpq-pipeline-mode.html)
to a database, which can significantly improve performance over high latency
connections or for workloads with many small write (`INSERT`/`UPDATE`/`DELETE`)
operations. As this is a client-side feature, you can use pipeline mode with any
modern PostgreSQL database so long as you use the version 14 client.

### Enhancements for Distributed Workloads

Distributed PostgreSQL databases stand to benefit from PostgreSQL 14. When using
[logical 
replication](https://www.postgresql.org/docs/current/logical-replication.html),
PostgreSQL can now stream in-progress transactions to subscribers, with
significant performance benefits for applying large transactions on subscribers.
PostgreSQL 14 also adds several other performance enhancements to the logical
decoding system that powers logical replication.

[Foreign data 
wrappers](https://www.postgresql.org/docs/14/sql-createforeigndatawrapper.html),
used to work with federated workloads across PostgreSQL and other databases, can
now leverage query parallelism in PostgreSQL 14. This release implements this
ability in the 
[`postgres_fdw`](https://www.postgresql.org/docs/14/postgres-fdw.html),
the foreign data wrapper that interfaces with other PostgreSQL databases.

In addition to supporting query parallelism, `postgres_fdw` can now also bulk
insert data on foreign tables and import table partitions with the
[`IMPORT FOREIGN 
SCHEMA`](https://www.postgresql.org/docs/14/sql-importforeignsche

Re: Release 14 Schedule

2021-09-20 Thread Jonathan S. Katz
On 9/20/21 2:33 AM, Nikolay Samokhvalov wrote:
> Observability-related improvements are also good and very important for
> the future of DBA operations -- compute_query_id, new pg_stat_**, etc.
> 
> Things like new knob idle_session_timeout and restore_command change not
> requiring a restart will be very noticeable too.

I agree on the observability enhancements (the PR draft gives a bunch of
coverage on this) and the usefulness on the knobs.

I think this also highlights that there are a lot of helpful features in
PostgreSQL 14 -- it may be tough to distill them all down into a list
for the release notes themselves. I think typically we try pick 5-7
features to highlight, and we're at about 10 or so proposed.

On the flip side and going off-script, do we need to select only a few
features in the release notes? We can let the press release provide the
general highlights and use that as a spring board to pick out particular
features.

Thanks,

Jonathan





Re: Estimating HugePages Requirements?

2021-09-20 Thread Michael Paquier
On Tue, Sep 21, 2021 at 12:08:22AM +, Bossart, Nathan wrote:
> Should we also initialize the shared memory GUCs in bootstrap and
> single-user mode?  I think I missed this in bd17880.

Why would we need that for the bootstrap mode?

While looking at the patch for shared_memory_size, I looked at those
code paths and noted that some of the runtime GUCs would be set thanks
to the load of the control file.  But supporting this case sounded
rather limited to me for --single when it comes to shared memory and
huge page estimation, and we don't load shared_preload_libraries in
this context either, which could lead to wrong estimations.  Anyway, I
am not going to fight hard if people would like that for the --single
mode, even if it may lead to an underestimation of the shmem
allocated.
--
Michael




Re: logical replication restrictions

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 9:47 PM Marcos Pegoraro  wrote:
>
> One thing is needed and is not solved yet is delayed replication on logical 
> replication. Would be interesting to document it on Restrictions page, right ?
>

What do you mean by delayed replication? Is it that by default we send
the transactions at commit?

-- 
With Regards,
Amit Kapila.




Re: PostgreSQL High Precision Support Extension.

2021-09-20 Thread Thomas Munro
On Tue, Sep 21, 2021 at 1:30 PM A Z  wrote:
> -A library like GMP, written in C, is an appropriate basis to start from and 
> to include, for all OS platforms involved.

Are you aware of Daniele Varrazzo's extension
https://github.com/dvarrazzo/pgmp/ ?  (Never looked into it myself,
but this seems like the sort of thing you might be looking for?)




Re: row filtering for logical replication

2021-09-20 Thread Ajin Cherian
On Tue, Sep 21, 2021 at 12:03 AM Dilip Kumar  wrote:
>
> On Mon, Sep 20, 2021 at 5:37 PM Amit Kapila  wrote:
> > > >
> > >
> > > Adding a patch that strives to do the logic that I described above.
> > > For updates, the row filter is applied on both old_tuple
> > > and new_tuple. This patch assumes that the row filter only uses
> > > columns that are part of the REPLICA IDENTITY. (the current patch-set
> > > only
> > > restricts this for row-filters that are delete only)
> > > The old_tuple only has columns that are part of the old_tuple and have
> > > been changed, which is a problem while applying the row-filter. Since
> > > unchanged REPLICA IDENTITY columns
> > > are not present in the old_tuple, this patch creates a temporary
> > > old_tuple by getting such column values from the new_tuple and then
> > > applies the filter on this hand-created temp old_tuple. The way the
> > > old_tuple is created can be better optimised in future versions.
>
> I understand why this is done, but I have 2 concerns here 1) We are
> having extra deform and copying the field from new to old in case it
> is unchanged replica identity.  2) The same unchanged attribute values
> get qualified in the old tuple as well as in the new tuple.  What
> exactly needs to be done is that the only updated field should be
> validated as part of the old as well as the new tuple, the unchanged
> field does not make sense to have redundant validation.   For that we
> will have to change the filter for the old tuple to just validate the
> attributes which are actually modified and remaining unchanged and new
> values will anyway get validated in the new tuple.
>
But what if the filter expression depends on multiple columns, say (a+b) > 100
where a is unchanged while b is changed. Then we will still need both
columns for applying
the filter even though one is unchanged. Also, I am not aware of any
mechanism by which
we can apply a filter expression on individual attributes. The current
mechanism does it
on a tuple. Do let me know if you have any ideas there?

Even if it were done, there would still be the overhead of deforming the tuple.
I will run some performance tests like Amit suggested and see what the
overhead is and
try to minimise it.

regards,
Ajin Cherian
Fujitsu Australia




Re: Added schema level support for publication.

2021-09-20 Thread Greg Nancarrow
On Fri, Sep 17, 2021 at 10:09 PM vignesh C  wrote:
>
> Attached v29 patch has the fixes for the same.
>

Some minor comments on the v29-0002 patch:

(1)
In get_object_address_publication_schema(), the error message:

+ errmsg("publication tables of schema \"%s\" in publication \"%s\"
does not exist",

isn't grammatically correct. It should probably be:

+ errmsg("publication tables of schema \"%s\" in publication \"%s\" do
not exist",

(2)
getPublicationSchemaInfo() function header.
"string" should be "strings"

BEFORE:
+ * nspname. Both pubname and nspname are palloc'd string which will be freed by
AFTER:
+ * nspname. Both pubname and nspname are palloc'd strings which will
be freed by

(3)
getPublicationSchemaInfo()

In the case of "if (!(*nspname))", the following line should probably
be added before returning false:

   *pubname = NULL;

(4)
GetAllSchemasPublicationRelations() function header

Shouldn't:

+ * Gets the list of all relations published by FOR ALL TABLES IN SCHEMA
+ * publication(s).

actually say:

+ * Gets the list of all relations published by a FOR ALL TABLES IN SCHEMA
+ * publication.

since it is for a specified publication?

(5)
I'm wondering, in CheckObjSchemaNotAlreadyInPublication(), instead of
checking "checkobjtype" each loop iteration, wouldn't it be better to
just use the same for-loop in each IF block?


Regards,
Greg Nancarrow
Fujitsu Australia




Re: PostgreSQL 14 press release draft

2021-09-20 Thread Justin Pryzby
On Mon, Sep 20, 2021 at 10:19:32PM -0400, Jonathan S. Katz wrote:

> PostgreSQL 14 provides a significant throughput boost on workloads that use 
> many
> connections, with some benchmarks showing a 2x speedup. This release continues
> on the recent improvements the overall management of B-tree indexes by 
> reducing
> index bloat on tables with [frequently updated 
> indexes](https://www.postgresql.org/docs/14/btree-implementation.html#BTREE-DELETION).

improvements *in* ?

> [Foreign data 
> wrappers](https://www.postgresql.org/docs/14/sql-createforeigndatawrapper.html),
> used to work with federated workloads across PostgreSQL and other databases, 
> can

It'd be clearer to write "used for working".
"Used to work" sounds like it no longer works.

> PostgreSQL 14 extends its performance gains to its 
> [vacuuming](https://www.postgresql.org/docs/14/routine-vacuuming.html)

to *the* vacuuming system ?

> indexes and now allows autovacuum to analyze partitioned tables and propagate
> information to its parents.

This was reverted last month.

> The choice of compression for PostgreSQL's 
> [TOAST](https://www.postgresql.org/docs/14/storage-toast.html)
> system, which is used to store larger data like blocks of text or geometries,
> can [now be 
> configured](https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-DEFAULT-TOAST-COMPRESSION).

Remove "the choice of" ?

> The [extended 
> systems](https://www.postgresql.org/docs/14/planner-stats.html#PLANNER-STATS-EXTENDED)

s/systems/statistics/

> includes many improvements in PostgreSQL 14, including the ability to apply
> extend statistics on expressions. Additionally,

s/extend/extended/

-- 
Justin




Re: row filtering for logical replication

2021-09-20 Thread Dilip Kumar
On Tue, Sep 21, 2021 at 8:58 AM Ajin Cherian  wrote:
> > I understand why this is done, but I have 2 concerns here 1) We are
> > having extra deform and copying the field from new to old in case it
> > is unchanged replica identity.  2) The same unchanged attribute values
> > get qualified in the old tuple as well as in the new tuple.  What
> > exactly needs to be done is that the only updated field should be
> > validated as part of the old as well as the new tuple, the unchanged
> > field does not make sense to have redundant validation.   For that we
> > will have to change the filter for the old tuple to just validate the
> > attributes which are actually modified and remaining unchanged and new
> > values will anyway get validated in the new tuple.
> >
> But what if the filter expression depends on multiple columns, say (a+b) > 100
> where a is unchanged while b is changed. Then we will still need both
> columns for applying

In such a case, we need to.

> the filter even though one is unchanged. Also, I am not aware of any
> mechanism by which
> we can apply a filter expression on individual attributes. The current
> mechanism does it
> on a tuple. Do let me know if you have any ideas there?

What I suggested is to modify the filter for the old tuple, e.g.
filter is (a > 10 and b < 20 and c+d = 20), now only if a and c are
modified then we can process the expression and we can transform this
filter to (a > 10 and c+d=20).

>
> Even if it were done, there would still be the overhead of deforming the 
> tuple.

Suppose filter is just (a > 10 and b < 20) and only if the a is
updated, and if we are able to modify the filter for the oldtuple to
be just (a>10) then also do we need to deform?  Even if we have to we
can save a lot on avoiding duplicate expression evaluation.

> I will run some performance tests like Amit suggested and see what the
> overhead is and
> try to minimise it.

That is good to know.  I think you should try some worst-case
scenarios, e.g. 10 text columns and 1 int column in the REPLICA
IDENTITY, where only the int column gets updated and all the text
columns are unchanged, and the filter references all the columns.
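
To be concrete, something of this shape is what I have in mind (just a
sketch, assuming the row-filter WHERE syntax from this patch set, with
invented names):

  CREATE TABLE t (id int PRIMARY KEY, t1 text, t2 text, t3 text, t4 text,
                  t5 text, t6 text, t7 text, t8 text, t9 text, t10 text);
  ALTER TABLE t REPLICA IDENTITY FULL;  -- all columns in the replica identity
  CREATE PUBLICATION pub FOR TABLE t
      WHERE (id > 0 AND t1 <> '' AND t2 <> '' AND t3 <> '' AND t4 <> ''
             AND t5 <> '' AND t6 <> '' AND t7 <> '' AND t8 <> ''
             AND t9 <> '' AND t10 <> '');
  -- only the int column changes, all the text columns stay the same
  UPDATE t SET id = id + 1;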

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2021-09-20 Thread Masahiko Sawada
Sorry for the late reply. I was on vacation.

On Tue, Sep 14, 2021 at 11:27 AM houzj.f...@fujitsu.com
 wrote:
>
> From Thur, Sep 9, 2021 10:33 PM Masahiko Sawada  wrote:
> > Sorry for the late response. I've attached the updated patches that 
> > incorporate
> > all comments unless I missed something. Please review them.
>
> Thanks for the new version patches.
> Here are some comments for the v13-0001 patch.

Thank you for the comments!

>
> 1)
>
> +   pgstat_setheader(&errmsg.m_hdr, 
> PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
> +   pgstat_send(&errmsg, len);
> +   errmsg.m_nentries = 0;
> +   }
>
> It seems we can invoke pgstat_setheader once before the loop like the
> following:
>
> +   errmsg.m_nentries = 0;
> +   errmsg.m_subid = subent->subid;
> +   pgstat_setheader(&errmsg.m_hdr, 
> PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
>
> 2)
> +   pgstat_setheader(&submsg.m_hdr, 
> PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
> +   pgstat_send(&submsg, len);
>
> Same as 1), we can invoke pgstat_setheader once before the loop like:
> +   submsg.m_nentries = 0;
> +   pgstat_setheader(&submsg.m_hdr, 
> PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
>

But if we do that, we set the header even if there is no message to
send, right? Looking at other similar code in pgstat_vacuum_stat(), we
set the header just before sending the message. So I'd like to leave
them since it's cleaner.

>
> 3)
>
> +/* --
> + * PgStat_MsgSubscriptionErrPurge  Sent by the autovacuum to purge the 
> subscription
> + * 
> errors.
>
> The comments said it's sent by autovacuum, would the manual vacuum also send
> this message ?

Right. Fixed.

>
>
> 4)
> +
> +   pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_reset) + 
> sizeof(bool));
> +}
>
> Does it look cleaner that we use the offset of m_relid here like the 
> following ?
>
> pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_relid));

Thank you for the suggestion. After more thought, it seemed a bit odd
to use PgStat_MsgSubscriptionErr both to report and to reset the stats
by sending part of or the full struct. So in the latest version, I've
added a new message struct type to reset the subscription error
statistics.

I've attached the updated version patches. Please review them.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v14-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch
Description: Binary data


v14-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch
Description: Binary data


v14-0001-Add-pg_stat_subscription_errors-statistics-view.patch
Description: Binary data


Re: Deduplicate code updating ControleFile's DBState.

2021-09-20 Thread Amul Sul
On Tue, Sep 21, 2021 at 4:44 AM Bossart, Nathan  wrote:
>
> On 9/19/21, 11:07 PM, "Amul Sul"  wrote:
> > +1, since skipping ControlFileLock for the DBState update is not the
> > right thing, let's have two different functions as per your suggestion
> > -- did the same in the attached version, thanks.
>
> I see that the attached patch reorders the call to UpdateControlFile()
> to before SharedRecoveryState is updated, which seems to go against
> the intent of ebdf5bf.  I'm not sure if this really creates that much
> of a problem in practice, but it is a behavior change.
>

I gave that some thought and didn't see any problem, and the test
suite also passes, but that doesn't mean the change is perfect -- any
issue might simply be hard to reproduce. Let's see what others think;
for now, to be safe, I have reverted this change.

> Also, I still think it might be better to include this patch in the
> patch set where the exported function is needed.  On its own, this is
> a very small amount of refactoring that might not be totally
> necessary.
>

Well, the other patch set is quite big and complex. In my experience,
people usually avoid downloading big sets due to lack of time, and
such small refactoring patches then don't get much detailed
attention.

Also, even though this patch is small, it is independent and does not
depend on whether the other patch set gets committed or not. Still,
the improvement it proposes may not be a big one, but it is nice to
have.

> > I have one additional concern about the way we update the control
> > file, at every place where doing the update, we need to set control
> > file update time explicitly, why can't the time update line be moved
> > to UpdateControlFile() so that time gets automatically updated?
>
> I see a few places where UpdateControlFile() is called without
> updating ControlFile->time.  I haven't found any obvious reason for
> that, so perhaps it would be okay to move it to update_controlfile().
>

Ok, thanks, did the same in the attached version.

Regards,
Amul Sul
From c78d9abcd4a2856446d4afac240c2fcbdc0a315f Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 21 Sep 2021 00:52:55 -0400
Subject: [PATCH v3] Deduplicate code updating ControleFile's DBState and
 timestamp

---
 src/backend/access/transam/xlog.c | 40 +++
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749da..2d420334fbf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4942,15 +4942,28 @@ ReadControlFile(void)
 }
 
 /*
- * Utility wrapper to update the control file.  Note that the control
- * file gets flushed.
+ * Utility wrapper to update the control file with update timestamp.  Note that
+ * the control file gets flushed.
  */
 void
 UpdateControlFile(void)
 {
+	ControlFile->time = (pg_time_t) time(NULL);
 	update_controlfile(DataDir, ControlFile, true);
 }
 
+/*
+ * Set ControlFile's database state
+ */
+static void
+SetControlFileDBState(DBState state)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->state = state;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
 /*
  * Returns the unique system identifier from control file.
  */
@@ -7149,7 +7162,7 @@ StartupXLOG(void)
 ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
 			}
 		}
-		ControlFile->time = (pg_time_t) time(NULL);
+
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
 		UpdateControlFile();
 
@@ -8086,7 +8099,6 @@ StartupXLOG(void)
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 	ControlFile->state = DB_IN_PRODUCTION;
-	ControlFile->time = (pg_time_t) time(NULL);
 
 	SpinLockAcquire(&XLogCtl->info_lck);
 	XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
@@ -8952,13 +8964,7 @@ CreateCheckPoint(int flags)
 	START_CRIT_SECTION();
 
 	if (shutdown)
-	{
-		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-		ControlFile->state = DB_SHUTDOWNING;
-		ControlFile->time = (pg_time_t) time(NULL);
-		UpdateControlFile();
-		LWLockRelease(ControlFileLock);
-	}
+		SetControlFileDBState(DB_SHUTDOWNING);
 
 	/*
 	 * Let smgr prepare for checkpoint; this has to happen before we determine
@@ -9226,7 +9232,6 @@ CreateCheckPoint(int flags)
 		ControlFile->state = DB_SHUTDOWNED;
 	ControlFile->checkPoint = ProcLastRecPtr;
 	ControlFile->checkPointCopy = checkPoint;
-	ControlFile->time = (pg_time_t) time(NULL);
 	/* crash recovery should always recover to the end of WAL */
 	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 	ControlFile->minRecoveryPointTLI = 0;
@@ -9354,7 +9359,6 @@ CreateEndOfRecoveryRecord(void)
 	 * changes to this point.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-	ControlFile->time = (pg_time_t) time(NULL);
 	ControlFile->minRecoveryPoint = recptr;
 	ControlFile->minRecoveryPointTLI = ThisTimeLineID;
 	UpdateControlFile();
@@ -9507,13 +9511,8 @@ CreateRestartPoint(int flags

Re: row filtering for logical replication

2021-09-20 Thread Amit Kapila
On Tue, Sep 21, 2021 at 9:54 AM Dilip Kumar  wrote:
>
> On Tue, Sep 21, 2021 at 8:58 AM Ajin Cherian  wrote:
> > > I understand why this is done, but I have 2 concerns here 1) We are
> > > having extra deform and copying the field from new to old in case it
> > > is unchanged replica identity.  2) The same unchanged attribute values
> > > get qualified in the old tuple as well as in the new tuple.  What
> > > exactly needs to be done is that the only updated field should be
> > > validated as part of the old as well as the new tuple, the unchanged
> > > field does not make sense to have redundant validation.   For that we
> > > will have to change the filter for the old tuple to just validate the
> > > attributes which are actually modified and remaining unchanged and new
> > > values will anyway get validated in the new tuple.
> > >
> > But what if the filter expression depends on multiple columns, say (a+b) > 
> > 100
> > where a is unchanged while b is changed. Then we will still need both
> > columns for applying
>
> In such a case, we need to.
>
> > the filter even though one is unchanged. Also, I am not aware of any
> > mechanism by which
> > we can apply a filter expression on individual attributes. The current
> > mechanism does it
> > on a tuple. Do let me know if you have any ideas there?
>
> What I suggested is to modify the filter for the old tuple, e.g.
> filter is (a > 10 and b < 20 and c+d = 20), now only if a and c are
> modified then we can process the expression and we can transform this
> filter to (a > 10 and c+d=20).
>

If you have only a and c in the old tuple, how will it evaluate
expression c + d? I think the point is if for some expression some
values are in old tuple and others are in new then the idea proposed
in the patch seems sane. Moreover, I think in your idea for each tuple
we might need to build a new expression and sometimes twice that will
beat the purpose of cache we have kept in the patch and I am not sure
if it is less costly.

See another example where splitting filter might not give desired results:

Say filter expression: (a = 10 and b = 20 and c = 30)

Now, old_tuple has values for columns a and c and say values are 10
and 30. So, the old_tuple will match the filter if we split it as per
your suggestion. Now say new_tuple has values (a = 5, b = 15, c = 25).
In such a situation dividing the filter will give us the result that
the old_tuple is matching but new tuple is not matching which seems
incorrect. I think dividing filter conditions among old and new tuples
might not retain its sanctity.

> >
> > Even if it were done, there would still be the overhead of deforming the 
> > tuple.
>
> Suppose filter is just (a > 10 and b < 20) and only if the a is
> updated, and if we are able to modify the filter for the oldtuple to
> be just (a>10) then also do we need to deform?
>

Without deforming, how will you determine which columns are part of
the old tuple?

-- 
With Regards,
Amit Kapila.




Re: relation OID in ReorderBufferToastReplace error message

2021-09-20 Thread Amit Kapila
On Fri, Sep 17, 2021 at 10:53 AM Amit Kapila  wrote:
>
> I don't think it is a bad idea to print additional information as you
> are suggesting but why only for this error? It could be useful to
> investigate any other error we get during decoding. I think normally
> we add such additional information via error_context. We have recently
> added/enhanced it for apply-workers, see commit [1].
>
> I think here we should just print the relation name in the error
> message you pointed out and then work on adding additional information
> via error context as a separate patch. What do you think?
>

Attached please find the patch which just modifies the current error
message as proposed by you. I am planning to commit it in a day or two
unless there are comments or any other suggestions.

-- 
With Regards,
Amit Kapila.


v1-0001-Add-parent-table-name-in-an-error-in-reorderbuffe.patch
Description: Binary data


Re: row filtering for logical replication

2021-09-20 Thread Dilip Kumar
On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila  wrote:
>

> If you have only a and c in the old tuple, how will it evaluate
> expression c + d?

Well, what I meant is that if we have such a dependency then we will
have to copy that field to the old tuple, e.g. if we convert the
filter for the old tuple from (a > 10 and b < 20 and c+d = 20) to
(a > 10 and c+d=20), then we will not have to copy 'b' to the old
tuple but we still have to copy 'd' because there is a dependency.

> I think the point is if for some expression some
> values are in old tuple and others are in new then the idea proposed
> in the patch seems sane. Moreover, I think in your idea for each tuple
> we might need to build a new expression and sometimes twice that will
> beat the purpose of cache we have kept in the patch and I am not sure
> if it is less costly.

Basically, expression initialization should happen only once in most
cases so with my suggestion you might have to do it twice.  But the
overhead of extra expression evaluation is far less than doing
duplicate evaluation because that will happen for sending each update
operation right?

> See another example where splitting filter might not give desired results:
>
> Say filter expression: (a = 10 and b = 20 and c = 30)
>
> Now, old_tuple has values for columns a and c and say values are 10
> and 30. So, the old_tuple will match the filter if we split it as per
> your suggestion. Now say new_tuple has values (a = 5, b = 15, c = 25).
> In such a situation dividing the filter will give us the result that
> the old_tuple is matching but new tuple is not matching which seems
> incorrect. I think dividing filter conditions among old and new tuples
> might not retain its sanctity.

Yeah, that is a good example of why the filter has to be applied to
both tuples: basically, some filter clauses might not even get
evaluated on the new tuple, as in the above example, and if we have
removed such an expression from the other tuple's filter we might
break something.  Maybe for now this suggests that we cannot avoid the
duplicate execution of the expression.

> > >
> > > Even if it were done, there would still be the overhead of deforming the 
> > > tuple.
> >
> > Suppose filter is just (a > 10 and b < 20) and only if the a is
> > updated, and if we are able to modify the filter for the oldtuple to
> > be just (a>10) then also do we need to deform?
> >
>
> Without deforming, how will you determine which columns are part of
> the old tuple?

Okay, then we might have to deform, but at least are we ensuring that
once we have deformed the tuple for the expression evaluation we
are not doing that again while sending the tuple?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Allow escape in application_name

2021-09-20 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Horiguchi-san,

Based on your advice, I made a patch
that combines the two parsing functions into one.
A new internal function parse_format_string() was added.
(This name may be too generic...)
log_line_prefix() and parse_pgfdw_appname() become just wrapper functions.
My preliminary checks passed for the server and postgres_fdw,
but could you review it?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v17_0002_allow_escapes.patch
Description: v17_0002_allow_escapes.patch


Re: Added schema level support for publication.

2021-09-20 Thread vignesh C
On Tue, Sep 21, 2021 at 9:03 AM Greg Nancarrow  wrote:
>
> On Fri, Sep 17, 2021 at 10:09 PM vignesh C  wrote:
> >
> > Attached v29 patch has the fixes for the same.
> >
>
> Some minor comments on the v29-0002 patch:
>
> (1)
> In get_object_address_publication_schema(), the error message:
>
> + errmsg("publication tables of schema \"%s\" in publication \"%s\"
> does not exist",
>
> isn't grammatically correct. It should probably be:
>
> + errmsg("publication tables of schema \"%s\" in publication \"%s\" do
> not exist",

"does not exist" is used across the file. Should we keep it like that
to maintain consistency. Thoughts?

> (2)
> getPublicationSchemaInfo() function header.
> "string" should be "strings"
>
> BEFORE:
> + * nspname. Both pubname and nspname are palloc'd string which will be freed 
> by
> AFTER:
> + * nspname. Both pubname and nspname are palloc'd strings which will
> be freed by

I will change it in the next version.

> (3)
> getPublicationSchemaInfo()
>
> In the case of "if (!(*nspname))", the following line should probably
> be added before returning false:
>
>*pubname = NULL;

In case of failure we return false and don't access it. I felt we
could keep it as it is. Thoughts?

> (4)
> GetAllSchemasPublicationRelations() function header
>
> Shouldn't:
>
> + * Gets the list of all relations published by FOR ALL TABLES IN SCHEMA
> + * publication(s).
>
> actually say:
>
> + * Gets the list of all relations published by a FOR ALL TABLES IN SCHEMA
> + * publication.
>
> since it is for a specified publication?

I will change it in the next version.

> (5)
> I'm wondering, in CheckObjSchemaNotAlreadyInPublication(), instead of
> checking "checkobjtype" each loop iteration, wouldn't it be better to
> just use the same for-loop in each IF block?

I will be changing it to:
static void
CheckObjSchemaNotAlreadyInPublication(List *rels, List *schemaidlist,
PublicationObjSpecType checkobjtype)
{
  ListCell   *lc;

  foreach(lc, rels)
  {
Relation  rel = (Relation) lfirst(lc);
Oid relSchemaId = RelationGetNamespace(rel);

if (list_member_oid(schemaidlist, relSchemaId))
{
  if (checkobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("cannot add schema \"%s\" to publication",
 get_namespace_name(relSchemaId)),
errdetail("Table \"%s\" in schema \"%s\" is already part
of the publication, adding the same schema is not supported.",
  RelationGetRelationName(rel),
  get_namespace_name(relSchemaId)));
  else if (checkobjtype == PUBLICATIONOBJ_TABLE)
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("cannot add relation \"%s.%s\" to publication",
 get_namespace_name(relSchemaId),
 RelationGetRelationName(rel)),
errdetail("Table's schema \"%s\" is already part of the
publication.",
  get_namespace_name(relSchemaId)));
}
  }
}
After the change checkobjtype will be checked only once in case of error.
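
For reference, the two error paths correspond to cases like these (using the
syntax proposed in this patch set, with example names):

  -- table added first, then its schema: "cannot add schema ... to publication"
  CREATE PUBLICATION pub1 FOR TABLE sch1.tbl1;
  ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;

  -- schema added first, then a table from it: "cannot add relation ... to publication"
  CREATE PUBLICATION pub2 FOR ALL TABLES IN SCHEMA sch1;
  ALTER PUBLICATION pub2 ADD TABLE sch1.tbl1;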

Regards,
Vignesh




Re: Logical replication timeout problem

2021-09-20 Thread Amit Kapila
On Mon, Sep 20, 2021 at 9:43 PM Fabrice Chapuis  wrote:
>
> By passing the autovacuum parameter to off the problem did not occur right 
> after loading the table as in our previous tests. However, the timeout 
> occurred later. We have seen the accumulation of .snap files for several Gb.
>
> ...
> -rw---. 1 postgres postgres 16791226 Sep 20 15:26 xid-1238444701-lsn-2D2B-F500.snap
> -rw---. 1 postgres postgres 16973268 Sep 20 15:26 xid-1238444701-lsn-2D2B-F600.snap
> -rw---. 1 postgres postgres 16790984 Sep 20 15:26 xid-1238444701-lsn-2D2B-F700.snap
> -rw---. 1 postgres postgres 16988112 Sep 20 15:26 xid-1238444701-lsn-2D2B-F800.snap
> -rw---. 1 postgres postgres 16864593 Sep 20 15:26 xid-1238444701-lsn-2D2B-F900.snap
> -rw---. 1 postgres postgres 16902167 Sep 20 15:26 xid-1238444701-lsn-2D2B-FA00.snap
> -rw---. 1 postgres postgres 16914638 Sep 20 15:26 xid-1238444701-lsn-2D2B-FB00.snap
> -rw---. 1 postgres postgres 16782471 Sep 20 15:26 xid-1238444701-lsn-2D2B-FC00.snap
> -rw---. 1 postgres postgres 16963667 Sep 20 15:27 xid-1238444701-lsn-2D2B-FD00.snap
> ...
>

Okay, still not sure why the publisher is not sending keep_alive
messages in between spilling such a big transaction. If you see, we
have logic in WalSndLoop() wherein each time after sending data we
check whether we need to send a keep-alive message via function
WalSndKeepaliveIfNecessary(). I think to debug this problem further
you need to add some logs in function WalSndKeepaliveIfNecessary() to
see why it is not sending keep_alive messages when all these files are
being created.

Did you change the default value of
wal_sender_timeout/wal_receiver_timeout? What is the value of those
variables in your environment? Did you see the message "terminating
walsender process due to replication timeout" in your server logs?
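
To confirm, something like this on both the publisher and the subscriber
would show the effective values (just as an example):

  SELECT name, setting, source
    FROM pg_settings
   WHERE name IN ('wal_sender_timeout', 'wal_receiver_timeout');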

-- 
With Regards,
Amit Kapila.




Re: Release 14 Schedule

2021-09-20 Thread Gavin Flower



On 21/09/21 14:23, Jonathan S. Katz wrote:

> On 9/20/21 2:33 AM, Nikolay Samokhvalov wrote:
>
>> Observability-related improvements are also good and very important for
>> the future of DBA operations -- compute_query_id, new pg_stat_**, etc.
>>
>> Things like new knob idle_session_timeout and restore_command change not
>> requiring a restart will be very noticeable too.
>
> I agree on the observability enhancements (the PR draft gives a bunch of
> coverage on this) and the usefulness on the knobs.
>
> I think this also highlights that there are a lot of helpful features in
> PostgreSQL 14 -- it may be tough to distill them all down into a list
> for the release notes themselves. I think typically we try pick 5-7
> features to highlight, and we're at about 10 or so proposed.
>
> On the flip side and going off-script, do we need to select only a few
> features in the release notes? We can let the press release provide the
> general highlights and use that as a spring board to pick out particular
> features.
>
> Thanks,
>
> Jonathan

I suggest that if there are 7 or more, then possibly you should group 
them under 2 or 3 headings.


That way it will not look quite so intimidating, and people have a 
framework to give them perspective.  Also makes it easier for people to 
focus on the highlights that they might consider the most important to 
themselves.



Cheers,
Gavin




Re: proposal: possibility to read dumped table's name from file

2021-09-20 Thread Pavel Stehule
On Mon, Sep 20, 2021 at 14:10, Daniel Gustafsson wrote:

> > Will do a closer review on the patch shortly.
>
> Had a read through, and tested, the latest posted version today:
>
> +Read objects filters from the specified file. Specify "-" to read from
> +stdin. Lines of this file must have the following format:
> I think this should be - and STDIN
> to
> match the rest of the docs.
>
>
> +   
> +With the following filter file, the dump would include table
> +mytable1 and data from foreign tables of
> +some_foreign_server foreign server, but
> exclude data
> +from table mytable2.
> +
> +include table mytable1
> +include foreign_data some_foreign_server
> +exclude table mytable2
> +
> +   
> This example is highlighting the issue I've previously raised with the
> UX/doc
> of this feature.  The "exclude table mytable2" is totally pointless in the
> above since the exact match of "mytable1" will remove all other objects.
> What
> we should be doing instead is use the pattern matching aspect along the
> lines
> of the below:
>
> include table mytable*
> exclude table mytable2
>
> +The --filter option works just like the other
> +options to include or exclude tables, schemas, table data, or
> foreign
> This should refer to the actual options by name to make it clear which we
> are
> talking about.
>

fixed


>
> +   printf(_("  --filter=FILENAMEdump objects and data
> based on the filter expressions\n"
> +"   from the filter
> file\n"));
> Before we settle on --filter I think we need to conclude whether this file
> is
> intended to be included from a config file, or used on it's own.  If we
> gow tih
> the former then we might not want a separate option for just --filter.
>

I prefer to keep the two files separate. Although there is some
intersection, I think it is good to have two simple separate files for two
really different tasks.
This one does filtering, and it should be controlled by the "--filter"
option. If the implementation changes, this option can be changed too.
Filtering is just a pg_dump-related feature, while revising the client
application configuration is a much more generic task, and if we mix the two
into one we can end up in a trap. It can be hard to find one good format for
both large script-generated content and possibly hand-written structured
content. For practical reasons it can be good to have two files too; filters
and configurations can have different life cycles.


>
> +if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the
> pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.
> The is
> not a hot-path, we can afford the string comparison in case of errors.
> Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
>

I agree that this is not a hot path; I just don't feel comfortable building
a zero-terminated string only for a pg_strcasecmp comparison. The current
design reduces malloc/free cycles, and the same approach is used in other
places where Postgres parses strings - the SQL parser, the plpgsql parser. I
am not sure about the benefits and costs - pg_strcasecmp can be more
readable, but for every keyword I would have to call pstrdup and pfree. Is
that necessary? My opinion here is not too strong - it is a minor issue, and
maybe I just weigh the benefits and costs a little differently in this
specific case - so if you really think the benefits of rewriting are higher,
I'll do it.


>
> +initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of
> pg_dump
> does?
>

The StringInfo is used because I use the pg_get_line_buf function, and this
function uses this API.


>
> +typedef struct
> I think these should be at the top of the file with the other typedefs.
>
>
done



>
> When testing strange object names, I was unable to express this name in
> the filter file:
>
> $ ./bin/psql
> psql (15devel)
> Type "help" for help.
>
> danielg=# create table "
> danielg"# t
> danielg"# t
> danielg"# " (a integer);
> CREATE TABLE
> danielg=# select relname from pg_class order by oid desc limit 1;
>  relname
> -
> +
>  t  +
>  t  +
>
> (1 row)
>
>
Good catch - the pg_strip_crlf() call was badly placed; fixed, and the
regression tests enhanced.

Please check the attached patch.

Regards

Pavel





> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..d692f4230e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,55 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Read objects filters from the specified file. Specify - to read from
+STDIN. Lines of this file must have the