RE: make MaxBackends available in _PG_init

2021-08-15 Thread wangsh.f...@fujitsu.com
Hi,

I noticed that in v3-0001-Disallow-external-access-to-MaxBackends.patch, there 
are some modifications like:

> - for (int i = 0; i <= MaxBackends; i++)
> + for (int i = 0; i <= GetMaxBackends(); i++)

I don't think calling GetMaxBackends() in the for-loop condition is a good idea.

How about using a temporary variable to save the return value of
GetMaxBackends()?
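
For illustration, the hoisted form might look like this (just a sketch;
the variable name is ours, not taken from the patch):

	int		max_backends = GetMaxBackends();

	/* the return value cannot change mid-loop, so fetch it once */
	for (int i = 0; i <= max_backends; i++)
	{
		/* ... per-backend work, unchanged ... */
	}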

Regards,
Shenhao Wang


Get table total page quantity and cached page quantity

2021-08-15 Thread otar shavadze
I created the pg_buffercache extension:
CREATE EXTENSION pg_buffercache;

restarted the server and updated statistics for the table:
VACUUM ANALYZE public.my_table;

then tried to pull the table into the buffer cache:
SELECT * FROM public.my_table;

and then tried to get the table's total page count and how many of its
pages are cached in the buffer:
SELECT
(SELECT relpages FROM pg_class where oid = 'public.my_table'::regclass::oid
) AS table_pages_quantity_total,
(SELECT COUNT(DISTINCT relblocknumber) FROM pg_buffercache WHERE
relfilenode = (
SELECT relfilenode FROM pg_class WHERE oid =
'public.my_table'::regclass::oid -- (SELECT rel_oid FROM my_cte)
) ) AS table_pages_quantity_in_cache;


This shows that the table has only one page, while the second column shows
3 unique pages in the buffer cache.  It seems I'm measuring these numbers
incorrectly(?).  Can you please help: which column is incorrect (or maybe
both)?


Re: CI/windows docker vs "am a service" autodetection on windows

2021-08-15 Thread Ranier Vilela
Em sex., 13 de ago. de 2021 às 10:03, Andres Freund 
escreveu:

> Hi,
>
> Magnus, Michael, Anyone - I'd appreciate a look.
>
> On 2021-03-05 12:55:37 -0800, Andres Freund wrote:
> > > After fighting with a windows VM for a bit (ugh), it turns out that
> yes,
> > > there is stderr, but that fileno(stderr) returns -2, and
> > > GetStdHandle(STD_ERROR_HANDLE) returns NULL (not INVALID_HANDLE_VALUE).
> > >
> > > The complexity however is that while that's true for pg_ctl within
> > > pgwin32_ServiceMain:
> > > checking what stderr=7FF8687DFCB0 is (handle: 0, fileno=-2)
> > > but not for postmaster or backends
> > > WARNING: 01000: checking what stderr=7FF880F5FCB0 is (handle: 92,
> fileno=2)
> > >
> > > which makes sense in a way, because we don't tell CreateProcessAsUser()
> > > that it should pass stdin/out/err down (which then seems to magically
> > > get access to the "topmost" console applications output - damn, this
> > > stuff is weird).
> >
> > That part is not too hard to address - it seems we only need to do that
> > in pg_ctl pgwin32_doRunAsService(). It seems that the
> > stdin/stderr/stdout being set to invalid will then be passed down to
> > postmaster children.
> >
> > https://docs.microsoft.com/en-us/windows/console/getstdhandle
> > "If an application does not have associated standard handles, such as a
> > service running on an interactive desktop, and has not redirected them,
> > the return value is NULL."
> >
> > There does seem to be some difference between what services get as std*
> > - GetStdHandle() returns NULL, and what explicitly passing down invalid
> > handles to postmaster does - GetStdHandle() returns
> > INVALID_HANDLE_VALUE. But passing down NULL rather than
> > INVALID_HANDLE_VALUE to postmaster seems to lead to postmaster
> > re-opening console buffers.
> >
> > Patch attached.
>
> > I'd like to commit something to address this issue to master soon - it
> > allows us to run a lot more tests in cirrus-ci... But probably not
> > backpatch it [yet] - there've not really been field complaints, and who
> > knows if there end up being some unintentional side-effects...
>
> Because it'd allow us to run more tests as part of cfbot and other CI
> efforts, I'd like to push this forward. So I'm planning to commit this
> to master soon-ish, unless somebody wants to take this over? I'm really
> not a windows person...
>
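
To illustrate the quoted idea of passing explicitly-invalid std handles
down from pg_ctl, a minimal sketch (not the committed patch; the token and
command-line variables are assumed):

	STARTUPINFO si;
	PROCESS_INFORMATION pi;

	ZeroMemory(&si, sizeof(si));
	si.cb = sizeof(si);

	/* pass explicitly-invalid handles rather than leaving them unset */
	si.dwFlags = STARTF_USESTDHANDLES;
	si.hStdInput = INVALID_HANDLE_VALUE;
	si.hStdOutput = INVALID_HANDLE_VALUE;
	si.hStdError = INVALID_HANDLE_VALUE;

	CreateProcessAsUser(token, NULL, cmdline, NULL, NULL,
						TRUE /* inherit handles */,
						0, NULL, NULL, &si, &pi);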
Hi Andres,

I found this function on the web, from OpenSSL, but I haven't tested it.
I think there is one more way to test whether a process is running as a
service (SECURITY_INTERACTIVE_RID).

Can you test it on a Windows VM?
If this works, I can elaborate a bit.

Attached.

regards,
Ranier Vilela
int
pgwin32_is_service(void)
{
	static int	_is_service = -1;
	HANDLE		hProcessToken = NULL;
	DWORD		groupLength = 64;
	PTOKEN_GROUPS groupInfo;
	SID_IDENTIFIER_AUTHORITY siaNt = SECURITY_NT_AUTHORITY;
	PSID		InteractiveSid = NULL;
	PSID		ServiceSid = NULL;
	DWORD		i;

	/* Only check the first time */
	if (_is_service != -1)
		return _is_service;

	_is_service = 0;

	groupInfo = (PTOKEN_GROUPS) LocalAlloc(0, groupLength);
	if (groupInfo == NULL)
	{
		fprintf(stderr, "could not allocate token group buffer\n");
		goto clean;
	}

	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &hProcessToken))
	{
		fprintf(stderr, "could not open process token\n");
		goto clean;
	}

	if (!GetTokenInformation(hProcessToken, TokenGroups, groupInfo,
							 groupLength, &groupLength))
	{
		if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
		{
			fprintf(stderr, "could not get token information\n");
			goto clean;
		}

		LocalFree(groupInfo);
		groupInfo = NULL;

		groupInfo = (PTOKEN_GROUPS) LocalAlloc(0, groupLength);
		if (groupInfo == NULL)
		{
			fprintf(stderr, "could not allocate token group buffer\n");
			goto clean;
		}

		if (!GetTokenInformation(hProcessToken, TokenGroups, groupInfo,
								 groupLength, &groupLength))
		{
			fprintf(stderr, "could not get token information\n");
			goto clean;
		}
	}

	//
	// We now know the groups associated with this token.  We want to look
	// to see if the interactive group is active in the token, and if so,
	// we know that this is an interactive process.
	//
	// We also look for the "service" SID, and if it's present, we know
	// we're a service.
	//
	// The service SID will be present iff the service is running in a
	// user account (and was invoked by the service controller).
	//

	if (!AllocateAndInitializeSid(&siaNt, 1, SECURITY_INTERACTIVE_RID, 0, 0,
								  0, 0, 0, 0, 0, &InteractiveSid))
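		/*
		 * The attachment ends above in the archive.  What follows is a
		 * hedged sketch of how this well-known pattern typically
		 * concludes; it is untested and assumed, not the author's code.
		 */
		goto clean;

	if (!AllocateAndInitializeSid(&siaNt, 1, SECURITY_SERVICE_RID, 0, 0,
								  0, 0, 0, 0, 0, &ServiceSid))
		goto clean;

	/* scan the token's groups for the interactive and service SIDs */
	for (i = 0; i < groupInfo->GroupCount; i++)
	{
		if (EqualSid(InteractiveSid, groupInfo->Groups[i].Sid))
			goto clean;			/* interactive process: not a service */
		if (EqualSid(ServiceSid, groupInfo->Groups[i].Sid))
		{
			_is_service = 1;	/* invoked by the service controller */
			goto clean;
		}
	}

	/*
	 * Neither SID found; a service running as LocalSystem would need
	 * additional handling, which this sketch does not attempt.
	 */

clean:
	if (InteractiveSid)
		FreeSid(InteractiveSid);
	if (ServiceSid)
		FreeSid(ServiceSid);
	if (groupInfo)
		LocalFree(groupInfo);
	if (hProcessToken)
		CloseHandle(hProcessToken);

	return _is_service;
}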
   

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-08-15 Thread Andy Fan
Hi:

I have finished the parts for baserel, joinrel, subquery, and distinctrel. I
think the hardest ones have been verified.  Here is the design overview.

1. Use EC instead of expr to cover more UniqueKey cases.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo

List   *eq_classes; /* list of active EquivalenceClasses */
+ List   *unique_exprs; /* List of unique expr */

  bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs.  A unique expr is a set
of EquivalenceClasses.  For example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys:
UniqueKey1 {unique_expr_indexes=bms{0}, multi_nulls=false}
UniqueKey2 {unique_expr_indexes=bms{1}, multi_nulls=false}

The design benefits many multi-table join cases. For instance, in a 10-table
join where each table has a primary key (a, b), we would otherwise have
UniqueKeys like this:

JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4,5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

As more tables are joined, the lists grow longer and longer; building them
consumes both CPU cycles and memory.

With the method above, we can store it as:

root->unique_exprs =  /* each unique expr is stored only once */
[
[t1.a, t1.b], -- ECs are omitted here for brevity.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.

The members of the bitmap are indexes into root->unique_exprs rather than
root->eq_classes, because we need to store the SingleRow case in
root->unique_exprs as well.
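
To illustrate, composing a joined rel's UniqueKey then becomes cheap bitmap
arithmetic.  A sketch under this design (the variable names and the
T_UniqueKey node tag are assumptions, not from the patch):

	UniqueKey  *joinkey = makeNode(UniqueKey);

	/* union the indexes into root->unique_exprs from both sides */
	joinkey->unique_expr_indexes =
		bms_union(outer_ukey->unique_expr_indexes,
				  inner_ukey->unique_expr_indexes);
	joinkey->multi_nulls = outer_ukey->multi_nulls ||
		inner_ukey->multi_nulls;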

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

root->unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}

JoinRel{1, 2} - Bitmapset{0, 1} -- SingleRow will never be expanded into
dedicated exprs.

4. Interesting UniqueKeys, to remove useless UniqueKeys as soon as possible.

The overall idea is similar to PathKey: I distinguish UniqueKeys kept for
the distinct purpose from those kept for the useful_for_merging purpose.

SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like
root->query_pathkeys.

SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- t2's UniqueKey on pk
is useful before t1 is joined to t2, but not after that.

A currently known issue is that I don't pass the "interesting UniqueKey"
info down to subqueries well [2]; I'd like to talk more about this when we
discuss the UniqueKey-on-subquery part.


5. relation_is_distinct_for

Now I define the function as

+ bool
+ relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
  *distinct_pathkey)

It takes a "List *distinct_pathkey" rather than a "List *exprs". The reason
is that a pathkey has an EC in it, and every UniqueKey holds ECs as well;
given that, the subset-style checking is very efficient.  As for the
distinct/group-as-no-op case, we have pathkeys all the time.  The only
drawback is that some clauses are not sortable; in that case
root->distinct_pathkeys and root->group_pathkeys are not set.  However,
that should be rare in practice, so I ignore this part for now.  After all,
I can add a relation_is_distinct_for variant for Exprs; I just have not
implemented it so far.
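
The subset-style check could then be as simple as this sketch (the
rel->uniquekeys list and the pathkey_ec_indexes set are assumed names, not
from the patch):

	ListCell   *lc;

	foreach(lc, rel->uniquekeys)
	{
		UniqueKey  *ukey = (UniqueKey *) lfirst(lc);

		/* distinct if one UniqueKey's exprs are all covered by the ECs */
		if (!ukey->multi_nulls &&
			bms_is_subset(ukey->unique_expr_indexes, pathkey_ec_indexes))
			return true;
	}
	return false;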

6. EC overhead in the UniqueKey & UNION case.

So far I haven't created any new ECs for the UniqueKey case; I just use the
existing ones.  However, I can't handle a case like

SELECT a, b FROM t1
UNION
SELECT x, y FROM t2;

In this case, no EC is created by the existing code, and I don't want to
create ECs just for the UniqueKey case either, so my plan is simply not to
handle the UNION case.

Since this needs a big effort from reviewers, I have split the patch into
many smaller chunks.

Patch 1 / Patch 2:  Split out some code which UniqueKey uses but which is
not closely interrelated, to reduce the core patch size.
Patch 3:  Still the notnull stuff.  This one doesn't play a big role
overall; even if the design changes in the end, we would only need to
modify some small things.  IMO it is not a blocker for moving on.
Patch 4:  Support UniqueKey for baserels.
Patch 5:  Support UniqueKey for joinrels.
Patch 6:  Support UniqueKey for some upper relations, like the distinct rel
and the group-by rel.

I'd suggest moving on like this:
1. Do an overall review to see whether the patch has any outstanding issues.
2. If not, review and commit patch 1 & patch 2 to reduce the patch size.
3. Decide which method t

Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619

2021-08-15 Thread Justin Pryzby
On Sun, May 16, 2021 at 04:23:02PM -0400, Tom Lane wrote:
> 1. Fix FullXidRelativeTo to be a little less trusting.  It'd
> probably be sane to make it return FirstNormalTransactionId
> when it'd otherwise produce a wrapped-around FullXid, but is
> there any situation where we'd want it to throw an error instead?
> 
> 2. Change pg_resetwal to not do the above.  It's not entirely
> apparent to me what business it has trying to force
> autovacuum-for-wraparound anyway, but if it does need to do that,
> can we devise a less klugy method?
> 
> It also seems like some assertions in procarray.c would be a
> good idea.  With the attached patch, we get through core
> regression just fine, but the pg_upgrade test fails immediately
> after the "Resetting WAL archives" step.

#2 is done as of 74cf7d46a.

Is there a plan to include Tom's procarray assertions?

-- 
Justin




Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-08-15 Thread Zhihong Yu
On Sun, Aug 15, 2021 at 7:33 AM Andy Fan  wrote:

> Hi:
>
> I have finished the parts for baserel, joinrel, subquery, distinctrel. I
> think
> the hardest ones have been verified.  Here is the design overview.
>
> 1. Use EC instead of expr to cover more UniqueKey cases.
> 2. Redesign the UniqueKey as below:
>
> @@ -246,6 +246,7 @@ struct PlannerInfo
>
> List   *eq_classes; /* list of active EquivalenceClasses */
> + List   *unique_exprs; /* List of unique expr */
>
>   bool ec_merging_done; /* set true once ECs are canonical */
>
> +typedef struct UniqueKey
> +{
> + NodeTag type;
> + Bitmapset *unique_expr_indexes;
> + bool multi_nulls;
> +} UniqueKey;
> +
>
> PlannerInfo.unique_exprs is a List of unique exprs.  Unique Exprs is a set
> of
> EquivalenceClass. for example:
>
> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary
> key);
> CREATE UNIQUE INDEX ON t1(a, b);
>
> SELECT DISTINCT * FROM T1 WHERE a = c;
>
> Then we would have PlannerInfo.unique_exprs as below
> [
> [EC(a, c), EC(b)],
> [EC(pk)]
> ]
>
> RelOptInfo(t1) would have 2 UniqueKeys.
> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]
>
> The design will benefit many table joins cases. For instance a 10- tables
> join,
> each table has a primary key (a, b).  Then we would have a UniqueKey like
> this.
>
> JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
> JoinRel{1,2,3,4,5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a
> t5.b}
>
> When more tables are joined, the list would be longer and longer, build
> the list
> consumes both CPU cycles and memory.
>
> With the method as above, we can store it as:
>
> root->unique_exprs =  /* All the UniqueKey is stored once */
> [
> [t1.a, t1.b], -- EC is ignored in document.
> [t2.a, t2.b],
> [t3.a, t3.b],
> [t4.a, t4.b],
> [t5.a, t5.b],
> [t6.a, t6.b],
> [t7.a, t7.b],
> [t8.a, t8.b],
> [t9.a, t9.b],
> [t10.a, t10.b],
> ]
>
> JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
> JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.
>
> The member in the bitmap is the index of root->unique_exprs rather than
> root->eq_classes because we need to store the SingleRow case in
> root->unqiue_exprs as well.
>
> 3. Define a new SingleRow node and use it in joinrel as well.
>
> +typedef struct SingleRow
> +{
> + NodeTag type;
> + Index relid;
> +} SingleRow;
>
> SELECT * FROM t1, t2 WHERE t2.pk = 1;
>
> root->unique_exprs
> [
> [t1.a, t1.b],
> SingleRow{relid=2}
> ]
>
> JoinRel{t1} - Bitmapset{0}
> JoinRel{t2} - Bitmapset{1}
>
> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to
> dedicated
> exprs.
>
> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as
> possible.
>
> The overall idea is similar with PathKey, I distinguish the UniqueKey for
> distinct purpose and useful_for_merging purpose.
>
> SELECT DISTINCT pk FROM  t; -- OK, maintain it all the time, just like
> root->query_pathkey.
>
> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK
> is
> use before t1 join t2, but not useful after it.
>
> Currently the known issue I didn't pass the "interesting UniqueKey" info to
> subquery well [2], I'd like to talk more about this when we discuss about
> UnqiueKey on subquery part.
>
>
> 5. relation_is_distinct_for
>
> Now I design the function as
>
> + bool
> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
>   *distinct_pathkey)
>
> It is "List *distinct_pathkey", rather than "List *exprs". The reason
> pathkey
> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
> checking is very effective.  As for the distinct/group as no-op case, we
> have
> pathkey all the time. The only drawback of it is some clauses are
> not-sortable,
> in this case, the root->distinct_pathkey and root->group_pathkey is not
> set. However it should be rare by practice, so I ignore this part for
> now. After all, I can have a relation_is_disticnt_for version for Exprs. I
> just
> not implemented it so far.
>
> 6. EC overhead in UnqiueKey & UNION case.
>
> Until now I didn't create any new EC for the UniqueKey case, I just used
> the
> existing ones. However I can't handle the case like
>
> SELECT a, b FROM t1
> UNION
> SELECT x, y FROM t2;
>
> In this case, there is no EC created with existing code. and I don't want
> to
> create them for the UniqueKey case as well.  so my plan is just not to
> handle
> the case for UNION.
>
> Since we need some big effort from the reviewer, I split the patch into
> many
> smaller chunks.
>
> Patch 1 / Patch 2:  I just split some code which UniqueKey uses but not
> very
> interrelated. Splitting them out to reduce the core patch size.
> Patch 3:  still the notnull stuff.  This one doesn't play a big role
> overall,
> even if the design is changed at last, we can just modify some small
> stuff. IMO,
> I don't think it is a blocker issue to move on.
> Patch 4:  Support the UniqueKey for 

Re: Skipping logical replication transactions on subscriber side

2021-08-15 Thread Masahiko Sawada
On Fri, Aug 13, 2021 at 1:06 PM Amit Kapila  wrote:
>
> On Thu, Aug 12, 2021 at 5:41 PM Greg Nancarrow  wrote:
> >
> > On Thu, Aug 12, 2021 at 9:18 PM Amit Kapila  wrote:
> > >
> > > > A minor comment on the 0001 patch: In the message I think that using
> > > > "ID" would look better than lowercase "id" and AFAICS it's more
> > > > consistent with existing messages.
> > > >
> > > > + appendStringInfo(&buf, _(" in transaction id %u with commit timestamp 
> > > > %s"),
> > > >
> > >
> > > You have a point but I think in this case it might look a bit odd as
> > > we have another field 'commit timestamp' after that which is
> > > lowercase.
> > >
> >
> > I did a quick search and I couldn't find any other messages in the
> > Postgres code that use "transaction id", but I could find some that
> > use "transaction ID" and "transaction identifier".
> >
>
> Okay, but that doesn't mean using it here is bad. I am personally fine
> with a message containing something like "... in transaction
> id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30" but I
> won't mind if you and or others find some other way convenient. Any
> opinion from others?

I don't have a strong opinion on this, but in terms of consistency we
often use "transaction %u" in messages when showing an XID value,
rather than "transaction [id|ID|identifier]":

$ git grep -i "errmsg.*transaction %u" src/backend/
src/backend/access/transam/commit_ts.c:  errmsg("cannot
retrieve commit timestamp for transaction %u", xid)));
src/backend/access/transam/slru.c:   errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c:   errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c:   errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c:  (errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c:   errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c:  (errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c:   errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c:   errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/twophase.c:
(errmsg("recovering prepared transaction %u from shared memory",
xid)));
src/backend/access/transam/twophase.c:
(errmsg("removing stale two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
(errmsg("removing stale two-phase state from memory for transaction
%u",
src/backend/access/transam/twophase.c:
(errmsg("removing future two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
(errmsg("removing future two-phase state from memory for transaction
%u",
src/backend/access/transam/twophase.c:
errmsg("corrupted two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
errmsg("corrupted two-phase state in memory for transaction %u",
src/backend/access/transam/xlog.c:  (errmsg("recovery
stopping before commit of transaction %u, time %s",
src/backend/access/transam/xlog.c:  (errmsg("recovery
stopping before abort of transaction %u, time %s",
src/backend/access/transam/xlog.c:
(errmsg("recovery stopping after commit of transaction %u, time %s",
src/backend/access/transam/xlog.c:
(errmsg("recovery stopping after abort of transaction %u, time %s",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",

$ git grep -i "errmsg.*transaction identifier" src/backend/
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is too long",
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is already in use",

$ git grep -i "errmsg.*transaction id" src/backend/
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is too long",
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is already in use",
src/backend/access/transam/varsup.c:
(errmsg_internal("transaction ID wrap limit is %u, limited by database
with OID %u",
src/backend/access/transam/xlog.c:  (errmsg_internal("next
transaction ID: " UINT64_FORMAT "; next OID: %u",
src/backend/access/transam/xlog.c:  (errmsg_internal("oldest
unfrozen transaction ID: %u, in database %u",
src/backend/access/transam/xlog.c:  (errmsg("invalid next
transaction ID"))

Re: Get table total page quantity and cached page quantity

2021-08-15 Thread David Rowley
On Sun, 15 Aug 2021 at 23:41, otar shavadze  wrote:
> SELECT
> (SELECT relpages FROM pg_class where oid = 'public.my_table'::regclass::oid ) 
> AS table_pages_quantity_total,
> (SELECT COUNT(DISTINCT relblocknumber) FROM pg_buffercache WHERE relfilenode 
> = (
> SELECT relfilenode FROM pg_class WHERE oid = 'public.my_table'::regclass::oid 
> -- (SELECT rel_oid FROM my_cte)
> ) ) AS table_pages_quantity_in_cache;
>
>  This shows that table have only one page, while second column shows 3 unique 
> pages in buffer cache. Seems I'm measuring those numbers incorrectly(?) can 
> you please help, which column is incorrect (or may be both) ?

This question is likely better suited to the pgsql-general mailing list.
My answer in [1] will likely clear this up for you.

If you need further information, please ask on that thread.

David

[1] 
https://www.postgresql.org/message-id/caaphdvphddfozoviyxiz2dcr9emrrbnj7n3hks8knsd9w1w...@mail.gmail.com




Re: Added schema level support for publication.

2021-08-15 Thread Masahiko Sawada
On Fri, Aug 13, 2021 at 11:59 AM Amit Kapila  wrote:
>
> On Thu, Aug 12, 2021 at 5:54 PM Masahiko Sawada  wrote:
> >
> > On Fri, Aug 6, 2021 at 5:33 PM vignesh C  wrote:
> > >
> >
> > ---
> > Even if we drop all tables added to the publication from it, 'pubkind'
> > doesn't go back to 'empty'. Is that intentional behavior? If we do
> > that, we can save the lookup of pg_publication_rel and
> > pg_publication_schema in some cases, and we can switch the publication
> > that was created as FOR SCHEMA to FOR TABLE and vice versa.
> >
>
> Do we really want to allow users to change a publication that is FOR
> SCHEMA to FOR TABLE? I see that we don't allow to do that FOR TABLES.
> postgres=# Alter Publication pub add table tbl1;
> ERROR:  publication "pub" is defined as FOR ALL TABLES
> DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES 
> publications.

When it comes to FOR ALL TABLES, we can neither add tables to nor drop
tables from the publication, so it makes sense to me that it never
changes to 'empty'.  I'm not sure there are practical use cases for
changing FOR SCHEMA to FOR TABLE.  But it seems a bit weird to me that
pubkind doesn't change to "empty" even when the publication actually
becomes empty.  It might just be that "empty" is misleading, though.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Get table total page quantity and cached page quantity

2021-08-15 Thread otar shavadze
The answer is pretty clear, thank you David.

On Mon, Aug 16, 2021 at 12:31 AM David Rowley  wrote:

> On Sun, 15 Aug 2021 at 23:41, otar shavadze  wrote:
> > SELECT
> > (SELECT relpages FROM pg_class where oid =
> 'public.my_table'::regclass::oid ) AS table_pages_quantity_total,
> > (SELECT COUNT(DISTINCT relblocknumber) FROM pg_buffercache WHERE
> relfilenode = (
> > SELECT relfilenode FROM pg_class WHERE oid =
> 'public.my_table'::regclass::oid -- (SELECT rel_oid FROM my_cte)
> > ) ) AS table_pages_quantity_in_cache;
> >
> >  This shows that table have only one page, while second column shows 3
> unique pages in buffer cache. Seems I'm measuring those numbers
> incorrectly(?) can you please help, which column is incorrect (or may be
> both) ?
>
> This question likely is more suited to the pgsql-general mailing list.
> My answer in [1] likely will clear this up for you.
>
> If you need further information, please ask on that thread.
>
> David
>
> [1]
> https://www.postgresql.org/message-id/caaphdvphddfozoviyxiz2dcr9emrrbnj7n3hks8knsd9w1w...@mail.gmail.com
>


Re: Added schema level support for publication.

2021-08-15 Thread Masahiko Sawada
On Sun, Aug 15, 2021 at 12:23 AM Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> > I think the strict separation between publication-for-tables vs.
> > publication-for-schemas is a mistake.  Why can't I have a publication
> > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3.  Also note
> > that we have a pending patch to add sequences support to logical
> > replication.  So eventually, a publication will be able to contain a
> > bunch of different objects of different kinds.
>
> This seems like it's going to create a mess, because the meaning of
> "include schema S" will change over time as we add more features.
> That is, with the present patch (I suppose, haven't read it) we have
> "schema S" meaning "publish all tables in schema S".  When that other
> patch lands, presumably that same publication definition would start
> meaning "publish all tables and sequences in schema S".  And a few years
> down the road it might start meaning something else again.  That sounds
> like the sort of potentially app-breaking change that we don't like
> to make.
>
> We could avoid that bug-in-waiting if the syntax were more like
> "FOR ALL TABLES IN SCHEMA s", which could later extend to
> "FOR ALL SEQUENCES IN SCHEMA s", etc.  This is then a very clean
> intermediate step between publishing one table and "FOR ALL TABLES"
> without a schema limitation.

+1

> I tend to agree that a single publication should be able to incorporate
> any of these options.

I personally prefer that a single publication be able to include both all
tables and all sequences in a database or a schema.  That would be a more
convenient way to specify replication of all objects (tables and
sequences) in a database or a schema.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Minor regexp hacking: code coverage, moveins() and friends

2021-08-15 Thread Tom Lane
While trying to improve the code-coverage report for backend/regex/,
I found that there are portions of copyins() and copyouts() that are
just plain unreachable, because those functions are over-engineered.
In point of fact, the only uses of copyins() and copyouts() are in
places where the target state is brand new and cannot have any
pre-existing in-arcs (resp. out-arcs).  This means that all the
trouble we're going to in order to de-duplicate the copied arcs is
entirely wasted; we could just copy the source arcs without extra checking.

A fairly significant fraction, though by no means all, of the calls
of moveins() and moveouts() are likewise working with new target
states, and so don't really need to do any de-duplication.

Hence I propose 0001 attached, which creates simplified functions
copyinstonew() and so on, for use when the target state is known not
to have any existing arcs.  I'd thought that this might show a useful
improvement in regexp compilation speed, but it's pretty hard to
measure any noticeable change on typical regexps such as Jacobson's
web corpus.  (I do see maybe a 1% improvement on that, but that's
below the noise threshold so I don't take it too seriously.)  It is
possible to demonstrate noticeable improvement on handpicked regexes,
for example on HEAD:

regression=# SELECT regexp_matches('foo', 'abcdefghijklmnopq((\y|.?)+)+','');
 regexp_matches 
----------------
(0 rows)

Time: 6.297 ms

versus with patch:

regression=# SELECT regexp_matches('foo', 'abcdefghijklmnopq((\y|.?)+)+','');
 regexp_matches 
----------------
(0 rows)

Time: 5.506 ms

So this isn't entirely a waste of time, but it is marginal.  Improving
the code-coverage numbers is probably a better argument.  (0001 also
adds some test cases that exercise nearly everything that's reachable
without OOM conditions or cancels in regc_nfa.c.)

0002 below is some additional test cases to improve code coverage in
regc_locale.c and regc_pg_locale.c.  (regc_pg_locale.c is still not
great, but that's mostly because much of the code is only reachable
for particular choices of database encoding, so any one coverage
run hits just some of it.)

Barring objections, I plan to push this in a day or two; I don't
think it needs much review.

regards, tom lane

diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 6d77c59e12..2b5ffcba8f 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -867,9 +867,37 @@ moveins(struct nfa *nfa,
 	assert(oldState->ins == NULL);
 }
 
+/*
+ * moveinstonew - move all in arcs of a state to another state
+ *
+ * The newState must not have any existing in-arcs.
+ */
+static void
+moveinstonew(struct nfa *nfa,
+			 struct state *oldState,
+			 struct state *newState)
+{
+	struct arc *a;
+
+	assert(oldState != newState);
+	assert(newState->ins == NULL && newState->nins == 0);
+
+	/*
+	 * Since there is no risk of creating duplicate arcs (given that none
+	 * exist already), we can just use createarc directly and not spend any
+	 * time de-duplicating.
+	 */
+	while ((a = oldState->ins) != NULL)
+	{
+		createarc(nfa, a->type, a->co, a->from, newState);
+		freearc(nfa, a);
+	}
+}
+
 /*
  * copyins - copy in arcs of a state to another state
  */
+#ifdef NOT_USED	/* not currently needed */
 static void
 copyins(struct nfa *nfa,
 		struct state *oldState,
@@ -945,6 +973,31 @@ copyins(struct nfa *nfa,
 		}
 	}
 }
+#endif			/* NOT_USED */
+
+/*
+ * copyinstonew - copy in arcs of a state to another state
+ *
+ * The newState must not have any existing in-arcs.
+ */
+static void
+copyinstonew(struct nfa *nfa,
+			 struct state *oldState,
+			 struct state *newState)
+{
+	struct arc *a;
+
+	assert(oldState != newState);
+	assert(newState->ins == NULL && newState->nins == 0);
+
+	/*
+	 * Since there is no risk of creating duplicate arcs (given that none
+	 * exist already), we can just use createarc directly and not spend any
+	 * time de-duplicating.
+	 */
+	for (a = oldState->ins; a != NULL; a = a->inchain)
+		createarc(nfa, a->type, a->co, a->from, newState);
+}
 
 /*
  * mergeins - merge a list of inarcs into a state
@@ -1140,9 +1193,37 @@ moveouts(struct nfa *nfa,
 	assert(oldState->outs == NULL);
 }
 
+/*
+ * moveoutstonew - move all out arcs of a state to another state
+ *
+ * The newState must not have any existing out-arcs.
+ */
+static void
+moveoutstonew(struct nfa *nfa,
+			  struct state *oldState,
+			  struct state *newState)
+{
+	struct arc *a;
+
+	assert(oldState != newState);
+	assert(newState->outs == NULL && newState->nouts == 0);
+
+	/*
+	 * Since there is no risk of creating duplicate arcs (given that none
+	 * exist already), we can just use createarc directly and not spend any
+	 * time de-duplicating.
+	 */
+	while ((a = oldState->outs) != NULL)
+	{
+		createarc(nfa, a->type, a->co, newState, a->to);
+		freearc(nfa, a);
+	}
+}
+
 /*
  * copyouts - copy out arcs of a state to another state
  */
+#ifdef NOT_USED	/* not c

Re: Default to TIMESTAMP WITH TIME ZONE?

2021-08-15 Thread David Fetter
On Sat, Aug 14, 2021 at 10:03:01AM +0200, Peter Eisentraut wrote:
> On 13.08.21 19:07, Tom Lane wrote:
> > "David G. Johnston"  writes:
> > > On Fri, Aug 13, 2021 at 9:28 AM Simon Riggs 
> > > wrote:
> > > > The only hope is to eventually change the default, so probably
> > > > the best thing is to apply pressure via the SQL Std process.
> > 
> > > Then there is no hope because this makes the situation worse.
> > 
> > Agreed; the points I made upthread are just as valid if the change
> > is made in the standard.  But I'd be astonished if the SQL
> > committee would consider such a change anyway.
> 
> AFAIU, our timestamp with time zone type doesn't really do what the
> SQL standard specifies anyway, as it doesn't actually record the
> time zone, but it's more of a "timestamp with time zone aware
> formatting".  For SQL, it might make sense to add a (third) time
> stamp type that behaves more like that.

The way the standard defines time zones, namely as fixed offsets from
UTC, is just ludicrous and has been divorced from reality for much longer
than SQL has existed.  If we're going to capture input time zones, we
should come up with a way to capture the time zone as it existed when
the write occurred, i.e. both its name and the UTC offset it
represented at the time of the write.
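
A third timestamp type along those lines might carry something like the
following; this is purely illustrative, and the type and field names are
ours:

typedef struct TimestampWithZone
{
	TimestampTz ts;				/* the instant, normalized to UTC */
	int32		utc_offset_secs;	/* offset in effect at write time */
	char		zone_name[TZ_STRLEN_MAX + 1];	/* e.g. "Europe/Paris" */
} TimestampWithZone;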

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: PoC/WIP: Extended statistics on expressions

2021-08-15 Thread Justin Pryzby
On 1/22/21 5:01 AM, Justin Pryzby wrote:
> > In any case, why are there so many parentheses ?

On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote:
> That's a bug in pg_get_statisticsobj_worker, probably. It shouldn't be
> adding extra parentheses, on top of what deparse_expression_pretty does.
> Will fix.

The extra parens are still here; is that intended?

postgres=# CREATE STATISTICS s ON i, (1+i), (2+i) FROM t;
CREATE STATISTICS
postgres=# \d t
 Table "public.t"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 i  | integer |   |  | 
Statistics objects:
"public"."s" ON i, ((1 + i)), ((2 + i)) FROM t

-- 
Justin




Re: PoC/WIP: Extended statistics on expressions

2021-08-15 Thread Justin Pryzby
On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote:
> > Looking at the current behaviour, there are a couple of things that
> > seem a little odd, even though they are understandable. For example,
> > the fact that
> > 
> >   CREATE STATISTICS s (expressions) ON (expr), col FROM tbl;
> > 
> > fails, but
> > 
> >   CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl;
> > 
> > succeeds and creates both "expressions" and "mcv" statistics. Also, the 
> > syntax
> > 
> >   CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl;
> > 
> > tends to suggest that it's going to create statistics on the pair of
> > expressions, describing their correlation, when actually it builds 2
> > independent statistics. Also, this error text isn't entirely accurate:
> > 
> >   CREATE STATISTICS s ON col FROM tbl;
> >   ERROR:  extended statistics require at least 2 columns
> > 
> > because extended statistics don't always require 2 columns, they can
> > also just have an expression, or multiple expressions and 0 or 1
> > columns.
> > 
> > I think a lot of this stems from treating "expressions" in the same
> > way as the other (multi-column) stats kinds, and it might actually be
> > neater to have separate documented syntaxes for single- and
> > multi-column statistics:
> > 
> >   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> > ON (expression)
> > FROM table_name
> > 
> >   CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> > [ ( statistics_kind [, ... ] ) ]
> > ON { column_name | (expression) } , { column_name | (expression) } [, 
> > ...]
> > FROM table_name
> > 
> > The first syntax would create single-column stats, and wouldn't accept
> > a statistics_kind argument, because there is only one kind of
> > single-column statistic. Maybe that might change in the future, but if
> > so, it's likely that the kinds of single-column stats will be
> > different from the kinds of multi-column stats.
> > 
> > In the second syntax, the only accepted kinds would be the current
> > multi-column stats kinds (ndistinct, dependencies, and mcv), and it
> > would always build stats describing the correlations between the
> > columns listed. It would continue to build standard/expression stats
> > on any expressions in the list, but that's more of an implementation
> > detail.
> > 
> > It would no longer be possible to do "CREATE STATISTICS s
> > (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to
> > issue 2 separate "CREATE STATISTICS" commands, but that seems more
> > logical, because they're independent stats.
> > 
> > The parsing code might not change much, but some of the errors would
> > be different. For example, the errors "building only extended
> > expression statistics on simple columns not allowed" and "extended
> > expression statistics require at least one expression" would go away,
> > and the error "extended statistics require at least 2 columns" might
> > become more specific, depending on the stats kind.

This still seems odd:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

It seems wrong that the command works with added parens, but builds expression
stats on a simple column (which is redundant with what analyze does without
extended stats).

-- 
Justin




Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-08-15 Thread Andy Fan
Hi Zhihong,

On Mon, Aug 16, 2021 at 12:35 AM Zhihong Yu  wrote:
>
>
>
> On Sun, Aug 15, 2021 at 7:33 AM Andy Fan  wrote:
>>
>> Hi:
>>
>> I have finished the parts for baserel, joinrel, subquery, distinctrel. I 
>> think
>> the hardest ones have been verified.  Here is the design overview.
>>
>> 1. Use EC instead of expr to cover more UniqueKey cases.
>> 2. Redesign the UniqueKey as below:
>>
>> @@ -246,6 +246,7 @@ struct PlannerInfo
>>
>> List   *eq_classes; /* list of active EquivalenceClasses */
>> + List   *unique_exprs; /* List of unique expr */
>>
>>   bool ec_merging_done; /* set true once ECs are canonical */
>>
>> +typedef struct UniqueKey
>> +{
>> + NodeTag type;
>> + Bitmapset *unique_expr_indexes;
>> + bool multi_nulls;
>> +} UniqueKey;
>> +
>>
>> PlannerInfo.unique_exprs is a List of unique exprs.  Unique Exprs is a set of
>> EquivalenceClass. for example:
>>
>> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary key);
>> CREATE UNIQUE INDEX ON t1(a, b);
>>
>> SELECT DISTINCT * FROM T1 WHERE a = c;
>>
>> Then we would have PlannerInfo.unique_exprs as below
>> [
>> [EC(a, c), EC(b)],
>> [EC(pk)]
>> ]
>>
>> RelOptInfo(t1) would have 2 UniqueKeys.
>> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
>> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]
>>
>> The design will benefit many table joins cases. For instance a 10- tables 
>> join,
>> each table has a primary key (a, b).  Then we would have a UniqueKey like
>> this.
>>
>> JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
>> JoinRel{1,2,3,4,5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a 
>> t5.b}
>>
>> When more tables are joined, the list would be longer and longer, build the 
>> list
>> consumes both CPU cycles and memory.
>>
>> With the method as above, we can store it as:
>>
>> root->unique_exprs =  /* All the UniqueKey is stored once */
>> [
>> [t1.a, t1.b], -- EC is ignored in document.
>> [t2.a, t2.b],
>> [t3.a, t3.b],
>> [t4.a, t4.b],
>> [t5.a, t5.b],
>> [t6.a, t6.b],
>> [t7.a, t7.b],
>> [t8.a, t8.b],
>> [t9.a, t9.b],
>> [t10.a, t10.b],
>> ]
>>
>> JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
>> JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.
>>
>> The member in the bitmap is the index of root->unique_exprs rather than
>> root->eq_classes because we need to store the SingleRow case in
>> root->unqiue_exprs as well.
>>
>> 3. Define a new SingleRow node and use it in joinrel as well.
>>
>> +typedef struct SingleRow
>> +{
>> + NodeTag type;
>> + Index relid;
>> +} SingleRow;
>>
>> SELECT * FROM t1, t2 WHERE t2.pk = 1;
>>
>> root->unique_exprs
>> [
>> [t1.a, t1.b],
>> SingleRow{relid=2}
>> ]
>>
>> JoinRel{t1} - Bitmapset{0}
>> JoinRel{t2} - Bitmapset{1}
>>
>> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to 
>> dedicated
>> exprs.
>>
>> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible.
>>
>> The overall idea is similar with PathKey, I distinguish the UniqueKey for
>> distinct purpose and useful_for_merging purpose.
>>
>> SELECT DISTINCT pk FROM  t; -- OK, maintain it all the time, just like
>> root->query_pathkey.
>>
>> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is
>> use before t1 join t2, but not useful after it.
>>
>> Currently the known issue I didn't pass the "interesting UniqueKey" info to
>> subquery well [2], I'd like to talk more about this when we discuss about
>> UnqiueKey on subquery part.
>>
>>
>> 5. relation_is_distinct_for
>>
>> Now I design the function as
>>
>> + bool
>> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
>>   *distinct_pathkey)
>>
>> It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey
>> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
>> checking is very effective.  As for the distinct/group as no-op case, we have
>> pathkey all the time. The only drawback of it is some clauses are 
>> not-sortable,
>> in this case, the root->distinct_pathkey and root->group_pathkey is not
>> set. However it should be rare by practice, so I ignore this part for
>> now. After all, I can have a relation_is_disticnt_for version for Exprs. I 
>> just
>> not implemented it so far.
>>
>> 6. EC overhead in UnqiueKey & UNION case.
>>
>> Until now I didn't create any new EC for the UniqueKey case, I just used the
>> existing ones. However I can't handle the case like
>>
>> SELECT a, b FROM t1
>> UNION
>> SELECT x, y FROM t2;
>>
>> In this case, there is no EC created with existing code. and I don't want to
>> create them for the UniqueKey case as well.  so my plan is just not to handle
>> the case for UNION.
>>
>> Since we need some big effort from the reviewer, I split the patch into many
>> smaller chunks.
>>
>> Patch 1 / Patch 2:  I just split some code which UniqueKey uses but not very
>> interrelated. Splitting them out to reduce the core patch size.
>> Patch 3:  still the notnull stuff.  Th

Re: badly calculated width of emoji in psql

2021-08-15 Thread Michael Paquier
On Thu, Aug 12, 2021 at 05:13:31PM -0400, John Naylor wrote:
> I'm a bit concerned about the build dependencies not working right, but
> it's not clear it's even due to the patch. I'll spend some time
> investigating next week.

How large do the libpgcommon deliverables get with this patch?  Skimming
through the patch, it still looks like just a couple of bytes.
--
Michael


signature.asc
Description: PGP signature


Re: Skipping logical replication transactions on subscriber side

2021-08-15 Thread Greg Nancarrow
On Mon, Aug 16, 2021 at 6:24 AM Masahiko Sawada  wrote:
>
> Therefore, perhaps a message like "... in transaction 740 with commit
> timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
> consistency with other messages?
>

Yes, I think that would be more consistent.

On another note, for the 0001 patch, the elog ERROR at the bottom of
the logicalrep_message_type() function seems to assume that the
unrecognized "action" is a printable character (with its use of %c)
and also that the character is meaningful to the user in some way.
But given that the compiler normally warns about an unhandled enum value
when switching on an enum, such an error would most likely occur when
action is some int value that isn't meaningful to the user (as it
wouldn't be one of the LogicalRepMsgType enum values).
I therefore think it would be better to use %d in that ERROR:

i.e.

+ elog(ERROR, "invalid logical replication message type %d", action);

Similar comments apply to the apply_dispatch() function (and I realise
it used %c before your patch).


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

2021-08-15 Thread Michael Paquier
On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:
> and passes with the code change, as expected. I can't explain why the
> test doesn't freeze up in v3 in wait_for_catchup() at the end.

It took me some time to understand why.  If I am right, that's because
of the intermediate test block working on $standby_2 and the two
INSERT queries on the primary.  In v1 and v4, we have no activity on
the primary between the first set of tests and yours, meaning that
$standby has nothing to do.  In v2 and v3, the two INSERT queries run
on the primary for the purpose of the recovery pause, making $standby_1
wait for the default value of recovery_min_apply_delay, aka 3s, in
parallel.  If the set of tests for $standby_2 finishes faster than that,
we'd visibly hit the phase where the code was still waiting for the 3s,
not the 2 hours we set.

After considering this stuff, the order dependency we'd introduce in
this test makes the whole thing more brittle than it should be.  And such
an edge case does not seem worth spending extra cycles testing anyway,
as if things break we'd finish with a test stuck for an unnecessary
long time by relying on wait_for_catchup("replay").  We could use
something else, say based on a lookup of pg_stat_activity but this
still requires extra run time for the wait phases needed.  So at the
end I have dropped the test, but backpatched the fix.
--
Michael


signature.asc
Description: PGP signature


Failure of subscription tests with topminnow

2021-08-15 Thread Michael Paquier
Hi all,

topminnow has just failed in a weird fashion:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58
# SELECT pid !=  FROM pg_stat_replication WHERE application_name = 'tap_sub';
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# ERROR:  syntax error at or near "FROM"
# LINE 1: SELECT pid !=  FROM pg_stat_replication WHERE application_na...

Looking at the logs, it seems like the problem boils down to an active
slot when using ALTER SUBSCRIPTION tap_sub CONNECTION:
2021-08-15 18:44:38.027 CEST [16473:2] ERROR:  could not start WAL
streaming: ERROR:  replication slot "tap_sub" is active for PID 16336

There is only one place in 001_rep_changes.pl where this is used.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

2021-08-15 Thread Soumyadeep Chakraborty
On Sun, Aug 15, 2021 at 8:16 PM Michael Paquier  wrote:
>
> On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:
> > and passes with the code change, as expected. I can't explain why the
> > test doesn't freeze up in v3 in wait_for_catchup() at the end.
>
> It took me some time to understand why.  If I am right, that's because
> of the intermediate test block working on $standby_2 and the two
> INSERT queries of the primary.  In v1 and v4, we have no activity on
> the primary between the first set of tests and yours, meaning that
> $standby has nothing to do.  In v2 and v3, the two INSERT queries run
> on the primary for the purpose of the recovery pause make $standby_1
> wait for the default value of recovery_min_apply_delay, aka 3s, in
> parallel.  If the set of tests for $standby_2 is faster than that,
> we'd bump on the phase where the code still waited for 3s, not the 2
> hours set, visibly.

I see, thanks a lot for the explanation. Thanks to your investigation, I
can now kind of reuse some of the test mechanisms for the other patch that
I am working on [1]. There, we don't have multiple standbys getting in the
way, thankfully.

> After considering this stuff, the order dependency we'd introduce in
> this test makes the whole thing more brittle than it should.  And such
> an edge case does not seem worth spending extra cycles testing anyway,
> as if things break we'd finish with a test stuck for an unnecessary
> long time by relying on wait_for_catchup("replay").  We could use
> something else, say based on a lookup of pg_stat_activity but this
> still requires extra run time for the wait phases needed.  So at the
> end I have dropped the test, but backpatched the fix.
> --

Fair.

Regards,
Soumyadeep (VMware)

[1] 
https://www.postgresql.org/message-id/flat/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ%40mail.gmail.com




Re: .ready and .done files considered harmful

2021-08-15 Thread Bossart, Nathan
+* This .ready file is created out of order, notify archiver to perform
+* a full directory scan to archive corresponding WAL file.
+*/
+   StatusFilePath(archiveStatusPath, xlog, ".ready");
+   if (stat(archiveStatusPath, &stat_buf) == 0)
+   PgArchEnableDirScan();

We may want to call PgArchWakeup() after setting the flag.
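
For instance (a sketch: the shared-memory struct and dirScan flag follow
the patch fragments, while the wakeup placement is the suggestion):

void
PgArchEnableDirScan(void)
{
	/* force the next archiver cycle to do a full directory scan... */
	pg_atomic_test_set_flag(&PgArch->dirScan);

	/* ...and wake the archiver so it notices promptly */
	PgArchWakeup();
}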

+* Perform a full directory scan to identify the next log segment. There
+* may be one of the following scenarios which may require us to perform
+* a full directory scan.
...
+* - The next anticipated log segment is not available.

I wonder if we really need to perform a directory scan in this case.
Unless there are other cases where the .ready files are created out of
order, I think this just causes an unnecessary directory scan every
time the archiver catches up.

+* Flag to enable/disable directory scan. If this flag is set then it
+* forces archiver to perform a full directory scan to get the next log
+* segment.
+*/
+   pg_atomic_flag dirScan;

I personally don't think it's necessary to use an atomic here.  A
spinlock or LWLock would probably work just fine, as contention seems
unlikely.  If we use a lock, we also don't have to worry about memory
barriers.

Nathan



Re: .ready and .done files considered harmful

2021-08-15 Thread Bossart, Nathan
On 8/15/21, 9:52 PM, "Bossart, Nathan"  wrote:
> +  * Perform a full directory scan to identify the next log segment. There
> +  * may be one of the following scenarios which may require us to perform
> +  * a full directory scan.
> ...
> +  * - The next anticipated log segment is not available.
>
> I wonder if we really need to perform a directory scan in this case.
> Unless there are other cases where the .ready files are created out of
> order, I think this just causes an unnecessary directory scan every
> time the archiver catches up.

Thinking further, I suppose this is necessary for when lastSegNo gets
reset after processing an out-of-order .ready file.

Nathan



Re: Skipping logical replication transactions on subscriber side

2021-08-15 Thread Amit Kapila
On Mon, Aug 16, 2021 at 1:54 AM Masahiko Sawada  wrote:
>
> On Fri, Aug 13, 2021 at 1:06 PM Amit Kapila  wrote:
> >
> >
> > Okay, but that doesn't mean using it here is bad. I am personally fine
> > with a message containing something like "... in transaction
> > id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30" but I
> > won't mind if you and or others find some other way convenient. Any
> > opinion from others?
>
> I don't have a strong opinion on this but in terms of consistency we
> often use like "transaction %u" in messages when showing XID value,
> rather than "transaction [id|ID|identifier]":
>
..
>
> Therefore, perhaps a message like "... in transaction 740 with commit
> timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
> consistency with other messages?
>

+1.

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-08-15 Thread Peter Smith
On Sun, Aug 15, 2021 at 1:23 AM Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> > I think the strict separation between publication-for-tables vs.
> > publication-for-schemas is a mistake.  Why can't I have a publication
> > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3.  Also note
> > that we have a pending patch to add sequences support to logical
> > replication.  So eventually, a publication will be able to contain a
> > bunch of different objects of different kinds.
>
> This seems like it's going to create a mess, because the meaning of
> "include schema S" will change over time as we add more features.
> That is, with the present patch (I suppose, haven't read it) we have
> "schema S" meaning "publish all tables in schema S".  When that other
> patch lands, presumably that same publication definition would start
> meaning "publish all tables and sequences in schema S".  And a few years
> down the road it might start meaning something else again.  That sounds
> like the sort of potentially app-breaking change that we don't like
> to make.
>
> We could avoid that bug-in-waiting if the syntax were more like
> "FOR ALL TABLES IN SCHEMA s", which could later extend to
> "FOR ALL SEQUENCES IN SCHEMA s", etc.  This is then a very clean
> intermediate step between publishing one table and "FOR ALL TABLES"
> without a schema limitation.
>
> I tend to agree that a single publication should be able to incorporate
> any of these options.
>

How about if the improved syntax from Tom Lane [1] also allowed an
"AND" keyword for combining whatever you wish?

Then the question from Peter E. [2] "Why can't I have a publication
that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would
have an intuitive solution like:

CREATE PUBLICATION pub1
FOR TABLE t1,t2,t3 AND
FOR ALL TABLES IN SCHEMA s1,s2,s3;

--
[1] https://www.postgresql.org/message-id/155565.1628954580%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/4fb39707-dca9-1563-4482-b7a8315c36ca%40enterprisedb.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-15 Thread Michael Paquier
On Sat, Aug 14, 2021 at 11:08:44PM -0400, Tom Lane wrote:
> Please do something about that.
> 
> (1) There should be no output to stderr in the tests.  Why isn't this
> message being caught and redirected to the normal test output file?

These are generated during the compilation of the tests with the
pre-processor, so that's outside the test runs.

> (2) This message is both unintelligible and grammatically incorrect.

Yeah, debugging such tests would be easier if the name of the
DECLARE statement were included, at least.  Those messages should not
be generated at all anyway; they come from a typo in the tests, in the
connection name used for stmt_3.

Michael, what do you think about the attached?
--
Michael
diff --git a/src/interfaces/ecpg/preproc/ecpg.header b/src/interfaces/ecpg/preproc/ecpg.header
index 067c9cf8e7..863076945f 100644
--- a/src/interfaces/ecpg/preproc/ecpg.header
+++ b/src/interfaces/ecpg/preproc/ecpg.header
@@ -594,8 +594,9 @@ check_declared_list(const char *name)
 			continue;
 		if (strcmp(name, ptr -> name) == 0)
 		{
-			if (connection)
-mmerror(PARSE_ERROR, ET_WARNING, "connection %s is overwritten to %s.", connection, ptr->connection);
+			if (connection && strcmp(ptr->connection, connection) != 0)
+mmerror(PARSE_ERROR, ET_WARNING, "declare statement %s using connection %s overwritten to connection %s.",
+	name, connection, ptr->connection);
 			connection = mm_strdup(ptr -> connection);
 			return true;
 		}
diff --git a/src/interfaces/ecpg/test/expected/sql-declare.c b/src/interfaces/ecpg/test/expected/sql-declare.c
index cff928204e..b9b189b656 100644
--- a/src/interfaces/ecpg/test/expected/sql-declare.c
+++ b/src/interfaces/ecpg/test/expected/sql-declare.c
@@ -374,7 +374,7 @@ if (sqlca.sqlcode < 0) sqlprint();}
 /* declare  \"stmt_3\"  as an SQL identifier */
 #line 122 "declare.pgc"
 
-{ ECPGprepare(__LINE__, "con1", 0, "stmt_3", selectString);
+{ ECPGprepare(__LINE__, "con2", 0, "stmt_3", selectString);
 #line 123 "declare.pgc"
 
 if (sqlca.sqlcode < 0) sqlprint();}
@@ -383,8 +383,8 @@ if (sqlca.sqlcode < 0) sqlprint();}
 /* declare cur_3 cursor for $1 */
 #line 124 "declare.pgc"
 
-{ ECPGdo(__LINE__, 0, 1, "con1", 0, ECPGst_normal, "declare cur_3 cursor for $1", 
-	ECPGt_char_variable,(ECPGprepared_statement("con1", "stmt_3", __LINE__)),(long)1,(long)1,(1)*sizeof(char), 
+{ ECPGdo(__LINE__, 0, 1, "con2", 0, ECPGst_normal, "declare cur_3 cursor for $1", 
+	ECPGt_char_variable,(ECPGprepared_statement("con2", "stmt_3", __LINE__)),(long)1,(long)1,(1)*sizeof(char), 
 	ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT, ECPGt_EORT);
 #line 125 "declare.pgc"
 
@@ -398,7 +398,7 @@ if (sqlca.sqlcode < 0) sqlprint();}
 i = 0;
 while (1)
 {
-{ ECPGdo(__LINE__, 0, 1, "con1", 0, ECPGst_normal, "fetch cur_3", ECPGt_EOIT, 
+{ ECPGdo(__LINE__, 0, 1, "con2", 0, ECPGst_normal, "fetch cur_3", ECPGt_EOIT, 
 	ECPGt_int,&(f1[i]),(long)1,(long)1,sizeof(int), 
 	ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, 
 	ECPGt_int,&(f2[i]),(long)1,(long)1,sizeof(int), 
@@ -415,13 +415,13 @@ if (sqlca.sqlcode < 0) sqlprint();}
 
 i++;
 }
-{ ECPGdo(__LINE__, 0, 1, "con1", 0, ECPGst_normal, "close cur_3", ECPGt_EOIT, ECPGt_EORT);
+{ ECPGdo(__LINE__, 0, 1, "con2", 0, ECPGst_normal, "close cur_3", ECPGt_EOIT, ECPGt_EORT);
 #line 134 "declare.pgc"
 
 if (sqlca.sqlcode < 0) sqlprint();}
 #line 134 "declare.pgc"
 
-{ ECPGdeallocate(__LINE__, 0, "con1", "stmt_3");
+{ ECPGdeallocate(__LINE__, 0, "con2", "stmt_3");
 #line 135 "declare.pgc"
 
 if (sqlca.sqlcode < 0) sqlprint();}
diff --git a/src/interfaces/ecpg/test/expected/sql-declare.stderr b/src/interfaces/ecpg/test/expected/sql-declare.stderr
index 29d0b828e7..65db974a85 100644
--- a/src/interfaces/ecpg/test/expected/sql-declare.stderr
+++ b/src/interfaces/ecpg/test/expected/sql-declare.stderr
@@ -142,13 +142,13 @@
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: prepare_common on line 123: name stmt_3; query: "SELECT f1,f2,f3 FROM source"
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ecpg_execute on line 125: query: declare cur_3 cursor for SELECT f1,f2,f3 FROM source; with 0 parameter(s) on connection con1
+[NO_PID]: ecpg_execute on line 125: query: declare cur_3 cursor for SELECT f1,f2,f3 FROM source; with 0 parameter(s) on connection con2
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_execute on line 125: using PQexec
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_process_output on line 125: OK: DECLARE CURSOR
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ecpg_execute on line 131: query: fetch cur_3; with 0 parameter(s) on connection con1
+[NO_PID]: ecpg_execute on line 131: query: fetch cur_3; with 0 parameter(s) on connection con2
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_execute on line 131: using PQexec
 [NO_PID]: sqlca: code: 0, state: 0
@@ -158,9 +158,9 @@
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_

Re: Add option --drop-cascade for pg_dump/restore

2021-08-15 Thread Wu Haotian
The documentation already says that --clean only works with plain-text
output, so adding checks that turn --clean into an error for other
formats seems like a breaking change to me.

I've updated the docs to indicate that --drop-cascade and --if-exists
only work with plain-text output.
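
For example (hypothetical invocation, assuming the patched pg_dump),
the new option would be used together with --clean on plain-text
output, much like --if-exists:

    pg_dump --format=plain --clean --drop-cascade mydb > mydb.sql

so that the DROP statements emitted at the top of the script carry
CASCADE.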


0004-pg_dump-restore-add-drop-cascade-option.patch
Description: Binary data


Re: logical replication empty transactions

2021-08-15 Thread Peter Smith
On Fri, Aug 13, 2021 at 9:01 PM Ajin Cherian  wrote:
>
> On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila  wrote:
> >
> > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian  wrote:
> > >
> >
> > Let's first split the patch for prepared and non-prepared cases as
> > that will help to focus on each of them separately. BTW, why haven't
> > you considered implementing point 1b as explained by Andres in his
> > email [1]? I think we can send a keepalive message in case of
> > synchronous replication when we skip an empty transaction, otherwise,
> > it might delay in responding to transactions synchronous_commit mode.
> > I think in the tests done in the thread, it might not have been shown
> > because we are already sending keepalives too frequently. But what if
> > someone disables wal_sender_timeout or kept it to a very large value?
> > See WalSndKeepaliveIfNecessary. The other thing you might want to look
> > at is if the reason for frequent keepalives is the same as described
> > in the email [2].
> >
>
> I have tried to address this comment by modifying the
> ctx->update_progress callback (WalSndUpdateProgress) provided for
> plugins. I have added an option by which the callback can request
> that a keepalive be sent; when the callback is called with that
> option set, the walsender sets a flag, force_keep_alive_syncrep.
> The walsender checks this flag in the WalSndWaitForWal loop and, if
> synchronous replication is enabled, sends a keepalive. Currently
> this logic is added as an else branch to the existing logic in
> WalSndWaitForWal, the logic that is probably unnecessary and the
> source of the keepalive flood you mentioned, so I can change this
> according to how that fix shapes up. I have also added an extern
> function in syncrep.c that lets the walsender query whether
> synchronous replication is turned on.
>
> The reason I set a flag and rely on WalSndWaitForWal to send the
> keepalive on its next iteration is that I tried doing this directly
> when a commit is skipped, and it didn't work: when the commit is
> being decoded, sentPtr is still at the commit LSN, so the keepalive
> would be sent for the commit LSN, while the syncrep wait is waiting
> for the transaction's end_lsn, which is the next LSN. So sending a
> keepalive at the moment the commit is decoded doesn't solve the
> problem of the waiting synchronous replication reply.
>
> > Few other miscellaneous comments:
> > 1.
> > static void
> >  pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
> > ReorderBufferTXN *txn,
> > - XLogRecPtr commit_lsn)
> > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
> > + TimestampTz prepare_time)
> >  {
> > + PGOutputTxnData*txndata = (PGOutputTxnData *) 
> > txn->output_plugin_private;
> > +
> >   OutputPluginUpdateProgress(ctx);
> >
> > + /*
> > + * If the BEGIN PREPARE was not yet sent, then it means there were no
> > + * relevant changes encountered, so we can skip the COMMIT PREPARED
> > + * message too.
> > + */
> > + if (txndata)
> > + {
> > + bool skip = !txndata->sent_begin_txn;
> > + pfree(txndata);
> > + txn->output_plugin_private = NULL;
> >
> > How is this supposed to work after the restart when prepared is sent
> > before the restart and we are just sending commit_prepared after
> > restart? Won't this lead to sending commit_prepared even when the
> > corresponding prepare is not sent? Can we think of a better way to
> > deal with this?
> >
>
> I have tried to resolve this by adding logic in worker.c to silently
> ignore spurious commit_prepareds. This change requires checking
> whether the prepare exists on the subscriber before attempting the
> commit_prepared, but the current API for that check requires the
> prepare time and the transaction end_lsn, so I had to change the
> protocol of commit_prepared, and I understand that this would break
> backward compatibility between subscriber and publisher (you have
> raised this issue as well). I am not sure how else to handle this;
> let me know if you have any other ideas. One option could be another
> API that checks whether the prepare exists on the subscriber using
> the prepared 'gid' alone, without checking prepare_time or end_lsn.
> Let me know if this idea works.
>
> I have left out the patch 0002 for prepared transactions until we
> arrive at a decision on how to address the above issue.
>
> Peter,
> I have also addressed the comments you've raised on patch 0001, please
> have a look and confirm.
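
To make sure I followed the keepalive mechanism described above, here
is roughly how I read it (a sketch only; the extra callback argument
and the names force_keep_alive_syncrep / SyncRepRequested come from
your description and may not match the patch exactly):

    /*
     * pgoutput asks the walsender to send a keepalive later, rather
     * than sending one itself at the skipped commit.
     */
    static void
    WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
                         TransactionId xid, bool skipped_empty_xact)
    {
        /*
         * sentPtr is still at the commit LSN here, while syncrep
         * waits for the transaction's end_lsn, so only set a flag.
         */
        if (skipped_empty_xact)
            force_keep_alive_syncrep = true;
    }

    /* ... and later, in the WalSndWaitForWal loop: */
    if (force_keep_alive_syncrep && SyncRepRequested())
    {
        WalSndKeepalive(false);
        force_keep_alive_syncrep = false;
    }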

I have reviewed the v13-0001 patch.

Apply / build / test was all OK

Below are my code review comments.

//

Comments for v13-0001
=

1. Patch comment

=>

Probably this comment should include some description for the new
"keepalive" logic as well.

--

2. src/backend/replication/syncrep.c - new function

@@ -330,6 +330,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool co