Re: Failed transaction statistics to measure the logical replication progress

2021-08-04 Thread Masahiko Sawada
On Wed, Aug 4, 2021 at 12:17 PM Amit Kapila  wrote:
>
> On Wed, Aug 4, 2021 at 6:19 AM Masahiko Sawada  wrote:
> >
> > On Tue, Aug 3, 2021 at 6:11 PM Amit Kapila  wrote:
> > >
> > > I was trying to think based on similar counters in pg_stat_database
> > > but if you think there is value in showing errored and actually
> > > rolled-back transactions separately then we can do that, but how do you
> > > think one can make use of it?
> >
> > I'm concerned that a value that includes both errored and actually
> > rolled-back transactions doesn't make sense in practice, since the
> > number of errored transactions can easily get increased once a
> > conflict happens. IMO the errored transaction counter isn't necessarily
> > needed, since the number of (successive) errors that happened on the
> > subscription is tracked by the pg_stat_subscription_errors view.
> >
>
> It sounds awkward to display two of the xact (xact_commit,
> xact_rollback) counters in one view and then the other similar counter
> (xact_error or something like that) in another view. Isn't it better
> to display all of them together possibly in pg_stat_subscription? I
> guess it might be a bit tricky to track counters for tablesync workers
> separately but we can track them in the corresponding subscription.

I meant that the number of transactions rolled back due to an error
doesn't seem necessary, since pg_stat_subscription_errors has a
similar value. So what I imagined is that we have xact_commit and
xact_rollback (counting only actually rolled-back transactions) counters
in pg_stat_subscription.  What do you think of this idea? Or did you
mean the number of errored transactions also has value so should be
included in pg_stat_subscription along with xact_commit and
xact_rollback?

Originally, I thought your proposal for the number of rollback
transactions included both errored transactions and actually rolled-back
transactions, so my point was that it's better to separate them and
include only the actually rolled-back transactions.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Division by zero error in to_char(num, '9.9EEEE')

2021-08-04 Thread Dean Rasheed
On Fri, 30 Jul 2021 at 08:26, Dean Rasheed  wrote:
>
> SELECT to_char(1.2e-1002, '9.9EEEE'); -- fails
> ERROR:  division by zero
>
> I think the simplest
> solution is to just introduce a new local function, as in the attached
> patch. This directly constructs 10^n, for integer n, which is pretty
> trivial, and doesn't need any numeric multiplication or rounding.
>
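(For illustration, a minimal standalone sketch of that idea -- assuming a
base-10000 digit layout like numeric's NBASE/DEC_DIGITS, and not the actual
patch: a power of ten is just one non-zero digit placed at the right weight,
so no multiplication or rounding is involved.)

#include <stdio.h>

#define DEC_DIGITS 4            /* decimal digits per base-10000 digit */

int
main(void)
{
    int     n = 1002;           /* exponent of the power of ten, n >= 0 */
    int     weight = n / DEC_DIGITS;
    int     digit = 1;

    for (int i = 0; i < n % DEC_DIGITS; i++)
        digit *= 10;            /* 10^(n mod 4): 1, 10, 100 or 1000 */

    printf("10^%d = digit %d at base-10000 weight %d\n", n, digit, weight);
    return 0;
}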

Unless there are any objections, I intend to push this shortly. I
think it's a fairly straightforward bug fix, and I want to be able to
use to_char() in some new numeric regression tests.

Regards,
Dean




Possible dependency issue in makefile

2021-08-04 Thread Filip Janus
Hi all,
I am building libecpg 13.1 (13.3 behaves in the same manner), and my build
fails with:

path.c: In function 'get_html_path':
path.c:787:38: error: 'HTMLDIR' undeclared (first use in this function)
  787 | make_relative_path(ret_path, HTMLDIR, PGBINDIR, my_exec_path);
  |  ^~~
path.c:787:38: note: each undeclared identifier is reported only once
for each function it appears in
path.c: In function 'get_man_path':
path.c:796:38: error: 'MANDIR' undeclared (first use in this
function); did you mean 'MAXDIM'?
  796 | make_relative_path(ret_path, MANDIR, PGBINDIR, my_exec_path);
  |  ^~
  |  MAXDIM
make[2]: *** [: path.o] Error 1

It looks like some dependency issue, because this error is followed by:
make[2]: Entering directory '/builddir/build/BUILD/postgresql-13.1/src/port'
echo "#define PGBINDIR \"/usr/bin\"" >pg_config_paths.h
echo "#define PGSHAREDIR \"/usr/share/pgsql\"" >>pg_config_paths.h
echo "#define SYSCONFDIR \"/etc\"" >>pg_config_paths.h
echo "#define INCLUDEDIR \"/usr/include\"" >>pg_config_paths.h
echo "#define PKGINCLUDEDIR \"/usr/include/pgsql\"" >>pg_config_paths.h
echo "#define INCLUDEDIRSERVER \"/usr/include/pgsql/server\""
>>pg_config_paths.h
echo "#define LIBDIR \"/usr/lib64\"" >>pg_config_paths.h
echo "#define PKGLIBDIR \"/usr/lib64/pgsql\"" >>pg_config_paths.h
echo "#define LOCALEDIR \"/usr/share/locale\"" >>pg_config_paths.h
echo "#define DOCDIR \"/usr/share/doc//pgsql\"" >>pg_config_paths.h
echo "#define HTMLDIR \"/usr/share/doc//pgsql\"" >>pg_config_paths.h
echo "#define MANDIR \"/usr/share/man\"" >>pg_config_paths.h
make[2]: Leaving directory '/builddir/build/BUILD/postgresql-13.1/src/port'

So from my point of view, the pg_config_paths.h header file was finished
*after* it was used to compile the code.

This issue is probably caused by the parallel build, because sometimes the build succeeds.

Complete build log: log 

Regards,

-Filip-


RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-04 Thread houzj.f...@fujitsu.com
On Wednesday, August 4, 2021 1:47 PM Masahiko Sawada 
> On Mon, Aug 2, 2021 at 10:52 PM houzj.f...@fujitsu.com
>  wrote:
> >
> >
> > Hi hackers,
> >
> > When testing some other logical replication related patches, I found
> > two unexpected behaviours about ALTER SUBSCRIPTION ADD/DROP
> PUBLICATION.
> >
> > (1)
> > when I execute the following sqls[1], the data of the tables
> > registered to 'pub' wasn't copied to the subscriber side, which means
> > the tablesync worker didn't start.
> >
> > -----sub originally had 2 pub nodes (pub, pub2)-----
> > ALTER SUBSCRIPTION sub drop PUBLICATION pub;
> > ALTER SUBSCRIPTION sub add PUBLICATION pub;
> > -----
> >
> > (2)
> > And when I execute the following sqls, the data of the table registered
> > to 'pub2' is synced again.
> >
> > -----sub originally had 2 pub nodes (pub, pub2)-----
> > ALTER SUBSCRIPTION sub drop PUBLICATION pub;
> > ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> > -----
> 
> I could reproduce this issue by the above steps.

Thanks for looking into this problem.

> 
> I've not looked at the patch deeply yet but I think that the following one 
> line
> change seems to fix the cases you reported, although I might be missing
> something:
> 
> diff --git a/src/backend/commands/subscriptioncmds.c
> b/src/backend/commands/subscriptioncmds.c
> index 22ae982328..c74d6d51d6 100644
> --- a/src/backend/commands/subscriptioncmds.c
> +++ b/src/backend/commands/subscriptioncmds.c
> @@ -1068,7 +1068,7 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
> PreventInTransactionBlock(isTopLevel, "ALTER
> SUBSCRIPTION with refresh");
> 
> /* Only refresh the added/dropped list of publications. */
> -   sub->publications = stmt->publication;
> +   sub->publications = publist;
> 
> AlterSubscription_refresh(sub, opts.copy_data);
> }

I considered the same fix before, but with the above fix, it seems to refresh
both the existing publications and the new publication, which most people
didn't like in the previous discussion[1]. From the doc[2], IIRC, the ADD/DROP
PUBLICATION is currently supposed to refresh only the new publication.

[1] 
https://www.postgresql.org/message-id/MEYP282MB166921C8622675A5C0701C31B6BB0%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

[2] https://www.postgresql.org/docs/14/sql-altersubscription.html
By default, this command will also act like REFRESH PUBLICATION, except that in
case of ADD or DROP, only the added or dropped publications are refreshed.

> Also, I think we need to add some regression tests for this.
> Currently, subscription.sql has tests for ADD/DROP PUBLICATION but they don't
> check pg_subscription_rel.

Currently, subscription.sql doesn't have actual tables to
be subscribed, which means pg_subscription_rel is empty. I am not sure where
to place the test cases, so I temporarily placed them in 001_rep_changes.pl.

Best regards,
houzj


v2-0001-fix-ALTER-SUB-ADD-DROP-PUBLICATION.patch
Description: v2-0001-fix-ALTER-SUB-ADD-DROP-PUBLICATION.patch


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
Wed, 4 Aug 2021 at 07:41, Greg Nancarrow :

> On Wed, Aug 4, 2021 at 3:21 AM Robert Haas  wrote:
> >
> >The idea I sort of had floating around in my mind is a little
> >different than what Greg has implemented. I was thinking that we could
> >just skip SerializeSnapshot and the corresponding shm_toc_allocate()
> >if !IsolationUsesXactSnapshot(). Then on the restore side we could
> >just call shm_toc_lookup() with noError = true and skip
> >RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.
>
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?
>
> t/001_pgbench_with_server.pl .. 10/?
> #   Failed test 'pgbench scale 1 initialization status (got 1 vs expected
> 0)'
> #   at t/001_pgbench_with_server.pl line 108.
> ...
> # creating primary keys...
> # pgbench: fatal: query failed: ERROR:  cannot take query snapshot
> during a parallel operation
> # CONTEXT:  parallel worker
> # pgbench: query was: alter table pgbench_accounts add primary key (aid)
>
Greg, thanks for the fast response! I suppose that a check
for IsolationUsesXactSnapshot() is also useful in GetTransactionSnapshot
for the correct processing of the case with a NULL transaction snapshot.
This corrects the mentioned check-world test.
PFA v7 patch.


v7-0001-Correct-usage-of-transaction-snapshot-in-a-parall.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov  wrote:
>
>>
> Greg, thanks for the fast response! I suppose that a check for 
> IsolationUsesXactSnapshot() is also useful in GetTransactionSnapshot for
> the correct processing of the case with a NULL transaction snapshot.
> This corrects mentioned check-world test.
> PFA v7 patch.

Ah, thanks for that (I didn't debug that failure).
But is the coredump issue reproducible now? (using v7 and your test script)


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Matthias van de Meent
On Wed, 4 Aug 2021 at 03:51, Peter Geoghegan  wrote:
>
> We generate an FPI the first time a page is modified after a
> checkpoint. The FPI consists of the page *after* it has been modified.

In that case, I misremembered when FPIs were written with relation to
checkpoints. Thanks for reminding me.

The point about non-FPIs as a replacement could still be valid, except
that the point below makes this yet more unlikely.

> Presumably this optimization would need the heap page to be 100%
> empty, so we're left with what seems to me to be a very narrow target
> for optimization; something that is naturally rare.

Yes.

> A fully-empty page seems quite unlikely in the case of xl_heap_vacuum
> records, and impossible in the case of xl_heap_prune records. Even
> with all the patches, working together. Have I missed something?

No, you're correct. xl_heap_prune cannot ever empty pages, as it
leaves at least 1 dead, or 1 redirect + 1 normal, line pointer on the
page.

Furthermore, it is indeed quite unlikely that the 2nd pass of vacuum
will be the first page modification after a checkpoint; it is quite a
bit more likely that the page was first modified by the 1st vacuum
pass. Admittedly, the chance of the first modification since a checkpoint
being made by the second pass is increased by indexless tables
(however unlikely, they exist in practice) and by the autovacuum index
cleanup delay mechanisms, which allow pages with only dead item pointers
to remain that way for more than just this one vacuum run; but the chances
of fully clearing the page are indeed very, very slim.

Kind regards,

Matthias van de Meent




Re: Added schema level support for publication.

2021-08-04 Thread Amit Kapila
On Tue, Aug 3, 2021 at 8:38 PM vignesh C  wrote:
>
> Thanks for reporting this, this is fixed in the v18 patch attached.
>

I have started looking into this patch and below are some initial comments.

1.
+ /* Fetch publication name and schema oid from input list */
+ schemaname = strVal(linitial(object));
+ pubname = strVal(lsecond(object));

I think the comment should be: "Fetch schema name and publication name
from input list"

2.
@@ -3902,6 +3958,46 @@ getObjectDescription(const ObjectAddress
*object, bool missing_ok)
  break;
  }

+ case OCLASS_PUBLICATION_SCHEMA:
+ {
+ HeapTuple tup;
+ char    *pubname;
+ Form_pg_publication_schema psform;
+ char    *nspname;
+
+ tup = SearchSysCache1(PUBLICATIONSCHEMA,
+   ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ {
+ if (!missing_ok)
+ elog(ERROR, "cache lookup failed for publication schema %u",
+ object->objectId);
+ break;
+ }
+
+ psform = (Form_pg_publication_schema) GETSTRUCT(tup);
+ pubname = get_publication_name(psform->pspubid, false);
+ nspname = get_namespace_name(psform->psnspcid);
+ if (!nspname)
+ {
+ Oid psnspcid = psform->psnspcid;
+
+ pfree(pubname);
+ ReleaseSysCache(tup);
+ if (!missing_ok)
+ elog(ERROR, "cache lookup failed for schema %u",
+ psnspcid);
+ break;
+ }

The above code in getObjectDescription looks quite similar to what you
have in getObjectIdentityParts(). Can we extract the common code into
a separate function?

3. Can we use column name pubkind (similar to relkind in pg_class)
instead of pubtype? If so, please change PUBTYPE_ALLTABLES and similar
other defines to PUBKIND_*.


4.
@@ -3632,6 +3650,7 @@ typedef struct CreatePublicationStmt
  List    *options;        /* List of DefElem nodes */
  List    *tables;         /* Optional list of tables to add */
  bool     for_all_tables; /* Special publication for all tables in db */
+ List    *schemas;        /* Optional list of schemas */
 } CreatePublicationStmt;

Isn't it better to keep a schemas list after tables?

5.
@@ -1163,12 +1168,27 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
  Publication *pub = lfirst(lc);
  bool publish = false;

- if (pub->alltables)
+ if (pub->pubtype == PUBTYPE_ALLTABLES)
  {
  publish = true;
  if (pub->pubviaroot && am_partition)
  publish_as_relid = llast_oid(get_partition_ancestors(relid));
  }
+ else if (pub->pubtype == PUBTYPE_SCHEMA)
+ {
+ Oid schemaId = get_rel_namespace(relid);
+ Oid psid = GetSysCacheOid2(PUBLICATIONSCHEMAMAP,
+Anum_pg_publication_schema_oid,
+ObjectIdGetDatum(schemaId),
+ObjectIdGetDatum(pub->oid));
+
+ if (OidIsValid(psid))
+ {
+ publish = true;
+ if (pub->pubviaroot && am_partition)
+ publish_as_relid = llast_oid(get_partition_ancestors(relid));
+ }
+ }

Isn't it better to get schema publications once as we get relation
publications via GetRelationPublications and then decide whether to
publish or not? I think that will save repeated cache searches for
each publication requested by the subscriber?

6.
+ {PublicationSchemaRelationId, /* PUBLICATIONSCHEMAMAP */
+ PublicationSchemaPsnspcidPspubidIndexId,
+ 2,
+ {
+ Anum_pg_publication_schema_psnspcid,
+ Anum_pg_publication_schema_pspubid,
+ 0,
+ 0
+ },

Why don't we keep pubid as the first column in this index?

7.
getPublicationSchemas()
{
..
+ /* Get the publication membership for the schema */
+ printfPQExpBuffer(query,
+   "SELECT ps.psnspcid, ps.oid, p.pubname, p.oid AS pubid "
+   "FROM pg_publication_schema ps, pg_publication p "
+   "WHERE ps.psnspcid = '%u' "
+   "AND p.oid = ps.pspubid",
+   nsinfo->dobj.catId.oid);
..
}

Why do you need to use a join here? Can't the query and other handling
in this function be similar to what we have in getPublicationTables?
Also, there is another function GetPublicationSchemas() in the patch,
can we name one of these differently for the purpose of easy
grepability?

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2021-08-04 Thread Amit Kapila
On Wed, Aug 4, 2021 at 6:51 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Tuesday, August 3, 2021 6:03 PM vignesh C wrote:
> >
> > On Tue, Aug 3, 2021 at 12:32 PM Amit Kapila  wrote:
> > >
> > > On Tue, Aug 3, 2021 at 6:17 AM Peter Smith  wrote:
> > > >
> > > > Please find attached the latest patch set v102*
> > > >
> > >
> > > I have made minor modifications in the comments and docs, please see
> > > attached. Can you please check whether the names of contributors in
> > > the commit message are correct or do we need to change it?
> >
> > The patch applies neatly, the tests pass and documentation built with
> > the updates provided. I could not find any comments. The patch looks
> > good to me.
> >
>
> I did some stress tests on the patch and found no issues.
> It also works well when using synchronized replication.
> So the patch LGTM.
>

I have pushed this last patch in the series.

-- 
With Regards,
Amit Kapila.




Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Matthias van de Meent
On Wed, 4 Aug 2021 at 02:43, Peter Geoghegan  wrote:
>
> On Mon, Aug 2, 2021 at 11:57 PM Simon Riggs
>  wrote:
> > 2. Reduce number of line pointers to 0 in some cases.
> > Matthias - I don't think you've made a full case for doing this, nor
> > looked at the implications.
> > The comment clearly says "it seems like a good idea to avoid leaving a
> > PageIsEmpty()" page behind.
> > So I would be inclined to remove that from the patch and consider that
> > as a separate issue, or close this.
>
> This was part of that earlier commit because of sheer paranoia;
> nothing more. I actually think that it's highly unlikely to protect us
> from bugs in practice. Though I am, in a certain sense, likely to be
> wrong about "PageIsEmpty() defensiveness", it does not bother me in
> the slightest. It seems like the right approach in the absence of new
> information about a significant downside. If my paranoia really did
> turn out to be justified, then I would expect that there'd be a
> subtle, nasty bug. That possibility is what I was really thinking of.
> And so it almost doesn't matter to me how unlikely we might think such
> a bug is now, unless and until somebody can demonstrate a real
> practical downside to my defensive approach.

As I believe I have mentioned before, there is one significant
downside: 32-bit systems cannot reuse pages that contain only a
single unused line pointer for max-sized FSM requests. A fresh page
has 8168 bytes free (8 kB minus the 24-byte page header), which becomes 8164
when returned from PageGetFreeSpace (it accounts for the space used by the
line pointer needed when inserting an item onto the page).

On 64-bit systems MaxHeapTupleSize is 8160, and on 32-bit
systems MaxHeapTupleSize is 8164. When we leave one more unused
line pointer on the page, the page will have a
PageGetFreeSpace of 8160, 4 bytes less than MaxHeapTupleSize on
32-bit systems. As such, there will never be FSM entries of the
largest category for pages that have held data on those systems, so
those systems will need to add a new page for each request of the
largest category, meaning that all tuples larger than 8128 bytes
(the largest request that maps to the 254 category) will always be
put on a new page, regardless of the actual availability of free space
in the table.
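(A quick standalone check of those numbers -- my own sketch, assuming an 8 kB
block, a 24-byte page header, 4-byte line pointers, and MAXALIGN of 4 on
32-bit vs. 8 on 64-bit systems:)

#include <stdio.h>

#define BLCKSZ 8192
#define SIZE_OF_PAGE_HEADER 24
#define SIZE_OF_ITEMID 4
#define TYPEALIGN(a, len) (((len) + (a) - 1) & ~((a) - 1))

int
main(void)
{
    for (int maxalign = 4; maxalign <= 8; maxalign += 4)
    {
        int max_tuple = BLCKSZ -
            TYPEALIGN(maxalign, SIZE_OF_PAGE_HEADER + SIZE_OF_ITEMID);
        /* page holding nothing but one leftover line pointer */
        int once_used = BLCKSZ - (SIZE_OF_PAGE_HEADER + SIZE_OF_ITEMID)
            - SIZE_OF_ITEMID;

        printf("MAXALIGN %d: MaxHeapTupleSize %d, once-used PageGetFreeSpace %d\n",
               maxalign, max_tuple, once_used);
    }
    return 0;                   /* prints 8164/8160 and 8160/8160 */
}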

You might argue that this is a problem in the FSM subsystem, but in
this case it actively hinders us from reusing pages in the largest
category of FSM-requests. If you would argue 'PageGetHeapFreeSpace
should keep free line pointers in mind when calculating free space',
then I would argue 'yes, but isn't it better then to also actually
fully mark that space as unused'.

All in all, I'd just rather remove the distinction between once-used
pages and fresh pages completely by truncating the LP-array to 0 than
to leave this bloating behaviour in the system.

Kind regards,

Matthias van de Meent.




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-04 Thread Masahiko Sawada
On Wed, Aug 4, 2021 at 5:06 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, August 4, 2021 1:47 PM Masahiko Sawada 
> > On Mon, Aug 2, 2021 at 10:52 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > >
> > > Hi hackers,
> > >
> > > When testing some other logical replication related patches, I found
> > > two unexpected behaviours about ALTER SUBSCRIPTION ADD/DROP
> > PUBLICATION.
> > >
> > > (1)
> > > when I execute the following sqls[1], the data of the tables
> > > registered to 'pub' wasn't copied to the subscriber side, which means
> > > the tablesync worker didn't start.
> > >
> > > -----sub originally had 2 pub nodes (pub, pub2)-----
> > > ALTER SUBSCRIPTION sub drop PUBLICATION pub;
> > > ALTER SUBSCRIPTION sub add PUBLICATION pub;
> > > -----
> > >
> > > (2)
> > > And when I execute the following sqls, the data of the table registered
> > > to 'pub2' is synced again.
> > >
> > > -----sub originally had 2 pub nodes (pub, pub2)-----
> > > ALTER SUBSCRIPTION sub drop PUBLICATION pub;
> > > ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> > > -----
> >
> > I could reproduce this issue by the above steps.
>
> Thanks for looking into this problem.
>
> >
> > I've not looked at the patch deeply yet but I think that the following one 
> > line
> > change seems to fix the cases you reported, although I might be missing
> > something:
> >
> > diff --git a/src/backend/commands/subscriptioncmds.c
> > b/src/backend/commands/subscriptioncmds.c
> > index 22ae982328..c74d6d51d6 100644
> > --- a/src/backend/commands/subscriptioncmds.c
> > +++ b/src/backend/commands/subscriptioncmds.c
> > @@ -1068,7 +1068,7 @@ AlterSubscription(ParseState *pstate,
> > AlterSubscriptionStmt *stmt,
> > PreventInTransactionBlock(isTopLevel, "ALTER
> > SUBSCRIPTION with refresh");
> >
> > /* Only refresh the added/dropped list of publications. 
> > */
> > -   sub->publications = stmt->publication;
> > +   sub->publications = publist;
> >
> > AlterSubscription_refresh(sub, opts.copy_data);
> > }
>
> I considered the same fix before, but with the above fix, it seems to refresh
> both the existing publications and the new publication, which most people
> didn't like in the previous discussion[1]. From the doc[2], IIRC, the ADD/DROP
> PUBLICATION is currently supposed to refresh only the new publication.

What do you mean by refreshing both existing and new publications? I
dropped and created one publication out of three publications that the
subscription is subscribing to but the corresponding entries in
pg_subscription_rel for tables associated with the other two
publications were not changed and the table sync workers also were not
started for them.

>
> > Also, I think we need to add some regression tests for this.
> > Currently, subscription.sql has tests for ADD/DROP PUBLICATION but they 
> > don't
> > check pg_subscription_rel.
>
> Currently, subscription.sql doesn't have actual tables to
> be subscribed, which means pg_subscription_rel is empty. I am not sure where
> to place the test cases, so I temporarily placed them in 001_rep_changes.pl.

Right. Adding tests to 001_rep_changes.pl seems good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: speed up verifying UTF-8

2021-08-04 Thread John Naylor
I wrote:
> If we have only 16 bytes in the input, it still seems to be faster to use
> SSE, even though it's called through a function pointer on x86. I didn't
> test the DFA path, but I don't think the conclusion would be different.
> I'll include the 16 threshold next time I need to update the patch.

v22 attached, which changes the threshold to 16, with a few other cosmetic
adjustments, mostly in the comments.

--
John Naylor
EDB: http://www.enterprisedb.com


v22-0001-Add-fast-paths-for-validating-UTF-8-text.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2021-08-04 Thread Masahiko Sawada
On Tue, Aug 3, 2021 at 7:54 PM vignesh C  wrote:
>
> On Tue, Aug 3, 2021 at 12:20 PM Masahiko Sawada  wrote:
> >
> > On Mon, Aug 2, 2021 at 12:21 PM Amit Kapila  wrote:
> > >
> > > On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada 
> > > > >  wrote:
> > > > >
> > > > > Setting up logical rep error context in a generic function looks a bit
> > > > > odd to me. Do we really need to set up error context here? I
> > > > > understand we can't do this in caller but anyway I think we are not
> > > > > sending this to logical replication view as well, so not sure we need
> > > > > to do it here.
> > > >
> > > > Yeah, I'm not convinced of this part yet. I wanted to show relid also
> > > > in truncate cases but I came up with only this idea.
> > > >
> > > > If an error happens during truncating the table (in
> > > > ExecuteTruncateGuts()), relid set by
> > > > set_logicalrep_error_context_rel() is actually sent to the view. If we
> > > > don’t have it, the view always shows relid as NULL in truncate cases.
> > > > On the other hand, it doesn’t cover all cases. For example, it doesn’t
> > > > cover an error that the target table doesn’t exist on the subscriber,
> > > > which happens when opening the target table. Anyway, in most cases,
> > > > even if relid is NULL, the error message in the view helps users to
> > > > know which relation the error happened on. What do you think?
> > > >
> > >
> > > Yeah, I also think at this stage error message is sufficient in such 
> > > cases.
> >
> > I've attached new patches that incorporate all comments I got so far.
> > Please review them.
>
> I had a look at the first patch, couple of minor comments:
> 1) Should we include this in typedefs.lst
> +/* Struct for saving and restoring apply information */
> +typedef struct ApplyErrCallbackArg
> +{
> +   LogicalRepMsgType command;  /* 0 if invalid */
> +
> +   /* Local relation information */
> +   char   *nspname;
>
> 2)  We can keep the case statement in the same order as in the
> LogicalRepMsgType enum, this will help in easily identifying if any
> enum gets missed.
> +   case LOGICAL_REP_MSG_RELATION:
> +   return "RELATION";
> +   case LOGICAL_REP_MSG_TYPE:
> +   return "TYPE";
> +   case LOGICAL_REP_MSG_ORIGIN:
> +   return "ORIGIN";
> +   case LOGICAL_REP_MSG_MESSAGE:
> +   return "MESSAGE";
> +   case LOGICAL_REP_MSG_STREAM_START:
> +   return "STREAM START";
>

Thank you for reviewing the patch!

I agree with all the comments and will fix them in the next version of the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-08-04 Thread Masahiko Sawada
On Wed, Aug 4, 2021 at 1:02 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, August 3, 2021 2:49 PM Masahiko Sawada  
> wrote:
> >
> > I've attached new patches that incorporate all comments I got so far.
> > Please review them.
>
> Hi,
>
> I had a few comments for the 0003 patch.

Thanks for reviewing the patch!

>
> 1).
> -  This clause alters parameters originally set by
> -  .  See there for more
> -  information.  The parameters that can be altered
> -  are slot_name,
> -  synchronous_commit,
> -  binary, and
> -  streaming.
> +  This clause sets or resets a subscription option. The parameters that 
> can be
> +  set are the parameters originally set by  linkend="sql-createsubscription"/>:
> +  slot_name, synchronous_commit,
> +  binary, streaming.
> + 
> + 
> +   The parameters that can be reset are: streaming,
> +   binary, synchronous_commit.
>
> Maybe the doc would look better like the following?
>
> +  This clause alters parameters originally set by
> +  .  See there for more
> +  information.  The parameters that can be set
> +  are slot_name,
> +  synchronous_commit,
> +  binary, and
> +  streaming.
> + 
> + 
> +   The parameters that can be reset are: streaming,
> +   binary, synchronous_commit.

Agreed.

>
> 2).
> -   opts->create_slot = defGetBoolean(defel);
> +   if (!is_reset)
> +   opts->create_slot = defGetBoolean(defel);
> }
>
> Since we only support RESET streaming/binary/synchronous_commit, it
> might be unnecessary to add the check 'if (!is_reset)' for other
> options.

Good point.

>
> 3).
> typedef struct AlterSubscriptionStmt
> {
> NodeTag type;
> AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */
>
> Since the patch removes the enum value
> 'ALTER_SUBSCRIPTION_OPTIONS', it'd be better to change the comment here
> as well.

Agreed.

I'll incorporate those comments in the next version of the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 8:17 PM Greg Nancarrow  wrote:
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>

Er, with the v7 patch, the problem still occurs (that Assert still
fires during a run of the SubTransGetTopmostTransaction-rep.sh
script).
Stack trace from the coredump is below:

#0  0x7f06e5e7d37f in raise () from /lib64/libc.so.6
#1  0x7f06e5e67db5 in abort () from /lib64/libc.so.6
#2  0x00b06547 in ExceptionalCondition (
conditionName=0xba8d78 "TransactionIdFollowsOrEquals(xid,
TransactionXmin)", errorType=0xba8d0b "FailedAssertion",
fileName=0xba8d00 "subtrans.c", lineNumber=156) at assert.c:69
#3  0x00576f0f in SubTransGetTopmostTransaction (xid=3676) at
subtrans.c:156
#4  0x00b5f55a in XidInMVCCSnapshot (xid=3676,
snapshot=0x2e44560) at snapmgr.c:2293
#5  0x0050e014 in HeapTupleSatisfiesMVCC (htup=0x7ffdc6dee6f0,
snapshot=0x2e44560, buffer=15559)
at heapam_visibility.c:1070
#6  0x0050f148 in HeapTupleSatisfiesVisibility
(tup=0x7ffdc6dee6f0, snapshot=0x2e44560, buffer=15559)
at heapam_visibility.c:1771
#7  0x004f1d26 in heapgetpage (sscan=0x2e9a9a0, page=10310) at
heapam.c:466
#8  0x004f45cf in heapgettup_pagemode (scan=0x2e9a9a0,
dir=ForwardScanDirection, nkeys=0, key=0x0)
at heapam.c:1118
#9  0x004f4c91 in heap_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
at heapam.c:1396
#10 0x0077abb1 in table_scan_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
at ../../../src/include/access/tableam.h:1044
#11 0x0077ac79 in SeqNext (node=0x2e8f428) at nodeSeqscan.c:80
#12 0x0073d997 in ExecScanFetch (node=0x2e8f428,
accessMtd=0x77abe5 , recheckMtd=0x77ac8a )
at execScan.c:133


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
Wed, 4 Aug 2021 at 14:18, Greg Nancarrow :

> On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov 
> wrote:
> >
> >>
> > Greg, thanks for the fast response! I suppose that a check for
> > IsolationUsesXactSnapshot() is also useful in GetTransactionSnapshot for
> > the correct processing of the case with a NULL transaction snapshot.
> > This corrects mentioned check-world test.
> > PFA v7 patch.
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>
Now I've run my test script (attached above in the thread) on v6 and v7, and
quite soon got crashes with the Assert and a backtrace identical to the
original one. So it may be useful for further development, but for now it is
not enough to fix the original crash.

And the same script run on the v2/v5 patch completed without a crash at every
isolation level I've tested, i.e. READ COMMITTED, REPEATABLE READ and
SERIALIZABLE. If I remember correctly, none of us could demonstrate any
errors with REPEATABLE READ and SERIALIZABLE on v2/v5. That fact was the
basis of my proposal to commit v2/v5, i.e. to fix the obvious bug and let
further improvements (if any) be done later.

At SERIALIZABLE level with v2/v5 I get an error which I don't have before
the patch (but no crash):
pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:
 could not serialize access due to read/write dependencies among
transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
conflict out checking.


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
>
> At SERIALIZABLE level with v2/v5 I get an error which I don't have before
> the patch (but no crash):
> pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:
>  could not serialize access due to read/write dependencies among
> transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during
> conflict out checking.
>
I should correct myself: the mentioned error under SERIALIZABLE is also
present before the patch.


RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-04 Thread houzj.f...@fujitsu.com
On Wednesday, August 4, 2021 7:00 PM Masahiko Sawada  
wrote
> On Wed, Aug 4, 2021 at 5:06 PM houzj.f...@fujitsu.com 
>  wrote:
> >
> > On Wednesday, August 4, 2021 1:47 PM Masahiko Sawada 
> > >
> > > I've not looked at the patch deeply yet but I think that the
> > > following one line change seems to fix the cases you reported,
> > > although I might be missing
> > > something:
> > >
> > > diff --git a/src/backend/commands/subscriptioncmds.c
> > > b/src/backend/commands/subscriptioncmds.c
> > > index 22ae982328..c74d6d51d6 100644
> > > --- a/src/backend/commands/subscriptioncmds.c
> > > +++ b/src/backend/commands/subscriptioncmds.c
> > > @@ -1068,7 +1068,7 @@ AlterSubscription(ParseState *pstate,
> > > AlterSubscriptionStmt *stmt,
> > > PreventInTransactionBlock(isTopLevel, "ALTER
> > > SUBSCRIPTION with refresh");
> > >
> > > /* Only refresh the added/dropped list of publications. */
> > > -   sub->publications = stmt->publication;
> > > +   sub->publications = publist;
> > >
> > > AlterSubscription_refresh(sub, opts.copy_data);
> > > }
> >
> > I considered the same fix before, but with the above fix, it seems to
> > refresh both the existing publications and the new publication, which most
> > people didn't like in the previous discussion[1]. From the doc[2], IIRC,
> > the ADD/DROP PUBLICATION is currently supposed to refresh only the new
> > publication.
> 
> What do you mean by refreshing both existing and new publications? I dropped
> and created one publication out of three publications that the subscription is
> subscribing to but the corresponding entries in pg_subscription_rel for tables
> associated with the other two publications were not changed and the table sync
> workers also were not started for them.
> 

+   sub->publications = publist;
-   sub->publications = stmt->publication;
With the above fix, when doing ADD/DROP PUBLICATION, I meant that a table that
doesn't belong to the dropped or added publication could be updated in
pg_subscription_rel.

I can see the effect in the following example:

1)-publisher-
CREATE TABLE test,test2,test3
CREATE PUBLICATION pub FOR TABLE test;
CREATE PUBLICATION pub2 FOR TABLE test2;
2)-subscriber-
CREATE TABLE test,test2,test3
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=1' PUBLICATION pub,pub2;
select * from pg_subscription_rel;
-[ RECORD 1 ]-
srsubid    | 16393
srrelid    | 16387
srsubstate | r
srsublsn   | 0/150A2D8
-[ RECORD 2 ]-
srsubid    | 16393
srrelid    | 16384
srsubstate | r
srsublsn   | 0/150A310

3)-publisher-
alter publication pub add table test3;
4)-subscriber-
ALTER SUBSCRIPTION sub drop PUBLICATION pub2;
select * from pg_subscription_rel;
-[ RECORD 1 ]-
srsubid    | 16393
srrelid    | 16384
srsubstate | r
srsublsn   | 0/150A310
-[ RECORD 2 ]-
srsubid    | 16393
srrelid    | 16390
srsubstate | r
srsublsn   |

I can see that [RECORD 2], which is the table newly registered in 'pub' (see
step 3), has been added to pg_subscription_rel. Based on the previous
discussion and the documentation, that table shouldn't be refreshed when
dropping another publication.

Best regards,
houzj


Re: Failed transaction statistics to measure the logical replication progress

2021-08-04 Thread Amit Kapila
On Wed, Aug 4, 2021 at 12:35 PM Masahiko Sawada  wrote:
>
> On Wed, Aug 4, 2021 at 12:17 PM Amit Kapila  wrote:
> >
> > On Wed, Aug 4, 2021 at 6:19 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Aug 3, 2021 at 6:11 PM Amit Kapila  
> > > wrote:
> > > >
> > > > I was trying to think based on similar counters in pg_stat_database
> > > > but if you think there is value in showing errored and actually
> > > > rolled-back transactions separately then we can do that, but how do you
> > > > think one can make use of it?
> > >
> > > I'm concerned that a value that includes both errored and actually
> > > rolled-back transactions doesn't make sense in practice, since the
> > > number of errored transactions can easily get increased once a
> > > conflict happens. IMO the errored transaction counter isn't necessarily
> > > needed, since the number of (successive) errors that happened on the
> > > subscription is tracked by the pg_stat_subscription_errors view.
> > >
> >
> > It sounds awkward to display two of the xact (xact_commit,
> > xact_rollback) counters in one view and then the other similar counter
> > (xact_error or something like that) in another view. Isn't it better
> > to display all of them together possibly in pg_stat_subscription? I
> > guess it might be a bit tricky to track counters for tablesync workers
> > separately but we can track them in the corresponding subscription.
>
> I meant that the number of transactions rolled back due to an error
> doesn't seem necessary, since pg_stat_subscription_errors has a
> similar value.
>

I got that point.

> So what I imagined is that we have xact_commit and
> xact_rollback (counting only actually rolled-back transactions) counters
> in pg_stat_subscription.  What do you think of this idea? Or did you
> mean the number of errored transactions also has value so should be
> included in pg_stat_subscription along with xact_commit and
> xact_rollback?
>

I meant the latter one. I think it might be better to display all three
(xact_commit, xact_abort, xact_error) in one place. Earlier it made
sense to me to display it in pg_stat_subscription_errors, but I'm not sure
after this proposal. Won't it be better for users to see all the
counters in one view?

> Originally, I thought your proposal for the number of rollback
> transactions included both errored transactions and actually rolled-back
> transactions, so my point was that it's better to separate them and
> include only the actually rolled-back transactions.
>

I am fine with your proposal to separate the actual-rollback and error
transaction counters, but I thought it would be better to display them
in one view.

-- 
With Regards,
Amit Kapila.




Re: A varint implementation for PG?

2021-08-04 Thread Robert Haas
On Tue, Aug 3, 2021 at 3:32 PM Andres Freund  wrote:
> I am now wondering if what we're talking about here would best be thought of
> not as a variable width integer type, but a variable-width encoding for all
> pass-by-value types.
>
> Leaving on-disk compatibility aside (:)), ISTM that we by default could use
> the following heuristic to decide how to encode pass-by-value types: If it's a
> leading fixed-width NOT NULL type, store it in fixed-length
> encoding. Otherwise use a variable-length encoding.

This is pretty integer-centric, though. If your pass-by-value type is
storing timestamps, for example, they're not likely to be especially
close to zero. Since a 64-bit address is pretty big, perhaps they're
still close enough to zero that this will work out to a win, but I
don't know, that seems a bit cheesy. I grant that it could work out to
a win -- pass-by-value data types whose distribution is very different
from what's typical for integers, or for that matter columns full of
integers that all happen to be toward the extreme values the data type
can store, are probably not that common. I just don't really like
making such assumptions on a system-wide basis (as opposed to a
per-datatype basis where it's easier to reason about the
consequences).

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Robert Haas
On Tue, Aug 3, 2021 at 11:41 PM Greg Nancarrow  wrote:
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?

This is the sort of thing I was thinking about but I don't understand
why it doesn't fix the reported problem. Apparently I haven't
correctly understood what the issue is.

> >I don't know why Greg's patch is changing anything related to the
> >active snapshot (as opposed to the transaction snapshot). Maybe
> >there's a reason why we need that change, but I don't know what it is.
>
> I don't think my v2/v5 patch is changing anything related to the
> active snapshot (is it?).
> It's restoring the serialized active snapshot, installing it as the
> transaction snapshot and setting it as the active snapshot.

Why do you think it's right to install the serialized *active*
snapshot as the *transaction* snapshot? I've been operating on the
presumption that we wanted the worker to install the leader's
transaction snapshot as its transaction snapshot and the leader's
active snapshot as its active snapshot,
because in my mind the active snapshot and the transaction snapshot
are two separate things. However, there's clearly something wrong with
that line of reasoning, because apparently your proposal fixes the
problem and mine doesn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possible dependency issue in makefile

2021-08-04 Thread Tom Lane
Filip Janus  writes:
> I am building libecpg 13.1 but 13.3 behaves in the same manner and my build
> fails with:
> ...
> Complete build log: log 

It looks like you're just running configure and then trying to do this:

/usr/bin/make -O -j40 V=1 VERBOSE=1 -C src/interfaces/ecpg

which is probably not a great idea.  It bypasses the .NOTPARALLEL:
in src/Makefile, which would normally ensure that src/port gets
built before src/interfaces.  If you want to not build the entire
system, I think a minimal approach would be to make src/port,
src/common, src/interfaces/libpq, then src/interfaces/ecpg, in series.

As-is, it looks like two different sub-makes are recursing to build
pg_config_paths.h concurrently.  Since the rule for that is

pg_config_paths.h: $(top_builddir)/src/Makefile.global
echo "#define PGBINDIR \"$(bindir)\"" >$@
echo "#define PGSHAREDIR \"$(datadir)\"" >>$@
...
echo "#define HTMLDIR \"$(htmldir)\"" >>$@
echo "#define MANDIR \"$(mandir)\"" >>$@

it's not too surprising that concurrent builds misbehave.  I don't
know of any way to prevent make from doing that other than
sprinkling .NOTPARALLEL around a lot more, which would defeat
your purpose in using -j in the first place.

We could possibly adjust this specific rule to create pg_config_paths.h
atomically (say, write to a temp file and then "mv" it into place), but
I don't have any faith that there's not other similar issues behind
this one.  Building ecpg by itself is not a case that we test or consider
supported.

regards, tom lane




Release 13.1 of the PostgreSQL BuildFarm client

2021-08-04 Thread Andrew Dunstan


I have just pushed Release 13.1 of the PostgreSQL BuildFarm client.

This update to Release 13 fixes errors that occur from the new default
branch name checking code when used with versions of git that are too
old. This code is now disabled if the git version is not capable of
running it, and in verbose mode a warning is printed. The warning can be
disabled by explicitly setting "skip_git_default_check => 1" in the
config file. In either case, the owner will need to update their git
installation or remove all branch and mirror repositories when the
default branch name changes.


Downloads are available at
 and



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





RE: Added schema level support for publication.

2021-08-04 Thread tanghy.f...@fujitsu.com
On Tuesday, August 3, 2021 11:08 PM vignesh C  wrote:
> 
> Thanks for reporting this, this is fixed in the v18 patch attached.

Thanks for fixing it.

Few suggestions for V18:

1. 
+# Clean up the tables on both publisher and subscriber as we don't need them
+$node_publisher->safe_psql('postgres', "DROP SCHEMA sch1 cascade");
+$node_publisher->safe_psql('postgres', "DROP SCHEMA sch2 cascade");
+$node_publisher->safe_psql('postgres', "DROP SCHEMA sch3 cascade");
+$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch1 cascade");
+$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch2 cascade");
+$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch3 cascade");

Should we change the comment to "Clean up the schemas ... ", instead of 
'tables'?

2.
+$result = $node_subscriber->safe_psql('postgres',
+"SELECT count(*) FROM sch1.tab3");

Spaces are used here (and in some other places), but in most places we use a TAB,
so I think it's better to change them to TABs.

3.
List          *tables;         /* List of tables to add/drop */
bool           for_all_tables; /* Special publication for all tables in db */
DefElemAction  tableAction;    /* What action to perform with the tables */
+   List          *schemas;        /* Optional list of schemas */
 } AlterPublicationStmt;

Should we change the comment here to 'List of schemas to add/drop', so that it
is consistent with the comment for 'tables'?

I also noticed that the 'tableAction' variable is used when we add/drop/set
schemas, so maybe the variable name is not suitable any more.

Besides, the following comment is above this code. Should we add some comments
for schemas?

/* parameters used for ALTER PUBLICATION ... ADD/DROP TABLE */

And it says 'add/drop', do we need to add 'set'? (it's not related to this
patch, so I think I can add it in another thread[1] if needed, which is related
to comment improvement)

4.
I saw the existing tests about permissions in publication.sql, should we add
tests for schema publication? Like this:

diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
index 33dbdf7bed..c19337631e 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -160,16 +160,19 @@ GRANT CREATE ON DATABASE regression TO regress_publication_user2;
 SET ROLE regress_publication_user2;
 SET client_min_messages = 'ERROR';
 CREATE PUBLICATION testpub2;  -- ok
+CREATE PUBLICATION testpub3;  -- ok
 RESET client_min_messages;

 ALTER PUBLICATION testpub2 ADD TABLE testpub_tbl1;  -- fail
+ALTER PUBLICATION testpub3 ADD SCHEMA pub_test;  -- fail

 SET ROLE regress_publication_user;
 GRANT regress_publication_user TO regress_publication_user2;
 SET ROLE regress_publication_user2;
 ALTER PUBLICATION testpub2 ADD TABLE testpub_tbl1;  -- ok
+ALTER PUBLICATION testpub3 ADD SCHEMA pub_test;  -- ok

-DROP PUBLICATION testpub2;
+DROP PUBLICATION testpub2, testpub3;

 SET ROLE regress_publication_user;
 REVOKE CREATE ON DATABASE regression FROM regress_publication_user2;


[1] 
https://www.postgresql.org/message-id/flat/OS0PR01MB6113480F937572BF1216DD61FBEF9%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Regards
Tang


Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Robert Haas
On Tue, Aug 3, 2021 at 8:44 PM Peter Geoghegan  wrote:
> This time it's quite different: we're truncating the line pointer
> array during pruning. Pruning often occurs opportunistically, during
> regular query processing. In fact I'd say that it's far more common
> than pruning by VACUUM. So the chances of line pointer array
> truncation hurting rather than helping seems higher.

How would it hurt?

It's easy to see the harm caused by not shortening the line pointer
array when it is possible to do so: we're using up space in the page
that could have been made free. It's not so obvious to me what the
downside of shortening it might be. I suppose there is a risk that we
shorten it and get no benefit, or even shorten it and have to lengthen
it again almost immediately. But neither of those things really
matters unless shortening is expensive. If we can piggy-back it on an
existing WAL record, I don't really see what the problem is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RFC: Improve CPU cache locality of syscache searches

2021-08-04 Thread John Naylor
CPU caches have multiple levels, so I had an idea to use that concept in
the syscaches.

Imagine if catcache buckets were cacheline-sized. In that case, we would
store the most recently accessed hash values and pointers to catctup, in
addition to the dlist_head:

typedef struct cc_bucket
{
  uint32      hashes[4];
  CatCTup    *ct[4];
  dlist_head  list;
} cc_bucket;

You can think of the bucket here as a 4-way set associative L1 and the list
walk as an inclusive L2. There might be an existing term for this scheme,
but I don't know what it is offhand.

Creating entries:

Instead of calling dlist_push_head(), we call dlist_push_tail() and then
stash the entry in the L1 so it's still quickly available if it's accessed
in the near future. This way, older entries evicted from L1 will tend to
remain toward the front of the list. Walking the list will always go from
oldest to newest, which might have better prefetch behavior (not sure).

Search:

Buckets with <= 4 entries would only need to access a single cacheline to
get the catctup pointer with the matching hash value. If we have to walk
the list to find a match, we stash it in the L1, which is faster than
calling dlist_move_head().
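A rough sketch of that probe, reusing the cc_bucket above (my names, not an
actual patch; the caller would still compare the full key on a hash match):

static inline CatCTup *
cc_bucket_probe_l1(cc_bucket *bucket, uint32 hash)
{
  /* check the four cached slots before falling back to the list walk */
  for (int i = 0; i < 4; i++)
  {
    if (bucket->ct[i] != NULL && bucket->hashes[i] == hash)
      return bucket->ct[i];
  }
  return NULL;          /* L1 miss: walk bucket->list as today */
}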

L1 Eviction:

When putting an entry here, we memmove() everything down in each array and
place the pointer and the hash value in the front, evicting the fourth
(possibly NULL) entry.
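Again just a sketch on the same assumed struct -- memmove() handles the
overlapping ranges, and whatever sat in slot 3 simply falls off the end:

static inline void
cc_bucket_stash_l1(cc_bucket *bucket, uint32 hash, CatCTup *ct)
{
  /* shift slots 0..2 down to 1..3, then put the new entry in front */
  memmove(&bucket->hashes[1], &bucket->hashes[0], 3 * sizeof(uint32));
  memmove(&bucket->ct[1], &bucket->ct[0], 3 * sizeof(CatCTup *));
  bucket->hashes[0] = hash;
  bucket->ct[0] = ct;
}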

The buckets would now be 4 times larger on 64-bit machines, but that might
not be a problem if we just divide the initial bucket size by four as well.
If the minimum nbuckets is 2, then the smaller caches would waste a bit of
space, but it doesn't seem too bad. As far as when to rehash the bucket
array, the fill factor would logically jump from 2 to 8. The worst-case
list walk would be longer, but with a better overall memory access pattern,
it might be fine.

I think it would be straightforward to code this up -- the difficulty is
testing and accounting for random effects due to binary layout changes.
Thoughts?

-- 
John Naylor
EDB: http://www.enterprisedb.com


Re: A varint implementation for PG?

2021-08-04 Thread Andres Freund
Hi,

On 2021-08-04 09:31:25 -0400, Robert Haas wrote:
> This is pretty integer-centric, though. If your pass-by-value type is
> storing timestamps, for example, they're not likely to be especially
> close to zero. Since a 64-bit address is pretty big, perhaps they're
> still close enough to zero that this will work out to a win, but I
> don't know, that seems a bit cheesy.

Yea, that's fair. The really bad™ example probably is negative numbers - which
wouldn't be easy to do something about in a datatype agnostic way.


> I grant that it could work out to a win -- pass-by-value data types whose
> distribution is very different from what's typical for integers, or for that
> matter columns full of integers that all happen to be toward the extreme
> values the data type can store, are probably not that common.

It'd work out as a wash for common timestamps:

./varint_test -u 681413261095983
processing unsigned
unsigned:   681413261095983
  input bytes:   00 02  6b bd  e3 5f  74 2f
8 output bytes:  01 02  6b bd  e3 5f  74 2f
decoded:    681413261095983

I don't think there's many workloads where plain integers would skew extreme
enough for it to work out to a loss often enough to matter. But:

> I just don't really like making such assumptions on a system-wide basis (as
> opposed to a per-datatype basis where it's easier to reason about the
> consequences).

I'd not at all be opposed to datatypes having influence over the on-disk
encoding. I was just musing about a default heuristic that could make sense. I
do think you'd want something that chooses the encoding for one pg_attribute's
values based on the preceding columns.

Greetings,

Andres Freund




Re: Commitfest overflow

2021-08-04 Thread Pavel Borisov
>
> I think there might be a higher number of work-in-progress patches
> these days, which represent ongoing collaborative efforts, and are not
> expected to be committed soon, but are registered to attract the
> attention of humans and robots.  Perhaps if there were a separate
> status for that, it would be clearer.  Or perhaps they don't belong in
> the "commit" fest.
>

I totally support this view of the CF as a better way to track and process
ongoing community activity in one place than the mailing list. The fact
is that complicated patches need many CFs to be reviewed, improved and
committed. I'd vote for a not-so-strict approach to what should be in a CF and
what should not. But probably some prioritization inside the CF is indeed
needed.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: A varint implementation for PG?

2021-08-04 Thread Andres Freund
Hi,

On 2021-08-03 14:26:16 -0400, Robert Haas wrote:
> [ resurrecting this 2-year-old thread ]
>
> On Fri, Dec 13, 2019 at 12:45 AM Andres Freund  wrote:
> > > If baking a new variant integer format now, I think limiting it to 64 bits
> > > is probably a mistake given how long-lived PostgreSQL is, and how hard it
> > > can be to change things in the protocol, on disk, etc.
> >
> > I don't think it's ever going to be sensible to transport 64bit quanta
> > of data. Also, uh, it'd be larger than the data a postgres instance
> > could really contain, given LSNs are 64 bit.
>
> We already use 128-bit integers in some places in the source code, and
> it seems more likely than not that use of 128-bit integer data types
> will continue to expand over time. If we want to represent 64 bit
> integers in a variable number of bits today, it does seem pretty
> likely that some day we will want to do the same thing with 128 bit
> integers, and maybe eventually larger. And I think that all of that is
> true even though exhausting a 64-bit LSN space looks difficult today
> and will likely remain difficult for the foreseeable future.

I was thinking a bit about how to encode arbitrary length values for the cases
where that's interesting.

Currently what I proposed for 8 byte unsigned integers is that we encode the
length in unary unset bits, followed by a set bit. As a special case, a prefix
of 8 unset bits indicates a total length of 9 bytes, and no separator bit is
needed at that point.
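To make that concrete, here is how I read the 8-byte unsigned encoding, as a
rough standalone sketch (my toy code, not the actual patch; it assumes
gcc/clang's __builtin_clzll):

#include <stdint.h>

/* returns the number of bytes written; out must have room for 9 bytes */
static int
varint_encode_uint64(uint64_t value, uint8_t *out)
{
    int     bits = 64 - __builtin_clzll(value | 1); /* significant bits */
    int     len = (bits + 6) / 7;                   /* 7 data bits per byte */

    if (len >= 9)
    {
        /* special case: 8 unset prefix bits, then the raw value big-endian */
        out[0] = 0;
        for (int i = 0; i < 8; i++)
            out[1 + i] = (uint8_t) (value >> (8 * (7 - i)));
        return 9;
    }
    else
    {
        /* len-1 unset bits, one set separator bit, then 7*len data bits */
        uint64_t    encoded = value | ((uint64_t) 1 << (7 * len));

        for (int i = 0; i < len; i++)
            out[i] = (uint8_t) (encoded >> (8 * (len - 1 - i)));
        return len;
    }
}

For 681413261095983 this produces the same "01 02 6b bd e3 5f 74 2f" as the
varint_test output elsewhere in the thread; decoding mirrors it by counting
the leading unset bits of the first byte and masking off the separator bit.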

Extending that to arbitrary lengths obviously at some point makes the encoding
in unary wasteful, and the benefit of few branches vanishes. So what I was
thinking is that for variable length pieces of data that are not limited to 8
bytes, we could replace the '8 0 bits' special case with a new special case:
The length in bytes follows as a max-8-byte varint.

That'd leave us with the following overheads:
- 0 - 127: 0 bytes (i.e. 0 to 7 bits)
- 128 - 2^56 - 1: 1 byte (i.e. 7 bits - 7 bytes)
- 7 bytes - 127 bytes: 2 bytes
- 128 bytes - 16383 bytes: 3 bytes
- 16384 bytes - 2097151 bytes: 4 bytes
- 2097152 bytes - 268435455 bytes: 5 bytes
- 268435456 bytes - 34359738367 bytes: 6 bytes
- ...

The obvious alternative would of course be to just always store the length
prefix separately:
- 0 - 127 bytes: 1 byte
- 128 - 16383 bytes: 2 bytes
- 16384 - 2097151 bytes: 3 bytes
- ...

I do suspect that for a fair number of cases the "0 byte overhead" for very
small values would be worth the increase in overhead later. Particularly
because the decoding for values up to 7 bytes would be cheaper cpu-wise as
well.

Greetings,

Andres Freund




Re: Use POPCNT on MSVC

2021-08-04 Thread John Naylor
On Tue, Aug 3, 2021 at 11:36 PM David Rowley  wrote:
>
> On Tue, 3 Aug 2021 at 22:43, John Naylor 
> wrote:
> > 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which
> > is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/
> > would help it look better. But then looking around, other platforms have
> > intrinsics coded, but for some reason they're put in pg_popcount64_slow(),
> > where the compiler will emit instructions we could easily write ourselves
> > in C (and without #ifdefs) since without the right CFLAGS these intrinsics
> > won't emit a popcnt instruction. Is MSVC different in that regard? If so,
> > it might be worth a comment.
>
> Yeah, the names no longer fit so well after stuffing the MSVC
> intrinsic into the asm function.  The reason I did it that way is down
> to what I read in the docs. Namely:
>
> "If you run code that uses these intrinsics on hardware that doesn't
> support the popcnt instruction, the results are unpredictable."
>
> So, it has to go somewhere where we're sure the CPU supports POPCNT
> and that seemed like the correct place.
>
> From what I understand of GCC and __builtin_popcountl(), the code
> it'll output will depend on the -mpopcnt flag.  So having
> __builtin_popcountl() does not mean the popcnt instruction will be
> used. The Microsoft documentation indicates that's not the case for
> __popcnt().

Okay, "unpredictable" sounds bad.

> > 2. (defined(_MSC_VER) && defined(_WIN64)  lead to a runtime check for
the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks
strange because the latter symbol comes from a specific configure test --
the two don't seem equivalent, but maybe they are because of what MSVC
does? That would be nice to spell out here.
>
> The problem there is that we define HAVE_X86_64_POPCNTQ based on the
> outcome of configure so it does not get set for MSVC.  Maybe it's
> worth coming up with some other constant that can be defined or we
> could just do:
>
> #if defined(_MSC_VER) && defined(_WIN64)
> #define HAVE_X86_64_POPCNTQ
> #endif

That seems fine. I don't know whether PG can build with Arm on Windows, but for
the cpuid to work, it seems safer to also check for __x86_64?
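
Something along these lines, say (just a sketch; note that MSVC spells the
architecture test _M_X64/_M_AMD64 rather than __x86_64__):

#if defined(_MSC_VER) && defined(_WIN64) && \
	(defined(_M_AMD64) || defined(_M_X64))
#define HAVE_X86_64_POPCNTQ
#endif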

> I think the reason for the asm is that __builtin_popcountl does not
> mean popcnt will be used. Maybe we could have done something like
> compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
> but the problem there is that means that the compiler might end up
> using that instruction in some other function in that file that we
> don't want it to. It looks like my patch in [1] did pass the -mpopcnt
> flag which Tom fixed.

Ah, that makes sense. (If we someday offer a configure option for
x86-64-v2, we can use intrinsics in the asm functions, and call them
directly. Yet another different thread.)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Simon Riggs
On Wed, 4 Aug 2021 at 01:43, Peter Geoghegan  wrote:
>
> On Mon, Aug 2, 2021 at 11:57 PM Simon Riggs
>  wrote:
> > 1. Allow same thing as PageTruncateLinePointerArray() during HOT cleanup
> > That is going to have a clear benefit for HOT workloads, which by
> > their nature will use a lot of line pointers.
>
> Why do you say that?

Truncating line pointers can make extra space on the page, so it could
be the difference between a HOT and a non-HOT update. My understanding
is that these just-in-time actions have a beneficial effect in other
circumstances, so we can do that here also.

If we truncate line pointers more frequently then the root pointers
will tend to be lower in the array, which will make truncation even
more effective.

> > Many applications are updated much more frequently than they are vacuumed.
> > Peter - what is your concern about doing this more frequently? Why
> > would we *not* do this?
>
> What I meant before was that this argument worked back when we limited
> the technique to VACUUM's second heap pass. Doing line pointer array
> truncation at that point alone meant that it only ever happened
> outside of VACUUM proper. Prior to that point we literally did nothing
> about LP_UNUSED items at the end of each line pointer array, so we
> were going from doing nothing to doing something.
>
> This time it's quite different: we're truncating the line pointer
> array during pruning. Pruning often occurs opportunistically, during
> regular query processing. In fact I'd say that it's far more common
> than pruning by VACUUM. So the chances of line pointer array
> truncation hurting rather than helping seems higher.

If the only thing we do to a page is truncate line pointers then it
may not be worth it. But dirtying a page to reclaim space is also the
precise time when reclaiming line pointers makes sense. So if the
page is dirtied by cleaning, then that is the time to reclaim line
pointers as well.

Again, why would we reclaim space to avoid bloat but then ignore any
line pointer bloat?

It's not clear why truncating unused line pointers would break DDL.

--
Simon Riggs    http://www.EnterpriseDB.com/




Re: A varint implementation for PG?

2021-08-04 Thread Robert Haas
On Wed, Aug 4, 2021 at 3:01 PM Andres Freund  wrote:
> Extending that to arbitrary lengths obviously at some point makes the encoding
> in unary wasteful, and the benefit of few branches vanishes. So what I was
> thinking is that for variable length pieces of data that are not limited to 8
> bytes, we could replace the '8 0 bits' special case with a new special case:
> The length in bytes follows as a max-8-byte varint.

But what if I have a machine with more than 16 exabytes of RAM and I
want to use all of its memory to store one really big integer?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: RFC: Improve CPU cache locality of syscache searches

2021-08-04 Thread Andres Freund
Hi,

On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> CPU caches have multiple levels, so I had an idea to use that concept in
> the syscaches.

I do think we lose a good bit of syscache efficiency in real workloads, so I
think it's worth investing time into it.

However:

> Imagine if catcache buckets were cacheline-sized. In that case, we would
> store the most recently accessed hash values and pointers to catctup, in
> addition to the dlist_head:
>
> typedef struct cc_bucket
> {
>   uint32 hashes[4];
>   catctup *ct[4];
>   dlist_head;
> };

I'm not convinced that the above is the right idea though. Even if the hash
matches, you're still going to need to fetch at least catctup->keys[0] from
a separate cacheline to be able to return the cache entry.

If the hashes we use are decent and we're resizing at reasonable thresholds,
we shouldn't constantly look up values that are later in a collision chain -
particularly because we move cache hits to the head of the bucket. So I don't
think a scheme as you described would actually result in a markedly better
cache hit ratio?

ISTM that something like

struct cc_bucket_1
{
uint32 hashes[3]; // 12
// 4 bytes alignment padding
Datum key0s[3]; // 24
catctup *ct[3]; // 24
// cacheline boundary
dlist_head conflicts; // 16
};

would be better for 1-key values?
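
Schematically, a probe over such a bucket would be something like the below
(names follow the struct above; the == on the Datum stands in for the per-key
equality function the real code would have to use):

static catctup *
cc_bucket_probe(struct cc_bucket_1 *b, uint32 hash, Datum key0)
{
	for (int i = 0; i < 3; i++)
	{
		/* empty slots assumed to have NULL ct pointers */
		if (b->ct[i] != NULL && b->hashes[i] == hash && b->key0s[i] == key0)
			return b->ct[i];
	}

	/* fall back to walking the 'conflicts' dlist, as today */
	return NULL;
}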

It's obviously annoying to need different bucket types for different key
counts, but given how much 3 unused key Datums waste, it seems worth paying
for?

One issue with stuff like this is that aset.c won't return cacheline aligned
values...


It's possible that it'd be better to put the catctup pointers onto a
neighboring cacheline (so you still get TLB benefits, as well as making it easy
for prefetchers) and increase the number of hashes/keys that fit on one
cacheline.

If we stuffed four values into one bucket we could potentially SIMD the hash
and Datum comparisons ;)

Greetings,

Andres Freund




Re: A varint implementation for PG?

2021-08-04 Thread Andres Freund
On 2021-08-04 15:37:36 -0400, Robert Haas wrote:
> On Wed, Aug 4, 2021 at 3:01 PM Andres Freund  wrote:
> > Extending that to arbitrary lengths obviously at some point makes the 
> > encoding
> > in unary wasteful, and the benefit of few branches vanishes. So what I was
> > thinking is that for variable length pieces of data that are not limited to 
> > 8
> > bytes, we could replace the '8 0 bits' special case with a new special case:
> > The length in bytes follows as a max-8-byte varint.
> 
> But what if I have a machine with more than 16 exabytes of RAM and I
> want to use all of its memory to store one really big integer?

Then the embedded 8 byte length value would just have to do the same thing
recursively to store that huge length header :)




Re: straightening out backend process startup

2021-08-04 Thread Robert Haas
On Mon, Aug 2, 2021 at 12:41 PM Andres Freund  wrote:
> - AuxiliaryProcessMain() is used for two independent tasks: Start bootstrap /
>   checker mode and starting auxiliary processes. In HEAD there's maybe 5 lines
>   out of 250 that are actually common to both uses.
>
>   A related oddity is that we reserve shared memory resources for bootstrap &
>   checker aux processes, despite those never existing.
>
>   This is addressed in patches 1-7

This all looks pretty mechanical and, I would guess, not very controversial.

> - The order of invocation of InitProcess()/InitAuxiliaryProcess() and
>   BaseInit() depends on EXEC_BACKEND. Due to that there often is no single
>   place initialization code can be put if it needs any locks.
>
>   This is addressed in patches 8-9
>
> - PostgresMain() has code for single user and multi user interleaved, making
>   it unnecessarily hard to understand what's going on.
>
>   This is addressed in patches 10

This stuff I'd need to study more in order to have an intelligent opinion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: A varint implementation for PG?

2021-08-04 Thread Robert Haas
On Wed, Aug 4, 2021 at 3:46 PM Andres Freund  wrote:
> > But what if I have a machine with more than 16 exabytes of RAM and I
> > want to use all of its memory to store one really big integer?
>
> Then the embedded 8 byte length value would just have to do the same thing
> recursively to store that huge length header :)

Well, yes. But more seriously, my point is that I can't imagine why we
would need an object with a length bounded by 2^64. I mean I suppose
there's no harm in looking to the future, but that's *really big*.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: A varint implementation for PG?

2021-08-04 Thread Tomas Vondra




On 8/4/21 9:01 PM, Andres Freund wrote:

Hi,

On 2021-08-03 14:26:16 -0400, Robert Haas wrote:

[ resurrecting this 2-year-old thread ]

On Fri, Dec 13, 2019 at 12:45 AM Andres Freund  wrote:

If baking a new variant integer format now, I think limiting it to 64 bits
is probably a mistake given how long-lived PostgreSQL is, and how hard it
can be to change things in the protocol, on disk, etc.


I don't think it's ever going to be sensible to transport 64bit quanta
of data. Also, uh, it'd be larger than the data a postgres instance
could really contain, given LSNs are 64 bit.


We already use 128-bit integers in some places in the source code, and
it seems more likely than not that use of 128-bit integer data types
will continue to expand over time. If we want to represent 64 bit
integers in a variable number of bits today, it does seem pretty
likely that some day we will want to do the same thing with 128 bit
integers, and maybe eventually larger. And I think that all of that is
true even though exhausting a 64-bit LSN space looks difficult today
and will likely remain difficult for the foreseeable future.


I was thinking a bit about how to encode arbitrary length values for the cases
where that's interesting.

Currently what I proposed for 8 byte unsigned integers is that we encode the
length in unary unset bits, followed by a set bit. As a special case, a prefix
of 8 bits indicates a length of 9, without needing the separator bit - we
don't need a separator bit at that point.



How is that better than the two varint flavors that are already out 
there, i.e. the bitcoin [1] and protocol buffers [2]?


The first one seems quite efficient in how it encodes the length into 
very few bits (which matters especially for small values). It's designed 
for integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary 
lengths fairly easily, I think:


Look at the first byte, and

0 - 243 - encoded as is
244 - 1 byte
245 - 2 bytes
246 - 3 bytes
247 - 4 bytes
248 - 5 bytes
249 - 6 bytes
250 - 7 bytes
251 - 8 bytes
252 - next 1 byte is length
253 - next 2 bytes are length
254 - next 3 bytes are length
255 - next 4 bytes are length

If we want to support longer lengths, we'd have to reserve an extra 
value (which reduces the number of values that require a single byte).


The [2] is a bit more problematic, as it's tailored for very short 
values (essentially wasting 1/8 of bits) and you have to parse the whole 
value to determine the length.


[1] https://wiki.bitcoinsv.io/index.php/VarInt
[2] https://developers.google.com/protocol-buffers/docs/encoding


Extending that to arbitrary lengths obviously at some point makes the encoding
in unary wasteful, and the benefit of few branches vanishes. So what I was
thinking is that for variable length pieces of data that are not limited to 8
bytes, we could replace the '8 0 bits' special case with a new special case:
The length in bytes follows as a max-8-byte varint.

That'd leave us with the following overheads:
- 0 - 127: 0 bytes (i.e. 0 to 7 bits)
- 128 - 2^56 - 1: 1 byte (i.e. 7 bits - 7 bytes)
- 7 bytes - 127 bytes: 2 bytes
- 128 bytes - 16383 bytes: 3 bytes
- 16384 bytes - 2097151 bytes: 4 bytes
- 2097152 bytes - 268435455 bytes: 5 bytes
- 268435456 bytes - 34359738367 bytes: 6 bytes
- ...

The obvious alternative would of course be to just always store the length
prefix separately:
- 0 - 127 bytes: 1 byte
- 128 - 16383 bytes: 2 bytes
- 16384 - 2097151 bytes: 3 bytes
- ...

I do suspect that for a fair number of cases the "0 byte overhead" for very
small values would be worth the increase in overhead later. Particularly
because the decoding for values up to 7 bytes would be cheaper cpu-wise as
well.



Yeah. Especially if the long values can be compressed, which probably 
applies to most real-world data sets. IMHO the efficiency for short 
values is the more important thing.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-08-04 Thread Paul Martinez
On Wed, Jul 14, 2021 at 6:51 AM Peter Eisentraut
 wrote:
> I think this is an interesting feature with a legitimate use case.

Great! I'm glad to hear that.

> Consider what should happen when you update users.id.  Per SQL standard,
> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
> referencing column that corresponds to the referenced column actually
> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
> then this would work just fine.
>
> ...
>
> So, unless I'm missing an angle here, I would suggest leaving out the ON
> UPDATE variant of this feature.

I understand what you're saying here, but are you sure that is the
behavior specified in the SQL standard?

I can't find a copy of a more recent standard, but at least in SQL 1999 [1],
it has this to say (page 459 of the linked PDF, page 433 of the standard):

> 8) If a non-null value of a referenced column in the referenced table is
>updated to a value that is distinct from the current value of that
>column, then for every member F of the subtable family of the
>referencing table:
>
>   Case:
>   a) If SIMPLE is specified or implicit, or if FULL is specified, then
>
> Case:
> ii) If the  specifies SET NULL, then
>
>   Case:
>   1) If SIMPLE is specified or implicit, then:
>
> A) 
>
> B) For every F, in every matching row in F, each referencing
> column in F that corresponds with a referenced column is set to
> the null value.

This phrasing doesn't seem to make any reference to which columns were
actually updated.

The definitions a few pages earlier do explicitly define the set of
updated columns ("Let UMC be the set of referencing columns that
correspond with updated referenced columns"), but that defined set is
only referenced in the behavior when PARTIAL is specified, implying that
the set of updated columns is _not_ relevant in the SIMPLE case.


If my understanding of this is correct, then Postgres _isn't_ out of spec,
and this extension still works fine for the ON UPDATE triggers. (But, wow,
this spec is not easy to read, so I could definitely be wrong.)

I'm not sure how MATCH PARTIAL fits into this; I know it's
unimplemented, but I don't know what its use cases are, and I don't
think I understand how ON DELETE / ON UPDATE should work with MATCH
PARTIAL, let alone how they would work combined with this extension.

This lack of clarity may be a good-enough reason to leave out the ON
UPDATE case.

On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed  wrote:
> Patch does not apply on head, I am marking the status "Waiting on author"
> http://cfbot.cputube.org/patch_33_2932.log

I've rebased my original patch and it should work now. This is
referential-actions-set-cols-v2.patch.

I've also created a second patch that leaves out the ON UPDATE behavior, if
we think that's the safer route. This is
referential-actions-on-delete-only-set-cols-v1.patch.

[1]: http://web.cecs.pdx.edu/~len/sql1999.pdf

On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed  wrote:
>
>
>
> On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut 
>  wrote:
>>
>>
>> On 05.01.21 22:40, Paul Martinez wrote:
>> > I've created a patch to better support referential integrity constraints 
>> > when
>> > using composite primary and foreign keys. This patch allows creating a 
>> > foreign
>> > key using the syntax:
>> >
>> >FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL 
>> > (fk_id)
>> >
>> > which means that only the fk_id column will be set to NULL when the 
>> > referenced
>> > row is deleted, rather than both the tenant_id and fk_id columns.
>>
>> I think this is an interesting feature with a legitimate use case.
>>
>> I'm wondering a bit about what the ON UPDATE side of this is supposed to
>> mean.  Consider your example:
>>
>> > CREATE TABLE tenants (id serial PRIMARY KEY);
>> > CREATE TABLE users (
>> >tenant_id int REFERENCES tenants ON DELETE CASCADE,
>> >id serial,
>> >PRIMARY KEY (tenant_id, id),
>> > );
>> > CREATE TABLE posts (
>> >  tenant_id int REFERENCES tenants ON DELETE CASCADE,
>> >  id serial,
>> >  author_id int,
>> >  PRIMARY KEY (tenant_id, id),
>> >  FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
>> > );
>> >
>> > INSERT INTO tenants VALUES (1);
>> > INSERT INTO users VALUES (1, 101);
>> > INSERT INTO posts VALUES (1, 201, 101);
>> > DELETE FROM users WHERE id = 101;
>> > ERROR:  null value in column "tenant_id" violates not-null constraint
>> > DETAIL:  Failing row contains (null, 201, null).
>>
>> Consider what should happen when you update users.id.  Per SQL standard,
>> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
>> referencing column that corresponds to the referenced column actually
>> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
>> then this would work just fine.
>>
>> Your feature requires specifying a fixed column or columns to update, so
>> it cannot 

Bug fix for cache lookup failure for statistic_ext type

2021-08-04 Thread Mark Dilger
Hackers,

You can easily get a cache lookup failure by changing the regression tests as 
included in this small patch.  The failure looks thus:

+COMMENT ON STATISTICS ab1_a_b_stats IS 'new comment';
+CREATE ROLE temp_role;
+SET SESSION AUTHORIZATION temp_role;
+COMMENT ON STATISTICS ab1_a_b_stats IS 'changed comment';
+ERROR:  cache lookup failed for type 27447
+DROP STATISTICS ab1_a_b_stats;
+ERROR:  cache lookup failed for type 27447
+ALTER STATISTICS ab1_a_b_stats RENAME TO ab1_a_b_stats_new;
+ERROR:  must be owner of statistics object ab1_a_b_stats
+RESET SESSION AUTHORIZATION;
+DROP ROLE temp_role;

I believe this case simply has not had any test coverage, as I don't see any 
way the current code would ever work.  It treats the Oid of the statistics 
object as a type, which it is not.



v1-0001-Fix-cache-lookup-error-in-ownership-check.patch
Description: Binary data



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Another regexp performance improvement: skip useless paren-captures

2021-08-04 Thread Tom Lane
Here's a little finger exercise that improves a case that's bothered me
for awhile.  In a POSIX regexp, parentheses cause capturing by default;
you have to write the very non-obvious "(?:...)" if you don't want the
matching substring to be reported by the regexp engine.  That'd be fine
if capturing were cheap, but with our engine it is not particularly
cheap.  In many situations, the initial DFA check is sufficient to
tell whether there is an overall match, but it does not tell where any
subexpression match boundaries are.  To identify exactly which substring
is deemed to match a parenthesized subexpression, we have to recursively
break down the match, which takes at the very least a few more DFA
invocations; and with an uncooperative regex, it can easily result in
O(N^2) behavior where there was none at the DFA stage.

Therefore, we really ought to expend some effort to not capture
subexpressions if the sub-match data is not actually needed, which in
many invocations we know that it isn't.  Spencer's original code has
a REG_NOSUB option that looks like it ought to be good for this ... but
on closer inspection it's basically useless, because it turns *all*
parens into non-capturing ones.  That breaks back-references, so unless
you know that the regexp contains no back-refs, you can't use it.

The attached proposed patch redefines REG_NOSUB as being a regexp-
compile-time promise that the caller doesn't care about sub-match
locations, but not a promise that no backrefs exist.  (If the
caller passes a match-locations array at execution anyway, it will
just get back -1 values, as if no sub-match had been identified.)
If that flag is passed, we run through the completed sub-regexp
tree and remove the "capture" markers on any subREs that are
not actually referenced by some backref.  This typically causes
some parent subREs to no longer be deemed "messy", so that their
separate child subREs can be thrown away entirely, saving memory
space as well as runtime.
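
In essence that pass is just a recursive walk that clears the capture marker
wherever no backref targets it; schematically (struct and flag names here are
invented for illustration, they are not the real regex-engine fields):

static void
strip_unreferenced_captures(struct sketch_subre *t, const bool *backrefd)
{
	if (t == NULL)
		return;
	if (t->capno > 0 && !backrefd[t->capno])
		t->flags &= ~SKETCH_CAPTURE;	/* demote to non-capturing */
	strip_unreferenced_captures(t->child, backrefd);
	strip_unreferenced_captures(t->sibling, backrefd);
}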

(I'd originally thought that a much more complex patch would be
needed to do this, because I assumed that re-optimizing the subRE
tree would be much more complicated than this.  However, as far
as I can see this is sufficient; this change doesn't expose any
cases where additional tree restructuring would be helpful.)

Testing with Joel's handy little corpus of web regexps, there's a
useful improvement of the speed of ~ operators (a/k/a regexp_like()).
I see the total time to apply regexp_like() to all 4474520 entries
dropping from 10:17 to 5:46.  Interesting statistics include

regexp=# select max(duration),avg(duration) from headresults;
   max   |   avg   
-+-
 00:00:00.939389 | 00:00:00.000138
(1 row)

regexp=# select max(duration),avg(duration) from patchresults;  
   max   |   avg   
-+-
 00:00:00.918549 | 00:00:00.77
(1 row)

The lower percentiles don't move much, but upper ones do:

regexp=# select percentile_cont(array[0.5,0.75,0.8,0.9]) within group(order by 
duration) from headresults;
  percentile_cont  
---
 {00:00:00.27,00:00:00.59,00:00:00.67,00:00:00.000108}
(1 row)

regexp=# select percentile_cont(array[0.5,0.75,0.8,0.9]) within group(order by 
duration) from patchresults;
  percentile_cont  
---
 {00:00:00.25,00:00:00.42,00:00:00.48,00:00:00.65}
(1 row)

This isn't terribly surprising, because regexps that were already
really cheap probably have no capturing parens to dispense with.

Of course, there's no benefit with functions that do need sub-match
data, such as regexp_match.  But the added overhead in such cases
should be quite negligible.  The only downside I can see is that
if you use the "same" regexp in both submatches-needed and
non-submatches-needed contexts, you'll end up with two separate
compiled regexp cache entries.  That doesn't seem like a big
problem though.

regards, tom lane

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index bf1dea6352..64816dd370 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -541,9 +541,11 @@ createTrgmNFA(text *text_re, Oid collation,
 	 * Stage 1: Compile the regexp into a NFA, using the regexp library.
 	 */
 #ifdef IGNORECASE
-	RE_compile(&regex, text_re, REG_ADVANCED | REG_ICASE, collation);
+	RE_compile(&regex, text_re,
+			   REG_ADVANCED | REG_NOSUB | REG_ICASE, collation);
 #else
-	RE_compile(&regex, text_re, REG_ADVANCED, collation);
+	RE_compile(&regex, text_re,
+			   REG_ADVANCED | REG_NOSUB, collation);
 #endif
 
 	/*
diff --git a/src/backend/regex/regc_lex.c b/src/backend/regex/regc_lex.c
index 7673dab76f..45727ffa01 100644
--- a/src/backend/regex/regc_lex.c
+++ b/sr

Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Peter Geoghegan
On Wed, Aug 4, 2021 at 12:09 PM Simon Riggs
 wrote:
> Truncating line pointers can make extra space on the page, so it could
> be the difference between a HOT and a non-HOT update. My understanding
> is that these just-in-time actions have a beneficial effect in other
> circumstances, so we can do that here also.

I would prefer if the arguments in favor were a little more concrete.
Maybe in general they don't have to be. But that would be my
preference, and what I would look for if I was to commit such a patch.

> If the only thing we do to a page is truncate line pointers then it
> may not be worth it. But dirtying a page to reclaim space is also the
> precise time when reclaiming line pointers makes sense also. So if the
> page is dirtied by cleaning, then that is the time to reclaim line
> pointers also.
>
> Again, why would we reclaim space to avoid bloat but then ignore any
> line pointer bloat?

I don't think that my mental model is significantly different to yours
here. Like everybody else, I can easily imagine how this *might* have
value. All I'm really saying is that a burden of proof exists for this
patch (or any performance orientated patch). In my opinion that burden
has yet to be met by this patch. My guess is that Matthias can show a
clear, measurable benefit with a little more work. If that happens
then all of my marginal concerns about the risk of bugs become
relatively unimportant.

It might even be okay to commit the patch on the basis of "what could
the harm be?" if there was some effort to demonstrate empirically that
the performance downside really was zero.

> It's not clear why truncating unused line pointers would break DDL.

I'm just saying that it's obviously not possible now, with the
VACUUM-only PageTruncateLinePointerArray()/lazy_vacuum_heap_page()
code path added to Postgres 14 -- because VACUUM's relation-level lock
makes sure of that. That property has some value. Certainly not enough
value to block progress on a feature that is clearly useful, but
enough to give me pause.

-- 
Peter Geoghegan




Re: Lowering the ever-growing heap->pd_lower

2021-08-04 Thread Peter Geoghegan
On Wed, Aug 4, 2021 at 7:39 AM Robert Haas  wrote:
> How would it hurt?
>
> It's easy to see the harm caused by not shortening the line pointer
> array when it is possible to do so: we're using up space in the page
> that could have been made free. It's not so obvious to me what the
> downside of shortening it might be.

I think that that's probably true. That in itself doesn't seem like a
good enough reason to commit the patch.

Maybe this really won't be difficult for Matthias. I just want to see
some concrete testing, maybe with pgbench, or with an artificial test
case. Maybe something synthetic that shows a benefit measurable in
on-disk table size. Or at least the absence of any regressions.
Something.

-- 
Peter Geoghegan




Re: Bug fix for cache lookup failure for statistic_ext type

2021-08-04 Thread Tomas Vondra

On 8/5/21 12:03 AM, Mark Dilger wrote:

Hackers,

You can easily get a cache lookup failure by changing the regression tests as 
included in this small patch.  The failure looks thus:

+COMMENT ON STATISTICS ab1_a_b_stats IS 'new comment';
+CREATE ROLE temp_role;
+SET SESSION AUTHORIZATION temp_role;
+COMMENT ON STATISTICS ab1_a_b_stats IS 'changed comment';
+ERROR:  cache lookup failed for type 27447
+DROP STATISTICS ab1_a_b_stats;
+ERROR:  cache lookup failed for type 27447
+ALTER STATISTICS ab1_a_b_stats RENAME TO ab1_a_b_stats_new;
+ERROR:  must be owner of statistics object ab1_a_b_stats
+RESET SESSION AUTHORIZATION;
+DROP ROLE temp_role;

I believe this case simply has not had any test coverage, as I don't
see any way the current code would ever work.  It treats the Oid of the
statistics object as a type, which it is not.



Yeah, you're right - this is broken since 7b504eb282c. Thanks for the 
fix, I'll get it committed.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: A varint implementation for PG?

2021-08-04 Thread Andres Freund
Hi,

On 2021-08-04 23:44:10 +0200, Tomas Vondra wrote:
> How is that better than the two varint flavors that are already out there,
> i.e. the bitcoin [1] and protocol buffers [2]?

The protobuf one is *terrible* for CPU efficiency. You need to go through each
byte, do masking and shifting for each byte and then have a conditional
branch. That's bad for the number of instructions you need to execute,
and *really* bad for the branch predictor.
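
For reference, the classic protobuf-style decode loop looks roughly like this
(unsigned case only, sketch), which is where the per-byte mask/shift/branch
cost comes from:

static uint64_t
leb128_decode_uint64(const uint8_t *in, int *consumed)
{
	uint64_t	val = 0;
	int			shift = 0;
	int			i = 0;

	for (;;)
	{
		uint8_t		b = in[i++];

		val |= (uint64_t) (b & 0x7F) << shift;
		if ((b & 0x80) == 0)	/* continuation bit clear -> done */
			break;
		shift += 7;
	}
	*consumed = i;
	return val;
}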


> The first one seems quite efficient in how it encodes the length into very
> few bits (which matters especially for small values). It's designed for
> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary lengths
> fairly easily, I think:

> Look at the first byte, and
> 
> 0 - 243 - encoded as is
> 244 - 1 byte
> 245 - 2 bytes
> 246 - 3 bytes
> 247 - 4 bytes
> 248 - 5 bytes
> 249 - 6 bytes
> 250 - 7 bytes
> 251 - 8 bytes
> 252 - next 1 byte is length
> 253 - next 2 bytes are length
> 254 - next 3 bytes are length
> 255 - next 4 bytes are length

> If we want to support longer lengths, we'd have to reserve an extra value
> (which reduces the number of values that require a single byte).

I think that's not a bad scheme. I think it may end up being a bit more
expensive to decode because you need more branches instead of using
find-first-set type instructions.
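
For comparison, with the unary-prefix scheme the whole length determination is
a single count-leading-zeros; a rough, untested sketch (big-endian data bytes
assumed; the real thing would load the bytes in one go rather than loop):

static uint64_t
prefix_decode_uint64(const uint8_t *buf, int *consumed)
{
	uint64_t	v = 0;
	int			len;

	if (buf[0] == 0)
	{
		/* special case: zero prefix byte, 8 raw data bytes follow */
		for (int i = 1; i <= 8; i++)
			v = (v << 8) | buf[i];
		*consumed = 9;
		return v;
	}

	len = __builtin_clz((unsigned) buf[0] << 24) + 1;	/* 1..8 bytes */
	for (int i = 0; i < len; i++)
		v = (v << 8) | buf[i];
	v &= (UINT64_C(1) << (len * 7)) - 1;	/* drop prefix + separator bits */
	*consumed = len;
	return v;
}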

Greetings,

Andres Freund




Re: A varint implementation for PG?

2021-08-04 Thread Tomas Vondra

On 8/5/21 1:05 AM, Andres Freund wrote:

Hi,

On 2021-08-04 23:44:10 +0200, Tomas Vondra wrote:

How is that better than the two varint flavors that are already out there,
i.e. the bitcoin [1] and protocol buffers [2]?


The protobuf one is *terrible* for CPU efficiency. You need to go through each
byte, do masking and shifting for each byte and then have a conditional
branch. That's bad from the the amount of instructions you need to execute,
and *really* bad for the branch predictor.



Yeah, probably true - particularly for longer values. No argument here.




The first one seems quite efficient in how it encodes the length into very
few bits (which matters especially for small values). It's designed for
integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary lengths
fairly easily, I think:



Look at the first byte, and

0 - 243 - encoded as is
244 - 1 byte
245 - 2 bytes
246 - 3 bytes
247 - 4 bytes
248 - 5 bytes
249 - 6 bytes
250 - 7 bytes
251 - 8 bytes
252 - next 1 byte is length
253 - next 2 bytes are length
254 - next 3 bytes are length
255 - next 4 bytes are length



If we want to support longer lengths, we'd have to reserve an extra value
(which reduces the number of values that require a single byte).


I think that's not a bad scheme. I think it may end up being a bit more
expensive to decode because you need more branches instead of using
find-first-set type instructions.



I don't think it requires many branches, because you can essentially do

  if (byte[0] <= 243)
length = 0
  else if (byte[0] <= 251)
length = byte[0] - 243
  else
  {
length_bytes = byte[0] - 251;
... read length_bytes, decode length
  }

but I haven't tried implementing it and maybe my intuition is wrong.

Or maybe it'd be a good scheme for on-disk format, but poor for memory.
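
Fleshed out a little (still untested, and covering only the fixed-size integer
cases; 252-255 would be the "length follows" escape for arbitrary-length
data), the integer decode would be something like:

static uint64_t
sketch_decode_uint(const uint8_t *in, int *consumed)
{
	uint8_t		first = in[0];
	uint64_t	val = 0;
	int			nbytes;

	if (first <= 243)
	{
		*consumed = 1;
		return first;
	}

	nbytes = first - 243;		/* 244..251 -> 1..8 value bytes follow */
	for (int i = 0; i < nbytes; i++)
		val = (val << 8) | in[1 + i];
	*consumed = 1 + nbytes;
	return val;
}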


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: archive status ".ready" files may be created too early

2021-08-04 Thread Kyotaro Horiguchi
At Tue, 3 Aug 2021 21:32:18 +, "Bossart, Nathan"  
wrote in 
> On 8/2/21, 7:37 PM, "Kyotaro Horiguchi"  wrote:
> > I'm afraid that using hash to store boundary info is too much. Isn't a
> > ring buffer enough for this use?  In that case it is enough to
> > remember only the end LSN of the segment spanning records.  It is easy
> > to expand the buffer if needed.
> 
> I agree that the hash table requires a bit more memory than what is
> probably necessary, but I'm not sure I agree that maintaining a custom
> data structure to save a few kilobytes of memory is worth the effort.

Memory is one of my concerns, but the more significant point was the CPU
cycles required by GetLatestRecordBoundarySegment.  So I don't mind using
a hash if the loop over the hash didn't block other backends.

In addition to that, while NotifySegmentsReadyForArchive() is notifying
pending segments, other backends that simultaneously reach there are
blocked until the notification, including file creation, finishes.  I
don't think that's great. Couldn't we set lastNotifiedSegment before
the loop?  At the moment a backend decides to notify some segments,
others no longer need to consider those segments.  Even if the backend
crashes meanwhile, as you mentioned below, it's safe since the
unnotified segments are notified after restart.

> > @@ -2602,7 +2855,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
> > LogwrtResult.Flush = LogwrtResult.Write;
> > /* end of page */
> >
> > if (XLogArchivingActive())
> > -   XLogArchiveNotifySeg(openLogSegNo);
> > +   
> > SetLastNotifiedSegmentIfInvalid(openLogSegNo - 1);
> >
> > Is it safe?  If server didn't notified of WAL files for recent 3
> > finished segments in the previous server life, they need to be
> > archived this life time. But this omits maybe all of the tree.
> > (I didn't confirm that behavior..)
> 
> I tested this scenario out earlier [0].  It looks like the call to
> XLogArchiveCheckDone() in RemoveOldXlogFiles() will take care of
> creating any .ready files we missed.

Yeah, I recalled that behavior. In that case crash recovery reads up
to just before the last (continued) record in the last finished
segment.  On the other hand, if crash recovery was able to read that
record, it's safe to archive the last segment immediately after
recovery.  So that behavior is safe. Thanks!

> >> I believe my worry was that we'd miss notifying a segment as soon as
> >> possible if the record was somehow flushed prior to registering the
> >> record boundary in the map.  If that's actually impossible, then I
> >> would agree that the extra call to NotifySegmentsReadyForArchive() is
> >> unnecessary.
> >
> > I don't think that XLogWrite(up to LSN=X) can happen before
> > XLogInsert(endpos = X) ends.
> 
> Is there anything preventing that from happening?  At the location
> where we are registering the record boundary, we've already called
> CopyXLogRecordToWAL(), and neither the WAL insertion lock nor the
> WALWriteLock are held.  Even if we register the boundary before
> updating the shared LogwrtRqst.Write, there's a chance that someone
> else has already moved it ahead and called XLogWrite().  I think the
> worst case scenario is that we hold off creating .ready files longer
> than necessary, but IMO that's still a worthwhile thing to do.

Oh, boundary registration actually happens after an insertion ends
(but before XLogInsert ends). The missed segment is never processed
due to the qualification by lastNotifiedSeg.

Would it work for RegisterRecordBoundaryEntry to omit registering the
boundary if it finds lastNotifiedSeg has gone too far?

> Nathan
> 
> [0] https://postgr.es/m/DA71434B-7340-4984-9B91-F085BC47A778%40amazon.com
> 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 11:43 PM Robert Haas  wrote:
>
> Why do you think it's right to install the serialized *active*
> snapshot as the *transaction* snapshot? I've been operating on the
> presumption that we wanted the worker to install the leader's
> transaction snapshot as its transaction snapshot and the leader's
> active snapshot as its active snapshot,
> because in my mind the active snapshot and the transaction snapshot
> are two separate things. However, there's clearly something wrong with
> that line of reasoning, because apparently your proposal fixes the
> problem and mine doesn't.
>

In setting up the snapshot for the execution state used in command
execution, GetTransactionSnapshot() is called and (possibly a copy of)
the returned snapshot is pushed as the ActiveSnapshot.
The way I see it is that there is essentially only one snapshot here,
the last-acquired TransactionSnapshot, which the ActiveSnapshot points
to (or a copy of it). Rightly or wrongly, my v2/v5 patch is basically
changing the code to do the same snapshot setup in each of the
parallel workers.

So why (current Postgres code, no patches applied) in setting up for
parallel-worker execution (in InitializeParallelDSM) does the Postgres
code then acquire ANOTHER TransactionSnapshot (by calling
GetTransactionSnapshot(), which could return CurrentSnapshot or a new
snapshot) and serialize that, as well as serializing what the
ActiveSnapshot points to, and then restore those in the workers as two
separate snapshots? Is it a mistake? Or if intentional and correct,
how so?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Partition Check not updated when insert into a partition

2021-08-04 Thread Amit Langote
Sorry that I missed this thread.

On Wed, Jul 14, 2021 at 11:16 AM houzj.f...@fujitsu.com
 wrote:
> On Tuesday, July 13, 2021 2:52 AM Alvaro Herrera  
> wrote:
> > On 2021-Jun-23, houzj.f...@fujitsu.com wrote:
> >
> > > For a multi-level partition, for example: table 'A' is partition of
> > > table 'B', and 'B' is also partition of table 'C'. After I 'ALTER
> > > TABLE C DETACH B', I thought partition constraint check of table 'C'
> > > does not matter anymore if INSERT INTO table 'A'. But it looks like
> > > the relcache of 'A' is not invalidated after detaching 'B'. And the
> > > relcache::rd_partcheck still include the partition constraint of table
> > > 'C'. Note If I invalidate the table 'A''s relcache manually, then next
> > > time the relcache::rd_partcheck will be updated to the expected one which
> > does not include partition constraint check of table 'C'.
> >
> > Hmm, if I understand correctly, this means that we need to invalidate 
> > relcache
> > for all partitions of the partition being detached.  Maybe like in the 
> > attached
> > WIP ("XXX VERY CRUDE XXX DANGER EATS DATA") patch, which solves what
> > you complained about, but I didn't run any other tests.
> > (Also, in the concurrent case I think this should be done during the first
> > transaction, so this patch is wrong for it.)
> >
> > Did you have a misbehaving test for the ATTACH case?
>
> Thanks for the response.

Thank you both.

> Yes, I think the following example of ATTACH doesn't work as expected.

Yeah, need the fix for the ATTACH case too.

Couple more things:

* We must invalidate not just the "direct" partitions of the table
being attached/detached, but also any indirect ones, because all of
their partition constraints would need to contain (or no longer
contain) the root parent's partition constraint.

* I think we should lock the partitions before sending the
invalidation.  The ATTACH code already locks the descendents for a
different purpose, but DETACH doesn't, so the latter needs to be fixed
to match.

I've updated Alvaro's patch to address these points.  Maybe we should
also add these cases to the regression and isolation suites?
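
For the two points above, the shape of the fix is roughly the following (a
sketch, not the patch itself, assuming the usual catalog/inval headers; the
lock level to take on the descendants is one of the details the patch has to
settle):

static void
invalidate_partition_tree(Oid partRelId)
{
	/*
	 * find_all_inheritors() returns partRelId itself plus all direct and
	 * indirect partitions, locking the descendants as it collects them.
	 */
	List	   *rels = find_all_inheritors(partRelId, AccessExclusiveLock, NULL);
	ListCell   *lc;

	foreach(lc, rels)
		CacheInvalidateRelcacheByRelid(lfirst_oid(lc));

	list_free(rels);
}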

-- 
Amit Langote
EDB: http://www.enterprisedb.com


inval-partitions_v2.patch
Description: Binary data


Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-08-04 Thread Dian M Fay
On Sun Mar 7, 2021 at 2:37 AM EST, Dian M Fay wrote:
> > What'd be better, if we could do it, is to ship the clause in
> > the form
> > WHERE foreigncol = 'one'
> > that is, instead of plastering a cast on the Var, try to not put
> > any explicit cast on the constant. That fixes your original use
> > case better than what you've proposed, and I think it might be
> > possible to do it unconditionally instead of needing a hacky
> > column property to enable it. The reason this could be okay
> > is that it seems reasonable for postgres_fdw to rely on the
> > core parser's heuristic that an unknown-type literal is the
> > same type as what it's being compared to. So, if we are trying
> > to deparse something of the form "foreigncol operator constant",
> > and the foreigncol and constant are of the same type locally,
> > we could leave off the cast on the constant. (There might need
> > to be some restrictions about the operator taking those types
> > natively with no cast, not sure; also this doesn't apply to
> > constants that are going to be printed as non-string literals.)
> >
> > Slipping this heuristic into the code structure of deparse.c
> > might be rather messy, though. I've not looked at just how
> > to implement it.
>
> This doesn't look too bad from here, at least so far. The attached
> change adds a new const_showtype field to the deparse_expr_cxt, and
> passes that instead of the hardcoded 0 to deparseConst. deparseOpExpr
> modifies const_showtype if both sides of a binary operation are text,
> and resets it to 0 after the recursion.
>
> I restricted it to text-only after seeing a regression test fail: while
> deparsing `percentile_cont(c2/10::numeric)`, c2, an integer column, is a
> FuncExpr with a numeric return type. That matches the numeric 10, and
> without the explicit cast, integer-division-related havoc ensues. I
> don't know why it's a FuncExpr, and I don't know why it's not an int,
> but the constant is definitely a non-string, in any case.
>
> In the course of testing, I discovered that the @@ text-search operator
> works against textified enums on my stock 13.1 server (a "proper" enum
> column yields "operator does not exist"). I'm rather wary of actually
> trying to depend on that behavior, although it seems probably-safe in
> the same character set and collation.

hello again! My second version of this change (suppressing the cast
entirely as Tom suggested) seemed to slip under the radar back in March
and then other matters intervened. I'm still interested in making it
happen, though, and now that we're out of another commitfest it seems
like a good time to bring it back up. Here's a rebased patch to start.
From 3b950aa43af686acdbbaa5aada82bfed725652d7 Mon Sep 17 00:00:00 2001
From: Dian M Fay 
Date: Sun, 7 Mar 2021 02:59:14 -0500
Subject: [PATCH] Suppress explicit casts of text constants in postgres_fdw
 when the other side of a binary OpExpr is also text.

---
 contrib/postgres_fdw/deparse.c| 27 +-
 .../postgres_fdw/expected/postgres_fdw.out| 93 ---
 contrib/postgres_fdw/sql/postgres_fdw.sql |  9 ++
 3 files changed, 90 insertions(+), 39 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2b8f00abe7..e056c5f117 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -101,6 +101,7 @@ typedef struct deparse_expr_cxt
 * a base 
relation. */
StringInfo  buf;/* output buffer to append to */
List  **params_list;/* exprs that will become remote Params 
*/
+   int const_showtype; /* override showtype in 
deparseConst */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX   "r"
@@ -1146,6 +1147,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo 
*root, RelOptInfo *rel,
context.foreignrel = rel;
context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
context.params_list = params_list;
+   context.const_showtype = 0;
 
/* Construct SELECT clause */
deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1725,6 +1727,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, 
RelOptInfo *foreignrel,
context.scanrel = foreignrel;
context.root = root;
context.params_list = params_list;
+   context.const_showtype = 0;
 
appendStringInfoChar(buf, '(');
appendConditions(fpinfo->joinclauses, &context);
@@ -2028,6 +2031,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
context.scanrel = foreignrel;
context.buf = buf;
context.params_list = params_list;
+   context.const_showtype = 0;
 
appendStringInfoString(buf, "UPDATE ");
deparseRelation(buf, rel);
@@ -2135,6 +2139,7 @@ deparseDire

Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-08-04 Thread kuroda.hay...@fujitsu.com
Dear Hackers, Tom,

(I changed subject because this is no longer postgres_fdw's patch)

> > What would be better to think about is how to let users specify this
> > kind of behavior for themselves.  I think it's possible to set
> > application_name as part of a foreign server's connection options,
> > but at present the result would only be a constant string.  Somebody
> > who wished the PID to be in there would like to have some sort of
> > formatting escape, say "%p" for PID.  Extrapolating wildly, maybe we
> > could make all the %-codes known to log_line_prefix available here.
> 
> I think your argument is better than mine. I will try to implement this 
> approach.

I made a patch based on your advice. I added a parse-and-rewrite function in
connectOptions2().
I implemented it in the libpq layer, hence any application that uses libpq
can use it.
Here is an example:

```
$ env PGAPPNAME=000%p%utest000 psql -d postgres -c "show application_name"
   application_name
---
 00025632hayatotest000
(1 row)
```
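
The parsing itself is simple; roughly like the following (an illustrative
sketch only, not the patch -- the buffer handling and the set of supported
escapes differ in the real code):

#include <stdio.h>
#include <unistd.h>

static void
expand_appname_escapes(const char *src, char *dst, size_t dstlen)
{
	size_t		o = 0;

	for (const char *p = src; *p && o + 1 < dstlen; p++)
	{
		if (p[0] == '%' && p[1] == 'p')
		{
			o += snprintf(dst + o, dstlen - o, "%d", (int) getpid());
			if (o >= dstlen)
				o = dstlen - 1;		/* output was truncated */
			p++;					/* skip the 'p' */
		}
		else if (p[0] == '%' && p[1] == '%')
		{
			dst[o++] = '%';
			p++;
		}
		else
			dst[o++] = *p;
	}
	dst[o] = '\0';
}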

> > I don't think this is a great idea as-is.  People who need to do this
> > sort of thing will all have their own ideas of what they need to track
> > --- most obviously, it might be appropriate to include the originating
> > server's name, else you don't know what machine the PID is for.
> 
> I thought this is not big problem because hostname (or IP address) can be
> added to log_line_prefix. I added only local-pid because this is the only 
> thing
> that cannot be set in the parameter.

Currently the network hostname (or IP address) cannot be set, for the reason
mentioned above.
Some backend code must be modified if we want to get and embed such
information.
Of course we can get the system hostname by gethostname(), but I cannot judge
whether it is meaningful.

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



allow_escape_in_allication_name.patch
Description: allow_escape_in_allication_name.patch


Re: Implementing Incremental View Maintenance

2021-08-04 Thread Yugo NAGATA
Hello Zhihong Yu,

On Mon, 2 Aug 2021 14:33:46 -0700
Zhihong Yu  wrote:

> On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA  wrote:
> 
> > Hi hackers,
> >
> > On Mon, 19 Jul 2021 09:24:30 +0900
> > Yugo NAGATA  wrote:
> >
> > > On Wed, 14 Jul 2021 21:22:37 +0530
> > > vignesh C  wrote:
> >
> > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > patch. I'm changing the status to "Waiting for Author".
> > >
> > > Ok. I'll update the patch in a few days.
> >
> > Attached is the latest patch set to add support for Incremental
> > Materialized View Maintenance (IVM)
> >
> > The patches are rebased to the master and also revised with some
> > code cleaning.
> >
> > IVM is a way to make materialized views up-to-date in which only
> > incremental changes are computed and applied on views rather than
> > recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
> > does. IVM can update materialized views more efficiently
> > than recomputation when only small part of the view need updates.
> >
> > The patch set implements a feature so that materialized views could be
> > updated automatically and immediately when a base table is modified.
> >
> > Currently, our IVM implementation supports views which could contain
> > tuple duplicates whose definition includes:
> >
> >  - inner and outer joins including self-join
> >  - DISTINCT
> >  - some built-in aggregate functions (count, sum, agv, min, and max)
> >  - a part of subqueries
> >-- simple subqueries in FROM clause
> >-- EXISTS subqueries in WHERE clause
> >  - CTEs
> >
> > We hope the IVM feature would be adopted into pg15. However, the size of
> > patch set has grown too large through supporting above features.
> > Therefore,
> > I think it is better to consider only a part of these features for the
> > first
> > release. Especially, I would like propose the following features for pg15.
> >
> >  - inner joins including self-join
> >  - DISTINCT and views with tuple duplicates
> >  - some built-in aggregate functions (count, sum, agv, min, and max)
> >
> > By omitting outer-join, sub-queries, and CTE features, the patch size
> > becomes
> > less than half. I hope this will make a bit easer to review the IVM patch
> > set.
> >
> > Here is a list of separated  patches.
> >
> > - 0001: Add a new syntax:
> >   CREATE INCREMENTAL MATERIALIZED VIEW
> > - 0002: Add a new column relisivm to pg_class
> > - 0003: Add new deptype option 'm' in pg_depend
> > - 0004: Change trigger.c to allow to prolong life span of tupestores
> > containing Transition Tables generated via AFTER trigger
> > - 0005: Add IVM supprot for pg_dump
> > - 0006: Add IVM support for psql
> > - 0007: Add the basic IVM future:
> > This supports inner joins, DISTINCT, and tuple duplicates.
> > - 0008: Add aggregates (count, sum, avg, min, max) support for IVM
> > - 0009: Add regression tests for IVM
> > - 0010: Add documentation for IVM
> >
> > We could split the patch furthermore if this would make reviews much
> > easer.
> > For example, I think 0007 could be split into the more basic part and the
> > part
> > for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
> > and other aggregates because handling min/max is a bit more complicated
> > than
> > others.
> >
> > I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
> > outer-join, CTE support, just for your information.
> >
> > Regards,
> > Yugo Nagata
> >
> > --
> > Yugo NAGATA 
> >
> >
> > --
> > Yugo NAGATA 
> >
> Hi,
> For v23-0008-Add-aggregates-support-in-IVM.patch :

Thank you for looking into this!

> As a restriction, expressions specified in GROUP BY must appear in
> the target list because tuples to be updated in IMMV are identified
> by using this group keys.
> 
> IMMV ->  IMVM (Incremental Materialized View Maintenance, as said above)
> Or maybe it means 'incrementally maintainable materialized view'. It would
> be better to use the same abbreviation.

IMMV is correct in the commit message of this patch. Rather, IMVM used
in  v23-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch 
should be corrected to IMMV. 

> this group keys -> this group key
> 
> +errmsg("GROUP BY expression not appeared in select
> list is not supported on incrementally maintainable materialized view")));
> 
> expression not appeared in select list -> expression not appearing in
> select list
> 
> +* For aggregate functions except to count
> 
> except to count -> except count

Thank you for pointing them out. I'll fix.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Implementing Incremental View Maintenance

2021-08-04 Thread Yugo NAGATA
Hello Takahashi-san,

On Tue, 3 Aug 2021 10:15:42 +
"r.takahash...@fujitsu.com"  wrote:

> Hi Nagata-san,
> 
> 
> I am interested in this patch since it is good feature.
> 
> I run some simple tests.
> I found the following problems.

Thank you for your interest in this patch!

> (1) 
> Failed to "make world".
> I think there are extra "" in 
> doc/src/sgml/ref/create_materialized_view.sgml
> (line 110 and 117)

Oops. I'll fix it. 

> (2)
> In the case of partition, it seems that IVM does not work well.
> I run as follows.
> 
> postgres=# create table parent (c int) partition by range (c);
> CREATE TABLE
> postgres=# create table child partition of parent for values from (1) to 
> (100);
> CREATE TABLE
> postgres=# create incremental materialized view ivm_parent as select c from 
> parent;
> NOTICE:  could not create an index on materialized view "ivm_parent" 
> automatically
> HINT:  Create an index on the materialized view for efficient incremental 
> maintenance.
> SELECT 0
> postgres=# create incremental materialized view ivm_child as select c from 
> child;
> NOTICE:  could not create an index on materialized view "ivm_child" 
> automatically
> HINT:  Create an index on the materialized view for efficient incremental 
> maintenance.
> SELECT 0
> postgres=# insert into parent values (1);
> INSERT 0 1
> postgres=# insert into child values (2);
> INSERT 0 1
> postgres=# select * from parent;
>  c
> ---
>  1
>  2
> (2 rows)
> 
> postgres=# select * from child;
>  c
> ---
>  1
>  2
> (2 rows)
> 
> postgres=# select * from ivm_parent;
>  c
> ---
>  1
> (1 row)
> 
> postgres=# select * from ivm_child;
>  c
> ---
>  2
> (1 row)
> 
> 
> I think ivm_parent and ivm_child should return 2 rows.

Good point!
I'll investigate this more, but we may have to prohibit views on partitioned
tables and partitions.

> (3)
> I think IVM does not support foreign table, but try to make IVM.
> 
> postgres=# create incremental materialized view ivm_foreign as select c from 
> foreign_table;
> NOTICE:  could not create an index on materialized view "ivm_foreign" 
> automatically
> HINT:  Create an index on the materialized view for efficient incremental 
> maintenance.
> ERROR:  "foreign_table" is a foreign table
> DETAIL:  Triggers on foreign tables cannot have transition tables.
> 
> It finally failed to make IVM, but I think it should be checked earlier.

You are right. We don't support foreign tables as long as we use triggers.

 I'll fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: archive status ".ready" files may be created too early

2021-08-04 Thread Kyotaro Horiguchi
By the way about the v3 patch,

+#define InvalidXLogSegNo   ((XLogSegNo) 0x)

Like InvalidXLogRecPtr, the first valid segment number is 1 so we can
use 0 as InvalidXLogSegNo.

BootStrapXLOG():
/* Create first XLOG segment file */
openLogFile = XLogFileInit(1);

KeepLogSeg():
/* avoid underflow, don't go below 1 */
if (currSegNo <= keep_segs)
segno = 1;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Sort keys are omitted incorrectly after grouping sets

2021-08-04 Thread Greg Stark
On Tue, 3 Aug 2021 at 00:04, Richard Guo  wrote:
>
> Is this a problem we should be worried about?

It's easy to see this produce output in the wrong order:

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls first;
 a | b
---+---
 1 |
 1 | 1
 2 | 2
 2 |
   |
   | 1
   | 2
(7 rows)

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls last;
 a | b
---+---
 1 |
 1 | 1
 2 | 2
 2 |
   |
   | 1
   | 2
(7 rows)

I know we had a hack to deal with outer joins "placeholder vars" or
something like that. I imagine the same thing needs to happen here.

Incidentally, the same thing doesn't happen for a VALUES clause with a
single row value. There it seems we inline the row value and the plan
ends up ordering on both -- though it's hard to tell because the way
the explain plan is formatted makes it hard to see what's going on:

postgres=# select a, b from (values (1,1)) as foo(a,b) where a = b
group by cube(a, b) order by a, b nulls first;
 a | b
---+---
 1 |
 1 | 1
   |
   | 1
(4 rows)


postgres=# explain select a, b from (values (1,1)) as foo(a,b) where a
= b group by cube(a, b) order by a, b nulls first;
   QUERY PLAN

 Sort  (cost=0.10..0.11 rows=4 width=8)
   Sort Key: (1), (1) NULLS FIRST
   ->  MixedAggregate  (cost=0.00..0.06 rows=4 width=8)
 Hash Key: 1, 1
 Hash Key: 1
 Hash Key: 1
 Group Key: ()
 ->  Result  (cost=0.00..0.01 rows=1 width=8)
   One-Time Filter: (1 = 1)
(9 rows)

With two rows we're clearly not inlining it and clearly ordering on
only the first column:

postgres=# explain select a, b from (values (1,1),(2,2)) as foo(a,b)
where a = b group by cube(a, b) order by a, b nulls first;
   QUERY PLAN
-
 Sort  (cost=0.12..0.13 rows=4 width=8)
   Sort Key: "*VALUES*".column1
   ->  MixedAggregate  (cost=0.00..0.08 rows=4 width=8)
 Hash Key: "*VALUES*".column1, "*VALUES*".column2
 Hash Key: "*VALUES*".column1
 Hash Key: "*VALUES*".column2
 Group Key: ()
 ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=1 width=8)
   Filter: (column1 = column2)
(9 rows)


-- 
greg




Re: Commitfest overflow

2021-08-04 Thread Noah Misch
On Tue, Aug 03, 2021 at 12:13:44PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:
> >> There are 273 patches in the queue for the Sept Commitfest already, so
> >> it seems clear the queue is not being cleared down each CF as it was
> >> before. We've been trying hard, but it's overflowing.

> 1. There wasn't that much getting done during this CF because it's
> summer and many people are on vacation (in the northern hemisphere
> anyway).

> I don't think there's much to be done about the vacation effect;
> we just have to accept that the summer CF is likely to be less
> productive than others.

Early commitfests recognized a rule that patch authors owed one review per
patch registered in the commitfest.  If authors were holding to that, then
both submissions and reviews would slow during vacations, but the neglected
fraction of the commitfest would be about the same.  I think it would help to
track each author's balance (reviews_done - reviews_owed).




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-04 Thread Masahiko Sawada
On Wed, Aug 4, 2021 at 9:19 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, August 4, 2021 7:00 PM Masahiko Sawada  
> wrote
> > On Wed, Aug 4, 2021 at 5:06 PM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > On Wednesday, August 4, 2021 1:47 PM Masahiko Sawada 
> > > 
> > > >
> > > > I've not looked at the patch deeply yet but I think that the
> > > > following one line change seems to fix the cases you reported,
> > > > although I might be missing
> > > > something:
> > > >
> > > > diff --git a/src/backend/commands/subscriptioncmds.c
> > > > b/src/backend/commands/subscriptioncmds.c
> > > > index 22ae982328..c74d6d51d6 100644
> > > > --- a/src/backend/commands/subscriptioncmds.c
> > > > +++ b/src/backend/commands/subscriptioncmds.c
> > > > @@ -1068,7 +1068,7 @@ AlterSubscription(ParseState *pstate,
> > > > AlterSubscriptionStmt *stmt,
> > > > PreventInTransactionBlock(isTopLevel, "ALTER
> > > > SUBSCRIPTION with refresh");
> > > >
> > > > /* Only refresh the added/dropped list of 
> > > > publications.
> > */
> > > > -   sub->publications = stmt->publication;
> > > > +   sub->publications = publist;
> > > >
> > > > AlterSubscription_refresh(sub, opts.copy_data);
> > > > }
> > >
> > > I considered the same fix before, but with the above fix, it seems to
> > > refresh both the existing publications and the new publication, which
> > > most people didn't like in the previous discussion[1]. From the doc[2],
> > > IIRC, ADD/DROP PUBLICATION is currently supposed to refresh only the new
> > > publication.
> >
> > What do you mean by refreshing both existing and new publications? I dropped
> > and created one publication out of three publications that the subscription 
> > is
> > subscribing to but the corresponding entries in pg_subscription_rel for 
> > tables
> > associated with the other two publications were not changed and the table 
> > sync
> > workers also were not started for them.
> >
>
> +   sub->publications = publist;
> -   sub->publications = stmt->publication;
> With the above fix, when doing ADD/DROP PUBLICATION, I meant that a table
> that doesn't belong to the dropped or added publication could be updated in
> pg_subscription_rel.
>
> I can see the effect in the following example:
>
> 1)-publisher-
> CREATE TABLE test,test2,test3
> CREATE PUBLICATION pub FOR TABLE test;
> CREATE PUBLICATION pub2 FOR TABLE test2;
> 2)-subscriber-
> CREATE TABLE test,test2,test3
> CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=1' PUBLICATION 
> pub,pub2;
> select * from pg_subscription_rel;
> -[ RECORD 1 ]-
> srsubid| 16393
> srrelid| 16387
> srsubstate | r
> srsublsn   | 0/150A2D8
> -[ RECORD 2 ]-
> srsubid| 16393
> srrelid| 16384
> srsubstate | r
> srsublsn   | 0/150A310
>
> 3)-publisher-
> alter publication pub add table test3;
> 4)-subscriber-
> ALTER SUBSCRIPTION sub drop PUBLICATION pub2;
> select * from pg_subscription_rel;
> -[ RECORD 1 ]-
> srsubid| 16393
> srrelid| 16384
> srsubstate | r
> srsublsn   | 0/150A310
> -[ RECORD 2 ]-
> srsubid| 16393
> srrelid| 16390
> srsubstate | r
> srsublsn   |
>
> I can see that [RECORD 2], which is the table newly registered in 'pub' (see
> step 3), has been added to pg_subscription_rel. Based on the previous
> discussion and the documentation, that table shouldn't be refreshed when
> dropping another publication.

Thanks for the explanation! I understood the problem.

I've reviewed v2 patch. Here are some comments:

+   if (type == ALTER_SUBSCRIPTION_SET_PUBLICATION ||
+   type == ALTER_SUBSCRIPTION_REFRESH)
+   drop_table = !bsearch(&relid, pubrel_local_oids,
+ list_length(pubrel_names),
+ sizeof(Oid), oid_cmp);
+   else if (type == ALTER_SUBSCRIPTION_DROP_PUBLICATION)
+   drop_table = bsearch(&relid, pubrel_local_oids,
+list_length(pubrel_names),
+sizeof(Oid), oid_cmp);
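
To restate what the hunk does with a standalone toy, since the inverted test
is easy to misread (the Oid typedef, the array contents, and the comparator
below are simplified stand-ins, not the backend code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t Oid;

static int
oid_cmp(const void *p1, const void *p2)
{
    Oid a = *(const Oid *) p1;
    Oid b = *(const Oid *) p2;

    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

int
main(void)
{
    /* table OIDs fetched for the publications being processed (toy values) */
    Oid     pubrel_local_oids[] = {16384, 16387, 16390};
    size_t  n = sizeof(pubrel_local_oids) / sizeof(pubrel_local_oids[0]);
    Oid     relid = 16387;  /* one locally subscribed relation */
    bool    in_list;

    qsort(pubrel_local_oids, n, sizeof(Oid), oid_cmp);
    in_list = bsearch(&relid, pubrel_local_oids, n,
                      sizeof(Oid), oid_cmp) != NULL;

    /* SET PUBLICATION / REFRESH: drop relations no longer published */
    printf("refresh case: drop_table = %d\n", !in_list);
    /* DROP PUBLICATION (per the hunk): drop relations that belong to the
     * publication being dropped */
    printf("drop-publication case: drop_table = %d\n", in_list);

    return 0;
}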

IIUC, in DROP PUBLICATION cases, we don't need to refer to the tables
in the publication on the publisher that we're dropping in order to
check whether to remove the relation. If we drop a table from the
publication on the publisher before dropping the publication from the
subscription on the subscriber, the pg_subscription_rel entry for the
table remains. For example, please consider the following case:

On publisher and subscriber:
create table t1 (a int);
create table t2 (a int);
create table t3 (a int);

On publisher:
create publication pub12 for table t1, t2;
create publication pub3 for table t3;

On subscriber:
create subscription sub connection 'dbname=postgres' publication pub12, pub3;

On publisher:
alter publication pub12 drop table t2;

On subscriber:
alter subscription sub drop publication pub12;
select * from pg_subscription_rel;

Re: archive status ".ready" files may be created too early

2021-08-04 Thread Bossart, Nathan
On 8/4/21, 9:05 PM, "Kyotaro Horiguchi"  wrote:
> By the way about the v3 patch,
>
> +#define InvalidXLogSegNo   ((XLogSegNo) 0xFFFFFFFFFFFFFFFF)
>
> Like InvalidXLogRecPtr, the first valid segment number is 1 so we can
> use 0 as InvalidXLogSegNo.

It's been a while since I wrote this, but if I remember correctly, the
issue with using 0 is that we could end up initializing
lastNotifiedSeg to InvalidXLogSegNo in XLogWrite().  Eventually, we'd
initialize it to 1, but we will have skipped creating the .ready file
for the first segment.
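
To make the hazard concrete, here is a standalone sketch that assumes a lazy
initialization of lastNotifiedSeg while it still holds the sentinel; the
helper names and the exact initialization are stand-ins for illustration,
not the patch's code:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogSegNo;

#define InvalidXLogSegNo    ((XLogSegNo) 0)     /* the proposed 0 sentinel */

static XLogSegNo lastNotifiedSeg = InvalidXLogSegNo;

/* stand-in for the XLogWrite() path: lazily initialize the tracker */
static void
RecordSegmentOpened(XLogSegNo newSeg)
{
    if (lastNotifiedSeg == InvalidXLogSegNo)
        lastNotifiedSeg = newSeg - 1;   /* for segment 1 this stores 0, the sentinel again */
}

/* stand-in for the notification path: mark segments up to flushedSeg .ready */
static void
NotifyUpTo(XLogSegNo flushedSeg)
{
    for (XLogSegNo seg = lastNotifiedSeg + 1; seg <= flushedSeg; seg++)
        printf("segment %llu -> .ready\n", (unsigned long long) seg);
    lastNotifiedSeg = flushedSeg;
}

int
main(void)
{
    RecordSegmentOpened(1);     /* stores 0, indistinguishable from "unset" */
    RecordSegmentOpened(2);     /* "re-initializes" to 1: segment 1 now looks handled */
    NotifyUpTo(2);              /* prints segment 2 only; segment 1 never gets .ready */

    return 0;
}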

Nathan



Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-04 Thread Masahiko Sawada
On Thu, Aug 5, 2021 at 2:08 PM Masahiko Sawada  wrote:
>
> On Wed, Aug 4, 2021 at 9:19 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Wednesday, August 4, 2021 7:00 PM Masahiko Sawada 
> >  wrote
> > > On Wed, Aug 4, 2021 at 5:06 PM houzj.f...@fujitsu.com 
> > >  wrote:
> > > >
> > > > On Wednesday, August 4, 2021 1:47 PM Masahiko Sawada 
> > > > 
> > > > >
> > > > > I've not looked at the patch deeply yet but I think that the
> > > > > following one line change seems to fix the cases you reported,
> > > > > although I might be missing
> > > > > something:
> > > > >
> > > > > diff --git a/src/backend/commands/subscriptioncmds.c
> > > > > b/src/backend/commands/subscriptioncmds.c
> > > > > index 22ae982328..c74d6d51d6 100644
> > > > > --- a/src/backend/commands/subscriptioncmds.c
> > > > > +++ b/src/backend/commands/subscriptioncmds.c
> > > > > @@ -1068,7 +1068,7 @@ AlterSubscription(ParseState *pstate,
> > > > > AlterSubscriptionStmt *stmt,
> > > > > PreventInTransactionBlock(isTopLevel, "ALTER
> > > > > SUBSCRIPTION with refresh");
> > > > >
> > > > > /* Only refresh the added/dropped list of 
> > > > > publications.
> > > */
> > > > > -   sub->publications = stmt->publication;
> > > > > +   sub->publications = publist;
> > > > >
> > > > > AlterSubscription_refresh(sub, opts.copy_data);
> > > > > }
> > > >
> > > > I considered the same fix before, but with the above fix, it seems to
> > > > refresh both the existing publications and the new publication, which
> > > > most people didn't like in the previous discussion[1]. From the doc[2],
> > > > IIRC, ADD/DROP PUBLICATION is currently supposed to refresh only the new
> > > > publication.
> > >
> > > What do you mean by refreshing both existing and new publications? I 
> > > dropped
> > > and created one publication out of three publications that the 
> > > subscription is
> > > subscribing to but the corresponding entries in pg_subscription_rel for 
> > > tables
> > > associated with the other two publications were not changed and the table 
> > > sync
> > > workers also were not started for them.
> > >
> >
> > +   sub->publications = publist;
> > -   sub->publications = stmt->publication;
> > With the above fix, when doing ADD/DROP PUBLICATION, I meant that a table
> > that doesn't belong to the dropped or added publication could be updated in
> > pg_subscription_rel.
> >
> > I can see the effect in the following example:
> >
> > 1)-publisher-
> > CREATE TABLE test,test2,test3
> > CREATE PUBLICATION pub FOR TABLE test;
> > CREATE PUBLICATION pub2 FOR TABLE test2;
> > 2)-subscriber-
> > CREATE TABLE test,test2,test3
> > CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=1' PUBLICATION 
> > pub,pub2;
> > select * from pg_subscription_rel;
> > -[ RECORD 1 ]-
> > srsubid| 16393
> > srrelid| 16387
> > srsubstate | r
> > srsublsn   | 0/150A2D8
> > -[ RECORD 2 ]-
> > srsubid| 16393
> > srrelid| 16384
> > srsubstate | r
> > srsublsn   | 0/150A310
> >
> > 3)-publisher-
> > alter publication pub add table test3;
> > 4)-subscriber-
> > ALTER SUBSCRIPTION sub drop PUBLICATION pub2;
> > select * from pg_subscription_rel;
> > -[ RECORD 1 ]-
> > srsubid| 16393
> > srrelid| 16384
> > srsubstate | r
> > srsublsn   | 0/150A310
> > -[ RECORD 2 ]-
> > srsubid| 16393
> > srrelid| 16390
> > srsubstate | r
> > srsublsn   |
> >
> > I can see that [RECORD 2], which is the table newly registered in 'pub' (see
> > step 3), has been added to pg_subscription_rel. Based on the previous
> > discussion and the documentation, that table shouldn't be refreshed when
> > dropping another publication.
>
> Thanks for the explanation! I understood the problem.
>
> I've reviewed v2 patch. Here are some comments:

Regarding regression tests, since the ADD/DROP PUBLICATION logic could be
complicated, it seems to me that we can add tests to a new file, say
alter_sub_pub.pl, that includes scenarios such as the ones I mentioned in the
previous message.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Commitfest overflow

2021-08-04 Thread Andrey Borodin



> On 5 Aug 2021, at 09:25, Noah Misch  wrote:
> 
> On Tue, Aug 03, 2021 at 12:13:44PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>>> On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:
 There are 273 patches in the queue for the Sept Commitfest already, so
 it seems clear the queue is not being cleared down each CF as it was
 before. We've been trying hard, but it's overflowing.
> 
>> 1. There wasn't that much getting done during this CF because it's
>> summer and many people are on vacation (in the northern hemisphere
>> anyway).
> 
>> I don't think there's much to be done about the vacation effect;
>> we just have to accept that the summer CF is likely to be less
>> productive than others.
> 
> Early commitfests recognized a rule that patch authors owed one review per
> patch registered in the commitfest.  If authors were holding to that, then
> both submissions and reviews would slow during vacations, but the neglected
> fraction of the commitfest would be about the same.  I think it would help to
> track each author's balance (reviews_done - reviews_owed).

+1 for tracking this.
BTW, when is a review considered done? When the first review is posted, or
when the patch is committed or rejected?
And when is a review owed? At the moment the patch is submitted, or when it is
committed?

Best regards, Andrey Borodin.



Re: row filtering for logical replication

2021-08-04 Thread Peter Smith
v21 --> v22

(This small change is only to keep the patch up-to-date with HEAD)

Changes:

* A recent commit [1] added a new TAP subscription test file 023, so
now this patch's test file (previously "023_row_filter.pl") has been
bumped to "024_row_filter.pl".

--
[1] 
https://github.com/postgres/postgres/commit/63cf61cdeb7b0450dcf3b2f719c553177bac85a2

Kind Regards,
Peter Smith.
Fujitsu Australia


v22-0001-Row-filter-for-logical-replication.patch
Description: Binary data