Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Kyotaro Horiguchi
At Thu, 17 Jun 2021 11:52:04 +0200 (CEST), Fabien COELHO wrote in:
> > Ok. I gave up changing the state in threadRun. Instead, I changed the
> > condition at the end of the bench, which enables reporting aborts due to
> > socket errors.
> >
> > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
> > + #endif/* 
> > ENABLE_THREAD_SAFETY */
> > +
> > +   for (int j = 0; j < thread->nstate; j++)
> > +-  if (thread->state[j].state == CSTATE_ABORTED)
> > ++  if (thread->state[j].state != CSTATE_FINISHED)
> > +   exit_code = 2;
> > +
> > +   /* aggregate thread level stats */
> >
> > Does this make sense?
> 
> Yes, definitely.

I looked for a simple way to ensure that every client finishes in
either the aborted or finished state, but I didn't find one.  So +1
for the change.  However, as a matter of style, if we touch this code
maybe we want to enclose the if statement in braces.
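
For illustration, the enclosed form would be just a brace-style change
(same behavior):

    for (int j = 0; j < thread->nstate; j++)
    {
        if (thread->state[j].state != CSTATE_FINISHED)
            exit_code = 2;
    }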

Doing this means we regard any state other than CSTATE_FINISHED as
aborted.  So the current goto's to "done" in threadRun effectively
abort some or all of the clients running on the thread.  For example,
the following place:

pgbench.c:6713
>/* must be something wrong */
>pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
>goto done;

Should say something like "thread %d aborted: %s() failed: ...".


I'm not sure what the consensus is here about the case where an aborted
client can reconnect to the same server.  This patch doesn't do that.
However, I think implementing reconnection needs more work than is
acceptable for a fix during beta.




+/* as the bench is already running, we do not abort */
 pg_log_error("client %d aborted while establishing connection", st->id);
 st->state = CSTATE_ABORTED;

The comment looks strange in that it says "we do not abort" while
setting the state to CSTATE_ABORTED and then reporting "client %d aborted".



   if ((con = doConnect()) == NULL)
+  {
+pg_log_fatal("connection for initialization failed");
 exit(1);

doConnect() prints an error message given from libpq, so the
additional message is redundant.



   errno = THREAD_BARRIER_INIT(&barrier, nthreads);
   if (errno != 0)
+  {
 pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


===
 if (thread->logfile == NULL)
 {
   pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 in this case.  If we exit in this case, we
might also want to exit when fclose() fails.  (Currently an error from
fclose() is ignored.)



===
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that "thread %d client %d" (not the client id but the
client index within the thread) doesn't make sense to users.  Even if
we show a message like that, it should show only the global client
id (cstate->id).

I think that we should return with 2 here, but we return with 1
in another place for the same reason.


===
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 goto done;

Why doesn't a fatal error cause an immediate exit?  (And if we change
this to fatal, we also need to change similar errors in the same
function to fatal.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-06-18 Thread Heikki Linnakangas

On 08/06/2021 19:37, Jacob Champion wrote:

We've been working on ways to expand the list of third-party auth
methods that Postgres provides. Some example use cases might be "I want
to let anyone with a Google account read this table" or "let anyone who
belongs to this GitHub organization connect as a superuser".


Cool!


The iddawc dependency for client-side OAuth was extremely helpful to
develop this proof of concept quickly, but I don't think it would be an
appropriate component to build a real feature on. It's extremely
heavyweight -- it incorporates a huge stack of dependencies, including
a logging framework and a web server, to implement features we would
probably never use -- and it's fairly difficult to debug in practice.
If a device authorization flow were the only thing that libpq needed to
support natively, I think we should just depend on a widely used HTTP
client, like libcurl or neon, and implement the minimum spec directly
against the existing test suite.


You could punt and let the application implement that stuff. I'm 
imagining that the application code would look something like this:


conn = PQconnectStartParams(...);
for (;;)
{
    status = PQconnectPoll(conn);
    switch (status)
    {
        case CONNECTION_SASL_TOKEN_REQUIRED:
            /* open a browser for the user, get token */
            token = open_browser();
            PQauthResponse(token);
            break;
        ...
    }
}

It would be nice to have a simple default implementation, though, for 
psql and all the other client applications that come with PostgreSQL itself.



If you've read this far, thank you for your interest, and I hope you
enjoy playing with it!


A few small things caught my eye in the backend oauth_exchange function:


+   /* Handle the client's initial message. */
+   p = strdup(input);


this strdup() should be pstrdup().

In the same function, there are a bunch of reports like this:


   ereport(ERROR,
+  (errcode(ERRCODE_PROTOCOL_VIOLATION),
+   errmsg("malformed OAUTHBEARER message"),
+   errdetail("Comma expected, but found character \"%s\".",
+             sanitize_char(*p))));


I don't think the double quotes are needed here, because sanitize_char 
will return quotes if it's a single character. So it would end up 
looking like this: ... found character "'x'".
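
That is, the errdetail could presumably leave the quoting to sanitize_char
entirely, something like:

    errdetail("Comma expected, but found character %s.",
              sanitize_char(*p))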


- Heikki




RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-18 Thread osumi.takami...@fujitsu.com
On Friday, June 18, 2021 11:41 AM osumi.takami...@fujitsu.com wrote:
> On Thursday, June 17, 2021 10:34 PM Simon Riggs
>  wrote:
> > On Thu, Jun 17, 2021 at 12:57 PM Amit Kapila 
> > wrote:
> > > On Thu, Jun 17, 2021 at 4:27 PM Amit Kapila
> > > 
> > wrote:
> > > >
> > > > On Thu, Jun 17, 2021 at 8:41 AM osumi.takami...@fujitsu.com
> > > >  wrote:
> > > >
> > > > Pushed!
> > > >
> > > [Responding to Simon's comments]
> > >
> > > > If LOCK and TRUNCATE is advised against on all user catalog
> > > > tables, why would CLUSTER only apply to pg_class? Surely its
> > > > locking level is the
> > same as LOCK?
> > > >
> > >
> > > Cluster will also apply to all user catalog tables. I think we can
> > > extend it slightly as we have mentioned for Lock.
> >
> > OK, good.
> >
> > > > The use of "[user]" isn't fully explained, so it might not be
> > > > clear that this applies to both Postgres catalog tables and any
> > > > user tables that
> > have been nominated as catalogs. Probably worth linking to the
> "Capabilities"
> > section to explain.
> > > >
> > >
> > > Sounds reasonable.
> Simon, I appreciate your suggestions and yes, if the user catalog table is
> referenced by the output plugin, it can be another cause of the deadlock.
> 
> I'm going to post the patch for those two changes, accordingly.
Hi, I've made a patch set to cover the discussion above for all supported
versions.
Please have a look at them.


Best Regards,
Takamichi Osumi



HEAD_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  HEAD_extend_logical_decoding_caveats_in_sync_mode_v01.patch


PG10_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  PG10_extend_logical_decoding_caveats_in_sync_mode_v01.patch


PG11_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  PG11_extend_logical_decoding_caveats_in_sync_mode_v01.patch


PG12_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  PG12_extend_logical_decoding_caveats_in_sync_mode_v01.patch


PG13_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  PG13_extend_logical_decoding_caveats_in_sync_mode_v01.patch


PG96_extend_logical_decoding_caveats_in_sync_mode_v01.patch
Description:  PG96_extend_logical_decoding_caveats_in_sync_mode_v01.patch


Re: fdatasync performance problem with large number of DB files

2021-06-18 Thread Fujii Masao




On 2021/06/04 23:39, Justin Pryzby wrote:

You said switching to SIGHUP "would have zero effect"; but, actually it allows
an admin who's DB took a long time in recovery/startup to change the parameter
without shutting down the service.  This mitigates the downtime if it crashes
again.  I think that's at least 50% of how this feature might end up being
used.


Yes, it would have an effect when the server is automatically restarted
after a crash while restart_after_crash is enabled. At least from me, +1 to
your proposed change.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: disfavoring unparameterized nested loops

2021-06-18 Thread Ashutosh Bapat
>
> The problem I have with this idea is that I really don't know how to
> properly calculate what the risk_factor should be set to.  It seems
> easy at first to set it to something that has the planner avoid these
> silly 1-row estimate nested loop mistakes, but I think what we'd set
> the risk_factor to would become much more important when more and more
> Path types start using it. So if we did this and just guessed the
> risk_factor, that might be fine when only 1 of the paths being
> compared had a non-zero risk_factor, but as soon as both paths have
> one set, unless they're set to something sensible, then we just end up
> comparing garbage costs to garbage costs.

The risk factor is the inverse of the confidence in an estimate: the
lower the confidence, the higher the risk. If we associate a confidence
with the selectivity estimate, or compute a confidence interval for the
estimate instead of a single number, we can associate a risk factor with
each estimate. When we combine estimates to calculate new estimates, we
also combine their confidences/confidence intervals. If my memory
serves well, confidence intervals/confidences are calculated based on
the sample size and the method used for estimation, so we should be able
to compute those during ANALYZE.
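
To sketch the idea (purely illustrative; these are not existing planner
types or APIs), combining two independent selectivity estimates would
also combine their uncertainties:

    /* hypothetical sketch, not planner code */
    typedef struct SelEstimate
    {
        double      sel;        /* selectivity estimate */
        double      rel_var;    /* relative variance of the estimate */
    } SelEstimate;

    static SelEstimate
    sel_and(SelEstimate a, SelEstimate b)
    {
        SelEstimate r;

        /* product of independent selectivities */
        r.sel = a.sel * b.sel;
        /* to first order, relative variances of independent factors add */
        r.rel_var = a.rel_var + b.rel_var;
        return r;
    }

A risk factor could then be derived from rel_var, growing as more
uncertain estimates are multiplied together.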

I have not come across many papers which leverage this idea. Googling
"selectivity estimation confidence interval" does not yield many
papers, although I found [1] to be using a similar idea. So maybe
there's no merit in this idea, though theoretically it sounds fine
to me.


[1] 
https://pi3.informatik.uni-mannheim.de/~moer/Publications/vldb18_smpl_synop.pdf
--
Best Wishes,
Ashutosh Bapat




Re: row filtering for logical replication

2021-06-18 Thread Amit Kapila
On Wed, Jun 9, 2021 at 5:33 AM Peter Smith  wrote:
>
> On Mon, May 10, 2021 at 11:42 PM Euler Taveira  wrote:
> >
> > On Mon, May 10, 2021, at 5:19 AM, Peter Smith wrote:
> >
> > AFAIK this is the latest patch available, but FYI it no longer applies
> > cleanly on HEAD.
> >
> > Peter, the last patch is broken since f3b141c4825. I'm still working on it 
> > for
> > the next CF. I already addressed the points suggested by Amit in his last
> > review; however, I'm still working on a cache for evaluating expression as
> > suggested by Andres. I hope to post a new patch soon.
>
> Is there any ETA for your new patch?
>
> In the interim can you rebase the old patch just so it builds and I can try 
> it?
>

I have rebased the patch so that you can try it out. The main thing I
have done is to remove changes in worker.c and created a specialized
function to create estate for pgoutput.c as I don't think we need what
is done in worker.c.

Euler, do let me know if you are not happy with the change in pgoutput.c.

-- 
With Regards,
Amit Kapila.


v15-0001-Row-filter-for-logical-replication.patch
Description: Binary data


Re: Added schema level support for publication.

2021-06-18 Thread Ajin Cherian
On Thu, Jun 17, 2021 at 12:41 AM vignesh C  wrote:

> Thanks for reporting it, the attached patch is a rebased version of
> the patch with few review comment fixes, I will reply with the comment
> fixes to the respective mails.
>

I've applied the patch; it applies cleanly, and I ran "make check" and
the tests run fine.

Some comments for patch 1:

In the commit message, some grammar mistakes:

"Changes was done in
pg_dump to handle pubtype updation in pg_publication table while the database
gets upgraded."

-- change to --

Changes were done in
pg_dump to handle pubtype updation in pg_publication table while the database
gets upgraded.

=

Prototypes present in pg_publication.h was moved to publicationcmds.h so
that minimal datastructures ...

- change to --

Prototypes present in pg_publication.h were moved to publicationcmds.h so
that minimal datastructures ..



In patch 1:

In getObjectDescription()

+ if (!nspname)
+ {
+ pfree(pubname);
+ pfree(nspname);
+ ReleaseSysCache(tup);

Why free nspname if it is NULL?

Same comment in getObjectIdentityParts()
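
Note that pfree(NULL) is not a no-op in PostgreSQL (unlike free(NULL)), so
presumably the block should simply drop that call, roughly:

    if (!nspname)
    {
        pfree(pubname);
        ReleaseSysCache(tup);
        /* ... bail out as the original code does ... */
    }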


In GetAllTablesPublicationRelations()

+ ScanKeyData key[2];
  TableScanDesc scan;
  HeapTuple tuple;
  List*result = NIL;
+ int keycount = 0;

  classRel = table_open(RelationRelationId, AccessShareLock);

- ScanKeyInit(&key[0],
+ ScanKeyInit(&key[keycount++],

Here the key is initialized via keycount, but the code below in the same
function inits key[0] directly. I am not sure if this will affect that
code below.

if (pubviaroot)
{
ScanKeyInit(&key[0],
Anum_pg_class_relkind,
BTEqualStrategyNumber, F_CHAREQ,
CharGetDatum(RELKIND_PARTITIONED_TABLE));
=

in UpdatePublicationTypeTupleValue():

+ tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+ replaces);
+
+ /* Update the catalog. */
+ CatalogTupleUpdate(rel, &tup->t_self, tup);

Not sure if this tup needs to be freed or if the memory context will
take care of it.
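
For what it's worth, similar catalog-update code elsewhere frees the tuple
returned by heap_modify_tuple() explicitly, along these lines:

    tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
                            replaces);
    CatalogTupleUpdate(rel, &tup->t_self, tup);
    heap_freetuple(tup);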
=

regards,
Ajin Cherian
Fujitsu Australia




Supply restore_command to pg_rewind via CLI argument

2021-06-18 Thread Andrey Borodin
Hi hackers!

Starting from v13 pg_rewind can use restore_command if it lacks necessary WAL 
segments. And this is awesome for HA clusters with many nodes! Thanks to 
everyone who worked on the feature!

Here's some feedback on how to make things even better.

If we run 'pg_rewind --restore-target-wal', there must be a restore_command in
the config of the target installation. But if the config is not within
$PGDATA/postgresql.conf, pg_rewind cannot use it.
If we run postmaster with `-c
config_file=/etc/postgresql/10/data/postgresql.conf`, we simply cannot use the
feature. We solved the problem by putting the config into PGDATA only during
pg_rewind, but this does not seem like a very robust solution.

Maybe we could add "-C, --target-restore-command=COMMAND  target WAL
restore_command\n" as was proposed in earlier versions of the patch[0]? Or
instruct pg_rewind to pass the config to a 'postgres -C restore_command' run?

From my POV adding --target-restore-command is the simplest way; I can
extract the corresponding portions from the original patch.
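
For the record, a hypothetical invocation with the proposed option (not
something pg_rewind currently accepts) might look like:

    pg_rewind --target-pgdata=/var/lib/postgresql/13/data \
        --source-server='host=primary port=5432 dbname=postgres' \
        --restore-target-wal \
        --target-restore-command='cp /wal_archive/%f "%p"'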

Thanks!

Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/CAPpHfduUqKLr2CRpcpHcv1qjaz%2B-%2Bi9bOL2AOvdWSr954ti8Xw%40mail.gmail.com#1d4b372b5aa26f93af9ed1d5dd0693cd



Re: Asymmetric partition-wise JOIN

2021-06-18 Thread Alexander Pyhalov

Andrey Lepikhov писал 2021-05-27 07:27:

Next version of the patch.
For searching any problems I forced this patch during 'make check'
tests. Some bugs were found and fixed.


Hi.
I've tested this patch and haven't found issues, but I have some 
comments.


src/backend/optimizer/path/joinrels.c:

1554
1555 /*
1556  * Build RelOptInfo on JOIN of each partition of the outer relation and the inner
1557  * relation. Return List of such RelOptInfo's. Return NIL, if at least one of
1558  * these JOINs are impossible to build.
1559  */
1560 static List *
1561 extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
1562                                          RelOptInfo *joinrel,
1563                                          AppendPath *append_path,
1564                                          RelOptInfo *inner_rel,
1565                                          JoinType jointype,
1566                                          JoinPathExtraData *extra)
1567 {
1568     List       *result = NIL;
1569     ListCell   *lc;
1570
1571     foreach (lc, append_path->subpaths)
1572     {
1573         Path       *child_path = lfirst(lc);
1574         RelOptInfo *child_rel = child_path->parent;
1575         Relids      child_join_relids;
1576         Relids      parent_relids;
1577         RelOptInfo *child_join_rel;
1578         SpecialJoinInfo *child_sjinfo;
1579         List       *child_restrictlist;


Variable names - child_join_rel and child_join_relids seem to be
inconsistent with the rest of the file (I see child_joinrelids in
try_partitionwise_join() and child_joinrel in try_partitionwise_join()
and get_matching_part_pairs()).



1595         child_join_rel = build_child_join_rel(root,
1596                                               child_rel,
1597                                               inner_rel,
1598                                               joinrel,
1599                                               child_restrictlist,
1600                                               child_sjinfo,
1601                                               jointype);
1602         if (!child_join_rel)
1603         {
1604             /*
1605              * If can't build JOIN between inner relation and one of the outer
1606              * partitions - return immediately.
1607              */
1608             return NIL;
1609         }

When can build_child_join_rel() return NULL?
If I read the code correctly, joinrel is created at the beginning of
build_child_join_rel() with makeNode(); makeNode() wraps newNode(), and
newNode() uses MemoryContextAllocZero()/MemoryContextAllocZeroAligned(),
which would error out on an alloc() failure.


1637
1638 static bool
1639 is_asymmetric_join_capable(PlannerInfo *root,
1640                            RelOptInfo *outer_rel,
1641                            RelOptInfo *inner_rel,
1642                            JoinType jointype)
1643 {

The function is missing a comment.

1656         /*
1657          * Don't allow asymmetric JOIN of two append subplans.
1658          * In the case of a parameterized NL join, a reparameterization procedure will
1659          * lead to large memory allocations and a CPU consumption:
1660          * each reparameterize will induce subpath duplication, creating new
1661          * ParamPathInfo instance and increasing of ppilist up to number of partitions
1662          * in the inner. Also, if we have many partitions, each bitmapset
1663          * variable will large and many leaks of such variable (caused by relid
1664          * replacement) will highly increase memory consumption.
1665          * So, we deny such paths for now.
1666          */


Missing word:
each bitmapset variable will large => each bitmapset variable will be 
large



1694     foreach (lc, outer_rel->pathlist)
1695     {
1696         AppendPath *append_path = lfirst(lc);
1697
1698         /*
1699          * MEMO: We assume this pathlist keeps at least one AppendPath that
1700          * represents partitioned table-scan, symmetric or as
Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Fabien COELHO



Hello,


Doing this means we regard any state other than CSTATE_FINISHED as
aborted.  So the current goto's to "done" in threadRun effectively
abort some or all of the clients running on the thread.  For example,
the following place:

pgbench.c:6713

   /* must be something wrong */
   pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
   goto done;


Should say something like "thread %d aborted: %s() failed: ...".


Yep, possibly.


I'm not sure what the consensus is here about the case where an aborted
client can reconnect to the same server.  This patch doesn't do that.


Non trivial future work.

However, I think implementing reconnection needs more work than is
acceptable for a fix during beta.


It is an entire project which requires some thinking about.


+/* as the bench is already running, we do not abort */
pg_log_error("client %d aborted while establishing connection", st->id);
st->state = CSTATE_ABORTED;

The comment looks strange in that it says "we do not abort" while
setting the state to CSTATE_ABORTED and then reporting "client %d aborted".


Indeed. There is an abort of the client, which just means that it stops
sending transactions, and an abort of the process, which is basically
"exit(1)".




  if ((con = doConnect()) == NULL)
+  {
+pg_log_fatal("connection for initialization failed");
exit(1);

doConnect() prints an error message given from libpq, so the
additional message is redundant.


This is not the same for me: doConnect may fail but we may decide to
retry the connection later, or just one client may be disconnected while
others keep going, which is different from deciding to stop the whole
program, which deserves a message of its own.




  errno = THREAD_BARRIER_INIT(&barrier, nthreads);
  if (errno != 0)
+  {
pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


Hmmm. Yep.


===
if (thread->logfile == NULL)
{
  pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 in this case.


Yep.

If we exit in this case, we might also want to exit when fclose() fails.
(Currently an error from fclose() is ignored.)


Not sure. I'd let it at that for now.


===
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that "thread %d client %d" (not the client id but the
client index within the thread) doesn't make sense to users.  Even if
we show a message like that, it should show only the global client
id (cstate->id).


This is not obvious to me. I think that we should be consistent with
what is already done nearby.



I think that we should return with 2 here, but we return with 1
in another place for the same reason.


Possibly.


/* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
goto done;

Why doesn't a fatal error cause an immediate exit?


Good point. I do not know, but I would expect it to be the case, and 
AFAICR it does not.


(And if we change this to fatal, we also need to change similar errors 
in the same function to fatal.)


Possibly.

I'll look into it over the weekend.

Thanks for the feedback!

--
Fabien.




Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-18 Thread Bharath Rupireddy
On Thu, Jun 10, 2021 at 6:36 PM Bharath Rupireddy
 wrote:
>
> On Thu, Jun 10, 2021 at 9:17 AM Bharath Rupireddy
>  wrote:
> > > I don't know the answer; my guess is that all this has become obsolete
> > > and the whole Assert and the dodgy comment can just be deleted.
> >
> > Hm. I get it. Unfortunately commit b1ff33f is missing information
> > about what the Coverity tool was complaining of, and it has no related
> > discussion at all.
> >
> > I agree to remove that assertion entirely. I will post a new patch set soon.
>
> PSA v5 patch set.

PSA v6 patch set rebased onto the latest master.

With Regards,
Bharath Rupireddy.


v6-0001-Refactor-parse_subscription_options.patch
Description: Binary data


v6-0002-Remove-similar-ereport-calls-in-parse_subscriptio.patch
Description: Binary data


Re: Add version macro to libpq-fe.h

2021-06-18 Thread Boris Kolpackov
Tom Lane  writes:

> I think putting a version number as such in there is a truly
> horrid idea.  However, I could get behind adding a boolean flag
> that says specifically whether the pipeline feature exists.
> Then you'd do something like
> 
> #ifdef LIBPQ_HAS_PIPELINING
> 
> rather than embedding knowledge of exactly which release
> added that.

That would be even better, but I agree with what others have
said: we would have to keep adding such feature test macros
going forward.

I think ideally you would want to have both since the version
macro could still be helpful in dealing with "features" that you
did not plan to add (aka bugs).


> Comparing v13 and v14 libpq-fe.h, I see that there is a solution
> available now: "#ifdef PQ_QUERY_PARAM_MAX_LIMIT".

Hm, it must have been added recently since I don't see it in 14beta1.
But thanks for the pointer; if nothing better comes up, this will
have to do.
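
That is, the stopgap test would presumably look like:

    #ifdef PQ_QUERY_PARAM_MAX_LIMIT
    /* libpq is new enough (14+) to also have the pipeline API */
    #endif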




Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Fabien COELHO



   /* must be something wrong */
   pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
   goto done;


Should say something like "thread %d aborted: %s() failed: ...".


After having a look, there are already plenty of such cases. I'd say not to
change anything for beta, and to think about it for the next round.




  errno = THREAD_BARRIER_INIT(&barrier, nthreads);
  if (errno != 0)
+  {
pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


I think that you are right, but there are plenty of such places where exit
should be 2 instead of 1 if the doc is followed:


"""Errors during the run such as database errors or problems in the script 
will result in exit status 2."""


My beta take is to leave these as they are, i.e. pretty inconsistent all
over pgbench, and schedule a cleanup for the next round.



===
if (thread->logfile == NULL)
{
  pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 in this case.


Yep.


The bench has not even started; this is not really run time yet, so 1
seems ok. The failure may be due to a typo in the path, which comes
from the user.


If we exit in this case, we might also want to exit when fclose() fails.
(Currently an error from fclose() is ignored.)


Not sure. I'd let it at that for now.


I stand by this.


+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that "thread %d client %d" (not the client id but the
client index within the thread) doesn't make sense to users.  Even if
we show a message like that, it should show only the global client
id (cstate->id).


This is not obvious to me. I think that we should be consistent with what
is already done nearby.


ok for only giving the global client id.


I think that we should return with 2 here, but we return with 1
in another place for the same reason.


Possibly.


Again for this one, the bench has not really started, so 1 seems fine.


/* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
goto done;

Why doesn't a fatal error cause an immediate exit?


Good point. I do not know, but I would expect it to be the case, and AFAICR 
it does not.


(And if we change this to fatal, we also need to change similar errors in 
the same function to fatal.)


Possibly.


On second look, I think that error is fine: indeed we do not stop the
process, so "fatal" it is not.


Attached is Yugo-san's patch with some updates discussed in the previous
mails, so as to move things along.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..202507f5d5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort the process */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer(&stats);
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(&barrier, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear

Re: Add version macro to libpq-fe.h

2021-06-18 Thread Tom Lane
Boris Kolpackov  writes:
> Tom Lane  writes:
>> I think putting a version number as such in there is a truly
>> horrid idea.  However, I could get behind adding a boolean flag
>> that says specifically whether the pipeline feature exists.

> That would be even better, but I agree with what others have
> said: we would have to keep adding such feature test macros
> going forward.

Yes, and I think that is a superior solution.  I think the
argument that it's too much effort is basically nonsense.

> I think ideally you would want to have both since the version
> macro could still be helpful in dealing with "features" that you
> did not plan to add (aka bugs).

I really doubt that a version number appearing in libpq-fe.h would
be helpful for deciding whether you need to work around a bug.
The problem again is version skew: how well does the libpq.so you
are running against today match up with the header you compiled
against (possibly months ago, possibly on a different machine)?
What you'd want for that sort of thing is a runtime test, i.e.
consult PQlibVersion().
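
For instance, a run-time check would be along these lines (PQlibVersion()
returns e.g. 140000 for version 14.0):

    if (PQlibVersion() >= 140000)
    {
        /* the libpq actually loaded has pipeline support */
    }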

That point, along with the previously-discussed point about confusion
between server and libpq versions, nicely illustrates another reason
why I'm resistant to just adding a version number to libpq-fe.h.
If we do that, application programmers will be presented with THREE
different Postgres version numbers, and it seems inevitable that
people will make mistakes and consult the wrong one for a particular
purpose.  I think we can at least reduce the confusion by handling
the question of which-features-are-visible-in-the-include-file in a
different style.

regards, tom lane




Re: Centralizing protective copying of utility statements

2021-06-18 Thread Tom Lane
Julien Rouhaud  writes:
> On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> + *   readOnlyTree: treat pstmt's node tree as read-only

> Maybe it's because I'm not a native English speaker, or because it's quite
> late here, but I don't find "treat as read-only" really clear.  I don't have a
> concise better wording to suggest.

Maybe "if true, pstmt's node tree must not be modified" ?

>> Still thinking about which way to fix it in the back branches.

> I'm +0.5 for a narrow fix, weighing the possibility of unspotted similar
> problems against the possibility of a performance regression.

After sleeping on it another day, I'm inclined to think the same.  The
key attraction of a centralized fix is that it prevents the possibility
of new bugs of the same ilk in newly-added features.  Given how long
these CREATE/ALTER DOMAIN bugs escaped detection, it's hard to have
full confidence that there are no others in the back branches --- but
they must be in really lightly-used features.

regards, tom lane




Re: Add version macro to libpq-fe.h

2021-06-18 Thread Alvaro Herrera
On 2021-Jun-18, Boris Kolpackov wrote:

> Tom Lane  writes:
> 
> > I think putting a version number as such in there is a truly
> > horrid idea.  However, I could get behind adding a boolean flag
> > that says specifically whether the pipeline feature exists.
> > Then you'd do something like
> > 
> > #ifdef LIBPQ_HAS_PIPELINING
> > 
> > rather than embedding knowledge of exactly which release
> > added that.
> 
> That would be even better, but I agree with what others have
> said: we would have to keep adding such feature test macros
> going forward.

But we do not add that many significant features to libpq in the first
place, so I'm not sure it would be too bad.  As far as I am aware, this
is the first time someone has requested a mechanism to detect feature
presence specifically in libpq.

To put a number to it, I counted the number of commits to exports.txt
since Jan 2015 -- there are 17.  But many of them are just intra-release
fixups; the number of actual "features" is 11, an average of two per
year.  That seems small enough to me.

So I'm +1 on adding this "feature macro".

(The so-version major changed from 4 to 5 in commit 1e7bb2da573e, dated
April 2006.)

> I think ideally you would want to have both since the version
> macro could still be helpful in dealing with "features" that you
> did not plan to add (aka bugs).
> 
> 
> > Comparing v13 and v14 libpq-fe.h, I see that there is a solution
> > available now: "#ifdef PQ_QUERY_PARAM_MAX_LIMIT".
> 
> Hm, it must have been added recently since I don't see it in 14beta1.
> But thanks for the pointer, if nothing better comes up this will
> have to do.

Yeah, this one was added by commit cb92703384e2 on June 8th, three weeks
after beta1.

-- 
Álvaro Herrera   Valdivia, Chile
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)




Re: Centralizing protective copying of utility statements

2021-06-18 Thread Julien Rouhaud
On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> > + * readOnlyTree: treat pstmt's node tree as read-only
> 
> > Maybe it's because I'm not a native English speaker, or because it's quite
> > late here, but I don't find "treat as read-only" really clear.  I don't
> > have a concise better wording to suggest.
> 
> Maybe "if true, pstmt's node tree must not be modified" ?

Thanks, I find it way better!




Re: Centralizing protective copying of utility statements

2021-06-18 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
>> Maybe "if true, pstmt's node tree must not be modified" ?

> Thanks, I find it way better!

OK, pushed that way, and with a couple other comment tweaks from
an additional re-reading.

regards, tom lane




Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Yugo NAGATA
Hello Horiguchi-san, Fabien,

On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
Fabien COELHO  wrote:

> 
> >>>/* must be something wrong */
> >>>pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> >>>goto done;
> >> 
> >> Should say something like "thread %d aborted: %s() failed: ...".
> 
> After having a look, there are already plenty of such cases. I'd say not to
> change anything for beta, and to think about it for the next round.

Agreed. Basically, I think the existing message should be left as is.

> >> 
> >>   errno = THREAD_BARRIER_INIT(&barrier, nthreads);
> >>   if (errno != 0)
> >> +  {
> >> pg_log_fatal("could not initialize barrier: %m");
> >> +exit(1);
> >> 
> >> This is a run-time error. Maybe we should return 2 in that case.
> 
> I think that you are right, but there are plenty of such places where exit
> should be 2 instead of 1 if the doc is followed:
> 
> """Errors during the run such as database errors or problems in the script 
> will result in exit status 2."""
> 
> My beta take is to leave these as they are, i.e. pretty inconsistent all
> over pgbench, and schedule a cleanup for the next round.

The same as Fabien's comment below about thread->logfile,

> >> ===
> >> if (thread->logfile == NULL)
> >> {
> >>   pg_log_fatal("could not open logfile \"%s\": %m", logpath);
> >> -  goto done;
> >> +  exit(1);
> >> 
> >> Maybe we should exit with 2 in this case.
> >
> > Yep.
> 
> The bench has not even started; this is not really run time yet, so 1 seems
> ok. The failure may be due to a typo in the path, which comes from the
> user.

the bench is not started at THREAD_BARRIER_INIT, so I think exit(1) is ok.  

> 
> >> /* must be something wrong */
> >> -pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
> >> +pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> >> goto done;
> >> 
> >> Why doesn't a fatal error cause an immediate exit?
> >
> > Good point. I do not know, but I would expect it to be the case, and AFAICR 
> > it does not.
> >
> >> (And if we change this to fatal, we also need to change similar errors in 
> >> the same function to fatal.)
> >
> > Possibly.
> 
> On second look, I think that error is fine: indeed we do not stop the
> process, so "fatal" it is not.

I replaced this 'fatal' with 'error' because we are aborting the client
instead of calling exit(1). When pgbench was rewritten to use the common
logging API by commit 30a3e772b40, somehow pg_log_fatal was used, but I am
wondering whether it should have been pg_log_error.

> Attached Yugo-san patch with some updates discussed in the previous mails, 
> so as to move things along.

Thank you for update. I agree with this fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




pg_stats and range statistics

2021-06-18 Thread Egor Rogov

Hi,

Statistics for range types are not currently exposed in the pg_stats view
(i.e. STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM).

Shouldn't they be? If so, here is a patch for adding them.

The following is a simple example of what it looks like:

CREATE TABLE test(r int4range);
INSERT INTO test
    SELECT int4range((random()*10)::integer,(10+random()*10)::integer)
    FROM generate_series(1,1);
SET default_statistics_target = 10;
ANALYZE test;

SELECT range_length_histogram, range_length_empty_frac, range_bounds_histogram
FROM pg_stats
WHERE tablename = 'test' \gx

-[ RECORD 1 ]-----------+------------------------------------------------------------
range_length_histogram  | {1,4,6,8,9,10,11,12,14,16,20}
range_length_empty_frac | {0.003666}
range_bounds_histogram  | {"[0,10)","[1,11)","[2,12)","[3,13)","[4,14)","[5,15)","[6,16)","[7,17)","[8,18)","[9,19)","[10,20)"}



Regards,
Egor Rogov.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..4d037b590e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -12877,6 +12877,36 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       non-null elements.  (Null for scalar types.)
      </para></entry>
     </row>
+
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>range_length_histogram</structfield> <type>anyarray</type>
+     </para>
+     <para>
+      A histogram of lengths of non-empty, non-null ranges.
+      (Null for non-range types.)
+     </para></entry>
+    </row>
+
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>range_length_empty_frac</structfield> <type>float4</type>
+     </para>
+     <para>
+      A fraction of empty ranges.  (Null for non-range types.)
+     </para></entry>
+    </row>
+
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>range_bounds_histogram</structfield> <type>anyarray</type>
+     </para>
+     <para>
+      Histograms of lower and upper bounds of non-empty, non-null ranges,
+      combined into a single array of range values.
+      (Null for non-range types.)
+     </para></entry>
+    </row>
    </tbody>
   </tgroup>
  </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..93fadfff15 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_length_empty_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+        END AS range_bounds_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..780aefdc26 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2409,7 +2409,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_length_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+

Re: disfavoring unparameterized nested loops

2021-06-18 Thread John Naylor
On Fri, Jun 18, 2021 at 6:20 AM Ashutosh Bapat wrote:

> I have not come across many papers which leverage this idea. Googling
> "selectivity estimation confidence interval", does not yield many
> papers. Although I found [1] to be using a similar idea. So may be
> there's not merit in this idea, thought theoretically it sounds fine
> to me.
>
>
> [1]
https://pi3.informatik.uni-mannheim.de/~moer/Publications/vldb18_smpl_synop.pdf

Well, that paper's title shows it's a bit too far forward for us, since we
don't use samples during plan time (although that's a separate topic worth
considering). From the references, however, this one gives some
mathematical framing of the problem that led to the thread subject,
although I haven't read enough to see if we can get practical advice from
it:

Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the
size of join results.
https://www.csd.uoc.gr/~hy460/pdf/p268-ioannidis.pdf

--
John Naylor
EDB: http://www.enterprisedb.com


Re: PoC: Using Count-Min Sketch for join cardinality estimation

2021-06-18 Thread John Naylor
On Wed, Jun 16, 2021 at 8:23 PM Tomas Vondra wrote:

> Not really, but to be fair even for the histograms it's only really
> supported by "seems to work in practice" :-(

Hmm, we cite a theoretical result in analyze.c, but I don't know if there
is something better out there:

 * The following choice of minrows is based on the paper
 * "Random sampling for histogram construction: how much is enough?"
 * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in

What is more troubling to me is that we set the number of MCVs to the
number of histograms. Since b5db1d93d2a6 we have a pretty good method of
finding the MCVs that are justified. When that first went in, I
experimented with removing the MCV limit and found it easy to create value
distributions that lead to thousands of MCVs. I guess the best
justification now for the limit is plan time, but if we have a sketch also,
we can choose one or the other based on a plan-time speed vs accuracy
tradeoff (another use for a planner effort guc). In that scenario, for
tables with many MCVs we would only use them for restriction clauses.

--
John Naylor
EDB: http://www.enterprisedb.com


Version reporting in pgbench

2021-06-18 Thread Tom Lane
I see that commit 547f04e73 caused pgbench to start printing its
version number.  I think that's a good idea in general, but it
appears to me that next to no thought went into the details
(as perhaps evidenced by the fact that the commit message doesn't
even mention it).  I've got two beefs with how it was done:

* The output only mentions pgbench's own version, which would be
highly misleading if the server being used is of a different
version.  I should think that in most cases the server's version
is more important than pgbench's.

* We have a convention for how client programs should print their
versions, and this ain't it.  (Specifically, you should print the
PG_VERSION string not make up your own.)

What I think should have been done instead is to steal psql's
battle-tested logic for printing its startup version banner,
more or less as attached.

One point here is that printing the server version requires
access to a connection, which printResults() hasn't got
because we already closed all the connections by that point.
I solved that by printing the banner during the initial
connection that gets the scale factor, does vacuuming, etc.
If you're dead set on not printing the version till the end,
that could be made to happen; but it's not clear to me that
this way is any worse, and it's certainly easier.

Thoughts?

regards, tom lane

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..78b5ac612b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -63,6 +63,7 @@
 #include "common/username.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
+#include "fe_utils/string_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pgbench.h"
@@ -5493,6 +5494,37 @@ printSimpleStats(const char *prefix, SimpleStats *ss)
 	}
 }
 
+/* print version banner */
+static void
+printVersion(PGconn *con)
+{
+	int			server_ver = PQserverVersion(con);
+	int			client_ver = PG_VERSION_NUM;
+
+	if (server_ver != client_ver)
+	{
+		const char *server_version;
+		char		sverbuf[32];
+
+		/* Try to get full text form, might include "devel" etc */
+		server_version = PQparameterStatus(con, "server_version");
+		/* Otherwise fall back on server_ver */
+		if (!server_version)
+		{
+			formatPGVersionNumber(server_ver, true,
+  sverbuf, sizeof(sverbuf));
+			server_version = sverbuf;
+		}
+
+		printf(_("%s (%s, server %s)\n"),
+			   "pgbench", PG_VERSION, server_version);
+	}
+	/* For version match, only print pgbench version */
+	else
+		printf("%s (%s)\n", "pgbench", PG_VERSION);
+	fflush(stdout);
+}
+
 /* print out results */
 static void
 printResults(StatsData *total,
@@ -5506,7 +5538,6 @@ printResults(StatsData *total,
 	double		bench_duration = PG_TIME_GET_DOUBLE(total_duration);
 	double		tps = ntx / bench_duration;
 
-	printf("pgbench (PostgreSQL) %d.%d\n", PG_VERSION_NUM / 10000, PG_VERSION_NUM % 100);
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
 		   num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
@@ -6386,6 +6417,9 @@ main(int argc, char **argv)
 exit(1);
 	}
 
+	/* report pgbench and server versions */
+	printVersion(con);
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");


A few nuances about specifying the timeline with START_REPLICATION

2021-06-18 Thread Jeff Davis
A few questions about this comment in walsender.c, originating in
commit abfd192b1b5b:

/*
 * Found the requested timeline in the history. Check that
 * requested startpoint is on that timeline in our history.
 *
 * This is quite loose on purpose. We only check that we didn't
 * fork off the requested timeline before the switchpoint. We
 * don't check that we switched *to* it before the requested
 * starting point. This is because the client can legitimately
 * request to start replication from the beginning of the WAL
 * segment that contains switchpoint, but on the new timeline, so
 * that it doesn't end up with a partial segment. If you ask for
 * too old a starting point, you'll get an error later when we
 * fail to find the requested WAL segment in pg_wal.
 *
 * XXX: we could be more strict here and only allow a startpoint
 * that's older than the switchpoint, if it's still in the same
 * WAL segment.
 */

1. I think there's a typo: it should be "fork off the requested
timeline before the startpoint", right?

2. It seems to imply that requesting an old start point is wrong, but I
don't see why. As long as the WAL is there (or at least the slot
boundaries), what's the problem? Can we just change the comment
to say that it's fine to start on an ancestor of the requested timeline
(and maybe update the docs, too)?

3. I noticed when looking at this that the terminology in the docs is a
bit inconsistent between START_REPLICATION and
recovery_target_timeline.
  a. In recovery_target_timeline:
 i. a numeric value means "stop when this timeline forks"
 ii. "latest" means "follow forks along the newest timeline"
 iii. "current" is an alias for the backup's numerical timeline
  b. In the START_REPLICATION docs:
 i. "current" means "follow forks along the newest timeline"
 ii. a numeric value that is equal to the current timeline is the
same as "current"
 iii. a numeric value that is less than the current timeline means
"stop when this timeline forks"

On point #3, it looks like START_REPLICATION could be improved:

  * Should we change the docs to say "latest" rather than "current"?
  * Should we change the behavior so that specifying the current
timeline as a number still means a historic timeline (e.g. it will stop
replicating there and emit a tuple)?
  * Should we add some keywords like "latest" or "current" to the
START_REPLICATION command?

Regards,
Jeff Davis






Re: Add version macro to libpq-fe.h

2021-06-18 Thread Tom Lane
Alvaro Herrera  writes:
> So I'm +1 on adding this "feature macro".

Concretely, how about the attached?  (I also got rid of a recently-added
extra comma.  While the compilers we use might not warn about that,
it seems unwise to assume that no user's compiler will.)

I guess one unresolved question is whether we want to mention these in
the SGML docs.  I vote "no", because it'll raise the maintenance cost
noticeably.  But I can see an argument on the other side.

regards, tom lane

diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ec378705ad..4677c51e1b 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -28,6 +28,13 @@ extern "C"
  */
 #include "postgres_ext.h"
 
+/*
+ * These symbols may be used in compile-time #ifdef tests for the availability
+ * of newer libpq features.
+ */
+#define LIBPQ_HAS_PIPELINING 1
+#define LIBPQ_HAS_TRACE_FLAGS 1
+
 /*
  * Option flags for PQcopyResult
  */
@@ -98,7 +105,7 @@ typedef enum
 	PGRES_COPY_BOTH,			/* Copy In/Out data transfer in progress */
 	PGRES_SINGLE_TUPLE,			/* single tuple from larger resultset */
 	PGRES_PIPELINE_SYNC,		/* pipeline synchronization point */
-	PGRES_PIPELINE_ABORTED,		/* Command didn't run because of an abort
+	PGRES_PIPELINE_ABORTED		/* Command didn't run because of an abort
  * earlier in a pipeline */
 } ExecStatusType;
 


Re: Pipeline mode and PQpipelineSync()

2021-06-18 Thread Alvaro Herrera
On 2021-Jun-16, Boris Kolpackov wrote:

> Specifically, the documentation[1]
> makes it sound like the use of PQpipelineSync() is optional (34.5.1.1
> "Issuing Queries"):

Hmm.  My intention here was to indicate that you should have
PQpipelineSync *somewhere*, but that the server was free to start
executing some commands even before that, if the buffered commands
happened to reach the server somehow -- but not necessarily that the
results from those commands would reach the client immediately.

I'll experiment a bit more to be sure that what I'm saying is correct.
But if it is, then I think the documentation you quote is misleading:

> "After entering pipeline mode, the application dispatches requests using
> PQsendQuery, PQsendQueryParams, or its prepared-query sibling
> PQsendQueryPrepared. These requests are queued on the client-side until
> flushed to the server; this occurs when PQpipelineSync is used to establish a
> synchronization point in the pipeline, or when PQflush is called. [...]
> 
> The server executes statements, and returns results, in the order the client
> sends them. The server will begin executing the commands in the pipeline
> immediately, not waiting for the end of the pipeline. [...]"

... because it'll lead people to do what you've done, only to discover
that it doesn't really work.

I think I should rephrase this to say that PQpipelineSync() is needed
where the user needs the server to start executing commands; and
separately indicate that it is possible (but not promised) that the
server would start executing commands ahead of time because $reasons.
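
For clarity, the usage pattern the docs mean to describe is roughly this
minimal sketch (error handling omitted):

    PQenterPipelineMode(conn);
    PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
    PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
    PQpipelineSync(conn);       /* flush and mark a synchronization point */

    for (int i = 0; i < 2; i++)
    {
        PGresult   *res;

        /* each query's results end with a NULL return */
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);
    }
    /* consume the PGRES_PIPELINE_SYNC result for the sync point itself */
    PQclear(PQgetResult(conn));
    PQexitPipelineMode(conn);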

Do I have it right that other than this documentation problem, you've
been able to use pipeline mode successfully?

> So to me it looks like, contrary to the documentation, the server does
> not start executing the statements immediately, instead waiting for the
> synchronization point. Or am I missing something here?

I don't think you are.

> The above tests were performed using libpq from 14beta1 running against
> PostgreSQL server version 9.5. If you would like to take a look at the
> actual code, you can find it here[2] (the PIPELINE_SYNC macro controls
> whether PQpipelineSync() is used).

Thanks.

> On a related note, I've been using libpq_pipeline.c[3] as a reference
> and I believe it has a busy loop calling PQflush() repeatedly on line
> 721, since once everything has been sent and we are waiting for the result,
> select() will keep returning with an indication that the socket
> is writable.

Oops, thanks, will look at fixing this too.

> (you can find one way to fix this in [2]).
> [2] 
> https://git.codesynthesis.com/cgit/odb/libodb-pgsql/tree/odb/pgsql/statement.cxx?h=bulk#n771

Neat, can do likewise I suppose.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: Add version macro to libpq-fe.h

2021-06-18 Thread Alvaro Herrera
On 2021-Jun-18, Tom Lane wrote:

> Alvaro Herrera  writes:
> > So I'm +1 on adding this "feature macro".
> 
> Concretely, how about the attached?

Seems OK to me.  We can just accumulate any similar ones in the future
nearby.

> (I also got rid of a recently-added
> extra comma.  While the compilers we use might not warn about that,
> it seems unwise to assume that no user's compiler will.)

Oops.

> I guess one unresolved question is whether we want to mention these in
> the SGML docs.  I vote "no", because it'll raise the maintenance cost
> noticeably.  But I can see an argument on the other side.

Well, if we do want docs for these macros, then IMO it'd be okay to have
them in libpq-fe.h itself rather than SGML.  A one-line comment for each
would suffice:

+/*
+ * These symbols may be used in compile-time #ifdef tests for the availability
+ * of newer libpq features.
+ */
+/* Indicates presence of PQenterPipelineMode and friends */
+#define LIBPQ_HAS_PIPELINING 1
+
+/* Indicates presence of PQsetTraceFlags; PQtrace changed output format */
+#define LIBPQ_HAS_TRACE_FLAGS 1

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: Add version macro to libpq-fe.h

2021-06-18 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Jun-18, Tom Lane wrote:
>> I guess one unresolved question is whether we want to mention these in
>> the SGML docs.  I vote "no", because it'll raise the maintenance cost
>> noticeably.  But I can see an argument on the other side.

> Well, if we do want docs for these macros, then IMO it'd be okay to have
> them in libpq-fe.h itself rather than SGML.  A one-line comment for each
> would suffice:

WFM.  I'd sort of supposed that the symbol names were self-documenting,
but you're right that a line or so of annotation improves things.

regards, tom lane




Re: Version reporting in pgbench

2021-06-18 Thread Fabien COELHO



Hello Tom,


One point here is that printing the server version requires
access to a connection, which printResults() hasn't got
because we already closed all the connections by that point.
I solved that by printing the banner during the initial
connection that gets the scale factor, does vacuuming, etc.


Ok.


If you're dead set on not printing the version till the end,
that could be made to happen; but it's not clear to me that
this way is any worse, and it's certainly easier.


pgbench (14beta1 dev 2021-06-12 08:10:44, server 13.3 (Ubuntu 13.3-1.pgdg20.04+1))

Why not move the printVersion call right after the connection is created, 
at line 6374?


Otherwise it works for me.

--
Fabien.




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Alvaro Herrera
On 2021-Jun-03, Michael Paquier wrote:

> Hi all,
> 
> serinus has been complaining about the new gcd functions in 13~:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14

Hello, this problem is still happening; serinus' configure output says
it's running a snapshot from 20210527, and Fabien mentioned downthread
that his compiler stopped complaining on 2021-06-05.  Andres, maybe the
compiler in serinus is due for an update too?

Thanks

-- 
Álvaro Herrera         39°49'30"S 73°17'W
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php




Re: Version reporting in pgbench

2021-06-18 Thread Tom Lane
Fabien COELHO  writes:
> Why not move the printVersion call right after the connection is created, 
> at line 6374?

I started with that, and one of the 001_pgbench_with_server.pl
tests fell over --- it was expecting no stdout output before a
"Perhaps you need to do initialization" failure.  If you don't
mind changing that, I agree that printing immediately after
the connection is made is a bit less astonishing.

regards, tom lane




Re: A few nuances about specifying the timeline with START_REPLICATION

2021-06-18 Thread Heikki Linnakangas

On 18/06/2021 20:27, Jeff Davis wrote:

A few questions about this comment in walsender.c, originating in
commit abfd192b1b5b:

/*
  * Found the requested timeline in the history. Check that
  * requested startpoint is on that timeline in our history.
  *
  * This is quite loose on purpose. We only check that we didn't
  * fork off the requested timeline before the switchpoint. We
  * don't check that we switched *to* it before the requested
  * starting point. This is because the client can legitimately
  * request to start replication from the beginning of the WAL
  * segment that contains switchpoint, but on the new timeline, so
  * that it doesn't end up with a partial segment. If you ask for
  * too old a starting point, you'll get an error later when we
  * fail to find the requested WAL segment in pg_wal.
  *
  * XXX: we could be more strict here and only allow a startpoint
  * that's older than the switchpoint, if it's still in the same
  * WAL segment.
  */

1. I think there's a typo: it should be "fork off the requested
timeline before the startpoint", right?


Yes, I think you're right.


2. It seems to imply that requesting an old start point is wrong, but I
don't see why. As long as the WAL is there (or at least the slot
boundaries), what's the problem? Can we just change the comment
to say that it's fine to start on an ancestor of the requested timeline
(and maybe update the docs, too)?


Hmm, I'm not sure if the logic in WalSndSegmentOpen() would work if you 
did that. For example, if you had the following WAL segments, because a 
timeline switch happened somewhere in the middle of segment 13:


000000040000000000000012
000000040000000000000013
000000050000000000000013
000000050000000000000014

and you requested to start streaming from timeline 5, 0/12000000, I 
think WalSndSegmentOpen() would try to open file 
"000000050000000000000012" and not find it.


We could teach it to look into the timeline history to find the correct 
file, though. Come to think of it, we could remove the START_REPLICATION 
TIMELINE option altogether (or rather, make it optional, for backwards 
compatibility). The server doesn't need it for anything, it knows the 
timeline history so the LSN is enough to uniquely identify the starting 
point.


If the client asks for a historic timeline, the replication will stop 
when it reaches the end of that timeline. In hindsight, I think it would 
make more sense to send a message to the client to say that it's 
switching to a new timeline, and continue streaming from the new timeline.



3. I noticed when looking at this that the terminology in the docs is a
bit inconsistent between START_REPLICATION and
recovery_target_timeline.
   a. In recovery_target_timeline:
  i. a numeric value means "stop when this timeline forks"
  ii. "latest" means "follow forks along the newest timeline"
  iii. "current" is an alias for the backup's numerical timeline
   b. In the start START_REPLICATION docs:
  i. "current" means "follow forks along the newest timeline"
  ii. a numeric value that is equal to the current timeline is the
same as "current"
  iii. a numeric value that is less than the current timeline means
"stop when this timeline forks"

On point #3, it looks like START_REPLICATION could be improved:

   * Should we change the docs to say "latest" rather than "current"?
   * Should we change the behavior so that specifying the current
timeline as a number still means a historic timeline (e.g. it will stop
replicating there and emit a tuple)?
   * Should we add some keywords like "latest" or "current" to the
START_REPLICATION command?
Hmm, the timeline in the START_REPLICATION command is not specifying a 
recovery target timeline, so I don't think "latest" or "current" make 
much sense there. Per above, it just tells the server which timeline the 
requested starting point belongs to, so it's actually redundant.


- Heikki




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Tom Lane
Alvaro Herrera  writes:
> Hello, this problem is still happening; serinus' configure output says
> it's running a snapshot from 20210527, and Fabien mentioned downthread
> that his compiler stopped complaining on 2021-06-05.  Andres, maybe the
> compiler in serinus is due for an update too?

Yeah, serinus is visibly still running one of the broken revisions:

configure: using compiler=gcc (Debian 20210527-1) 12.0.0 20210527 
(experimental) [master revision 
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973]

It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
too.  That seems to be a different issue:

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f3827947859 in __GI_abort () at abort.c:79
#2  0x7f3827947729 in __assert_fail_base (fmt=0x7f3827add588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f381c3ce4c8 "S->getValue() && \"Releasing SymbolStringPtr with zero ref count\"", file=0x7f381c3ce478 "/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h", line=91, function=<optimized out>) at assert.c:92
#3  0x7f3827958f36 in __GI___assert_fail (assertion=0x7f381c3ce4c8 "S->getValue() && \"Releasing SymbolStringPtr with zero ref count\"", file=0x7f381c3ce478 "/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h", line=91, function=0x7f381c3ce570 "llvm::orc::SymbolStringPtr::~SymbolStringPtr()") at assert.c:101
#4  0x7f381c23c98d in llvm::orc::SymbolStringPtr::~SymbolStringPtr (this=0x277a8b0, __in_chrg=<optimized out>) at /home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:91
#5  0x7f381c24f879 in std::_Destroy<llvm::orc::SymbolStringPtr> (__pointer=0x277a8b0) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:140
#6  0x7f381c24d10c in std::_Destroy_aux<false>::__destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0, __last=0x277a998) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:152
#7  0x7f381c2488a6 in std::_Destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0, __last=0x277a998) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:185
#8  0x7f381c243c51 in std::_Destroy (__first=0x277a8b0, __last=0x277a998) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/alloc_traits.h:738
#9  0x7f381c23f1c3 in std::vector<llvm::orc::SymbolStringPtr, std::allocator<llvm::orc::SymbolStringPtr> >::~vector (this=0x7ffc73440a10, __in_chrg=<optimized out>) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_vector.h:680
#10 0x7f381c26112c in llvm::orc::JITDylib::removeTracker (this=0x18b4240, RT=...) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1464
#11 0x7f381c264cb9 in operator() (__closure=0x7ffc73440d00) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2054
#12 0x7f381c264d29 in llvm::orc::ExecutionSession::runSessionLocked<...>(struct {...} &&) (this=0x187d110, F=...) at /home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/Core.h:1291
#13 0x7f381c264ebc in llvm::orc::ExecutionSession::removeResourceTracker (this=0x187d110, RT=...) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2051
#14 0x7f381c25734c in llvm::orc::ResourceTracker::remove (this=0x1910c30) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:53
#15 0x7f381c25a9c1 in llvm::orc::JITDylib::clear (this=0x18b4240) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:622
#16 0x7f381c26305e in llvm::orc::ExecutionSession::endSession (this=0x187d110) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1777
#17 0x7f381c3373a3 in llvm::orc::LLJIT::~LLJIT (this=0x18a73b0, __in_chrg=<optimized out>) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp:1002
#18 0x7f381c38af48 in LLVMOrcDisposeLLJIT (J=0x18a73b0) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/OrcV2CBindings.cpp:561
#19 0x7f381c596612 in llvm_shutdown (code=<optimized out>, arg=140722242323824) at llvmjit.c:892
#20 0x007d4385 in proc_exit_prepare (code=code@entry=0) at ipc.c:209
#21 0x007d4288 in proc_exit (code=code@entry=0) at ipc.c:107


regards, tom lane




Re: Supply restore_command to pg_rewind via CLI argument

2021-06-18 Thread Alexey Kondratov
Hi,

On Fri, Jun 18, 2021 at 5:42 PM Andrey Borodin  wrote:
>
> If we run 'pg_rewind --restore-target-wal' there must be restore_command in 
> config of the target installation. But if the config is not within 
> $PGDATA/postgresql.conf, pg_rewind cannot use it.
> If we run postmaster with `-c 
> config_file=/etc/postgresql/10/data/postgresql.conf`, we simply cannot use 
> the feature. We solved the problem by putting config into PGDATA only during 
> pg_rewind, but this does not seem like a very robust solution.
>

Yeah, Michael was against it, while we had no good arguments, so
Alexander removed it, IIRC. This example sounds reasonable to me. I
also recall some complaints from PostgresPro support folks that it is
sad not to have a CLI option to pass restore_command. However, I just
thought about another recent feature --- ensure clean shutdown, which
is turned on by default. So you usually run Postgres with one config,
but pg_rewind may start it with another, although in single-user mode.
Is it fine for you?

>
> Maybe we could add "-C, --target-restore-command=COMMAND  target WAL 
> restore_command\n" as was proposed within earlier versions of the patch[0]? 
> Or instruct pg_rewind to pass config to 'postgres -C restore_command' run?

Hm, adding --target-restore-command is the simplest way, sure, but
forwarding something like '-c config_file=...' to postgres sounds
interesting too. Could it have any use case besides providing a
restore_command? I cannot imagine anything right now, but if any
exist, then it could be a more universal approach.

>
> From my POV adding --target-restore-command is simplest way, I can extract 
> corresponding portions from original patch.
>

I will have a look, maybe I even already have this patch separately. I
remember that we were considering adding this option to PostgresPro,
when we did a backport of this feature.


--
Alexey Kondratov




Re: Optionally automatically disable logical replication subscriptions on error

2021-06-18 Thread Mark Dilger



> On Jun 17, 2021, at 9:47 PM, Amit Kapila  wrote:
> 
> (a) The patch
> seem to be assuming that the error can happen only by the apply worker
> but I think the constraint violation can happen via one of the table
> sync workers as well

You are right.  Peter mentioned the same thing, and it is clearly so.  I am 
working to repair this fault in v2 of the patch.

> (b) What happens if the error happens when you
> are updating the error information in the catalog table.

I think that is an entirely different kind of error.  The patch attempts to 
catch errors caused by the user, not by core functionality of the system 
failing.  If there is a fault that prevents the catalogs from being updated, it 
is unclear what the patch can do about that.

> I think
> instead of seeing the actual apply-time error, the user might see some
> other error for which it won't be clear what an appropriate action is.

Good point.

Before trying to do much of anything with the caught error, the v2 patch logs 
the error.  If the subsequent efforts to disable the subscription fail, at 
least the logs should contain the initial failure message.  The v1 patch 
emitted a log message much further down, one really just intended for debugging 
the patch itself, with many opportunities for something else to throw before 
the log is written.

> We are also discussing another action like skipping the apply of the
> transaction on an error [1]. I think it is better to evaluate both the
> proposals as one seems to be an extension of another.

Thanks for the link.

I think they are two separate options.  For some users and data patterns, 
subscriber-side skipping of specific problematic commits will be fine.  For 
other usage patterns, skipping earlier commits will result in more and more 
data integrity problems (foreign key references, etc.) such that the failures 
will snowball with skipping becoming the norm rather than the exception.  Users 
with those usage patterns would likely prefer the subscription to automatically 
be disabled until manual intervention can clean up the problem.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: PoC: Using Count-Min Sketch for join cardinality estimation

2021-06-18 Thread Tomas Vondra

On 6/18/21 7:03 PM, John Naylor wrote:
On Wed, Jun 16, 2021 at 8:23 PM Tomas Vondra 
<tomas.von...@enterprisedb.com> wrote:


 > Not really, but to be fair even for the histograms it's only really
 > supported by "seems to work in practice" :-(

Hmm, we cite a theoretical result in analyze.c, but I don't know if 
there is something better out there:


  * The following choice of minrows is based on the paper
  * "Random sampling for histogram construction: how much is enough?"
  * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
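
(For reference, that comment in analyze.c continues with the actual bound:

  * Proceedings of ACM SIGMOD International Conference on Management
  * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
  * says that for table size n, histogram size k, maximum relative
  * error in bin size f, and error probability gamma, the minimum
  * random sample size is
  *		r = 4 * k * ln(2*n/gamma) / f^2
  * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
  *		r = 305.82 * k

hence the usual 300 * statistics-target sample size.)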



True. I read that paper (long time ago), and it certainly gives some 
very interesting guidance and guarantees regarding relative error. And 
now that I look at it, the theorems 5 & 6, and the corollary 1 do 
provide a way to calculate probability of a lower error (essentially 
vary the f, get the probability).


I still think there's a lot of reliance on experience from practice, 
because even with such strong limits delta=0.5 of a histogram with 100 
buckets, representing 1e9 rows, is still plenty of space for errors.


What is more troubling to me is that we set the number of MCVs to the 
number of histograms. Since b5db1d93d2a6 we have a pretty good method of 
finding the MCVs that are justified. When that first went in, I 
experimented with removing the MCV limit and found it easy to create 
value distributions that lead to thousands of MCVs. I guess the best 
justification now for the limit is plan time, but if we have a sketch 
also, we can choose one or the other based on a plan-time speed vs 
accuracy tradeoff (another use for a planner effort guc). In that 
scenario, for tables with many MCVs we would only use them for 
restriction clauses.




Sorry, I'm not sure what you mean by "we set the number of MCVs to the 
number of histograms" :-(


When you say "MCV limit" you mean that we limit the number of items to 
statistics target, right? I agree plan time is one concern - but it's 
also about analyze, as we need larger sample to build a larger MCV or 
histogram (as the paper you referenced shows).


I think the sketch is quite interesting for skewed data sets where the 
MCV can represent only small fraction of the data, exactly because of 
the limit. For (close to) uniform data distributions we can just use 
ndistinct estimates to get estimates that are better than those from a 
sketch, I think.


So I think we should try using MCV first, and then use sketches for the 
rest of the data (or possibly all data, if one side does not have MCV).
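
To make the join estimator concrete, here is a toy version in C (not the 
PoC patch's code; CMS_DEPTH, CMS_WIDTH, and the hash are arbitrary 
stand-ins): each join side is summarized into a d x w counter matrix, and 
the estimate is the minimum over rows of the per-row dot product.

#include <stdint.h>
#include <stdio.h>

#define CMS_DEPTH	8			/* d: number of hash rows */
#define CMS_WIDTH	128			/* w: counters per row */

typedef struct
{
	uint64_t	counter[CMS_DEPTH][CMS_WIDTH];
} CountMinSketch;

/* toy per-row multiplicative hash, not the real thing */
static uint32_t
cms_hash(uint32_t value, int row)
{
	uint64_t	h = (value + 1) * (0x9E3779B97F4A7C15ULL + 2654435761ULL * row);

	return (uint32_t) (h >> 32) % CMS_WIDTH;
}

static void
cms_add(CountMinSketch *s, uint32_t value)
{
	for (int d = 0; d < CMS_DEPTH; d++)
		s->counter[d][cms_hash(value, d)]++;
}

/*
 * Join size estimate: minimum over rows of the dot product of the two
 * sketches' rows.  Every hash collision inflates a dot product, so each
 * row yields an overestimate; the minimum is the least-inflated one.
 */
static uint64_t
cms_estimate_join(const CountMinSketch *a, const CountMinSketch *b)
{
	uint64_t	best = UINT64_MAX;

	for (int d = 0; d < CMS_DEPTH; d++)
	{
		uint64_t	dot = 0;

		for (int w = 0; w < CMS_WIDTH; w++)
			dot += a->counter[d][w] * b->counter[d][w];
		if (dot < best)
			best = dot;
	}
	return best;
}

int
main(void)
{
	static CountMinSketch a, b;	/* zero-initialized */

	for (uint32_t i = 0; i < 1000; i++)
		cms_add(&a, i % 10);	/* 100 copies of each of 10 values */
	for (uint32_t i = 0; i < 100; i++)
		cms_add(&b, i % 10);	/* 10 copies of each */
	/* exact join size is 10 * 100 * 10 = 10000 */
	printf("estimate: %llu\n", (unsigned long long) cms_estimate_join(&a, &b));
	return 0;
}

Since collisions can only add, the estimate never undershoots the exact 
join size, which is why skewed data (where the MCV covers little of the 
table) is the interesting case.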


FWIW I think the sketch may be useful even for restriction clauses, 
which is what the paper calls "point queries". Again, maybe this should 
use the same correction depending on ndistinct estimate.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PoC: Using Count-Min Sketch for join cardinality estimation

2021-06-18 Thread John Naylor
On Fri, Jun 18, 2021 at 3:43 PM Tomas Vondra 
wrote:

> Sorry, I'm not sure what you mean by "we set the number of MCVs to the
> number of histograms" :-(
>
> When you say "MCV limit" you mean that we limit the number of items to
> statistics target, right? I agree plan time is one concern - but it's
> also about analyze, as we need larger sample to build a larger MCV or
> histogram (as the paper you referenced shows).

Ah, I didn't realize the theoretical limit applied to the MCVs too, but
that makes sense since they're basically singleton histogram buckets.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: A few nuances about specifying the timeline with START_REPLICATION

2021-06-18 Thread Jeff Davis
On Fri, 2021-06-18 at 21:48 +0300, Heikki Linnakangas wrote:
> On 18/06/2021 20:27, Jeff Davis wrote:
> We could teach it to look into the timeline history to find the
> correct 
> file, though.

That's how recovery_target_timeline behaves, and it would match my
intuition better if START_REPLICATION behaved that way.

> If the client asks for a historic timeline, the replication will
> stop 
> when it reaches the end of that timeline. In hindsight, I think it
> would 
> make more sense to send a message to the client to say that it's 
> switching to a new timeline, and continue streaming from the new
> timeline.

Why is it important for the standby to be told explicitly in the
protocol about timeline switches? If it is important, why only for
historical timelines?

> Hmm, the timeline in the START_REPLICATION command is not specifying
> a 
> recovery target timeline, so I don't think "latest" or "current"
> make 
> much sense there. Per above, it just tells the server which timeline
> the 
> requested starting point belongs to, so it's actually redundant.

That's not very clear from the docs: "if TIMELINE option is specified,
streaming starts on timeline tli...".

Part of the confusion is that there's not a good distinction in
terminology between:
   1. a timeline ID, which is a specific segment of a timeline
   2. a timeline made up of the given timeline ID and all its
ancestors, terminating at the given ID
   3. the timeline made up of the current ID, all ancestor IDs, and all
descendent IDs that the current active primary switches to
   4. the set of all timelines that contain a given ID

It seems you are saying that replication only concerns itself with #3,
which does not require a timeline ID at all. That seems basically
correct for now, but since we already document the protocol to take a
timeline, it makes sense to me to just have the primary serve it if
possible.

If we (continue to?) allow timelines for replication, it will start to
treat the primary like an archive. That might not be quite what was
intended, but could be powerful. You could imagine a special archive
that implements the replication protocol, and have replicas directly
off the archive, or maybe doing PITR off the archive.

Regards,
Jeff Davis






Re: PoC: Using Count-Min Sketch for join cardinality estimation

2021-06-18 Thread Tomas Vondra

On 6/18/21 9:54 PM, John Naylor wrote:


On Fri, Jun 18, 2021 at 3:43 PM Tomas Vondra 
<tomas.von...@enterprisedb.com> wrote:


 > Sorry, I'm not sure what you mean by "we set the number of MCVs to the
 > number of histograms" :-(
 >
 > When you say "MCV limit" you mean that we limit the number of items to
 > statistics target, right? I agree plan time is one concern - but it's
 > also about analyze, as we need larger sample to build a larger MCV or
 > histogram (as the paper you referenced shows).

Ah, I didn't realize the theoretical limit applied to the MCVs too, but 
that makes sense since they're basically singleton histogram buckets.




Something like that, yes. Looking at MCV items as singleton histogram 
buckets is interesting, although I'm not sure that was the reasoning 
when calculating the MCV size. AFAIK it was kinda the other way around, 
i.e. the sample size is derived from the histogram paper, and when 
building the MCV we ask what's sufficiently different from the average 
frequency, based on the sample size etc.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_stats and range statistics

2021-06-18 Thread Tomas Vondra

On 6/18/21 6:22 PM, Egor Rogov wrote:

Hi,

Statistics for range types are not currently exposed in pg_stats view 
(i.e. STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and 
STATISTIC_KIND_BOUNDS_HISTOGRAM).


Shouldn't they? If so, here is a patch for adding them.



I think they should be exposed - I don't see why not to do that. I 
noticed this when working on the count-min sketch experiment too, so 
thanks for this patch.


FWIW I've added the patch to the next CF:

https://commitfest.postgresql.org/33/3184/


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-18 Thread Álvaro Herrera
On 2021-Jun-11, Álvaro Herrera wrote:

> I tried hard to make this stable, but it just isn't (it works fine for one
> thousand runs, then I grab some coffee and run it once more, and that one
> fails.  Why? That's not clear to me).  Attached is the last one I have,
> in case somebody wants to make it better.  Maybe there's some completely
> different approach that works better, but I'm out of ideas for now.

It occurred to me that this could be made better by sigstopping both
walreceiver and walsender, then letting only the latter run; AFAICS this
makes the test stable.  I'll register this on the upcoming commitfest to
let cfbot run it, and if it looks good there I'll get it pushed to
master.  If there's any problem I'll just remove it before beta2 is
stamped.

-- 
Álvaro Herrera   Valdivia, Chile
>From 8080ced6b3c807039ff9a66810fa661fae19b347 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 10 Jun 2021 16:44:03 -0400
Subject: [PATCH v3] Add test case for obsoleting slot with active walsender
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to signal a running walsender when its reserved WAL size grows
too large is completely uncovered before this commit; this adds coverage
for that case.

This test involves sending SIGSTOP to walsender and walreceiver and
running a checkpoint while advancing WAL, then sending SIGCONT.  There's
no precedent for this coding in Perl tests, and my reading of relevant
manpages says it's likely to fail on Windows.  Because of this, this
test is always skipped on that platform.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
---
 src/test/recovery/t/019_replslot_limit.pl | 82 ++-
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7094aa0704..2e6c5a229e 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 
 use File::Path qw(rmtree);
-use Test::More tests => 14;
+use Test::More tests => $TestLib::windows_os ? 14 : 18;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -211,8 +211,8 @@ for (my $i = 0; $i < 10000; $i++)
 }
 ok($failed, 'check that replication has been broken');
 
-$node_primary->stop('immediate');
-$node_standby->stop('immediate');
+$node_primary->stop;
+$node_standby->stop;
 
 my $node_primary2 = get_new_node('primary2');
 $node_primary2->init(allows_streaming => 1);
@@ -253,6 +253,82 @@ my @result =
 		timeout => '60'));
 is($result[1], 'finished', 'check if checkpoint command is not blocked');
 
+$node_primary2->stop;
+$node_standby->stop;
+
+# The next test depends on Perl's `kill`, which apparently is not
+# portable to Windows.  (It would be nice to use Test::More's `subtest`,
+# but that's not in the ancient version we require.)
+if ($TestLib::windows_os)
+{
+	done_testing();
+	exit;
+}
+
+# Get a slot terminated while the walsender is active
+# We do this by sending SIGSTOP to the walsender.  Skip this on Windows.
+my $node_primary3 = get_new_node('primary3');
+$node_primary3->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
+$node_primary3->append_conf(
+	'postgresql.conf', qq(
+	min_wal_size = 2MB
+	max_wal_size = 2MB
+	log_checkpoints = yes
+	max_slot_wal_keep_size = 1MB
+	));
+$node_primary3->start;
+$node_primary3->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('rep3')");
+# Take backup
+$backup_name = 'my_backup';
+$node_primary3->backup($backup_name);
+# Create standby
+my $node_standby3 = get_new_node('standby_3');
+$node_standby3->init_from_backup($node_primary3, $backup_name,
+	has_streaming => 1);
+$node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'");
+$node_standby3->start;
+$node_primary3->wait_for_catchup($node_standby3->name, 'replay');
+my $senderpid = $node_primary3->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'");
+like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid");
+my $receiverpid = $node_standby3->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
+like($receiverpid, qr/^[0-9]+$/, "have walreceiver pid $receiverpid");
+
+# freeze walsender and walreceiver. Slot will still be active, but walreceiver
+# won't get anything anymore.
+kill 'STOP', $senderpid, $receiverpid;
+$logstart = get_log_size($node_primary3);
+advance_wal($node_primary3, 4);
+ok(find_in_log($node_primary3, "to release replication slot", $logstart),
+	"walreceiver termination logged");
+
+# Now let the walsender continue; slot should be killed now.
+# (Must not let walreceiver run yet; otherwise the standby could start another
+# one before the slot can be killed)
+kill 'CONT', $senderpid;
+$node_primary3->poll_query_until('postgres',
+	"SELECT wal_status FROM pg_replica

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-18 Thread Álvaro Herrera
Apologies, I inadvertently sent the version before I added a maximum
number of iterations in the final loop.

-- 
Álvaro Herrera   Valdivia, Chile
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)
>From 1492e9468ecd86167a1253c4a2e9b31139835649 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 10 Jun 2021 16:44:03 -0400
Subject: [PATCH v4] Add test case for obsoleting slot with active walsender
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to signal a running walsender when its reserved WAL size grows
too large is completely uncovered before this commit; this adds coverage
for that case.

This test involves sending SIGSTOP to walsender and walreceiver and
running a checkpoint while advancing WAL, then sending SIGCONT.  There's
no precedent for this coding in Perl tests, and my reading of relevant
manpages says it's likely to fail on Windows.  Because of this, this
test is always skipped on that platform.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
---
 src/test/recovery/t/019_replslot_limit.pl | 84 ++-
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7094aa0704..52bc9f8cfd 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 
 use File::Path qw(rmtree);
-use Test::More tests => 14;
+use Test::More tests => $TestLib::windows_os ? 14 : 18;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -211,8 +211,8 @@ for (my $i = 0; $i < 10000; $i++)
 }
 ok($failed, 'check that replication has been broken');
 
-$node_primary->stop('immediate');
-$node_standby->stop('immediate');
+$node_primary->stop;
+$node_standby->stop;
 
 my $node_primary2 = get_new_node('primary2');
 $node_primary2->init(allows_streaming => 1);
@@ -253,6 +253,84 @@ my @result =
 		timeout => '60'));
 is($result[1], 'finished', 'check if checkpoint command is not blocked');
 
+$node_primary2->stop;
+$node_standby->stop;
+
+# The next test depends on Perl's `kill`, which apparently is not
+# portable to Windows.  (It would be nice to use Test::More's `subtest`,
+# but that's not in the ancient version we require.)
+if ($TestLib::windows_os)
+{
+	done_testing();
+	exit;
+}
+
+# Get a slot terminated while the walsender is active
+# We do this by sending SIGSTOP to the walsender.  Skip this on Windows.
+my $node_primary3 = get_new_node('primary3');
+$node_primary3->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
+$node_primary3->append_conf(
+	'postgresql.conf', qq(
+	min_wal_size = 2MB
+	max_wal_size = 2MB
+	log_checkpoints = yes
+	max_slot_wal_keep_size = 1MB
+	));
+$node_primary3->start;
+$node_primary3->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('rep3')");
+# Take backup
+$backup_name = 'my_backup';
+$node_primary3->backup($backup_name);
+# Create standby
+my $node_standby3 = get_new_node('standby_3');
+$node_standby3->init_from_backup($node_primary3, $backup_name,
+	has_streaming => 1);
+$node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'");
+$node_standby3->start;
+$node_primary3->wait_for_catchup($node_standby3->name, 'replay');
+my $senderpid = $node_primary3->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'");
+like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid");
+my $receiverpid = $node_standby3->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
+like($receiverpid, qr/^[0-9]+$/, "have walreceiver pid $receiverpid");
+
+# freeze walsender and walreceiver. Slot will still be active, but walreceiver
+# won't get anything anymore.
+kill 'STOP', $senderpid, $receiverpid;
+$logstart = get_log_size($node_primary3);
+advance_wal($node_primary3, 4);
+ok(find_in_log($node_primary3, "to release replication slot", $logstart),
+	"walreceiver termination logged");
+
+# Now let the walsender continue; slot should be killed now.
+# (Must not let walreceiver run yet; otherwise the standby could start another
+# one before the slot can be killed)
+kill 'CONT', $senderpid;
+$node_primary3->poll_query_until('postgres',
+	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'",
+	"lost")
+  or die "timed out waiting for slot to be lost";
+
+my $max_attempts = 180;
+while ($max_attempts-- > 0)
+{
+	if (find_in_log($node_primary3,
+			'invalidating slot "rep3" because its restart_lsn', $logstart))
+	{
+		ok(1, "slot invalidation logged");
+		last;
+	}
+	sleep 1;
+}
+
+# Now let the walreceiver continue, so that the node can be stopped cleanly
+kill 'CONT', $receiverpid;
+
+$node_primary3->stop;
+$node_standby3->stop;
+
 #
 # Advance WAL of $node by $n segments
 

Re: A few nuances about specifying the timeline with START_REPLICATION

2021-06-18 Thread Heikki Linnakangas

On 18/06/2021 22:55, Jeff Davis wrote:

On Fri, 2021-06-18 at 21:48 +0300, Heikki Linnakangas wrote:

On 18/06/2021 20:27, Jeff Davis wrote:
We could teach it to look into the timeline history to find the
correct
file, though.


That's how recovery_target_timeline behaves, and it would match my
intuition better if START_REPLICATION behaved that way.


If the client asks for a historic timeline, the replication will
stop
when it reaches the end of that timeline. In hindsight, I think it
would
make more sense to send a message to the client to say that it's
switching to a new timeline, and continue streaming from the new
timeline.


Why is it important for the standby to be told explicitly in the
protocol about timeline switches?


So that it knows to write the WAL to the correctly named WAL segment. 
You could do it differently, looking at the 'xlp_tli' field in the WAL 
page headers, or watching out for checkpoint records that change the 
timeline. But currently the standby (and pg_receivewal) depends on the 
protocol for that.
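
For illustration, a rough sketch of what that alternative would look like 
on the receiving side (the struct is a hand-rolled, abridged stand-in for 
the server's XLogPageHeaderData in access/xlog_internal.h; the layout is 
an assumption here, not the real declaration):

#include <stdbool.h>
#include <stdint.h>

/* abridged stand-in for XLogPageHeaderData */
typedef struct
{
	uint16_t	xlp_magic;		/* magic value for correctness checks */
	uint16_t	xlp_info;		/* flag bits */
	uint32_t	xlp_tli;		/* TimeLineID of first record on page */
	uint64_t	xlp_pageaddr;	/* WAL address of this page */
	uint32_t	xlp_rem_len;	/* bytes of continued record, if any */
} WalPageHeader;

/*
 * A receiver that is not told about switches via the protocol could
 * compare xlp_tli at each page boundary with the timeline it is writing,
 * and open a new segment file name whenever it changes.
 */
static bool
timeline_changed(const char *page, uint32_t current_tli)
{
	const WalPageHeader *hdr = (const WalPageHeader *) page;

	return hdr->xlp_tli != current_tli;
}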



If it is important, why only for historical timelines?


Well, the latest timeline doesn't have any timeline switches, by 
definition. If you're connected to a standby server, IOW you're doing 
cascading replication, then the current timeline can become historic, if 
the standby follows a timeline switch. In that case, the replication is 
stopped when you reach the timeline switch, just like when you request a 
historic timeline.



Hmm, the timeline in the START_REPLICATION command is not specifying
a
recovery target timeline, so I don't think "latest" or "current"
make
much sense there. Per above, it just tells the server which timeline
the
requested starting point belongs to, so it's actually redundant.


That's not very clear from the docs: "if TIMELINE option is specified,
streaming starts on timeline tli...".

Part of the confusion is that there's not a good distinction in
terminology between:
1. a timeline ID, which is a specific segment of a timeline
2. a timeline made up of the given timeline ID and all its
ancestors, terminating at the given ID
3. the timeline made up of the current ID, all ancestor IDs, and all
descendent IDs that the current active primary switches to
4. the set of all timelines that contain a given ID


Agreed, that's a bit confusing.


It seems you are saying that replication only concerns itself with #3,
which does not require a timeline ID at all. That seems basically
correct for now, but since we already document the protocol to take a
timeline, it makes sense to me to just have the primary serve it if
possible.

If we (continue to?) allow timelines for replication, it will start to
treat the primary like an archive. That might not be quite what was
intended, but could be powerful. You could imagine a special archive
that implements the replication protocol, and have replicas directly
off the archive, or maybe doing PITR off the archive.


True.

- Heikki




Re: Version reporting in pgbench

2021-06-18 Thread Fabien COELHO


Hello Tom,

Why not move the printVersion call right after the connection is 
created, at line 6374?


I started with that, and one of the 001_pgbench_with_server.pl
tests fell over --- it was expecting no stdout output before a
"Perhaps you need to do initialization" failure.  If you don't
mind changing that,


Why would I mind?

I agree that printing immediately after the connection is made is a bit 
less astonishing.


Ok, so let's just update the test? Attached a proposal with the version 
moved.


Note that if no connections are available, then you do not get the 
version, which may be a little bit strange. Attached v3 prints out the 
local version in that case. Not sure whether it is worth the effort.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..e61055b6b7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -63,6 +63,7 @@
 #include "common/username.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
+#include "fe_utils/string_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pgbench.h"
@@ -5493,6 +5494,37 @@ printSimpleStats(const char *prefix, SimpleStats *ss)
 	}
 }
 
+/* print version banner */
+static void
+printVersion(PGconn *con)
+{
+	int			server_ver = PQserverVersion(con);
+	int			client_ver = PG_VERSION_NUM;
+
+	if (server_ver != client_ver)
+	{
+		const char *server_version;
+		char		sverbuf[32];
+
+		/* Try to get full text form, might include "devel" etc */
+		server_version = PQparameterStatus(con, "server_version");
+		/* Otherwise fall back on server_ver */
+		if (!server_version)
+		{
+			formatPGVersionNumber(server_ver, true,
+  sverbuf, sizeof(sverbuf));
+			server_version = sverbuf;
+		}
+
+		printf(_("%s (%s, server %s)\n"),
+			   "pgbench", PG_VERSION, server_version);
+	}
+	/* For version match, only print pgbench version */
+	else
+		printf("%s (%s)\n", "pgbench", PG_VERSION);
+	fflush(stdout);
+}
+
 /* print out results */
 static void
 printResults(StatsData *total,
@@ -5506,7 +5538,6 @@ printResults(StatsData *total,
 	double		bench_duration = PG_TIME_GET_DOUBLE(total_duration);
 	double		tps = ntx / bench_duration;
 
-	printf("pgbench (PostgreSQL) %d.%d\n", PG_VERSION_NUM / 1, PG_VERSION_NUM % 100);
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
 		   num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
@@ -6334,6 +6365,9 @@ main(int argc, char **argv)
 	if (con == NULL)
 		exit(1);
 
+	/* report pgbench and server versions */
+	printVersion(con);
+
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
  duration <= 0 ? "nxacts" : "duration",
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 55b3c3f6fd..49fe48093c 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -100,7 +100,7 @@ pgbench(
 	'no such database');
 
 pgbench(
-	'-S -t 1', 1, [qr{^$}],
+	'-S -t 1', 1, [qr{^pgbench \([^\n]+\)$}],
 	[qr{Perhaps you need to do initialization}],
 	'run without init');
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..20910e5ad1 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -63,6 +63,7 @@
 #include "common/username.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
+#include "fe_utils/string_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pgbench.h"
@@ -5493,6 +5494,37 @@ printSimpleStats(const char *prefix, SimpleStats *ss)
 	}
 }
 
+/* print version banner */
+static void
+printVersion(PGconn *con)
+{
+	int			server_ver = PQserverVersion(con);
+	int			client_ver = PG_VERSION_NUM;
+
+	if (server_ver != client_ver)
+	{
+		const char *server_version;
+		char		sverbuf[32];
+
+		/* Try to get full text form, might include "devel" etc */
+		server_version = PQparameterStatus(con, "server_version");
+		/* Otherwise fall back on server_ver */
+		if (!server_version)
+		{
+			formatPGVersionNumber(server_ver, true,
+  sverbuf, sizeof(sverbuf));
+			server_version = sverbuf;
+		}
+
+		printf(_("%s (%s, server %s)\n"),
+			   "pgbench", PG_VERSION, server_version);
+	}
+	/* For version match, only print pgbench version */
+	else
+		printf("%s (%s)\n", "pgbench", PG_VERSION);
+	fflush(stdout);
+}
+
 /* print out results */
 static void
 printResults(StatsData *total,
@@ -5506,7 +5538,6 @@ printResults(StatsData *total,
 	double		bench_duration = PG_TIME_GET_DOUBLE(total_duration);
 	double		tps = ntx / bench_duration;
 
-	printf("pgbench (PostgreSQL) %d.%d\n", PG_VERSION_NUM / 1, PG_VERSION_NUM % 100);
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
 		   num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
@@ -6332,7 +6363,13 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con ==

Re: Version reporting in pgbench

2021-06-18 Thread Tom Lane
Fabien COELHO  writes:
> Note that if no connections are available, then you do not get the 
> version, which may be a little bit strange. Attached v3 prints out the 
> local version in that case. Not sure whether it is worth the effort.

I'm inclined to think that the purpose of that output is mostly
to report the server version, so not printing it if we fail to
connect isn't very surprising.  Certainly that's how psql has
acted for decades.

regards, tom lane




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Fabien COELHO



It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
too.


There was a silent API breakage (same API, different behavior, how nice…) 
in llvm main that Andres figured out, which will have to be fixed at some 
point, so this is a reminder that it is still a todo… Not sure when a fix is 
planned, though. I'm afraid portability may require that different code is 
executed depending on the llvm version. Or maybe we should wrestle a revert 
on the llvm side? Hmmm…


So I'm not very confident that the noise will go away quickly, sorry.

--
Fabien.

Re: Version reporting in pgbench

2021-06-18 Thread Fabien COELHO




Note that if no connections are available, then you do not get the
version, which may be a little bit strange. Attached v3 prints out the
local version in that case. Not sure whether it is worth the effort.


I'm inclined to think that the purpose of that output is mostly
to report the server version, so not printing it if we fail to
connect isn't very surprising.  Certainly that's how psql has
acted for decades.


I'm fine with having a uniform behavior over pg commands.

Thanks for the improvement!

--
Fabien.




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Tom Lane
Fabien COELHO  writes:
>> It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
>> too.

> There was a silent API breakage (same API, different behavior, how nice…) 
> in llvm main that Andres figured out, which will have to be fixed at some 
> point, so this is a reminder that it is still a todo…

If it were *our* todo, that would be one thing; but it isn't.

> Not sure when a fix is 
> planned, though. I'm afraid portability may require that different code is 
> executed depending on the llvm version. Or maybe we should wrestle a revert 
> on the llvm side? Hmmm…

> So I'm not very confident that the noise will go away quickly, sorry.

Could you please just shut down the animal until that's dealt with?
It's extremely unpleasant to have to root through a lot of useless
failures to find the ones that might be of interest.  Right now
serinus and seawasp are degrading this report nearly to uselessness:

https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

regards, tom lane




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Fabien COELHO


Hello Tom,


So I'm not very confident that the noise will go away quickly, sorry.


Could you please just shut down the animal until that's dealt with?


Hmmm… Obviously I can.

However, please note that the underlying logic of "a test is failing, 
let's just remove it" does not sound right to me at all:-(


The test is failing because there is a problem, and shutting down the test 
to improve a report does not in any way help to fix it, it just helps to 
hide it.



It's extremely unpleasant to have to root through a lot of useless
failures


I do not understand how they are useless. Pg does not work properly with 
current LLVM, and keeps on not working. I think that this information is 
worth having, even if I do not like it and would certainly prefer a quick fix.


to find the ones that might be of interest.  Right now serinus and 
seawasp are degrading this report nearly to uselessness:


https://buildfarm.postgresql.org/cgi-bin/show_failures.pl


IMHO, the report should be improved, not the test removed.

If you insist I will shut down the animal, but I'd prefer not to.

I think that the reminder has value, and just because some report is not 
designed to handle this nicely does not seem like a good reason to do 
that.


--
Fabien.

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Tom Lane
Fabien COELHO  writes:
>> Could you please just shut down the animal until that's dealt with?

> The test is failing because there is a problem, and shutting down the test 
> to improve a report does not in any way help to fix it, it just helps to 
> hide it.

Our buildfarm is run for the use of the Postgres project, not the LLVM
project.  I'm not really happy that it contains any experimental-compiler
animals at all, but as long as they're unobtrusive I can stand it.
serinus and seawasp are being the opposite of unobtrusive.

If you don't want to shut it down entirely, maybe backing it off to
run only once a week would be an acceptable compromise.  Since you
only update its compiler version once a week, I doubt we learn much
from runs done more often than that anyway.

regards, tom lane




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Fabien COELHO


Hello Tom,


Could you please just shut down the animal until that's dealt with?



The test is failing because there is a problem, and shutting down the test
to improve a report does not in any way help to fix it, it just helps to
hide it.


Our buildfarm is run for the use of the Postgres project, not the LLVM
project.


The point of these animals is to have early warning of upcoming compiler 
changes. Given the release cycle of the project and the fact that a 
version is expected to work for 5 years, this is a clear benefit for 
postgres, IMO. When the compiler is broken, it is noisy, too bad.


In this instance the compiler is not broken, but postgres is.

If the consensus is that these animals are useless, I'll remove them, and 
be sad that the community is not able to see their value.



I'm not really happy that it contains any experimental-compiler
animals at all, but as long as they're unobtrusive I can stand it.
serinus and seawasp are being the opposite of unobtrusive.


I think that the problem is the report, not the failing animal.

In French we say "ce n’est pas en cassant le thermomètre qu’on fait tomber 
la fièvre", which is an equivalent of "don't shoot the messenger".



If you don't want to shut it down entirely, maybe backing it off to
run only once a week would be an acceptable compromise.  Since you
only update its compiler version once a week, I doubt we learn much
from runs done more often than that anyway.


Hmmm… I can slow it down. We will wait one week to learn that the problems 
have been fixed, wow.



--
Fabien.

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Thomas Munro
On Sat, Jun 19, 2021 at 9:46 AM Tom Lane  wrote:
> Fabien COELHO  writes:
> >> It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
> >> too.
>
> > There was a silent API breakage (same API, different behavior, how nice…)
> > in llvm main that Andres figured out, which will have to be fixed at some
> > point, so this is a reminder that it is still a todo…
>
> If it were *our* todo, that would be one thing; but it isn't.

Over on the other thread[1] we learned that this is an API change
affecting reference counting semantics[2], so unless there is some
discussion somewhere about reverting the LLVM change that I'm unaware
of, I'm guessing we're going to need to change our code sooner or
later.  I have a bleeding edge LLVM on my dev machine, and I'm willing
to try to reproduce the crash and write the trivial patch (that is,
figure out the right preprocessor incantation to detect the version or
feature, and bump the reference count as appropriate), if Andres
and/or Fabien aren't already on the case.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGLEy8mgtN7BNp0ooFAjUedDTJj5dME7NxLU-m91b85siA%40mail.gmail.com
[2] 
https://github.com/llvm/llvm-project/commit/c8fc5e3ba942057d6c4cdcd1faeae69a28e7b671




Re: pgbench logging broken by time logic changes

2021-06-18 Thread Thomas Munro
On Fri, Jun 18, 2021 at 12:31 PM Michael Paquier  wrote:
> On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
> > Yeah I've been catching up with these threads.
>
> Thomas, do you want me to look more at this issue?  I don't feel
> comfortable with the idea of doing anything if you are planning to
> look at this thread and you are the owner here, so that should be your
> call.
>
> From what I can see, we have the same area getting patched with
> patches across two threads, so it seems better to give up the other
> thread and just focus on the discussion here, where v7 has been sent:
> https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520...@sraoss.co.jp
> https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com

Thanks for looking so far.  It's the weekend here and I need to
unplug, but I'll test these changes and if all looks good push on
Monday.




Re: Speed up pg_checksums in cases where checksum already set

2021-06-18 Thread Greg Sabino Mullane
On Fri, Jun 18, 2021 at 1:57 AM Michael Paquier  wrote:

> This doc addition is a bit confusing, as it could mean that each file
> has just one single checksum.  We could be more precise, say:
> "When enabling checksums, each relation file block with a changed
> checksum is rewritten in place."
>

Agreed, I like that wording. New patch attached.


> Should we also mention that the sync happens even if no blocks are
> rewritten based on the reasoning of upthread (aka we'd better do the
> final flush as an interrupted pg_checksums may let a portion of the
> files as not flushed)?
>

I don't know that we need to bother: the default is already to sync and one
has to go out of one's way using the -N argument to NOT sync, so I think
it's a pretty safe assumption to everyone (except those who read my first
version of my patch!) that syncing always happens.

Cheers,
Greg


003.pg_checksums.optimize.writes.and.always.sync.patch
Description: Binary data


Re: Optionally automatically disable logical replication subscriptions on error

2021-06-18 Thread Mark Dilger


> On Jun 17, 2021, at 11:34 PM, Peter Smith  wrote:
> 
> I tried your patch.

Thanks for the quick and thorough review!

> (2) New column "disabled_by_error".
> 
> I wondered if there was actually any need for this column. Isn't the
> same information conveyed by just having "subenabled" = false, at the same
> time as a non-empty "suberrmsg"? This would remove any confusion for
> having 2 booleans which both indicate disabled.

Yeah, I wondered about that before posting v1.  I removed the disabled_by_error 
field for v2.

> (3) New columns "disabled_by_error", "disabled_on_error".
> 
> All other columns of the pg_subscription have a "sub" prefix.

I don't feel strongly about this.  How about "subdisableonerr"?  I used that in 
v2.

> I did not find any code using that newly added member "errhint".

Thanks for catching that.  I had tried to remove all references to "errhint" 
before posting v1.  The original idea was that both the message and hint of the 
error would be kept, but in testing I found the hint field was typically empty, 
so I removed it.  Sorry that I left one mention of it lying around.

> (5) dump.c

I didn't bother getting pg_dump working before posting v1, and I still have not 
done so, as I mainly want to solicit feedback on whether the basic direction I 
am going will work for the community.

> (6) Patch only handles errors only from the Apply worker.
> 
> Tablesync can give similar errors (e.g. PK violation during DATASYNC
> phase) which will trigger re-launch forever regardless of the setting
> of "disabled_on_error".
> (confirmed by observations below)

Yes, this is a good point, and also mentioned by Amit.  I have fixed it in v2 
and adjusted the regression test to trigger an automatic disabling for initial 
table sync as well as for change replication.

> 2021-06-18 15:12:45.905 AEST [25904] LOG:  edata is true for
> subscription 'tap_sub': message = "duplicate key value violates unique
> constraint "test_tab_pkey"", hint = ""

You didn't call this out, but FYI, I don't intend to leave this particular log 
message in the patch.  It was for development only.  I have removed it for v2 
and have added a different log message much sooner after catching the error, to 
avoid squashing the error in case some other action fails.

The regression test shows this, if you open 
tmp_check/log/022_disable_on_error_subscriber.log:

2021-06-18 16:25:20.138 PDT [56926] LOG:  logical replication subscription "s1" 
will be disabled due to error: duplicate key value violates unique constraint 
"s1_tbl_unique"
2021-06-18 16:25:20.139 PDT [56926] ERROR:  duplicate key value violates unique 
constraint "s1_tbl_unique"
2021-06-18 16:25:20.139 PDT [56926] DETAIL:  Key (i)=(1) already exists.
2021-06-18 16:25:20.139 PDT [56926] CONTEXT:  COPY tbl, line 2

The first line logs the error prior to attempting to disable the subscription, 
and the next three lines are due to rethrowing the error after committing the 
successful disabling of the subscription.  If the attempt to disable the 
subscription itself throws, these additional three lines won't show up, but the 
first one should.  Amit mentioned this upthread.  Do you think this will be ok, 
or would you like to also have a suberrdetail field so that the detail doesn't 
get lost?  I haven't added such an extra field, and am inclined to think it 
would be excessive, but maybe others feel differently?


> ==
> 
> Test: Cause a PK violation in the Tablesync copy (DATASYNC) phase.
> (when disable_on_error = true)
> Observation: This patch changes nothing for this case. The Tablesync
> re-launches in a forever loop, the same as the current functionality.

In v2, tablesync copy errors should also be caught.  The test has been extended 
to cover this also.



v2-0001-Optionally-disabling-subscriptions-on-error.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pgbench logging broken by time logic changes

2021-06-18 Thread Michael Paquier
On Sat, Jun 19, 2021 at 11:59:16AM +1200, Thomas Munro wrote:
> Thanks for looking so far.  It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Thanks for the update.  Have a good weekend.
--
Michael


signature.asc
Description: PGP signature


Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-18 Thread Peter Geoghegan
On Thu, Jun 17, 2021 at 7:26 PM Peter Geoghegan  wrote:
> Thanks for the review!
>
> Attached is v3, which has all the changes that you suggested (plus the
> doc stuff from Justin).

Just pushed a version of that with much improved documentation.

Thanks again
-- 
Peter Geoghegan




Re: PG 14 release notes, first draft

2021-06-18 Thread Peter Geoghegan
On Mon, Jun 14, 2021 at 1:11 PM Bruce Momjian  wrote:
> FYI, the most recent PG 14 relnote doc build is at:
>
> https://momjian.us/pgsql_docs/release-14.html

I just pushed a commit that makes the existing vacuum_index_cleanup
reloption and INDEX_CLEANUP VACUUM parameter support disabling the
"Allow vacuum to skip index vacuuming when the number of removable
index entries is insignificant" behavior. This should be mentioned in
the release notes.

-- 
Peter Geoghegan




Re: Centralizing protective copying of utility statements

2021-06-18 Thread Julien Rouhaud
On Fri, Jun 18, 2021 at 11:24:00AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
> >> Maybe "if true, pstmt's node tree must not be modified" ?
> 
> > Thanks, I find it way better!
> 
> OK, pushed that way, and with a couple other comment tweaks from
> an additional re-reading.

Thanks!  For the record I already pushed the required compatibility changes for
hypopg extension.




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-18 Thread Fabien COELHO



There was a silent API breakage (same API, different behavior, how nice…)
in llvm main that Andres figured out, which will have to be fixed at some
point, so this is a reminder that it is still a todo…


If it were *our* todo, that would be one thing; but it isn't.


Over on the other thread[1] we learned that this is an API change
affecting reference counting semantics[2], so unless there is some
discussion somewhere about reverting the LLVM change that I'm unaware
of, I'm guessing we're going to need to change our code sooner or
later.


Indeed, I'm afraid the solution will have to be on pg side.

I have a bleeding edge LLVM on my dev machine, and I'm willing to try to 
reproduce the crash and write the trivial patch (that is, figure out the 
right preprocessor incantation to detect the version or feature, and 
bump the reference count as appropriate), if Andres and/or Fabien aren't 
already on the case.


I'm not on the case, I'm only the one running the farm animal which barks 
too annoyingly for Tom.


--
Fabien.

Re: seawasp failing, maybe in glibc allocator

2021-06-18 Thread Thomas Munro
On Sat, May 22, 2021 at 12:25 PM Andres Freund  wrote:
> On 2021-05-21 15:57:01 -0700, Andres Freund wrote:
> > I found the LLVM commit to blame (c8fc5e3ba942057d6c4cdcd1faeae69a28e7b671).
> > Contacting the author and reading the change to see if I can spit the
> > issue myself.
>
> Hrmpf. It's a silent API breakage. The author intended to email us about
> it, but apparently forgot. One now needs to increment a string-pool
> refcount. The reason that didn't trigger a reliable crash is that
> there's a path where the refcount of string-pool entries aren't asserted
> to be above before decrementing the refcount... And that there
> practically never are references to the pool entries after use.
>
> Continuing to discusss whether there's a better way to deal with this.

Any news?

FWIW this change appears to fix the problem for my system (LLVM 13
build from a couple of days ago).  No more weird results, valgrind
errors gone.  I ran the leak checker to see if I now had the opposite
problem, and although there are various leaks reported, I didn't see
obvious intern pool related stacks.

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 71029a39a9..7b09e520f5 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -1116,6 +1116,11 @@ llvm_resolve_symbols(LLVMOrcDefinitionGeneratorRef GeneratorObj, void *Ctx,
 	if (error != LLVMErrorSuccess)
 		LLVMOrcDisposeMaterializationUnit(mu);
 
+#if LLVM_VERSION_MAJOR > 12
+	for (int i = 0; i < LookupSetSize; i++)
+		LLVMOrcRetainSymbolStringPoolEntry(symbols[i].Name);
+#endif
+
 	pfree(symbols);
 
 	return error;