Re: wal stats questions

2021-03-26 Thread Masahiro Ikeda
Thanks for your many suggestions!
I made a patch to handle these issues.

> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>instead of just accumulating the stats since the last submission?
>There doesn't seem to be any comment explaining it? Computing
>differences to previous values, and copying to prev*, isn't free. I
>assume this is out of a fear that the stats could get reset before
>they're used for instrument.c purposes - but who knows?

I removed the unnecessary code that copies pgWalUsage, and instead just reset
pgWalUsage after reporting the stats in pgstat_report_wal().


> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>former being used by wal writer, the latter by most other processes?
>There again don't seem to be comments explaining this.

I added comments explaining why the two functions are separate.
(But would it be better to merge them?)


> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>just to figure out if there's been any changes isn't all that
>cheap. This is regularly exercised in read-only workloads too. Seems
>adding a boolean WalStats.have_pending = true or such would be
>better.
> 4) For plain backends pgstat_report_wal() is called by
>pgstat_report_stat() - but it is not checked as part of the "Don't
>expend a clock check if nothing to do" check at the top.  It's
>probably rare to have pending wal stats without also passing one of
>the other conditions, but ...

I added logic to pgstat_report_stat() that checks whether the stats counters
have been updated, using not only the generated WAL records but also the
write/sync counters, so it can skip calling the reporting function.

Although I added the condition checking whether the write/sync counters are
updated, I haven't understood the following case yet... Sorry. I checked the
related code and tested inserting a large object, but I couldn't reproduce it.
I'll investigate more deeply, but if you already know which function leads to
the following case, please let me know.

> Maybe there is the case where a backend generates no WAL records,
> but just writes them because it needs to do write-ahead-logging
> when flush the table data?

> Ugh! I was missing a very large blob.. Ok, we need additional check
> for non-pgWalUsage part. Sorry.


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..095d8d0970 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,6 +19,17 @@
 
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
+
+/*
+ * Generated WAL usage counters.
+ *
+ * Be careful: these counters are cleared after being reported to
+ * the stats collector, although you can use WalUsageAccumDiff()
+ * to compute differences from previous values. For backends,
+ * the counters may be reset after a transaction finishes and
+ * pgstat_report_wal() is invoked, so you can compute differences
+ * only within the same transaction.
+ */
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 60f45ccc4e..e354a454a9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -144,14 +144,6 @@ char	   *pgstat_stat_tmpname = NULL;
 PgStat_MsgBgWriter BgWriterStats;
 PgStat_MsgWal WalStats;
 
-/*
- * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
- * the previous counters from the current ones.
- */
-static WalUsage prevWalUsage;
-
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -879,9 +871,18 @@ pgstat_report_stat(bool disconnect)
 	TabStatusArray *tsa;
 	int			i;
 
-	/* Don't expend a clock check if nothing to do */
+	/*
+	 * Don't expend a clock check if nothing to do.
+	 *
+	 * Note: Regarding the WAL data, it's not enough to check only the WAL
+	 * records, because there is a case where a backend generates no WAL
+	 * records but still writes WAL, since it needs to do write-ahead
+	 * logging when flushing the table data.
+	 */
 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
 		pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+		pgWalUsage.wal_records == 0 && WalStats.m_wal_write == 0 &&
+		WalStats.m_wal_sync == 0 &&
 		!have_function_stats && !disconnect)
 		return;
 
@@ -3117,13 +3118,6 @@ pgstat_initialize(void)
 		MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
 	}
 
-	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
-	 * calculate how much pgWalUsage counters are increased by substracting
-	 * prevWalUsage from pgWalUsage.
-	 */
-	prevWalUsage = pgWalUsage;
-
 	/* Set up a proce

Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-26 Thread Masahiro Ikeda



On 2021/03/26 9:54, Fujii Masao wrote:
> On 2021/03/26 9:25, Masahiro Ikeda wrote:
>> On 2021/03/25 21:23, Fujii Masao wrote:
>>> On 2021/03/25 9:58, Andres Freund wrote:
 Also, won't this lead to postmaster now starting to complain about
 pgstat having crashed in an immediate shutdown? Right now only exit
 status 0 is expected:

  if (pid == PgStatPID)
  {
  PgStatPID = 0;
  if (!EXIT_STATUS_0(exitstatus))
  LogChildExit(LOG, _("statistics collector process"),
   pid, exitstatus);
  if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
  PgStatPID = pgstat_start();
  continue;
  }
>>>
>>> No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
>>> ImmediateShutdown and HandleChildCrash() doesn't complain that in that case
>>> because of the following.
>>>
>>>  /*
>>>   * We only log messages and send signals if this is the first process
>>>   * crash and we're not doing an immediate shutdown; otherwise, we're 
>>> only
>>>   * here to update postmaster's idea of live processes.  If we have
>>> already
>>>   * signaled children, nonzero exit status is to be expected, so don't
>>>   * clutter log.
>>>   */
>>>  take_action = !FatalError && Shutdown != ImmediateShutdown;
>>
>> IIUC, in immediate shutdown case (and other cases too?), HandleChildCrash() 
>> is
>> never invoked due to the exit of pgstat. My understanding of Andres-san's
>> comment is that is it ok to show like the following log message?
>>
>> ```
>> LOG:  statistics collector process (PID 64991) exited with exit code 1
>> ```
>>
>> Surely, other processes don't output the log like it. So, I agree to suppress
>> the log message.
> 
> Yes. I was mistakenly thinking that HandleChildCrash() was called
> even when the stats collector exits with non-zero code, like other processes.
> But that's not true.
> 
> How should we do this? HandleChildCrash() calls LogChildExit()
> when FatalError = false and Shutdown != ImmediateShutdown.
> We should use the same conditions for the stats collector as follows?
> 
>     if (pid == PgStatPID)
>     {
>     PgStatPID = 0;
> -   if (!EXIT_STATUS_0(exitstatus))
> +   if (!EXIT_STATUS_0(exitstatus) && !FatalError &&
> +   Shutdown != ImmediateShutdown)
>     LogChildExit(LOG, _("statistics collector
> process"),
>  pid, exitstatus);

Thanks. I agree with the above code if it's OK that the exit status of the stats
collector is not 0 in the immediate shutdown case.

Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: [PATCH] More docs on what to do and not do in extension code

2021-03-26 Thread Craig Ringer
On Fri, 26 Mar 2021 at 06:15, Bruce Momjian  wrote:

> On Thu, Mar 25, 2021 at 08:49:44AM -0400, David Steele wrote:
> > On 1/22/21 1:36 AM, Craig Ringer wrote:
> > >
> > > Would you mind attaching a revised version of the patch with your
> edits?
> > > Otherwise I'll go and merge them in once you've had your say on my
> > > comments inline below.
> >
> > Bharath, do the revisions in [1] look OK to you?
> >
> > > Bruce, Robert, can I have an opinion from you on how best to locate and
> > > structure these docs, or whether you think they're suitable for the
> main
> > > docs at all? See patch upthread.
> >
> > Bruce, Robert, any thoughts here?
>
> I know I sent an email earlier this month saying we shouldn't
> over-document the backend hooks because the code could drift away from
> the README content:
>
>
> https://www.postgresql.org/message-id/20210309172049.GD26575%40momjian.us
>
> Agreed.  If you document the hooks too much, it allows them to
> drift
> away from matching the code, which makes the hook documentation
> actually
> worse than having no hook documentation at all.
>
> However, for this doc patch, the content seem to be more strategic, so
> less likely to change, and hard to figure out from the code directly.
> Therefore, I think this would be a useful addition to the docs.
>

Thanks for the kind words. It's good to hear that it may be useful. Let me
know if anything further is needed.


Re: Is it useful to record whether plans are generic or custom?

2021-03-26 Thread Fujii Masao




On 2021/03/26 0:33, torikoshia wrote:

On 2021-03-25 22:14, Fujii Masao wrote:

On 2021/03/23 16:32, torikoshia wrote:

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!


Thanks for updating the patch!

PostgreSQL Patch Tester reported that the patched version failed to be compiled
at Windows. Could you fix this issue?
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238



It seems PGDLLIMPORT was necessary..
Attached a new one.


Thanks for updating the patch!

In my test, generic_calls for a utility command was not incremented
before a PL/pgSQL function was executed. Maybe this is expected behavior.
But it was incremented after the function was executed. Is this a bug?
Please see the following example.

---
SELECT pg_stat_statements_reset();
SET enable_seqscan TO on;
SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE '%seqscan%';
CREATE OR REPLACE FUNCTION do_ckpt() RETURNS VOID AS $$
BEGIN
  EXECUTE 'CHECKPOINT';
END $$ LANGUAGE plpgsql;
SET enable_seqscan TO on;
SET enable_seqscan TO on;

-- SET commands were executed three times before do_ckpt() was called.
-- generic_calls for SET command is zero in this case.
SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE '%seqscan%';

SELECT do_ckpt();
SET enable_seqscan TO on;
SET enable_seqscan TO on;
SET enable_seqscan TO on;

-- SET commands were executed additionally three times after do_ckpt() was called.
-- generic_calls for SET command is three in this case.
SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query LIKE '%seqscan%';
---

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-26 Thread Markus Wanner

On 26.03.21 04:28, Amit Kapila wrote:

I think as you have noted that stream_abort or rollback_prepared will
be sent (the remaining changes in-between will be skipped) as we
decode them from WAL


Yes, but as outlined, too late.  Multiple other transactions may get 
decoded until the decoder reaches the ROLLBACK PREPARED.  Thus, 
effectively, the output plugin currently needs to deduce that a 
transaction got aborted concurrently from one out of half a dozen other 
callbacks that may trigger right after that transaction, because it will 
only get closed properly much later.



so it is not clear to me how it causes any delays
as opposed to where we don't detect concurrent abort say because after
that we didn't access catalog table.


You're assuming very little traffic, where the ROLLBACK ABORT follows 
the PREPARE immediately in WAL.  On a busy system, chances for that to 
happen are rather low.


(I think the same is true for streaming and stream_abort being sent only 
at the time the decoder reaches the ROLLBACK record in WAL.  However, I 
did not try.  Unlike 2PC, where this actually bit me.)


Regards

Markus




Walsender may fail to send wal to the end.

2021-03-26 Thread Kyotaro Horiguchi
Hello, I happened to see a dubious behavior of walsender.

On a replication setup with wal_keep_size/(segments) = 0, running the
following command on the primary causes walsender to fail to send up
to the final shutdown checkpoint record to the standby.

(create table t in advance)

psql -c 'insert into t values(0); select pg_switch_wal();'; pg_ctl stop

The primary complains like this:

2021-03-26 17:59:29.324 JST [checkpointer][140697] LOG:  shutting down
2021-03-26 17:59:29.387 JST [walsender][140816] ERROR:  requested WAL segment 
00010032 has already been removed
2021-03-26 17:59:29.387 JST [walsender][140816] STATEMENT:  START_REPLICATION 
0/3200 TIMELINE 1
2021-03-26 17:59:29.394 JST [postmaster][140695] LOG:  database system is shut 
down

This is because XLogSendPhysical detects that the WAL segment it is currently
reading has been removed by the shutdown checkpoint.  However, there's no fear
of WAL segments being overwritten at that time.

So I think we can omit the call to CheckXLogRemoved() while
MyWalSnd->state is WALSNDSTATE_STOPPING, because that state comes after
the shutdown checkpoint completes.

Of course that doesn't help if walsender was running two segments
behind. There could still be a small window for the failure.  But it's
a great help for the case of being just one segment behind.

Is it worth fixing?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 23baa4498a..4b1e0cf9c5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2755,9 +2755,18 @@ retry:
  &errinfo))
 		WALReadRaiseError(&errinfo);
 
-	/* See logical_read_xlog_page(). */
-	XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
-	CheckXLogRemoved(segno, xlogreader->seg.ws_tli);
+	/*
+	 * See logical_read_xlog_page().  However, there's a case where we're
+	 * reading a segment that was removed/recycled by the shutdown checkpoint.
+	 * We continue reading in that case because 1) it's safe, since no WAL
+	 * activity happens after the shutdown checkpoint completes, and 2) we
+	 * need to do our best to send WAL up to the shutdown checkpoint record.
+	 */
+	if (MyWalSnd->state < WALSNDSTATE_STOPPING)
+	{
+		XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
+		CheckXLogRemoved(segno, xlogreader->seg.ws_tli);
+	}
 
 	/*
 	 * During recovery, the currently-open WAL file might be replaced with the


Re: [PATCH] Allow multiple recursive self-references

2021-03-26 Thread Denis Hirn
Thanks for the feedback, Tom.

> Tom Lane  writes:
> [...]
> As far as I can see, the spec flat-out forbids this.  In SQL:2021,
> it's discussed in 7.17  syntax rule 3) j) ix), which
> defines [linear recursion]

(Aside: We don't have a copy of the SQL:2021 specification here (all
we've got here is the SQL:2016 variant).  We've browsed the ISO site
and didn't find find SQL:2021 there.  Is that a generally available
draft document?)

> So the problem with extending the spec here is that someday they might
> extend it with some other semantics, and then we're between a rock and
> a hard place.

There are two issues, say [LIN] and [NON-LIN], here.  Re [LIN]:

[LIN] PostgreSQL does not allow multiple references to the recursive
   common table, even if the recursion is LINEAR.  Plenty of examples
   of such queries are found in the documentation of established RDBMSs,
   among these IBM Db2 and MS SQL Server:

   - 
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/sqlp/rbafyrecursivequeries.htm#d60691e2455
 (example "Two tables used for recursion using recursive common
 table expressions")

   - 
https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15#h-using-multiple-anchor-and-recursive-members
 (example "Using multiple anchor and recursive members")

Db2 and SQL Server process such linear queries with multiple recursive
CTE references just fine.  As does SQLite3.  With the proposed patch,
PostgreSQL would be able to process these, too (and deliver the same
expected result).

> If there were really compelling reasons why (a) we have to have this
> feature and (b) there's only one reasonable way for it to act, hence
> no likelihood that the SQL committee will choose something else, then
> I'd be willing to think about getting out front of the committee.
> But you haven't made either case.
> [...]
> Do they all act the same?  Has anybody that sits on the SQL committee
> done it?  (If you could point to DB2, in particular, I'd have some
> confidence that the committee might standardize on their approach.)

We'd classify the ability of established RDBMSs to cope with the just
mentioned class of queries (and PostgreSQL's inability to do the same)
as one compelling reason.  Also, the existing functionality in Db2 and
SQL Server would be a yardstick for the expected behavior.

> A totally independent question is whether you've even defined a
> well-defined behavior.  There's an interesting comment in the spec:
>
>
>   NOTE 310 — The restrictions insure that each WLEi, viewed as a
>   transformation of the query names of the stratum, is monotonically
>   increasing. According to Tarski’s fixed point theorem, this insures that
>   there is a fixed point. The General Rules use Kleene’s fixed point
>   theorem to define a sequence that converges to the minimal fixed
>   point.
>
>
> I don't know any of the underlying math here, but if you've got a
> join of multiple recursion references then the number of output
> rows could certainly be nonlinear, which very possibly breaks the
> argument that there's a well-defined interpretation.

This brings us to [NON-LIN]:

[NON-LIN] NOTE 310 refers to Tarski's fixed point theorem and the
  prerequisite that the recursive CTE defines a MONOTONIC query (this
  guarantees the existence of a least fixed point which defines
  the meaning of a recursive CTE in the first place).  MONOTONICITY
  and NON-LINEAR recursion, however, are two separate/orthogonal issues.

A linear recursive query can be monotonic or non-monotonic (and PostgreSQL
currently has a system of safeguards that aim to identify the latter
kind of problematic queries, ruling out the use of EXCEPT, aggregation,
and so forth), just like a non-linear query can.  A monotonic non-linear
recursive query approaches a least fixed point which makes the
behavior of a non-linear recursive CTE as well-defined as a linear
CTE.

In fact, the document that led to the inclusion of WITH RECURSIVE into
the SQL standard (see reference [1] below) mentions that "Linear
recursion is a special case of non-linear recursion" (Section 3.9) in
the fixpoint-based semantics.  (It appears that the authors aimed to
introduce non-linear WITH RECURSIVE from the start but the existing
RECURSIVE UNION proposal at the time was restricted to linear recursion;
so they paddled back and suggested to include non-linear recursion in a
future SQL standard update, coined SQL4 back then).

We do agree, though, that the absence of non-linear recursion in the SQL
standard has opened the field for varied interpretations of the
semantics.  MariaDB, for example, admits non-linear recursion once you
set the configuration option standard_compliant_cte to 0.  However, a
recursive table reference then returns *all* rows collected so far (not
just those rows produced by the last iteration). This implicit change of
behavior makes sense for use cases of non-linear recursion, yet may come
as a su

Re: [PATCH] pg_permissions

2021-03-26 Thread Joel Jacobson
On Fri, Mar 26, 2021, at 07:53, Joel Jacobson wrote:
> On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:
>> "Joel Jacobson" mailto:joel%40compiler.org>> writes:
>> > On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote:
>> >> Ah, of course -- the only way to obtain the acl columns is by going
>> >> through the catalogs individually, so it won't be possible.  I think
>> >> this could be fixed with some very simple, quick function pg_get_acl()
>> >> that takes a catalog OID and object OID and returns the ACL; then
>> >> use aclexplode() to obtain all those details.
>> 
>> > +1 for adding pg_get_acl().
>> 
>> I wonder what performance will be like with lots o' objects.
> 
> I guess pg_get_acl() would need to be implemented using a switch(classid) 
> with 36 cases (one for each class)?
> 
> Is your performance concern on how such switch statement will be optimized by 
> the C-compiler?
> ...
> the classid case values are nicely ordered from 
> OCLASS_CLASS..OCLASS_TRANSFORM (0..37), so they should produce O(2) fast jump 
> tables.
> 
> Maybe there is some other performance concern to reason about that I'm 
> missing here?

Hmm, I think I understand your performance concern now:

Am I right in guessing that the problem, even with a jump table, is going to be
branch prediction, which will be poor due to many classids being common?

Interestingly, the long UNION ALL variant does not seem to suffer from this
problem, thanks to explicitly specifying where to find the aclitem/owner column.
We pay the lookup cost at "compile time", when writing the
pg_ownerships/pg_permissions system views, instead of having to look up the
classids at run time to go fetch the aclitem/owner info.

The query planner is also smart enough to understand that not all of the
individual queries need to be executed when filtering on a specific classid.

/Joel



Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-26 Thread Amit Kapila
On Fri, Mar 26, 2021 at 2:42 PM Markus Wanner
 wrote:
>
> On 26.03.21 04:28, Amit Kapila wrote:
> > I think as you have noted that stream_abort or rollback_prepared will
> > be sent (the remaining changes in-between will be skipped) as we
> > decode them from WAL
>
> Yes, but as outlined, too late.  Multiple other transactions may get
> decoded until the decoder reaches the ROLLBACK PREPARED.  Thus,
> effectively, the output plugin currently needs to deduce that a
> transaction got aborted concurrently from one out of half a dozen other
> callbacks that may trigger right after that transaction, because it will
> only get closed properly much later.
>
> > so it is not clear to me how it causes any delays
> > as opposed to where we don't detect concurrent abort say because after
> > that we didn't access catalog table.
>
> You're assuming very little traffic, where the ROLLBACK ABORT follows
> the PREPARE immediately in WAL.  On a busy system, chances for that to
> happen are rather low.
>

No, I am not assuming that. I am just trying to explain that it is not
necessarily the case that we will be able to detect a concurrent abort in
each and every case. Say a transaction operates on one relation and the
concurrent abort happens after the first access of that relation; then we
won't access the catalog and hence won't detect the abort. In such cases, you
will get the abort only when it appears in the WAL. So why try to report it
earlier in some cases when it is not guaranteed in every case? Also,
what will you do when you receive the actual Rollback? Maybe the plugin
can discard it by checking in some way that it has already aborted the
transaction, but if so, that sounds a bit awkward to me.

The other related thing is that it may not be a good idea to finish the
transaction before we see its actual WAL record, because after the
client (say the subscriber) finishes the xact, it sends the updated LSN
location, based on which we update the slot LSNs from where decoding will
start next time after a restart. So, with bad timing, it might try
to decode the contents of the same transaction again; maybe for
concurrent aborts the plugin can arrange things so that the client won't
send an updated LSN.

> (I think the same is true for streaming and stream_abort being sent only
> at the time the decoder reaches the ROLLBACK record in WAL.  However, I
> did not try.
>

Yes, both streaming and 2PC behave in a similar way in this regard.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] pg_permissions

2021-03-26 Thread Alvaro Herrera
On 2021-Mar-26, Joel Jacobson wrote:

> On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:

> > I wonder what performance will be like with lots o' objects.
> 
> I guess pg_get_acl() would need to be implemented using a switch(classid) 
> with 36 cases (one for each class)?

No, we have a generalized object query mechanism, see objectaddress.c

> Is your performance concern on how such switch statement will be optimized by 
> the C-compiler?

I guess he is concerned about the number of catalog accesses.


-- 
Álvaro Herrera39°49'30"S 73°17'W
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)




Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 19:59, Tomas Vondra
 wrote:
>
> Attached is an updated patch series, with all the changes discussed
> here. I've cleaned up the ndistinct stuff a bit more (essentially
> reverting back from GroupExprInfo to GroupVarInfo name), and got rid of
> the UpdateStatisticsForTypeChange.
>

I've looked over all that and I think it's in pretty good shape. I
particularly like how much simpler the ndistinct code has now become.

Some (hopefully final) review comments:

1). I don't think index.c is the right place for
StatisticsGetRelation(). I appreciate that it is very similar to the
adjacent IndexGetRelation() function, but index.c is really only for
index-related code, so I think StatisticsGetRelation() should go in
statscmds.c

2). Perhaps the error message at statscmds.c:293 should read

   "expression cannot be used in multivariate statistics because its
type %s has no default btree operator class"

(i.e., add the word "multivariate", since the same expression *can* be
used in univariate statistics even though it has no less-than
operator).

3). The comment for ATExecAddStatistics() should probably mention that
"ALTER TABLE ADD STATISTICS" isn't a command in the grammar, in a
similar way to other similar functions, e.g.:

/*
 * ALTER TABLE ADD STATISTICS
 *
 * This is no such command in the grammar, but we use this internally to add
 * AT_ReAddStatistics subcommands to rebuild extended statistics after a table
 * column type change.
 */

4). The comment at the start of ATPostAlterTypeParse() needs updating
to mention CREATE STATISTICS statements.

5). I think ATPostAlterTypeParse() should also attempt to preserve any
COMMENTs attached to statistics objects, i.e., something like:

--- src/backend/commands/tablecmds.c.orig2021-03-26 10:39:38.328631864 +
+++ src/backend/commands/tablecmds.c2021-03-26 10:47:21.042279580 +
@@ -12619,6 +12619,9 @@
 CreateStatsStmt  *stmt = (CreateStatsStmt *) stm;
 AlterTableCmd *newcmd;

+/* keep the statistics object's comment */
+stmt->stxcomment = GetComment(oldId, StatisticExtRelationId, 0);
+
 newcmd = makeNode(AlterTableCmd);
 newcmd->subtype = AT_ReAddStatistics;
 newcmd->def = (Node *) stmt;

6). Comment typo at extended_stats.c:2532 - s/statitics/statistics/

7). I don't think that the big XXX comment near the start of
estimate_multivariate_ndistinct() is really relevant anymore, now that
the code has been simplified and we no longer extract Vars from
expressions, so perhaps it can just be deleted.

Regards,
Dean




Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-26 Thread Markus Wanner

On 26.03.21 11:19, Amit Kapila wrote:

No, I am not assuming that. I am just trying to describe you that it
is not necessary that we will be able to detect concurrent abort in
each and every case.


Sure.  Nor am I claiming that would be necessary or that the patch 
changed anything about it.


As it stands, assuming the output plugin basically just forwards the
events and the subscriber tries to replicate them as is, the following 
would happen on the subscriber for a concurrently aborted two-phase 
transaction:


 * start a transaction (begin_prepare_cb)
 * apply changes for it (change_cb)
 * digress to other, unrelated transactions (leaving unspecified what
   exactly happens to the opened transaction)
 * attempt to rollback a transaction that has not ever been prepared
   (rollback_prepared_cb)

The point of the patch is for the output plugin to get proper 
transaction entry and exit callbacks.  Even in the unfortunate case of a 
concurrent abort.  It offers the output plugin a clean way to learn that 
the decoder stopped decoding for the current transaction and it won't 
possibly see a prepare_cb for it (despite the decoder having passed the 
PREPARE record in WAL).



The other related thing is it may not be a good idea to finish the
transaction


You're speaking subscriber side here.  And yes, I agree, the subscriber 
should not abort the transaction at a concurrent_abort.  I never claimed 
it should.


If you are curious, in our case I made the subscriber PREPARE the 
transaction at its end when receiving a concurrent_abort notification, 
so that the subscriber:


 * can hop out of that started transaction and safely proceed
   to process events for other transactions, and
 * has the transaction in the appropriate state for processing the
   subsequent rollback_prepared_cb, once that gets through

That's probably not ideal in the sense that subscribers do unnecessary 
work.  However, it pretty closely replicates the transaction's state as 
it was on the origin at any given point in time (by LSN).


Regards

Markus




Re: [PATCH] pg_permissions

2021-03-26 Thread Joel Jacobson
On Fri, Mar 26, 2021, at 11:30, Alvaro Herrera wrote:
> On 2021-Mar-26, Joel Jacobson wrote:
> 
> > On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:
> 
> > > I wonder what performance will be like with lots o' objects.
> > 
> > I guess pg_get_acl() would need to be implemented using a switch(classid) 
> > with 36 cases (one for each class)?
> 
> No, we have a generalized object query mechanism, see objectaddress.c

That's where I was looking actually and noticed the switch with 36 cases, in 
the function getObjectDescription().

/Joel


Re: [PATCH] pg_permissions

2021-03-26 Thread Alvaro Herrera
On 2021-Mar-26, Joel Jacobson wrote:

> On Fri, Mar 26, 2021, at 11:30, Alvaro Herrera wrote:
> > On 2021-Mar-26, Joel Jacobson wrote:
> > 
> > > On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:
> > 
> > > > I wonder what performance will be like with lots o' objects.
> > > 
> > > I guess pg_get_acl() would need to be implemented using a switch(classid) 
> > > with 36 cases (one for each class)?
> > 
> > No, we have a generalized object query mechanism, see objectaddress.c
> 
> That's where I was looking actually and noticed the switch with 36 cases, in 
> the function getObjectDescription().

Ah! well, you don't have to repeat that.

AFAICS the way to do it is like AlterObjectOwner_internal obtains data
-- first do get_catalog_object_by_oid (gives you the HeapTuple that
represents the object), then
heap_getattr( ..., get_object_attnum_acl(), ..), and there you have the
ACL which you can "explode" (or maybe just return as-is).

AFAICS if you do this, it's just one cache lookup per object, or
one indexscan for the cases with no by-OID syscache.  It should be much
cheaper than the UNION ALL query.  And you use pg_shdepend to guide
this, so you only do it for the objects that you already know are
interesting.
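
For illustration only, here is a minimal, untested sketch of what such a
pg_get_acl(classid, objid) helper could look like, following the pattern
described above (get_catalog_object_by_oid() plus heap_getattr() on the ACL
column). The function body and its error handling are hypothetical, not taken
from any posted patch:

/*
 * Hypothetical pg_get_acl(classid oid, objid oid) -- sketch only.
 * Headers it would rely on:
 *   #include "postgres.h"
 *   #include "fmgr.h"
 *   #include "access/htup_details.h"
 *   #include "access/table.h"
 *   #include "catalog/objectaddress.h"
 *   #include "utils/datum.h"
 *   #include "utils/rel.h"
 */
Datum
pg_get_acl(PG_FUNCTION_ARGS)
{
	Oid			classId = PG_GETARG_OID(0);
	Oid			objectId = PG_GETARG_OID(1);
	Relation	catalog;
	HeapTuple	tup;
	AttrNumber	aclAttnum;
	Datum		aclDatum;
	bool		isnull;

	/* ACL column of the catalog identified by classId, if it has one */
	aclAttnum = get_object_attnum_acl(classId);
	if (aclAttnum == InvalidAttrNumber)
		PG_RETURN_NULL();

	catalog = table_open(classId, AccessShareLock);

	/* fetch the object's row by its OID (syscache or index scan underneath) */
	tup = get_catalog_object_by_oid(catalog,
									get_object_attnum_oid(classId),
									objectId);
	if (!HeapTupleIsValid(tup))
	{
		table_close(catalog, AccessShareLock);
		PG_RETURN_NULL();
	}

	aclDatum = heap_getattr(tup, aclAttnum, RelationGetDescr(catalog), &isnull);
	if (!isnull)
		aclDatum = datumCopy(aclDatum, false, -1);	/* aclitem[] is varlena */

	table_close(catalog, AccessShareLock);

	if (isnull)
		PG_RETURN_NULL();
	PG_RETURN_DATUM(aclDatum);
}

A pg_permissions view could then call aclexplode(pg_get_acl(classid, objid))
over the rows of pg_shdepend, as suggested upthread.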

-- 
Álvaro Herrera   Valdivia, Chile
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)




Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra



On 3/26/21 12:37 PM, Dean Rasheed wrote:
> On Thu, 25 Mar 2021 at 19:59, Tomas Vondra
>  wrote:
>>
>> Attached is an updated patch series, with all the changes discussed
>> here. I've cleaned up the ndistinct stuff a bit more (essentially
>> reverting back from GroupExprInfo to GroupVarInfo name), and got rid of
>> the UpdateStatisticsForTypeChange.
>>
> 
> I've looked over all that and I think it's in pretty good shape. I
> particularly like how much simpler the ndistinct code has now become.
> 
> Some (hopefully final) review comments:
> 
> 1). I don't think index.c is the right place for
> StatisticsGetRelation(). I appreciate that it is very similar to the
> adjacent IndexGetRelation() function, but index.c is really only for
> index-related code, so I think StatisticsGetRelation() should go in
> statscmds.c
> 

Ah, right, I forgot about this. I wonder if we should add
catalog/statistics.c, similar to catalog/index.c (instead of adding it
locally to statscmds.c).

> 2). Perhaps the error message at statscmds.c:293 should read
> 
>"expression cannot be used in multivariate statistics because its
> type %s has no default btree operator class"
> 
> (i.e., add the word "multivariate", since the same expression *can* be
> used in univariate statistics even though it has no less-than
> operator).
> 
> 3). The comment for ATExecAddStatistics() should probably mention that
> "ALTER TABLE ADD STATISTICS" isn't a command in the grammar, in a
> similar way to other similar functions, e.g.:
> 
> /*
>  * ALTER TABLE ADD STATISTICS
>  *
>  * This is no such command in the grammar, but we use this internally to add
>  * AT_ReAddStatistics subcommands to rebuild extended statistics after a table
>  * column type change.
>  */
> 
> 4). The comment at the start of ATPostAlterTypeParse() needs updating
> to mention CREATE STATISTICS statements.
> 
> 5). I think ATPostAlterTypeParse() should also attempt to preserve any
> COMMENTs attached to statistics objects, i.e., something like:
> 
> --- src/backend/commands/tablecmds.c.orig2021-03-26 10:39:38.328631864 
> +
> +++ src/backend/commands/tablecmds.c2021-03-26 10:47:21.042279580 +
> @@ -12619,6 +12619,9 @@
>  CreateStatsStmt  *stmt = (CreateStatsStmt *) stm;
>  AlterTableCmd *newcmd;
> 
> +/* keep the statistics object's comment */
> +stmt->stxcomment = GetComment(oldId, StatisticExtRelationId, 0);
> +
>  newcmd = makeNode(AlterTableCmd);
>  newcmd->subtype = AT_ReAddStatistics;
>  newcmd->def = (Node *) stmt;
> 
> 6). Comment typo at extended_stats.c:2532 - s/statitics/statistics/
> 
> 7). I don't think that the big XXX comment near the start of
> estimate_multivariate_ndistinct() is really relevant anymore, now that
> the code has been simplified and we no longer extract Vars from
> expressions, so perhaps it can just be deleted.
> 

Thanks! I'll fix these, and then will consider getting it committed
sometime later today, once the buildfarm does some testing on the other
stuff I already committed.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra



On 3/23/21 7:28 PM, Alvaro Herrera wrote:
> On 2021-Mar-23, Tomas Vondra wrote:
> 
>> FWIW there's yet another difference between the current BRIN opclass
>> definition, compared to what CREATE OPERATOR CLASS would do. Or more
>> precisely, how we'd define opfamily for the cross-type cases (integer,
>> float and timestamp cases).
>>
>> AFAICS we don't really need pg_amproc entries for the cross-type cases,
>> we just need the operators, so pg_amproc entries like
>>
>> { amprocfamily => 'brin/datetime_minmax_ops', amproclefttype =>
>> 'timestamptz',
>>   amprocrighttype => 'timestamp', amprocnum => '1',
>>   amproc => 'brin_minmax_opcinfo' },
>>
>> are unnecessary. The attached patch cleans that up, without breaking any
>> regression tests. Or is there a reason why we need those?
> 
> ... ooh ...
> 
> When you say "just the operators" you mean the pg_amop entries, right?
> 
> I think I agree -- cross-type amproc entries are unlikely to have any
> use.
> 

I've pushed a cleanup of the unnecessary pg_amproc entries for the
built-in opclasses, and I have omitted them from the definition of the
new opclasses. Buildfarm seems happy, so far.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra
Hi,

I've pushed both the bloom and minmax-multi indexes today.

Based on the feedback and limitations described before, I decided to keep
them in core (i.e. not move them to contrib), but it was a very useful
experiment as it uncovered a couple of issues - both in existing code, and
in the definition of the new opclasses.

After further thought I've concluded that the decision to ditch the old
signature (in a1c649d889) was probably wrong, so I've added the support
back, per what Alexander originally proposed.

While doing so, I've noticed that the NULL handling may be a few bricks
shy, because with (oi_regular_nulls == false) it should probably keep
passing the NULL scan keys to the consistent function. I'll look into
that and get it fixed.


Thanks everyone who helped with those patches!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Stronger safeguard for archive recovery not to miss data

2021-03-26 Thread David Steele

On 3/25/21 9:23 PM, Fujii Masao wrote:


On 2021/03/25 23:21, David Steele wrote:

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +, osumi.takami...@fujitsu.com wrote:
I think you should pst another patch where the second, now 
superfluous, error

message is removed.


Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.


Looks good to me.
I'll set it to "ready for committer" again.


Fujii, does the new patch in [1] address your concerns?


No. I'm still not sure if this patch is good idea... I understand
why this safeguard is necessary. OTOH I'm afraid it increases
a bit the risk that users get unstartable database, i.e., lose whole 
database.

But maybe I'm concerned about rare case and my opinion is minority one.
So I'd like to hear more opinions about this patch.


After reviewing the patch I am +1. I think allowing corruption in 
recovery by default is not a good idea. There is currently a warning but 
I don't think that is nearly strong enough and is easily missed.


Also, "data may be missing" makes this sound like an acceptable 
situation. What is really happening is that corruption is being introduced 
which may make the cluster unusable, or at the very least lead to errors 
during normal operation.


If we want to allow recovery to play past this point I think it would 
make more sense to have a GUC (like ignore_invalid_pages [1]) that 
allows recovery to proceed and emits a warning instead of fatal.
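
As a rough illustration of that idea (the GUC name below is invented here and
this is not part of the posted patch), the knob could simply select the
severity used at the existing check:

/* Hypothetical bool GUC, defined in guc.c, defaulting to the strict behavior */
bool		allow_recovery_from_minimal_wal = false;

/* ... at the existing wal_level=minimal check in xlog.c ... */
ereport(allow_recovery_from_minimal_wal ? WARNING : FATAL,
		(errmsg("WAL was generated with wal_level=minimal, data may be missing"),
		 errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));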


Looking at the patch, I see a few things:

1) Typo in the tests:

This protection shold apply to recovery mode

should be:

This protection should apply to recovery mode

2) I don't think it should be the job of this patch to restructure the 
if conditions, even if it is more efficient. It just obscures the 
purpose of the patch. So, I would revert all the changes in xlog.c 
except changing the warning to an error:


-   ereport(WARNING,
-   (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
-    errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));

+   ereport(FATAL,
+	(errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
+	 errdetail("This happens if you temporarily set wal_level=minimal on the server."),
+	 errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));


--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/E1iu6D8-tK-Cm%40gemulon.postgresql.org





Re: [PATCH] pg_permissions

2021-03-26 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Mar-26, Joel Jacobson wrote:
>> On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:
>> I wonder what performance will be like with lots o' objects.

> I guess he is concerned about the number of catalog accesses.

My concern is basically that you're forcing the join between
pg_shdepend and $everything_else to be done as a nested loop.
It will work well, up to where you have so many objects that
it doesn't ... but the planner will have no way to improve it.

Having said that, I don't really see a better way either.
Materializing $everything_else via a UNION ALL seems like
no fun from a maintenance perspective, plus we're not that
great on optimizing such constructs either.

regards, tom lane




Re: Allow matching whole DN from a client certificate

2021-03-26 Thread Andrew Dunstan

On 3/24/21 12:54 AM, Michael Paquier wrote:

[numerous useful comments]


OK, here's a new patch. I hope to commit this within a few days.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index b420486a0a..951af49e9a 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -598,7 +598,7 @@ hostnogssenc  database  user
 
   
-   In addition to the method-specific options listed below, there is one
+   In addition to the method-specific options listed below, there is a
method-independent authentication option clientcert, which
can be specified in any hostssl record.
This option can be set to verify-ca or
@@ -612,6 +612,28 @@ hostnogssenc  database  userhostssl entries.
   
+  
+   On any record using client certificate authentication (i.e. one
+   using the cert authentication method or one
+   using the clientcert option), you can specify
+   which part of the client certificate credentials to match using
+   the clientname option. This option can have one
+   of two values. If you specify clientname=CN, which
+   is the default, the username is matched against the certificate's
+   Common Name (CN). If instead you specify
+   clientname=DN the username is matched against the
+   entire Distinguished Name (DN) of the certificate.
+   This option is probably best used in conjunction with a username map.
+   The comparison is done with the DN in
+   https://tools.ietf.org/html/rfc2253";>RFC 2253
+   format. To see the DN of a client certificate
+   in this format, do
+
+openssl x509 -in myclient.crt -noout --subject -nameopt RFC2253 | sed "s/^subject=//"
+
+Care needs to be taken when using this option, especially when using
+regular expression matching against the DN.
+  
  
 

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 994251e7d9..32ef05783a 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2800,12 +2800,23 @@ static int
 CheckCertAuth(Port *port)
 {
 	int			status_check_usermap = STATUS_ERROR;
+	char	   *peer_username;
 
 	Assert(port->ssl);
 
+	/* select the correct field to compare */
+	switch (port->hba->clientcertname)
+	{
+		case clientCertDN:
+			peer_username = port->peer_dn;
+			break;
+		default:
+			peer_username = port->peer_cn;
+	}
+
 	/* Make sure we have received a username in the certificate */
-	if (port->peer_cn == NULL ||
-		strlen(port->peer_cn) <= 0)
+	if (peer_username == NULL ||
+		strlen(peer_username) <= 0)
 	{
 		ereport(LOG,
 (errmsg("certificate authentication failed for user \"%s\": client certificate contains no user name",
@@ -2813,8 +2824,8 @@ CheckCertAuth(Port *port)
 		return STATUS_ERROR;
 	}
 
-	/* Just pass the certificate cn to the usermap check */
-	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, port->peer_cn, false);
+	/* Just pass the certificate cn/dn to the usermap check */
+	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, peer_username, false);
 	if (status_check_usermap != STATUS_OK)
 	{
 		/*
@@ -2824,9 +2835,18 @@ CheckCertAuth(Port *port)
 		 */
 		if (port->hba->clientcert == clientCertFull && port->hba->auth_method != uaCert)
 		{
-			ereport(LOG,
-	(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN mismatch",
-			port->user_name)));
+			if (port->hba->clientcertname == clientCertDN)
+			{
+ereport(LOG,
+		(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": DN mismatch",
+port->user_name)));
+			}
+			else
+			{
+ereport(LOG,
+		(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN mismatch",
+port->user_name)));
+			}
 		}
 	}
 	return status_check_usermap;
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 5ce3f27855..65b07c7d3f 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -551,22 +551,26 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name and Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM*bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
 			char	   *peer_cn;
 
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_ge

Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Alvaro Herrera
On 2021-Mar-26, Tomas Vondra wrote:

> I've pushed both the bloom and minmax-multi indexes today.

Congratulations!  I think this reimplementation of the minmax opclass
infrastructure makes BRIN much more useful (read: actually usable).

-- 
Álvaro Herrera   Valdivia, Chile




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra



On 3/26/21 2:08 PM, Tomas Vondra wrote:
> Hi,
> 
> I've pushed both the bloom and minmax-multi indexes today.
> 

Hmmm, I see a couple buildfarm animals are upset about the minmax-multi
patch. It does pass for me both on x86_64 and 32-bit ARM (rpi4), so I'm
unable to reproduce this. But most of the machines having issues seem to
be sparc variants, and snapper does this:


[New LWP 1471]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: parallel worker for PID 1350
 '.
Program terminated with signal 10, Bus error.
#0  0x007167fc in int82gt (fcinfo=0xffcc66d0) at int8.c:399
399 int64   val1 = PG_GETARG_INT64(0);
#0  0x007167fc in int82gt (fcinfo=0xffcc66d0) at int8.c:399
#1  0x00887a94 in FunctionCall2Coll (flinfo=0xb81a2c, collation=0,
arg1=12242916, arg2=0) at fmgr.c:1163


I recall seeing "bus error" on sparc with other patches because of
alignment issues, so I wonder if this is what's happening here.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [UNVERIFIED SENDER] Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2021-03-26 Thread David Steele

Hi Jim,

On 3/26/21 12:01 AM, Thomas Munro wrote:

On Fri, Mar 26, 2021 at 2:57 AM David Steele  wrote:

On 1/22/21 6:46 PM, Finnerty, Jim wrote:

First 3 patches derived from the original 64-bit xid patch set by Alexander 
Korotkov


The patches no longer apply
(http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.

I also removed the PG14 target since this is a fresh patch set after a
long hiatus with no new review.


I just wanted to say that I'm definitely interested in progress in
this area, and I'm sure many others are too.  Let's talk again about
incremental steps in the PG15 cycle.  The reason for lack of responses
on this thread is most likely due to being at the business end of the
PG14 cycle.


Indeed. I certainly didn't mean to imply that this work is not valuable, 
just that it is too late to consider it for PG14.


I'm going to move this to the 2021-07 CF and leave it in the Waiting for 
Author state. It would probably be best to wait until just before the CF 
to rebase since anything you do now will likely be broken by the flurry 
of commits that will happen in the next two weeks before feature freeze.


Regards,
--
-David
da...@pgmasters.net




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tom Lane
Tomas Vondra  writes:
> I recall seeing "bus error" on sparc with other patches because of
> alignment issues, so I wonder if this is what's happening here.

Try compiling with the address sanitizer enabled.  Per c.h,

 * Testing can be done with "-fsanitize=alignment -fsanitize-trap=alignment"
 * on clang, or "-fsanitize=alignment -fno-sanitize-recover=alignment" on gcc.

regards, tom lane




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra
On 3/26/21 2:55 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> I recall seeing "bus error" on sparc with other patches because of
>> alignment issues, so I wonder if this is what's happening here.
> 
> Try compiling with the address sanitizer enabled.  Per c.h,
> 
>  * Testing can be done with "-fsanitize=alignment -fsanitize-trap=alignment"
>  * on clang, or "-fsanitize=alignment -fno-sanitize-recover=alignment" on gcc.
> 

Bingo! I see the one failing x86_64 machine has -fsanitize=alignment.


thanks

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: invalid data in file backup_label problem on windows

2021-03-26 Thread David Steele

On 3/21/21 10:40 AM, Magnus Hagander wrote:

On Sat, Mar 20, 2021 at 3:10 AM wangsh.f...@fujitsu.com
 wrote:


David Steele  wrote:


It's not clear to me what text editors have to do with this? Are you
editing the file manually?


When I execute SELECT * FROM pg_stop_backup(false, true) in psql.

The results will be shown like:
 lsn|   labelfile   
| spcmapfile
 
+-+
 0/2000138 | START WAL LOCATION: 0/228 (file 
00010002)+|
| CHECKPOINT LOCATION: 0/260
   +|
| BACKUP METHOD: streamed   
  +|
| BACKUP FROM: master   
   +
 ..
The results only will be shown on screen and this function will not generate 
any files. What I do is write
the second field(labelfile) to a new file backup_label and write the third 
field(spcmapfile) to tablespace_map if
the third field is not null.

Therefore, I choose a text editor to help me write the file.
I copy the a line in second field and paste this to text editor and press the 
'enter' key, repeat this action util
all the line have be pasted to text editor, then save the file.

If this is not a good way to create the backup_label file, can you tell me how 
can I create this file on windows?


These APIs are really not designed to be run manually from a CLI and
copy/paste the results.

Running them from literally any script or program should make that
easy, by getting the actual value out and storing it.


You might consider using pg_basebackup, which does all this for you and 
is well tested.



I think the real problem is this file on windows is opened with binary mode. If 
I use libpq to get the result and write
the result to file directly(the default action on windows is open file in text 
mode), this problem will be happened.
So I consider this is a bug.


No, the problem is you are using copy/paste and in doing so you are
*changing'* the value that is being returned. You'll either need to
update your copy/paste procedure to not mess with the newlines, or to
use a better way to get the data out.

If we need to clarify that in the documentation, I'm fine with that.
Maybe add an extra sentence to the part about not modifying the output
to mention that this includes changing newslines and also encoding
(which would also break it, if you managed to find a non-ascii
compatible encoding). Maybe even something along the line of "the
contents have to be written in binary mode"?


Perhaps something like the attached?

Regards,
--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index c5557d5444..b548817360 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -913,7 +913,8 @@ SELECT * FROM pg_stop_backup(false, true);
 backup_label in the root directory of the backup. The
 third field should be written to a file named
 tablespace_map unless the field is empty. These files are
- vital to the backup working, and must be written without modification.
+ vital to the backup working and must be written without modification, which
+ may require setting binary mode when writing on some platforms (e.g. Windows).
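
As an aside (purely illustrative, not part of the doc patch above): a libpq
client that wants to write these fields itself just needs to open the files in
binary mode, e.g.:

/* Illustrative only: fetch labelfile and write it unmodified in binary mode */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

static int
write_backup_label(PGconn *conn)
{
	PGresult   *res = PQexec(conn,
							 "SELECT labelfile FROM pg_stop_backup(false, true)");
	int			ok = 0;

	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
	{
		const char *label = PQgetvalue(res, 0, 0);
		FILE	   *fp = fopen("backup_label", "wb");	/* "b": no newline translation */

		if (fp != NULL)
		{
			fwrite(label, 1, strlen(label), fp);
			ok = (fclose(fp) == 0);
		}
	}
	PQclear(res);
	return ok;
}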
 




Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra
On 3/26/21 3:04 PM, Tomas Vondra wrote:
> On 3/26/21 2:55 PM, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> I recall seeing "bus error" on sparc with other patches because of
>>> alignment issues, so I wonder if this is what's happening here.
>>
>> Try compiling with the address sanitizer enabled.  Per c.h,
>>
>>  * Testing can be done with "-fsanitize=alignment -fsanitize-trap=alignment"
>>  * on clang, or "-fsanitize=alignment -fno-sanitize-recover=alignment" on 
>> gcc.
>>
> 
> Bingo! I see the one failing x86_64 machine has -fsanitize=alignment.
> 

Yeah, the deserialization is borked. It assumes it can just point into
the serialized representation of the summary, but that is "compacted" by
ignoring alignment. Hence the failures. For me it fails only for timetz
and interval types, but perhaps sparc is more sensitive, or maybe it's
about 32/64 bits too (the only backtrace I'm aware of is from snapper,
so assuming it's 32bits it'd make sense it fails on int8).

I have a feeling I made the same mistake in serialization of MCV stats
some time ago, shame on me. I'll get this fixed.
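
(As a generic illustration of the fix for this class of bug, not the actual
commit: instead of casting a pointer into the packed buffer, copy the bytes
into an aligned local variable.)

#include <string.h>
#include <stdint.h>

/*
 * Reading "*(int64_t *) ptr" from a packed/serialized buffer is what trips
 * -fsanitize=alignment and causes SIGBUS on sparc; memcpy into an aligned
 * local variable is always safe.
 */
static inline int64_t
read_int64_unaligned(const char *ptr)
{
	int64_t		val;

	memcpy(&val, ptr, sizeof(val));
	return val;
}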


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: MultiXact\SLRU buffers configuration

2021-03-26 Thread Andrey Borodin



> 26 марта 2021 г., в 11:00, Andrey Borodin  написал(а):
> 
>> I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.
> I think the idea of speeding up linear search is really really good for 
> scaling SLRUs. It's not even about improving normal performance of the 
> cluster, but it's important from preventing pathological degradation under 
> certain circumstances. Bigger cache really saves SLAs :) I'll look into the 
> patch more closely this weekend. Thank you!


Some thoughts on the HashTable patch:
1. Can we allocate a bigger hashtable to reduce the probability of collisions?
2. Can we use a specialised hashtable for this case? I'm afraid hash_search()
spends a comparable number of CPU cycles to a simple loop from 0 to 128. We could
inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm not
insisting on a special hash though, just an idea (a rough sketch follows after
this list).
3. pageno in SlruMappingTableEntry seems to be unused.
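
Regarding 2), a toy sketch of the kind of specialised, fully inlined table meant
here (all names invented, uint32 assumed from PostgreSQL's c.h; a real
shared-memory version would also need insertion, eviction and locking):

/* Fixed-size open-addressing map from SLRU page number to buffer slot. */
typedef struct SlruMapEntry
{
	int			pageno;			/* key; -1 means the entry is empty */
	int			slotno;
} SlruMapEntry;

#define SLRU_MAP_SIZE 256		/* power of two, comfortably above the slot count */

static inline int
slru_map_lookup(const SlruMapEntry *map, int pageno)
{
	uint32		idx = ((uint32) pageno * 2654435761u) & (SLRU_MAP_SIZE - 1);

	while (map[idx].pageno != -1)
	{
		if (map[idx].pageno == pageno)
			return map[idx].slotno;
		idx = (idx + 1) & (SLRU_MAP_SIZE - 1);
	}
	return -1;					/* page not mapped; caller falls back to the miss path */
}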

Thanks!

Best regards, Andrey Borodin.



Re: WIP: BRIN multi-range indexes

2021-03-26 Thread Tomas Vondra
On 3/26/21 3:45 PM, Tomas Vondra wrote:
> On 3/26/21 3:04 PM, Tomas Vondra wrote:
>> On 3/26/21 2:55 PM, Tom Lane wrote:
>>> Tomas Vondra  writes:
 I recall seeing "bus error" on sparc with other patches because of
 alignment issues, so I wonder if this is what's happening here.
>>>
>>> Try compiling with the address sanitizer enabled.  Per c.h,
>>>
>>>  * Testing can be done with "-fsanitize=alignment -fsanitize-trap=alignment"
>>>  * on clang, or "-fsanitize=alignment -fno-sanitize-recover=alignment" on 
>>> gcc.
>>>
>>
>> Bingo! I see the one failing x86_64 machine has -fsanitize=alignment.
>>
> 
> Yeah, the deserialization is borked. It assumes it can just point into
> the serialized representation of the summary, but that is "compacted" by
> ignoring alignment. Hence the failures. For me it fails only for timetz
> and interval types, but perhaps sparc is more sensitive, or maybe it's
> about 32/64 bits too (the only backtrace I'm aware of is from snapper,
> so assuming it's 32bits it'd make sense it fails on int8).
> 
> I have a feeling I made the same mistake in serialization of MCV stats
> some time ago, shame on me. I'll get this fixed.
> 

I've pushed a fix, ensuring proper alignment. There was a second issue
that'd affect big endian systems, so I fixed that too.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Bug on update timing of walrcv->flushedUpto variable

2021-03-26 Thread 蔡梦娟(玊于)
Hi, all

Recently, I found a bug in the update timing of the walrcv->flushedUpto variable.
Consider the following scenario: there is one Primary node and one Standby node
which is streaming from the Primary.
There are a large number of SQL statements running on the Primary, and the length
of an xlog record generated by these statements may be greater than the space left
on the current page, so that it needs to be written across pages. As shown below,
the length of the last_xlog of wal_1 is greater than the space left in last_page,
so it has to be continued in wal_2. If the Primary crashed after flushing the
last_page of wal_1 to disk, but before the remaining content of last_xlog was
flushed, then the last_xlog in wal_1 will be incomplete. And the Standby also
received wal_1 via WAL streaming in this case.
[日志1.png]

Primary restarts after crash, during the crash recovery, Primary will find that 
the last_xlog of wal_1 is invalid, and it will cover the space of last_xlog by 
inserting new xlog record. However, Standby won't do this, and there will be 
xlog inconsistency between Primary and standby at this time.


When Standby restarts and replays the last_xlog, it will first get the content 
of XLogRecord structure (the header of last_xlog is completed flushed), and 
find that it has to reassemble the last_xlog, the next page of last_xlog is 
within wal_2, which not exists in pg_wal of Standby. So it request xlog 
streaming from Primary to get the wal_2, and update the walrcv->flushedUpto 
when it has received new xlog and flushed them to disk, now the value of 
walrcv->flushedUpto is some LSN within wal_2.


Standby get wal_2 from Primary, but the content of the first page of wal_2 is 
not the remain content of last_xlog, which has already been covered by new xlog 
in Primary. Standby checked and found that the record is invalid, it will read 
the last_xlog again, and call the WaitForWALToBecomeAvailable function, in this 
function it will shutdown the wal-streaming and read the record from pg_wal.


Again, the record read from pg_wal is also invalid, so Standby will request 
wal-streaming again, and it is worth noting that the value of 
walrcv->flushedUpto has already been set to wal_2 before, which is greater than 
the LSN Standby needs, so the variable havedata in WaitForWALToBecomeAvailable 
is always true, and Standby considers that it received the xlog, it will read 
the content from wal_2.


Next is the endless loop: Standby found the xlog is invalid -> read the 
last_xlog again -> shutdown wal-streaming and read xlog from pg_wal -> found 
the xlog is invalid -> request wal-streaming, expect to get the correct xlog, 
but it will return from WaitForWALToBecomeAvailable immediately because the 
walrcv->flushedUpto is always greater than the LSN it needs ->read and found 
the xlog is invalid -> read the last_xlog again ->..


In this case, Standby will never get the correct xlog record until it restarts


The confusing point is: why only updates the walrcv->flushedUpto at the first 
startup of walreceiver on a specific timeline, not each time when request xlog 
streaming? In above case, it is also reasonable to update walrcv->flushedUpto 
to wal_1 when Standby re-receive wal_1. So I changed to update the 
walrcv->flushedUpto each time when request xlog streaming, which is the patch I 
want to share with you, based on postgresql-13.2, what do you think of this 
change?

By the way, I also want to know why call pgstat_reset_all function during 
recovery process?

Thanks & Best Regards


0001-Update-walrcv-flushedUpto-each-time-when-request-xlog-streaming.patch
Description: Binary data


Re: invalid data in file backup_label problem on windows

2021-03-26 Thread Andrew Dunstan


On 3/26/21 10:19 AM, David Steele wrote:
>
>> No, the problem is you are using copy/paste and in doing so you are
>> *changing'* the value that is being returned. You'll either need to
>> update your copy/paste procedure to not mess with the newlines, or to
>> use a better way to get the data out.
>>
>> If we need to clarify that in the documentation, I'm fine with that.
>> Maybe add an extra sentence to the part about not modifying the output
>> to mention that this includes changing newlines and also encoding
>> (which would also break it, if you managed to find a non-ascii
>> compatible encoding). Maybe even something along the line of "the
>> contents have to be written in binary mode"?
>
> Perhaps something like the attached?
>
>


That seems a bit opaque.  Let's tell them exactly what they need to avoid.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: wal stats questions

2021-03-26 Thread Andres Freund
Hi,

On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote:
> On the other hand, the counters are incremented in XLogInsertRecord()
> and I think we don't want add instructions there.

I don't really buy this. Setting a boolean to true, in a cacheline
you're already touching, isn't that much compared to all the other stuff
in there. The branch to check if wal stats timing etc is enabled is much
more expensive.  I think we should just set a boolean to true and leave
it at that.
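
Concretely, a minimal self-contained sketch of that idea (illustrative names
only, not a patch): the code that bumps a WAL counter also sets a boolean, and
the reporting path tests the boolean instead of memcmp()ing the whole struct
against an all-zeroes copy.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct WalStatsCounters
{
    long    wal_records;
    long    wal_bytes;
} WalStatsCounters;

static WalStatsCounters walstats;
static bool walstats_pending = false;

static void
record_wal_insert(long bytes)
{
    walstats.wal_records++;
    walstats.wal_bytes += bytes;
    walstats_pending = true;        /* one store in an already-hot cacheline */
}

static void
report_stats(void)
{
    if (!walstats_pending)          /* cheap check, no memcmp needed */
        return;
    printf("reporting %ld records, %ld bytes\n",
           walstats.wal_records, walstats.wal_bytes);
    memset(&walstats, 0, sizeof(walstats));
    walstats_pending = false;
}

int
main(void)
{
    report_stats();                 /* nothing pending, returns immediately */
    record_wal_insert(128);
    report_stats();                 /* now actually reports */
    return 0;
}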

Greetings,

Andres Freund




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-26 Thread Andres Freund
Hi,

On 2021-03-25 21:23:17 +0900, Fujii Masao wrote:
> > This strikes me as a quite a misleading function name.
> 
> Yeah, better name is always welcome :)

It might just be best to not introduce a generic function and just open
code one just for the stats collector...


> > Outside of very
> > narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
> > _exit() (as opposed to exit()) seem appropriate. This seems like a bad
> > idea.
> 
> So you're thinking that the stats collector should do proc_exit(0)
> or something even when it receives immediate shutdown request?

> One idea to avoid using _exit(1) is to change the SIGQUIT handler
> so that it just sets the flag. Then if the stats collector detects that
> the flag is set in the main loop, it gets out of the loop,
> skips writing the permanent stats file and then exits with exit(0).
> That is, normal and immediate shutdown requests are treated
> almost the same way in the stats collector. Only the difference of
> them is whether it saves the stats to the file or not. Thought?

My main complaint isn't so much that you made the stats collector
_exit(1). It's that you added a function that sounded generic, but
should basically not be used anywhere (we have very few non-shmem
connected processes left - I don't think that number will increase).
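
For what it's worth, the flag-only SIGQUIT handling quoted above is the
familiar pattern; a minimal self-contained sketch (illustrative only, not the
actual stats collector code):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t shutdown_requested = 0;

static void
handle_sigquit(int signo)
{
    shutdown_requested = 1;         /* just set the flag, nothing else */
}

int
main(void)
{
    signal(SIGQUIT, handle_sigquit);

    for (;;)
    {
        if (shutdown_requested)
        {
            /* immediate shutdown: skip writing the permanent stats file */
            exit(0);
        }

        /* ... collect and process stats messages here ... */
        sleep(1);
    }
}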


Greetings,

Andres Freund




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-26 Thread Andres Freund
Hi,

On 2021-03-26 09:27:19 +0900, Masahiro Ikeda wrote:
> On 2021/03/25 19:48, Fujii Masao wrote:
> > Yes. So we should wait for the shared memory stats patch to be committed
> > before working on walreceiver stats patch more?
> 
> Yes, I agree.

Agreed.

One thing that I didn't quite see discussed enough in the thread so far:
Have you considered what regressions the stats file removal in immediate
shutdowns could cause? Right now we will - kind of accidentally - keep
the stats file for immediate shutdowns, which means that autovacuum etc
will still have stats to work with after the next start. After this, not
so much?

Greetings,

Andres Freund




Re: Walsender may fail to send wal to the end.

2021-03-26 Thread Andres Freund
Hi,

On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> This is because XLogSendPhysical detects removal of the wal segment it is
> currently reading, caused by the shutdown checkpoint.  However, there's no
> fear of WAL segments being overwritten at that time.
>
> So I think we can omit the call to CheckXLogRemoved() while
> MyWalSnd->state is WALSNDSTATE_STOPPING, because that state comes after
> the shutdown checkpoint completes.
>
> Of course that doesn't help if the walsender was running two segments
> behind. There still could be a small window for the failure.  But it's
> a great help for the case of being just 1 segment behind.

-1. This seems like a bandaid to make a broken configuration work a tiny
bit better, without actually being meaningfully better.

Greetings,

Andres Freund




Re: invalid data in file backup_label problem on windows

2021-03-26 Thread Magnus Hagander
On Fri, Mar 26, 2021 at 5:52 PM Andrew Dunstan  wrote:
>
>
> On 3/26/21 10:19 AM, David Steele wrote:
> >
> >> No, the problem is you are using copy/paste and in doing so you are
> >> *changing'* the value that is being returned. You'll either need to
> >> update your copy/paste procedure to not mess with the newlines, or to
> >> use a better way to get the data out.
> >>
> >> If we need to clarify that in the documentation, I'm fine with that.
> >> Maybe add an extra sentence to the part about not modifying the output
> >> to mention that this includes changing newlines and also encoding
> >> (which would also break it, if you managed to find a non-ascii
> >> compatible encoding). Maybe even something along the line of "the
> >> contents have to be written in binary mode"?
> >
> > Perhaps something like the attached?
> >
> >
>
>
> That seems a bit opaque.  Let's tell them exactly what they need to avoid.

Yeah, it seems a bit imprecise. Maybe something like "which includes
things like opening the file in binary mode"? (I want the "includes"
part because it also means other things, this is not the only thing).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Calendar support in localization

2021-03-26 Thread Daniel Verite
Thomas Munro wrote:

> Right, so if this is done by trying to extend Daniel Verite's icu_ext
> extension (link given earlier) and Robert's idea of a fast-castable
> type, I suppose you might want now()::icu_date + '1 month'::internal
> to advance you by one Ethiopic month if you have done SET
> icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'.

I've pushed a calendar branch on icu_ext [1] with preliminary support
for non-gregorian calendars through ICU, so far with only formatting and parsing
of timestamptz.
The ICU locale drives both the localization of field names (language) and the
choice of calendar.

It looks like this:

\set fmt 'dd//  HH:mm:ss.SSS zz'

 WITH list(cal) AS ( values 
('gregorian'),
('japanese'),
('buddhist'),
('roc'),
('persian'),
('islamic-civil'),
('islamic'),
('hebrew'),
('chinese'),
('indian'),
('coptic'),
('ethiopic'),
('ethiopic-amete-alem'),
('iso8601'),
('dangi')
),
fmt AS (
 select
  cal,
  icu_format_date(now(),  :'fmt', 'fr@calendar='||cal) as now_str,
  icu_format_date(now()+'1 month'::interval,  :'fmt', 'fr@calendar='||cal) as
plus_1m
  from list
)
SELECT
 cal,
 now_str,
 icu_parse_date(now_str, :'fmt', 'fr@calendar='||cal) as now_parsed,
 plus_1m,
 icu_parse_date(plus_1m, :'fmt', 'fr@calendar='||cal) as plus_1m_parsed
FROM fmt;


-[ RECORD 1 ]--+---
cal| gregorian
now_str| 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 2 ]--+---
cal| japanese
now_str| 26/mars/0033 Heisei 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/0033 Heisei 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 3 ]--+---
cal| buddhist
now_str| 26/mars/2564 ère bouddhique 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/2564 ère bouddhique 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 4 ]--+---
cal| roc
now_str| 26/mars/0110 RdC 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/0110 RdC 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 5 ]--+---
cal| persian
now_str| 06/farvardin/1400 Anno Persico 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 06/ordibehešt/1400 Anno Persico 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 6 ]--+---
cal| islamic-civil
now_str| 12/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 7 ]--+---
cal| islamic
now_str| 13/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 8 ]--+---
cal| hebrew
now_str| 13/nissan/5781 Anno Mundi 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/iyar/5781 Anno Mundi 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 9 ]--+---
cal| chinese
now_str| 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 10 ]-+---
cal| indian
now_str| 05/chaitra/1943 ère Saka 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 06/vaishākh/1943 ère Saka 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 11 ]-+---
cal| coptic
now_str| 17/barmahât/1737 après Dioclétien 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 18/barmoudah/1737 après Dioclétien 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 12 ]-+---
cal| ethiopic
now_str| 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01

Re: Query about pg asynchronous processing support

2021-03-26 Thread legrand legrand
Hi,

You should search for information about Postgres hooks like those used in the
pg_stat_statements extension
https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c

And about background capabilities like those used in the pg_background extension
https://github.com/vibhorkum/pg_background

There are also extensions working with indexes, like hypopg
https://github.com/HypoPG/hypopg

Good luck 
Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: invalid data in file backup_label problem on windows

2021-03-26 Thread David Steele

On 3/26/21 1:20 PM, Magnus Hagander wrote:

On Fri, Mar 26, 2021 at 5:52 PM Andrew Dunstan  wrote:

On 3/26/21 10:19 AM, David Steele wrote:



No, the problem is you are using copy/paste and in doing so you are
*changing'* the value that is being returned. You'll either need to
update your copy/paste procedure to not mess with the newlines, or to
use a better way to get the data out.

If we need to clarify that in the documentation, I'm fine with that.
Maybe add an extra sentence to the part about not modifying the output
to mention that this includes changing newlines and also encoding
(which would also break it, if you managed to find a non-ascii
compatible encoding). Maybe even something along the line of "the
contents have to be written in binary mode"?


Perhaps something like the attached?


That seems a bit opaque.  Let's tell them exactly what they need to avoid.


Yeah, it seems a bit imprecise. Maybe something like "which includes
things like opening the file in binary mode"? (I want the "includes"
part because it also means other things, this is not the only thing).


OK, how about the attached?

Regards,
--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index c5557d5444..b1899135c8 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -913,7 +913,8 @@ SELECT * FROM pg_stop_backup(false, true);
  backup_label in the root directory of the backup. The
  third field should be written to a file named
  tablespace_map unless the field is empty. These 
files are
- vital to the backup working, and must be written without modification.
+ vital to the backup working and must be written without modification, 
which 
+ may include opening the file in binary mode.
 




[PATCH] Allow CustomScan nodes to signal projection support

2021-03-26 Thread Sven Klemm
Hi,

The attached patch allows CustomScan nodes to signal whether they
support projection. Currently all CustomScan nodes are treated as
supporting projection, but it would be nice for custom nodes to be able
to opt out of this, to prevent postgres from modifying the targetlist
of the custom node.

For features similar to set-returning functions, the function call needs
to be a top-level expression in the custom node that implements it, but
the node's targetlist might be modified by postgres because it is not
aware of the special meaning of those function calls; it may push down
a different targetlist into the node since it assumes the node can
project.

I named the flag CUSTOMPATH_SUPPORT_PROJECTION, similar to the other
custom node flags, but this reverses the current logic and nodes would
have to opt into projection. I thought about naming it
CUSTOMPATH_CANNOT_PROJECT to keep the current default and make it an
opt-out, but that would make the flag name notably different from the
other flag names. Any opinions on the flag name and whether it should be
opt-in or opt-out?
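
For illustration, a provider opting in might build its path roughly like the
sketch below (CUSTOMPATH_SUPPORT_PROJECTION is the new flag from the attached
patch, not an existing symbol; costing, methods and the rest of the path setup
are omitted):

#include "postgres.h"
#include "nodes/extensible.h"
#include "optimizer/pathnode.h"

static CustomPath *
make_my_custom_path(PlannerInfo *root, RelOptInfo *rel)
{
    CustomPath *cpath = makeNode(CustomPath);

    cpath->path.pathtype = T_CustomScan;
    cpath->path.parent = rel;
    cpath->path.pathtarget = rel->reltarget;

    /*
     * With the opt-in variant, a node whose top-level targetlist must not
     * be rewritten by the planner simply leaves the new flag unset; nodes
     * that can project OR it in next to the existing flags.
     */
    cpath->flags = CUSTOMPATH_SUPPORT_BACKWARD_SCAN
        /* | CUSTOMPATH_SUPPORT_PROJECTION */ ;

    return cpath;
}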

-- 
Regards, Sven Klemm


v1-0001-costumscan_projection_flag.patch
Description: Binary data


Re: UniqueKey on Partitioned table.

2021-03-26 Thread Dmitry Dolgov
> On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>
> The attached is a UniqueKey with EquivalenceClass patch, I just completed the
> single relation part and may have bugs. I just attached it here for design
> review only. and the not-null-attrs is just v1 which we can continue
> discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?




Re: Support for NSS as a libpq TLS backend

2021-03-26 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> On Wed, 2021-03-24 at 14:10 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchamp...@vmware.com) wrote:
> > > I could see this being a problem if two client certificate nicknames
> > > collide across multiple in-use databases, maybe?
> > 
> > Right, in such a case either cert might get returned and it's possible
> > that the "wrong" one is returned and therefore the connection would end
> > up failing, assuming that they aren't actually the same and just happen
> > to be in both.
> > 
> > Seems like we could use SECMOD_OpenUserDB() and then pass the result
> > from that into PK11_ListCertsInSlot() and scan through the certs in just
> > the specified database to find the one we're looking for if we really
> > feel compelled to try and address this risk.  I've reached out to the
> > NSS folks to see if they have any thoughts about the best way to address
> > this.
> 
> Some additional findings (NSS 3.63), please correct me if I've made any 
> mistakes:
> 
> The very first NSSInitContext created is special. If it contains a database, 
> that database will be considered part of the "internal" slot and its 
> certificates can be referenced directly by nickname. If it doesn't have a 
> database, the internal slot has no certificates, and it will continue to have 
> zero certificates until NSS is completely shut down and reinitialized with a 
> new "first" context.
> 
> Databases that are opened *after* the first one are given their own separate 
> slots. Any certificates that are part of those databases seemingly can't be 
> referenced directly by nickname. They have to be prefixed by their token name 
> -- a name which you don't have if you used NSS_InitContext() to create the 
> database. You have to use SECMOD_OpenUserDB() instead. This explains some 
> strange failures I was seeing in local testing, where the order of 
> InitContext determined whether our client certificate selection succeeded or 
> failed.

This is more-or-less what we would want though, right..?  If a user asks
for a connection with ssl_database=blah and sslcert=whatever, we'd want
to open database 'blah' and search (just) that database for cert
'whatever'.  We could possibly offer other options in the future but
certainly this would work and be the most straight-forward and expected
behavior.

> If you SECMOD_OpenUserDB() a database that is identical to the first 
> (internal) database, NSS deduplicates for you and just returns the internal 
> slot. Which seems like it's helpful, except you're not allowed to close that 
> database, and you have to know not to close it by checking to see whether 
> that slot is the "internal key slot". It appears to remain open until NSS is 
> shut down entirely.

Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
for opening databases.

> But if you open a database that is *not* the magic internal database,
> and then open a duplicate of that one, NSS creates yet another new slot
> for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> hog, depending on the global state of the process at the time libpq
> opens its first connection. We won't be able to control what the parent
> application will do before loading us up.

I would think we'd want to avoid re-opening the same database multiple
times, to avoid the duplicate slots and such.  If the application code
does it themselves, well, there's not much we can do about that, but we
could at least avoid doing so in *our* code.  I wouldn't expect us to be
opening hundreds of databases either and so keeping a simple list around
of what we've opened and scanning it seems like it'd be workable.  Of
course, this could likely be improved in the future but I would think
that'd be good for an initial implementation.
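
Something like this minimal, self-contained sketch of that bookkeeping
(illustrative only; the NSS handle is stubbed out, and the release/refcount
decrement path is omitted):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct OpenDB
{
    char           *path;           /* the ssl_database value */
    int             refcount;
    void           *handle;         /* stand-in for the NSS db/slot handle */
    struct OpenDB  *next;
} OpenDB;

static OpenDB *open_dbs = NULL;
static pthread_mutex_t open_dbs_lock = PTHREAD_MUTEX_INITIALIZER;

static void *
db_acquire(const char *path)
{
    OpenDB     *db;

    pthread_mutex_lock(&open_dbs_lock);
    for (db = open_dbs; db != NULL; db = db->next)
        if (strcmp(db->path, path) == 0)
            break;
    if (db == NULL)
    {
        db = malloc(sizeof(OpenDB));
        db->path = strdup(path);
        db->refcount = 0;
        db->handle = db;            /* would be SECMOD_OpenUserDB(path) here */
        db->next = open_dbs;
        open_dbs = db;
    }
    db->refcount++;
    pthread_mutex_unlock(&open_dbs_lock);
    return db->handle;
}

int
main(void)
{
    void       *a = db_acquire("certs/client");
    void       *b = db_acquire("certs/client");    /* reuses the same handle */

    printf("same handle: %s\n", (a == b) ? "yes" : "no");
    return 0;
}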

We could also just generally caution users in our documentation against
using multiple databases.  The NSS folks discourage doing so and it
doesn't strike me as being a terribly useful thing to do anyway, at
least from within one invocation of an application.  Still, if we could
make it work reasonably well, then I'd say we should go ahead and do so.

> It also doesn't look like any of the SECMOD_* machinery that we're
> looking at is thread-safe, but I'd really like to be wrong...

That's unfortunate but solvable by using our own locks, similar
to what's done in fe-secure-openssl.c.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: libpq debug log

2021-03-26 Thread alvhe...@alvh.no-ip.org
Hello

I added an option to the new libpq_pipeline program that activates
libpq trace.  It works nicely and I think we can add that to the
regression tests.  However I have two observations.

1. The trace output for the error message won't be very nice, because it
prints line numbers.  So if I want to match the output to an "expected"
file, it would break every time somebody edits the source file where the
error originates and the ereport() line is moved.  For example:

<   70  ErrorResponseS "ERROR" V "ERROR" C "22012" M 
"division by zero" F "numeric.c" L "8366" R "div_var" \x00 "Z"

The line number 8366 in this line would be problematic.  I think if we
want regression testing for this, we should have another trace flag to
suppress output of a few of these fields.  I would have it print only S,
C and M.

(Hey, what the heck is that "Z" at the end of the message?  I thought
the error ended right at the \x00 ...)


2. The < and > characters are not good for visual inspection.  I
replaced them with F and B and I think it's much clearer.  Compare:

>   27  Query"SET lc_messages TO "C""
<   8   CommandComplete  "SET"
<   5   ReadyForQueryI
>   21  Parse"" "SELECT $1" 1 23
>   19  Bind "" "" 0 1 1 '1' 1 0
>   6   Describe P ""
>   9   Execute  "" 0
>   4   Sync
<   4   ParseComplete
<   4   BindComplete
<   33  RowDescription   1 "?column?" 0 0 23 4 -1 0
<   11  DataRow  1 1 '1'
<   13  CommandComplete  "SELECT 1"
<   5   ReadyForQueryI
>   4   Terminate

with

F   27  Query"SET lc_messages TO "C""
B   8   CommandComplete  "SET"
B   5   ReadyForQueryI
F   21  Parse"" "SELECT $1" 1 23
F   19  Bind "" "" 0 1 1 '1' 1 0
F   6   Describe P ""
F   9   Execute  "" 0
F   4   Sync
B   4   ParseComplete
B   4   BindComplete
B   33  RowDescription   1 "?column?" 0 0 23 4 -1 0
B   11  DataRow  1 1 '1'
B   13  CommandComplete  "SELECT 1"
B   5   ReadyForQueryI
F   4   Terminate

I think the second one is much easier on the eye.

(This one is the output from "libpq_pipeline simple_pipeline").

-- 
Álvaro Herrera   Valdivia, Chile
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)




Re: MultiXact\SLRU buffers configuration

2021-03-26 Thread Thomas Munro
On Sat, Mar 27, 2021 at 4:52 AM Andrey Borodin  wrote:
> Some thoughts on HashTable patch:
> 1. Can we allocate a bigger hashtable to reduce the probability of collisions?

Yeah, good idea, might require some study.

> 2. Can we use a specialised hashtable for this case? I'm afraid hash_search() 
> takes a comparable number of CPU cycles to a simple loop from 0 to 128. We could 
> inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm not 
> insisting on a special hash though, just an idea.

I tried really hard to not fall into this rabbit h [hack hack
hack], OK, here's a first attempt to use simplehash, Andres's
steampunk macro-based robinhood template that we're already using for
several other things, and murmurhash which is inlineable and
branch-free.  I had to tweak it to support "in-place" creation and
fixed size (in other words, no allocators, for use in shared memory).
Then I was annoyed that I had to add a "status" member to our struct,
so I tried to fix that.  Definitely needs more work to think about
failure modes when running out of memory, how much spare space you
need, etc.

I have not experimented with this much beyond hacking until the tests
pass, but it *should* be more efficient...

> 3. pageno in SlruMappingTableEntry seems to be unused.

It's the key (dynahash uses the first N bytes of your struct as the
key, but in this new simplehash version it's more explicit).
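
To illustrate the direction (this is not the simplehash-based patch itself): a
small, fixed-size, open-addressing map from pageno to slot with a fully inlined
mixer looks roughly like this, self-contained:

#include <stdint.h>
#include <stdio.h>

#define MAP_SIZE 256                /* power of two, larger than the slot count */
#define EMPTY   (-1)

typedef struct
{
    int64_t     pageno;
    int         slotno;
} MapEntry;

static MapEntry map[MAP_SIZE];

static inline uint32_t
hash_pageno(int64_t pageno)
{
    /* murmur-style finalizer: inlineable and branch-free */
    uint64_t    h = (uint64_t) pageno;

    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return (uint32_t) h;
}

static void
map_insert(int64_t pageno, int slotno)
{
    uint32_t    i = hash_pageno(pageno) & (MAP_SIZE - 1);

    while (map[i].slotno != EMPTY)          /* assumes the table never fills up */
        i = (i + 1) & (MAP_SIZE - 1);       /* linear probing */
    map[i].pageno = pageno;
    map[i].slotno = slotno;
}

static int
map_lookup(int64_t pageno)
{
    uint32_t    i = hash_pageno(pageno) & (MAP_SIZE - 1);

    while (map[i].slotno != EMPTY)
    {
        if (map[i].pageno == pageno)
            return map[i].slotno;
        i = (i + 1) & (MAP_SIZE - 1);
    }
    return EMPTY;
}

int
main(void)
{
    for (int i = 0; i < MAP_SIZE; i++)
        map[i].slotno = EMPTY;
    map_insert(123456, 7);
    printf("page 123456 -> slot %d\n", map_lookup(123456));
    return 0;
}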
From 5f5d4ed8ae2808766ac1fd48f68602ef530e3833 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Mon, 15 Feb 2021 21:51:56 +0500
Subject: [PATCH v13 1/5] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Author: Andrey M. Borodin 
Reviewed-by: Anastasia Lubennikova 
Reviewed-by: Tomas Vondra 
Reviewed-by: Alexander Korotkov 
Reviewed-by: Gilles Darold 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86%40yandex-team.ru
---
 doc/src/sgml/config.sgml  | 137 ++
 src/backend/access/transam/clog.c |   6 +
 src/backend/access/transam/commit_ts.c|   5 +-
 src/backend/access/transam/multixact.c|   8 +-
 src/backend/access/transam/subtrans.c |   5 +-
 src/backend/commands/async.c  |   8 +-
 src/backend/storage/lmgr/predicate.c  |   4 +-
 src/backend/utils/init/globals.c  |   8 +
 src/backend/utils/misc/guc.c  |  77 ++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h|   4 -
 src/include/access/subtrans.h |   3 -
 src/include/commands/async.h  |   5 -
 src/include/miscadmin.h   |   7 +
 src/include/storage/predicate.h   |   4 -
 15 files changed, 261 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..f1112bfa9c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1886,6 +1886,143 @@ include_dir 'conf.d'

   
  
+ 
+
+  multixact_offsets_buffers (integer)
+  
+   multixact_offsets_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_multixact/offsets (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 8.
+This parameter can only be set at server start.
+   
+  
+ 
+
+
+  multixact_members_buffers (integer)
+  
+   multixact_members_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_multixact/members (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This parameter can only be set at server start.
+   
+  
+ 
+ 
+
+  subtrans_buffers (integer)
+  
+   subtrans_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_subtrans (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 8.
+This parameter can only be set at server start.
+   
+  
+ 
+ 
+
+  notify_buffers (integer)
+  
+   notify_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_notify (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+ 

Re: SQL/JSON: JSON_TABLE

2021-03-26 Thread Andrew Dunstan

On 3/25/21 8:10 AM, David Steele wrote:
> On 1/20/21 8:42 PM, Nikita Glukhov wrote:
>> Thank you for review.
>>
>> Attached 45th version of the patches. "SQL/JSON functions" patch
>> corresponds to
>> v52 patch set posted in the separate thread.
>
> Another rebase needed (http://cfbot.cputube.org/patch_32_2902.log),
> marked Waiting on Author.
>
> I can see that Álvaro suggested that the patch be split up so it can
> be reviewed and committed in pieces. It looks like you've done that to
> some extent, but I wonder if more can be done. In particular, it looks
> like that first patch could be broken up -- at lot.
>
>

I've rebased this. Note that the large first patch is just the
accumulated patches from the 'SQL/JSON functions' thread, and should be
reviewed there. Only patches 2 thru 4 should be reviewed here. In fact
there are no changes at all in those patches from the previous set other
than a little patch fuzz. The only substantial changes are in patch 1,
which had bitrotted. However, I'm posting a new series to keep the
numbering in sync.


If the cfbot is happy I will set back to 'Needs review'


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



0001-SQL-JSON-functions-v46.patch.gz
Description: application/gzip


0002-JSON_TABLE-v46.patch.gz
Description: application/gzip


0003-JSON_TABLE-PLAN-DEFAULT-clause-v46.patch.gz
Description: application/gzip


0004-JSON_TABLE-PLAN-clause-v46.patch.gz
Description: application/gzip


Re: SQL/JSON: JSON_TABLE

2021-03-26 Thread Erik Rijkers
> On 2021.03.26. 21:28 Andrew Dunstan  wrote:
> On 3/25/21 8:10 AM, David Steele wrote:
> > On 1/20/21 8:42 PM, Nikita Glukhov wrote:
> >> Thank you for review.
> >>
> >> Attached 45th version of the patches. "SQL/JSON functions" patch
> >> corresponds to
> >> v52 patch set posted in the separate thread.
> >
> > Another rebase needed (http://cfbot.cputube.org/patch_32_2902.log),
> > marked Waiting on Author.
> >
> > I can see that Álvaro suggested that the patch be split up so it can
> > be reviewed and committed in pieces. It looks like you've done that to
> > some extent, but I wonder if more can be done. In particular, it looks
> > like that first patch could be broken up -- a lot.
> >
> >
> 
> I've rebased this. Note that the large first patch is just the
> accumulated patches from the 'SQL/JSON functions' thread, and should be
> reviewed there. Only patches 2 thru 4 should be reviewed here. In fact
> there are no changes at all in those patches from the previous set other
> than a little patch fuzz. The only substantial changes are in patch 1,
> which had bitrotted. However, I'm posting a new series to keep the
> numbering in sync.
> 
> If the cfbot is happy I will set back to 'Needs review'

> 0001-SQL-JSON-functions-v46.patch
> 0002-JSON_TABLE-v46.patch
> 0003-JSON_TABLE-PLAN-DEFAULT-clause-v46.patch
> 0004-JSON_TABLE-PLAN-clause-v46.patch


Hi,

The four v46 patches apply fine, but on compile I get (debian/gcc):

make --quiet -j 4
make[3]: *** No rule to make target 'parse_jsontable.o', needed by 
'objfiles.txt'.  Stop.
make[3]: *** Waiting for unfinished jobs
make[2]: *** [parser-recursive] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2
common.mk:39: recipe for target 'parser-recursive' failed
Makefile:42: recipe for target 'all-backend-recurse' failed
GNUmakefile:11: recipe for target 'all-src-recurse' failed


Erik




Re: SQL/JSON: functions

2021-03-26 Thread Andrew Dunstan


On 3/26/21 4:22 PM, Andrew Dunstan wrote:
> On 3/8/21 1:55 PM, Ibrar Ahmed wrote:
>>
>> On Sat, Jan 23, 2021 at 3:37 PM Erik Rijkers > > wrote:
>>
>> On 2021-01-20 03:49, Nikita Glukhov wrote:
>>
>> > [0001-Add-common-SQL-JSON-clauses-v52.patch.gz]
>> > [0002-SQL-JSON-constructors-v52.patch.gz]
>> > [0003-IS-JSON-predicate-v52.patch.gz]
>> > [0004-SQL-JSON-query-functions-v52.patch.gz]
>> > [0005-SQL-JSON-functions-for-json-type-v52.patch.gz]
>> > [0006-GUC-sql_json-v52.patch.gz]
>>
>> Hi,
>>
>> I read through the file func.sgml (only that file) and put the
>> errors/peculiarities in the attached diff.  (Small stuff; typos
>> really)
>>
>>
>> Your patch includes a CREATE TABLE my_films + INSERT, to run the
>> examples against.  I think this is a great idea and we should do
>> it more
>> often.
>>
>> But, the table has a text-column to contain the subsequently inserted
>> json values. The insert runs fine but it turns out that some later
>> examples queries only run against a jsonb column.  So I propose to
>> change:
>>    CREATE TABLE my_films (js text);
>> to:
>>    CREATE TABLE my_films (js jsonb);
>>
>> This change is not yet included in the attached file.  An alternative
>> would be to cast the text-column in the example queries as js::jsonb
>>
>>
>> I also noticed that some errors were different in the sgml file
>> than 'in
>> the event':
>>
>>
>>     SELECT JSON_QUERY(js, '$.favorites[*].kind' ERROR ON ERROR) FROM
>> my_films_jsonb;
>>     (table 'my_films_jsonb' is the same as your 'my_films', but
>> with js
>> as a jsonb column)
>>
>> manual says: "ERROR: more than one SQL/JSON item"
>>   in reality: "ERROR: JSON path expression in JSON_QUERY should
>> return
>> singleton item without wrapper"
>>          and:   "HINT: use WITH WRAPPER clause to wrap SQL/JSON item
>> sequence into array"
>>
>>
>> Thanks,
>>
>> Erik Rijkers
>>
>> >
>> > --
>> > Nikita Glukhov
>> > Postgres Professional: http://www.postgrespro.com
>> 
>> > The Russian Postgres Company
>>
>>
>> The patch (func.sgml.20210123.diff) does not apply successfully.
>>
>> http://cfbot.cputube.org/patch_32_2901.log
>> 
>>
>> 
>> === Applying patches on top of PostgreSQL commit ID 
>> 0ce4cd04da558178b0186057b721c50a00b7a945 ===
>> === applying patch ./func.sgml.20210123.diff
>> patching file doc/src/sgml/func.sgml
>> Hunk #1 FAILED at 16968.
>> Hunk #2 FAILED at 17034.
>> ...
>> Hunk #19 FAILED at 18743.
>> 19 out of 19 hunks FAILED -- saving rejects to file 
>> doc/src/sgml/func.sgml.rej
>> 
>>
>> Can we get a rebase? 
>>
>> I am marking the patch "Waiting on Author".
>
>
>
> I've rebased this, and applied some of Erik's changes.
>
>
> I'll set it back to 'Needs Review' if the cfbot is happy.


It's not. There are errors  when building with llvm. I'll investigate.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON: JSON_TABLE

2021-03-26 Thread Andrew Dunstan


On 3/26/21 4:48 PM, Erik Rijkers wrote:
>> On 2021.03.26. 21:28 Andrew Dunstan  wrote:
>> On 3/25/21 8:10 AM, David Steele wrote:
>>> On 1/20/21 8:42 PM, Nikita Glukhov wrote:
 Thank you for review.

 Attached 45th version of the patches. "SQL/JSON functions" patch
 corresponds to
 v52 patch set posted in the separate thread.
>>> Another rebase needed (http://cfbot.cputube.org/patch_32_2902.log),
>>> marked Waiting on Author.
>>>
>>> I can see that Álvaro suggested that the patch be split up so it can
>>> be reviewed and committed in pieces. It looks like you've done that to
>>> some extent, but I wonder if more can be done. In particular, it looks
>>> like that first patch could be broken up -- a lot.
>>>
>>>
>> I've rebased this. Note that the large first patch is just the
>> accumulated patches from the 'SQL/JSON functions' thread, and should be
>> reviewed there. Only patches 2 thru 4 should be reviewed here. In fact
>> there are no changes at all in those patches from the previous set other
>> than a little patch fuzz. The only substantial changes are in patch 1,
>> which had bitrotted. However, I'm posting a new series to keep the
>> numbering in sync.
>>
>> If the cfbot is happy I will set back to 'Needs review'
>> 0001-SQL-JSON-functions-v46.patch
>> 0002-JSON_TABLE-v46.patch
>> 0003-JSON_TABLE-PLAN-DEFAULT-clause-v46.patch
>> 0004-JSON_TABLE-PLAN-clause-v46.patch
>
> Hi,
>
> The four v46 patches apply fine, but on compile I get (debian/gcc):
>
> make --quiet -j 4
> make[3]: *** No rule to make target 'parse_jsontable.o', needed by 
> 'objfiles.txt'.  Stop.
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [parser-recursive] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [all-backend-recurse] Error 2
> make: *** [all-src-recurse] Error 2
> common.mk:39: recipe for target 'parser-recursive' failed
> Makefile:42: recipe for target 'all-backend-recurse' failed
> GNUmakefile:11: recipe for target 'all-src-recurse' failed
>


Yeah, I messed up :-( Forgot to git-add some files.


will repost soon.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Support for NSS as a libpq TLS backend

2021-03-26 Thread Jacob Champion
On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote:
> * Jacob Champion (pchamp...@vmware.com) wrote:
> > Databases that are opened *after* the first one are given their own
> > separate slots. [...]
> 
> This is more-or-less what we would want though, right..?  If a user asks
> for a connection with ssl_database=blah and sslcert=whatever, we'd want
> to open database 'blah' and search (just) that database for cert
> 'whatever'.  We could possibly offer other options in the future but
> certainly this would work and be the most straight-forward and expected
> behavior.

Yes, but see below.

> > If you SECMOD_OpenUserDB() a database that is identical to the first
> > (internal) database, NSS deduplicates for you and just returns the
> > internal slot. Which seems like it's helpful, except you're not
> > allowed to close that database, and you have to know not to close it
> > by checking to see whether that slot is the "internal key slot". It
> > appears to remain open until NSS is shut down entirely.
> 
> Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
> for opening databases.

We don't have control over whether or not this happens. If the
application embedding libpq has already loaded the database into the
internal slot via its own NSS initialization, then when we call
SECMOD_OpenUserDB() for that same database, the internal slot will be
returned and we have to handle it accordingly.

It's not a huge amount of work, but it is magic knowledge that has to
be maintained, especially in the absence of specialized clientside
tests.

> > But if you open a database that is *not* the magic internal database,
> > and then open a duplicate of that one, NSS creates yet another new slot
> > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> > hog, depending on the global state of the process at the time libpq
> > opens its first connection. We won't be able to control what the parent
> > application will do before loading us up.
> 
> I would think we'd want to avoid re-opening the same database multiple
> times, to avoid the duplicate slots and such.  If the application code
> does it themselves, well, there's not much we can do about that, but we
> could at least avoid doing so in *our* code.  I wouldn't expect us to be
> opening hundreds of databases either and so keeping a simple list around
> of what we've opened and scanning it seems like it'd be workable.  Of
> course, this could likely be improved in the future but I would think
> that'd be good for an initial implementation.
> 
> [...]
> 
> > It also doesn't look like any of the SECMOD_* machinery that we're
> > looking at is thread-safe, but I'd really like to be wrong...
> 
> > That's unfortunate but solvable by using our own locks, similar
> to what's done in fe-secure-openssl.c.

Yeah. I was hoping to avoid implementing our own locks and refcounts,
but it seems like it's going to be required.

--Jacob


Re: Support for NSS as a libpq TLS backend

2021-03-26 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchamp...@vmware.com) wrote:
> > > Databases that are opened *after* the first one are given their own
> > > separate slots. [...]
> > 
> > This is more-or-less what we would want though, right..?  If a user asks
> > for a connection with ssl_database=blah and sslcert=whatever, we'd want
> > to open database 'blah' and search (just) that database for cert
> > 'whatever'.  We could possibly offer other options in the future but
> > certainly this would work and be the most straight-forward and expected
> > behavior.
> 
> Yes, but see below.
> 
> > > If you SECMOD_OpenUserDB() a database that is identical to the first
> > > (internal) database, NSS deduplicates for you and just returns the
> > > internal slot. Which seems like it's helpful, except you're not
> > > allowed to close that database, and you have to know not to close it
> > > by checking to see whether that slot is the "internal key slot". It
> > > appears to remain open until NSS is shut down entirely.
> > 
> > Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
> > for opening databases.
> 
> We don't have control over whether or not this happens. If the
> application embedding libpq has already loaded the database into the
> internal slot via its own NSS initialization, then when we call
> SECMOD_OpenUserDB() for that same database, the internal slot will be
> returned and we have to handle it accordingly.
> 
> It's not a huge amount of work, but it is magic knowledge that has to
> be maintained, especially in the absence of specialized clientside
> tests.

Ah..  yeah, fair enough.  We could document that we discourage
applications from doing so, but I agree that we'll need to deal with it
since it could happen.

> > > But if you open a database that is *not* the magic internal database,
> > > and then open a duplicate of that one, NSS creates yet another new slot
> > > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> > > hog, depending on the global state of the process at the time libpq
> > > opens its first connection. We won't be able to control what the parent
> > > application will do before loading us up.
> > 
> > I would think we'd want to avoid re-opening the same database multiple
> > times, to avoid the duplicate slots and such.  If the application code
> > does it themselves, well, there's not much we can do about that, but we
> > could at least avoid doing so in *our* code.  I wouldn't expect us to be
> > opening hundreds of databases either and so keeping a simple list around
> > of what we've opened and scanning it seems like it'd be workable.  Of
> > course, this could likely be improved in the future but I would think
> > that'd be good for an initial implementation.
> > 
> > [...]
> > 
> > > It also doesn't look like any of the SECMOD_* machinery that we're
> > > looking at is thread-safe, but I'd really like to be wrong...
> > 
> > That's unfortuante but solvable by using our own locks, similar
> > to what's done in fe-secure-openssl.c.
> 
> Yeah. I was hoping to avoid implementing our own locks and refcounts,
> but it seems like it's going to be required.

Yeah, afraid so.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: SQL/JSON: functions

2021-03-26 Thread Andrew Dunstan


On 3/26/21 4:49 PM, Andrew Dunstan wrote:
> On 3/26/21 4:22 PM, Andrew Dunstan wrote:
>> On 3/8/21 1:55 PM, Ibrar Ahmed wrote:
>>> On Sat, Jan 23, 2021 at 3:37 PM Erik Rijkers >> > wrote:
>>>
>>> On 2021-01-20 03:49, Nikita Glukhov wrote:
>>>
>>> > [0001-Add-common-SQL-JSON-clauses-v52.patch.gz]
>>> > [0002-SQL-JSON-constructors-v52.patch.gz]
>>> > [0003-IS-JSON-predicate-v52.patch.gz]
>>> > [0004-SQL-JSON-query-functions-v52.patch.gz]
>>> > [0005-SQL-JSON-functions-for-json-type-v52.patch.gz]
>>> > [0006-GUC-sql_json-v52.patch.gz]
>>>
>>> Hi,
>>>
>>> I read through the file func.sgml (only that file) and put the
>>> errors/peculiarities in the attached diff.  (Small stuff; typos
>>> really)
>>>
>>>
>>> Your patch includes a CREATE TABLE my_films + INSERT, to run the
>>> examples against.  I think this is a great idea and we should do
>>> it more
>>> often.
>>>
>>> But, the table has a text-column to contain the subsequently inserted
>>> json values. The insert runs fine but it turns out that some later
>>> examples queries only run against a jsonb column.  So I propose to
>>> change:
>>>    CREATE TABLE my_films (js text);
>>> to:
>>>    CREATE TABLE my_films (js jsonb);
>>>
>>> This change is not yet included in the attached file.  An alternative
>>> would be to cast the text-column in the example queries as js::jsonb
>>>
>>>
>>> I also noticed that some errors were different in the sgml file
>>> than 'in
>>> the event':
>>>
>>>
>>>     SELECT JSON_QUERY(js, '$.favorites[*].kind' ERROR ON ERROR) FROM
>>> my_films_jsonb;
>>>     (table 'my_films_jsonb' is the same as your 'my_films', but
>>> with js
>>> as a jsonb column)
>>>
>>> manual says: "ERROR: more than one SQL/JSON item"
>>>   in reality: "ERROR: JSON path expression in JSON_QUERY should
>>> return
>>> singleton item without wrapper"
>>>          and:   "HINT: use WITH WRAPPER clause to wrap SQL/JSON item
>>> sequence into array"
>>>
>>>
>>> Thanks,
>>>
>>> Erik Rijkers
>>>
>>> >
>>> > --
>>> > Nikita Glukhov
>>> > Postgres Professional: http://www.postgrespro.com
>>> 
>>> > The Russian Postgres Company
>>>
>>>
>>> The patch (func.sgml.20210123.diff) does not apply successfully.
>>>
>>> http://cfbot.cputube.org/patch_32_2901.log
>>> 
>>>
>>> 
>>> === Applying patches on top of PostgreSQL commit ID 
>>> 0ce4cd04da558178b0186057b721c50a00b7a945 ===
>>> === applying patch ./func.sgml.20210123.diff
>>> patching file doc/src/sgml/func.sgml
>>> Hunk #1 FAILED at 16968.
>>> Hunk #2 FAILED at 17034.
>>> ...
>>> Hunk #19 FAILED at 18743.
>>> 19 out of 19 hunks FAILED -- saving rejects to file 
>>> doc/src/sgml/func.sgml.rej
>>> 
>>>
>>> Can we get a rebase? 
>>>
>>> I am marking the patch "Waiting on Author".
>>
>>
>> I've rebased this, and applied some of Erik's changes.
>>
>>
>> I'll set it back to 'Needs Review' if the cfbot is happy.
>
> It's not. There are errors  when building with llvm. I'll investigate.



Specifically, patch 4 (SQL-JSON-query-functions) fails with this when
built with LLVM:


ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -g -O2  -fPIC -D__STDC_LIMIT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_CONSTANT_MACROS -D_GNU_SOURCE
-I/usr/include  -I../../../../src/include
-I/home/andrew/pgl/pg_head/src/include  -D_GNU_SOURCE   -c -o
llvmjit_expr.o /home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c
In file included from /home/andrew/pgl/pg_head/src/include/postgres.h:46,
 from
/home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c:16:
/home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c: In
function ‘llvm_compile_expr’:
/home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c:2348:30:
warning: initialization of ‘struct LLVMOpaqueValue *’ from incompatible
pointer type ‘ExprEvalStep *’ {aka ‘struct ExprEvalStep *’}
[-Wincompatible-pointer-types]
 2348 | v_state, v_econtext, op);
  |  ^~
/home/andrew/pgl/pg_head/src/include/c.h:734:34: note: in definition of
macro ‘lengthof’
  734 | #define lengthof(array) (sizeof (array) / sizeof ((array)[0]))
  |  ^
/home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c:2347:5:
note: in expansion of macro ‘build_EvalXFunc’
 2347 | build_EvalXFunc(b, mod, "ExecEvalJson",
  | ^~~
/home/andrew/pgl/pg_head/src/backend/jit/llvm/llvmjit_expr.c:2348:30:
note: (near initialization for ‘(anonymous)[0]’)
 2348 | v_state, v_e

Re: Proposal: Save user's original authenticated identity for logging

2021-03-26 Thread Jacob Champion
On Fri, 2021-03-26 at 09:12 +0900, Michael Paquier wrote:
> Does it really matter to have the full contents of the file from the
> previous tests though?

For a few of the bugs I was tracking down, it was imperative. The tests
aren't isolated enough (or at all) to keep one from affecting the
others. And if the test is written incorrectly, or becomes incorrect
due to implementation changes, then the log files are really the only
way to debug after a false positive -- with truncation, the bad test
succeeds incorrectly and then swallows the evidence. :)

> Could you make the comments in those various areas more
> explicit about the difference and that it is intentional to register
> the auth ID before checking the user map?  Anybody reading this code
> in the future may get confused with the differences in handling all
> that according to the auth type involved if that's not clearly
> stated.

I took a stab at this in v12, attached.

--Jacob
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 0647d7cc32..d31fff744c 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1228,7 +1228,11 @@ pg_GSS_checkauth(Port *port)
 
/*
 * Copy the original name of the authenticated principal into our 
backend
-* memory for display later. This is also our authenticated identity.
+* memory for display later.
+*
+* This is also our authenticated identity. Set it now, rather than 
waiting
+* for check_usermap below, because authentication has already 
succeeded and
+* we want the log file to reflect that.
 */
port->gss->princ = MemoryContextStrdup(TopMemoryContext, gbuf.value);
set_authn_id(port, gbuf.value);
@@ -1573,7 +1577,9 @@ pg_SSPI_recvauth(Port *port)
 
/*
 * We have all of the information necessary to construct the 
authenticated
-* identity.
+* identity. Set it now, rather than waiting for check_usermap below,
+* because authentication has already succeeded and we want the log 
file to
+* reflect that.
 */
if (port->hba->compat_realm)
{
@@ -1977,7 +1983,11 @@ ident_inet_done:
 
if (ident_return)
{
-   /* Success! Store the identity and check the usermap */
+   /*
+* Success! Store the identity and check the usermap. (Setting 
the
+* authenticated identity is done before checking the usermap, 
because
+* at this point, authentication has succeeded.)
+*/
set_authn_id(port, ident_user);
return check_usermap(port->hba->usermap, port->user_name, 
ident_user, false);
}
@@ -2036,8 +2046,9 @@ auth_peer(hbaPort *port)
}
 
/*
-* Make a copy of static getpw*() result area. This is our authenticated
-* identity.
+* Make a copy of static getpw*() result area; this is our authenticated
+* identity. Set it before calling check_usermap, because 
authentication has
+* already succeeded and we want the log file to reflect that.
 */
set_authn_id(port, pw->pw_name);
 
@@ -2901,7 +2912,9 @@ CheckCertAuth(Port *port)
if (port->hba->auth_method == uaCert)
{
/*
-* The client's Subject DN is our authenticated identity.
+* The client's Subject DN is our authenticated identity. Set 
it now,
+* rather than waiting for check_usermap below, because 
authentication
+* has already succeeded and we want the log file to reflect 
that.
 */
if (!port->peer_dn)
{
From c21a83e71d2829accf004f5b8437a6eb115b0860 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 8 Feb 2021 10:53:20 -0800
Subject: [PATCH v12 1/2] ssl: store client's DN in port->peer_dn

Original patch by Andrew Dunstan:

https://www.postgresql.org/message-id/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net

but I've taken out the clientname=DN functionality; all that will be
needed for the next patch is the DN string.
---
 src/backend/libpq/be-secure-openssl.c | 53 +++
 src/include/libpq/libpq-be.h  |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 5ce3f27855..18321703da 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -551,22 +551,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *pee

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra
On 3/26/21 1:54 PM, Tomas Vondra wrote:
> 
> 
> On 3/26/21 12:37 PM, Dean Rasheed wrote:
>> On Thu, 25 Mar 2021 at 19:59, Tomas Vondra
>>  wrote:
>>>
>>> Attached is an updated patch series, with all the changes discussed
>>> here. I've cleaned up the ndistinct stuff a bit more (essentially
>>> reverting back from GroupExprInfo to GroupVarInfo name), and got rid of
>>> the UpdateStatisticsForTypeChange.
>>>
>>
>> I've looked over all that and I think it's in pretty good shape. I
>> particularly like how much simpler the ndistinct code has now become.
>>
>> Some (hopefully final) review comments:
>>
>> ...
>>
> 
> Thanks! I'll fix these, and then will consider getting it committed
> sometime later today, once the buildfarm does some testing on the other
> stuff I already committed.
> 

OK, pushed after a bit more polishing and testing. I've noticed one more
missing piece in describe (expressions missing in \dX), so I fixed that.

May the buildfarm be merciful ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: libpq debug log

2021-03-26 Thread alvhe...@alvh.no-ip.org
Proposed changes on top of v29.

-- 
Álvaro Herrera   Valdivia, Chile
>From b32ae3805bb28553c0a1cf308c6ed27f58576f3c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 26 Mar 2021 19:12:12 -0300
Subject: [PATCH 1/5] libpq_pipeline: add -t support for PQtrace

---
 .../modules/libpq_pipeline/libpq_pipeline.c   | 84 +--
 1 file changed, 59 insertions(+), 25 deletions(-)

diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c
index 846ee9f5ab..34d085035b 100644
--- a/src/test/modules/libpq_pipeline/libpq_pipeline.c
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_type_d.h"
 #include "common/fe_memutils.h"
 #include "libpq-fe.h"
+#include "pg_getopt.h"
 #include "portability/instr_time.h"
 
 
@@ -30,6 +31,9 @@ static void exit_nicely(PGconn *conn);
 
 const char *const progname = "libpq_pipeline";
 
+/* Options and defaults */
+char	   *tracefile = NULL;	/* path to PQtrace() file */
+
 
 #define DEBUG
 #ifdef DEBUG
@@ -1209,8 +1213,10 @@ usage(const char *progname)
 {
 	fprintf(stderr, "%s tests libpq's pipeline mode.\n\n", progname);
 	fprintf(stderr, "Usage:\n");
-	fprintf(stderr, "  %s tests", progname);
-	fprintf(stderr, "  %s testname [conninfo [number_of_rows]]\n", progname);
+	fprintf(stderr, "  %s [OPTION] tests\n", progname);
+	fprintf(stderr, "  %s [OPTION] TESTNAME [CONNINFO [NUMBER_OF_ROWS]\n", progname);
+	fprintf(stderr, "\nOptions:\n");
+	fprintf(stderr, "  -t TRACEFILE   generate a libpq trace to TRACEFILE\n");
 }
 
 static void
@@ -1231,37 +1237,54 @@ main(int argc, char **argv)
 {
 	const char *conninfo = "";
 	PGconn	   *conn;
+	FILE	   *trace;
+	char	   *testname;
 	int			numrows = 1;
 	PGresult   *res;
+	int			c;
 
-	if (strcmp(argv[1], "tests") == 0)
+	while ((c = getopt(argc, argv, "t:")) != -1)
 	{
-		print_test_list();
-		exit(0);
+		switch (c)
+		{
+			case 't':			/* trace file */
+tracefile = pg_strdup(optarg);
+break;
+		}
 	}
 
-	/*
-	 * The testname parameter is mandatory; it can be followed by a conninfo
-	 * string and number of rows.
-	 */
-	if (argc < 2 || argc > 4)
+	if (optind < argc)
+	{
+		testname = argv[optind];
+		optind++;
+	}
+	else
 	{
 		usage(argv[0]);
 		exit(1);
 	}
 
-	if (argc >= 3)
-		conninfo = pg_strdup(argv[2]);
+	if (strcmp(testname, "tests") == 0)
+	{
+		print_test_list();
+		exit(0);
+	}
 
-	if (argc >= 4)
+	if (optind < argc)
+	{
+		conninfo = argv[optind];
+		optind++;
+	}
+	if (optind < argc)
 	{
 		errno = 0;
-		numrows = strtol(argv[3], NULL, 10);
+		numrows = strtol(argv[optind], NULL, 10);
 		if (errno != 0 || numrows <= 0)
 		{
-			fprintf(stderr, "couldn't parse \"%s\" as a positive integer\n", argv[3]);
+			fprintf(stderr, "couldn't parse \"%s\" as a positive integer\n", argv[optind]);
 			exit(1);
 		}
+		optind++;
 	}
 
 	/* Make a connection to the database */
@@ -1272,30 +1295,41 @@ main(int argc, char **argv)
 PQerrorMessage(conn));
 		exit_nicely(conn);
 	}
+
+	/* Set the trace file, if requested */
+	if (tracefile != NULL)
+	{
+		trace = fopen(tracefile, "w+");
+
+		if (trace == NULL)
+			pg_fatal("could not open file \"%s\": %m", tracefile);
+		PQtrace(conn, trace);
+		PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);
+	}
+
 	res = PQexec(conn, "SET lc_messages TO \"C\"");
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 		pg_fatal("failed to set lc_messages: %s", PQerrorMessage(conn));
 
-	if (strcmp(argv[1], "disallowed_in_pipeline") == 0)
+	if (strcmp(testname, "disallowed_in_pipeline") == 0)
 		test_disallowed_in_pipeline(conn);
-	else if (strcmp(argv[1], "multi_pipelines") == 0)
+	else if (strcmp(testname, "multi_pipelines") == 0)
 		test_multi_pipelines(conn);
-	else if (strcmp(argv[1], "pipeline_abort") == 0)
+	else if (strcmp(testname, "pipeline_abort") == 0)
 		test_pipeline_abort(conn);
-	else if (strcmp(argv[1], "pipelined_insert") == 0)
+	else if (strcmp(testname, "pipelined_insert") == 0)
 		test_pipelined_insert(conn, numrows);
-	else if (strcmp(argv[1], "prepared") == 0)
+	else if (strcmp(testname, "prepared") == 0)
 		test_prepared(conn);
-	else if (strcmp(argv[1], "simple_pipeline") == 0)
+	else if (strcmp(testname, "simple_pipeline") == 0)
 		test_simple_pipeline(conn);
-	else if (strcmp(argv[1], "singlerow") == 0)
+	else if (strcmp(testname, "singlerow") == 0)
 		test_singlerowmode(conn);
-	else if (strcmp(argv[1], "transaction") == 0)
+	else if (strcmp(testname, "transaction") == 0)
 		test_transaction(conn);
 	else
 	{
-		fprintf(stderr, "\"%s\" is not a recognized test name\n", argv[1]);
-		usage(argv[0]);
+		fprintf(stderr, "\"%s\" is not a recognized test name\n", testname);
 		exit(1);
 	}
 
-- 
2.20.1

>From 4b62c3159fe3aa8445317a5d65b7e81d91c7fba6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 26 Mar 2021 19:13:04 -0300
Subject: [PATCH 2/5] put FILE * arg always first

---
 src/interfaces/libpq/libpq-trace.c | 234 ++---
 1 fil

Showing applied extended statistics in explain

2021-03-26 Thread Tomas Vondra
Hi,

With extended statistics it may not be immediately obvious whether they were
applied, and to which clauses. If you have multiple extended statistics,
we may also apply them in a different order, etc. And with expressions,
there's also the question of matching expressions to the statistics.

So it seems useful to include this in the explain plan: show which
statistics were applied, and in which order. Attached is an early PoC patch
doing that in VERBOSE mode. I'll add it to the next CF.


A simple example demonstrating the idea:

==

  create table t (a int, b int);
  insert into t select mod(i,10), mod(i,10)
    from generate_series(1,100000) s(i);

  create statistics s on a, b from t;
  analyze t;

test=# explain (verbose) select * from t where a = 1 and b = 1;
  QUERY PLAN
---
 Seq Scan on public.t  (cost=0.00..1943.00 rows=10040 width=8)
   Output: a, b
   Filter: ((t.a = 1) AND (t.b = 1))
   Statistics: public.s  Clauses: ((a = 1) AND (b = 1))
(4 rows)

test=# explain (verbose) select 1 from t group by a, b;
  QUERY PLAN
--
 HashAggregate  (cost=1943.00..1943.10 rows=10 width=12)
   Output: 1, a, b
   Group Key: t.a, t.b
   ->  Seq Scan on public.t  (cost=0.00..1443.00 rows=100000 width=8)
         Output: a, b
         Statistics: public.s  Clauses: (a AND b)
(6 rows)

==

The current implementation is a somewhat ugly PoC, with a couple of annoying
issues that need to be solved:

1) The information is stashed in multiple lists added to a Plan. Maybe
there's a better place, and maybe we need to invent a better way to
track the info, e.g. a new node stashed in a single List (see the sketch
after this list).

2) The deparsing is modeled on (i.e. copied from) how we deal with index
quals, but it has issues with nested OR clauses, because there are
nested RestrictInfo nodes and the deparsing does not expect that.

3) It does not work for functional dependencies, because we effectively
"merge" all functional dependencies and apply the entries. Not sure how
to display this, but I think it should show the individual dependencies
actually applied.

4) The info is always collected, but I guess we should do that only in
explain mode. Not sure how expensive it is.

5) It includes just statistics name + clauses, but maybe we should
include additional info (e.g estimate for that combination of clauses).

6) The clauses in the grouping query are transformed to an AND list, which
is wrong. This is easy to fix; I was just too lazy to do it in a PoC patch.

7) It does not show statistics for individual expressions. I suppose
examine_variable could add it to the rel somehow, and maybe we could do
that with index expressions too?
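
As an illustration of the "new node stashed in a single List" idea from
item 1, here is a minimal sketch of what such a node might look like. This
is not part of the attached patch; the node name and fields are purely
hypothetical.

  /*
   * Hypothetical node describing one extended-statistics object applied
   * to a plan node, so EXPLAIN can print it. Illustrative only.
   */
  typedef struct AppliedStatisticsInfo
  {
      NodeTag     type;
      Oid         stat_oid;   /* OID of the applied pg_statistic_ext entry */
      char        kind;       /* statistics kind used (MCV, ndistinct, ...) */
      List       *clauses;    /* clauses the statistics were applied to */
  } AppliedStatisticsInfo;

The Plan node would then carry one List of these instead of several parallel
lists, which would also keep the copyfuncs/outfuncs/readfuncs changes in a
single place.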


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 4629d1d9b1fc5f6c3bc93e0544b0c022345086c9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 18 Mar 2021 15:09:24 +0100
Subject: [PATCH] show stats in explain

---
 src/backend/commands/explain.c| 97 +++
 src/backend/nodes/copyfuncs.c |  4 +
 src/backend/nodes/makefuncs.c | 11 +++
 src/backend/nodes/outfuncs.c  |  4 +
 src/backend/nodes/readfuncs.c |  4 +
 src/backend/optimizer/plan/createplan.c   | 15 
 src/backend/optimizer/util/relnode.c  | 12 +++
 src/backend/optimizer/util/restrictinfo.c | 35 
 src/backend/statistics/extended_stats.c   |  5 ++
 src/backend/utils/adt/selfuncs.c  | 11 +++
 src/backend/utils/cache/lsyscache.c   | 49 
 src/include/nodes/makefuncs.h |  2 +
 src/include/nodes/pathnodes.h |  5 ++
 src/include/nodes/plannodes.h |  5 ++
 src/include/optimizer/restrictinfo.h  |  2 +
 src/include/utils/lsyscache.h |  3 +
 16 files changed, 264 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..7a4520f151 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -77,6 +77,9 @@ static void show_qual(List *qual, const char *qlabel,
 static void show_scan_qual(List *qual, const char *qlabel,
 		   PlanState *planstate, List *ancestors,
 		   ExplainState *es);
+static void show_scan_stats(List *stats, List *clauses, List *ors,
+			PlanState *planstate, List *ancestors,
+			ExplainState *es);
 static void show_upper_qual(List *qual, const char *qlabel,
 			PlanState *planstate, List *ancestors,
 			ExplainState *es);
@@ -1720,6 +1723,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			if (plan->qual)
 show_instrumentation_count("Rows Removed by Filter", 1,
 		   planstate, es);
+			if (es->verbose)
+show_scan_stats

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra



On 3/27/21 1:17 AM, Tomas Vondra wrote:
> On 3/26/21 1:54 PM, Tomas Vondra wrote:
>>
>>
>> On 3/26/21 12:37 PM, Dean Rasheed wrote:
>>> On Thu, 25 Mar 2021 at 19:59, Tomas Vondra
>>>  wrote:

 Attached is an updated patch series, with all the changes discussed
 here. I've cleaned up the ndistinct stuff a bit more (essentially
 reverting back from GroupExprInfo to GroupVarInfo name), and got rid of
 the UpdateStatisticsForTypeChange.

>>>
>>> I've looked over all that and I think it's in pretty good shape. I
>>> particularly like how much simpler the ndistinct code has now become.
>>>
>>> Some (hopefully final) review comments:
>>>
>>> ...
>>>
>>
>> Thanks! I'll fix these, and then will consider getting it committed
>> sometime later today, once the buildfarm does some testing on the other
>> stuff I already committed.
>>
> 
> OK, pushed after a bit more polishing and testing. I've noticed one more
> missing piece in describe (expressions missing in \dX), so I fixed that.
> 
> May the buildfarm be merciful ...
> 

LOL! It failed on *my* buildfarm machine, because apparently some of the
expressions used in stats_ext depend on locale and the machine is using
cs_CZ.UTF-8. Will fix later ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: MultiXact\SLRU buffers configuration

2021-03-26 Thread Andrey Borodin



> On 27 March 2021, at 01:26, Thomas Munro  wrote:
> 
> On Sat, Mar 27, 2021 at 4:52 AM Andrey Borodin  wrote:
>> Some thoughts on HashTable patch:
>> 1. Can we allocate bigger hashtable to reduce probability of collisions?
> 
> Yeah, good idea, might require some study.
In the long run we always have this table filled with nslots entries. But the
keys will usually be consecutive numbers (the current working set of
CLOG\Multis\etc), so in a happy hashing scenario collisions will only appear
on occasional random backward jumps. I think just size = nslots * 2 will
produce results that cannot be improved significantly.
And this reflects the original growth strategy, SH_GROW(tb, tb->size * 2).

>> 2. Can we use a specialised hashtable for this case? I'm afraid hash_search() 
>> takes a comparable number of CPU cycles to a simple loop from 0 to 128. We could 
>> inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm 
>> not insisting on a special hash though, just an idea.
> 
> I tried really hard to not fall into this rabbit h [hack hack
> hack], OK, here's a first attempt to use simplehash,

> Andres's
> steampunk macro-based robinhood template
Sounds magnificent.

> that we're already using for
> several other things
I could not find many tests to be sure that we do not break anything...

> , and murmurhash which is inlineable and
> branch-free.
I think pageno is already a hash. Why hash it any further? And pages accessed 
together will have lower access times thanks to colocation.
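
To make the sizing and hashing argument concrete, here is a minimal,
self-contained sketch (not the actual patch and not simplehash itself; all
names are illustrative) of a pageno -> slot mapping that uses the page number
itself as the hash and sizes the table at 2 * nslots:

  #include <stdint.h>

  typedef struct PagenoMapEntry
  {
      int     pageno;     /* key; -1 means the bucket is empty */
      int     slotno;     /* SLRU buffer slot holding this page */
  } PagenoMapEntry;

  typedef struct PagenoMap
  {
      uint32_t        size;       /* power of two, >= 2 * nslots */
      PagenoMapEntry *entries;
  } PagenoMap;

  /* identity "hash": consecutive pagenos land in consecutive buckets */
  static inline uint32_t
  pageno_map_bucket(const PagenoMap *map, int pageno)
  {
      return (uint32_t) pageno & (map->size - 1);
  }

  static int
  pageno_map_lookup(const PagenoMap *map, int pageno)
  {
      uint32_t    b = pageno_map_bucket(map, pageno);
      uint32_t    i;

      /* linear probing; collisions only occur on backward jumps / wrap-around */
      for (i = 0; i < map->size; i++)
      {
          const PagenoMapEntry *e = &map->entries[(b + i) & (map->size - 1)];

          if (e->pageno == pageno)
              return e->slotno;
          if (e->pageno == -1)
              return -1;          /* page not mapped to any slot */
      }
      return -1;
  }

With size = 2 * nslots the table never gets more than half full, so probe
chains stay short even in the unlucky cases.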

>  I had to tweak it to support "in-place" creation and
> fixed size (in other words, no allocators, for use in shared memory).
We really need to have a test to know what happens when this structure runs out 
of memory, as you mentioned below. What would be an appropriate place for 
simplehash tests?

> Then I was annoyed that I had to add a "status" member to our struct,
> so I tried to fix that.
Indeed, sizeof(SlruMappingTableEntry) == 9 seems strange. Will simplehash align 
it well?


Thanks!

Best regards, Andrey Borodin.



Re: UniqueKey on Partitioned table.

2021-03-26 Thread Andy Fan
On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
> >
> > The attached is a UniqueKey with EquivalenceClass patch, I just completed
> > the single relation part, and it may have bugs. I just attached it here for
> > design review only, and the not-null-attrs is just v1 which we can continue
> > discussing on the original thread[2].
>
> Thanks for the patch. After a short look through it I'm a bit confused
> and wanted to clarify: can the uniquekeys list now contain both Expr and
> EquivalenceClass entries?
>

Yes. That's because I don't want to create a new EquivalenceClass (which
would make PlannerInfo->eq_classes longer) if we don't already have one, so I
just used a bare Expr for that case. However, during testing I unexpectedly
found some EquivalenceClasses with only one EquivalenceMember.
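
To illustrate the consequence of mixing the two node types, here is a minimal
sketch (the function name is mine and purely illustrative, not from the patch)
of what a consumer of such a uniquekeys list has to do for each entry:

  #include "postgres.h"
  #include "nodes/pathnodes.h"

  /* Illustrative only: an entry may be an EquivalenceClass or a bare Expr. */
  static bool
  uniquekey_entry_matches(Node *ukey, Expr *expr)
  {
      ListCell   *lc;

      if (IsA(ukey, EquivalenceClass))
      {
          EquivalenceClass *ec = (EquivalenceClass *) ukey;

          foreach(lc, ec->ec_members)
          {
              EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

              if (equal(em->em_expr, expr))
                  return true;
          }
          return false;
      }

      /* otherwise the entry is a bare expression stored directly in the list */
      return equal(ukey, expr);
  }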

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Extend more usecase for planning time partition pruning and init partition pruning.

2021-03-26 Thread Andy Fan
Hi David:

On Mon, Mar 8, 2021 at 9:34 AM David Rowley  wrote:

> On Thu, 4 Mar 2021 at 22:07, Amit Langote  wrote:
> > * Or maybe have you considered generalizing what
> > build_implied_pruning_quals() does so that other places like
> > indxpath.c can use the facility?
>
> I agree with doing it another way.  There's plenty of other queries
> which we could produce a better plan for if EquivalenceClass knew
> about things like IN conditions and >=, >, < and <= btree ops.
>
> It seems wrong to code anything in this regard that's specific to
> partition pruning.
>
> Please see [1] for an idea. IIRC, the implementation was not well
> received and there were concerns about having to evaluate additional
> needless quals. That part I think can be coded around. The trick will
> be to know when and when not to use additional quals.
>
> The show stopper for me was having a more efficient way to find if a
> given Expr exists in an EquivalenceClass. This is why I didn't take
> the idea further, at the time. My implementation in that patch
> required lots of looping to find if a given Expr had an existing
> EquivalenceMember, to which there was a danger of that becoming slow
> for complex queries.
>
> I'm unsure right now if it would be possible to build standard
> EquivalenceMembers and EquivalenceFilters in the same pass.  I think
> it might require 2 passes since you only can use IN and range type
> quals for Exprs that actually have a EquivalenceMember. So you need to
> wait until you're certain there's some equality OpExpr before adding
> EquivalenceFilters. (Pass 1 can perhaps remember if anything looks
> interesting and then skip pass 2 if there's not...??)
>
> EquivalenceClass might be slightly faster now since we have
> RelOptInfo.eclass_indexes. However, I've not checked to see if the
> indexes will be ready in time for when you'd be building the
> additional filters. I'm guessing that they wouldn't be since you'd
> still be building the EquivalenceClasses at that time.  Certainly,
> process_equivalence() could do much faster lookups of Exprs if there
> was some global index for all EquivalenceMembers. However,
> equalfuncs.c only gives us true or false if two nodes are equal().
> We'd need to either have a -1, 0, +1 value or be able to hash nodes
> and put things into a hash table. Else we're stuck trawling through
> lists comparing each item 1-by-1. That's pretty slow. Especially with
> complex queries.
>
> Both Andres and I have previously suggested ways to improve Node
> searching.  My idea is likely easier to implement, as it just changed
> equalfuncs.c to add a function that returns -1, 0, +1 so we could use
> a binary search tree to index Nodes. Andres' idea [2] is likely the
> better of the two. Please have a look at that. It'll allow us to
> easily build a function to hash nodes and put them in a hash table.
>
> To get [1], the implementation will need to be pretty smart. There's
> concern about the idea. See [3]. You'll need to ensure you're not
> adding too much planner overhead and also not slowing down execution
> for cases by adding additional qual evals that are redundant.
>
> It's going to take some effort to make everyone happy here.
>

I understand what you are saying here, and I believe this needs quite a bit
more work. I am not sure I am prepared to do that at the current stage, so I
will set this idea aside for now and come back to it when time permits. I
have marked the commitfest entry as "Returned with Feedback". Thanks for the
detailed information!


> David
>
> [1]
> https://www.postgresql.org/message-id/flat/CAKJS1f9fPdLKM6%3DSUZAGwucH3otbsPk6k0YT8-A1HgjFapL-zQ%40mail.gmail.com#024ad18e19bb9b6c022fb572edc8c992
> [2]
> https://www.postgresql.org/message-id/flat/20190828234136.fk2ndqtld3onfrrp%40alap3.anarazel.de
> [3]
> https://www.postgresql.org/message-id/flat/30810.1449335...@sss.pgh.pa.us#906319f5e212fc3a6a682f16da079f04
>


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-26 Thread Amit Kapila
On Fri, Mar 26, 2021 at 5:50 PM Markus Wanner
 wrote:
>
> On 26.03.21 11:19, Amit Kapila wrote:
> > No, I am not assuming that. I am just trying to describe to you that it
> > is not necessary that we will be able to detect concurrent abort in
> > each and every case.
>
> Sure.  Nor am I claiming that would be necessary or that the patch
> changed anything about it.
>
> As it stands, assuming the output plugin basically just forwards the
> events and the subscriber tries to replicate them as is, the following
> would happen on the subscriber for a concurrently aborted two-phase
> transaction:
>
>   * start a transaction (begin_prepare_cb)
>   * apply changes for it (change_cb)
>   * digress to other, unrelated transactions (leaving unspecified what
> exactly happens to the opened transaction)
>   * attempt to rollback a transaction that has not ever been prepared
> (rollback_prepared_cb)
>
> The point of the patch is for the output plugin to get proper
> transaction entry and exit callbacks.  Even in the unfortunate case of a
> concurrent abort.  It offers the output plugin a clean way to learn that
> the decoder stopped decoding for the current transaction and it won't
> possibly see a prepare_cb for it (despite the decoder having passed the
> PREPARE record in WAL).
>
> > The other related thing is it may not be a good idea to finish the
> > transaction
>
> You're speaking subscriber side here.  And yes, I agree, the subscriber
> should not abort the transaction at a concurrent_abort.  I never claimed
> it should.
>
> If you are curious, in our case I made the subscriber PREPARE the
> transaction at its end when receiving a concurrent_abort notification,
> so that the subscriber:
>
>   * can hop out of that started transaction and safely proceed
> to process events for other transactions, and
>   * has the transaction in the appropriate state for processing the
> subsequent rollback_prepared_cb, once that gets through
>
> That's probably not ideal in the sense that subscribers do unnecessary
> work.
>

Isn't it better to send prepare from the publisher in such a case so
that subscribers can know about it when rollback prepared arrives? I
think we have already done the same (sent prepare, exactly to handle
the case you have described above) for *streamed* transactions.

-- 
With Regards,
Amit Kapila.