Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread ROS Didier
Hi
I would like to know if it is possible to migrate an Oracle multitenant
database (with multiple PDBs) to PostgreSQL?

Thanks in advance

Best Regards

D ROS
EDF





Re: Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread Thomas Kellerer


ROS Didier wrote on 24.11.2020 at 09:09:
> I would like to know if it is possible to migrate an Oracle multitenant
> database (with multiple PDBs) to PostgreSQL?
Postgres' databases are very similar to Oracle's PDBs.

Probably the biggest difference is that you can't shut down
a single database the way you can with a PDB.

Database users in Postgres are like Oracle's "common users": they are
global to the whole instance (a "cluster" in Postgres terms). There
are no database-specific users.

Thomas









RE: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2020-11-24 Thread Junfeng Yang
Hi hackers,

Can anyone help to verify this?



RE: POC: postgres_fdw insert batching

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better,
because Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT INTO
array and FORALL ... INSERT INTO over an array, while JDBC uses batch, as in "batch
updates" and its API method names (addBatch, executeBatch).

But it seems better, or at least more common, to use batch, according to the
etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think
GetInsertBatchSize may be better.  That is, this API deals with multiple
records in a single INSERT statement.  Your GetModifyBatchSize could be
reserved for statement batching, once libpq supports batching/pipelining to
execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC
batch updates.  What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
--
Statement stmt = con.createStatement(); 
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 

// submit a batch of update commands for execution 
int[] updateCounts = stmt.executeBatch(); 
--


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during
planning.  Unfortunately, it's also called from BeginForeignInsert(), but the
other stuff passed to create_foreign_modify(), including the query string, is
constructed there.


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
dynamically based on the planning or system state unlike shared buffers and 
parallel workers.  OTOH, I sometimes want to see what configuration parameter 
values the user set, such as work_mem, enable_*, and shared_buffers, together 
with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know it that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value
that was already saved in a struct by create_foreign_modify().
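To illustrate, here is a minimal sketch of that shape (the struct and field
names are assumptions based on this thread, not necessarily the patch's actual
code):

#include "postgres.h"
#include "nodes/execnodes.h"        /* ResultRelInfo */

/*
 * Sketch only: batch_size is resolved once in create_foreign_modify() and
 * cached in the per-relation modify state, so the getter called from the
 * executor is trivial.
 */
typedef struct PgFdwModifyState
{
    /* ... other members of the real struct elided ... */
    int         batch_size;     /* from server/table options */
} PgFdwModifyState;

static int
postgresGetMaxBulkInsertTuples(ResultRelInfo *resultRelInfo)
{
    PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

    return fmstate->batch_size; /* no catalog or option lookup here */
}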


Regards
Takayuki Tsunakawa



Re: Deduplicate aggregates and transition functions in planner

2020-11-24 Thread Heikki Linnakangas

On 19/11/2020 12:38, Heikki Linnakangas wrote:

So barring objections, I'm going to push the attached updated patch that
includes the removal of AggrefExprState, and leave CookedAggrefs or
other further refactorings for the future.


Done. Thanks!

- Heikki




Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Wed, 11 Nov 2020 19:10:35 +0300
Konstantin Knizhnik  wrote:

Thank you for reviewing this patch!

> 
> The patch does not apply to the current master because the makeFuncCall
> prototype has changed;
> I fixed it by adding COERCE_EXPLICIT_CALL.

The rebased patch was submitted.

> Ooops! Now TPS are much lower:
> 
> tps = 141.767347 (including connections establishing)
> 
> Speed of updates is reduced more than 70 times!
> Looks like we lose parallelism, because I get almost the same result
> with just one connection.

As you and Ishii-san mentioned in other posts, I think the reason would be the
table lock on the materialized view that is acquired during view maintenance.
I will explain a bit more in another post.

> 4. Finally let's create one more view (it is reasonable to expect that 
> analytics will run many different queries and so need multiple views).
> 
> create incremental materialized view teller_avgs as select 
> t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
> a.bid=t.bid group by t.tid;
> 
> It is great that not only simple aggregates like SUM are supported, but 
> also AVG.
> But insertion speed is now halved - 72 TPS.

Yes, with the current implementation, updating a table takes twice as long
when a new incrementally maintainable materialized view is defined on it,
because view maintenance is performed for each view.

> 
> So the good news is that incremental materialized views really work.
> And the bad news is that the maintenance overhead is too large, which
> significantly restricts the applicability of this approach.
> Certainly, in the case of a read-dominated workload such materialized
> views can significantly improve performance.
> But unfortunately my dream that they would allow combining OLAP+OLTP is
> not currently realized.

As you concluded, there is a large overhead on updating base tables in the
current implementation because it performs immediate maintenance, in which the
view is updated in the same statement that modifies its base table. Therefore,
it is not suitable for OLTP workloads with frequent table updates.

To suppress maintenance overhead in such workloads, we have to implement
"deferred maintenance", which collects table change logs and updates the view
in a later transaction.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Thu, 12 Nov 2020 15:37:42 +0300
Konstantin Knizhnik  wrote:

> Well, creating proper indexes for a table is certainly the responsibility
> of the DBA.
> But users may not consider a materialized view to be a normal table, so the
> idea that an index should be explicitly created for a materialized view may
> not be so obvious.
> From the other side, the materialized view implementation knows which
> index is needed for performing an efficient incremental update.
> I wonder if it could create such an index itself implicitly, or at least
> produce a notice proposing to create such an index.

That makes sense. Especially for aggregate views, it is obvious that
creating an index on the expressions used in GROUP BY is effective. For
other views, creating an index on columns that come from primary keys
of base tables would be effective, if any exist.

However, if a base table doesn't have a primary or unique key, or such a
key column is not contained in the view's target list, it is hard to
decide on an appropriate index for the view. We could create an index on all
columns in the target list, but that could add overhead to view maintenance.
So just producing a notice would be better in such cases.

> I looked through your patch for exclusive table locks and found this
> fragment in matview.c:
> 
>      /*
>       * Wait for concurrent transactions which update this materialized
>       * view at READ COMMITTED. This is needed to see changes committed in
>       * other transactions. No wait and raise an error at REPEATABLE READ
>       * or SERIALIZABLE to prevent update anomalies of matviews.
>       * XXX: dead-lock is possible here.
>       */
>      if (!IsolationUsesXactSnapshot())
>          LockRelationOid(matviewOid, ExclusiveLock);
>      else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))
> 
> 
> I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
> It is still about 7 times slower than the performance without the incremental view.
> But now the gap is not so dramatic. And it seems clear that this
> exclusive lock on the matview is the real show stopper for concurrent updates.
> I do not know which race conditions and anomalies we could get if we replace
> the table-level lock with a row-level lock here.

I explained it here:
https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
 
For example, suppose there is a view V = R*S that joins tables R and S,
and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency.  By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This inconsistency could not be resolved by
row-level lock.
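To spell out the algebra behind this (the expansion below is mine, using the
same * notation for join as above):

    V_new = (R ∪ dR) * (S ∪ dS)
          = (R*S) ∪ (dR*S) ∪ (R*dS) ∪ (dR*dS)

T1 only computes dR*S and T2 only computes R*dS, so even after both deltas are
applied the dR*dS term is still missing, and no row-level lock on the existing
rows of V can supply it.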

> But I think that this problem should be addressed in any case: single 
> client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+
+Concurrent Transactions
+
+Suppose an IMMV is defined on two base tables and each
+table was modified in a different concurrent transaction simultaneously.
+In the transaction which was committed first, IMMV can 
+be updated considering only the change which happened in this transaction.
+On the other hand, in order to update the view correctly in the transaction
+which was committed later, we need to know the changes which occurred in
+both transactions.  For this reason, ExclusiveLock
+is held on an IMMV immediately after a base table is
+modified in READ COMMITTED mode to make sure that
+the IMMV is updated in the latter transaction after
+the former transaction is committed.  In REPEATABLE READ
+or SERIALIZABLE mode, an error is raised immediately
+if lock acquisition fails because any changes which occurred in
+other transactions are not visible in these modes and
+IMMV cannot be updated correctly in such situations.
+
+

However, should we explicitly describe its impact on performance here?
 
> I attached to this mail profile of pgbench workload with defined 
> incremental view (with index).
> May be you will find it useful.

Thank you for your profiling! Hmm, it shows that the overhead of executing the
query for calculating the delta (refresh_matview_datafill) and applying
the delta (SPI_exec) is large. I will investigate whether more optimizations
to reduce the overhead are possible.

> 
> One more disappointing observation of materialized views (now 
> non-incremental).
> Time of creation of non-incremental materialized view is about 18 seconds:
> 
> postgres=# create materialized view teller_avgs as

Re: Implementing Incremental View Maintenance

2020-11-24 Thread Konstantin Knizhnik




On 24.11.2020 12:21, Yugo NAGATA wrote:



I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
It is still about 7 times slower than the performance without the incremental view.
But now the gap is not so dramatic. And it seems clear that this
exclusive lock on the matview is the real show stopper for concurrent updates.
I do not know which race conditions and anomalies we could get if we replace
the table-level lock with a row-level lock here.

I explained it here:
https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
  
For example, suppose there is a view V = R*S that joins tables R and S,

and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency.  By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This inconsistency could not be resolved by
row-level lock.


But I think that this problem should be addressed in any case: single
client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+
+Concurrent Transactions
+
+Suppose an IMMV is defined on two base tables and each
+table was modified in a different concurrent transaction simultaneously.
+In the transaction which was committed first, IMMV can
+be updated considering only the change which happened in this transaction.
+On the other hand, in order to update the view correctly in the transaction
+which was committed later, we need to know the changes which occurred in
+both transactions.  For this reason, ExclusiveLock
+is held on an IMMV immediately after a base table is
+modified in READ COMMITTED mode to make sure that
+the IMMV is updated in the latter transaction after
+the former transaction is committed.  In REPEATABLE READ
+or SERIALIZABLE mode, an error is raised immediately
+if lock acquisition fails because any changes which occurred in
+other transactions are not visible in these modes and
+IMMV cannot be updated correctly in such situations.
+
+

However, should we explicitly describe its impact on performance here?
  


Sorry, I didn't think much about this problem.
But I think it is very important to try to find some solution to it.
The most obvious optimization is not to use an exclusive table lock if the
view depends on just one table (contains no joins).

It looks like there are no anomalies in this case, are there?

Yes, most analytic queries contain joins (just two of the 22 TPC-H queries
have no joins).

So maybe this optimization will not help much.

I wonder if it is possible to somehow use the predicate locking mechanism of
Postgres to avoid these anomalies without a global lock?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2020-11-24 Thread Ajin Cherian
On Mon, Nov 23, 2020 at 10:35 PM Amit Kapila  wrote:
> For the first two, as the xact is still not visible to others, we
> don't need to make it behave like a committed txn. To make the (DDL)
> changes visible to the current txn, the message
> REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID copies the snapshot which
> fills the subxip array. This will be sufficient to make the changes
> visible to the current txn. For the third, I have checked the code,
> and whenever we have any change message the base snapshot gets set
> via SnapBuildProcessChange. It is possible that I have missed
> something, but I don't want to call SnapbuildCommittedTxn in
> DecodePrepare unless we have a clear reason to, so I'm leaving it
> for now. Can you or someone else see any reason to do so?

I reviewed and tested this and like you said, SnapBuildProcessChange
sets the base snapshot for every change.
I did various tests using DDL updates and haven't seen any issues so
far. I agree with your analysis.

regards,
Ajin




Re: psql: add \si, \sm, \st and \sr functions to show CREATE commands for indexes, matviews, triggers and tables

2020-11-24 Thread Anastasia Lubennikova

On 18.08.2020 17:25, Tom Lane wrote:

a.pervush...@postgrespro.ru writes:

[ si_st_sm_sr_v2.patch ]

I hadn't particularly noticed this thread before, but I happened to
look through this patch, and I've got to say that this proposed feature
seems like an absolute disaster from a maintenance standpoint.  There
will be no value in an \st command that is only 90% accurate; the produced
DDL has to be 100% correct.  This means that, if we accept this feature,
psql will have to know everything pg_dump knows about how to construct the
DDL describing tables, indexes, views, etc.  That is a lot of code, and
it's messy, and it changes nontrivially on a very regular basis.  I can't
accept that we want another copy in psql --- especially one that looks
nothing like what pg_dump has.

There've been repeated discussions about somehow extracting pg_dump's
knowledge into a library that would also be available to other client
programs (see e.g. the concurrent thread at [1]).  That's quite a tall
order, which is why it's not happened yet.  But I think we really need
to have something like that before we can accept this feature for psql.

BTW, as an example of why this is far more difficult than it might
seem at first glance, this patch doesn't even begin to meet the
expectation stated at the top of describe.c:

  * Support for the various \d ("describe") commands.  Note that the current
  * expectation is that all functions in this file will succeed when working
  * with servers of versions 7.4 and up.  It's okay to omit irrelevant
  * information for an old server, but not to fail outright.

It might be okay for this to cut off at 8.0 or so, as I think pg_dump
does, but not to just fail on older servers.

Another angle, which I'm not even sure how we want to think about it, is
security.  It will not do for "\et" to allow some attacker to replace
function calls appearing in the table's CHECK constraints, for instance.
So this means you've got to be very aware of CVE-2018-1058-style attacks.
Our answer to that for pg_dump has partially depended on restricting the
search_path used at both dump and restore time ... but I don't think \et
gets to override the search path that the psql user is using.  I'm not
sure what that means in practice but it certainly requires some thought
before we add the feature, not after.

Anyway, I can see the attraction of having psql commands like these,
but "write a bunch of new code that we'll have to maintain" does not
seem like a desirable way to get them.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/9df8a3d3-13d2-116d-26ab-6a273c1ed38c%402ndquadrant.com




Since there has been no activity on this thread since before the CF and
no response from the author I have marked this "returned with feedback".

Alexandra, feel free to resubmit it to the next commitfest, when you 
have time to address the issues raised in the review.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Tue, 24 Nov 2020 12:46:57 +0300
Konstantin Knizhnik  wrote:

> 
> 
> On 24.11.2020 12:21, Yugo NAGATA wrote:
> >
> >> I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
> >> It is still about 7 times slower than the performance without the incremental view.
> >> But now the gap is not so dramatic. And it seems clear that this
> >> exclusive lock on the matview is the real show stopper for concurrent updates.
> >> I do not know which race conditions and anomalies we could get if we replace
> >> the table-level lock with a row-level lock here.
> > I explained it here:
> > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> >   
> > For example, suppose there is a view V = R*S that joins tables R and S,
> > and there are two concurrent transactions T1 which changes table R to R'
> > and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
> > V would be updated to R'*S in T1, and R*S' in T2, so it would cause
> > inconsistency.  By locking the view V, transactions T1, T2 are processed
> > serially and this inconsistency can be avoided.
> >
> > Especially, suppose that tuple dR is inserted into R in T1, and dS is
> > inserted into S in T2, where dR and dS will be joined in according to
> > the view definition. In this situation, without any lock, the change of V is
> > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> > be included in the results.  This inconsistency could not be resolved by
> > row-level lock.
> >
> >> But I think that this problem should be addressed in any case: single
> >> client update mode is very rare scenario.
> > This behavior is explained in rules.sgml like this:
> >
> > +
> > +Concurrent Transactions
> > +
> > +Suppose an IMMV is defined on two base tables and each
> > +table was modified in a different concurrent transaction simultaneously.
> > +In the transaction which was committed first, IMMV can
> > +be updated considering only the change which happened in this transaction.
> > +On the other hand, in order to update the view correctly in the transaction
> > +which was committed later, we need to know the changes which occurred in
> > +both transactions.  For this reason, ExclusiveLock
> > +is held on an IMMV immediately after a base table is
> > +modified in READ COMMITTED mode to make sure that
> > +the IMMV is updated in the latter transaction after
> > +the former transaction is committed.  In REPEATABLE READ
> > +or SERIALIZABLE mode, an error is raised immediately
> > +if lock acquisition fails because any changes which occurred in
> > +other transactions are not visible in these modes and
> > +IMMV cannot be updated correctly in such situations.
> > +
> > +
> >
> > However, should we explicitly describe its impact on performance here?
> >   
> 
> Sorry, I didn't think much about this problem.
> But I think it is very important to try to find some solution to it.
> The most obvious optimization is not to use an exclusive table lock if the
> view depends on just one table (contains no joins).
> It looks like there are no anomalies in this case, are there?

Thank you for your suggestion! That makes sense.
 
> Yes, most analytic queries contain joins (just two of the 22 TPC-H queries
> have no joins).
> So maybe this optimization will not help much.

Yes, but if a user wants to incrementally maintain only aggregate views on a
large table, like TPC-H Q1, it will be helpful. For this optimization, we only
have to check the number of RTEs in the rtable list, which would be cheap.
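For what it's worth, a minimal sketch of that check (variable names such as
viewQuery and the exact lock choice are placeholders, not from an actual patch):

    /* Sketch: a single-RTE view has no join, so a weaker lock may suffice. */
    LOCKMODE    lockmode;

    if (list_length(viewQuery->rtable) == 1)
        lockmode = RowExclusiveLock;    /* no join involved */
    else
        lockmode = ExclusiveLock;       /* joins: serialize view maintenance */

    LockRelationOid(matviewOid, lockmode);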

> I wonder if it is possible to somehow use the predicate locking mechanism of
> Postgres to avoid these anomalies without a global lock?

You mean that, instead of using any table lock, we abort the transaction if any
possibility of the anomaly is detected using the predicate lock mechanism?

I don't have a concrete idea of how to implement it, and don't know yet whether
it is possible, but I think it is worth considering. Thanks.


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: LogwrtResult contended spinlock

2020-11-24 Thread Anastasia Lubennikova

On 04.09.2020 20:13, Andres Freund wrote:

Hi,

On 2020-09-04 10:05:45 -0700, Andres Freund wrote:

On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:

Looking at patterns like this

    if (XLogCtl->LogwrtRqst.Write < EndPos)
        XLogCtl->LogwrtRqst.Write = EndPos;

It seems possible to implement with

    do {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
        if (currwrite > EndPos)
            break;  // already done by somebody else
        if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
                                           currwrite, EndPos))
            break;  // successfully updated
    } while (true);

This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
seem good material for a general routine.

This *seems* correct to me, though this is muddy territory to me.  Also,
are there better ways to go about this?

Hm, I was thinking that we'd first go for reading it without a spinlock,
but continuing to write it as we currently do.

But yea, I can't see an issue with what you propose here. I personally
find do {} while () weird and avoid it when not explicitly useful, but
that's extremely minor, obviously.

Re general routine: On second thought, it might actually be worth having
it. Even just for LSNs - there's plenty places where it's useful to
ensure a variable is at least a certain size.  I think I would be in
favor of a general helper function.

By a general helper function, do you mean something like this?

void
swap_lsn(pg_atomic_uint64 *old_value, XLogRecPtr new_value, bool to_largest)
{
    while (true)
    {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(old_value);

        if (to_largest)
        {
            if (currwrite > new_value)
                break;  /* already done by somebody else */
        }
        else
        {
            if (currwrite < new_value)
                break;  /* already done by somebody else */
        }

        if (pg_atomic_compare_exchange_u64(old_value, &currwrite, new_value))
            break;  /* successfully updated */
    }
}


which would be called like

    swap_lsn(&XLogCtl->LogwrtRqst.Write, EndPos, true);



Greetings,

Andres Freund


This CF entry was inactive for a while. Alvaro, are you going to 
continue working on it?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2

2020-11-24 Thread Michael Paquier
On Sat, Nov 21, 2020 at 10:19:42AM +0900, Michael Paquier wrote:
> What you meant and what I meant was slightly different here.  I meant
> publishing a header in src/include/common/ that would get installed,
> and I'd rather avoid that.  And you mean to have the header for local
> consumption in src/common/.  I would be fine with your third option as
> well.  Your suggestion is more consistent with what we do for the rest
> of src/common/ and libpq actually.  So I don't mind switching to
> that.

I got to look at your suggestion, and finished with the attached, which is
pretty close to my previous set, except that the MSVC scripts as well as
the header includes needed a slight refresh.

Please note that the OpenSSL docs say that EVP_DigestInit() is
obsolete and that applications should just use EVP_DigestInit_ex(), so
I have kept the original:
https://www.openssl.org/docs/man1.1.1/man3/EVP_DigestInit.html

The PG_CRYPTOHASH macro in cryptohash.h has been changed as you
suggested.  What do you think?
--
Michael
From b4ec42146bfe8c9580c31d32a619e7712c519486 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 Nov 2020 19:36:13 +0900
Subject: [PATCH v5 1/3] Rework SHA2 and crypto hash APIs

This will make easier a switch to EVP for the OpenSSL SHA2 layer.  Note
that the layer introduced here is generalized for the purpose of a
future integration with HMAC, MD5, and even more.
---
 src/include/common/checksum_helper.h  |  13 +-
 src/include/common/cryptohash.h   |  40 
 src/include/common/scram-common.h |  17 +-
 src/include/common/sha2.h |  89 +---
 src/include/replication/backup_manifest.h |   3 +-
 src/backend/libpq/auth-scram.c|  94 +
 src/backend/replication/backup_manifest.c |  25 ++-
 src/backend/replication/basebackup.c  |  24 ++-
 src/backend/utils/adt/cryptohashes.c  |  53 +++--
 src/common/Makefile   |   6 +-
 src/common/checksum_helper.c  |  79 +--
 src/common/cryptohash.c   | 189 +
 src/common/cryptohash_openssl.c   | 196 ++
 src/common/scram-common.c | 165 ++-
 src/common/sha2.c |  23 +-
 .../common/sha2.h => common/sha2_int.h}   |  38 +---
 src/common/sha2_openssl.c | 102 -
 src/bin/pg_verifybackup/parse_manifest.c  |  15 +-
 src/bin/pg_verifybackup/pg_verifybackup.c |  24 ++-
 src/interfaces/libpq/fe-auth-scram.c  | 114 +-
 contrib/pgcrypto/internal-sha2.c  | 188 -
 src/tools/msvc/Mkvcbuild.pm   |   3 +-
 src/tools/pgindent/typedefs.list  |   1 +
 23 files changed, 928 insertions(+), 573 deletions(-)
 create mode 100644 src/include/common/cryptohash.h
 create mode 100644 src/common/cryptohash.c
 create mode 100644 src/common/cryptohash_openssl.c
 copy src/{include/common/sha2.h => common/sha2_int.h} (73%)
 delete mode 100644 src/common/sha2_openssl.c

diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index 48b0745dad..b07a34e7e4 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -14,6 +14,7 @@
 #ifndef CHECKSUM_HELPER_H
 #define CHECKSUM_HELPER_H
 
+#include "common/cryptohash.h"
 #include "common/sha2.h"
 #include "port/pg_crc32c.h"
 
@@ -41,10 +42,10 @@ typedef enum pg_checksum_type
 typedef union pg_checksum_raw_context
 {
 	pg_crc32c	c_crc32c;
-	pg_sha224_ctx c_sha224;
-	pg_sha256_ctx c_sha256;
-	pg_sha384_ctx c_sha384;
-	pg_sha512_ctx c_sha512;
+	pg_cryptohash_ctx *c_sha224;
+	pg_cryptohash_ctx *c_sha256;
+	pg_cryptohash_ctx *c_sha384;
+	pg_cryptohash_ctx *c_sha512;
 } pg_checksum_raw_context;
 
 /*
@@ -66,8 +67,8 @@ typedef struct pg_checksum_context
 extern bool pg_checksum_parse_type(char *name, pg_checksum_type *);
 extern char *pg_checksum_type_name(pg_checksum_type);
 
-extern void pg_checksum_init(pg_checksum_context *, pg_checksum_type);
-extern void pg_checksum_update(pg_checksum_context *, const uint8 *input,
+extern int	pg_checksum_init(pg_checksum_context *, pg_checksum_type);
+extern int	pg_checksum_update(pg_checksum_context *, const uint8 *input,
 			   size_t len);
 extern int	pg_checksum_final(pg_checksum_context *, uint8 *output);
 
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
new file mode 100644
index 00..0e4a6631a3
--- /dev/null
+++ b/src/include/common/cryptohash.h
@@ -0,0 +1,40 @@
+/*-
+ *
+ * cryptohash.h
+ *	  Generic headers for cryptographic hash functions.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		  src/include/common/cryptohash.h
+ *
+ *-

RE: Parallel Inserts in CREATE TABLE AS

2020-11-24 Thread Hou, Zhijie
Hi,

I'm very interested in this feature,
and I'm looking at the patch, here are some comments.

1.
+   if (!TupIsNull(outerTupleSlot))
+   {
+       (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+       node->ps.state->es_processed++;
+   }
+
+   if (TupIsNull(outerTupleSlot))
+       break;
+   }

How about the following style:

    if (TupIsNull(outerTupleSlot))
        break;

    (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
    node->ps.state->es_processed++;

which looks cleaner.


2.
+
+   if (into != NULL &&
+   IsA(into, IntoClause))
+   {

The check can be replaced by ISCTAS(into).


3.
+   /*
+* For parallelizing inserts in CTAS i.e. making each
+* parallel worker inerst it's tuples, we must send
+* information such as intoclause(for each worker

'inerst' looks like a typo (insert).


4.
+   /* Estimate space for into clause for CTAS. */
+   if (ISCTAS(planstate->intoclause))
+   {
+       intoclausestr = nodeToString(planstate->intoclause);
+       shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+       shm_toc_estimate_keys(&pcxt->estimator, 1);
+   }
...
+   if (intoclausestr != NULL)
+   {
+       char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+       strcpy(shmptr, intoclausestr);
+       shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+   }

The code here calls strlen(intoclausestr) twice.  Looking at the existing code
in ExecInitParallelPlan, it stores the strlen in a variable.

So how about the following style:

intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

following the pattern of the existing code in ExecInitParallelPlan.
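For reference, the query-text handling in ExecInitParallelPlan has roughly this
shape (paraphrased, so details may differ slightly from the actual source):

    /* Estimate space for the query text. */
    query_len = strlen(estate->es_sourceText) + 1;
    shm_toc_estimate_chunk(&pcxt->estimator, query_len);
    shm_toc_estimate_keys(&pcxt->estimator, 1);
    ...
    /* Store the query string. */
    query_string = shm_toc_allocate(pcxt->toc, query_len);
    memcpy(query_string, estate->es_sourceText, query_len);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, query_string);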


5.
+   if (intoclausestr != NULL)
+   {
+       char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+       strcpy(shmptr, intoclausestr);
+       shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+   }
+
/* Set up the tuple queues that the workers will write into. */
-   pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+   if (intoclausestr == NULL)
+       pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);

The two checks on intoclausestr could be combined like:

if (intoclausestr != NULL)
{
...
}
else
{
...
}

Best regards,
houzj






Remove cache_plan argument comments to ri_PlanCheck

2020-11-24 Thread Li Japin
Hi, hackers

I found that the cache_plan argument to ri_PlanCheck has already been removed,
as of commit 5b7ba75f7ff854003231e8099e3038c7e2eba875.  I think we can remove
the comments about cache_plan in ri_PlanCheck.

diff --git a/src/backend/utils/adt/ri_triggers.c 
b/src/backend/utils/adt/ri_triggers.c
index 7e2b2e3dd6..02b1a3868f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2130,9 +2130,6 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, 
uint32 hashvalue)

 /*
  * Prepare execution plan for a query to enforce an RI restriction
- *
- * If cache_plan is true, the plan is saved into our plan hashtable
- * so that we don't need to plan it again.
  */
 static SPIPlanPtr
 ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,

--
Best regards
Japin Li
ChengDu WenWu Information Technology Co.,Ltd.



cache_plan-to-ri_PlanCheck.diff
Description: cache_plan-to-ri_PlanCheck.diff


Re: [HACKERS] Custom compression methods

2020-11-24 Thread Dilip Kumar
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas  wrote:

Most of the comments look fine to me, but I have a slightly different
opinion about one of them, so I'm replying only to that.

> I'm worried about how expensive this might be, and I think we could
> make it cheaper. The reason why I think this might be expensive is:
> currently, for every datum, you have a single direct function call.
> Now, with this, you first have a direct function call to
> GetCompressionOidFromCompressionId(). Then you have a call to
> GetCompressionRoutine(), which does a syscache lookup and calls a
> handler function, which is quite a lot more expensive than a single
> function call. And the handler isn't even returning a statically
> allocated structure, but is allocating new memory every time, which
> involves more function calls and maybe memory leaks. Then you use the
> results of all that to make an indirect function call.
>
> I'm not sure exactly what combination of things we could use to make
> this better, but it seems like there are a few possibilities:
>
> (1) The handler function could return a pointer to the same
> CompressionRoutine every time instead of constructing a new one every
> time.
> (2) The CompressionRoutine to which the handler function returns a
> pointer could be statically allocated instead of being built at
> runtime.
> (3) GetCompressionRoutine could have an OID -> handler cache instead
> of relying on syscache + calling the handler function all over again.
> (4) For the compression types that have dedicated bit patterns in the
> high bits of the compressed TOAST size, toast_compress_datum() could
> just have hard-coded logic to use the correct handlers instead of
> translating the bit pattern into an OID and then looking it up over
> again.
> (5) Going even further than #4 we could skip the handler layer
> entirely for such methods, and just call the right function directly.
>
> I think we should definitely do (1), and also (2) unless there's some
> reason it's hard. (3) doesn't need to be part of this patch, but might
> be something to consider later in the series. It's possible that it
> doesn't have enough benefit to be worth the work, though. Also, I
> think we should do either (4) or (5). I have a mild preference for (5)
> unless it looks too ugly.
>
> Note that I'm not talking about hard-coding a fast path for a
> hard-coded list of OIDs - which would seem a little bit unprincipled -
> but hard-coding a fast path for the bit patterns that are themselves
> hard-coded. I don't think we lose anything in terms of extensibility
> or even-handedness there; it's just avoiding a bunch of rigamarole
> that doesn't really buy us anything.
>
> All these points apply equally to toast_decompress_datum_slice() and
> toast_compress_datum().

I agree that we should definitely do (1) and (2) as part of the first
patch, and (3) we might do in later patches.  Of (4) and (5), I am more
inclined to do (4), for a couple of reasons:
a) If we bypass the handler function and directly call the compression
and decompression routines, then we need to check whether the current
executable is compiled with that particular compression library.  For
example, in 'lz4handler' we have the check below; if we don't have the
handler function, we either need to put this check in each
compression/decompression function or in each caller.
Datum
lz4handler(PG_FUNCTION_ARGS)
{
#ifndef HAVE_LIBLZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("not built with lz4 support")));
#else

b) Another reason is that once we start supporting compression options
(0006-Support-compression-methods-options.patch), we will also need to
call 'cminitstate_function' to parse the compression options and then
call the compression function, so we would need to hardcode multiple
function calls.

I think b) is still okay, but because of a) I am more inclined to do
(4).  What is your opinion on this?

About (4), one option is to call the correct handler function for a
built-in type directly from the toast_(de)compress(_slice) functions,
but in that case we duplicate the code.  Another option is to keep
GetCompressionRoutine() as a common function and, inside it, directly
call the corresponding handler function for a built-in type to get the
routine.  The only wrinkle is that, to avoid duplication in the
decompression routine, we need to convert the CompressionId to an Oid
before calling GetCompressionRoutine(), but we can still avoid the
syscache lookup for built-in types.
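A rough sketch of that second option (the OID constants and the fallback
helper below are placeholders, not the patch's actual identifiers):

static const CompressionAmRoutine *
GetCompressionRoutine(Oid cmoid)
{
    /* Fast path for the built-in methods: no syscache lookup, no palloc. */
    if (cmoid == PGLZ_COMPRESSION_AM_OID)       /* placeholder constant */
        return &pglz_compress_methods;          /* statically allocated */
    if (cmoid == LZ4_COMPRESSION_AM_OID)        /* placeholder constant */
        return &lz4_compress_methods;           /* statically allocated */

    /* Extension-provided method: syscache lookup + handler call, as today. */
    return LookupCompressionHandlerViaSyscache(cmoid);  /* placeholder */
}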

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench and timestamps (bounced)

2020-11-24 Thread Anastasia Lubennikova

On 11.09.2020 16:59, Fabien COELHO wrote:


Hello Tom,


It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.


Ugh, I'd really rather not do that.  Even disregarding the effects
of a mutex, though, my initial idea for fixing this has a big problem:
if we postpone PREPARE of the query until first execution, then it's
happening during timed execution of the benchmark scenario and thus
distorting the timing figures.  (Maybe if we'd always done it like
that, it'd be okay, but I'm quite against changing the behavior now
that it's stood for a long time.)


Hmmm.

Prepare is done *once* per client; ISTM that the impact on any
statistically significant benchmark is nil in practice, or it would
mean that the benchmark settings are too low.


Second, the mutex is only used when absolutely necessary, only for the
substitution part of the query (replacing :stuff with ?), because
scripts are shared between threads. This happens just once, in an unlikely
case occurring at the beginning.



However, perhaps there's more than one way to fix this.  Once we've
scanned all of the script and seen all the \set commands, we know
(in principle) the set of all variable names that are in use.
So maybe we could fix this by

(1) During the initial scan of the script, make variable-table
entries for every \set argument, with the values shown as undefined
for the moment.  Do not try to parse SQL commands in this scan,
just collect them.


The issue with this approach is

  SELECT 1 AS one \gset pref_

which will generate a "pref_one" variable, and these names cannot be
guessed without SQL parsing and possibly execution. That is why the
preparation is delayed until the variables are actually known.


(2) Make another scan in which we identify variable references
in the SQL commands and issue PREPAREs (if enabled).



(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing
of the benchmark proper.  I'm not sure offhand whether this
approach makes any difference for the concerns you had about
identifying/suppressing variable references inside quotes.


I do not think this plan is workable, because of the \gset issue.

I do not see that the conditional mutex and delayed PREPARE would have 
any significant (measurable) impact on an actual (reasonable) 
benchmark run.


A workable solution would be for each client to actually execute each
script once before starting the actual benchmark. It would still need
a mutex and also a sync barrier (which I'm proposing in some other
thread). However, this may raise some other issues, because then some
operations would be triggered outside of the benchmarking run, which may
or may not be desirable.


So I'm not too keen to go that way, and I think the proposed solution
is reasonable from a benchmarking point of view, as the impact is
minimal, although not zero.



CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. I see this discussion still has some open questions. Are you 
going to continue working on it, or should I mark it as "returned with 
feedback" until a better time?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Anastasia Lubennikova

On 30.09.2020 10:57, Greg Nancarrow wrote:

Thanks for your thoughts, patches and all the pointers.
I'll be looking at all of them.
(And yes, the comma instead of bitwise OR is of course an error,
somehow made and gone unnoticed; the next field in the struct is an
enum, so accepts any int value).

Regards,
Greg Nancarrow
Fujitsu Australia


CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. As far as I see, the patch needs some further work.
Are you going to continue working on it, or should I mark it as 
"returned with feedback" until a better time?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik
Based on Andres' review I have implemented the following changes in
libpq_compression:


1. Make it possible to specify a list of compression algorithms in the
connection string.

2. Make it possible to specify the compression level.
3. Use "_pq_.compression" instead of "compression" in the startup packet.
4. Use full names instead of one-character encodings for compression
algorithm names.


So now it is possible to open connection in this way:

    psql "dbname=postgres compression=zstd:5,zlib"


New version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index ace4ed5..deba608 100755
--- a/configure
+++ b/configure
@@ -700,6 +700,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -867,6 +868,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
 
 #
 # Zlib
diff --git a/configure.ac b/configure.ac
index 5b91c83..93a5285 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1000,6 +1000,13 @@ PGAC_ARG_BOOL(with, zlib, yes,
 AC_SUBST(with_zlib)
 
 #
+# Zstd
+#
+PGAC_ARG_BOOL(with, zstd, no,
+  [use zstd])
+AC_SUBST(with_zstd)
+
+#
 # Assignments
 #
 
@@ -1186,6 +1193,14 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_zstd" = yes; then
+  AC_CHECK_LIB(zstd, ZSTD_decompressStream, [],
+   [AC_MSG_ERROR([zstd library not found
+If you have zstd already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-zstd to disable zstd support.])])
+fi
+
 if test "$enable_spinlocks" = yes; then
   AC_DEFINE(HAVE_SPINLOCKS, 1, [Define to 1 if you have spinlocks.])
 else
@@ -1400,6 +1415,13 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_zstd" = yes; then
+  AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd header not found
+If you have zstd already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-zstd to disable zstd support.])])
+fi
+
 if test "$with_gssapi" = yes ; then
   AC_CHECK_HEADERS(gssapi/gssapi.h, [],
 	[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9ce32fb..140724d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1225,6 +1225,22 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. The client sends the server the list of compression algorithms supported by the client library.
+If the server supports one of these algorithms, it acknowledges the use of this algorithm, and all libpq messages sent both from client to server and
+vice versa will then be compressed. If the server does not support any of the suggested algorithms, it replies with an 'n' (no compression)
+message and it is up to the cl

Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Magnus Hagander
On Fri, Nov 20, 2020 at 4:46 PM Peter Eisentraut
 wrote:
>
> On 2020-11-09 13:05, Magnus Hagander wrote:
> > PFA a rebased version of this patch on top of what has happened since,
> > and changing the pg_upgrade parameter to be --no-scripts.
>
> It seems we are still finding out more nuances about pg_upgrade, but
> looking at initdb for a moment, I think the solution for wrapper scripts
> is to just run initdb with >/dev/null.  Or maybe if that looks a bit too
> hackish, a --quiet option that turns everything on stdout off.
>
> I think initdb has gotten a bit too chatty over time.  I think if it
> printed nothing on stdout by default and the current output would be
> some kind of verbose or debug mode, we wouldn't really lose much.  With
> that in mind, I'm a bit concerned about adding options (and thus
> documentation surface area etc.) to select exactly which slice of the
> chattiness to omit.

I agree that it's getting unnecessarily chatty, but things like the
locale that it has detected are, I think, very useful information to
output. Though I guess the same could be said for a few other things;
but does it *ever* pick anything other than 128MB/100, for example? :)

The main difference between them is that some information is
informational but unnecessary, but the "next steps instructions" are
*incorrect* in most cases when executed by a wrapper. I'd argue that
even if we show them only with --verbose, we should still have a way
of not outputing the information that's going to be incorrect for the
end user.

I think it boils down to the fact that today the output from initdb is
entirely geared towards people running initdb directly and starting their
server manually, and very few people outside the actual PostgreSQL
developers ever do that. But there are still a lot of people who run
initdb through their wrapper manually (for Red Hat you have to do that,
for Debian you only have to do it if you're creating a secondary
cluster, but that's still a pretty common operation).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar  wrote:
> About (4), one option is that we directly call the correct handler
> function for the built-in type directly from
> toast_(de)compress(_slice) functions but in that case, we are
> duplicating the code, another option is that we call the
> GetCompressionRoutine() a common function and in that, for the
> built-in type, we can directly call the corresponding handler function
> and get the routine.  The only thing is to avoid duplicating in
> decompression routine we need to convert CompressionId to Oid before
> calling GetCompressionRoutine(), but now we can avoid sys cache lookup
> for the built-in type.

Suppose that we have a variable lz4_methods (like heapam_methods) that
is always defined, whether or not lz4 support is present. It's defined
like this:

const CompressionAmRoutine lz4_compress_methods = {
.datum_compress = lz4_datum_compress,
.datum_decompress = lz4_datum_decompress,
.datum_decompress_slice = lz4_datum_decompress_slice
};

(It would be good, I think, to actually name things something like
this - in particular why would we have TableAmRoutine and
IndexAmRoutine but not include "Am" in the one for compression? In
general I think tableam is a good pattern to adhere to and we should
try to make this patch hew closely to it.)

Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
do their thing, or complain that lz4 compression is not supported.
Then in this function you can just say, well, if we have the 01 bit
pattern, handler = &lz4_compress_methods and proceed from there.
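For illustration, one of those per-datum functions might look roughly like
this (a sketch only, not the patch's actual code, and with the compression
logic simplified):

#include "postgres.h"
#ifdef HAVE_LIBLZ4
#include <lz4.h>
#endif

static struct varlena *
lz4_datum_compress(const struct varlena *value)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("not built with lz4 support")));
    return NULL;                /* keep compiler quiet */
#else
    int32       valsize = VARSIZE_ANY_EXHDR(value);
    int32       bound = LZ4_compressBound(valsize);
    struct varlena *result = (struct varlena *) palloc(bound + VARHDRSZ);
    int32       len;

    len = LZ4_compress_default(VARDATA_ANY(value), VARDATA(result),
                               valsize, bound);
    if (len <= 0)
        elog(ERROR, "lz4 compression failed");
    SET_VARSIZE(result, len + VARHDRSZ);
    return result;
#endif
}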

BTW, I think the "not supported" message should probably use the 'by
this build' language we use in some places i.e.

[rhaas pgsql]$ git grep errmsg.*'this build' | grep -vF .po:
contrib/pg_prewarm/pg_prewarm.c: errmsg("prefetch is not supported by this build")));
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/hba.c: errmsg("local connections are not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostssl record cannot match because SSL is not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostgssenc record cannot match because GSSAPI is not supported by this build"),
src/backend/libpq/hba.c: errmsg("invalid authentication method \"%s\": not supported by this build",
src/backend/utils/adt/pg_locale.c: errmsg("ICU is not supported in this build"), \
src/backend/utils/misc/guc.c: GUC_check_errmsg("Bonjour is not supported by this build");
src/backend/utils/misc/guc.c: GUC_check_errmsg("SSL is not supported by this build");

-- 
Robert Haas
EDB: http://www.enterprisedb.com




About adding a new field to a struct in primnodes.h

2020-11-24 Thread Andy Fan
Hi:

For example, suppose we add a new field to a node in primnodes.h:

struct FuncExpr
{

 +  int newf;
};

then we modify the copy/read/out functions for this node.  In
_readFuncExpr, we would probably add something like

static FuncExpr *
_readFuncExpr(..)
{
..
+	READ_INT_FIELD(newf);
};

Then we will get a compatibility issue if we create a view containing the
node with the older version and access the view with the new binary.  I
think we could bypass this issue easily with something like

READ_INT_FIELD_UNMUST(newf, defaultvalue);

However, I didn't see any code like this in our code base.  Does it not
work, or is it just not worth doing?
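For what it's worth, a purely hypothetical sketch of what such a macro could
look like; pg_strtok_save()/pg_strtok_restore() do not exist today and would
have to be added to src/backend/nodes/read.c, and token/length/local_node are
the locals that the existing READ_* macros already assume:

/* Hypothetical: read fldname if its label is present, else use a default. */
#define READ_INT_FIELD_UNMUST(fldname, defval) \
    do { \
        const char *save_ptr = pg_strtok_save();    /* hypothetical */ \
        token = pg_strtok(&length);                 /* label token */ \
        if (length == (int) strlen(":" CppAsString(fldname)) && \
            strncmp(token, ":" CppAsString(fldname), length) == 0) \
        { \
            token = pg_strtok(&length);             /* value token */ \
            local_node->fldname = atoi(token); \
        } \
        else \
        { \
            local_node->fldname = (defval); \
            pg_strtok_restore(save_ptr);    /* rewind for the next field */ \
        } \
    } while (0)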

-- 
Best Regards
Andy Fan


Re: Remove cache_plan argument comments to ri_PlanCheck

2020-11-24 Thread Amit Kapila
On Tue, Nov 24, 2020 at 4:46 PM Li Japin  wrote:
>
> Hi, hackers
>
> I found that the cache_plan argument to ri_PlanCheck has already been removed,
> as of commit 5b7ba75f7ff854003231e8099e3038c7e2eba875.  I think we can remove
> the comments about cache_plan in ri_PlanCheck.
>
> diff --git a/src/backend/utils/adt/ri_triggers.c 
> b/src/backend/utils/adt/ri_triggers.c
> index 7e2b2e3dd6..02b1a3868f 100644
> --- a/src/backend/utils/adt/ri_triggers.c
> +++ b/src/backend/utils/adt/ri_triggers.c
> @@ -2130,9 +2130,6 @@ InvalidateConstraintCacheCallBack(Datum arg, int 
> cacheid, uint32 hashvalue)
>
>  /*
>   * Prepare execution plan for a query to enforce an RI restriction
> - *
> - * If cache_plan is true, the plan is saved into our plan hashtable
> - * so that we don't need to plan it again.
>   */
>  static SPIPlanPtr
>  ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
>

Your patch looks good to me.

-- 
With Regards,
Amit Kapila.




Re: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2020-11-24 Thread Amit Kapila
On Tue, Nov 24, 2020 at 2:07 PM Junfeng Yang  wrote:
>
> Hi hackers,
>
> Can anyone help to verify this?
>

I think one way to get feedback is to register this patch for the next
commit fest (https://commitfest.postgresql.org/31/)

-- 
With Regards,
Amit Kapila.




Re: [patch] CLUSTER blocks scanned progress reporting

2020-11-24 Thread Fujii Masao




On 2020/11/21 2:32, Matthias van de Meent wrote:

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.


Good catch! I agree that this is a bug.



The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.


Thanks for the patch! It basically looks good to me.

It's a bit of a waste of cycles to calculate and update the number of scanned
blocks every cycle. So I'm inclined to change the code as follows.
Thoughts?

+   BlockNumber prev_cblock = InvalidBlockNumber;

+   if (prev_cblock != heapScan->rs_cblock)
+   {
+       pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+                                    (heapScan->rs_cblock +
+                                     heapScan->rs_nblocks -
+                                     heapScan->rs_startblock
+                                     ) % heapScan->rs_nblocks + 1);
+       prev_cblock = heapScan->rs_cblock;
+   }

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 09:22:26AM +0100, Thomas Kellerer wrote:
> 
> ROS Didier schrieb am 24.11.2020 um 09:09:
> > I would like to know if it is possible to migrate Oracle multitenant
> > database (with multiple PDB) to PostgreSQL ?
> Postgres' databases are very similar to Oracle's PDBs.
> 
> Probably the biggest difference is, that you can't shutdown
> a single database as you can do with a PDB.

I guess you could lock users out of a single database by changing
pg_hba.conf and doing reload.

> Database users in Postgres are like Oracle's "common users", they are
> global for the whole instance (aka "cluster" in Postgres' terms). There
> are no database specific users.

Good to know.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [PATCH] LWLock self-deadlock detection

2020-11-24 Thread Ashutosh Bapat
This looks useful. LWLockCheckSelfDeadlock() could use the LWLockHeldByMe
variant instead of copying that code, possibly with a change in that
function to return the required information.

I am also seeing the pattern
Assert(LWLockHeldByMe*())
LWLockAcquire()

in some places. Should we change LWLockAcquire to do
Assert(LWLockHeldByMe()) always to detect such occurrences? Enhance
that pattern to print the information that your patch prints?
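For example, something along these lines inside LWLockAcquire(), only under
assertions so the fast path is unaffected (the message text is just
illustrative, the patch reports more detail):

#ifdef USE_ASSERT_CHECKING
	/*
	 * Waiting for a lock we already hold can never succeed, so report the
	 * self-deadlock instead of hanging.  LWLockHeldByMe() walks the
	 * held_lwlocks[] array, so this stays out of non-assert builds.
	 */
	if (LWLockHeldByMe(lock))
		elog(PANIC, "lwlock self-deadlock detected: %s is already held by this backend",
			 T_NAME(lock));
#endif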

It looks weird that we can detect a self-deadlock but not handle it.
But handling it would require remembering how many times each LWLock is
held and releasing it that many times. That may not leave LWLocks "LW" anymore.

On Thu, Nov 19, 2020 at 4:02 PM Craig Ringer
 wrote:
>
> Hi all
>
> Here's a patch I wrote a while ago to detect and report when a 
> LWLockAcquire() results in a simple self-deadlock due to the caller already 
> holding the LWLock.
>
> To avoid affecting hot-path performance, it only fires the check on the first 
> iteration through the retry loops in LWLockAcquire() and LWLockWaitForVar(), 
> and just before we sleep, once the fast-path has been missed.
>
> I wrote an earlier version of this when I was chasing down some hairy issues 
> with background workers deadlocking on some exit paths because ereport(ERROR) 
> or elog(ERROR) calls fired when a LWLock was held would cause a 
> before_shmem_exit or on_shmem_exit cleanup function to deadlock when it tried 
> to acquire the same lock.
>
> But it's an easy enough mistake to make and a seriously annoying one to track 
> down, so I figured I'd post it for consideration. Maybe someone else will get 
> some use out of it even if nobody likes the idea of merging it.
>
> As written the check runs only for --enable-cassert builds or when LOCK_DEBUG 
> is defined.



-- 
Best Wishes,
Ashutosh Bapat




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 01:32:45PM +0100, Magnus Hagander wrote:
> I think it boils down to that today the output from initdb is entirely
> geared towards people running initdb directly and starting their
> server manually, and very few people outside the actual PostgreSQL
> developers ever do that. But there are still a lot of people who run
> initdb through their wrapper manually (for redhat you have to do that,
> for debian you only have to do it if you're creating a secondary
> cluster but that still a pretty common operation).

I think the big issue is that pg_upgrade not only outputs progress
messages, but also creates files in the current directory, while initdb, by
definition, creates files in PGDATA.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [doc] plan invalidation when statistics are updated

2020-11-24 Thread Fujii Masao




On 2020/11/19 14:33, torikoshia wrote:

On 2020-11-18 11:35, Fujii Masao wrote:

Thanks for your comment!


On 2020/11/18 11:04, torikoshia wrote:

Hi,

AFAIU, when the planner statistics are updated, generic plans are invalidated 
and PostgreSQL recreates them. However, the manual doesn't seem to explain this 
explicitly.

   https://www.postgresql.org/docs/devel/sql-prepare.html

I guess this case is included in 'whenever database objects used in the 
statement have undergone definitional (DDL) changes', but I feel it's hard to 
infer.

Since updates of the statistics can happen quite often, how about describing this 
case explicitly, as in the attached patch?


+1 to add that note.

-   statement.  Also, if the value of  changes
+   statement. For example, when the planner statistics of the statement
+   are updated, PostgreSQL re-analyzes and
+   re-plans the statement.

I don't think "For example," is necessary.

"planner statistics of the statement" sounds vague? Does the statement
is re-analyzed and re-planned only when the planner statistics of database
objects used in the statement are updated? If yes, we should describe
that to make the note a bit more explicitly?


Yes. As far as I have confirmed, updating statistics which are not used in
prepared statements doesn't trigger re-analyze and re-plan.

Since plan invalidations for DDL changes and statistical changes are caused
by PlanCacheRelCallback(Oid 'relid'), only the prepared statements using the
'relid' relation seem to be invalidated.

Attached is the updated patch.


Thanks for confirming that and updating the patch!
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: LogwrtResult contended spinlock

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Anastasia Lubennikova wrote:

> On 04.09.2020 20:13, Andres Freund wrote:

> > Re general routine: On second thought, it might actually be worth having
> > it. Even just for LSNs - there's plenty places where it's useful to
> > ensure a variable is at least a certain size.  I think I would be in
> > favor of a general helper function.
> Do you mean by general helper function something like this?
> 
> void
> swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)

Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
or some similar name that makes it clear that 

1. it is supposed to use atomics
2. it can only be used to *advance* a value rather than a generic swap.

(I'm not 100% clear that that's the exact API we need.)
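For the record, the kind of helper I'm imagining is roughly this, name as
suggested above; whether it belongs in atomics.h or in xlog-specific code is
part of what still needs deciding:

#include "access/xlogdefs.h"
#include "port/atomics.h"

/*
 * Advance *ptr to 'target', but never move it backwards.  Sketch only.
 */
static inline void
pg_atomic_increase_lsn(volatile pg_atomic_uint64 *ptr, XLogRecPtr target)
{
	uint64		currval = pg_atomic_read_u64(ptr);

	while (currval < target)
	{
		/* on failure, currval is refreshed with the value someone else set */
		if (pg_atomic_compare_exchange_u64(ptr, &currval, target))
			break;
	}
}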

> This CF entry was inactive for a while. Alvaro, are you going to continue
> working on it?

Yes, please move it forward.  I'll post an update sometime before the
next CF.




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Magnus Hagander
On Tue, Nov 24, 2020 at 3:12 PM Bruce Momjian  wrote:
>
> On Tue, Nov 24, 2020 at 01:32:45PM +0100, Magnus Hagander wrote:
> > I think it boils down to that today the output from initdb is entirely
> > geared towards people running initdb directly and starting their
> > server manually, and very few people outside the actual PostgreSQL
> > developers ever do that. But there are still a lot of people who run
> > initdb through their wrapper manually (for redhat you have to do that,
> > for debian you only have to do it if you're creating a secondary
> > cluster but that still a pretty common operation).
>
> I think the big issue is that pg_upgrade not only output progress
> messages, but created files in the current directory, while initdb, by
> definition, is creating files in PGDATA.

To be clear, my comments above were primarily about initdb, not
pg_upgrade, as that's what Peter was commenting on as well.

pg_upgrade is a somewhat different but also interesting case. I think
the actual progress output is more interesting in pg_upgrade as it's
more likely to take measurable amounts of time. Whereas in initdb,
it's actually the "detected parameter values" that are the most
interesting parts.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: walsender bug: stuck during shutdown

2020-11-24 Thread Alvaro Herrera
Hello,

On 2020-Nov-24, Fujii Masao wrote:

> Thanks for working on this!
> Could you tell me the discussion thread where Chloe Dives reported the issue?
> Sorry, I could not find it.

It was not public -- sorry I didn't make that clear.

> I'd like to see the procedure to reproduce the issue.

Here's the script.


Thanks!
import psycopg2

from psycopg2.extras import LogicalReplicationConnection, REPLICATION_LOGICAL


def _logical_replication_callback(message):
    ''' Deal with a single audit_json message; see _process_message. We get one message, therefore one
    call to this method, per committed transaction on the source database.
    '''
    print("Raw message: " + str(message))
    message.cursor.send_feedback(flush_lsn=message.data_start)


def main():
    slot_name = 'snitch_papersnap_testing'

    connection = psycopg2.connect(
        host='fab-devdb02',
        port=5432,
        dbname='postgres',
        user='postgres',
        connection_factory=LogicalReplicationConnection,
    )

    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM pg_replication_slots WHERE slot_name = %s", (slot_name,))
        slot_exists, = cursor.fetchone()

        if slot_exists:
            cursor.drop_replication_slot(slot_name)
            slot_exists = False

        if not slot_exists:
            cursor.create_replication_slot(slot_name, REPLICATION_LOGICAL, output_plugin='test_decoding')

        cursor.start_replication(slot_name, REPLICATION_LOGICAL, decode=True)
        print("Logical replication started")
        cursor.consume_stream(_logical_replication_callback)


if __name__ == '__main__':
    main()


Re: About adding a new field to a struct in primnodes.h

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Andy Fan wrote:

> then we modified the copy/read/out functions for this node.  In
> _readFuncExpr,
> we probably add something like

> [ ... ]

> Then we will get a compatible issue if we create a view with the node in
> the older version and access the view with the new binary.

When nodes are modified, you have to increment CATALOG_VERSION_NO which
makes the new code incompatible with a datadir previously created -- for
precisely this reason.
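For reference, that's a one-line change in src/include/catalog/catversion.h;
the value is conventionally yyyymmddN for the day of the commit (the number
below is only an example):

/* src/include/catalog/catversion.h */
#define CATALOG_VERSION_NO	202011241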




Re: bug in pageinspect's "tuple data" feature

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Michael Paquier wrote:

> On Mon, Nov 23, 2020 at 09:11:26AM +0200, Heikki Linnakangas wrote:
> > On 21/11/2020 21:32, Alvaro Herrera wrote:
> >> This is pretty unhelpful; it would be better not to try to print the
> >> data instead of dying.  With that, at least you can know where the
> >> problem is.
> >> 
> >> This was introduced in d6061f83a166 (2015).  Proposed patch to fix it
> >> (by having the code print a null "data" instead of dying) is attached.
> > 
> > Null seems misleading. Maybe something like "invalid", or print a warning?

Good idea, thanks.

> How did you get into this state to begin with?

The data was corrupted for whatever reason.  I don't care why or how, I
just need to fix it.  If the data isn't corrupted, then I don't use
pageinspect in the first place.

> get_raw_page() uses ReadBufferExtended() which gives some level of
> protection already, so shouldn't it be better to return an ERROR with
> ERRCODE_DATA_CORRUPTED and the block involved?

What would I gain from doing that?  It's even more unhelpful, because it
is intentional rather than accidental.




Re: Terminate the idle sessions

2020-11-24 Thread David G. Johnston
On Mon, Nov 23, 2020 at 11:22 PM Li Japin  wrote:

>
> How about using “foreign-data wrapper” to replace “postgres_fdw”?
>

I don't see much value in avoiding mentioning that specific term - my
proposal turned it into an example instead of being exclusive.


> - This parameter should be set to zero if you use some
> connection-pooling software,
> - or pg servers used by postgres_fdw, because connections might be
> closed unexpectedly.
> + This parameter should be set to zero if you use
> connection-pooling software,
> + or PostgreSQL servers connected to
> using foreign-data
> + wrapper, because connections might be closed unexpectedly.
>  
>

Maybe:

+ or your PostgreSQL server receives connections from postgres_fdw or
similar middleware.
+ Such software is expected to self-manage its connections.

David J.


Re: [patch] CLUSTER blocks scanned progress reporting

2020-11-24 Thread Matthias van de Meent
On Tue, 24 Nov 2020 at 15:05, Fujii Masao  wrote:
>
> On 2020/11/21 2:32, Matthias van de Meent wrote:
> > Hi,
> >
> > The pg_stat_progress_cluster view can report incorrect
> > heap_blks_scanned values when synchronize_seqscans is enabled, because
> > it allows the sequential heap scan to not start at block 0. This can
> > result in wraparounds in the heap_blks_scanned column when the table
> > scan wraps around, and starting the next phase with heap_blks_scanned
> > != heap_blks_total. This issue was introduced with the
> > pg_stat_progress_cluster view.
>
> Good catch! I agree that this is a bug.
>
> >
> > The attached patch fixes the issue by accounting for a non-0
> > heapScan->rs_startblock and calculating the correct number with a
> > non-0 heapScan->rs_startblock in mind.
>
> Thanks for the patch! It basically looks good to me.

Thanks for the feedback!

> It's a bit of a waste of cycles to calculate and update the number of scanned
> blocks every cycle. So I'm inclined to change the code as follows.
> Thoughts?
>
> +   BlockNumber prev_cblock = InvalidBlockNumber;
> 
> +   if (prev_cblock != heapScan->rs_cblock)
> +   {
> +       pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
> +                                    (heapScan->rs_cblock +
> +                                     heapScan->rs_nblocks -
> +                                     heapScan->rs_startblock
> +                                     ) % heapScan->rs_nblocks + 1);
> +       prev_cblock = heapScan->rs_cblock;
> +   }

That seems quite reasonable.

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Please find attached a patch applying the suggested changes.

Matthias van de Meent
From b3327cace3bebdb15006834e21672fc30cb2f0bb Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 20 Nov 2020 16:23:59 +0100
Subject: [PATCH v2] Fix CLUSTER progress reporting of number of blocks
 scanned.

The heapScan need not start at block 0, so heapScan->rs_cblock need not be the
correct value for the number of blocks scanned. A more correct value is
 ((heapScan->rs_cblock - heapScan->rs_startblock + heapScan->rs_nblocks) %
   heapScan->rs_nblocks), as it accounts for the wraparound and the initial
offset of the heapScan.

Additionally, a heap scan need not return tuples from the last scanned page.
This means that when table_scan_getnextslot returns false, we must manually
update the heap_blks_scanned parameter to the number of blocks in the heap
scan.
---
 src/backend/access/heap/heapam_handler.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..f20d4bed07 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -698,6 +698,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	Datum	   *values;
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
+	BlockNumber prev_cblock = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -793,14 +794,37 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		else
 		{
 			if (!table_scan_getnextslot(tableScan, ForwardScanDirection, slot))
+			{
+/*
+ * A heap scan need not return tuples for the last page it has
+ * scanned. To ensure that heap_blks_scanned is equivalent to
+ * total_heap_blks after the table scan phase, this parameter
+ * is manually updated to the correct value when the table scan
+ * finishes.
+ */
+pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+			 heapScan->rs_nblocks);
 break;
+			}
 
 			/*
 			 * In scan-and-sort mode and also VACUUM FULL, set heap blocks
 			 * scanned
+			 *
+			 * Note that heapScan may start at an offset and wrap around, i.e.
+			 * rs_startblock may be >0, and rs_cblock may end with a number
+			 * below rs_startblock. To prevent showing this wraparound to the
+			 * user, we offset rs_cblock by rs_startblock (modulo rs_nblocks).
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
-		 heapScan->rs_cblo

Re: abstract Unix-domain sockets

2020-11-24 Thread David G. Johnston
On Mon, Nov 23, 2020 at 9:00 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Or is it the case that we always attempt to bind the TCP/IP port,
> regardless of the presence of a socket file, in which case the failure for
> port binding does cover the socket situation as well?
>

This cannot always be the case since the listened-to IP address matters.

I think the socket file error message hint is appropriate.  I'd consider it
a bug if that code is effectively unreachable (the fact that the hint
exists supports this conclusion).  If we add "abstract unix sockets" where
we likewise prevent two servers from listening on the same channel, the
absence of such a check for the socket file is even more unexpected.  At a
minimum we should declare whether we will even try, and whether such a
socket file check is best-effort or generally reliable.

David J.


Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 04:05:26PM +0100, Magnus Hagander wrote:
> pg_upgrade is a somewhat different but also interesting case. I think
> the actual progress output is more interesting in pg_upgrade as it's
> more likely to take measurable amounts of time. Whereas in initdb,
> it's actually the "detected parameter values" that are the most
> interesting parts.

Originally, initdb did take some time for each step.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-11-24 Thread Justin Pryzby
On Sat, Oct 31, 2020 at 01:36:11PM -0500, Justin Pryzby wrote:
> > > From the grammar perspective ANY option is available for any command
> > > that uses parenthesized option list. All the checks and validations
> > > are performed at the corresponding command code.
> > > This analyze_keyword is actually doing only an ANALYZE word
> > > normalization if it's used as an option. Why it could be harmful?
> > 
> > Michael has not replied since then, but he was relatively positive about
> > 0005 initially, so I put it as a first patch now.
> 
> Thanks.  I rebased Alexey's latest patch on top of recent changes to 
> cluster.c.
> This puts the generic grammar changes first.  I wasn't paying much attention 
> to
> that part, so still waiting for a committer review.

@cfbot: rebased
>From 4a8e71ac704bd4c58e54703298b5234946c666a3 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 2 Sep 2020 23:05:16 +0300
Subject: [PATCH v30 1/6] Refactor gram.y in order to add a common
 parenthesized option list

Previously there were two identical option lists
(explain_option_list and vac_analyze_option_list) + very similar
reindex_option_list.  It does not seem to make
sense to maintain identical option lists in the grammar, since
all new options are added and parsed in the backend code.

This way, a new common_option_list is added in order to replace
explain_option_list, vac_analyze_option_list and probably
also reindex_option_list.
---
 src/backend/parser/gram.y | 61 +--
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index efc9c99754..5c86063459 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -315,10 +315,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 create_extension_opt_item alter_extension_opt_item
 
 %type 	opt_lock lock_type cast_context
-%type 		vac_analyze_option_name
-%type 	vac_analyze_option_elem
-%type 	vac_analyze_option_list
-%type 	vac_analyze_option_arg
+%type 		common_option_name
+%type 	common_option_elem
+%type 	common_option_list
+%type 	common_option_arg
 %type 	drop_option
 %type 	opt_or_replace opt_no
 opt_grant_grant_option opt_grant_admin_option
@@ -513,10 +513,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	generic_option_arg
 %type 	generic_option_elem alter_generic_option_elem
 %type 	generic_option_list alter_generic_option_list
-%type 		explain_option_name
-%type 	explain_option_arg
-%type 	explain_option_elem
-%type 	explain_option_list
 
 %type 	reindex_target_type reindex_target_multitable
 %type 	reindex_option_list reindex_option_elem
@@ -10483,7 +10479,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_analyze opt_vacuum_relati
 	n->is_vacuumcmd = true;
 	$$ = (Node *)n;
 }
-			| VACUUM '(' vac_analyze_option_list ')' opt_vacuum_relation_list
+			| VACUUM '(' common_option_list ')' opt_vacuum_relation_list
 {
 	VacuumStmt *n = makeNode(VacuumStmt);
 	n->options = $3;
@@ -10504,7 +10500,7 @@ AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 	n->is_vacuumcmd = false;
 	$$ = (Node *)n;
 }
-			| analyze_keyword '(' vac_analyze_option_list ')' opt_vacuum_relation_list
+			| analyze_keyword '(' common_option_list ')' opt_vacuum_relation_list
 {
 	VacuumStmt *n = makeNode(VacuumStmt);
 	n->options = $3;
@@ -10514,12 +10510,12 @@ AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 }
 		;
 
-vac_analyze_option_list:
-			vac_analyze_option_elem
+common_option_list:
+			common_option_elem
 {
 	$$ = list_make1($1);
 }
-			| vac_analyze_option_list ',' vac_analyze_option_elem
+			| common_option_list ',' common_option_elem
 {
 	$$ = lappend($1, $3);
 }
@@ -10530,19 +10526,19 @@ analyze_keyword:
 			| ANALYSE /* British */
 		;
 
-vac_analyze_option_elem:
-			vac_analyze_option_name vac_analyze_option_arg
+common_option_elem:
+			common_option_name common_option_arg
 {
 	$$ = makeDefElem($1, $2, @1);
 }
 		;
 
-vac_analyze_option_name:
+common_option_name:
 			NonReservedWord			{ $$ = $1; }
 			| analyze_keyword		{ $$ = "analyze"; }
 		;
 
-vac_analyze_option_arg:
+common_option_arg:
 			opt_boolean_or_string	{ $$ = (Node *) makeString($1); }
 			| NumericOnly			{ $$ = (Node *) $1; }
 			| /* EMPTY */			{ $$ = NULL; }
@@ -10624,7 +10620,7 @@ ExplainStmt:
 	n->options = list_make1(makeDefElem("verbose", NULL, @2));
 	$$ = (Node *) n;
 }
-		| EXPLAIN '(' explain_option_list ')' ExplainableStmt
+		| EXPLAIN '(' common_option_list ')' ExplainableStmt
 {
 	ExplainStmt *n = makeNode(ExplainStmt);
 	n->query = $5;
@@ -10645,35 +10641,6 @@ ExplainableStmt:
 			| ExecuteStmt	/* by default all are $$=$1 */
 		;
 
-explain_option_list:
-			explain_option_elem
-{
-	$$ = list_make1($1);
-}
-			| explai

Re: abstract Unix-domain sockets

2020-11-24 Thread Peter Eisentraut

On 2020-11-23 17:00, David G. Johnston wrote:
So presently there is no functioning code to prevent two PostgreSQL 
instances from using the same socket so long as they do not also use the 
same data directory?  We only handle the case of an unclean crash - 
where the pid and socket are both left behind - having the system tell 
the user to remove the pid lock file but then auto-replacing the socket 
(I was conflating the behavior with the pid lock file and the socket file).


I would expect that we handle port misconfiguration also, by not 
auto-replacing the socket and instead have the existing error message 
(with modified hint) remain behind.  This provides behavior consistent 
with TCP port binding.  Or is it the case that we always attempt to bind 
the TCP/IP port, regardless of the presence of a socket file, in which 
case the failure for port binding does cover the socket situation as 
well?  If this is the case, pointing that out in [1] and a code comment, 
while removing that particular error as "dead code", would work.


We're subject to whatever the kernel behavior is.  If the kernel doesn't 
report address conflicts for Unix-domain sockets, then we can't do 
anything about that.  Having an error message ready in case the kernel 
does report such an error is not useful if it never does.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Dilip Kumar
On Tue, Nov 24, 2020 at 7:14 PM Robert Haas  wrote:
>
> On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar  wrote:
> > About (4), one option is that we directly call the correct handler
> > function for the built-in type directly from
> > toast_(de)compress(_slice) functions but in that case, we are
> > duplicating the code, another option is that we call the
> > GetCompressionRoutine() a common function and in that, for the
> > built-in type, we can directly call the corresponding handler function
> > and get the routine.  The only thing is to avoid duplicating in
> > decompression routine we need to convert CompressionId to Oid before
> > calling GetCompressionRoutine(), but now we can avoid sys cache lookup
> > for the built-in type.
>
> Suppose that we have a variable lz4_methods (like heapam_methods) that
> is always defined, whether or not lz4 support is present. It's defined
> like this:
>
> const CompressionAmRoutine lz4_compress_methods = {
> .datum_compress = lz4_datum_compress,
> .datum_decompress = lz4_datum_decompress,
> .datum_decompress_slice = lz4_datum_decompress_slice
> };

Yeah, this makes sense.

>
> (It would be good, I think, to actually name things something like
> this - in particular why would we have TableAmRoutine and
> IndexAmRoutine but not include "Am" in the one for compression? In
> general I think tableam is a good pattern to adhere to and we should
> try to make this patch hew closely to it.)

For the compression routine name, I did not include "Am" because
currently we are storing the compression method in the new catalog
"pg_compression", not in pg_am.  So are you suggesting that we
should store the compression methods in pg_am instead of
creating a new catalog?  IMHO, storing the compression methods in a
new catalog is a better option than storing them in pg_am
because the compression methods are not the same as heap or
index AMs; I mean, they are not really access methods.  Am I
missing something?

> Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
> do their thing, or complain that lz4 compression is not supported.
> Then in this function you can just say, well, if we have the 01 bit
> pattern, handler = &lz4_compress_methods and proceed from there.

Okay

> BTW, I think the "not supported" message should probably use the 'by
> this build' language we use in some places i.e.
>

Okay

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Peter Eisentraut

On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote:

The Clang documentation¹ suggests an even neater solution, which would
eliminate the repetitive empty pg_attribute_foo #defines in the trailing
#else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d:

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif


Yes, this was also mentioned and agreed earlier in the thread, but then 
we apparently forgot to update the patch.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: abstract Unix-domain sockets

2020-11-24 Thread David G. Johnston
On Tue, Nov 24, 2020 at 8:45 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> We're subject to whatever the kernel behavior is.  If the kernel doesn't
> report address conflicts for Unix-domain sockets, then we can't do
> anything about that.  Having an error message ready in case the kernel
> does report such an error is not useful if it never does.
>

It's a file, we can check for its existence in user-space.
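In a sketch, such a user-space check could look like the following; this is
not what the server currently does, and it also has to try connect(), since
a stale file left behind by a crashed server proves nothing by itself:

#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns true if 'path' exists and a live server accepts connections on it. */
static bool
unix_socket_in_use(const char *path)
{
	struct stat st;
	struct sockaddr_un addr;
	int			fd;
	bool		in_use = false;

	if (stat(path, &st) != 0)
		return false;			/* no socket file at all */

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return true;			/* can't tell; be conservative */

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

	if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
		in_use = true;			/* someone is actually listening */

	close(fd);
	return in_use;
}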

David J.


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> Pushed.

walleye's been failing since this patchset went in:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include  
-I./src/include/port/win32 -I/c/msys/local/include  -I/c/Python35/include 
-I/c/OpenSSL-Win64/include -I/c/msys/local/include 
"-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 -DBUILDING_DLL 
 -c -o autovacuum.o autovacuum.c
C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages:
C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: 
.seh_savexmm offset is negative
make[3]: *** [autovacuum.o] Error 1

I have no idea what to make of that, but it looks more like a compiler bug
than anything else.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 7:33 AM Konstantin Knizhnik
 wrote:
> New version of the patch is attached.

I read over the comments from Andres (and Peter) suggesting that this
ought to be on-the-fly configurable. Here are some thoughts on making
that work with the wire protocol:

If the client potentially wants to use compression at some point it
should include _pq_.compression in the startup message. The value
associated with _pq_.compression should be a comma-separated list of
compression methods which the client understands. If the server
responds with a NegotiateProtocolVersion message, then it either
includes _pq_.compression (in which case the server does not support
compression) or it does not (in which case the server does support
compression). If no NegotiateProtocolVersion message is returned, then
the server is from before November 2017
(ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed) and compression is not
supported.

If the client requests compression and the server supports it, it
should return a new SupportedCompressionTypes message following
NegotiateProtocolMessage response. That should be a list of
compression methods which the server understands. At this point, the
client and the server each know what methods the other understands.
Each should now feel free to select a compression method the other
side understands, and to switch methods whenever desired, as long as
they only select from methods the other side has said that they
understand. The patch seems to think that the compression method has
to be the same in both directions and that it can never change, but
there's no real reason for that. Let each side start out uncompressed
and then let it issue a new SetCompressionMethod protocol message to
switch the compression method whenever it wants. After sending that
message it begins using the new compression type. The other side
doesn't have to agree. That way, you don't have to worry about
synchronizing the two directions. Each side is just telling the other
what it is choosing to do, from among the options the other side said it
could understand.
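The selection logic itself can be trivial, e.g. take the first
client-preferred method that also appears in the server's list. A sketch,
written with backend list helpers for brevity (the libpq side would use
plain string handling):

/* Choose the first mutually understood compression method, or NULL. */
static char *
choose_compression_method(const char *client_list, const char *server_list)
{
	List	   *client_algs = NIL;
	List	   *server_algs = NIL;
	ListCell   *lc;

	if (!SplitIdentifierString(pstrdup(client_list), ',', &client_algs) ||
		!SplitIdentifierString(pstrdup(server_list), ',', &server_algs))
		return NULL;			/* malformed list */

	foreach(lc, client_algs)
	{
		char	   *alg = (char *) lfirst(lc);
		ListCell   *lc2;

		foreach(lc2, server_algs)
		{
			if (strcmp(alg, (char *) lfirst(lc2)) == 0)
				return pstrdup(alg);	/* first match wins */
		}
	}

	return NULL;				/* no overlap: stay uncompressed */
}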

It's an interesting question whether it's best to "wrap" the
compressed messages in some way. For example, imagine that instead of
just compressing the bytes and sending them out, you send a message of
some new type CompressedMessage whose payload is another protocol
message. Then a piece of middleware could decide to decompress each
message just enough to see whether it wants to do anything with the
message and if not just forward it. It could also choose to inject its
own messages into the message stream which wouldn't necessarily need
to be compressed, because the wrapper allows mixing of compressed and
uncompressed messages. The big disadvantage of this approach is that
in many cases it will be advantageous to compress consecutive protocol
messages as a unit. For example, when the extended query protocol is in
use, the client will commonly send P-B-D-E-S maybe even in one network
packet. It will compress better if all of those messages are
compressed as one rather than separately. That consideration argues
for the approach the patch actually takes (though the documentation
isn't really very clear about what the patch is actually doing here so
I might be misunderstanding). However, note that when the client or
server does a "flush" this requires "flushing" at the compression
layer also, so that all pending data can be sent. zlib can certainly
do that; I assume other algorithms can too, but I don't really know.
If there are algorithms that don't have that built in, this approach
might be an issue.
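For what it's worth, zlib can do exactly that: calling deflate() with
Z_SYNC_FLUSH forces out everything consumed so far in a form the receiver
can decode, without ending the stream. A minimal sketch, error handling
mostly elided (the patch's actual buffering layer is more involved):

#include <sys/types.h>
#include <zlib.h>

/*
 * Compress 'len' bytes from 'src' into 'dst' and force all of it out, as a
 * protocol-level flush requires.  'zs' must already be set up with
 * deflateInit().  Returns the number of compressed bytes, or -1 on error.
 */
static ssize_t
zpq_compress_and_flush(z_stream *zs, const char *src, size_t len,
					   char *dst, size_t dstlen)
{
	zs->next_in = (Bytef *) src;
	zs->avail_in = len;
	zs->next_out = (Bytef *) dst;
	zs->avail_out = dstlen;

	/* Z_SYNC_FLUSH: emit everything consumed so far, keep the stream open */
	if (deflate(zs, Z_SYNC_FLUSH) == Z_STREAM_ERROR)
		return -1;

	return dstlen - zs->avail_out;
}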

Another thing to think about is whether it's really beneficial to
compress stuff like P-B-D-E-S. I would guess that the benefits here
come mostly from compressing DataRow and CopyData messages, and that
compressing messages that don't contain much payload data may just be
a waste of CPU cycles. On the other hand, it's not impossible for it
to win: the query might be long, and query text is probably highly
compressible. Designing something that is specific to certain message
types is probably a bridge too far, but at least I think this is a
strong argument that the compression method shouldn't have to be the
same in both directions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Justin Pryzby
On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> 0004 - Seems fine. IMHO not really "silly errors" but OK.

This is one of the same issues you pointed out - shadowing a variable.
Could be backpatched.

On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> > +errmsg("statistics expressions and 
> > predicates can refer only to the table being indexed")));
> > +* partial-index predicates.  Create it in the per-index context to 
> > be
> > 
> > I think these are copied and shouldn't mention "indexes" or "predicates".  
> > Or
> > should statistics support predicates, too ?
> > 
> 
> Right. Stupid copy-pasto.

Right, but then I was wondering if CREATE STATS should actually support
predicates, since one use case is to do what indexes do without their overhead.
I haven't thought about it enough yet.

> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
> keeping it more like PG13 (good for backpatching). Not sure why rename
> extended statistics to multi-variate statistics - we use "extended"
> everywhere.

-   if (build_expressions && (list_length(stxexprs) == 0))
+   if (!build_expressions_only && (list_length(stmt->exprs) < 2))
ereport(ERROR,  
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-errmsg("extended expression statistics require 
at least one expression")));
+errmsg("multi-variate statistics require at 
least two columns")));

I think all of "CREATE STATISTICS" has been known as "extended stats", so I
think it may be confusing to say that it requires two columns for the general
facility.

> Not sure what's the point of serialize_expr_stats changes,
> that's code is mostly copy-paste from update_attstats.

Right.  I think "i" is poor variable name when it isn't a loop variable and not
of limited scope.

> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
> IMHO we should move this to a separate view.

Right - then unnest() the whole thing and return one row per expression rather
than array, as you've done.  Maybe the docs should say that this returns one
row per expression.

Looking quickly at your new patch: I guess you know there's a bunch of
lingering references to "indexes" and "predicates":

I don't know if you want to go to the effort to prohibit this.
postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
CREATE STATISTICS

I think a lot of people will find this confusing:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
ERROR:  extended expression statistics require at least one expression
postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
CREATE STATISTICS

I haven't looked, but is it possible to make it work without parens ?

-- 
Justin




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-11-24 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I took a quick look through this.  This is just MHO, of course:
> >> 
> >> * I don't think it's okay to change the existing signatures of
> >> pg_ls_logdir() et al.
> 
> > I disagree that we need to stress over this- we pretty routinely change
> > the signature of various catalogs and functions and anyone using these
> > is already of the understanding that we are free to make such changes
> > between major versions.
> 
> Well, like I said, just MHO.  Anybody else want to weigh in?
> 
> I'm mostly concerned about removing the isdir output of pg_stat_file().
> Maybe we could compromise to the extent of keeping that, allowing it
> to be partially duplicative of a file-type-code output column.

I don't have any particular issue with keeping isdir as a convenience
column.  I agree it'll now be a bit duplicative but that seems alright.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: POC: postgres_fdw insert batching

2020-11-24 Thread Tomas Vondra



On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> 1) We're calling it "batch_size" but the API function is named
>> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
>> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
>> ever add support for batching to UPDATE/DELETE.
> 
> Actually, I was in two minds whether the term batch or bulk is better.  
> Because Oracle uses "bulk insert" and "bulk fetch", like in FETCH cur BULK 
> COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch as 
> in "batch updates" and its API method names (addBatch, executeBatch).
> 
> But it seems better or common to use batch according to the etymology and the 
> following Stack Overflow page:
> 
> https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch
> 
> OTOH, as for the name GetModifyBatchSize() you suggest, I think 
> GetInsertBatchSize may be better.  That is, this API deals with multiple 
> records in a single INSERT statement.  Your GetModifyBatchSize will be 
> reserved for statement batching when libpq has supported batch/pipelining to 
> execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC 
> batch updates.  What do you think?
> 

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.
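To be concrete, the shape I'm thinking of is roughly the following; the
callback name is exactly what's being debated here, and the batch_size
member is assumed to be whatever the patch resolves from the server/table
options at plan time:

/* FdwRoutine member (sketch, not a settled API) */
typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);

/* postgres_fdw: return the value resolved earlier, don't recompute it */
static int
postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
{
	PgFdwModifyState *fmstate =
		(PgFdwModifyState *) resultRelInfo->ri_FdwState;

	return fmstate ? fmstate->batch_size : 1;
}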

> CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
> --
> Statement stmt = con.createStatement(); 
> stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
> stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
> stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 
> 
> // submit a batch of update commands for execution 
> int[] updateCounts = stmt.executeBatch(); 
> --
> 

Sure. We already have a patch to support something like this at the
libpq level, IIRC. But I'm not sure how well that matches the FDW API
approach in general.

> 
>> 2) Do we have to lookup the batch_size in create_foreign_modify (in
>> server/table options)? I'd have expected to look it up while planning
>> the modify and then pass it through the list, just like the other
>> FdwModifyPrivateIndex stuff. But maybe that's not possible.
> 
> Don't worry, create_foreign_modify() is called from PlanForeignModify() 
> during planning.  Unfortunately, it's also called from BeginForeignInsert(), 
> but other stuff passed to create_foreign_modify() including the query string 
> is constructed there.
> 

Hmm, ok.

> 
>> 3) That reminds me - should we show the batching info on EXPLAIN? That
>> seems like a fairly interesting thing to show to the user. Perhaps
>> showing the average batch size would also be useful? Or maybe not, we
>> create the batches as large as possible, with the last one smaller.
> 
> Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
> dynamically based on the planning or system state unlike shared buffers and 
> parallel workers.  OTOH, I sometimes want to see what configuration parameter 
> values the user set, such as work_mem, enable_*, and shared_buffers, together 
> with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
> EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
> could be included in that output.
> 

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

>> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
>> over for every tuple. I don't know it that has measurable impact, but it
>> seems a bit excessive IMO. I don't think we should support the batch
>> size changing during execution (seems tricky).
> 
> Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value 
> that was already saved in a struct in create_foreign_modify().
> 

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's 

Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar  wrote:
> For the compression routine name, I did not include "Am" because
> currently, we are storing the compression method in the new catalog
> "pg_compression" not in the pg_am.   So are you suggesting that we
> should store the compression methods also in the pg_am instead of
> creating a new catalog?  IMHO, storing the compression methods in a
> new catalog is a better option instead of storing them in pg_am
> because actually, the compression methods are not the same as heap or
> index AMs, I mean they are actually not the access methods.  Am I
> missing something?

Oh, I thought it had been suggested in previous discussions that these
should be treated as access methods rather than inventing a whole new
concept just for this, and it seemed like a good idea to me. I guess I
missed the fact that the patch wasn't doing it that way. Hmm.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Tomas Vondra



On 11/24/20 5:23 PM, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>> 0004 - Seems fine. IMHO not really "silly errors" but OK.
> 
> This is one of the same issues you pointed out - shadowing a variable.
> Could be backpatched.
> 
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>>> +errmsg("statistics expressions and 
>>> predicates can refer only to the table being indexed")));
>>> +* partial-index predicates.  Create it in the per-index context to 
>>> be
>>>
>>> I think these are copied and shouldn't mention "indexes" or "predicates".  
>>> Or
>>> should statistics support predicates, too ?
>>>
>>
>> Right. Stupid copy-pasto.
> 
> Right, but then I was wondering if CREATE STATS should actually support
> predicates, since one use case is to do what indexes do without their 
> overhead.
> I haven't thought about it enough yet.
> 

Well, it's not supported now, so the message is bogus. I'm not against
supporting "partial statistics" with predicates in the future, but it's
going to be non-trivial project on it's own. It's not something I can
bolt onto the current patch easily.

>> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
>> keeping it more like PG13 (good for backpatching). Not sure why rename
>> extended statistics to multi-variate statistics - we use "extended"
>> everywhere.
> 
> -   if (build_expressions && (list_length(stxexprs) == 0))
> +   if (!build_expressions_only && (list_length(stmt->exprs) < 2))
> ereport(ERROR,  
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> -errmsg("extended expression statistics 
> require at least one expression")));
> +errmsg("multi-variate statistics require at 
> least two columns")));
> 
> I think all of "CREATE STATISTICS" has been known as "extended stats", so I
> think it may be confusing to say that it requires two columns for the general
> facility.
> 
>> Not sure what's the point of serialize_expr_stats changes,
>> that's code is mostly copy-paste from update_attstats.
> 
> Right.  I think "i" is poor variable name when it isn't a loop variable and 
> not
> of limited scope.
> 

OK, I understand. I'll consider tweaking that.

>> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
>> IMHO we should move this to a separate view.
> 
> Right - then unnest() the whole thing and return one row per expression rather
> than array, as you've done.  Maybe the docs should say that this returns one
> row per expression.
> 
> Looking quickly at your new patch: I guess you know there's a bunch of
> lingering references to "indexes" and "predicates":
> 
> I don't know if you want to go to the effort to prohibit this.
> postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
> CREATE STATISTICS
> 

Hmm, we're already rejecting system attributes, I suppose we should do
the same thing for expressions on system attributes.

> I think a lot of people will find this confusing:
> 
> postgres=# CREATE STATISTICS asf ON i FROM t;
> ERROR:  extended statistics require at least 2 columns
> postgres=# CREATE STATISTICS asf ON (i) FROM t;
> CREATE STATISTICS
> 
> postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
> ERROR:  extended expression statistics require at least one expression
> postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
> CREATE STATISTICS
> 
> I haven't looked, but is it possible to make it work without parens ?
> 

Hmm, you're right that may be surprising. I suppose we could walk the
expressions while creating the statistics, and replace such trivial
expressions with the nested variable, but I haven't tried. I wonder what
the CREATE INDEX behavior would be in these cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Strange behavior with polygon and NaN

2020-11-24 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane  wrote in 
>> I don't much like anything about float8_coef_mul().

> I have the same feeling about the function, but I concluded that
> coefficients and coordinates should be regarded as different things from
> a practical standpoint.

> For example, consider Ax + By + C == 0: if B is 0.0, we can remove the
> second term from the equation regardless of the value of y, of course
> even if it were inf.  That is, the function imitates that kind of
> removal.

Meh --- I can see where you're going with that, but I don't much like it.
I fear that it's as likely to introduce weird behaviors as remove any.

The core of the issue in

> | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
> |  ?column? 
> | --
> |   NaN

is that we generate the line y = Inf:

(gdb) p tmp
$1 = {A = 0, B = -1, C = inf}

and then try to find the intersection with {1,0,5} (x = -5), but that
calculation involves 0 * Inf so we get NaNs.  It seems reasonable that
the intersection should be (-5,Inf), but I don't think we should try
to force the normal calculation to produce that.  I think we'd be
better off to explicitly special-case vertical and/or horizontal lines
in line_interpt_line.
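Concretely, the special case can bypass the generic formula entirely: if one
input line is vertical, its x coordinate is already known and only y has to
be computed from the other line. A sketch of the shape (parallel-line
handling omitted):

	/* inside line_interpt_line(), before the generic computation */
	float8		x,
				y;

	if (FPzero(l1->B))			/* l1 is vertical: x = -C/A is fixed */
	{
		x = -l1->C / l1->A;
		y = -(l2->A * x + l2->C) / l2->B;
	}
	else if (FPzero(l2->B))		/* mirror case */
	{
		x = -l2->C / l2->A;
		y = -(l1->A * x + l1->C) / l1->B;
	}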

Actually though ... even if we successfully got that intersection
point, we'd still end up with a NaN distance between (1e300,Inf) and
(-5,Inf), on account of Inf - Inf being NaN.  I think this is correct
and we'd be ill-advised to try to force it to be something else.
Although we pretend that two Infs are equal for purposes such as
sorting, they aren't really, so we should not assume that their
difference is zero.

So that line of thought prompts me to tread *very* carefully when
trying to dodge NaN results.  We need to be certain that we
introduce only logically-defensible special cases.  Something like
float8_coef_mul() seems much more likely to lead us into errors
than away from them.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Daniil Zakhlystov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, failed
Documentation:tested, failed

Submission review
--
Is the patch in a patch format which has context? (eg: context diff format)
NO, need to fix
Does it apply cleanly to the current git master?
YES
Does it include reasonable tests, necessary doc patches, etc?
have docs, missing tests

Usability review
--
At the moment, the patch supports per-connection (permanent) compression. The 
frontend can specify the desired compression algorithms and compression levels 
and then negotiate the compression algorithm that is going to be used with the 
backend. In its current state the patch is missing the ability to enable/disable 
compression on the backend side, which I think is not great from the 
usability standpoint.

Regarding on-the-fly configurable compression and different compression 
algorithms for each direction - these two ideas are promising but tend to make 
the implementation more complex. However, the current implementation can be 
extended to support these approaches in the future. For example, we can specify 
switchable on-the-fly compression as ‘switchable’ algorithm and negotiate it 
like the regular compression algorithm (like we currently negotiate ‘zstd’ and 
‘zlib’). ‘switchable’ algorithm may then introduce new specific messages to 
Postgres protocol to make the on-the-fly compression magic work.

The same applies to Robert’s idea of the different compression algorithms for 
different directions - we can introduce it later as a new compression algorithm 
with new specific protocol messages.

Does the patch actually implement that?
YES

Do we want that?
YES

Do we already have it?
NO

Does it follow SQL spec, or the community-agreed behavior?
To be discussed

Does it include pg_dump support (if applicable)?
not applicable

Are there dangers?
theoretically possible CRIME-like attack when using with SSL enabled

Have all the bases been covered?
To be discussed


Feature test
--

I’ve applied the patch, compiled, and tested it with configure options 
--enable-cassert and --enable-debug turned on. I’ve tested the following 
scenarios:

1. make check
===
 All 201 tests passed. 
===

2. make check-world
initially failed with:
== running regression test queries==
test postgres_fdw ... FAILED 4465 ms
== shutting down postmaster   ==
==
1 of 1 tests failed. 
==
The differences that caused some tests to fail can be viewed in the
file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.diffs".  A 
copy of the test summary that you see above is saved in the file 
"/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.out".

All tests passed after replacing ‘gsslib, target_session_attrs’ with ‘gsslib, 
compression, target_session_attrs’ in line 8914 of 
postgresql/contrib/postgres_fdw/expected/postgres_fdw.out

3. simple psql utility usage
psql -d "host=xxx port=5432 dbname=xxx user=xxx compression=1" 

4. pgbench tpcb-like w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

5. pgbench tpcb-like w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

6. pgbench initialization w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
-i -s 500

7. pgbench initialization w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
-i -s 500

8. Streaming physical replication. Recovery-related parameters
recovery_target_timeline = 'latest'
primary_conninfo = 'host=xxx port=5432 user=repl application_name=xxx 
compression=1'
primary_slot_name = 'xxx'
restore_command = 'some command'

9. This compression has been implemented in an experimental build of odyssey 
connection pooler and tested with ~1500 synthetic simultaneous clients 
configuration and ~300 GB databases.
During the testing, I’ve reported and fixed some of the issues.

Does the feature work as advertised?
YES
Are there corner cases the author has failed to consider?
NO
Are there any assertion failures or crashes?
NO


Performance review
--

Does the patch slow down simple tests?
NO

If it claims to improve performance, does it?
YES

Does it slow down other things?
Using compression may add CPU overhead. This mostly depends on the compression 
algorithm and the chosen compression level. During testing with the ZSTD algorithm 
and compression level 1 there was about 10% CPU overhead in read/write balanced 
scenarios and almost no overhead in mostly-read scenarios.


Coding review
--

In protocol.sgml:
>It can be just bo

Re: Online verification of checksums

2020-11-24 Thread David Steele

Hi Michael,

On 11/23/20 8:10 PM, Michael Paquier wrote:

On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:


Also- what is the point of reading the page from shared buffers
anyway..?  All we need to do is prove that the page will be rewritten
during WAL replay.  If we can prove that, we don't actually care what
the contents of the page are.  We certainly can't calculate the
checksum on a page we plucked out of shared buffers since we only
calculate the checksum when we go to write the page out.


A LSN-based check makes the thing tricky.  How do you make sure that
pd_lsn is not itself broken?  It could be perfectly possible that a
random on-disk corruption makes pd_lsn seen as having a correct value,
still the rest of the page is borked.


We are not just looking at one LSN value. Here are the steps we are 
proposing (I'll skip checks for zero pages here):


1) Test the page checksum. If it passes the page is OK.
2) If the checksum does not pass then record the page offset and LSN and 
continue.
3) After the file is copied, reopen and reread the file, seeking to 
offsets where possible invalid pages were recorded in the first pass.

a) If the page is now valid then it is OK.
b) If the page is not valid but the LSN has increased from the LSN 
recorded in the previous pass then it is OK. We can infer this because 
the LSN has been updated in a way that is not consistent with storage 
corruption.


This is what we are planning for the first round of improving our page 
checksum validation. We believe that doing the retry in a second pass 
will be faster and more reliable because some time will have passed 
since the first read without having to build in a delay for each page error.


A further improvement is to check the ascending LSNs found in 3b against 
PostgreSQL to be completely sure they are valid. We are planning this 
for our second round of improvements.


Reopening the file for the second pass does require some additional logic:

1) The file may have been deleted by PG since the first pass and in that 
case we won't report any page errors.
2) The file may have been truncated by PG since the first pass so we 
won't report any errors past the point of truncation.
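
For illustration, the second pass boils down to something like this (a rough C
sketch only, not pgBackRest code; page_checksum_ok(), page_lsn() and
report_checksum_error() stand in for the real helpers):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

typedef struct SuspectPage
{
    off_t    offset;   /* offset recorded during the first pass */
    uint64_t lsn;      /* pd_lsn recorded during the first pass */
} SuspectPage;

static void
recheck_pages(const char *path, SuspectPage *suspects, int nsuspects)
{
    char buf[BLOCK_SIZE];
    int  fd = open(path, O_RDONLY);

    if (fd < 0)
        return;            /* file deleted since first pass: nothing to report */

    for (int i = 0; i < nsuspects; i++)
    {
        ssize_t nread = pread(fd, buf, BLOCK_SIZE, suspects[i].offset);

        if (nread < BLOCK_SIZE)
            continue;      /* truncated past this offset: not an error */
        if (page_checksum_ok(buf))
            continue;      /* page is now valid */
        if (page_lsn(buf) > suspects[i].lsn)
            continue;      /* LSN advanced, page was rewritten: OK */

        report_checksum_error(path, suspects[i].offset);
    }
    close(fd);
}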


A malicious attacker could easily trick these checks, but as Stephen 
pointed out elsewhere they would likely make the checksums valid which 
would escape detection anyway.


We believe that the chances of random storage corruption passing all 
these checks are incredibly small, but eventually we'll also check 
against the WAL to be completely sure.


Regards,
--
-David
da...@pgmasters.net




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> The problem is that the planner calls ExecSupportsMarkRestore to
 >> find out whether a Materialize node is needed, and that function
 >> looks no further than the Path type of T_Index[Only]Path in order to
 >> return true, even though in this case it's a GiST index which does
 >> not support mark/restore.

 >> (Usually this can't be a problem because the merge join would need
 >> sorted input, thus the index scan would be a btree; but a merge join
 >> that doesn't actually have any sort keys could take unsorted input
 >> from any index type.)

 Tom> Sounds like the right analysis.

 >> Going forward, this looks like IndexOptInfo needs another am*
 >> boolean field, but that's probably not appropriate for the back
 >> branches; maybe as a workaround, ExecSupportsMarkRestore should just
 >> check for btree?

 Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos
 Tom> fields are non-null?

We don't (in the back branches) seem to have a pointer to the
IndexAmRoutine handy, only the oid? Obviously we can look it up from the
oid, but is that more overhead than we want in a join cost function,
given that this will be called for all potential mergejoins considered,
not just JOIN_FULL? Or is the overhead not worth bothering about?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Tom Lane
Robert Haas  writes:
> On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar  wrote:
>> For the compression routine name, I did not include "Am" because
>> currently, we are storing the compression method in the new catalog
>> "pg_compression" not in the pg_am.   So are you suggesting that we
>> should store the compression methods also in the pg_am instead of
>> creating a new catalog?  IMHO, storing the compression methods in a
>> new catalog is a better option instead of storing them in pg_am
>> because actually, the compression methods are not the same as heap or
>> index AMs, I mean they are actually not the access methods.  Am I
>> missing something?

> Oh, I thought it had been suggested in previous discussions that these
> should be treated as access methods rather than inventing a whole new
> concept just for this, and it seemed like a good idea to me. I guess I
> missed the fact that the patch wasn't doing it that way. Hmm.

FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
are pretty fundamentally different animals, yet we don't have a problem
sticking them in the same catalog.  I think anything that is related to
storage access could reasonably go into that catalog, rather than
inventing a new one.

regards, tom lane




Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Tomas Vondra



On 11/24/20 7:34 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> So I wonder if using PMEM for the WAL buffer is the right way forward.
>> AFAIK the WAL buffer is quite concurrent (multiple clients writing
>> data), which seems to contradict the PMEM vs. DRAM trade-offs.
>>
>> The design I've originally expected would look more like this
>>
>> clients -> wal buffers (DRAM) -> wal segments (PMEM DAX)
>>
>> i.e. mostly what we have now, but instead of writing the WAL segments
>> "the usual way" we'd write them using mmap/memcpy, without fsync.
>>
>> I suppose that's what Heikki meant too, but I'm not sure.
> 
> SQL Server probably does so.  Please see the following page and the links in 
> "Next steps" section.  I'm saying "probably" because the document doesn't 
> clearly state whether SQL Server memcpys data from DRAM log cache to 
> non-volatile log cache only for transaction commits or for all log cache 
> writes.  I presume the former.
> 
> 
> Add persisted log buffer to a database
> https://docs.microsoft.com/en-us/sql/relational-databases/databases/add-persisted-log-buffer?view=sql-server-ver15
> --
> With non-volatile, tail of the log storage the pattern is
> 
> memcpy to LC
> memcpy to NV LC
> Set status
> Return control to caller (commit is now valid)
> ...
> 
> With this new functionality, we use a region of memory which is mapped to a 
> file on a DAX volume to hold that buffer. Since the memory hosted by the DAX 
> volume is already persistent, we have no need to perform a separate flush, 
> and can immediately continue with processing the next operation. Data is 
> flushed from this buffer to more traditional storage in the background.
> --
> 

Interesting, thanks for the link. If I understand [1] correctly, they
essentially do this:

clients -> buffers (DRAM) -> buffers (PMEM) -> wal (storage)

that is, they insert the PMEM buffer between the LC (in DRAM) and
traditional (non-PMEM) storage, so that a commit does not need to do any
fsyncs etc.

It seems to imply the memcpy between DRAM and PMEM happens right when
writing the WAL, but I guess that's not strictly required - we might
just as well do that in the background, I think.

It's interesting that they only place the tail of the log on PMEM, i.e.
the PMEM buffer has limited size, and the rest of the log is not on
PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers
and the WAL segments, and kept the WAL segments on regular storage. That
could work, but I'd bet they did that because at that time the NV
devices were much smaller, and placing the whole log on PMEM was not
quite possible. So it might be unnecessarily complicated, considering
the PMEM device capacity is much higher now.

So I'd suggest we simply try this:

clients -> buffers (DRAM) -> wal segments (PMEM)

I plan to do some hacking and maybe hack together some simple tools to
benchmark various approaches.
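
As a rough sketch of that approach with libpmem (names and segment size are
illustrative only, this is not an actual patch):

#include <libpmem.h>

#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)

static char *wal_seg;       /* base of the mapped WAL segment */

static int
open_wal_segment(const char *path)
{
    size_t mapped_len;
    int    is_pmem;

    wal_seg = pmem_map_file(path, WAL_SEGMENT_SIZE, PMEM_FILE_CREATE,
                            0600, &mapped_len, &is_pmem);
    return (wal_seg != NULL && is_pmem) ? 0 : -1;
}

static void
flush_wal(const char *buf, size_t len, size_t seg_off)
{
    /* memcpy plus cache-line flush; no fsync() needed on real PMEM */
    pmem_memcpy_persist(wal_seg + seg_off, buf, len);
}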


regards

[1]
https://docs.microsoft.com/en-us/archive/blogs/bobsql/how-it-works-it-just-runs-faster-non-volatile-memory-sql-server-tail-of-log-caching-on-nvdimm

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos
>  Tom> fields are non-null?

> We don't (in the back branches) seem to have a pointer to the
> IndexAmRoutine handy, only the oid?

Oh, sorry, I misread your comment to be that you wanted to add a field
to IndexAmRoutine.  You're right, the real issue here is that
ExecSupportsMarkRestore lacks any convenient access to the needed info,
and we need to add a bool to IndexOptInfo to fix that.

I don't see any compelling reason why you couldn't add the field at
the end in the back branches; that's what we usually do to avoid
ABI breaks.  Although actually (counts fields...) it looks like
there's at least one pad byte after amcanparallel, so you could
add a bool there without any ABI consequence, resulting in a
reasonably natural field order in all branches.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
 wrote:
> To sum up, I think that the current implementation already introduces good 
> benefits. As I proposed in the Usability review, we may introduce the new 
> approaches later as separate compression 'algorithms'.

I don't think the current patch is so close to being committable that
we shouldn't be considering what we really want to have here. It's one
thing to say, well, this patch is basically done, let's not start
redesigning it now. But that's not the case here. For example, I don't
see any committer accepting the comments in zpq_stream.c as adequate,
or the documentation, either. Some comments that have been made
previously, like Andres's remark about the non-standard message
construction in pq_configure(), have not been addressed, and I do not
think any committer is going to agree with the idea that the novel
method chosen by the patch is superior here, not least but not only
because it seems like it's endian-dependent. That function also uses
goto, which anybody thinking of committing this will surely try to get
rid of, and I'm pretty sure the sscanf() isn't good enough to reject
trailing garbage, and the error message that follows is improperly
capitalized. I'm sure there's other stuff, too: this is just based on
a quick look.

Before we start worrying about any of that stuff in too much detail, I
think it makes a lot of sense to step back and consider the design.
Honestly, the work of changing the design might be smaller than the
amount of cleanup the patch needs. But even if it's larger, it's
probably not vastly larger. And in any case, I quite disagree with the
idea that we should commit to a user-visible interface that exposes a
subset of the functionality that we needed and then try to glue the
rest of the functionality on top of it later. If we make a libpq
connection option called compression that controls the type of
compression that is used in both direction, then how exactly would we
extend that later to allow for different compression in the two
directions? Some syntax like compression=zlib/none, where the value
before the slash controls one direction and the value after the slash
controls the other? Maybe. But on the other hand, maybe it's better to
have separate connection options for client compression and server
compression. Or, maybe the kind of compression used by the server
should be controlled via a GUC rather than a connection option. Or,
maybe none of that is right and we should stick with the approach the
patch currently takes. But it's not like we can do something for v1
and then just change things randomly later: there will be
backward-compatibility to worry about. So the time to talk about the
general approach here is now, before anything gets committed, before
the project has committed itself to any particular design. If we
decide in that discussion that certain things can be left for the
future, that's fine. If we've have discussed how they could be added
without breaking backward compatibility, even better. But we can't
just skip over having that discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

2020-11-24 Thread David Zhang
I verified the patch 
"v2-0001-Free-disk-space-for-dropped-relations-on-commit.patch" on master 
branch "0cc9932740f2bf572303b68438e4caf62de9". It works for me. Below is my 
test procedure and results.

=== Before the patch ===
#1 from psql console 1, create table and index then insert enough data
postgres=# CREATE TABLE test_tbl ( a int, b text);
postgres=# CREATE INDEX idx_test_tbl on test_tbl (a);
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello world!';
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello world!';

#2 check file sizes
david:12867$ du -h
12G .

#3 from psql console 2, drop the index
postgres=# drop index idx_test_tbl;

#4 check file sizes in different ways
david:12867$ du -h
7.8G.
david:12867$ ls -l
...
-rw--- 1 david david  0 Nov 23 20:07 16402
...

$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  25736  david   45u  REG  259,2
  0   12592758 /home/david/sandbox/postgres/pgdata/base/12867/16402 (deleted)
postgres  25736  david   49u  REG  259,2 
1073741824   12592798 /home/david/sandbox/postgres/pgdata/base/12867/16402.1 
(deleted)
postgres  25736  david   53u  REG  259,2 
1073741824   12592739 /home/david/sandbox/postgres/pgdata/base/12867/16402.2 
(deleted)
postgres  25736  david   59u  REG  259,2  
372604928   12592800 /home/david/sandbox/postgres/pgdata/base/12867/16402.3 
(deleted)
...

The index relnode id "16402" shows size "0" in the postgres database folder, 
but when checking with lsof, all 16402.x files are still in use by a psql 
connection, except 16402, which is set to 0. Checking again after an hour, lsof 
shows the same results.

=== After the patch ===
Repeating steps 1 to 4, lsof shows that all the index relnode files (in this 
case, index relnode id 16389) are removed within about 1 minute.
$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  32707  david   66u  REG  259,2
  0   12592763 /home/david/sandbox/postgres/pgdata/base/12867/16389.1 (deleted)
postgres  32707  david   70u  REG  259,2
  0   12592823 /home/david/sandbox/postgres/pgdata/base/12867/16389.2 (deleted)
postgres  32707  david   74u  REG  259,2
  0   12592805 /home/david/sandbox/postgres/pgdata/base/12867/16389.3 (deleted)
...

One interesting thing is that if the index is created after the data 
records have been inserted, then lsof doesn't show this issue.

Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 1:21 PM Tom Lane  wrote:
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

It's good to have your opinion on this since I wasn't totally sure
what was best, but for the record, I can't take credit. Looks like it
was Álvaro's suggestion originally:

http://postgr.es/m/20171130205155.7mgq2cuqv6zxi25a@alvherre.pgsql

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-24 Thread Alexey Kondratov

On 2020-11-24 06:52, Bharath Rupireddy wrote:

Thanks for the review comments.

On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
 wrote:


> v1-0001-postgres_fdw-function-to-discard-cached-connections.patch

This patch looks pretty straightforward for me, but there are some
things to be addressed IMO:

+   server = GetForeignServerByName(servername, true);
+
+   if (server != NULL)
+   {

Yes, you return a false if no server was found, but for me it worth
throwing an error in this case as, for example, dblink does in the
dblink_disconnect().



dblink_disconnect() "Returns status, which is always OK (since any
error causes the function to throw an error instead of returning)."
This behaviour doesn't seem okay to me.

Since we throw true/false, I would prefer to throw a warning(with a
reason) while returning false over an error.



I thought about something a bit more sophisticated:

1) Return 'true' if there were open connections and we successfully 
closed them.
2) Return 'false' in the no-op case, i.e. there were no open 
connections.
3) Rise an error if something went wrong. And non-existing server case 
belongs to this last category, IMO.


That looks like a semantically correct behavior, but let us wait for any 
other opinion.
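
For illustration, those three outcomes could be sketched roughly like this 
(just a sketch, not the patch's actual code; the 
disconnect_cached_connections() signature is simplified here):

#include "postgres.h"
#include "fmgr.h"
#include "foreign/foreign.h"
#include "utils/builtins.h"

static bool disconnect_cached_connections(Oid serverid);   /* simplified */

PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);

Datum
postgres_fdw_disconnect(PG_FUNCTION_ARGS)
{
    char          *servername = text_to_cstring(PG_GETARG_TEXT_PP(0));
    ForeignServer *server = GetForeignServerByName(servername, true);

    if (server == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("server \"%s\" does not exist", servername)));

    /* true if at least one cached connection was closed, false if no-op */
    PG_RETURN_BOOL(disconnect_cached_connections(server->serverid));
}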






+ result = disconnect_cached_connections(FOREIGNSERVEROID,
+hashvalue,
+false);

+   if (all || (!all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue))
+   {
+   if (entry->conn != NULL &&
+   !all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue)

These conditions look bulky for me. First, you pass FOREIGNSERVEROID 
to

disconnect_cached_connections(), but actually it just duplicates 'all'
flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when 
it

is '-1', then 'all == true'. That is all, there are only two calls of
disconnect_cached_connections(). That way, it seems that we should 
keep

only 'all' flag at least for now, doesn't it?



I added cacheid as an argument to disconnect_cached_connections() for
reusability. Say, someone wants to use it with a user mapping then
they can pass cacheid USERMAPPINGOID, hash value of user mapping. The
cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue can
be added to disconnect_cached_connections().



Yeah, I get your point and motivation for adding this argument, but how 
can we use it? To disconnect all connections belonging to some specific 
user mapping? But any user mapping is hard-bound to some foreign server, 
AFAIK, so we can pass a serverid-based hash in this case.


In the case of pgfdw_inval_callback() this argument makes sense, since 
syscache callbacks work that way, but here I can hardly imagine a case 
where we could use it. Thus, it still looks like a premature complication 
to me, since we do not have plans to use it, do we? Anyway, everything 
seems to be working fine, so it is up to you whether to keep this 
additional argument.




v1-0003-postgres_fdw-server-level-option-keep_connection.patch
This patch adds a new server level option, keep_connection, default
being on, when set to off, the local session doesn't cache the
connections associated with the foreign server.



This patch looks good to me, except one note:

(entry->used_in_current_xact &&
-   !keep_connections))
+   (!keep_connections || !entry->keep_connection)))
{

Following this logic:

1) If keep_connections == true, then per-server keep_connection has a 
*higher* priority, so one can disable caching of a single foreign 
server.


2) But if keep_connections == false, then it works as a global switch-off, 
regardless of the per-server keep_connection settings, i.e. they have a 
*lower* priority.


It looks fine for me, at least I cannot propose anything better, but 
maybe it should be documented in 0004?




v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
This patch adds the tests and documentation related to this feature.



I have not read all texts thoroughly, but what caught my eye:

+   A GUC, postgres_fdw.keep_connections, default 
being
+   on, when set to off, the local 
session


I think the GUC acronym is widely used only in the source code, and the 
Postgres docs tend not to use it at all, apart from the acronyms list and a 
couple of occurrences of the 'GUC parameters' collocation. And it is never 
used in singular form there, so I think it should rather be:


A configuration parameter, 
postgres_fdw.keep_connections, default being...


+ 
+  Note that when postgres_fdw.keep_connections 
is set to
+  off, postgres_fdw discards either the 
connections
+  that are made previously and will be used by the local session or 
the
+  connections that will be made newly. But the connections that are

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

2020-11-24 Thread Pavel Borisov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Given we got two other reviews from Neil and David, I think I can finalize my 
own review and mark the patch as ready for committer if nobody has objections.
Thank you!

Pavel Borisov

The new status of this patch is: Ready for Committer


Re: [HACKERS] Custom compression methods

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> Robert Haas  writes:

> > Oh, I thought it had been suggested in previous discussions that these
> > should be treated as access methods rather than inventing a whole new
> > concept just for this, and it seemed like a good idea to me. I guess I
> > missed the fact that the patch wasn't doing it that way. Hmm.
> 
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

Right -- Something like amname=lz4, amhandler=lz4handler, amtype=c.
The core code must of course know how to instantiate an AM of type
'c' and what to use it for.

https://postgr.es/m/20171213151818.75a20...@postgrespro.ru




Re: enable_incremental_sort changes query behavior

2020-11-24 Thread Tom Lane
James Coleman  writes:
> On Mon, Nov 23, 2020 at 2:24 PM Tom Lane  wrote:
>> 1. James was wondering, far upthread, why we would do projections
>> pre-sort or post-sort.  I think the answers can be found by studying
>> planner.c's make_sort_input_target(), which separates out what we want
>> to do pre-sort and post-sort.

> Does it imply we need to intentionally avoid SRFs also?

It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that.  We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys).  So really there's no way that scan/join
processing should need to consider such cases.  Such a sort would
only ever be implemented via a final Sort atop ProjectSet.

>> 2. Robert is correct that under normal circumstances, the targetlists of
>> both baserels and join rels contain only Vars.

> Is that primarily a project policy? Or a limitation of our current
> planner (just can't push things down/carry the results back up as much
> as we'd like)? Or something else? In particular, it seems plausible
> there are cases where pushing down the sort on a non-Var expression to
> the base rel could improve plans, so I'm wondering if there's a reason
> to intentionally avoid that in the long or short run (or both).

I think you've just rediscovered Joe Hellerstein's thesis topic [1].
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad.  Maybe there'll be a case for
putting it back some day, but I'm dubious.  Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.

> Interesting: so merge joins are an example of us pushing down sorts,
> which I assume means (part of) the answer to my question on (2) is
> that there's nothing inherently wrong/broken with evaluating
> expressions lower down the tree as long as we're careful about what is
> safe/unsafe with respect to volatility and parallelism?

Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.

> I have wondered if we should strictly require the expression to be in
> the target list even if nonvolatile, but prepare_sort_from_pathkeys
> doesn't seem to think that's necessary, so I'm comfortable with that
> unless there's something you know we haven't considered.

No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input.  The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place.  It's fairly obvious to me how we deal with the
consideration for volatile sortkeys.  We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans.  Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys?  Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?

regards, tom lane

[1] https://www.postgresql.org/message-id/28216.1023340706%40sss.pgh.pa.us




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Oh, sorry, I misread your comment to be that you wanted to add a
 Tom> field to IndexAmRoutine. You're right, the real issue here is that
 Tom> ExecSupportsMarkRestore lacks any convenient access to the needed
 Tom> info, and we need to add a bool to IndexOptInfo to fix that.

 Tom> I don't see any compelling reason why you couldn't add the field
 Tom> at the end in the back branches; that's what we usually do to
 Tom> avoid ABI breaks. Although actually (counts fields...) it looks
 Tom> like there's at least one pad byte after amcanparallel, so you
 Tom> could add a bool there without any ABI consequence, resulting in a
 Tom> reasonably natural field order in all branches.

I guess that's close enough; this should suffice then.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..0c10f1d35c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -417,6 +417,11 @@ ExecSupportsMarkRestore(Path *pathnode)
 	{
 		case T_IndexScan:
 		case T_IndexOnlyScan:
+			/*
+			 * Not all index types support mark/restore.
+			 */
+			return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos;
+
 		case T_Material:
 		case T_Sort:
 			return true;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 52c01eb86b..3e94256d34 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -284,6 +284,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
 relation->rd_tableam->scan_bitmap_next_block != NULL;
+			info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
+  amroutine->amrestrpos != NULL);
 			info->amcostestimate = amroutine->amcostestimate;
 			Assert(info->amcostestimate != NULL);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index abe6f570e3..5a10c1855d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -864,6 +864,7 @@ struct IndexOptInfo
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
+	bool		amcanmarkpos;	/* does AM support mark/restore? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
 };


Re: libpq compression

2020-11-24 Thread Daniil Zakhlystov
I completely agree that backward-compatibility is important here.

I think that it is a good idea to clarify how the compression establishment 
works in the current version of the patch:

1. Frontend send the startup packet which may look like this:

_pq_.compression = 'zlib,zstd' (I omitted the variations with compression 
levels for clarity)

Then, on the backend, there are two possible cases:
2.1 If the backend is too old and doesn't know anything about the compression 
or if the compression is disabled on the backend, it just ignores the 
compression parameter
2.2 In the other case, the backend intersects the client compression methods 
with its own supported ones and responds with a compressionAck message which 
contains the index of the chosen compression method (or '-1' if it doesn't 
support any of the methods provided).

If the frontend receives the compressionAck message, there are also two cases:
3.1 If compressionAck contains '-1', do not initiate compression
3.2 In the other case, initialize the chosen compression method immediately.

My idea is that we can add new compression approaches in the future and 
initialize them differently on step 3.2.

For example, in the case of switchable compression:

1. Client sends a startup packet with _pq_.compression = 'switchable,zlib,zstd' 
- it means that client wants switchable compression or permanent zlib/zstd 
compression.

Again, two main cases on the backend:
2.1 Backend doesn't know about any compression or compression turned off => 
ignore the _pq_.compression

2.2.1 If the backend doesn't have switchable compression implemented, it won't 
have 'switchable' in its supported methods. So it will simply discard this 
method while intersecting the frontend and backend compression methods and 
respond with some compressionAck message - choosing permanent zlib, zstd, or 
nothing (-1).

2.2.2 If the backend supports switchable on-the-fly compression, it will have 
'switchable' in its supported methods, so it may choose 'switchable' in its 
compressionAck response.

After that, on the frontend side:
3.1 If compressionAck contains '-1', do not initiate compression

3.2.1 If compressionAck has 'zstd' or 'zlib' as the chosen compression method, 
init permanent streaming compression immediately.

3.2.2 If compressionAck has 'switchable' as the chosen compression method, init 
the switchable compression. Initialization may involve sending some additional 
messages to the backend to negotiate the details like the supported switchable 
on the fly compression methods or any other details.

The same applies to compression with different algorithms in each direction. 
We can call it, for example, 'direction-specific' and initialize it 
differently in step 3.2. The key is that we don't even have to decide the exact 
initialization protocol for 'switchable' and 'direction-specific' now. It may 
be added in the future.

Basically, this is what I meant in my previous message about future 
expansion of the current design; I hope that I have managed to clarify it.
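
As a rough illustration of the negotiation in step 2.2 (illustrative code, not 
the actual patch):

#include <string.h>

/*
 * Intersect the client's requested algorithms with the locally supported
 * ones and return the index (into the client's list) of the first match,
 * or -1 if there is no common algorithm.  The index is what would be sent
 * back in the compressionAck message.
 */
static int
choose_compression(const char **client_algs, int n_client,
                   const char **server_algs, int n_server)
{
    for (int i = 0; i < n_client; i++)
        for (int j = 0; j < n_server; j++)
            if (strcmp(client_algs[i], server_algs[j]) == 0)
                return i;
    return -1;
}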

Thanks,

Daniil Zakhlystov

> On Nov 24, 2020, at 11:35 PM, Robert Haas  wrote:
> 
> On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
>  wrote:
>> To sum up, I think that the current implementation already introduces good 
>> benefits. As I proposed in the Usability review, we may introduce the new 
>> approaches later as separate compression 'algorithms'.
> 
> I don't think the current patch is so close to being committable that
> we shouldn't be considering what we really want to have here. It's one
> thing to say, well, this patch is basically done, let's not start
> redesigning it now. But that's not the case here. For example, I don't
> see any committer accepting the comments in zpq_stream.c as adequate,
> or the documentation, either. Some comments that have been made
> previously, like Andres's remark about the non-standard message
> construction in pq_configure(), have not been addressed, and I do not
> think any committer is going to agree with the idea that the novel
> method chosen by the patch is superior here, not least but not only
> because it seems like it's endian-dependent. That function also uses
> goto, which anybody thinking of committing this will surely try to get
> rid of, and I'm pretty sure the sscanf() isn't good enough to reject
> trailing garbage, and the error message that follows is improperly
> capitalized. I'm sure there's other stuff, too: this is just based on
> a quick look.
> 
> Before we start worrying about any of that stuff in too much detail, I
> think it makes a lot of sense to step back and consider the design.
> Honestly, the work of changing the design might be smaller than the
> amount of cleanup the patch needs. But even if it's larger, it's
> probably not vastly larger. And in any case, I quite disagree with the
> idea that we should commit to a user-visible interface that exposes a
> subset of the functionality that we needed and then try to glue t

Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> I guess that's close enough; this should suffice then.

Looks about right.  Not sure if we need to bother with a regression test
case; once that's in, it'd be hard to break it.

regards, tom lane




Ready For Committer patches (CF 2020-11)

2020-11-24 Thread Anastasia Lubennikova

Hi,

with this message, I want to draw attention to the RFC patches on the 
current commitfest. It would be good if committers could take a look at 
them.


While doing a sweep through the CF, I have kicked a couple of entries 
back to Waiting on author, so now the list is correct. Now we have 17 
entries.


https://commitfest.postgresql.org/30/?status=3

The oldest RFC patches are listed below. These entries are quite large 
and complex. Still, they have received a good amount of review, and it looks 
like after a long discussion they are in good shape.


Generic type subscripting 

schema variables, LET command 

pgbench - add pseudo-random permutation function

Allow REINDEX, CLUSTER and VACUUM FULL to rebuild on new 
TABLESPACE/INDEX_TABLESPACE 


range_agg / multiranges 

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 04:48, Peter Eisentraut
 wrote:
>
> On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote:
> > The Clang documentation¹ suggest an even neater solution, which would
> > eliminate the repetitive empty pg_attribute_foo #defines in the trailing
> > #else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d:
> >
> > #ifndef __has_attribute
> > #define __has_attribute(x) 0
> > #endif
>
> Yes, this was also mentioned and agreed earlier in the thread, but then
> we apparently forgot to update the patch.

I wanted to let the buildfarm settle a bit before changing this again.
I plan on making the change today.

(I know walleye is still not happy)
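
For reference, the end state would presumably look roughly like this (sketch
only):

#ifndef __has_attribute
#define __has_attribute(x) 0    /* compatibility for compilers lacking it */
#endif

#if __has_attribute(cold)
#define pg_attribute_cold __attribute__((cold))
#else
#define pg_attribute_cold
#endif

#if __has_attribute(hot)
#define pg_attribute_hot __attribute__((hot))
#else
#define pg_attribute_hot
#endif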

David




Re: bug in pageinspect's "tuple data" feature

2020-11-24 Thread Robert Haas
On Sat, Nov 21, 2020 at 2:32 PM Alvaro Herrera  wrote:
> If you have a sufficiently broken data page, pageinspect throws an
> error when trying to examine the page:
>
> ERROR:  invalid memory alloc request size 18446744073709551451
>
> This is pretty unhelpful; it would be better not to try to print the
> data instead of dying.  With that, at least you can know where the
> problem is.
>
> This was introduced in d6061f83a166 (2015).  Proposed patch to fix it
> (by having the code print a null "data" instead of dying) is attached.

So I agree with the problem statement and I don't have any particular
objection to the patch as proposed, but I think it's just the tip of
the iceberg. These functions are tremendously careless about
validating the input data and will crash if you breathe on them too
hard; we've just band-aided around that problem by making them
super-user only (e.g. see 3e1338475ffc2eac25de60a9de9ce689b763aced).
While that does address the security concern since superusers can do
tons of bad things anyway, it's not much help if you want to actually
be able to run them on damaged pages without having the world end, and
it's no help at all if you'd like to let them be run by a
non-superuser, since a constructed page can easily blow up the world.
The patch as proposed fixes one of many problems in this area and may
well be useful enough to commit without doing anything else, but I'd
actually really like to see us do the same sort of hardening here that
is present in the recently-committed amcheck-for-heap support, which
is robust against a wide variety of things of this sort rather than
just this one particularly. Again, this is not to say that you should
be on the hook for that; it's a general statement.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I guess that's close enough; this should suffice then.

 Tom> Looks about right. Not sure if we need to bother with a regression
 Tom> test case; once that's in, it'd be hard to break it.

We could check the EXPLAIN output (since the Materialize node would show
up), but it's not easy to get stable plans since the choice of which
path to put on the outside is not fixed. Based on what I found when
actually testing the code, it probably wouldn't be worth the effort.

-- 
Andrew (irc:RhodiumToad)




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>
> walleye's been failing since this patchset went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels 
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
> -Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include 
>  -I./src/include/port/win32 -I/c/msys/local/include  -I/c/Python35/include 
> -I/c/OpenSSL-Win64/include -I/c/msys/local/include 
> "-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 
> -DBUILDING_DLL  -c -o autovacuum.o autovacuum.c
> C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages:
> C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: 
> .seh_savexmm offset is negative
> make[3]: *** [autovacuum.o] Error 1
>
> I have no idea what to make of that, but it looks more like a compiler bug
> than anything else.

That's about the best I could come up with too when looking at that
yesterday.  The message gives me the impression that it might be
related to code arrangement. It does seem to be the assembler that's
complaining.

I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
be the correct fix for it... aka, just define the new
pg_attribute_(hot|cold) macros to empty on MinGW.

David




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Looks about right. Not sure if we need to bother with a regression
>  Tom> test case; once that's in, it'd be hard to break it.

> We could check the EXPLAIN output (since the Materialize node would show
> up), but it's not easy to get stable plans since the choice of which
> path to put on the outside is not fixed. Based on what I found when
> actually testing the code, it probably wouldn't be worth the effort.

If it's not easy to test, I agree it's not worth it.

(Given how long it took anyone to notice this, the difficulty of
making a stable test case is unsurprising, perhaps.)

regards, tom lane




Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik



On 24.11.2020 21:35, Robert Haas wrote:

On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
 wrote:

To sum up, I think that the current implementation already introduces good 
benefits. As I proposed in the Usability review, we may introduce the new 
approaches later as separate compression 'algorithms'.

I don't think the current patch is so close to being committable that
we shouldn't be considering what we really want to have here. It's one
thing to say, well, this patch is basically done, let's not start
redesigning it now. But that's not the case here. For example, I don't
see any committer accepting the comments in zpq_stream.c as adequate,
or the documentation, either. Some comments that have been made
previously, like Andres's remark about the non-standard message
construction in pq_configure(), have not been addressed, and I do not
think any committer is going to agree with the idea that the novel
method chosen by the patch is superior here, not least but not only
because it seems like it's endian-dependent. That function also uses
goto, which anybody thinking of committing this will surely try to get
rid of, and I'm pretty sure the sscanf() isn't good enough to reject
trailing garbage, and the error message that follows is improperly
capitalized. I'm sure there's other stuff, too: this is just based on
a quick look.

Before we start worrying about any of that stuff in too much detail, I
think it makes a lot of sense to step back and consider the design.
Honestly, the work of changing the design might be smaller than the
amount of cleanup the patch needs. But even if it's larger, it's
probably not vastly larger. And in any case, I quite disagree with the
idea that we should commit to a user-visible interface that exposes a
subset of the functionality that we needed and then try to glue the
rest of the functionality on top of it later. If we make a libpq
connection option called compression that controls the type of
compression that is used in both direction, then how exactly would we
extend that later to allow for different compression in the two
directions? Some syntax like compression=zlib/none, where the value
before the slash controls one direction and the value after the slash
controls the other? Maybe. But on the other hand, maybe it's better to
have separate connection options for client compression and server
compression. Or, maybe the kind of compression used by the server
should be controlled via a GUC rather than a connection option. Or,
maybe none of that is right and we should stick with the approach the
patch currently takes. But it's not like we can do something for v1
and then just change things randomly later: there will be
backward-compatibility to worry about. So the time to talk about the
general approach here is now, before anything gets committed, before
the project has committed itself to any particular design. If we
decide in that discussion that certain things can be left for the
future, that's fine. If we've have discussed how they could be added
without breaking backward compatibility, even better. But we can't
just skip over having that discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com


First of all, thank you for the review.
I completely agree with you that this patch is not ready for committing - 
at least the documentation, written in my bad English, has to be checked and 
fixed. Also, I have not tested it much on Windows and other non-Unix systems.

I do not want to discuss small technical things like sending the compression 
message in pq_configure. I have answered Andres why I cannot use the standard 
functions in this case: this function is called during connection 
initialization, so the connection handle is not ready yet.
But this can definitely be changed (although it is not endian-dependent: 
libpq messages use big-endian order).
Also, it seems strange to discuss the presence of "goto" in the code: 
we are not "puritans", are we? ;)
Yes, I also try to avoid gotos as much as possible, since they can 
make code less readable.
But completely prohibiting gotos and replacing them with artificial 
loops and redundant checks seems like a rather radical approach which rarely 
leads to anything good. By the way, there are about three thousand gotos in 
the Postgres code (maybe some of them are in generated code - I have not 
checked).

So let's discuss more fundamental things, like your suggestion for a 
complete redesign of the compression support.
I am not against such a discussion, although I personally do not think 
that there are a lot of topics to discuss here.
I definitely do not want to say that my implementation is perfect and 
cannot be reimplemented in a better way. Certainly it can.
But compression itself, especially compression of protocol messages, is 
not a novel area. It has been implemented many times in many different 
systems (recently it was added to MySQL), and it is hard to find much 
"afflatus" here.


We can suggest many different things whic

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>> walleye's been failing since this patchset went in:
>> I have no idea what to make of that, but it looks more like a compiler bug
>> than anything else.

> I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
> be the correct fix for it... aka, just define the new
> pg_attribute_(hot|cold) macros to empty on MinGW.

I'd make any such fix as narrow as possible (ie MINGW64 only, based on
present evidence).  It'd be nice to have a compiler version upper bound
too, in the hopes that they'd fix it in future.  Maybe something like
"#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

regards, tom lane




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> David Rowley  writes:
> > On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
> >> walleye's been failing since this patchset went in:
> >> I have no idea what to make of that, but it looks more like a compiler bug
> >> than anything else.
> 
> > I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
> > be the correct fix for it... aka, just define the new
> > pg_attribute_(hot|cold) macros to empty on MinGW.
> 
> I'd make any such fix as narrow as possible (ie MINGW64 only, based on
> present evidence).  It'd be nice to have a compiler version upper bound
> too, in the hopes that they'd fix it in future.  Maybe something like
> "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

Apparently the bug was fixed days after it was reported,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
but they haven't made a release containing the fix yet.




Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik



On 24.11.2020 20:34, Daniil Zakhlystov wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, failed
Documentation:tested, failed

Submission review
--
Is the patch in a patch format which has context? (eg: context diff format)
NO, need to fix
Does it apply cleanly to the current git master?
YES
Does it include reasonable tests, necessary doc patches, etc?
have docs, missing tests

Usability review
--
At the moment, the patch supports per-connection (permanent) compression. The 
frontend can specify the desired compression algorithms and compression levels 
and then negotiate the compression algorithm that is going to be used with the 
backend. In current state patch is missing the ability to enable/disable the 
compression on the backend side, I think it might be not great from the 
usability side.

Regarding on-the-fly configurable compression and different compression 
algorithms for each direction - these two ideas are promising but tend to make 
the implementation more complex. However, the current implementation can be 
extended to support these approaches in the future. For example, we can specify 
switchable on-the-fly compression as ‘switchable’ algorithm and negotiate it 
like the regular compression algorithm (like we currently negotiate ‘zstd’ and 
‘zlib’). ‘switchable’ algorithm may then introduce new specific messages to 
Postgres protocol to make the on-the-fly compression magic work.

The same applies to Robert’s idea of the different compression algorithms for 
different directions - we can introduce it later as a new compression algorithm 
with new specific protocol messages.

Does the patch actually implement that?
YES

Do we want that?
YES

Do we already have it?
NO

Does it follow SQL spec, or the community-agreed behavior?
To be discussed

Does it include pg_dump support (if applicable)?
not applicable

Are there dangers?
theoretically possible CRIME-like attack when using with SSL enabled

Have all the bases been covered?
To be discussed


Feature test
--

I’ve applied the patch, compiled, and tested it with configure options 
--enable-cassert and --enable-debug turned on. I’ve tested the following 
scenarios:

1. make check
===
  All 201 tests passed.
===

2. make check-world
initially failed with:
== running regression test queries==
test postgres_fdw ... FAILED 4465 ms
== shutting down postmaster   ==
==
1 of 1 tests failed.
==
The differences that caused some tests to fail can be viewed in the
file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.diffs".  A copy of the 
test summary that you see above is saved in the file 
"/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.out".

All tests passed after replacing ‘gsslib, target_session_attrs’ with ‘gsslib, 
compression, target_session_attrs’ in line 8914 of 
postgresql/contrib/postgres_fdw/expected/postgres_fdw.out

3. simple psql utility usage
psql -d "host=xxx port=5432 dbname=xxx user=xxx compression=1"

4. pgbench tpcb-like w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

5. pgbench tpcb-like w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

6. pgbench initialization w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
-i -s 500

7. pgbench initialization w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
-i -s 500

8. Streaming physical replication. Recovery-related parameters
recovery_target_timeline = 'latest'
primary_conninfo = 'host=xxx port=5432 user=repl application_name=xxx 
compression=1'
primary_slot_name = 'xxx'
restore_command = 'some command'

9. This compression has been implemented in an experimental build of odyssey 
connection pooler and tested with ~1500 synthetic simultaneous clients 
configuration and ~300 GB databases.
During the testing, I’ve reported and fixed some of the issues.

Does the feature work as advertised?
YES
Are there corner cases the author has failed to consider?
NO
Are there any assertion failures or crashes?
NO


Performance review
--

Does the patch slow down simple tests?
NO

If it claims to improve performance, does it?
YES

Does it slow down other things?
Using compression may add a CPU overhead. This mostly depends on compression 
algorithm and chosen compression level. During testing with ZSTD algorithm and 
compression level 1 there was about 10% of CPU overhead in read/write balanced 
scenarios and almost no overhead in mostly read scenarios.


Coding review

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Nov-24, Tom Lane wrote:
>> I'd make any such fix as narrow as possible (ie MINGW64 only, based on
>> present evidence).  It'd be nice to have a compiler version upper bound
>> too, in the hopes that they'd fix it in future.  Maybe something like
>> "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

> Apparently the bug was fixed days after it was reported,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
> but they haven't made a release containing the fix yet.

Ah, great sleuthing!  So that says it occurs in 8.1 only, meaning
the version test could be like

#if defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1
// lobotomized code here
#else ...

It's not entirely clear from that bug report whether it can manifest on
gcc 8.1 on other platforms; maybe we should test for x86 in general
not __MINGW64__.

regards, tom lane




Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > PROC_IN_LOGICAL_DECODING:
> > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> > ReplicationSlotRelease might be the most problematic one of the lot.
> > That's because a proc's xmin that had been ignored all along by
> > ComputeXidHorizons, will now be included in the computation.  Adding
> > asserts that proc->xmin and proc->xid are InvalidXid by the time we
> > reset the flag, I got hits in pg_basebackup, test_decoding and
> > subscription tests.  I think it's OK for ComputeXidHorizons (since it
> > just means that a vacuum that reads a later will remove less rows.)  But
> > in GetSnapshotData it is just not correct to have the Xmin go backwards.
> 
> I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
> set when outside a transaction block, i.e. walsender. In which case
> there shouldn't be an xid/xmin, I think? Or did you gate your assert on
> PROC_IN_LOGICAL_DECODING being set?

Ah, you're right about this one --  I missed the significance of setting
the flag only "when outside of a transaction block" at the time we call
StartupDecodingContext.





Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
Anastasia Lubennikova  writes:
> Hi, this entry is "Waiting on Author" and the thread was inactive for a 
> while. As far as I see, the patch needs some further work.
> Are you going to continue working on it, or should I mark it as 
> "returned with feedback" until a better time?

I'm inclined to go ahead and commit the 0001 patch I posted at [1]
(ie, change the implementation of GUC_REPORT to avoid intra-query
reports), since that addresses a performance problem that's
independent of the goal here.  The rest of this seems to still
be in Greg's court.

Has anyone got an opinion about the further improvement I suggested:

>> As it stands, 0001 reduces the ParameterStatus message traffic to
>> at most one per GUC per query, but it doesn't attempt to eliminate
>> duplicate ParameterStatus messages altogether.  We could do that
>> as a pretty simple adjustment if we're willing to expend the storage
>> to remember the last value sent to the client.  It might be worth
>> doing, since for example the function-SET-clause case would typically
>> lead to no net change in the GUC's value by the end of the query.

On reflection this seems worth doing, since excess client traffic
is far from free.

regards, tom lane

[1] https://www.postgresql.org/message-id/5708.1601145259%40sss.pgh.pa.us




Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera  writes:

> > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> > In these cases, what we need is that the code computes some xmin (or
> > equivalent computation) based on a set of transactions that exclude
> > those marked with the flags.  The behavior we want is that if some
> > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> > one*.  In other words, if there's no Xid/Xmin, then the flag is not
> > important.  So if we can ensure that the flag is set first, and the
> > Xid/xmin is installed later, that's sufficient correctness and we don't
> > need to hold exclusive lock.  But if we can't ensure that, then we must
> > use exclusive lock, because otherwise we risk another process seeing our
> > Xid first and not our flag, which would be bad.
> 
> I don't buy this either.  You get the same result if someone looks just
> before you take the ProcArrayLock to set the flag.  So if there's a
> problem, it's inherent in the way that the flags are defined or used;
> the strength of lock used in this stanza won't affect it.

The problem is that the writes could be reordered in a way that makes
the Xid appear set to an onlooker before PROC_IN_VACUUM appears set.
Vacuum always sets the bit first, and *then* the xid.  If the reader
always reads it like that then it's not a problem.  But in order to
guarantee that, we would have to have a read barrier for each pass
through the loop.

With the LW_EXCLUSIVE lock, we block the readers so that the bit is
known set by the time they examine it.  As I understand, getting the
lock is itself a barrier, so there's no danger that we'll set the bit
and they won't see it.


... at least, that's how I *imagine* the argument to be.  In practice,
vacuum_rel() calls GetSnapshotData() before installing the
PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will
get MyProc->xmin included in their snapshot (because bit not yet set),
and reader 2 won't.  If my understanding is correct, then we should move
the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have
the PROC_IN_VACUUM bit set.
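
Concretely, the suggested ordering in vacuum_rel() would look roughly like
this (a sketch only; the statusFlags/vacuumFlags naming and the ProcGlobal
mirror differ across branches):

	/* advertise PROC_IN_VACUUM before publishing any xmin */
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->statusFlags |= PROC_IN_VACUUM;
	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
	LWLockRelease(ProcArrayLock);

	/* only now take a snapshot, so anyone who sees our xmin also sees the flag */
	PushActiveSnapshot(GetTransactionSnapshot());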




Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> I'm inclined to go ahead and commit the 0001 patch I posted at [1]
> (ie, change the implementation of GUC_REPORT to avoid intra-query
> reports), since that addresses a performance problem that's
> independent of the goal here.  The rest of this seems to still
> be in Greg's court.

Sounds like a good idea to me.

> Has anyone got an opinion about the further improvement I suggested:
> 
> >> As it stands, 0001 reduces the ParameterStatus message traffic to
> >> at most one per GUC per query, but it doesn't attempt to eliminate
> >> duplicate ParameterStatus messages altogether.  We could do that
> >> as a pretty simple adjustment if we're willing to expend the storage
> >> to remember the last value sent to the client.  It might be worth
> >> doing, since for example the function-SET-clause case would typically
> >> lead to no net change in the GUC's value by the end of the query.
> 
> On reflection this seems worth doing, since excess client traffic
> is far from free.

Agreed.  If this is just a few hundred bytes of server-side local memory
per backend, it seems definitely worth it.




Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Nov-24, Tom Lane wrote:
>>> As it stands, 0001 reduces the ParameterStatus message traffic to
>>> at most one per GUC per query, but it doesn't attempt to eliminate
>>> duplicate ParameterStatus messages altogether.  We could do that
>>> as a pretty simple adjustment if we're willing to expend the storage
>>> to remember the last value sent to the client.  It might be worth
>>> doing, since for example the function-SET-clause case would typically
>>> lead to no net change in the GUC's value by the end of the query.

>> On reflection this seems worth doing, since excess client traffic
>> is far from free.

> Agreed.  If this is just a few hundred bytes of server-side local memory
> per backend, it seems definitely worth it.

Yeah, given the current set of GUC_REPORT variables, it's hard to see
the storage for their last-reported values amounting to much.  The need
for an extra pointer field in each GUC variable record might eat more
space than the actually-live values :-(

regards, tom lane




Add table access method as an option to pgbench

2020-11-24 Thread David Zhang

Hi Hackers,

I noticed that a table access method API was added starting from PG12. 
In other words, PostgreSQL opened the door for developers to add their 
own access methods, for example zheap, zedstore, etc. However, the 
current pgbench doesn't have an option that allows users to specify 
which table access method to use during the initialization phase. It 
would be great to have an option for this. For example, if a user wants 
to run a performance benchmark comparing "zheap" with "heap", the 
commands could look like this:


pgbench -d -i postgres --table-am=heap

pgbench -d -i postgres --table-am=zheap

I know there is a parameter like below available in postgresql.conf:

#default_table_access_method = 'heap'

But providing another option for the end user may not be a bad idea, 
and it might make such tests easier.
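
For illustration, with such an option the generated DDL would come out
roughly as below (zheap here stands in for any non-core table access
method; column definitions are approximate):

    create table pgbench_accounts(aid int not null,bid int,abalance int,filler char(84))
      using zheap with (fillfactor=100)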


The attached file is a quick patch for this.

Thoughts?


Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From 798f970e22034d33146265ed98922c605d0dc237 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 24 Nov 2020 15:14:42 -0800
Subject: [PATCH] add table access method option to pgbench

---
 src/bin/pgbench/pgbench.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 08a5947a9e..24bc6bdbe3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -188,6 +188,11 @@ int64  latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/*
+ * table access method selection
+ */
+char  *tableam = NULL;
+
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
  * partitioning.
@@ -643,6 +648,7 @@ usage(void)
   "  --partitions=NUM         partition pgbench_accounts into NUM parts (default: 0)\n"
   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
   "  --unlogged-tables        create tables as unlogged tables\n"
+  "  --table-am=TABLEAM       create tables using specified access method\n"
   "\nOptions to select what to run:\n"
   "  -b, --builtin=NAME[@W]   add builtin script NAME weighted at W (default: 1)\n"
   "                           (use \"-b list\" to list available scripts)\n"
@@ -3750,12 +3756,14 @@ initCreateTables(PGconn *con)
for (i = 0; i < lengthof(DDLs); i++)
{
		char		opts[256];
+		char		opam[256];
		char		buffer[256];
const struct ddlinfo *ddl = &DDLs[i];
const char *cols;
 
/* Construct new create table statement. */
opts[0] = '\0';
+   opam[0] = '\0';
 
/* Partition pgbench_accounts table */
		if (partition_method != PART_NONE && strcmp(ddl->table, "pgbench_accounts") == 0)
@@ -3776,11 +3784,22 @@ initCreateTables(PGconn *con)
PQfreemem(escape_tablespace);
}
 
+   if (tableam != NULL)
+   {
+   char   *escape_tableam;
+
+   escape_tableam = PQescapeIdentifier(con, tableam,
+                                       strlen(tableam));
+   snprintf(opam + strlen(opam), sizeof(opam) - strlen(opam),
+" using %s", escape_tableam);
+   PQfreemem(escape_tableam);
+   }
+
		cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols;
 
-   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s",
+   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s%s",
 unlogged_tables ? " unlogged" : "",
-ddl->table, cols, opts);
+ddl->table, cols, opam, opts);
 
executeStatement(con, buffer);
}
@@ -5422,6 +5441,7 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+   {"table-am", required_argument, NULL, 13},
{NULL, 0, NULL, 0}
};
 
@@ -5795,6 +5815,10 @@ main(int argc, char **argv)
exit(1);
}
break;
+   case 13:			/* table-am */
+   initialization_option_set = true;
+   tableam = pg_strdup(optarg);
+   break;
   

RE: A few new options for CHECKPOINT

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Bossart, Nathan 
> The main purpose of this patch is to give users more control over their 
> manually
> requested checkpoints or restartpoints.  I suspect the most useful option is
> IMMEDIATE, which can help avoid checkpoint- related IO spikes.  However, I
> didn't see any strong reason to prevent users from also adjusting FORCE and
> WAIT.

I think just IMMEDIATE would suffice, too.  But could you tell us what led you 
to want to give users more control?  Could you give us concrete example situations 
where users would want to perform CHECKPOINT with options?


Regards
Takayuki Tsunakawa



Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Andy Fan
On Tue, Nov 24, 2020 at 11:11 PM Alvaro Herrera 
wrote:

> On 2020-Nov-24, Andy Fan wrote:
>
> > then we modified the copy/read/out functions for this node.  In
> > _readFuncExpr,
> > we probably add something like
>
> > [ ... ]
>
> > Then we will get a compatible issue if we create a view with the node in
> > the older version and access the view with the new binary.
>
> When nodes are modified, you have to increment CATALOG_VERSION_NO which
> makes the new code incompatible with a datadir previously created


Thank you Alvaro. I just think this issue can be avoided without causing
the incompatibility. IIUC, once the new binary isn't compatible with
the datadir, the user has to dump/restore the whole database or use
pg_upgrade.  That is extra work.


-- for precisely this reason.
>

I probably didn't get the real point of this,  sorry about that.

-- 
Best Regards
Andy Fan


RE: [PoC] Non-volatile WAL buffer

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> It's interesting that they only place the tail of the log on PMEM, i.e.
> the PMEM buffer has limited size, and the rest of the log is not on
> PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers
> and the WAL segments, and kept the WAL segments on regular storage. That
> could work, but I'd bet they did that because at that time the NV
> devices were much smaller, and placing the whole log on PMEM was not
> quite possible. So it might be unnecessarily complicated, considering
> the PMEM device capacity is much higher now.
> 
> So I'd suggest we simply try this:
> 
> clients -> buffers (DRAM) -> wal segments (PMEM)
> 
> I plan to do some hacking and maybe hack together some simple tools to
> benchmarks various approaches.

I'm in favor of your approach.  Yes, Intel PMEM was available in 128/256/512 
GB capacities when I checked last year.  That's more than enough to place all WAL 
segments, so a small PMEM WAL buffer is not necessary.  I'm excited to see 
Postgres gain more power.


Regards
Takayuki Tsunakawa



Re: A few new options for CHECKPOINT

2020-11-24 Thread Bossart, Nathan
On 11/24/20, 4:03 PM, "tsunakawa.ta...@fujitsu.com" 
 wrote:
> From: Bossart, Nathan 
>> The main purpose of this patch is to give users more control over their 
>> manually
>> requested checkpoints or restartpoints.  I suspect the most useful option is
>> IMMEDIATE, which can help avoid checkpoint- related IO spikes.  However, I
>> didn't see any strong reason to prevent users from also adjusting FORCE and
>> WAIT.
>
> I think just IMMEDIATE would suffice, too.  But could you tell us what led you 
> to want to give users more control?  Could you give us concrete example 
> situations where users would want to perform CHECKPOINT with options?

It may be useful for backups taken with the "consistent snapshot"
approach.  As noted in the documentation [0], running CHECKPOINT
before taking the snapshot can reduce recovery time.  However, users
might wish to avoid the IO spike caused by an immediate checkpoint.

Nathan

[0] https://www.postgresql.org/docs/devel/backup-file.html
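
To make that concrete, the backup flow in question is roughly as below (the
option syntax in the comment is hypothetical; the actual grammar is whatever
the patch defines):

    -- issue a checkpoint right before taking the filesystem/volume snapshot,
    -- so recovery from the restored snapshot starts from a recent REDO point
    CHECKPOINT;                  -- today this is always an immediate checkpoint
    -- with the proposed options, something like: CHECKPOINT (IMMEDIATE FALSE);
    -- ... then take the storage-level snapshot of the data directory ...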



Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Agreed.  If this is just a few hundred bytes of server-side local memory
>> per backend, it seems definitely worth it.

> Yeah, given the current set of GUC_REPORT variables, it's hard to see
> the storage for their last-reported values amounting to much.  The need
> for an extra pointer field in each GUC variable record might eat more
> space than the actually-live values :-(

Here's a v2 that does it like that.

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7c5f7c775b..34ed0e7558 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4229,6 +4229,9 @@ PostgresMain(int argc, char *argv[],
 pgstat_report_activity(STATE_IDLE, NULL);
 			}
 
+			/* Report any recently-changed GUC options */
+			ReportChangedGUCOptions();
+
 			ReadyForQuery(whereToSendOutput);
 			send_ready_for_query = false;
 		}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..245a3472bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4822,6 +4822,8 @@ static bool guc_dirty;			/* true if need to do commit/abort work */
 
 static bool reporting_enabled;	/* true to enable GUC_REPORT */
 
+static bool report_needed;		/* true if any GUC_REPORT reports are needed */
+
 static int	GUCNestLevel = 0;	/* 1 when in main transaction */
 
 
@@ -5452,6 +5454,7 @@ InitializeOneGUCOption(struct config_generic *gconf)
 	gconf->reset_scontext = PGC_INTERNAL;
 	gconf->stack = NULL;
 	gconf->extra = NULL;
+	gconf->last_reported = NULL;
 	gconf->sourcefile = NULL;
 	gconf->sourceline = 0;
 
@@ -5828,7 +5831,10 @@ ResetAllOptions(void)
 		gconf->scontext = gconf->reset_scontext;
 
 		if (gconf->flags & GUC_REPORT)
-			ReportGUCOption(gconf);
+		{
+			gconf->status |= GUC_NEEDS_REPORT;
+			report_needed = true;
+		}
 	}
 }
 
@@ -6215,7 +6221,10 @@ AtEOXact_GUC(bool isCommit, int nestLevel)
 
 			/* Report new value if we changed it */
 			if (changed && (gconf->flags & GUC_REPORT))
-ReportGUCOption(gconf);
+			{
+gconf->status |= GUC_NEEDS_REPORT;
+report_needed = true;
+			}
 		}		/* end of stack-popping loop */
 
 		if (stack != NULL)
@@ -6257,17 +6266,60 @@ BeginReportingGUCOptions(void)
 		if (conf->flags & GUC_REPORT)
 			ReportGUCOption(conf);
 	}
+
+	report_needed = false;
+}
+
+/*
+ * ReportChangedGUCOptions: report recently-changed GUC_REPORT variables
+ *
+ * This is called just before we wait for a new client query.
+ *
+ * By handling things this way, we ensure that a ParameterStatus message
+ * is sent at most once per variable per query, even if the variable
+ * changed multiple times within the query.  That's quite possible when
+ * using features such as function SET clauses.  Function SET clauses
+ * also tend to cause values to change intraquery but eventually revert
+ * to their prevailing values; ReportGUCOption is responsible for avoiding
+ * redundant reports in such cases.
+ */
+void
+ReportChangedGUCOptions(void)
+{
+	/* Quick exit if not (yet) enabled */
+	if (!reporting_enabled)
+		return;
+
+	/* Quick exit if no values have been changed */
+	if (!report_needed)
+		return;
+
+	/* Transmit new values of interesting variables */
+	for (int i = 0; i < num_guc_variables; i++)
+	{
+		struct config_generic *conf = guc_variables[i];
+
+		if ((conf->flags & GUC_REPORT) && (conf->status & GUC_NEEDS_REPORT))
+			ReportGUCOption(conf);
+	}
+
+	report_needed = false;
 }
 
 /*
  * ReportGUCOption: if appropriate, transmit option value to frontend
+ *
+ * We need not transmit the value if it's the same as what we last
+ * transmitted.  However, clear the NEEDS_REPORT flag in any case.
  */
 static void
 ReportGUCOption(struct config_generic *record)
 {
-	if (reporting_enabled && (record->flags & GUC_REPORT))
+	char	   *val = _ShowOption(record, false);
+
+	if (record->last_reported == NULL ||
+		strcmp(val, record->last_reported) != 0)
 	{
-		char	   *val = _ShowOption(record, false);
 		StringInfoData msgbuf;
 
 		pq_beginmessage(&msgbuf, 'S');
@@ -6275,8 +6327,19 @@ ReportGUCOption(struct config_generic *record)
 		pq_sendstring(&msgbuf, val);
 		pq_endmessage(&msgbuf);
 
-		pfree(val);
+		/*
+		 * We need a long-lifespan copy.  If strdup() fails due to OOM, we'll
+		 * set last_reported to NULL and thereby possibly make a duplicate
+		 * report later.
+		 */
+		if (record->last_reported)
+			free(record->last_reported);
+		record->last_reported = strdup(val);
 	}
+
+	pfree(val);
+
+	record->status &= ~GUC_NEEDS_REPORT;
 }
 
 /*
@@ -7695,7 +7758,10 @@ set_config_option(const char *name, const char *value,
 	}
 
 	if (changeVal && (record->flags & GUC_REPORT))
-		ReportGUCOption(record);
+	{
+		record->status |= GUC_NEEDS_REPORT;
+		report_needed = true;
+	}
 
 	return changeVal ? 1 : -1;
 }
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 073c8f3e06..6a20a3bcec 100644
--- a/src/include/utils/guc.h
++

RE: A few new options for CHECKPOINT

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Bossart, Nathan 
> It may be useful for backups taken with the "consistent snapshot"
> approach.  As noted in the documentation [0], running CHECKPOINT
> before taking the snapshot can reduce recovery time.  However, users
> might wish to avoid the IO spike caused by an immediate checkpoint.
> 
> [0] https://www.postgresql.org/docs/devel/backup-file.html

Ah, understood.  I agree that a slow or spread manual checkpoint would be good to 
have.


Regards
Takayuki Tsunakawa



Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Ashwin Agrawal
On Sun, Nov 22, 2020 at 5:23 PM Tomas Vondra 
wrote:

> I'm not entirely sure whether the "pmemdax" (i.e. unpatched instance
> with WAL on PMEM DAX device) is actually safe, but I included it anyway
> to see what difference is.


I am curious to learn more about this aspect. Kernels have provided support
for the "pmemdax" mode, so what part of the stack is unsafe?

Reading the numbers, it seems that only at smaller scales does the modified
PostgreSQL give an enhanced benefit over unmodified PostgreSQL with "pmemdax".
For most other cases the numbers are pretty close between these two setups, so
I am curious to learn: why even modify PostgreSQL if unmodified PostgreSQL can
provide a similar benefit with just DAX mode?


Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Andy Fan
On Wed, Nov 25, 2020 at 8:10 AM Andy Fan  wrote:

>
>
> On Tue, Nov 24, 2020 at 11:11 PM Alvaro Herrera 
> wrote:
>
>> On 2020-Nov-24, Andy Fan wrote:
>>
>> > then we modified the copy/read/out functions for this node.  In
>> > _readFuncExpr,
>> > we probably add something like
>>
>> > [ ... ]
>>
>> > Then we will get a compatible issue if we create a view with the node in
>> > the older version and access the view with the new binary.
>>
>> When nodes are modified, you have to increment CATALOG_VERSION_NO which
>> makes the new code incompatible with a datadir previously created
>
>
> Thank you Alvaro. I just think this issue can be avoided without causing
> the incompatibility. IIUC, once the new binary isn't compatible with
> the datadir, the user has to dump/restore the whole database or use
> pg_upgrade.  That is extra work.
>
>
> -- for precisely this reason.
>>
>
> I probably didn't get the real point of this,  sorry about that.
>
> --
> Best Regards
> Andy Fan
>


What I mean here is something like below.

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 8c1e39044c..c3eba00639 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -29,6 +29,7 @@

 /* Static state for pg_strtok */
 static const char *pg_strtok_ptr = NULL;
+static const char *pg_strtok_extend(int *length, bool testonly);

 /* State flag that determines how readfuncs.c should treat location fields */
 #ifdef WRITE_READ_PARSE_PLAN_TREES
@@ -102,6 +103,20 @@ stringToNodeWithLocations(const char *str)
 #endif


+const char*
+pg_strtok(int *length)
+{
+ return pg_strtok_extend(length, false);
+}
+
+/*
+ * Just peek at the next field name without changing the global state.
+ */
+const char*
+pg_peak_next_field(int *length)
+{
+ return pg_strtok_extend(length, true);
+}
 /*
  *
  * the lisp token parser
@@ -149,7 +164,7 @@ stringToNodeWithLocations(const char *str)
  * as a single token.
  */
 const char *
-pg_strtok(int *length)
+pg_strtok_extend(int *length,  bool testonly)
 {
  const char *local_str; /* working pointer to string */
  const char *ret_str; /* start of token to return */
@@ -162,7 +177,8 @@ pg_strtok(int *length)
  if (*local_str == '\0')
  {
  *length = 0;
- pg_strtok_ptr = local_str;
+ if (!testonly)
+ pg_strtok_ptr = local_str;
  return NULL; /* no more tokens */
  }

@@ -199,7 +215,8 @@ pg_strtok(int *length)
  if (*length == 2 && ret_str[0] == '<' && ret_str[1] == '>')
  *length = 0;

- pg_strtok_ptr = local_str;
+ if (!testonly)
+ pg_strtok_ptr = local_str;

  return ret_str;
 }


-- the below is a demo code to use it.
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 76ab5ae8b7..c19cd45793 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -689,13 +689,27 @@ _readFuncExpr(void)

  READ_OID_FIELD(funcid);
  READ_OID_FIELD(funcresulttype);
- READ_BOOL_FIELD(funcretset);
+ token = pg_peak_next_field(&length);
+ if (memcmp(token, ":funcretset", strlen(":funcretset")) == 0)
+ {
+ READ_BOOL_FIELD(funcretset);
+ }
+ else
+ local_node->funcretset = false;
+
  READ_BOOL_FIELD(funcvariadic);
  READ_ENUM_FIELD(funcformat, CoercionForm);
- READ_OID_FIELD(funccollid);
+   READ_OID_FIELD(funccollid);
  READ_OID_FIELD(inputcollid);
  READ_NODE_FIELD(args);
- READ_LOCATION_FIELD(location);
+
+token = pg_peak_next_field(&length);
+ if (memcmp(token, ":location", strlen(":location")) == 0)
+ {
+ READ_LOCATION_FIELD(location);
+ }
+ else
+ local_node->location = -1;

  READ_DONE();
 }


After writing it, I feel that this will waste a bit of performance
since we need to tokenize part of the string twice.  But the overhead
looks acceptable to me, and it can be avoided if we refactor the code again
with a "read_fieldname_or_nomove(char *fieldname, int *length)" function.


-- 
Best Regards
Andy Fan


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
>> On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>>> walleye's been failing since this patchset went in:
>>> I have no idea what to make of that, but it looks more like a compiler bug
>>> than anything else.

> Apparently the bug was fixed days after it was reported,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
> but they haven't made a release containing the fix yet.

Wait ... the second part of that doesn't seem to be true.
According to

http://mingw-w64.org/doku.php/versions

mingw-w64 has made at least three releases since this
bug was fixed.  Surely they're shipping something newer
than 8.1.0 by now.

So maybe, rather than hacking up the attribute stuff for
a bug that might bite us again anyway in future, we ought
to press walleye's owner to install a more recent compiler.

regards, tom lane




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
> So maybe, rather than hacking up the attribute stuff for
> a bug that might bite us again anyway in future, we ought
> to press walleye's owner to install a more recent compiler.

I think that seems like a better idea.  I had thoughts about
installing a quick fix for now to give the owner of walleye a bit of time
for the upgrade.  From what I can tell, the latest version of minGW
comes with GCC 9.2 [1]

David

[1] https://osdn.net/projects/mingw/releases/




Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Tom Lane
Andy Fan  writes:
> What I mean here is something like below.

What exactly would be the value of that?

There is work afoot, or at least on people's to-do lists, to mechanize
creation of the outfuncs/readfuncs/etc code directly from the Node
struct declarations.  So special cases for particular fields are going
to be looked on with great disfavor, even if you can make a case that
there's some reason to do it.  (Which I'm not seeing.  We've certainly
never had to do it in the past.)

regards, tom lane




Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Tomas Vondra
On 11/25/20 1:27 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> It's interesting that they only place the tail of the log on PMEM,
>> i.e. the PMEM buffer has limited size, and the rest of the log is
>> not on PMEM. It's a bit as if we inserted a PMEM buffer between our
>> wal buffers and the WAL segments, and kept the WAL segments on
>> regular storage. That could work, but I'd bet they did that because
>> at that time the NV devices were much smaller, and placing the
>> whole log on PMEM was not quite possible. So it might be
>> unnecessarily complicated, considering the PMEM device capacity is
>> much higher now.
>> 
>> So I'd suggest we simply try this:
>> 
>> clients -> buffers (DRAM) -> wal segments (PMEM)
>> 
>> I plan to do some hacking and maybe hack together some simple tools
>> to benchmarks various approaches.
> 
> I'm in favor of your approach.  Yes, Intel PMEM were available in
> 128/256/512 GB when I checked last year.  That's more than enough to
> place all WAL segments, so a small PMEM wal buffer is not necessary.
> I'm excited to see Postgres gain more power.
>

Cool. FWIW I'm not 100% sure it's the right approach, but I think it's
worth testing. In the worst case we'll discover that this architecture
does not allow fully leveraging PMEM benefits, or maybe it won't work
for some other reason and the approach proposed here will work better.
Let's play a bit and we'll see.

I have hacked a very simple patch doing this (essentially replacing
open/write/close calls in xlog.c with pmem calls). It's a bit rough but
seems good enough for testing/experimenting. I'll polish it a bit, do
some benchmarks, and share some numbers in a day or two.
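
To give an idea of the direction (illustration only, not the patch itself):
with PMDK's libpmem, a WAL segment on a DAX filesystem can be mapped and
written without write()/fsync(), roughly like this (the path is a
placeholder):

#include <stdio.h>
#include <string.h>
#include <libpmem.h>

#define WAL_SEG_SIZE	((size_t) 16 * 1024 * 1024)

int
main(void)
{
	size_t		mapped_len;
	int			is_pmem;
	const char *rec = "fake WAL record payload";
	char	   *seg;

	/* placeholder path for a segment under pg_wal on a DAX mount */
	seg = pmem_map_file("/mnt/pmem/pg_wal/000000010000000000000001",
						WAL_SEG_SIZE, PMEM_FILE_CREATE, 0600,
						&mapped_len, &is_pmem);
	if (seg == NULL)
	{
		perror("pmem_map_file");
		return 1;
	}

	if (is_pmem)
		pmem_memcpy_persist(seg, rec, strlen(rec));	/* copy + flush, no fsync */
	else
	{
		memcpy(seg, rec, strlen(rec));		/* plain mmap fallback */
		pmem_msync(seg, strlen(rec));
	}

	pmem_unmap(seg, mapped_len);
	return 0;
}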


regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 14:35, David Rowley  wrote:
>
> On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
> > So maybe, rather than hacking up the attribute stuff for
> > a bug that might bite us again anyway in future, we ought
> > to press walleye's owner to install a more recent compiler.
>
> I think that seems like a better idea.  I had thoughts about
> installing a quick fix for now to give the owner of walleye a bit of time
> for the upgrade.  From what I can tell, the latest version of minGW
> comes with GCC 9.2 [1]

So, how about the attached today and I'll email Joseph about walleye
and see if he can upgrade to a newer minGW version.

David
diff --git a/src/include/c.h b/src/include/c.h
index 3d04749007..4257300601 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -200,6 +200,21 @@
 #define pg_noinline
 #endif
 
+/*
+ * For now, just define pg_attribute_cold and pg_attribute_hot to be empty
+ * macros on minGW 8.1.  There appears to be a compiler bug that results in
+ * compilation failure.  At this time, we still have at least one buildfarm
+ * animal running that compiler, so this should make that green again. It's
+ * likely this compiler is not popular enough to warrant keeping this code
+ * around forever, so let's just remove it once the last buildfarm animal
+ * upgrades.
+ */
+#if defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1
+
+#define pg_attribute_cold
+#define pg_attribute_hot
+
+#else
 /*
  * Marking certain functions as "hot" or "cold" can be useful to assist the
  * compiler in arranging the assembly code in a more efficient way.
@@ -216,6 +231,7 @@
 #define pg_attribute_hot
 #endif
 
+#endif							/* defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1 */
 /*
  * Mark a point as unreachable in a portable fashion.  This should preferably
  * be something that the compiler understands, to aid code generation.


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
>> So maybe, rather than hacking up the attribute stuff for
>> a bug that might bite us again anyway in future, we ought
>> to press walleye's owner to install a more recent compiler.

> I think that seems like a better idea.  I had thoughts about
> installing a quick for now to give the owner of walleye a bit of time
> for the upgrade.  From what I can tell, the latest version of minGW
> comes with GCC 9.2 [1]

mingw and mingw-w64 seem to be distinct projects with separate
release schedules.  The latter's webpage isn't too clear about
which gcc version is in each of their releases.  But they seem
to be organized enough to put out releases roughly annually,
so I'm supposing they aren't falling too far behind gcc upstream.

regards, tom lane




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-24 Thread Craig Ringer
On Wed, Nov 25, 2020 at 2:43 AM Alexey Kondratov 
wrote:

> On 2020-11-24 06:52, Bharath Rupireddy wrote:
> > Thanks for the review comments.
> >
> > On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
> >  wrote:
> >>
> >> > v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
> >>
> >> This patch looks pretty straightforward for me, but there are some
> >> things to be addressed IMO:
> >>
> >> +   server = GetForeignServerByName(servername, true);
> >> +
> >> +   if (server != NULL)
> >> +   {
> >>
> >> Yes, you return a false if no server was found, but for me it worth
> >> throwing an error in this case as, for example, dblink does in the
> >> dblink_disconnect().
> >>
> >
> > dblink_disconnect() "Returns status, which is always OK (since any
> > error causes the function to throw an error instead of returning)."
> > This behaviour doesn't seem okay to me.
> >
> > Since we throw true/false, I would prefer to throw a warning(with a
> > reason) while returning false over an error.
> >
>
> I thought about something a bit more sophisticated:
>
> 1) Return 'true' if there were open connections and we successfully
> closed them.
> 2) Return 'false' in the no-op case, i.e. there were no open
> connections.
> 3) Rise an error if something went wrong. And non-existing server case
> belongs to this last category, IMO.
>
> That looks like a semantically correct behavior, but let us wait for any
> other opinion.
>
> >
> >>
> >> + result = disconnect_cached_connections(FOREIGNSERVEROID,
> >> +hashvalue,
> >> +false);
> >>
> >> +   if (all || (!all && cacheid == FOREIGNSERVEROID &&
> >> +   entry->server_hashvalue == hashvalue))
> >> +   {
> >> +   if (entry->conn != NULL &&
> >> +   !all && cacheid == FOREIGNSERVEROID &&
> >> +   entry->server_hashvalue == hashvalue)
> >>
> >> These conditions look bulky for me. First, you pass FOREIGNSERVEROID
> >> to
> >> disconnect_cached_connections(), but actually it just duplicates 'all'
> >> flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when
> >> it
> >> is '-1', then 'all == true'. That is all, there are only two calls of
> >> disconnect_cached_connections(). That way, it seems that we should
> >> keep
> >> only 'all' flag at least for now, doesn't it?
> >>
> >
> > I added cachid as an argument to disconnect_cached_connections() for
> > reusability. Say, someone wants to use it with a user mapping then
> > they can pass cacheid USERMAPPINGOID, hash value of user mapping. The
> > cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue can
> > be added to disconnect_cached_connections().
> >
>
> Yeah, I have got your point and motivation to add this argument, but how
> we can use it? To disconnect all connections belonging to some specific
> user mapping? But any user mapping is hard bound to some foreign server,
> AFAIK, so we can pass serverid-based hash in this case.
>
> In the case of pgfdw_inval_callback() this argument makes sense, since
> syscache callbacks work that way, but here I can hardly imagine a case
> where we can use it. Thus, it still looks as a preliminary complication
> for me, since we do not have plans to use it, do we? Anyway, everything
> seems to be working fine, so it is up to you to keep this additional
> argument.
>
> >
> > v1-0003-postgres_fdw-server-level-option-keep_connection.patch
> > This patch adds a new server level option, keep_connection, default
> > being on, when set to off, the local session doesn't cache the
> > connections associated with the foreign server.
> >
>
> This patch looks good to me, except one note:
>
> (entry->used_in_current_xact &&
> -   !keep_connections))
> +   (!keep_connections || !entry->keep_connection)))
> {
>
> Following this logic:
>
> 1) If keep_connections == true, then per-server keep_connection has a
> *higher* priority, so one can disable caching of a single foreign
> server.
>
> 2) But if keep_connections == false, then it works like a global switch
> off indifferently of per-server keep_connection's, i.e. they have a
> *lower* priority.
>
> It looks fine for me, at least I cannot propose anything better, but
> maybe it should be documented in 0004?
>
> >
> >
> v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
> > This patch adds the tests and documentation related to this feature.
> >
>
> I have not read all texts thoroughly, but what caught my eye:
>
> +   A GUC, postgres_fdw.keep_connections, default
> being
> +   on, when set to off, the local
> session
>
> I think that GUC acronym is used widely only in the source code and
> Postgres docs tend to do not use it at all, except from acronyms list
> and a couple of 'GUC parameters' collocation usage. And it never used in
> a singular form the

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> So, how about the attached today and I'll email Joseph about walleye
> and see if he can upgrade to a newer minGW version.

WFM.  (Note I already cc'd Joseph on this thread.)

regards, tom lane



