Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread ROS Didier
Hi
I would like to know if it is possible to migrate an Oracle multitenant
database (with multiple PDBs) to PostgreSQL?

Thanks in advance

Best Regards

D ROS
EDF





Re: Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread Thomas Kellerer


ROS Didier wrote on 24.11.2020 at 09:09:
> I would like to know if it is possible to migrate an Oracle multitenant
> database (with multiple PDBs) to PostgreSQL?
Postgres' databases are very similar to Oracle's PDBs.

Probably the biggest difference is that you can't shut down
a single database the way you can with a PDB.

Database users in Postgres are like Oracle's "common users": they are
global to the whole instance (a "cluster" in Postgres terms). There
are no database-specific users.

Thomas









RE: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2020-11-24 Thread Junfeng Yang
Hi hackers,

Can anyone help to verify this?



RE: POC: postgres_fdw insert batching

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better,
because Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT INTO
array and FORALL ... INSERT INTO over an array, while JDBC uses batch, as in "batch
updates" and its API method names (addBatch, executeBatch).

But it seems better, or at least more common, to use batch, according to the
etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think
GetInsertBatchSize may be better.  That is, this API deals with multiple
records in a single INSERT statement.  Your GetModifyBatchSize could be
reserved for statement batching, once libpq supports batching/pipelining to
execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC
batch updates.  What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
--
Statement stmt = con.createStatement(); 
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 

// submit a batch of update commands for execution 
int[] updateCounts = stmt.executeBatch(); 
--


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during
planning.  Unfortunately, it's also called from BeginForeignInsert(), but the
other stuff passed to create_foreign_modify(), including the query string, is
constructed there.


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
dynamically based on the planning or system state unlike shared buffers and 
parallel workers.  OTOH, I sometimes want to see what configuration parameter 
values the user set, such as work_mem, enable_*, and shared_buffers, together 
with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know it that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value
that was already saved in a struct by create_foreign_modify().
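To illustrate, here is a minimal sketch of that shape (the struct and field
names are assumptions based on this thread, not necessarily the patch's actual
code):

#include "postgres.h"
#include "nodes/execnodes.h"        /* ResultRelInfo */

/*
 * Sketch only: batch_size is resolved once in create_foreign_modify() and
 * cached in the per-relation modify state, so the getter called from the
 * executor is trivial.
 */
typedef struct PgFdwModifyState
{
    /* ... other members of the real struct elided ... */
    int         batch_size;     /* from server/table options */
} PgFdwModifyState;

static int
postgresGetMaxBulkInsertTuples(ResultRelInfo *resultRelInfo)
{
    PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

    return fmstate->batch_size; /* no catalog or option lookup here */
}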


Regards
Takayuki Tsunakawa



Re: Deduplicate aggregates and transition functions in planner

2020-11-24 Thread Heikki Linnakangas

On 19/11/2020 12:38, Heikki Linnakangas wrote:

So barring objections, I'm going to push the attached updated patch that
includes the removal of AggrefExprState, and leave CookedAggrefs or
other further refactorings for the future.


Done. Thanks!

- Heikki




Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Wed, 11 Nov 2020 19:10:35 +0300
Konstantin Knizhnik  wrote:

Thank you for reviewing this patch!

> 
> The patch does not apply to the current master because the makeFuncCall
> prototype has changed;
> I fixed it by adding COERCE_EXPLICIT_CALL.

The rebased patch was submitted.

> Ooops! Now TPS are much lower:
> 
> tps = 141.767347 (including connections establishing)
> 
> Speed of updates is reduced more than 70 times!
> Looks like we lose parallelism, because I get almost the same result
> with just one connection.

As you and Ishii-san mentioned in other posts, I think the reason would be the
table lock on the materialized view that is acquired during view maintenance.
I will explain a bit more in another post.

> 4. Finally let's create one more view (it is reasonable to expect that 
> analytics will run many different queries and so need multiple views).
> 
> create incremental materialized view teller_avgs as select 
> t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
> a.bid=t.bid group by t.tid;
> 
> It is great that not only simple aggregates like SUM are supported, but 
> also AVG.
> But insertion speed is now halved - 72 TPS.

Yes, with the current implementation, updating a table takes twice as long
when a new incrementally maintainable materialized view is defined on it,
because view maintenance is performed for each view.

> 
> So the good news is that incremental materialized views really work.
> And the bad news is that the maintenance overhead is too large, which
> significantly restricts the applicability of this approach.
> Certainly, in the case of a read-dominated workload such materialized
> views can significantly improve performance.
> But unfortunately my dream that they would allow combining OLAP+OLTP is
> not currently realized.

As you concluded, there is a large overhead on updating base tables in the
current implementation because it performs immediate maintenance, in which the
view is updated in the same statement that modifies its base table. Therefore,
it is not suitable for OLTP workloads with frequent table updates.

To suppress maintenance overhead in such workloads, we have to implement
"deferred maintenance", which collects table change logs and updates the view
in a later transaction.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Thu, 12 Nov 2020 15:37:42 +0300
Konstantin Knizhnik  wrote:

> Well, creating proper indexes for a table is certainly the responsibility
> of the DBA.
> But users may not consider a materialized view to be a normal table, so the
> idea that an index should be explicitly created for a materialized view may
> not be so obvious.
> From the other side, the materialized view implementation knows which
> index is needed for performing an efficient incremental update.
> I wonder if it could create such an index itself implicitly, or at least
> produce a notice proposing to create such an index.

That makes sense. Especially for aggregate views, it is obvious that
creating an index on the expressions used in GROUP BY is effective. For
other views, creating an index on columns that come from primary keys
of base tables would be effective, if any exist.

However, if a base table doesn't have a primary or unique key, or such a
key column is not contained in the view's target list, it is hard to
decide on an appropriate index for the view. We could create an index on all
columns in the target list, but that could add overhead to view maintenance.
So just producing a notice would be better in such cases.

> I looked through your patch for exclusive table locks and found this
> fragment in matview.c:
> 
>      /*
>       * Wait for concurrent transactions which update this materialized
>       * view at READ COMMITTED. This is needed to see changes committed in
>       * other transactions. No wait and raise an error at REPEATABLE READ
>       * or SERIALIZABLE to prevent update anomalies of matviews.
>       * XXX: dead-lock is possible here.
>       */
>      if (!IsolationUsesXactSnapshot())
>          LockRelationOid(matviewOid, ExclusiveLock);
>      else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))
> 
> 
> I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
> It is still about 7 times slower than the performance without the incremental view.
> But now the gap is not so dramatic. And it seems clear that this
> exclusive lock on the matview is the real show stopper for concurrent updates.
> I do not know which race conditions and anomalies we could get if we replace
> the table-level lock with a row-level lock here.

I explained it here:
https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
 
For example, suppose there is a view V = R*S that joins tables R and S,
and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency.  By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This inconsistency could not be resolved by
row-level lock.
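To spell out the algebra behind this (the expansion below is mine, using the
same * notation for join as above):

    V_new = (R ∪ dR) * (S ∪ dS)
          = (R*S) ∪ (dR*S) ∪ (R*dS) ∪ (dR*dS)

T1 only computes dR*S and T2 only computes R*dS, so even after both deltas are
applied the dR*dS term is still missing, and no row-level lock on the existing
rows of V can supply it.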

> But I think that this problem should be addressed in any case: single 
> client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+
+Concurrent Transactions
+
+Suppose an IMMV is defined on two base tables and each
+table was modified in a different concurrent transaction simultaneously.
+In the transaction which was committed first, IMMV can 
+be updated considering only the change which happened in this transaction.
+On the other hand, in order to update the view correctly in the transaction
+which was committed later, we need to know the changes which occurred in
+both transactions.  For this reason, ExclusiveLock
+is held on an IMMV immediately after a base table is
+modified in READ COMMITTED mode to make sure that
+the IMMV is updated in the latter transaction after
+the former transaction is committed.  In REPEATABLE READ
+or SERIALIZABLE mode, an error is raised immediately
+if lock acquisition fails because any changes which occurred in
+other transactions are not visible in these modes and
+IMMV cannot be updated correctly in such situations.
+
+

However, should we explicitly describe its impact on performance here?
 
> I attached to this mail profile of pgbench workload with defined 
> incremental view (with index).
> May be you will find it useful.

Thank you for your profiling! Hmm, it shows that the overhead of executing the
query for calculating the delta (refresh_matview_datafill) and applying
the delta (SPI_exec) is large. I will investigate whether more optimizations
to reduce the overhead are possible.

> 
> One more disappointing observation of materialized views (now 
> non-incremental).
> Time of creation of non-incremental materialized view is about 18 seconds:
> 
> postgres=# create materialized view teller_avgs as

Re: Implementing Incremental View Maintenance

2020-11-24 Thread Konstantin Knizhnik




On 24.11.2020 12:21, Yugo NAGATA wrote:



I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
It is still about 7 times slower than the performance without the incremental view.
But now the gap is not so dramatic. And it seems clear that this
exclusive lock on the matview is the real show stopper for concurrent updates.
I do not know which race conditions and anomalies we could get if we replace
the table-level lock with a row-level lock here.

I explained it here:
https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
  
For example, suppose there is a view V = R*S that joins tables R and S,

and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency.  By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This inconsistency could not be resolved by
row-level lock.


But I think that this problem should be addressed in any case: single
client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+
+Concurrent Transactions
+
+Suppose an IMMV is defined on two base tables and each
+table was modified in a different concurrent transaction simultaneously.
+In the transaction which was committed first, IMMV can
+be updated considering only the change which happened in this transaction.
+On the other hand, in order to update the view correctly in the transaction
+which was committed later, we need to know the changes which occurred in
+both transactions.  For this reason, ExclusiveLock
+is held on an IMMV immediately after a base table is
+modified in READ COMMITTED mode to make sure that
+the IMMV is updated in the latter transaction after
+the former transaction is committed.  In REPEATABLE READ
+or SERIALIZABLE mode, an error is raised immediately
+if lock acquisition fails because any changes which occurred in
+other transactions are not visible in these modes and
+IMMV cannot be updated correctly in such situations.
+
+

However, should we explicitly describe its impact on performance here?
  


Sorry, I didn't think much about this problem.
But I think it is very important to try to find some solution to it.
The most obvious optimization is not to use an exclusive table lock if the
view depends on just one table (contains no joins).

It looks like there are no anomalies in this case, are there?

Yes, most analytic queries contain joins (just two of the 22 TPC-H queries
have no joins).

So maybe this optimization will not help much.

I wonder if it is possible to somehow use the predicate locking mechanism of
Postgres to avoid these anomalies without a global lock?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2020-11-24 Thread Ajin Cherian
On Mon, Nov 23, 2020 at 10:35 PM Amit Kapila  wrote:
> For the first two, as the xact is still not visible to others, we
> don't need to make it behave like a committed txn. To make the (DDL)
> changes visible to the current txn, the message
> REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID copies the snapshot which
> fills the subxip array. This will be sufficient to make the changes
> visible to the current txn. For the third, I have checked the code,
> and whenever we have any change message the base snapshot gets set
> via SnapBuildProcessChange. It is possible that I have missed
> something, but I don't want to call SnapbuildCommittedTxn in
> DecodePrepare unless we have a clear reason to, so I'm leaving it
> for now. Can you or someone else see any reason to do so?

I reviewed and tested this and like you said, SnapBuildProcessChange
sets the base snapshot for every change.
I did various tests using DDL updates and haven't seen any issues so
far. I agree with your analysis.

regards,
Ajin




Re: psql: add \si, \sm, \st and \sr functions to show CREATE commands for indexes, matviews, triggers and tables

2020-11-24 Thread Anastasia Lubennikova

On 18.08.2020 17:25, Tom Lane wrote:

a.pervush...@postgrespro.ru writes:

[ si_st_sm_sr_v2.patch ]

I hadn't particularly noticed this thread before, but I happened to
look through this patch, and I've got to say that this proposed feature
seems like an absolute disaster from a maintenance standpoint.  There
will be no value in an \st command that is only 90% accurate; the produced
DDL has to be 100% correct.  This means that, if we accept this feature,
psql will have to know everything pg_dump knows about how to construct the
DDL describing tables, indexes, views, etc.  That is a lot of code, and
it's messy, and it changes nontrivially on a very regular basis.  I can't
accept that we want another copy in psql --- especially one that looks
nothing like what pg_dump has.

There've been repeated discussions about somehow extracting pg_dump's
knowledge into a library that would also be available to other client
programs (see e.g. the concurrent thread at [1]).  That's quite a tall
order, which is why it's not happened yet.  But I think we really need
to have something like that before we can accept this feature for psql.

BTW, as an example of why this is far more difficult than it might
seem at first glance, this patch doesn't even begin to meet the
expectation stated at the top of describe.c:

  * Support for the various \d ("describe") commands.  Note that the current
  * expectation is that all functions in this file will succeed when working
  * with servers of versions 7.4 and up.  It's okay to omit irrelevant
  * information for an old server, but not to fail outright.

It might be okay for this to cut off at 8.0 or so, as I think pg_dump
does, but not to just fail on older servers.

Another angle, which I'm not even sure how we want to think about it, is
security.  It will not do for "\et" to allow some attacker to replace
function calls appearing in the table's CHECK constraints, for instance.
So this means you've got to be very aware of CVE-2018-1058-style attacks.
Our answer to that for pg_dump has partially depended on restricting the
search_path used at both dump and restore time ... but I don't think \et
gets to override the search path that the psql user is using.  I'm not
sure what that means in practice but it certainly requires some thought
before we add the feature, not after.

Anyway, I can see the attraction of having psql commands like these,
but "write a bunch of new code that we'll have to maintain" does not
seem like a desirable way to get them.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/9df8a3d3-13d2-116d-26ab-6a273c1ed38c%402ndquadrant.com




Since there has been no activity on this thread since before the CF and
no response from the author I have marked this "returned with feedback".

Alexandra, feel free to resubmit it to the next commitfest, when you 
have time to address the issues raised in the review.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Implementing Incremental View Maintenance

2020-11-24 Thread Yugo NAGATA
On Tue, 24 Nov 2020 12:46:57 +0300
Konstantin Knizhnik  wrote:

> 
> 
> On 24.11.2020 12:21, Yugo NAGATA wrote:
> >
> >> I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
> >> It is still about 7 times slower than the performance without the incremental view.
> >> But now the gap is not so dramatic. And it seems clear that this
> >> exclusive lock on the matview is the real show stopper for concurrent updates.
> >> I do not know which race conditions and anomalies we could get if we replace
> >> the table-level lock with a row-level lock here.
> > I explained it here:
> > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> >   
> > For example, suppose there is a view V = R*S that joins tables R and S,
> > and there are two concurrent transactions T1 which changes table R to R'
> > and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
> > V would be updated to R'*S in T1, and R*S' in T2, so it would cause
> > inconsistency.  By locking the view V, transactions T1, T2 are processed
> > serially and this inconsistency can be avoided.
> >
> > Especially, suppose that tuple dR is inserted into R in T1, and dS is
> > inserted into S in T2, where dR and dS will be joined in according to
> > the view definition. In this situation, without any lock, the change of V is
> > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> > be included in the results.  This inconsistency could not be resolved by
> > row-level lock.
> >
> >> But I think that this problem should be addressed in any case: single
> >> client update mode is very rare scenario.
> > This behavior is explained in rules.sgml like this:
> >
> > +
> > +Concurrent Transactions
> > +
> > +Suppose an IMMV is defined on two base tables and each
> > +table was modified in a different concurrent transaction simultaneously.
> > +In the transaction which was committed first, IMMV can
> > +be updated considering only the change which happened in this transaction.
> > +On the other hand, in order to update the view correctly in the transaction
> > +which was committed later, we need to know the changes which occurred in
> > +both transactions.  For this reason, ExclusiveLock
> > +is held on an IMMV immediately after a base table is
> > +modified in READ COMMITTED mode to make sure that
> > +the IMMV is updated in the latter transaction after
> > +the former transaction is committed.  In REPEATABLE READ
> > +or SERIALIZABLE mode, an error is raised immediately
> > +if lock acquisition fails because any changes which occurred in
> > +other transactions are not visible in these modes and
> > +IMMV cannot be updated correctly in such situations.
> > +
> > +
> >
> > However, should we explicitly describe its impact on performance here?
> >   
> 
> Sorry, I didn't think much about this problem.
> But I think it is very important to try to find some solution to it.
> The most obvious optimization is not to use an exclusive table lock if the
> view depends on just one table (contains no joins).
> It looks like there are no anomalies in this case, are there?

Thank you for your suggestion! That makes sense.
 
> Yes, most analytic queries contain joins (just two of the 22 TPC-H queries
> have no joins).
> So maybe this optimization will not help much.

Yes, but if a user wants to incrementally maintain only aggregate views on a
large table, like TPC-H Q1, it will be helpful. For this optimization, we only
have to check the number of RTEs in the rtable list, which would be cheap.
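For what it's worth, a minimal sketch of that check (variable names such as
viewQuery and the exact lock choice are placeholders, not from an actual patch):

    /* Sketch: a single-RTE view has no join, so a weaker lock may suffice. */
    LOCKMODE    lockmode;

    if (list_length(viewQuery->rtable) == 1)
        lockmode = RowExclusiveLock;    /* no join involved */
    else
        lockmode = ExclusiveLock;       /* joins: serialize view maintenance */

    LockRelationOid(matviewOid, lockmode);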

> I wonder if it is possible to somehow use the predicate locking mechanism of
> Postgres to avoid these anomalies without a global lock?

You mean that, instead of using any table lock, we abort the transaction if any
possibility of the anomaly is detected using the predicate lock mechanism?

I don't have a concrete idea of how to implement it, and don't know yet whether
it is possible, but I think it is worth considering. Thanks.


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: LogwrtResult contended spinlock

2020-11-24 Thread Anastasia Lubennikova

On 04.09.2020 20:13, Andres Freund wrote:

Hi,

On 2020-09-04 10:05:45 -0700, Andres Freund wrote:

On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:

Looking at patterns like this

    if (XLogCtl->LogwrtRqst.Write < EndPos)
        XLogCtl->LogwrtRqst.Write = EndPos;

It seems possible to implement with

    do {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
        if (currwrite > EndPos)
            break;  // already done by somebody else
        if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
                                           currwrite, EndPos))
            break;  // successfully updated
    } while (true);

This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
seem good material for a general routine.

This *seems* correct to me, though this is muddy territory to me.  Also,
are there better ways to go about this?

Hm, I was thinking that we'd first go for reading it without a spinlock,
but continuing to write it as we currently do.

But yea, I can't see an issue with what you propose here. I personally
find do {} while () weird and avoid it when not explicitly useful, but
that's extremely minor, obviously.

Re general routine: On second thought, it might actually be worth having
it. Even just for LSNs - there's plenty places where it's useful to
ensure a variable is at least a certain size.  I think I would be in
favor of a general helper function.

By a general helper function, do you mean something like this?

void
swap_lsn(pg_atomic_uint64 *old_value, XLogRecPtr new_value, bool to_largest)
{
    while (true)
    {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(old_value);

        if (to_largest)
        {
            if (currwrite > new_value)
                break;  /* already done by somebody else */
        }
        else
        {
            if (currwrite < new_value)
                break;  /* already done by somebody else */
        }

        if (pg_atomic_compare_exchange_u64(old_value, &currwrite, new_value))
            break;  /* successfully updated */
    }
}


which would be called like

    swap_lsn(&XLogCtl->LogwrtRqst.Write, EndPos, true);



Greetings,

Andres Freund


This CF entry was inactive for a while. Alvaro, are you going to 
continue working on it?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2

2020-11-24 Thread Michael Paquier
On Sat, Nov 21, 2020 at 10:19:42AM +0900, Michael Paquier wrote:
> What you meant and what I meant was slightly different here.  I meant
> publishing a header in src/include/common/ that would get installed,
> and I'd rather avoid that.  And you mean to have the header for local
> consumption in src/common/.  I would be fine with your third option as
> well.  Your suggestion is more consistent with what we do for the rest
> of src/common/ and libpq actually.  So I don't mind switching to
> that.

I got to look at your suggestion, and finished with the attached, which is
pretty close to my previous set, except that the MSVC scripts as well as
the header includes needed a slight refresh.

Please note that the OpenSSL docs say that EVP_DigestInit() is
obsolete and that applications should just use EVP_DigestInit_ex(), so
I have kept the original:
https://www.openssl.org/docs/man1.1.1/man3/EVP_DigestInit.html

The PG_CRYPTOHASH macro in cryptohash.h has been changed as you
suggested.  What do you think?
--
Michael
From b4ec42146bfe8c9580c31d32a619e7712c519486 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 Nov 2020 19:36:13 +0900
Subject: [PATCH v5 1/3] Rework SHA2 and crypto hash APIs

This will make easier a switch to EVP for the OpenSSL SHA2 layer.  Note
that the layer introduced here is generalized for the purpose of a
future integration with HMAC, MD5, and even more.
---
 src/include/common/checksum_helper.h  |  13 +-
 src/include/common/cryptohash.h   |  40 
 src/include/common/scram-common.h |  17 +-
 src/include/common/sha2.h |  89 +---
 src/include/replication/backup_manifest.h |   3 +-
 src/backend/libpq/auth-scram.c|  94 +
 src/backend/replication/backup_manifest.c |  25 ++-
 src/backend/replication/basebackup.c  |  24 ++-
 src/backend/utils/adt/cryptohashes.c  |  53 +++--
 src/common/Makefile   |   6 +-
 src/common/checksum_helper.c  |  79 +--
 src/common/cryptohash.c   | 189 +
 src/common/cryptohash_openssl.c   | 196 ++
 src/common/scram-common.c | 165 ++-
 src/common/sha2.c |  23 +-
 .../common/sha2.h => common/sha2_int.h}   |  38 +---
 src/common/sha2_openssl.c | 102 -
 src/bin/pg_verifybackup/parse_manifest.c  |  15 +-
 src/bin/pg_verifybackup/pg_verifybackup.c |  24 ++-
 src/interfaces/libpq/fe-auth-scram.c  | 114 +-
 contrib/pgcrypto/internal-sha2.c  | 188 -
 src/tools/msvc/Mkvcbuild.pm   |   3 +-
 src/tools/pgindent/typedefs.list  |   1 +
 23 files changed, 928 insertions(+), 573 deletions(-)
 create mode 100644 src/include/common/cryptohash.h
 create mode 100644 src/common/cryptohash.c
 create mode 100644 src/common/cryptohash_openssl.c
 copy src/{include/common/sha2.h => common/sha2_int.h} (73%)
 delete mode 100644 src/common/sha2_openssl.c

diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index 48b0745dad..b07a34e7e4 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -14,6 +14,7 @@
 #ifndef CHECKSUM_HELPER_H
 #define CHECKSUM_HELPER_H
 
+#include "common/cryptohash.h"
 #include "common/sha2.h"
 #include "port/pg_crc32c.h"
 
@@ -41,10 +42,10 @@ typedef enum pg_checksum_type
 typedef union pg_checksum_raw_context
 {
 	pg_crc32c	c_crc32c;
-	pg_sha224_ctx c_sha224;
-	pg_sha256_ctx c_sha256;
-	pg_sha384_ctx c_sha384;
-	pg_sha512_ctx c_sha512;
+	pg_cryptohash_ctx *c_sha224;
+	pg_cryptohash_ctx *c_sha256;
+	pg_cryptohash_ctx *c_sha384;
+	pg_cryptohash_ctx *c_sha512;
 } pg_checksum_raw_context;
 
 /*
@@ -66,8 +67,8 @@ typedef struct pg_checksum_context
 extern bool pg_checksum_parse_type(char *name, pg_checksum_type *);
 extern char *pg_checksum_type_name(pg_checksum_type);
 
-extern void pg_checksum_init(pg_checksum_context *, pg_checksum_type);
-extern void pg_checksum_update(pg_checksum_context *, const uint8 *input,
+extern int	pg_checksum_init(pg_checksum_context *, pg_checksum_type);
+extern int	pg_checksum_update(pg_checksum_context *, const uint8 *input,
 			   size_t len);
 extern int	pg_checksum_final(pg_checksum_context *, uint8 *output);
 
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
new file mode 100644
index 00..0e4a6631a3
--- /dev/null
+++ b/src/include/common/cryptohash.h
@@ -0,0 +1,40 @@
+/*-
+ *
+ * cryptohash.h
+ *	  Generic headers for cryptographic hash functions.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		  src/include/common/cryptohash.h
+ *
+ *-

RE: Parallel Inserts in CREATE TABLE AS

2020-11-24 Thread Hou, Zhijie
Hi,

I'm very interested in this feature,
and I'm looking at the patch, here are some comments.

1.
+   if (!TupIsNull(outerTupleSlot))
+   {
+       (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+       node->ps.state->es_processed++;
+   }
+
+   if (TupIsNull(outerTupleSlot))
+       break;
+   }

How about the following style:

    if (TupIsNull(outerTupleSlot))
        break;

    (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
    node->ps.state->es_processed++;

which looks cleaner.


2.
+
+   if (into != NULL &&
+   IsA(into, IntoClause))
+   {

The check can be replaced by ISCTAS(into).


3.
+   /*
+* For parallelizing inserts in CTAS i.e. making each
+* parallel worker inerst it's tuples, we must send
+* information such as intoclause(for each worker

'inerst' looks like a typo (insert).


4.
+   /* Estimate space for into clause for CTAS. */
+   if (ISCTAS(planstate->intoclause))
+   {
+       intoclausestr = nodeToString(planstate->intoclause);
+       shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+       shm_toc_estimate_keys(&pcxt->estimator, 1);
+   }
...
+   if (intoclausestr != NULL)
+   {
+       char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+       strcpy(shmptr, intoclausestr);
+       shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+   }

The code here calls strlen(intoclausestr) twice.  Looking at the existing code
in ExecInitParallelPlan, it stores the strlen in a variable.

So how about the following style:

intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

following the pattern of the existing code in ExecInitParallelPlan.
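For reference, the query-text handling in ExecInitParallelPlan has roughly this
shape (paraphrased, so details may differ slightly from the actual source):

    /* Estimate space for the query text. */
    query_len = strlen(estate->es_sourceText) + 1;
    shm_toc_estimate_chunk(&pcxt->estimator, query_len);
    shm_toc_estimate_keys(&pcxt->estimator, 1);
    ...
    /* Store the query string. */
    query_string = shm_toc_allocate(pcxt->toc, query_len);
    memcpy(query_string, estate->es_sourceText, query_len);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, query_string);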


5.
+   if (intoclausestr != NULL)
+   {
+       char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+       strcpy(shmptr, intoclausestr);
+       shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+   }
+
/* Set up the tuple queues that the workers will write into. */
-   pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+   if (intoclausestr == NULL)
+       pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);

The two checks on intoclausestr could be combined like:

if (intoclausestr != NULL)
{
...
}
else
{
...
}

Best regards,
houzj






Remove cache_plan argument comments to ri_PlanCheck

2020-11-24 Thread Li Japin
Hi, hackers

I found that the cache_plan argument to ri_PlanCheck has already been removed,
as of commit 5b7ba75f7ff854003231e8099e3038c7e2eba875.  I think we can remove
the comments about cache_plan in ri_PlanCheck.

diff --git a/src/backend/utils/adt/ri_triggers.c 
b/src/backend/utils/adt/ri_triggers.c
index 7e2b2e3dd6..02b1a3868f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2130,9 +2130,6 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, 
uint32 hashvalue)

 /*
  * Prepare execution plan for a query to enforce an RI restriction
- *
- * If cache_plan is true, the plan is saved into our plan hashtable
- * so that we don't need to plan it again.
  */
 static SPIPlanPtr
 ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,

--
Best regards
Japin Li
ChengDu WenWu Information Technology Co.,Ltd.



cache_plan-to-ri_PlanCheck.diff
Description: cache_plan-to-ri_PlanCheck.diff


Re: [HACKERS] Custom compression methods

2020-11-24 Thread Dilip Kumar
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas  wrote:

Most of the comments look fine to me, but I have a slightly different
opinion about one of them, so I'm replying only to that.

> I'm worried about how expensive this might be, and I think we could
> make it cheaper. The reason why I think this might be expensive is:
> currently, for every datum, you have a single direct function call.
> Now, with this, you first have a direct function call to
> GetCompressionOidFromCompressionId(). Then you have a call to
> GetCompressionRoutine(), which does a syscache lookup and calls a
> handler function, which is quite a lot more expensive than a single
> function call. And the handler isn't even returning a statically
> allocated structure, but is allocating new memory every time, which
> involves more function calls and maybe memory leaks. Then you use the
> results of all that to make an indirect function call.
>
> I'm not sure exactly what combination of things we could use to make
> this better, but it seems like there are a few possibilities:
>
> (1) The handler function could return a pointer to the same
> CompressionRoutine every time instead of constructing a new one every
> time.
> (2) The CompressionRoutine to which the handler function returns a
> pointer could be statically allocated instead of being built at
> runtime.
> (3) GetCompressionRoutine could have an OID -> handler cache instead
> of relying on syscache + calling the handler function all over again.
> (4) For the compression types that have dedicated bit patterns in the
> high bits of the compressed TOAST size, toast_compress_datum() could
> just have hard-coded logic to use the correct handlers instead of
> translating the bit pattern into an OID and then looking it up over
> again.
> (5) Going even further than #4 we could skip the handler layer
> entirely for such methods, and just call the right function directly.
>
> I think we should definitely do (1), and also (2) unless there's some
> reason it's hard. (3) doesn't need to be part of this patch, but might
> be something to consider later in the series. It's possible that it
> doesn't have enough benefit to be worth the work, though. Also, I
> think we should do either (4) or (5). I have a mild preference for (5)
> unless it looks too ugly.
>
> Note that I'm not talking about hard-coding a fast path for a
> hard-coded list of OIDs - which would seem a little bit unprincipled -
> but hard-coding a fast path for the bit patterns that are themselves
> hard-coded. I don't think we lose anything in terms of extensibility
> or even-handedness there; it's just avoiding a bunch of rigamarole
> that doesn't really buy us anything.
>
> All these points apply equally to toast_decompress_datum_slice() and
> toast_compress_datum().

I agree that we should definitely do (1) and (2) as part of the first
patch, and (3) we might do in later patches.  Of (4) and (5), I am more
inclined to do (4), for a couple of reasons:
a) If we bypass the handler function and directly call the compression
and decompression routines, then we need to check whether the current
executable is compiled with that particular compression library.  For
example, in 'lz4handler' we have the check below; if we don't have the
handler function, we either need to put this check in each
compression/decompression function or in each caller.
Datum
lz4handler(PG_FUNCTION_ARGS)
{
#ifndef HAVE_LIBLZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("not built with lz4 support")));
#else

b) Another reason is that once we start supporting compression options
(0006-Support-compression-methods-options.patch), we will also need to
call 'cminitstate_function' to parse the compression options and then
call the compression function, so we would need to hardcode multiple
function calls.

I think b) is still okay, but because of a) I am more inclined to do
(4).  What is your opinion on this?

About (4), one option is to call the correct handler function for a
built-in type directly from the toast_(de)compress(_slice) functions,
but in that case we duplicate the code.  Another option is to keep
GetCompressionRoutine() as a common function and, inside it, directly
call the corresponding handler function for a built-in type to get the
routine.  The only wrinkle is that, to avoid duplication in the
decompression routine, we need to convert the CompressionId to an Oid
before calling GetCompressionRoutine(), but we can still avoid the
syscache lookup for built-in types.
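A rough sketch of that second option (the OID constants and the fallback
helper below are placeholders, not the patch's actual identifiers):

static const CompressionAmRoutine *
GetCompressionRoutine(Oid cmoid)
{
    /* Fast path for the built-in methods: no syscache lookup, no palloc. */
    if (cmoid == PGLZ_COMPRESSION_AM_OID)       /* placeholder constant */
        return &pglz_compress_methods;          /* statically allocated */
    if (cmoid == LZ4_COMPRESSION_AM_OID)        /* placeholder constant */
        return &lz4_compress_methods;           /* statically allocated */

    /* Extension-provided method: syscache lookup + handler call, as today. */
    return LookupCompressionHandlerViaSyscache(cmoid);  /* placeholder */
}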

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench and timestamps (bounced)

2020-11-24 Thread Anastasia Lubennikova

On 11.09.2020 16:59, Fabien COELHO wrote:


Hello Tom,


It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.


Ugh, I'd really rather not do that.  Even disregarding the effects
of a mutex, though, my initial idea for fixing this has a big problem:
if we postpone PREPARE of the query until first execution, then it's
happening during timed execution of the benchmark scenario and thus
distorting the timing figures.  (Maybe if we'd always done it like
that, it'd be okay, but I'm quite against changing the behavior now
that it's stood for a long time.)


Hmmm.

Prepare is done *once* per client; ISTM that the impact on any
statistically significant benchmark is nil in practice, or it would
mean that the benchmark settings are too low.


Second, the mutex is only used when absolutely necessary, only for the
substitution part of the query (replacing :stuff with ?), because
scripts are shared between threads. This happens just once, in an unlikely
case occurring at the beginning.



However, perhaps there's more than one way to fix this.  Once we've
scanned all of the script and seen all the \set commands, we know
(in principle) the set of all variable names that are in use.
So maybe we could fix this by

(1) During the initial scan of the script, make variable-table
entries for every \set argument, with the values shown as undefined
for the moment.  Do not try to parse SQL commands in this scan,
just collect them.


The issue with this approach is

  SELECT 1 AS one \gset pref_

which will generate a "pref_one" variable, and these names cannot be
guessed without SQL parsing and possibly execution. That is why the
preparation is delayed until the variables are actually known.


(2) Make another scan in which we identify variable references
in the SQL commands and issue PREPAREs (if enabled).



(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing
of the benchmark proper.  I'm not sure offhand whether this
approach makes any difference for the concerns you had about
identifying/suppressing variable references inside quotes.


I do not think this plan is workable, because of the \gset issue.

I do not see that the conditional mutex and delayed PREPARE would have 
any significant (measurable) impact on an actual (reasonable) 
benchmark run.


A workable solution would be for each client to actually execute each
script once before starting the actual benchmark. It would still need
a mutex and also a sync barrier (which I'm proposing in some other
thread). However, this may raise some other issues, because then some
operations would be triggered outside of the benchmarking run, which may
or may not be desirable.


So I'm not too keen to go that way, and I think the proposed solution
is reasonable from a benchmarking point of view, as the impact is
minimal, although not zero.



CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. I see this discussion still has some open questions. Are you 
going to continue working on it, or should I mark it as "returned with 
feedback" until a better time?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Anastasia Lubennikova

On 30.09.2020 10:57, Greg Nancarrow wrote:

Thanks for your thoughts, patches and all the pointers.
I'll be looking at all of them.
(And yes, the comma instead of bitwise OR is of course an error,
somehow made and gone unnoticed; the next field in the struct is an
enum, so accepts any int value).

Regards,
Greg Nancarrow
Fujitsu Australia


CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. As far as I see, the patch needs some further work.
Are you going to continue working on it, or should I mark it as 
"returned with feedback" until a better time?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik
Based on Andres' review I have implemented the following changes in
libpq_compression:


1. Make it possible to specify a list of compression algorithms in the
connection string.

2. Make it possible to specify the compression level.
3. Use "_pq_.compression" instead of "compression" in the startup packet.
4. Use full names instead of one-character encodings for compression
algorithm names.


So now it is possible to open connection in this way:

    psql "dbname=postgres compression=zstd:5,zlib"


New version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index ace4ed5..deba608 100755
--- a/configure
+++ b/configure
@@ -700,6 +700,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -867,6 +868,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
 
 #
 # Zlib
diff --git a/configure.ac b/configure.ac
index 5b91c83..93a5285 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1000,6 +1000,13 @@ PGAC_ARG_BOOL(with, zlib, yes,
 AC_SUBST(with_zlib)
 
 #
+# Zstd
+#
+PGAC_ARG_BOOL(with, zstd, no,
+  [use zstd])
+AC_SUBST(with_zstd)
+
+#
 # Assignments
 #
 
@@ -1186,6 +1193,14 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_zstd" = yes; then
+  AC_CHECK_LIB(zstd, ZSTD_decompressStream, [],
+   [AC_MSG_ERROR([zstd library not found
+If you have zstd already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-zstd to disable zstd support.])])
+fi
+
 if test "$enable_spinlocks" = yes; then
   AC_DEFINE(HAVE_SPINLOCKS, 1, [Define to 1 if you have spinlocks.])
 else
@@ -1400,6 +1415,13 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_zstd" = yes; then
+  AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd header not found
+If you have zstd already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-zstd to disable zstd support.])])
+fi
+
 if test "$with_gssapi" = yes ; then
   AC_CHECK_HEADERS(gssapi/gssapi.h, [],
 	[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9ce32fb..140724d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1225,6 +1225,22 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. The client sends the server the list of compression algorithms supported by the client library.
+If the server supports one of these algorithms, it acknowledges the use of this algorithm, and all libpq messages sent both from client to server and
+vice versa will then be compressed. If the server does not support any of the suggested algorithms, it replies with an 'n' (no compression)
+message and it is up to the cl

Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Magnus Hagander
On Fri, Nov 20, 2020 at 4:46 PM Peter Eisentraut
 wrote:
>
> On 2020-11-09 13:05, Magnus Hagander wrote:
> > PFA a rebased version of this patch on top of what has happened since,
> > and changing the pg_upgrade parameter to be --no-scripts.
>
> It seems we are still finding out more nuances about pg_upgrade, but
> looking at initdb for a moment, I think the solution for wrapper scripts
> is to just run initdb with >/dev/null.  Or maybe if that looks a bit too
> hackish, a --quiet option that turns everything on stdout off.
>
> I think initdb has gotten a bit too chatty over time.  I think if it
> printed nothing on stdout by default and the current output would be
> some kind of verbose or debug mode, we wouldn't really lose much.  With
> that in mind, I'm a bit concerned about adding options (and thus
> documentation surface area etc.) to select exactly which slice of the
> chattiness to omit.

I agree that it's getting unnecessarily chatty, but things like the
locale that it has detected are, I think, very useful information to
output. Though I guess the same could be said for a few other things;
but does it *ever* pick anything other than 128MB/100, for example? :)

The main difference between them is that some information is
informational but unnecessary, but the "next steps instructions" are
*incorrect* in most cases when executed by a wrapper. I'd argue that
even if we show them only with --verbose, we should still have a way
of not outputing the information that's going to be incorrect for the
end user.

I think it boils down to the fact that today the output from initdb is
entirely geared towards people running initdb directly and starting their
server manually, and very few people outside the actual PostgreSQL
developers ever do that. But there are still a lot of people who run
initdb through their wrapper manually (for Red Hat you have to do that,
for Debian you only have to do it if you're creating a secondary
cluster, but that's still a pretty common operation).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar  wrote:
> About (4), one option is that we directly call the correct handler
> function for the built-in type directly from
> toast_(de)compress(_slice) functions but in that case, we are
> duplicating the code, another option is that we call the
> GetCompressionRoutine() a common function and in that, for the
> built-in type, we can directly call the corresponding handler function
> and get the routine.  The only thing is to avoid duplicating in
> decompression routine we need to convert CompressionId to Oid before
> calling GetCompressionRoutine(), but now we can avoid sys cache lookup
> for the built-in type.

Suppose that we have a variable lz4_methods (like heapam_methods) that
is always defined, whether or not lz4 support is present. It's defined
like this:

const CompressionAmRoutine lz4_compress_methods = {
.datum_compress = lz4_datum_compress,
.datum_decompress = lz4_datum_decompress,
.datum_decompress_slice = lz4_datum_decompress_slice
};

(It would be good, I think, to actually name things something like
this - in particular why would we have TableAmRoutine and
IndexAmRoutine but not include "Am" in the one for compression? In
general I think tableam is a good pattern to adhere to and we should
try to make this patch hew closely to it.)

Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
do their thing, or complain that lz4 compression is not supported.
Then in this function you can just say, well, if we have the 01 bit
pattern, handler = &lz4_compress_methods and proceed from there.
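For illustration, one of those per-datum functions might look roughly like
this (a sketch only, not the patch's actual code, and with the compression
logic simplified):

#include "postgres.h"
#ifdef HAVE_LIBLZ4
#include <lz4.h>
#endif

static struct varlena *
lz4_datum_compress(const struct varlena *value)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("not built with lz4 support")));
    return NULL;                /* keep compiler quiet */
#else
    int32       valsize = VARSIZE_ANY_EXHDR(value);
    int32       bound = LZ4_compressBound(valsize);
    struct varlena *result = (struct varlena *) palloc(bound + VARHDRSZ);
    int32       len;

    len = LZ4_compress_default(VARDATA_ANY(value), VARDATA(result),
                               valsize, bound);
    if (len <= 0)
        elog(ERROR, "lz4 compression failed");
    SET_VARSIZE(result, len + VARHDRSZ);
    return result;
#endif
}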

BTW, I think the "not supported" message should probably use the 'by
this build' language we use in some places i.e.

[rhaas pgsql]$ git grep errmsg.*'this build' | grep -vF .po:
contrib/pg_prewarm/pg_prewarm.c: errmsg("prefetch is not supported by this build")));
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/hba.c: errmsg("local connections are not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostssl record cannot match because SSL is not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostgssenc record cannot match because GSSAPI is not supported by this build"),
src/backend/libpq/hba.c: errmsg("invalid authentication method \"%s\": not supported by this build",
src/backend/utils/adt/pg_locale.c: errmsg("ICU is not supported in this build"), \
src/backend/utils/misc/guc.c: GUC_check_errmsg("Bonjour is not supported by this build");
src/backend/utils/misc/guc.c: GUC_check_errmsg("SSL is not supported by this build");

-- 
Robert Haas
EDB: http://www.enterprisedb.com




About adding a new field to a struct in primnodes.h

2020-11-24 Thread Andy Fan
Hi:

For example, suppose we add a new field to a node in primnodes.h:

struct FuncExpr
{

 +  int newf;
};

then we modify the copy/read/out functions for this node.  In
_readFuncExpr, we would probably add something like

static FuncExpr *
_readFuncExpr(..)
{
..
+	READ_INT_FIELD(newf);
};

Then we will get a compatibility issue if we create a view containing the
node with the older version and access the view with the new binary.  I
think we could bypass this issue easily with something like

READ_INT_FIELD_UNMUST(newf, defaultvalue);

However, I didn't see any code like this in our code base.  Does it not
work, or is it just not worth doing?
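For what it's worth, a purely hypothetical sketch of what such a macro could
look like; pg_strtok_save()/pg_strtok_restore() do not exist today and would
have to be added to src/backend/nodes/read.c, and token/length/local_node are
the locals that the existing READ_* macros already assume:

/* Hypothetical: read fldname if its label is present, else use a default. */
#define READ_INT_FIELD_UNMUST(fldname, defval) \
    do { \
        const char *save_ptr = pg_strtok_save();    /* hypothetical */ \
        token = pg_strtok(&length);                 /* label token */ \
        if (length == (int) strlen(":" CppAsString(fldname)) && \
            strncmp(token, ":" CppAsString(fldname), length) == 0) \
        { \
            token = pg_strtok(&length);             /* value token */ \
            local_node->fldname = atoi(token); \
        } \
        else \
        { \
            local_node->fldname = (defval); \
            pg_strtok_restore(save_ptr);    /* rewind for the next field */ \
        } \
    } while (0)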

-- 
Best Regards
Andy Fan


Re: Remove cache_plan argument comments to ri_PlanCheck

2020-11-24 Thread Amit Kapila
On Tue, Nov 24, 2020 at 4:46 PM Li Japin  wrote:
>
> Hi, hackers
>
> I found that the cache_plan argument to ri_PlanCheck has already been removed,
> as of commit 5b7ba75f7ff854003231e8099e3038c7e2eba875.  I think we can remove
> the comments about cache_plan in ri_PlanCheck.
>
> diff --git a/src/backend/utils/adt/ri_triggers.c 
> b/src/backend/utils/adt/ri_triggers.c
> index 7e2b2e3dd6..02b1a3868f 100644
> --- a/src/backend/utils/adt/ri_triggers.c
> +++ b/src/backend/utils/adt/ri_triggers.c
> @@ -2130,9 +2130,6 @@ InvalidateConstraintCacheCallBack(Datum arg, int 
> cacheid, uint32 hashvalue)
>
>  /*
>   * Prepare execution plan for a query to enforce an RI restriction
> - *
> - * If cache_plan is true, the plan is saved into our plan hashtable
> - * so that we don't need to plan it again.
>   */
>  static SPIPlanPtr
>  ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
>

Your patch looks good to me.

-- 
With Regards,
Amit Kapila.




Re: vac_update_datfrozenxid will raise "wrong tuple length" if pg_database tuple contains toast attribute.

2020-11-24 Thread Amit Kapila
On Tue, Nov 24, 2020 at 2:07 PM Junfeng Yang  wrote:
>
> Hi hackers,
>
> Can anyone help to verify this?
>

I think one way to get feedback is to register this patch for the next
commit fest (https://commitfest.postgresql.org/31/)

-- 
With Regards,
Amit Kapila.




Re: [patch] CLUSTER blocks scanned progress reporting

2020-11-24 Thread Fujii Masao




On 2020/11/21 2:32, Matthias van de Meent wrote:

Hi,

The pg_stat_progress_cluster view can report incorrect
heap_blks_scanned values when synchronize_seqscans is enabled, because
it allows the sequential heap scan to not start at block 0. This can
result in wraparounds in the heap_blks_scanned column when the table
scan wraps around, and starting the next phase with heap_blks_scanned
!= heap_blks_total. This issue was introduced with the
pg_stat_progress_cluster view.


Good catch! I agree that this is a bug.



The attached patch fixes the issue by accounting for a non-0
heapScan->rs_startblock and calculating the correct number with a
non-0 heapScan->rs_startblock in mind.


Thanks for the patch! It basically looks good to me.

It's a bit of a waste of cycles to calculate and update the number of scanned
blocks every cycle. So I'm inclined to change the code as follows.
Thoughts?

+   BlockNumber prev_cblock = InvalidBlockNumber;

+   if (prev_cblock != heapScan->rs_cblock)
+   {
+       pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+                                    (heapScan->rs_cblock +
+                                     heapScan->rs_nblocks -
+                                     heapScan->rs_startblock
+                                     ) % heapScan->rs_nblocks + 1);
+       prev_cblock = heapScan->rs_cblock;
+   }

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Migration Oracle multitenant database to PostgreSQL ?

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 09:22:26AM +0100, Thomas Kellerer wrote:
> 
> ROS Didier schrieb am 24.11.2020 um 09:09:
> > I would like to know if it is possible to migrate Oracle multitenant
> > database (with multiple PDB) to PostgreSQL ?
> Postgres' databases are very similar to Oracle's PDBs.
> 
> Probably the biggest difference is, that you can't shutdown
> a single database as you can do with a PDB.

I guess you could lock users out of a single database by changing
pg_hba.conf and doing reload.

> Database users in Postgres are like Oracle's "common users", they are
> global for the whole instance (aka "cluster" in Postgres' terms). There
> are no database specific users.

Good to know.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [PATCH] LWLock self-deadlock detection

2020-11-24 Thread Ashutosh Bapat
This looks useful. LWLockCheckSelfDeadlock() could use the LWLockHeldByMe
variant instead of copying that code, possibly with a change in that
function to return the required information.

I am also seeing the pattern
Assert(LWLockHeldByMe*())
LWLockAcquire()

in some places. Should we change LWLockAcquire to do
Assert(LWLockHeldByMe()) always to detect such occurrences? Enhance
that pattern to print the information that your patch prints?
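For example, something along these lines inside LWLockAcquire(), only under
assertions so the fast path is unaffected (the message text is just
illustrative, the patch reports more detail):

#ifdef USE_ASSERT_CHECKING
	/*
	 * Waiting for a lock we already hold can never succeed, so report the
	 * self-deadlock instead of hanging.  LWLockHeldByMe() walks the
	 * held_lwlocks[] array, so this stays out of non-assert builds.
	 */
	if (LWLockHeldByMe(lock))
		elog(PANIC, "lwlock self-deadlock detected: %s is already held by this backend",
			 T_NAME(lock));
#endif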

It looks weird that we can detect a self-deadlock but not handle it.
But handling it would require remembering how many times each LWLock is
held and releasing it that many times. That may not leave LWLocks "LW" anymore.

On Thu, Nov 19, 2020 at 4:02 PM Craig Ringer
 wrote:
>
> Hi all
>
> Here's a patch I wrote a while ago to detect and report when a 
> LWLockAcquire() results in a simple self-deadlock due to the caller already 
> holding the LWLock.
>
> To avoid affecting hot-path performance, it only fires the check on the first 
> iteration through the retry loops in LWLockAcquire() and LWLockWaitForVar(), 
> and just before we sleep, once the fast-path has been missed.
>
> I wrote an earlier version of this when I was chasing down some hairy issues 
> with background workers deadlocking on some exit paths because ereport(ERROR) 
> or elog(ERROR) calls fired when a LWLock was held would cause a 
> before_shmem_exit or on_shmem_exit cleanup function to deadlock when it tried 
> to acquire the same lock.
>
> But it's an easy enough mistake to make and a seriously annoying one to track 
> down, so I figured I'd post it for consideration. Maybe someone else will get 
> some use out of it even if nobody likes the idea of merging it.
>
> As written the check runs only for --enable-cassert builds or when LOCK_DEBUG 
> is defined.



-- 
Best Wishes,
Ashutosh Bapat




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 01:32:45PM +0100, Magnus Hagander wrote:
> I think it boils down to that today the output from initdb is entirely
> geared towards people running initdb directly and starting their
> server manually, and very few people outside the actual PostgreSQL
> developers ever do that. But there are still a lot of people who run
> initdb through their wrapper manually (for redhat you have to do that,
> for debian you only have to do it if you're creating a secondary
> cluster but that still a pretty common operation).

I think the big issue is that pg_upgrade not only outputs progress
messages, but also creates files in the current directory, while initdb, by
definition, creates files in PGDATA.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [doc] plan invalidation when statistics are updated

2020-11-24 Thread Fujii Masao




On 2020/11/19 14:33, torikoshia wrote:

On 2020-11-18 11:35, Fujii Masao wrote:

Thanks for your comment!


On 2020/11/18 11:04, torikoshia wrote:

Hi,

AFAIU, when the planner statistics are updated, generic plans are invalidated 
and PostgreSQL recreates them. However, the manual doesn't seem to explain this 
explicitly.

   https://www.postgresql.org/docs/devel/sql-prepare.html

I guess this case is included in 'whenever database objects used in the 
statement have undergone definitional (DDL) changes', but I feel it's hard to 
infer.

Since updates of the statistics can happen quite often, how about describing this 
case explicitly, as in the attached patch?


+1 to add that note.

-   statement.  Also, if the value of  changes
+   statement. For example, when the planner statistics of the statement
+   are updated, PostgreSQL re-analyzes and
+   re-plans the statement.

I don't think "For example," is necessary.

"planner statistics of the statement" sounds vague? Does the statement
is re-analyzed and re-planned only when the planner statistics of database
objects used in the statement are updated? If yes, we should describe
that to make the note a bit more explicitly?


Yes. As far as I have confirmed, updating statistics which are not used in
prepared statements doesn't trigger re-analyze and re-plan.

Since plan invalidations for DDL changes and statistical changes are caused
by PlanCacheRelCallback(Oid 'relid'), only the prepared statements using the
'relid' relation seem to be invalidated.

Attached is the updated patch.


Thanks for confirming that and updating the patch!
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: LogwrtResult contended spinlock

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Anastasia Lubennikova wrote:

> On 04.09.2020 20:13, Andres Freund wrote:

> > Re general routine: On second thought, it might actually be worth having
> > it. Even just for LSNs - there's plenty places where it's useful to
> > ensure a variable is at least a certain size.  I think I would be in
> > favor of a general helper function.
> Do you mean by general helper function something like this?
> 
> void
> swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)

Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
or some similar name that makes it clear that 

1. it is supposed to use atomics
2. it can only be used to *advance* a value rather than a generic swap.

(I'm not 100% clear that that's the exact API we need.)
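For the record, the kind of helper I'm imagining is roughly this, name as
suggested above; whether it belongs in atomics.h or in xlog-specific code is
part of what still needs deciding:

#include "access/xlogdefs.h"
#include "port/atomics.h"

/*
 * Advance *ptr to 'target', but never move it backwards.  Sketch only.
 */
static inline void
pg_atomic_increase_lsn(volatile pg_atomic_uint64 *ptr, XLogRecPtr target)
{
	uint64		currval = pg_atomic_read_u64(ptr);

	while (currval < target)
	{
		/* on failure, currval is refreshed with the value someone else set */
		if (pg_atomic_compare_exchange_u64(ptr, &currval, target))
			break;
	}
}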

> This CF entry was inactive for a while. Alvaro, are you going to continue
> working on it?

Yes, please move it forward.  I'll post an update sometime before the
next CF.




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Magnus Hagander
On Tue, Nov 24, 2020 at 3:12 PM Bruce Momjian  wrote:
>
> On Tue, Nov 24, 2020 at 01:32:45PM +0100, Magnus Hagander wrote:
> > I think it boils down to that today the output from initdb is entirely
> > geared towards people running initdb directly and starting their
> > server manually, and very few people outside the actual PostgreSQL
> > developers ever do that. But there are still a lot of people who run
> > initdb through their wrapper manually (for redhat you have to do that,
> > for debian you only have to do it if you're creating a secondary
> > cluster but that still a pretty common operation).
>
> I think the big issue is that pg_upgrade not only output progress
> messages, but created files in the current directory, while initdb, by
> definition, is creating files in PGDATA.

To be clear, my comments above were primarily about initdb, not
pg_upgrade, as that's what Peter was commenting on as well.

pg_upgrade is a somewhat different but also interesting case. I think
the actual progress output is more interesting in pg_upgrade as it's
more likely to take measurable amounts of time. Whereas in initdb,
it's actually the "detected parameter values" that are the most
interesting parts.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: walsender bug: stuck during shutdown

2020-11-24 Thread Alvaro Herrera
Hello,

On 2020-Nov-24, Fujii Masao wrote:

> Thanks for working on this!
> Could you tell me the discussion thread where Chloe Dives reported the issue?
> Sorry, I could not find it.

It was not public -- sorry I didn't make that clear.

> I'd like to see the procedure to reproduce the issue.

Here's the script.


Thanks!
import psycopg2

from psycopg2.extras import LogicalReplicationConnection, REPLICATION_LOGICAL


def _logical_replication_callback(message):
    ''' Deal with a single audit_json message; see _process_message. We get one message, therefore one
    call to this method, per committed transaction on the source database.
    '''
    print("Raw message: " + str(message))
    message.cursor.send_feedback(flush_lsn=message.data_start)


def main():
    slot_name = 'snitch_papersnap_testing'

    connection = psycopg2.connect(
        host='fab-devdb02',
        port=5432,
        dbname='postgres',
        user='postgres',
        connection_factory=LogicalReplicationConnection,
    )

    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM pg_replication_slots WHERE slot_name = %s", (slot_name,))
        slot_exists, = cursor.fetchone()

        if slot_exists:
            cursor.drop_replication_slot(slot_name)
            slot_exists = False

        if not slot_exists:
            cursor.create_replication_slot(slot_name, REPLICATION_LOGICAL, output_plugin='test_decoding')

        cursor.start_replication(slot_name, REPLICATION_LOGICAL, decode=True)
        print("Logical replication started")
        cursor.consume_stream(_logical_replication_callback)


if __name__ == '__main__':
    main()


Re: About adding a new field to a struct in primnodes.h

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Andy Fan wrote:

> then we modified the copy/read/out functions for this node.  In
> _readFuncExpr,
> we probably add something like

> [ ... ]

> Then we will get a compatible issue if we create a view with the node in
> the older version and access the view with the new binary.

When nodes are modified, you have to increment CATALOG_VERSION_NO which
makes the new code incompatible with a datadir previously created -- for
precisely this reason.
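For reference, that's a one-line change in src/include/catalog/catversion.h;
the value is conventionally yyyymmddN for the day of the commit (the number
below is only an example):

/* src/include/catalog/catversion.h */
#define CATALOG_VERSION_NO	202011241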




Re: bug in pageinspect's "tuple data" feature

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Michael Paquier wrote:

> On Mon, Nov 23, 2020 at 09:11:26AM +0200, Heikki Linnakangas wrote:
> > On 21/11/2020 21:32, Alvaro Herrera wrote:
> >> This is pretty unhelpful; it would be better not to try to print the
> >> data instead of dying.  With that, at least you can know where the
> >> problem is.
> >> 
> >> This was introduced in d6061f83a166 (2015).  Proposed patch to fix it
> >> (by having the code print a null "data" instead of dying) is attached.
> > 
> > Null seems misleading. Maybe something like "invalid", or print a warning?

Good idea, thanks.

> How did you get into this state to begin with?

The data was corrupted for whatever reason.  I don't care why or how, I
just need to fix it.  If the data isn't corrupted, then I don't use
pageinspect in the first place.

> get_raw_page() uses ReadBufferExtended() which gives some level of
> protection already, so shouldn't it be better to return an ERROR with
> ERRCODE_DATA_CORRUPTED and the block involved?

What would I gain from doing that?  It's even more unhelpful, because it
is intentional rather than accidental.




Re: Terminate the idle sessions

2020-11-24 Thread David G. Johnston
On Mon, Nov 23, 2020 at 11:22 PM Li Japin  wrote:

>
> How about using “foreign-data wrapper” to replace “postgres_fdw”?
>

I don't see much value in avoiding mentioning that specific term - my
proposal turned it into an example instead of being exclusive.


> - This parameter should be set to zero if you use some
> connection-pooling software,
> - or pg servers used by postgres_fdw, because connections might be
> closed unexpectedly.
> + This parameter should be set to zero if you use
> connection-pooling software,
> + or PostgreSQL servers connected to
> using foreign-data
> + wrapper, because connections might be closed unexpectedly.
>  
>

Maybe:

+ or your PostgreSQL server receives connections from postgres_fdw or
similar middleware.
+ Such software is expected to self-manage its connections.

David J.


Re: [patch] CLUSTER blocks scanned progress reporting

2020-11-24 Thread Matthias van de Meent
On Tue, 24 Nov 2020 at 15:05, Fujii Masao  wrote:
>
> On 2020/11/21 2:32, Matthias van de Meent wrote:
> > Hi,
> >
> > The pg_stat_progress_cluster view can report incorrect
> > heap_blks_scanned values when synchronize_seqscans is enabled, because
> > it allows the sequential heap scan to not start at block 0. This can
> > result in wraparounds in the heap_blks_scanned column when the table
> > scan wraps around, and starting the next phase with heap_blks_scanned
> > != heap_blks_total. This issue was introduced with the
> > pg_stat_progress_cluster view.
>
> Good catch! I agree that this is a bug.
>
> >
> > The attached patch fixes the issue by accounting for a non-0
> > heapScan->rs_startblock and calculating the correct number with a
> > non-0 heapScan->rs_startblock in mind.
>
> Thanks for the patch! It basically looks good to me.

Thanks for the feedback!

> It's a bit of a waste of cycles to calculate and update the number of scanned
> blocks every cycle. So I'm inclined to change the code as follows.
> Thoughts?
>
> +   BlockNumber prev_cblock = InvalidBlockNumber;
> 
> +   if (prev_cblock != heapScan->rs_cblock)
> +   {
> +       pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
> +                                    (heapScan->rs_cblock +
> +                                     heapScan->rs_nblocks -
> +                                     heapScan->rs_startblock
> +                                     ) % heapScan->rs_nblocks + 1);
> +       prev_cblock = heapScan->rs_cblock;
> +   }

That seems quite reasonable.

I noticed that with my proposed patch it is still possible to go to
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. As the
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.

Please find attached a patch applying the suggested changes.

Matthias van de Meent
From b3327cace3bebdb15006834e21672fc30cb2f0bb Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 20 Nov 2020 16:23:59 +0100
Subject: [PATCH v2] Fix CLUSTER progress reporting of number of blocks
 scanned.

The heapScan need not start at block 0, so heapScan->rs_cblock need not be the
correct value for the number of blocks scanned. A more correct value is
 ((heapScan->rs_cblock - heapScan->rs_startblock + heapScan->rs_nblocks) %
   heapScan->rs_nblocks), as it accounts for the wraparound and the initial
offset of the heapScan.

Additionally, a heap scan need not return tuples from the last scanned page.
This means that when table_scan_getnextslot returns false, we must manually
update the heap_blks_scanned parameter to the number of blocks in the heap
scan.
---
 src/backend/access/heap/heapam_handler.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..f20d4bed07 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -698,6 +698,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	Datum	   *values;
 	bool	   *isnull;
 	BufferHeapTupleTableSlot *hslot;
+	BlockNumber prev_cblock = InvalidBlockNumber;
 
 	/* Remember if it's a system catalog */
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -793,14 +794,37 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		else
 		{
 			if (!table_scan_getnextslot(tableScan, ForwardScanDirection, slot))
+			{
+/*
+ * A heap scan need not return tuples for the last page it has
+ * scanned. To ensure that heap_blks_scanned is equivalent to
+ * total_heap_blks after the table scan phase, this parameter
+ * is manually updated to the correct value when the table scan
+ * finishes.
+ */
+pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+			 heapScan->rs_nblocks);
 break;
+			}
 
 			/*
 			 * In scan-and-sort mode and also VACUUM FULL, set heap blocks
 			 * scanned
+			 *
+			 * Note that heapScan may start at an offset and wrap around, i.e.
+			 * rs_startblock may be >0, and rs_cblock may end with a number
+			 * below rs_startblock. To prevent showing this wraparound to the
+			 * user, we offset rs_cblock by rs_startblock (modulo rs_nblocks).
 			 */
-			pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
-		 heapScan->rs_cblo

Re: abstract Unix-domain sockets

2020-11-24 Thread David G. Johnston
On Mon, Nov 23, 2020 at 9:00 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Or is it the case that we always attempt to bind the TCP/IP port,
> regardless of the presence of a socket file, in which case the failure for
> port binding does cover the socket situation as well?
>

This cannot always be the case since the listened-to IP address matters.

I think the socket file error message hint is appropriate.  I'd consider it
a bug if that code is effectively unreachable (the fact that the hint
exists supports this conclusion).  If we add "abstract unix sockets" where
we likewise prevent two servers from listening on the same channel, the
absence of such a check for the socket file is even more unexpected.  At a
minimum we should declare whether we will even try, and whether such a
socket file check is best-effort or generally reliable.

David J.


Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-11-24 Thread Bruce Momjian
On Tue, Nov 24, 2020 at 04:05:26PM +0100, Magnus Hagander wrote:
> pg_upgrade is a somewhat different but also interesting case. I think
> the actual progress output is more interesting in pg_upgrade as it's
> more likely to take measurable amounts of time. Whereas in initdb,
> it's actually the "detected parameter values" that are the most
> interesting parts.

Originally, initdb did take some time for each step.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-11-24 Thread Justin Pryzby
On Sat, Oct 31, 2020 at 01:36:11PM -0500, Justin Pryzby wrote:
> > > From the grammar perspective ANY option is available for any command
> > > that uses parenthesized option list. All the checks and validations
> > > are performed at the corresponding command code.
> > > This analyze_keyword is actually doing only an ANALYZE word
> > > normalization if it's used as an option. Why it could be harmful?
> > 
> > Michael has not replied since then, but he was relatively positive about
> > 0005 initially, so I put it as a first patch now.
> 
> Thanks.  I rebased Alexey's latest patch on top of recent changes to 
> cluster.c.
> This puts the generic grammar changes first.  I wasn't paying much attention 
> to
> that part, so still waiting for a committer review.

@cfbot: rebased
>From 4a8e71ac704bd4c58e54703298b5234946c666a3 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 2 Sep 2020 23:05:16 +0300
Subject: [PATCH v30 1/6] Refactor gram.y in order to add a common
 parenthesized option list

Previously there were two identical option lists
(explain_option_list and vac_analyze_option_list) + very similar
reindex_option_list.  It does not seem to make
sense to maintain identical option lists in the grammar, since
all new options are added and parsed in the backend code.

This way, a new common_option_list is added in order to replace
explain_option_list, vac_analyze_option_list and probably
also reindex_option_list.
---
 src/backend/parser/gram.y | 61 +--
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index efc9c99754..5c86063459 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -315,10 +315,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 create_extension_opt_item alter_extension_opt_item
 
 %type 	opt_lock lock_type cast_context
-%type 		vac_analyze_option_name
-%type 	vac_analyze_option_elem
-%type 	vac_analyze_option_list
-%type 	vac_analyze_option_arg
+%type 		common_option_name
+%type 	common_option_elem
+%type 	common_option_list
+%type 	common_option_arg
 %type 	drop_option
 %type 	opt_or_replace opt_no
 opt_grant_grant_option opt_grant_admin_option
@@ -513,10 +513,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	generic_option_arg
 %type 	generic_option_elem alter_generic_option_elem
 %type 	generic_option_list alter_generic_option_list
-%type 		explain_option_name
-%type 	explain_option_arg
-%type 	explain_option_elem
-%type 	explain_option_list
 
 %type 	reindex_target_type reindex_target_multitable
 %type 	reindex_option_list reindex_option_elem
@@ -10483,7 +10479,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_analyze opt_vacuum_relati
 	n->is_vacuumcmd = true;
 	$$ = (Node *)n;
 }
-			| VACUUM '(' vac_analyze_option_list ')' opt_vacuum_relation_list
+			| VACUUM '(' common_option_list ')' opt_vacuum_relation_list
 {
 	VacuumStmt *n = makeNode(VacuumStmt);
 	n->options = $3;
@@ -10504,7 +10500,7 @@ AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 	n->is_vacuumcmd = false;
 	$$ = (Node *)n;
 }
-			| analyze_keyword '(' vac_analyze_option_list ')' opt_vacuum_relation_list
+			| analyze_keyword '(' common_option_list ')' opt_vacuum_relation_list
 {
 	VacuumStmt *n = makeNode(VacuumStmt);
 	n->options = $3;
@@ -10514,12 +10510,12 @@ AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 }
 		;
 
-vac_analyze_option_list:
-			vac_analyze_option_elem
+common_option_list:
+			common_option_elem
 {
 	$$ = list_make1($1);
 }
-			| vac_analyze_option_list ',' vac_analyze_option_elem
+			| common_option_list ',' common_option_elem
 {
 	$$ = lappend($1, $3);
 }
@@ -10530,19 +10526,19 @@ analyze_keyword:
 			| ANALYSE /* British */
 		;
 
-vac_analyze_option_elem:
-			vac_analyze_option_name vac_analyze_option_arg
+common_option_elem:
+			common_option_name common_option_arg
 {
 	$$ = makeDefElem($1, $2, @1);
 }
 		;
 
-vac_analyze_option_name:
+common_option_name:
 			NonReservedWord			{ $$ = $1; }
 			| analyze_keyword		{ $$ = "analyze"; }
 		;
 
-vac_analyze_option_arg:
+common_option_arg:
 			opt_boolean_or_string	{ $$ = (Node *) makeString($1); }
 			| NumericOnly			{ $$ = (Node *) $1; }
 			| /* EMPTY */			{ $$ = NULL; }
@@ -10624,7 +10620,7 @@ ExplainStmt:
 	n->options = list_make1(makeDefElem("verbose", NULL, @2));
 	$$ = (Node *) n;
 }
-		| EXPLAIN '(' explain_option_list ')' ExplainableStmt
+		| EXPLAIN '(' common_option_list ')' ExplainableStmt
 {
 	ExplainStmt *n = makeNode(ExplainStmt);
 	n->query = $5;
@@ -10645,35 +10641,6 @@ ExplainableStmt:
 			| ExecuteStmt	/* by default all are $$=$1 */
 		;
 
-explain_option_list:
-			explain_option_elem
-{
-	$$ = list_make1($1);
-}
-			| explai

Re: abstract Unix-domain sockets

2020-11-24 Thread Peter Eisentraut

On 2020-11-23 17:00, David G. Johnston wrote:
So presently there is no functioning code to prevent two PostgreSQL 
instances from using the same socket so long as they do not also use the 
same data directory?  We only handle the case of an unclean crash - 
where the pid and socket are both left behind - having the system tell 
the user to remove the pid lock file but then auto-replacing the socket 
(I was conflating the behavior with the pid lock file and the socket file).


I would expect that we handle port misconfiguration also, by not 
auto-replacing the socket and instead have the existing error message 
(with modified hint) remain behind.  This provides behavior consistent 
with TCP port binding.  Or is it the case that we always attempt to bind 
the TCP/IP port, regardless of the presence of a socket file, in which 
case the failure for port binding does cover the socket situation as 
well?  If this is the case, pointing that out in [1] and a code comment, 
while removing that particular error as "dead code", would work.


We're subject to whatever the kernel behavior is.  If the kernel doesn't 
report address conflicts for Unix-domain sockets, then we can't do 
anything about that.  Having an error message ready in case the kernel 
does report such an error is not useful if it never does.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Dilip Kumar
On Tue, Nov 24, 2020 at 7:14 PM Robert Haas  wrote:
>
> On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar  wrote:
> > About (4), one option is that we directly call the correct handler
> > function for the built-in type directly from
> > toast_(de)compress(_slice) functions but in that case, we are
> > duplicating the code, another option is that we call the
> > GetCompressionRoutine() a common function and in that, for the
> > built-in type, we can directly call the corresponding handler function
> > and get the routine.  The only thing is to avoid duplicating in
> > decompression routine we need to convert CompressionId to Oid before
> > calling GetCompressionRoutine(), but now we can avoid sys cache lookup
> > for the built-in type.
>
> Suppose that we have a variable lz4_methods (like heapam_methods) that
> is always defined, whether or not lz4 support is present. It's defined
> like this:
>
> const CompressionAmRoutine lz4_compress_methods = {
> .datum_compress = lz4_datum_compress,
> .datum_decompress = lz4_datum_decompress,
> .datum_decompress_slice = lz4_datum_decompress_slice
> };

Yeah, this makes sense.

>
> (It would be good, I think, to actually name things something like
> this - in particular why would we have TableAmRoutine and
> IndexAmRoutine but not include "Am" in the one for compression? In
> general I think tableam is a good pattern to adhere to and we should
> try to make this patch hew closely to it.)

For the compression routine name, I did not include "Am" because
currently we are storing the compression method in the new catalog
"pg_compression", not in pg_am.  So are you suggesting that we
should store the compression methods in pg_am instead of
creating a new catalog?  IMHO, storing the compression methods in a
new catalog is a better option than storing them in pg_am
because the compression methods are not the same as heap or
index AMs; I mean, they are not really access methods.  Am I
missing something?

> Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
> do their thing, or complain that lz4 compression is not supported.
> Then in this function you can just say, well, if we have the 01 bit
> pattern, handler = &lz4_compress_methods and proceed from there.

Okay

> BTW, I think the "not supported" message should probably use the 'by
> this build' language we use in some places i.e.
>

Okay

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Peter Eisentraut

On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote:

The Clang documentation¹ suggests an even neater solution, which would
eliminate the repetitive empty pg_attribute_foo #defines in the trailing
#else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d:

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif


Yes, this was also mentioned and agreed earlier in the thread, but then 
we apparently forgot to update the patch.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: abstract Unix-domain sockets

2020-11-24 Thread David G. Johnston
On Tue, Nov 24, 2020 at 8:45 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> We're subject to whatever the kernel behavior is.  If the kernel doesn't
> report address conflicts for Unix-domain sockets, then we can't do
> anything about that.  Having an error message ready in case the kernel
> does report such an error is not useful if it never does.
>

It's a file, we can check for its existence in user-space.
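In a sketch, such a user-space check could look like the following; this is
not what the server currently does, and it also has to try connect(), since
a stale file left behind by a crashed server proves nothing by itself:

#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns true if 'path' exists and a live server accepts connections on it. */
static bool
unix_socket_in_use(const char *path)
{
	struct stat st;
	struct sockaddr_un addr;
	int			fd;
	bool		in_use = false;

	if (stat(path, &st) != 0)
		return false;			/* no socket file at all */

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return true;			/* can't tell; be conservative */

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

	if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
		in_use = true;			/* someone is actually listening */

	close(fd);
	return in_use;
}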

David J.


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> Pushed.

walleye's been failing since this patchset went in:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include  
-I./src/include/port/win32 -I/c/msys/local/include  -I/c/Python35/include 
-I/c/OpenSSL-Win64/include -I/c/msys/local/include 
"-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 -DBUILDING_DLL 
 -c -o autovacuum.o autovacuum.c
C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages:
C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: 
.seh_savexmm offset is negative
make[3]: *** [autovacuum.o] Error 1

I have no idea what to make of that, but it looks more like a compiler bug
than anything else.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 7:33 AM Konstantin Knizhnik
 wrote:
> New version of the patch is attached.

I read over the comments from Andres (and Peter) suggesting that this
ought to be on-the-fly configurable. Here are some thoughts on making
that work with the wire protocol:

If the client potentially wants to use compression at some point it
should include _pq_.compression in the startup message. The value
associated with _pq_.compression should be a comma-separated list of
compression methods which the client understands. If the server
responds with a NegotiateProtocolVersion message, then it either
includes _pq_.compression (in which case the server does not support
compression) or it does not (in which case the server does support
compression). If no NegotiateProtocolVersion message is returned, then
the server is from before November 2017
(ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed) and compression is not
supported.

If the client requests compression and the server supports it, it
should return a new SupportedCompressionTypes message following
NegotiateProtocolMessage response. That should be a list of
compression methods which the server understands. At this point, the
client and the server each know what methods the other understands.
Each should now feel free to select a compression method the other
side understands, and to switch methods whenever desired, as long as
they only select from methods the other side has said that they
understand. The patch seems to think that the compression method has
to be the same in both directions and that it can never change, but
there's no real reason for that. Let each side start out uncompressed
and then let it issue a new SetCompressionMethod protocol message to
switch the compression method whenever it wants. After sending that
message it begins using the new compression type. The other side
doesn't have to agree. That way, you don't have to worry about
synchronizing the two directions. Each side is just telling the other
what it is choosing to do, from among the options the other side said it
could understand.
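The selection logic itself can be trivial, e.g. take the first
client-preferred method that also appears in the server's list. A sketch,
written with backend list helpers for brevity (the libpq side would use
plain string handling):

/* Choose the first mutually understood compression method, or NULL. */
static char *
choose_compression_method(const char *client_list, const char *server_list)
{
	List	   *client_algs = NIL;
	List	   *server_algs = NIL;
	ListCell   *lc;

	if (!SplitIdentifierString(pstrdup(client_list), ',', &client_algs) ||
		!SplitIdentifierString(pstrdup(server_list), ',', &server_algs))
		return NULL;			/* malformed list */

	foreach(lc, client_algs)
	{
		char	   *alg = (char *) lfirst(lc);
		ListCell   *lc2;

		foreach(lc2, server_algs)
		{
			if (strcmp(alg, (char *) lfirst(lc2)) == 0)
				return pstrdup(alg);	/* first match wins */
		}
	}

	return NULL;				/* no overlap: stay uncompressed */
}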

It's an interesting question whether it's best to "wrap" the
compressed messages in some way. For example, imagine that instead of
just compressing the bytes and sending them out, you send a message of
some new type CompressedMessage whose payload is another protocol
message. Then a piece of middleware could decide to decompress each
message just enough to see whether it wants to do anything with the
message and if not just forward it. It could also choose to inject its
own messages into the message stream which wouldn't necessarily need
to be compressed, because the wrapper allows mixing of compressed and
uncompressed messages. The big disadvantage of this approach is that
in many cases it will be advantageous to compress consecutive protocol
messages as a unit. For example, when the extended query protocol is in
use, the client will commonly send P-B-D-E-S maybe even in one network
packet. It will compress better if all of those messages are
compressed as one rather than separately. That consideration argues
for the approach the patch actually takes (though the documentation
isn't really very clear about what the patch is actually doing here so
I might be misunderstanding). However, note that when the client or
server does a "flush" this requires "flushing" at the compression
layer also, so that all pending data can be sent. zlib can certainly
do that; I assume other algorithms can too, but I don't really know.
If there are algorithms that don't have that built in, this approach
might be an issue.
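For what it's worth, zlib can do exactly that: calling deflate() with
Z_SYNC_FLUSH forces out everything consumed so far in a form the receiver
can decode, without ending the stream. A minimal sketch, error handling
mostly elided (the patch's actual buffering layer is more involved):

#include <sys/types.h>
#include <zlib.h>

/*
 * Compress 'len' bytes from 'src' into 'dst' and force all of it out, as a
 * protocol-level flush requires.  'zs' must already be set up with
 * deflateInit().  Returns the number of compressed bytes, or -1 on error.
 */
static ssize_t
zpq_compress_and_flush(z_stream *zs, const char *src, size_t len,
					   char *dst, size_t dstlen)
{
	zs->next_in = (Bytef *) src;
	zs->avail_in = len;
	zs->next_out = (Bytef *) dst;
	zs->avail_out = dstlen;

	/* Z_SYNC_FLUSH: emit everything consumed so far, keep the stream open */
	if (deflate(zs, Z_SYNC_FLUSH) == Z_STREAM_ERROR)
		return -1;

	return dstlen - zs->avail_out;
}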

Another thing to think about is whether it's really beneficial to
compress stuff like P-B-D-E-S. I would guess that the benefits here
come mostly from compressing DataRow and CopyData messages, and that
compressing messages that don't contain much payload data may just be
a waste of CPU cycles. On the other hand, it's not impossible for it
to win: the query might be long, and query text is probably highly
compressible. Designing something that is specific to certain message
types is probably a bridge too far, but at least I think this is a
strong argument that the compression method shouldn't have to be the
same in both directions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Justin Pryzby
On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> 0004 - Seems fine. IMHO not really "silly errors" but OK.

This is one of the same issues you pointed out - shadowing a variable.
Could be backpatched.

On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> > +errmsg("statistics expressions and 
> > predicates can refer only to the table being indexed")));
> > +* partial-index predicates.  Create it in the per-index context to 
> > be
> > 
> > I think these are copied and shouldn't mention "indexes" or "predicates".  
> > Or
> > should statistics support predicates, too ?
> > 
> 
> Right. Stupid copy-pasto.

Right, but then I was wondering if CREATE STATS should actually support
predicates, since one use case is to do what indexes do without their overhead.
I haven't thought about it enough yet.

> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
> keeping it more like PG13 (good for backpatching). Not sure why rename
> extended statistics to multi-variate statistics - we use "extended"
> everywhere.

-   if (build_expressions && (list_length(stxexprs) == 0))
+   if (!build_expressions_only && (list_length(stmt->exprs) < 2))
ereport(ERROR,  
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-errmsg("extended expression statistics require 
at least one expression")));
+errmsg("multi-variate statistics require at 
least two columns")));

I think all of "CREATE STATISTICS" has been known as "extended stats", so I
think it may be confusing to say that it requires two columns for the general
facility.

> Not sure what's the point of serialize_expr_stats changes,
> that's code is mostly copy-paste from update_attstats.

Right.  I think "i" is poor variable name when it isn't a loop variable and not
of limited scope.

> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
> IMHO we should move this to a separate view.

Right - then unnest() the whole thing and return one row per expression rather
than array, as you've done.  Maybe the docs should say that this returns one
row per expression.

Looking quickly at your new patch: I guess you know there's a bunch of
lingering references to "indexes" and "predicates":

I don't know if you want to go to the effort to prohibit this.
postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
CREATE STATISTICS

I think a lot of people will find this confusing:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
ERROR:  extended expression statistics require at least one expression
postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
CREATE STATISTICS

I haven't looked, but is it possible to make it work without parens ?

-- 
Justin




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-11-24 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I took a quick look through this.  This is just MHO, of course:
> >> 
> >> * I don't think it's okay to change the existing signatures of
> >> pg_ls_logdir() et al.
> 
> > I disagree that we need to stress over this- we pretty routinely change
> > the signature of various catalogs and functions and anyone using these
> > is already of the understanding that we are free to make such changes
> > between major versions.
> 
> Well, like I said, just MHO.  Anybody else want to weigh in?
> 
> I'm mostly concerned about removing the isdir output of pg_stat_file().
> Maybe we could compromise to the extent of keeping that, allowing it
> to be partially duplicative of a file-type-code output column.

I don't have any particular issue with keeping isdir as a convenience
column.  I agree it'll now be a bit duplicative but that seems alright.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: POC: postgres_fdw insert batching

2020-11-24 Thread Tomas Vondra



On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> 1) We're calling it "batch_size" but the API function is named
>> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
>> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
>> ever add support for batching to UPDATE/DELETE.
> 
> Actually, I was in two minds whether the term batch or bulk is better.  
> Because Oracle uses "bulk insert" and "bulk fetch", like in FETCH cur BULK 
> COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch as 
> in "batch updates" and its API method names (addBatch, executeBatch).
> 
> But it seems better or common to use batch according to the etymology and the 
> following Stack Overflow page:
> 
> https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch
> 
> OTOH, as for the name GetModifyBatchSize() you suggest, I think 
> GetInsertBatchSize may be better.  That is, this API deals with multiple 
> records in a single INSERT statement.  Your GetModifyBatchSize will be 
> reserved for statement batching when libpq has supported batch/pipelining to 
> execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC 
> batch updates.  What do you think?
> 

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.
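To be concrete, the shape I'm thinking of is roughly the following; the
callback name is exactly what's being debated here, and the batch_size
member is assumed to be whatever the patch resolves from the server/table
options at plan time:

/* FdwRoutine member (sketch, not a settled API) */
typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);

/* postgres_fdw: return the value resolved earlier, don't recompute it */
static int
postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
{
	PgFdwModifyState *fmstate =
		(PgFdwModifyState *) resultRelInfo->ri_FdwState;

	return fmstate ? fmstate->batch_size : 1;
}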

> CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
> --
> Statement stmt = con.createStatement(); 
> stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
> stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
> stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 
> 
> // submit a batch of update commands for execution 
> int[] updateCounts = stmt.executeBatch(); 
> --
> 

Sure. We already have a patch to support something like this at the
libpq level, IIRC. But I'm not sure how well that matches the FDW API
approach in general.

> 
>> 2) Do we have to lookup the batch_size in create_foreign_modify (in
>> server/table options)? I'd have expected to look it up while planning
>> the modify and then pass it through the list, just like the other
>> FdwModifyPrivateIndex stuff. But maybe that's not possible.
> 
> Don't worry, create_foreign_modify() is called from PlanForeignModify() 
> during planning.  Unfortunately, it's also called from BeginForeignInsert(), 
> but other stuff passed to create_foreign_modify() including the query string 
> is constructed there.
> 

Hmm, ok.

> 
>> 3) That reminds me - should we show the batching info on EXPLAIN? That
>> seems like a fairly interesting thing to show to the user. Perhaps
>> showing the average batch size would also be useful? Or maybe not, we
>> create the batches as large as possible, with the last one smaller.
> 
> Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
> dynamically based on the planning or system state unlike shared buffers and 
> parallel workers.  OTOH, I sometimes want to see what configuration parameter 
> values the user set, such as work_mem, enable_*, and shared_buffers, together 
> with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
> EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
> could be included in that output.
> 

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

>> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
>> over for every tuple. I don't know it that has measurable impact, but it
>> seems a bit excessive IMO. I don't think we should support the batch
>> size changing during execution (seems tricky).
> 
> Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value 
> that was already saved in a struct in create_foreign_modify().
> 

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's 

Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar  wrote:
> For the compression routine name, I did not include "Am" because
> currently, we are storing the compression method in the new catalog
> "pg_compression" not in the pg_am.   So are you suggesting that we
> should store the compression methods also in the pg_am instead of
> creating a new catalog?  IMHO, storing the compression methods in a
> new catalog is a better option instead of storing them in pg_am
> because actually, the compression methods are not the same as heap or
> index AMs, I mean they are actually not the access methods.  Am I
> missing something?

Oh, I thought it had been suggested in previous discussions that these
should be treated as access methods rather than inventing a whole new
concept just for this, and it seemed like a good idea to me. I guess I
missed the fact that the patch wasn't doing it that way. Hmm.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Tomas Vondra



On 11/24/20 5:23 PM, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>> 0004 - Seems fine. IMHO not really "silly errors" but OK.
> 
> This is one of the same issues you pointed out - shadowing a variable.
> Could be backpatched.
> 
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>>> +errmsg("statistics expressions and 
>>> predicates can refer only to the table being indexed")));
>>> +* partial-index predicates.  Create it in the per-index context to 
>>> be
>>>
>>> I think these are copied and shouldn't mention "indexes" or "predicates".  
>>> Or
>>> should statistics support predicates, too ?
>>>
>>
>> Right. Stupid copy-pasto.
> 
> Right, but then I was wondering if CREATE STATS should actually support
> predicates, since one use case is to do what indexes do without their 
> overhead.
> I haven't thought about it enough yet.
> 

Well, it's not supported now, so the message is bogus. I'm not against
supporting "partial statistics" with predicates in the future, but it's
going to be non-trivial project on it's own. It's not something I can
bolt onto the current patch easily.

>> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
>> keeping it more like PG13 (good for backpatching). Not sure why rename
>> extended statistics to multi-variate statistics - we use "extended"
>> everywhere.
> 
> -   if (build_expressions && (list_length(stxexprs) == 0))
> +   if (!build_expressions_only && (list_length(stmt->exprs) < 2))
> ereport(ERROR,  
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> -errmsg("extended expression statistics 
> require at least one expression")));
> +errmsg("multi-variate statistics require at 
> least two columns")));
> 
> I think all of "CREATE STATISTICS" has been known as "extended stats", so I
> think it may be confusing to say that it requires two columns for the general
> facility.
> 
>> Not sure what's the point of serialize_expr_stats changes,
>> that's code is mostly copy-paste from update_attstats.
> 
> Right.  I think "i" is poor variable name when it isn't a loop variable and 
> not
> of limited scope.
> 

OK, I understand. I'll consider tweaking that.

>> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
>> IMHO we should move this to a separate view.
> 
> Right - then unnest() the whole thing and return one row per expression rather
> than array, as you've done.  Maybe the docs should say that this returns one
> row per expression.
> 
> Looking quickly at your new patch: I guess you know there's a bunch of
> lingering references to "indexes" and "predicates":
> 
> I don't know if you want to go to the effort to prohibit this.
> postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
> CREATE STATISTICS
> 

Hmm, we're already rejecting system attributes, I suppose we should do
the same thing for expressions on system attributes.

> I think a lot of people will find this confusing:
> 
> postgres=# CREATE STATISTICS asf ON i FROM t;
> ERROR:  extended statistics require at least 2 columns
> postgres=# CREATE STATISTICS asf ON (i) FROM t;
> CREATE STATISTICS
> 
> postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
> ERROR:  extended expression statistics require at least one expression
> postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
> CREATE STATISTICS
> 
> I haven't looked, but is it possible to make it work without parens ?
> 

Hmm, you're right that may be surprising. I suppose we could walk the
expressions while creating the statistics, and replace such trivial
expressions with the nested variable, but I haven't tried. I wonder what
the CREATE INDEX behavior would be in these cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Strange behavior with polygon and NaN

2020-11-24 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 20 Nov 2020 15:57:46 -0500, Tom Lane  wrote in 
>> I don't much like anything about float8_coef_mul().

> I have the same feeling about the function, but I concluded that
> coefficients and coordinates should be regarded as different things from
> a practical standpoint.

> For example, consider Ax + By + C == 0: if B is 0.0, we can remove the
> second term from the equation regardless of the value of y, of course
> even if it were inf.  That is, the function imitates that kind of
> removal.

Meh --- I can see where you're going with that, but I don't much like it.
I fear that it's as likely to introduce weird behaviors as remove any.

The core of the issue in

> | postgres=# select point(1e+300, 'Infinity') <-> line('{1,0,5}');
> |  ?column? 
> | --
> |   NaN

is that we generate the line y = Inf:

(gdb) p tmp
$1 = {A = 0, B = -1, C = inf}

and then try to find the intersection with {1,0,5} (x = -5), but that
calculation involves 0 * Inf so we get NaNs.  It seems reasonable that
the intersection should be (-5,Inf), but I don't think we should try
to force the normal calculation to produce that.  I think we'd be
better off to explicitly special-case vertical and/or horizontal lines
in line_interpt_line.
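Concretely, the special case can bypass the generic formula entirely: if one
input line is vertical, its x coordinate is already known and only y has to
be computed from the other line. A sketch of the shape (parallel-line
handling omitted):

	/* inside line_interpt_line(), before the generic computation */
	float8		x,
				y;

	if (FPzero(l1->B))			/* l1 is vertical: x = -C/A is fixed */
	{
		x = -l1->C / l1->A;
		y = -(l2->A * x + l2->C) / l2->B;
	}
	else if (FPzero(l2->B))		/* mirror case */
	{
		x = -l2->C / l2->A;
		y = -(l1->A * x + l1->C) / l1->B;
	}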

Actually though ... even if we successfully got that intersection
point, we'd still end up with a NaN distance between (1e300,Inf) and
(-5,Inf), on account of Inf - Inf being NaN.  I think this is correct
and we'd be ill-advised to try to force it to be something else.
Although we pretend that two Infs are equal for purposes such as
sorting, they aren't really, so we should not assume that their
difference is zero.

So that line of thought prompts me to tread *very* carefully when
trying to dodge NaN results.  We need to be certain that we
introduce only logically-defensible special cases.  Something like
float8_coef_mul() seems much more likely to lead us into errors
than away from them.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Daniil Zakhlystov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, failed
Documentation:tested, failed

Submission review
--
Is the patch in a patch format which has context? (eg: context diff format)
NO, need to fix
Does it apply cleanly to the current git master?
YES
Does it include reasonable tests, necessary doc patches, etc?
have docs, missing tests

Usability review
--
At the moment, the patch supports per-connection (permanent) compression. The 
frontend can specify the desired compression algorithms and compression levels 
and then negotiate the compression algorithm that is going to be used with the 
backend. In its current state the patch is missing the ability to enable/disable 
compression on the backend side, which I think is not great from the 
usability standpoint.

Regarding on-the-fly configurable compression and different compression 
algorithms for each direction - these two ideas are promising but tend to make 
the implementation more complex. However, the current implementation can be 
extended to support these approaches in the future. For example, we can specify 
switchable on-the-fly compression as ‘switchable’ algorithm and negotiate it 
like the regular compression algorithm (like we currently negotiate ‘zstd’ and 
‘zlib’). ‘switchable’ algorithm may then introduce new specific messages to 
Postgres protocol to make the on-the-fly compression magic work.

The same applies to Robert’s idea of the different compression algorithms for 
different directions - we can introduce it later as a new compression algorithm 
with new specific protocol messages.

Does the patch actually implement that?
YES

Do we want that?
YES

Do we already have it?
NO

Does it follow SQL spec, or the community-agreed behavior?
To be discussed

Does it include pg_dump support (if applicable)?
not applicable

Are there dangers?
theoretically possible CRIME-like attack when using with SSL enabled

Have all the bases been covered?
To be discussed


Feature test
--

I’ve applied the patch, compiled, and tested it with configure options 
--enable-cassert and --enable-debug turned on. I’ve tested the following 
scenarios:

1. make check
===
 All 201 tests passed. 
===

2. make check-world
initially failed with:
== running regression test queries==
test postgres_fdw ... FAILED 4465 ms
== shutting down postmaster   ==
==
1 of 1 tests failed. 
==
The differences that caused some tests to fail can be viewed in the
file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.diffs".  A 
copy of the test summary that you see above is saved in the file 
"/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.out".

All tests passed after replacing ‘gsslib, target_session_attrs’ with ‘gsslib, 
compression, target_session_attrs’ in line 8914 of 
postgresql/contrib/postgres_fdw/expected/postgres_fdw.out

3. simple psql utility usage
psql -d "host=xxx port=5432 dbname=xxx user=xxx compression=1" 

4. pgbench tpcb-like w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

5. pgbench tpcb-like w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

6. pgbench initialization w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
-i -s 500

7. pgbench initialization w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
-i -s 500

8. Streaming physical replication. Recovery-related parameters
recovery_target_timeline = 'latest'
primary_conninfo = 'host=xxx port=5432 user=repl application_name=xxx 
compression=1'
primary_slot_name = 'xxx'
restore_command = 'some command'

9. This compression has been implemented in an experimental build of odyssey 
connection pooler and tested with ~1500 synthetic simultaneous clients 
configuration and ~300 GB databases.
During the testing, I’ve reported and fixed some of the issues.

Does the feature work as advertised?
YES
Are there corner cases the author has failed to consider?
NO
Are there any assertion failures or crashes?
NO


Performance review
--

Does the patch slow down simple tests?
NO

If it claims to improve performance, does it?
YES

Does it slow down other things?
Using compression may add CPU overhead. This mostly depends on the compression 
algorithm and the chosen compression level. During testing with the ZSTD algorithm 
and compression level 1 there was about 10% CPU overhead in read/write balanced 
scenarios and almost no overhead in mostly-read scenarios.


Coding review
--

In protocol.sgml:
>It can be just bo

Re: Online verification of checksums

2020-11-24 Thread David Steele

Hi Michael,

On 11/23/20 8:10 PM, Michael Paquier wrote:

On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:


Also- what is the point of reading the page from shared buffers
anyway..?  All we need to do is prove that the page will be rewritten
during WAL replay.  If we can prove that, we don't actually care what
the contents of the page are.  We certainly can't calculate the
checksum on a page we plucked out of shared buffers since we only
calculate the checksum when we go to write the page out.


A LSN-based check makes the thing tricky.  How do you make sure that
pd_lsn is not itself broken?  It could be perfectly possible that a
random on-disk corruption makes pd_lsn seen as having a correct value,
still the rest of the page is borked.


We are not just looking at one LSN value. Here are the steps we are 
proposing (I'll skip checks for zero pages here):


1) Test the page checksum. If it passes the page is OK.
2) If the checksum does not pass then record the page offset and LSN and 
continue.
3) After the file is copied, reopen and reread the file, seeking to 
offsets where possible invalid pages were recorded in the first pass.

a) If the page is now valid then it is OK.
b) If the page is not valid but the LSN has increased from the LSN 
recorded in the previous pass then it is OK. We can infer this because 
the LSN has been updated in a way that is not consistent with storage 
corruption.


This is what we are planning for the first round of improving our page 
checksum validation. We believe that doing the retry in a second pass 
will be faster and more reliable because some time will have passed 
since the first read without having to build in a delay for each page error.


A further improvement is to check the ascending LSNs found in 3b against 
PostgreSQL to be completely sure they are valid. We are planning this 
for our second round of improvements.


Reopening the file for the second pass does require some additional logic:

1) The file may have been deleted by PG since the first pass and in that 
case we won't report any page errors.
2) The file may have been truncated by PG since the first pass so we 
won't report any errors past the point of truncation.
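
For illustration, the second pass boils down to something like this (a rough C
sketch only, not pgBackRest code; page_checksum_ok(), page_lsn() and
report_checksum_error() stand in for the real helpers):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

typedef struct SuspectPage
{
    off_t    offset;   /* offset recorded during the first pass */
    uint64_t lsn;      /* pd_lsn recorded during the first pass */
} SuspectPage;

static void
recheck_pages(const char *path, SuspectPage *suspects, int nsuspects)
{
    char buf[BLOCK_SIZE];
    int  fd = open(path, O_RDONLY);

    if (fd < 0)
        return;            /* file deleted since first pass: nothing to report */

    for (int i = 0; i < nsuspects; i++)
    {
        ssize_t nread = pread(fd, buf, BLOCK_SIZE, suspects[i].offset);

        if (nread < BLOCK_SIZE)
            continue;      /* truncated past this offset: not an error */
        if (page_checksum_ok(buf))
            continue;      /* page is now valid */
        if (page_lsn(buf) > suspects[i].lsn)
            continue;      /* LSN advanced, page was rewritten: OK */

        report_checksum_error(path, suspects[i].offset);
    }
    close(fd);
}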


A malicious attacker could easily trick these checks, but as Stephen 
pointed out elsewhere they would likely make the checksums valid which 
would escape detection anyway.


We believe that the chances of random storage corruption passing all 
these checks are incredibly small, but eventually we'll also check 
against the WAL to be completely sure.


Regards,
--
-David
da...@pgmasters.net




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> The problem is that the planner calls ExecSupportsMarkRestore to
 >> find out whether a Materialize node is needed, and that function
 >> looks no further than the Path type of T_Index[Only]Path in order to
 >> return true, even though in this case it's a GiST index which does
 >> not support mark/restore.

 >> (Usually this can't be a problem because the merge join would need
 >> sorted input, thus the index scan would be a btree; but a merge join
 >> that doesn't actually have any sort keys could take unsorted input
 >> from any index type.)

 Tom> Sounds like the right analysis.

 >> Going forward, this looks like IndexOptInfo needs another am*
 >> boolean field, but that's probably not appropriate for the back
 >> branches; maybe as a workaround, ExecSupportsMarkRestore should just
 >> check for btree?

 Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos
 Tom> fields are non-null?

We don't (in the back branches) seem to have a pointer to the
IndexAmRoutine handy, only the oid? Obviously we can look it up from the
oid, but is that more overhead than we want in a join cost function,
given that this will be called for all potential mergejoins considered,
not just JOIN_FULL? Or is the overhead not worth bothering about?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Custom compression methods

2020-11-24 Thread Tom Lane
Robert Haas  writes:
> On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar  wrote:
>> For the compression routine name, I did not include "Am" because
>> currently, we are storing the compression method in the new catalog
>> "pg_compression" not in the pg_am.   So are you suggesting that we
>> should store the compression methods also in the pg_am instead of
>> creating a new catalog?  IMHO, storing the compression methods in a
>> new catalog is a better option instead of storing them in pg_am
>> because actually, the compression methods are not the same as heap or
>> index AMs, I mean they are actually not the access methods.  Am I
>> missing something?

> Oh, I thought it had been suggested in previous discussions that these
> should be treated as access methods rather than inventing a whole new
> concept just for this, and it seemed like a good idea to me. I guess I
> missed the fact that the patch wasn't doing it that way. Hmm.

FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
are pretty fundamentally different animals, yet we don't have a problem
sticking them in the same catalog.  I think anything that is related to
storage access could reasonably go into that catalog, rather than
inventing a new one.

regards, tom lane




Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Tomas Vondra



On 11/24/20 7:34 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> So I wonder if using PMEM for the WAL buffer is the right way forward.
>> AFAIK the WAL buffer is quite concurrent (multiple clients writing
>> data), which seems to contradict the PMEM vs. DRAM trade-offs.
>>
>> The design I've originally expected would look more like this
>>
>> clients -> wal buffers (DRAM) -> wal segments (PMEM DAX)
>>
>> i.e. mostly what we have now, but instead of writing the WAL segments
>> "the usual way" we'd write them using mmap/memcpy, without fsync.
>>
>> I suppose that's what Heikki meant too, but I'm not sure.
> 
> SQL Server probably does so.  Please see the following page and the links in 
> "Next steps" section.  I'm saying "probably" because the document doesn't 
> clearly state whether SQL Server memcpys data from DRAM log cache to 
> non-volatile log cache only for transaction commits or for all log cache 
> writes.  I presume the former.
> 
> 
> Add persisted log buffer to a database
> https://docs.microsoft.com/en-us/sql/relational-databases/databases/add-persisted-log-buffer?view=sql-server-ver15
> --
> With non-volatile, tail of the log storage the pattern is
> 
> memcpy to LC
> memcpy to NV LC
> Set status
> Return control to caller (commit is now valid)
> ...
> 
> With this new functionality, we use a region of memory which is mapped to a 
> file on a DAX volume to hold that buffer. Since the memory hosted by the DAX 
> volume is already persistent, we have no need to perform a separate flush, 
> and can immediately continue with processing the next operation. Data is 
> flushed from this buffer to more traditional storage in the background.
> --
> 

Interesting, thanks for the link. If I understand [1] correctly, they
essentially do this:

clients -> buffers (DRAM) -> buffers (PMEM) -> wal (storage)

that is, they insert the PMEM buffer between the LC (in DRAM) and
traditional (non-PMEM) storage, so that a commit does not need to do any
fsyncs etc.

It seems to imply the memcpy between DRAM and PMEM happens right when
writing the WAL, but I guess that's not strictly required - we might
just as well do that in the background, I think.

It's interesting that they only place the tail of the log on PMEM, i.e.
the PMEM buffer has limited size, and the rest of the log is not on
PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers
and the WAL segments, and kept the WAL segments on regular storage. That
could work, but I'd bet they did that because at that time the NV
devices were much smaller, and placing the whole log on PMEM was not
quite possible. So it might be unnecessarily complicated, considering
the PMEM device capacity is much higher now.

So I'd suggest we simply try this:

clients -> buffers (DRAM) -> wal segments (PMEM)

I plan to do some hacking and maybe hack together some simple tools to
benchmark various approaches.
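
As a rough sketch of that approach with libpmem (names and segment size are
illustrative only, this is not an actual patch):

#include <libpmem.h>

#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)

static char *wal_seg;       /* base of the mapped WAL segment */

static int
open_wal_segment(const char *path)
{
    size_t mapped_len;
    int    is_pmem;

    wal_seg = pmem_map_file(path, WAL_SEGMENT_SIZE, PMEM_FILE_CREATE,
                            0600, &mapped_len, &is_pmem);
    return (wal_seg != NULL && is_pmem) ? 0 : -1;
}

static void
flush_wal(const char *buf, size_t len, size_t seg_off)
{
    /* memcpy plus cache-line flush; no fsync() needed on real PMEM */
    pmem_memcpy_persist(wal_seg + seg_off, buf, len);
}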


regards

[1]
https://docs.microsoft.com/en-us/archive/blogs/bobsql/how-it-works-it-just-runs-faster-non-volatile-memory-sql-server-tail-of-log-caching-on-nvdimm

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Uh, why would you not just look to see if the ammarkpos/amrestrpos
>  Tom> fields are non-null?

> We don't (in the back branches) seem to have a pointer to the
> IndexAmRoutine handy, only the oid?

Oh, sorry, I misread your comment to be that you wanted to add a field
to IndexAmRoutine.  You're right, the real issue here is that
ExecSupportsMarkRestore lacks any convenient access to the needed info,
and we need to add a bool to IndexOptInfo to fix that.

I don't see any compelling reason why you couldn't add the field at
the end in the back branches; that's what we usually do to avoid
ABI breaks.  Although actually (counts fields...) it looks like
there's at least one pad byte after amcanparallel, so you could
add a bool there without any ABI consequence, resulting in a
reasonably natural field order in all branches.

regards, tom lane




Re: libpq compression

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
 wrote:
> To sum up, I think that the current implementation already introduces good 
> benefits. As I proposed in the Usability review, we may introduce the new 
> approaches later as separate compression 'algorithms'.

I don't think the current patch is so close to being committable that
we shouldn't be considering what we really want to have here. It's one
thing to say, well, this patch is basically done, let's not start
redesigning it now. But that's not the case here. For example, I don't
see any committer accepting the comments in zpq_stream.c as adequate,
or the documentation, either. Some comments that have been made
previously, like Andres's remark about the non-standard message
construction in pq_configure(), have not been addressed, and I do not
think any committer is going to agree with the idea that the novel
method chosen by the patch is superior here, not least but not only
because it seems like it's endian-dependent. That function also uses
goto, which anybody thinking of committing this will surely try to get
rid of, and I'm pretty sure the sscanf() isn't good enough to reject
trailing garbage, and the error message that follows is improperly
capitalized. I'm sure there's other stuff, too: this is just based on
a quick look.

Before we start worrying about any of that stuff in too much detail, I
think it makes a lot of sense to step back and consider the design.
Honestly, the work of changing the design might be smaller than the
amount of cleanup the patch needs. But even if it's larger, it's
probably not vastly larger. And in any case, I quite disagree with the
idea that we should commit to a user-visible interface that exposes a
subset of the functionality that we needed and then try to glue the
rest of the functionality on top of it later. If we make a libpq
connection option called compression that controls the type of
compression that is used in both direction, then how exactly would we
extend that later to allow for different compression in the two
directions? Some syntax like compression=zlib/none, where the value
before the slash controls one direction and the value after the slash
controls the other? Maybe. But on the other hand, maybe it's better to
have separate connection options for client compression and server
compression. Or, maybe the kind of compression used by the server
should be controlled via a GUC rather than a connection option. Or,
maybe none of that is right and we should stick with the approach the
patch currently takes. But it's not like we can do something for v1
and then just change things randomly later: there will be
backward-compatibility to worry about. So the time to talk about the
general approach here is now, before anything gets committed, before
the project has committed itself to any particular design. If we
decide in that discussion that certain things can be left for the
future, that's fine. If we've have discussed how they could be added
without breaking backward compatibility, even better. But we can't
just skip over having that discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

2020-11-24 Thread David Zhang
I verified the patch 
"v2-0001-Free-disk-space-for-dropped-relations-on-commit.patch" on master 
branch "0cc9932740f2bf572303b68438e4caf62de9". It works for me. Below is my 
test procedure and results.

=== Before the patch ===
#1 from psql console 1, create table and index then insert enough data
postgres=# CREATE TABLE test_tbl ( a int, b text);
postgres=# CREATE INDEX idx_test_tbl on test_tbl (a);
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello world!';
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello world!';

#2 check file sizes
david:12867$ du -h
12G .

#3 from psql console 2, drop the index
postgres=# drop index idx_test_tbl;

#4 check file sizes in different ways
david:12867$ du -h
7.8G.
david:12867$ ls -l
...
-rw--- 1 david david  0 Nov 23 20:07 16402
...

$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  25736  david   45u  REG  259,2
  0   12592758 /home/david/sandbox/postgres/pgdata/base/12867/16402 (deleted)
postgres  25736  david   49u  REG  259,2 
1073741824   12592798 /home/david/sandbox/postgres/pgdata/base/12867/16402.1 
(deleted)
postgres  25736  david   53u  REG  259,2 
1073741824   12592739 /home/david/sandbox/postgres/pgdata/base/12867/16402.2 
(deleted)
postgres  25736  david   59u  REG  259,2  
372604928   12592800 /home/david/sandbox/postgres/pgdata/base/12867/16402.3 
(deleted)
...

The index relnode id "16402" shows size "0" in the postgres database folder, 
but when checking with lsof, all 16402.x files are still in use by a psql 
connection, except 16402, which is set to 0. Checking again after an hour, lsof 
shows the same results.

=== After the patch ===
Repeating steps 1 to 4, lsof shows that all the index relnode files (in this 
case, index relnode id 16389) are removed within about 1 minute.
$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  32707  david   66u  REG  259,2
  0   12592763 /home/david/sandbox/postgres/pgdata/base/12867/16389.1 (deleted)
postgres  32707  david   70u  REG  259,2
  0   12592823 /home/david/sandbox/postgres/pgdata/base/12867/16389.2 (deleted)
postgres  32707  david   74u  REG  259,2
  0   12592805 /home/david/sandbox/postgres/pgdata/base/12867/16389.3 (deleted)
...

One interesting thing is that if the index is created after the data 
records have been inserted, then lsof doesn't show this issue.

Re: [HACKERS] Custom compression methods

2020-11-24 Thread Robert Haas
On Tue, Nov 24, 2020 at 1:21 PM Tom Lane  wrote:
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

It's good to have your opinion on this since I wasn't totally sure
what was best, but for the record, I can't take credit. Looks like it
was Álvaro's suggestion originally:

http://postgr.es/m/20171130205155.7mgq2cuqv6zxi25a@alvherre.pgsql

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-24 Thread Alexey Kondratov

On 2020-11-24 06:52, Bharath Rupireddy wrote:

Thanks for the review comments.

On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
 wrote:


> v1-0001-postgres_fdw-function-to-discard-cached-connections.patch

This patch looks pretty straightforward for me, but there are some
things to be addressed IMO:

+   server = GetForeignServerByName(servername, true);
+
+   if (server != NULL)
+   {

Yes, you return a false if no server was found, but for me it worth
throwing an error in this case as, for example, dblink does in the
dblink_disconnect().



dblink_disconnect() "Returns status, which is always OK (since any
error causes the function to throw an error instead of returning)."
This behaviour doesn't seem okay to me.

Since we throw true/false, I would prefer to throw a warning(with a
reason) while returning false over an error.



I thought about something a bit more sophisticated:

1) Return 'true' if there were open connections and we successfully 
closed them.
2) Return 'false' in the no-op case, i.e. there were no open 
connections.
3) Rise an error if something went wrong. And non-existing server case 
belongs to this last category, IMO.


That looks like a semantically correct behavior, but let us wait for any 
other opinion.
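
For illustration, those three outcomes could be sketched roughly like this 
(just a sketch, not the patch's actual code; the 
disconnect_cached_connections() signature is simplified here):

#include "postgres.h"
#include "fmgr.h"
#include "foreign/foreign.h"
#include "utils/builtins.h"

static bool disconnect_cached_connections(Oid serverid);   /* simplified */

PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);

Datum
postgres_fdw_disconnect(PG_FUNCTION_ARGS)
{
    char          *servername = text_to_cstring(PG_GETARG_TEXT_PP(0));
    ForeignServer *server = GetForeignServerByName(servername, true);

    if (server == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("server \"%s\" does not exist", servername)));

    /* true if at least one cached connection was closed, false if no-op */
    PG_RETURN_BOOL(disconnect_cached_connections(server->serverid));
}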






+ result = disconnect_cached_connections(FOREIGNSERVEROID,
+hashvalue,
+false);

+   if (all || (!all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue))
+   {
+   if (entry->conn != NULL &&
+   !all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue)

These conditions look bulky for me. First, you pass FOREIGNSERVEROID 
to

disconnect_cached_connections(), but actually it just duplicates 'all'
flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when 
it

is '-1', then 'all == true'. That is all, there are only two calls of
disconnect_cached_connections(). That way, it seems that we should 
keep

only 'all' flag at least for now, doesn't it?



I added cacheid as an argument to disconnect_cached_connections() for
reusability. Say, someone wants to use it with a user mapping then
they can pass cacheid USERMAPPINGOID, hash value of user mapping. The
cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue can
be added to disconnect_cached_connections().



Yeah, I get your point and motivation for adding this argument, but how 
can we use it? To disconnect all connections belonging to some specific 
user mapping? But any user mapping is hard-bound to some foreign server, 
AFAIK, so we can pass a serverid-based hash in this case.


In the case of pgfdw_inval_callback() this argument makes sense, since 
syscache callbacks work that way, but here I can hardly imagine a case 
where we could use it. Thus, it still looks like a premature complication 
to me, since we do not have plans to use it, do we? Anyway, everything 
seems to be working fine, so it is up to you whether to keep this 
additional argument.




v1-0003-postgres_fdw-server-level-option-keep_connection.patch
This patch adds a new server level option, keep_connection, default
being on, when set to off, the local session doesn't cache the
connections associated with the foreign server.



This patch looks good to me, except one note:

(entry->used_in_current_xact &&
-   !keep_connections))
+   (!keep_connections || !entry->keep_connection)))
{

Following this logic:

1) If keep_connections == true, then per-server keep_connection has a 
*higher* priority, so one can disable caching of a single foreign 
server.


2) But if keep_connections == false, then it works as a global switch-off, 
regardless of the per-server keep_connection settings, i.e. they have a 
*lower* priority.


It looks fine for me, at least I cannot propose anything better, but 
maybe it should be documented in 0004?




v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
This patch adds the tests and documentation related to this feature.



I have not read all texts thoroughly, but what caught my eye:

+   A GUC, postgres_fdw.keep_connections, default 
being
+   on, when set to off, the local 
session


I think the GUC acronym is widely used only in the source code, and the 
Postgres docs tend not to use it at all, apart from the acronyms list and a 
couple of occurrences of the 'GUC parameters' collocation. And it is never 
used in singular form there, so I think it should rather be:


A configuration parameter, 
postgres_fdw.keep_connections, default being...


+ 
+  Note that when postgres_fdw.keep_connections 
is set to
+  off, postgres_fdw discards either the 
connections
+  that are made previously and will be used by the local session or 
the
+  connections that will be made newly. But the connections that are

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

2020-11-24 Thread Pavel Borisov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Given we got two other reviews from Neil and David, I think I can finalize my 
own review and mark the patch as ready for committer if nobody has objections.
Thank you!

Pavel Borisov

The new status of this patch is: Ready for Committer


Re: [HACKERS] Custom compression methods

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> Robert Haas  writes:

> > Oh, I thought it had been suggested in previous discussions that these
> > should be treated as access methods rather than inventing a whole new
> > concept just for this, and it seemed like a good idea to me. I guess I
> > missed the fact that the patch wasn't doing it that way. Hmm.
> 
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

Right -- Something like amname=lz4, amhandler=lz4handler, amtype=c.
The core code must of course know how to instantiate an AM of type
'c' and what to use it for.

https://postgr.es/m/20171213151818.75a20...@postgrespro.ru




Re: enable_incremental_sort changes query behavior

2020-11-24 Thread Tom Lane
James Coleman  writes:
> On Mon, Nov 23, 2020 at 2:24 PM Tom Lane  wrote:
>> 1. James was wondering, far upthread, why we would do projections
>> pre-sort or post-sort.  I think the answers can be found by studying
>> planner.c's make_sort_input_target(), which separates out what we want
>> to do pre-sort and post-sort.

> Does it imply we need to intentionally avoid SRFs also?

It's sort of a wart that we allow SRFs in ORDER BY at all, but my
expectation is that make_sort_input_target would prevent lower levels
of the planner from needing to think about that.  We don't allow SRFs
in index expressions, nor in WHERE clauses (so they'll never come up
as mergejoin sort keys).  So really there's no way that scan/join
processing should need to consider such cases.  Such a sort would
only ever be implemented via a final Sort atop ProjectSet.

>> 2. Robert is correct that under normal circumstances, the targetlists of
>> both baserels and join rels contain only Vars.

> Is that primarily a project policy? Or a limitation of our current
> planner (just can't push things down/carry the results back up as much
> as we'd like)? Or something else? In particular, it seems plausible
> there are cases where pushing down the sort on a non-Var expression to
> the base rel could improve plans, so I'm wondering if there's a reason
> to intentionally avoid that in the long or short run (or both).

I think you've just rediscovered Joe Hellerstein's thesis topic [1].
We ripped out the remnants of that code ages ago (search very early
git states for "JMH" if you're interested), because the complexity
vs. benefit ratio seemed pretty bad.  Maybe there'll be a case for
putting it back some day, but I'm dubious.  Note that we have the
ability to push down sorts-on-expressions anyway; that's not constrained
by what is in the relation targetlists.

> Interesting: so merge joins are an example of us pushing down sorts,
> which I assume means (part of) the answer to my question on (2) is
> that there's nothing inherently wrong/broken with evaluating
> expressions lower down the tree as long as we're careful about what is
> safe/unsafe with respect to volatility and parallelism?

Right, I don't see any fundamental problem with that, we just have
to be careful about these constraints.

> I have wondered if we should strictly require the expression to be in
> the target list even if nonvolatile, but prepare_sort_from_pathkeys
> doesn't seem to think that's necessary, so I'm comfortable with that
> unless there's something you know we haven't considered.

No, prepare_sort_from_pathkeys is happy to build a sort expression if it
can't find it already computed in the input.  The secret here is that
we should never get to that code with a "dangerous" sort expression,
because we should never have made a Path that would request such a thing
in the first place.  It's fairly obvious to me how we deal with the
consideration for volatile sortkeys.  We cannot restrict parallel-unsafe
sortkeys quite the same way, because the restriction only applies to
parallel not non-parallel plans.  Maybe it's sufficient to mark Paths
as parallel unsafe if they sort by parallel-unsafe sortkeys?  Is there
anyplace that is Just Assuming that paths it builds will be
parallel-safe?

regards, tom lane

[1] https://www.postgresql.org/message-id/28216.1023340706%40sss.pgh.pa.us




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Oh, sorry, I misread your comment to be that you wanted to add a
 Tom> field to IndexAmRoutine. You're right, the real issue here is that
 Tom> ExecSupportsMarkRestore lacks any convenient access to the needed
 Tom> info, and we need to add a bool to IndexOptInfo to fix that.

 Tom> I don't see any compelling reason why you couldn't add the field
 Tom> at the end in the back branches; that's what we usually do to
 Tom> avoid ABI breaks. Although actually (counts fields...) it looks
 Tom> like there's at least one pad byte after amcanparallel, so you
 Tom> could add a bool there without any ABI consequence, resulting in a
 Tom> reasonably natural field order in all branches.

I guess that's close enough; this should suffice then.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index e2154ba86a..0c10f1d35c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -417,6 +417,11 @@ ExecSupportsMarkRestore(Path *pathnode)
 	{
 		case T_IndexScan:
 		case T_IndexOnlyScan:
+			/*
+			 * Not all index types support mark/restore.
+			 */
+			return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos;
+
 		case T_Material:
 		case T_Sort:
 			return true;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 52c01eb86b..3e94256d34 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -284,6 +284,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
 			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
 relation->rd_tableam->scan_bitmap_next_block != NULL;
+			info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
+  amroutine->amrestrpos != NULL);
 			info->amcostestimate = amroutine->amcostestimate;
 			Assert(info->amcostestimate != NULL);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index abe6f570e3..5a10c1855d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -864,6 +864,7 @@ struct IndexOptInfo
 	bool		amhasgettuple;	/* does AM have amgettuple interface? */
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 	bool		amcanparallel;	/* does AM support parallel scan? */
+	bool		amcanmarkpos;	/* does AM support mark/restore? */
 	/* Rather than include amapi.h here, we declare amcostestimate like this */
 	void		(*amcostestimate) ();	/* AM's cost estimator */
 };


Re: libpq compression

2020-11-24 Thread Daniil Zakhlystov
I completely agree that backward-compatibility is important here.

I think that it is a good idea to clarify how the compression establishment 
works in the current version of the patch:

1. Frontend send the startup packet which may look like this:

_pq_.compression = 'zlib,zstd' (I omitted the variations with compression 
levels for clarity)

Then, on the backend, there are two possible cases:
2.1 If the backend is too old and doesn't know anything about the compression 
or if the compression is disabled on the backend, it just ignores the 
compression parameter
2.2 In the other case, the backend intersects the client compression methods 
with its own supported ones and responds with a compressionAck message which 
contains the index of the chosen compression method (or '-1' if it doesn't 
support any of the methods provided).

If the frontend receives the compressionAck message, there are also two cases:
3.1 If compressionAck contains '-1', do not initiate compression
3.2 In the other case, initialize the chosen compression method immediately.

My idea is that we can add new compression approaches in the future and 
initialize them differently on step 3.2.

For example, in the case of switchable compression:

1. Client sends a startup packet with _pq_.compression = 'switchable,zlib,zstd' 
- it means that client wants switchable compression or permanent zlib/zstd 
compression.

Again, two main cases on the backend:
2.1 Backend doesn't know about any compression or compression turned off => 
ignore the _pq_.compression

2.2.1 If the backend doesn't have switchable compression implemented, it won't 
have 'switchable' in its supported methods. So it will simply discard this 
method while intersecting the frontend and backend compression methods and 
respond with some compressionAck message - choosing permanent zlib, zstd, or 
nothing (-1).

2.2.2 If the backend supports switchable on-the-fly compression, it will have 
'switchable' in its supported methods, so it may choose 'switchable' in its 
compressionAck response.

After that, on the frontend side:
3.1 If compressionAck contains '-1', do not initiate compression

3.2.1 If compressionAck has 'zstd' or 'zlib' as the chosen compression method, 
init permanent streaming compression immediately.

3.2.2 If compressionAck has 'switchable' as the chosen compression method, init 
the switchable compression. Initialization may involve sending some additional 
messages to the backend to negotiate the details like the supported switchable 
on the fly compression methods or any other details.

The same applies to compression with different algorithms in each direction. 
We can call it, for example, 'direction-specific' and initialize it 
differently in step 3.2. The key is that we don't even have to decide the exact 
initialization protocol for 'switchable' and 'direction-specific' now. It may 
be added in the future.

Basically, this is what I meant in my previous message about future 
expansion of the current design; I hope that I have managed to clarify it.
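
As a rough illustration of the negotiation in step 2.2 (illustrative code, not 
the actual patch):

#include <string.h>

/*
 * Intersect the client's requested algorithms with the locally supported
 * ones and return the index (into the client's list) of the first match,
 * or -1 if there is no common algorithm.  The index is what would be sent
 * back in the compressionAck message.
 */
static int
choose_compression(const char **client_algs, int n_client,
                   const char **server_algs, int n_server)
{
    for (int i = 0; i < n_client; i++)
        for (int j = 0; j < n_server; j++)
            if (strcmp(client_algs[i], server_algs[j]) == 0)
                return i;
    return -1;
}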

Thanks,

Daniil Zakhlystov

> On Nov 24, 2020, at 11:35 PM, Robert Haas  wrote:
> 
> On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
>  wrote:
>> To sum up, I think that the current implementation already introduces good 
>> benefits. As I proposed in the Usability review, we may introduce the new 
>> approaches later as separate compression 'algorithms'.
> 
> I don't think the current patch is so close to being committable that
> we shouldn't be considering what we really want to have here. It's one
> thing to say, well, this patch is basically done, let's not start
> redesigning it now. But that's not the case here. For example, I don't
> see any committer accepting the comments in zpq_stream.c as adequate,
> or the documentation, either. Some comments that have been made
> previously, like Andres's remark about the non-standard message
> construction in pq_configure(), have not been addressed, and I do not
> think any committer is going to agree with the idea that the novel
> method chosen by the patch is superior here, not least but not only
> because it seems like it's endian-dependent. That function also uses
> goto, which anybody thinking of committing this will surely try to get
> rid of, and I'm pretty sure the sscanf() isn't good enough to reject
> trailing garbage, and the error message that follows is improperly
> capitalized. I'm sure there's other stuff, too: this is just based on
> a quick look.
> 
> Before we start worrying about any of that stuff in too much detail, I
> think it makes a lot of sense to step back and consider the design.
> Honestly, the work of changing the design might be smaller than the
> amount of cleanup the patch needs. But even if it's larger, it's
> probably not vastly larger. And in any case, I quite disagree with the
> idea that we should commit to a user-visible interface that exposes a
> subset of the functionality that we needed and then try to glue t

Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> I guess that's close enough; this should suffice then.

Looks about right.  Not sure if we need to bother with a regression test
case; once that's in, it'd be hard to break it.

regards, tom lane




Ready For Committer patches (CF 2020-11)

2020-11-24 Thread Anastasia Lubennikova

Hi,

with this message, I want to draw attention to the RFC patches on the 
current commitfest. It would be good if committers could take a look at 
them.


While doing a sweep through the CF, I have kicked a couple of entries 
back to Waiting on author, so now the list is correct. Now we have 17 
entries.


https://commitfest.postgresql.org/30/?status=3

The oldest RFC patches are listed below. These entries are quite large 
and complex. Still, they have received a good amount of review, and it looks 
like after a long discussion they are in good shape.


Generic type subscripting 

schema variables, LET command 

pgbench - add pseudo-random permutation function

Allow REINDEX, CLUSTER and VACUUM FULL to rebuild on new 
TABLESPACE/INDEX_TABLESPACE 


range_agg / multiranges 

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 04:48, Peter Eisentraut
 wrote:
>
> On 2020-11-24 01:52, Dagfinn Ilmari Mannsåker wrote:
> > The Clang documentation¹ suggest an even neater solution, which would
> > eliminate the repetitive empty pg_attribute_foo #defines in the trailing
> > #else/#endif block in commit 1fa22a43a56e1fe44c7bb3a3d5ef31be5bcac41d:
> >
> > #ifndef __has_attribute
> > #define __has_attribute(x) 0
> > #endif
>
> Yes, this was also mentioned and agreed earlier in the thread, but then
> we apparently forgot to update the patch.

I wanted to let the buildfarm settle a bit before changing this again.
I plan on making the change today.

(I know walleye is still not happy)
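
For reference, the end state would presumably look roughly like this (sketch
only):

#ifndef __has_attribute
#define __has_attribute(x) 0    /* compatibility for compilers lacking it */
#endif

#if __has_attribute(cold)
#define pg_attribute_cold __attribute__((cold))
#else
#define pg_attribute_cold
#endif

#if __has_attribute(hot)
#define pg_attribute_hot __attribute__((hot))
#else
#define pg_attribute_hot
#endif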

David




Re: bug in pageinspect's "tuple data" feature

2020-11-24 Thread Robert Haas
On Sat, Nov 21, 2020 at 2:32 PM Alvaro Herrera  wrote:
> If you have a sufficiently broken data page, pageinspect throws an
> error when trying to examine the page:
>
> ERROR:  invalid memory alloc request size 18446744073709551451
>
> This is pretty unhelpful; it would be better not to try to print the
> data instead of dying.  With that, at least you can know where the
> problem is.
>
> This was introduced in d6061f83a166 (2015).  Proposed patch to fix it
> (by having the code print a null "data" instead of dying) is attached.

So I agree with the problem statement and I don't have any particular
objection to the patch as proposed, but I think it's just the tip of
the iceberg. These functions are tremendously careless about
validating the input data and will crash if you breathe on them too
hard; we've just band-aided around that problem by making them
super-user only (e.g. see 3e1338475ffc2eac25de60a9de9ce689b763aced).
While that does address the security concern since superusers can do
tons of bad things anyway, it's not much help if you want to actually
be able to run them on damaged pages without having the world end, and
it's no help at all if you'd like to let them be run by a
non-superuser, since a constructed page can easily blow up the world.
The patch as proposed fixes one of many problems in this area and may
well be useful enough to commit without doing anything else, but I'd
actually really like to see us do the same sort of hardening here that
is present in the recently-committed amcheck-for-heap support, which
is robust against a wide variety of things of this sort rather than
just this one particularly. Again, this is not to say that you should
be on the hook for that; it's a general statement.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> I guess that's close enough; this should suffice then.

 Tom> Looks about right. Not sure if we need to bother with a regression
 Tom> test case; once that's in, it'd be hard to break it.

We could check the EXPLAIN output (since the Materialize node would show
up), but it's not easy to get stable plans since the choice of which
path to put on the outside is not fixed. Based on what I found when
actually testing the code, it probably wouldn't be worth the effort.

-- 
Andrew (irc:RhodiumToad)




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>
> walleye's been failing since this patchset went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-11-24%2000%3A25%3A31
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels 
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
> -Wno-format-truncation -Wno-stringop-truncation -g -O2 -I../../../src/include 
>  -I./src/include/port/win32 -I/c/msys/local/include  -I/c/Python35/include 
> -I/c/OpenSSL-Win64/include -I/c/msys/local/include 
> "-I../../../src/include/port/win32" -DWIN32_STACK_RLIMIT=4194304 
> -DBUILDING_DLL  -c -o autovacuum.o autovacuum.c
> C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s: Assembler messages:
> C:\\Users\\BUILDE~1.SER\\AppData\\Local\\Temp\\cc4HR3xZ.s:5900: Error: 
> .seh_savexmm offset is negative
> make[3]: *** [autovacuum.o] Error 1
>
> I have no idea what to make of that, but it looks more like a compiler bug
> than anything else.

That's about the best I could come up with too when looking at that
yesterday.  The message gives me the impression that it might be
related to code arrangement. It does seem to be the assembler that's
complaining.

I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
be the correct fix for it... aka, just define the new
pg_attribute_(hot|cold) macros to empty on MinGW.

David




Re: mark/restore failures on unsorted merge joins

2020-11-24 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Looks about right. Not sure if we need to bother with a regression
>  Tom> test case; once that's in, it'd be hard to break it.

> We could check the EXPLAIN output (since the Materialize node would show
> up), but it's not easy to get stable plans since the choice of which
> path to put on the outside is not fixed. Based on what I found when
> actually testing the code, it probably wouldn't be worth the effort.

If it's not easy to test, I agree it's not worth it.

(Given how long it took anyone to notice this, the difficulty of
making a stable test case is unsurprising, perhaps.)

regards, tom lane




Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik



On 24.11.2020 21:35, Robert Haas wrote:

On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
 wrote:

To sum up, I think that the current implementation already introduces good 
benefits. As I proposed in the Usability review, we may introduce the new 
approaches later as separate compression 'algorithms'.

I don't think the current patch is so close to being committable that
we shouldn't be considering what we really want to have here. It's one
thing to say, well, this patch is basically done, let's not start
redesigning it now. But that's not the case here. For example, I don't
see any committer accepting the comments in zpq_stream.c as adequate,
or the documentation, either. Some comments that have been made
previously, like Andres's remark about the non-standard message
construction in pq_configure(), have not been addressed, and I do not
think any committer is going to agree with the idea that the novel
method chosen by the patch is superior here, not least but not only
because it seems like it's endian-dependent. That function also uses
goto, which anybody thinking of committing this will surely try to get
rid of, and I'm pretty sure the sscanf() isn't good enough to reject
trailing garbage, and the error message that follows is improperly
capitalized. I'm sure there's other stuff, too: this is just based on
a quick look.

Before we start worrying about any of that stuff in too much detail, I
think it makes a lot of sense to step back and consider the design.
Honestly, the work of changing the design might be smaller than the
amount of cleanup the patch needs. But even if it's larger, it's
probably not vastly larger. And in any case, I quite disagree with the
idea that we should commit to a user-visible interface that exposes a
subset of the functionality that we needed and then try to glue the
rest of the functionality on top of it later. If we make a libpq
connection option called compression that controls the type of
compression that is used in both direction, then how exactly would we
extend that later to allow for different compression in the two
directions? Some syntax like compression=zlib/none, where the value
before the slash controls one direction and the value after the slash
controls the other? Maybe. But on the other hand, maybe it's better to
have separate connection options for client compression and server
compression. Or, maybe the kind of compression used by the server
should be controlled via a GUC rather than a connection option. Or,
maybe none of that is right and we should stick with the approach the
patch currently takes. But it's not like we can do something for v1
and then just change things randomly later: there will be
backward-compatibility to worry about. So the time to talk about the
general approach here is now, before anything gets committed, before
the project has committed itself to any particular design. If we
decide in that discussion that certain things can be left for the
future, that's fine. If we've have discussed how they could be added
without breaking backward compatibility, even better. But we can't
just skip over having that discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com


First of all, thank you for the review.
I completely agree with you that this patch is not ready for committing - 
at least the documentation, written in my bad English, has to be checked and 
fixed. Also, I have not tested it much on Windows and other non-Unix systems.

I do not want to discuss small technical things like sending the compression 
message in pq_configure. I have answered Andres why I cannot use the standard 
functions in this case: this function is called during connection 
initialization, so the connection handle is not ready yet.
But this can definitely be changed (although it is not endian-dependent: 
libpq messages use big-endian order).
Also, it seems strange to discuss the presence of "goto" in the code: 
we are not "puritans", are we? ;)
Yes, I also try to avoid gotos as much as possible, since they can 
make code less readable.
But completely prohibiting gotos and replacing them with artificial 
loops and redundant checks seems like a rather radical approach which rarely 
leads to anything good. By the way, there are about three thousand gotos in 
the Postgres code (maybe some of them are in generated code - I have not 
checked).

So let's discuss more fundamental things, like your suggestion for a 
complete redesign of the compression support.
I am not against such a discussion, although I personally do not think 
that there are a lot of topics to discuss here.
I definitely do not want to say that my implementation is perfect and 
cannot be reimplemented in a better way. Certainly it can.
But compression itself, especially compression of protocol messages, is 
not a novel area. It has been implemented many times in many different 
systems (recently it was added to MySQL), and it is hard to find much 
"afflatus" here.


We can suggest many different things whic

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>> walleye's been failing since this patchset went in:
>> I have no idea what to make of that, but it looks more like a compiler bug
>> than anything else.

> I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
> be the correct fix for it... aka, just define the new
> pg_attribute_(hot|cold) macros to empty on MinGW.

I'd make any such fix as narrow as possible (ie MINGW64 only, based on
present evidence).  It'd be nice to have a compiler version upper bound
too, in the hopes that they'd fix it in future.  Maybe something like
"#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

regards, tom lane




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> David Rowley  writes:
> > On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
> >> walleye's been failing since this patchset went in:
> >> I have no idea what to make of that, but it looks more like a compiler bug
> >> than anything else.
> 
> > I wondered if #if !defined(__MINGW32__) && !defined(__MINGW64__) would
> > be the correct fix for it... aka, just define the new
> > pg_attribute_(hot|cold) macros to empty on MinGW.
> 
> I'd make any such fix as narrow as possible (ie MINGW64 only, based on
> present evidence).  It'd be nice to have a compiler version upper bound
> too, in the hopes that they'd fix it in future.  Maybe something like
> "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

Apparently the bug was fixed days after it was reported,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
but they haven't made a release containing the fix yet.




Re: libpq compression

2020-11-24 Thread Konstantin Knizhnik



On 24.11.2020 20:34, Daniil Zakhlystov wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, failed
Documentation:tested, failed

Submission review
--
Is the patch in a patch format which has context? (eg: context diff format)
NO, need to fix
Does it apply cleanly to the current git master?
YES
Does it include reasonable tests, necessary doc patches, etc?
have docs, missing tests

Usability review
--
At the moment, the patch supports per-connection (permanent) compression. The 
frontend can specify the desired compression algorithms and compression levels 
and then negotiate the compression algorithm that is going to be used with the 
backend. In current state patch is missing the ability to enable/disable the 
compression on the backend side, I think it might be not great from the 
usability side.

Regarding on-the-fly configurable compression and different compression 
algorithms for each direction - these two ideas are promising but tend to make 
the implementation more complex. However, the current implementation can be 
extended to support these approaches in the future. For example, we can specify 
switchable on-the-fly compression as ‘switchable’ algorithm and negotiate it 
like the regular compression algorithm (like we currently negotiate ‘zstd’ and 
‘zlib’). ‘switchable’ algorithm may then introduce new specific messages to 
Postgres protocol to make the on-the-fly compression magic work.

The same applies to Robert’s idea of the different compression algorithms for 
different directions - we can introduce it later as a new compression algorithm 
with new specific protocol messages.

Does the patch actually implement that?
YES

Do we want that?
YES

Do we already have it?
NO

Does it follow SQL spec, or the community-agreed behavior?
To be discussed

Does it include pg_dump support (if applicable)?
not applicable

Are there dangers?
theoretically possible CRIME-like attack when using with SSL enabled

Have all the bases been covered?
To be discussed


Feature test
--

I’ve applied the patch, compiled, and tested it with configure options 
--enable-cassert and --enable-debug turned on. I’ve tested the following 
scenarios:

1. make check
===
  All 201 tests passed.
===

2. make check-world
initially failed with:
== running regression test queries==
test postgres_fdw ... FAILED 4465 ms
== shutting down postmaster   ==
==
1 of 1 tests failed.
==
The differences that caused some tests to fail can be viewed in the
file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.diffs".  A copy of the 
test summary that you see above is saved in the file 
"/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.out".

All tests passed after replacing ‘gsslib, target_session_attrs’ with ‘gsslib, 
compression, target_session_attrs’ in line 8914 of 
postgresql/contrib/postgres_fdw/expected/postgres_fdw.out

3. simple psql utility usage
psql -d "host=xxx port=5432 dbname=xxx user=xxx compression=1"

4. pgbench tpcb-like w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

5. pgbench tpcb-like w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
--builtin tpcb-like -t 70 --jobs=32 --client=700

6. pgbench initialization w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" 
-i -s 500

7. pgbench initialization w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" 
-i -s 500

8. Streaming physical replication. Recovery-related parameters
recovery_target_timeline = 'latest'
primary_conninfo = 'host=xxx port=5432 user=repl application_name=xxx 
compression=1'
primary_slot_name = 'xxx'
restore_command = 'some command'

9. This compression has been implemented in an experimental build of odyssey 
connection pooler and tested with ~1500 synthetic simultaneous clients 
configuration and ~300 GB databases.
During the testing, I’ve reported and fixed some of the issues.

Does the feature work as advertised?
YES
Are there corner cases the author has failed to consider?
NO
Are there any assertion failures or crashes?
NO


Performance review
--

Does the patch slow down simple tests?
NO

If it claims to improve performance, does it?
YES

Does it slow down other things?
Using compression may add a CPU overhead. This mostly depends on compression 
algorithm and chosen compression level. During testing with ZSTD algorithm and 
compression level 1 there was about 10% of CPU overhead in read/write balanced 
scenarios and almost no overhead in mostly read scenarios.


Coding review

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Nov-24, Tom Lane wrote:
>> I'd make any such fix as narrow as possible (ie MINGW64 only, based on
>> present evidence).  It'd be nice to have a compiler version upper bound
>> too, in the hopes that they'd fix it in future.  Maybe something like
>> "#if defined(__MINGW64__) && defined(__GNUC__) && __GNUC__ <= 8" ?

> Apparently the bug was fixed days after it was reported,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
> but they haven't made a release containing the fix yet.

Ah, great sleuthing!  So that says it occurs in 8.1 only, meaning
the version test could be like

#if defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1
// lobotomized code here
#else ...

It's not entirely clear from that bug report whether it can manifest on
gcc 8.1 on other platforms; maybe we should test for x86 in general
not __MINGW64__.

regards, tom lane




Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > PROC_IN_LOGICAL_DECODING:
> > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> > ReplicationSlotRelease might be the most problematic one of the lot.
> > That's because a proc's xmin that had been ignored all along by
> > ComputeXidHorizons, will now be included in the computation.  Adding
> > asserts that proc->xmin and proc->xid are InvalidXid by the time we
> > reset the flag, I got hits in pg_basebackup, test_decoding and
> > subscription tests.  I think it's OK for ComputeXidHorizons (since it
> > just means that a vacuum that reads a later will remove less rows.)  But
> > in GetSnapshotData it is just not correct to have the Xmin go backwards.
> 
> I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
> set when outside a transaction block, i.e. walsender. In which case
> there shouldn't be an xid/xmin, I think? Or did you gate your assert on
> PROC_IN_LOGICAL_DECODING being set?

Ah, you're right about this one --  I missed the significance of setting
the flag only "when outside of a transaction block" at the time we call
StartupDecodingContext.





Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
Anastasia Lubennikova  writes:
> Hi, this entry is "Waiting on Author" and the thread was inactive for a 
> while. As far as I see, the patch needs some further work.
> Are you going to continue working on it, or should I mark it as 
> "returned with feedback" until a better time?

I'm inclined to go ahead and commit the 0001 patch I posted at [1]
(ie, change the implementation of GUC_REPORT to avoid intra-query
reports), since that addresses a performance problem that's
independent of the goal here.  The rest of this seems to still
be in Greg's court.

Has anyone got an opinion about the further improvement I suggested:

>> As it stands, 0001 reduces the ParameterStatus message traffic to
>> at most one per GUC per query, but it doesn't attempt to eliminate
>> duplicate ParameterStatus messages altogether.  We could do that
>> as a pretty simple adjustment if we're willing to expend the storage
>> to remember the last value sent to the client.  It might be worth
>> doing, since for example the function-SET-clause case would typically
>> lead to no net change in the GUC's value by the end of the query.

On reflection this seems worth doing, since excess client traffic
is far from free.

regards, tom lane

[1] https://www.postgresql.org/message-id/5708.1601145259%40sss.pgh.pa.us




Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera  writes:

> > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> > In these cases, what we need is that the code computes some xmin (or
> > equivalent computation) based on a set of transactions that exclude
> > those marked with the flags.  The behavior we want is that if some
> > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> > one*.  In other words, if there's no Xid/Xmin, then the flag is not
> > important.  So if we can ensure that the flag is set first, and the
> > Xid/xmin is installed later, that's sufficient correctness and we don't
> > need to hold exclusive lock.  But if we can't ensure that, then we must
> > use exclusive lock, because otherwise we risk another process seeing our
> > Xid first and not our flag, which would be bad.
> 
> I don't buy this either.  You get the same result if someone looks just
> before you take the ProcArrayLock to set the flag.  So if there's a
> problem, it's inherent in the way that the flags are defined or used;
> the strength of lock used in this stanza won't affect it.

The problem is that the writes could be reordered in a way that makes
the Xid appear set to an onlooker before PROC_IN_VACUUM appears set.
Vacuum always sets the bit first, and *then* the xid.  If the reader
always reads it like that then it's not a problem.  But in order to
guarantee that, we would have to have a read barrier for each pass
through the loop.

With the LW_EXCLUSIVE lock, we block the readers so that the bit is
known set by the time they examine it.  As I understand, getting the
lock is itself a barrier, so there's no danger that we'll set the bit
and they won't see it.


... at least, that's how I *imagine* the argument to be.  In practice,
vacuum_rel() calls GetSnapshotData() before installing the
PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will
get MyProc->xmin included in their snapshot (because bit not yet set),
and reader 2 won't.  If my understanding is correct, then we should move
the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have
the PROC_IN_VACUUM bit set.
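
Concretely, the suggested ordering in vacuum_rel() would look roughly like
this (a sketch only; the statusFlags/vacuumFlags naming and the ProcGlobal
mirror differ across branches):

	/* advertise PROC_IN_VACUUM before publishing any xmin */
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->statusFlags |= PROC_IN_VACUUM;
	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
	LWLockRelease(ProcArrayLock);

	/* only now take a snapshot, so anyone who sees our xmin also sees the flag */
	PushActiveSnapshot(GetTransactionSnapshot());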




Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Alvaro Herrera
On 2020-Nov-24, Tom Lane wrote:

> I'm inclined to go ahead and commit the 0001 patch I posted at [1]
> (ie, change the implementation of GUC_REPORT to avoid intra-query
> reports), since that addresses a performance problem that's
> independent of the goal here.  The rest of this seems to still
> be in Greg's court.

Sounds like a good idea to me.

> Has anyone got an opinion about the further improvement I suggested:
> 
> >> As it stands, 0001 reduces the ParameterStatus message traffic to
> >> at most one per GUC per query, but it doesn't attempt to eliminate
> >> duplicate ParameterStatus messages altogether.  We could do that
> >> as a pretty simple adjustment if we're willing to expend the storage
> >> to remember the last value sent to the client.  It might be worth
> >> doing, since for example the function-SET-clause case would typically
> >> lead to no net change in the GUC's value by the end of the query.
> 
> On reflection this seems worth doing, since excess client traffic
> is far from free.

Agreed.  If this is just a few hundred bytes of server-side local memory
per backend, it seems definitely worth it.




Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Nov-24, Tom Lane wrote:
>>> As it stands, 0001 reduces the ParameterStatus message traffic to
>>> at most one per GUC per query, but it doesn't attempt to eliminate
>>> duplicate ParameterStatus messages altogether.  We could do that
>>> as a pretty simple adjustment if we're willing to expend the storage
>>> to remember the last value sent to the client.  It might be worth
>>> doing, since for example the function-SET-clause case would typically
>>> lead to no net change in the GUC's value by the end of the query.

>> On reflection this seems worth doing, since excess client traffic
>> is far from free.

> Agreed.  If this is just a few hundred bytes of server-side local memory
> per backend, it seems definitely worth it.

Yeah, given the current set of GUC_REPORT variables, it's hard to see
the storage for their last-reported values amounting to much.  The need
for an extra pointer field in each GUC variable record might eat more
space than the actually-live values :-(

regards, tom lane




Add table access method as an option to pgbench

2020-11-24 Thread David Zhang

Hi Hackers,

I noticed that a table access method API was added starting from PG12. 
In other words, PostgreSQL opened the door for developers to add their 
own access methods, for example zheap, zedstore, etc. However, the 
current pgbench doesn't have an option that allows users to specify 
which table access method to use during the initialization phase. It 
would be great to have an option for this. For example, if a user wants 
to run a performance benchmark comparing "zheap" with "heap", the 
commands could look like this:


pgbench -d -i postgres --table-am=heap

pgbench -d -i postgres --table-am=zheap

I know there is a parameter like below available in postgresql.conf:

#default_table_access_method = 'heap'

But providing another option for the end user may not be a bad idea, 
and it might make such tests easier.
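
For illustration, with such an option the generated DDL would come out
roughly as below (zheap here stands in for any non-core table access
method; column definitions are approximate):

    create table pgbench_accounts(aid int not null,bid int,abalance int,filler char(84))
      using zheap with (fillfactor=100)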


The attached file is a quick patch for this.

Thoughts?


Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From 798f970e22034d33146265ed98922c605d0dc237 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 24 Nov 2020 15:14:42 -0800
Subject: [PATCH] add table access method option to pgbench

---
 src/bin/pgbench/pgbench.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 08a5947a9e..24bc6bdbe3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -188,6 +188,11 @@ int64  latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/*
+ * table access method selection
+ */
+char  *tableam = NULL;
+
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
  * partitioning.
@@ -643,6 +648,7 @@ usage(void)
   "  --partitions=NUM         partition pgbench_accounts into NUM parts (default: 0)\n"
   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
   "  --unlogged-tables        create tables as unlogged tables\n"
+  "  --table-am=TABLEAM       create tables using specified access method\n"
   "\nOptions to select what to run:\n"
   "  -b, --builtin=NAME[@W]   add builtin script NAME weighted at W (default: 1)\n"
   "                           (use \"-b list\" to list available scripts)\n"
@@ -3750,12 +3756,14 @@ initCreateTables(PGconn *con)
for (i = 0; i < lengthof(DDLs); i++)
{
		char		opts[256];
+		char		opam[256];
		char		buffer[256];
const struct ddlinfo *ddl = &DDLs[i];
const char *cols;
 
/* Construct new create table statement. */
opts[0] = '\0';
+   opam[0] = '\0';
 
/* Partition pgbench_accounts table */
		if (partition_method != PART_NONE && strcmp(ddl->table, "pgbench_accounts") == 0)
@@ -3776,11 +3784,22 @@ initCreateTables(PGconn *con)
PQfreemem(escape_tablespace);
}
 
+   if (tableam != NULL)
+   {
+   char   *escape_tableam;
+
+   escape_tableam = PQescapeIdentifier(con, tableam,
+                                       strlen(tableam));
+   snprintf(opam + strlen(opam), sizeof(opam) - strlen(opam),
+" using %s", escape_tableam);
+   PQfreemem(escape_tableam);
+   }
+
		cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols;
 
-   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s",
+   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s%s",
 unlogged_tables ? " unlogged" : "",
-ddl->table, cols, opts);
+ddl->table, cols, opam, opts);
 
executeStatement(con, buffer);
}
@@ -5422,6 +5441,7 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+   {"table-am", required_argument, NULL, 13},
{NULL, 0, NULL, 0}
};
 
@@ -5795,6 +5815,10 @@ main(int argc, char **argv)
exit(1);
}
break;
+   case 13:			/* table-am */
+   initialization_option_set = true;
+   tableam = pg_strdup(optarg);
+   break;
   

RE: A few new options for CHECKPOINT

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Bossart, Nathan 
> The main purpose of this patch is to give users more control over their 
> manually
> requested checkpoints or restartpoints.  I suspect the most useful option is
> IMMEDIATE, which can help avoid checkpoint- related IO spikes.  However, I
> didn't see any strong reason to prevent users from also adjusting FORCE and
> WAIT.

I think just IMMEDIATE would suffice, too.  But could you tell us what led you 
to want to give users more control?  Could you give us concrete example situations 
where users would want to perform CHECKPOINT with options?


Regards
Takayuki Tsunakawa



Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Andy Fan
On Tue, Nov 24, 2020 at 11:11 PM Alvaro Herrera 
wrote:

> On 2020-Nov-24, Andy Fan wrote:
>
> > then we modified the copy/read/out functions for this node.  In
> > _readFuncExpr,
> > we probably add something like
>
> > [ ... ]
>
> > Then we will get a compatible issue if we create a view with the node in
> > the older version and access the view with the new binary.
>
> When nodes are modified, you have to increment CATALOG_VERSION_NO which
> makes the new code incompatible with a datadir previously created


Thank you Alvaro. I just think this issue can be avoided without causing
the incompatibility. IIUC, once the new binary isn't compatible with
the datadir, the user has to dump/restore the whole database or use
pg_upgrade.  That is extra work.


-- for precisely this reason.
>

I probably didn't get the real point of this,  sorry about that.

-- 
Best Regards
Andy Fan


RE: [PoC] Non-volatile WAL buffer

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> It's interesting that they only place the tail of the log on PMEM, i.e.
> the PMEM buffer has limited size, and the rest of the log is not on
> PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers
> and the WAL segments, and kept the WAL segments on regular storage. That
> could work, but I'd bet they did that because at that time the NV
> devices were much smaller, and placing the whole log on PMEM was not
> quite possible. So it might be unnecessarily complicated, considering
> the PMEM device capacity is much higher now.
> 
> So I'd suggest we simply try this:
> 
> clients -> buffers (DRAM) -> wal segments (PMEM)
> 
> I plan to do some hacking and maybe hack together some simple tools to
> benchmarks various approaches.

I'm in favor of your approach.  Yes, Intel PMEM was available in 128/256/512 
GB capacities when I checked last year.  That's more than enough to place all WAL 
segments, so a small PMEM WAL buffer is not necessary.  I'm excited to see 
Postgres gain more power.


Regards
Takayuki Tsunakawa



Re: A few new options for CHECKPOINT

2020-11-24 Thread Bossart, Nathan
On 11/24/20, 4:03 PM, "tsunakawa.ta...@fujitsu.com" 
 wrote:
> From: Bossart, Nathan 
>> The main purpose of this patch is to give users more control over their 
>> manually
>> requested checkpoints or restartpoints.  I suspect the most useful option is
>> IMMEDIATE, which can help avoid checkpoint- related IO spikes.  However, I
>> didn't see any strong reason to prevent users from also adjusting FORCE and
>> WAIT.
>
> I think just IMMEDIATE would suffice, too.  But could you tell us what led you 
> to want to give users more control?  Could you give us concrete example 
> situations where users would want to perform CHECKPOINT with options?

It may be useful for backups taken with the "consistent snapshot"
approach.  As noted in the documentation [0], running CHECKPOINT
before taking the snapshot can reduce recovery time.  However, users
might wish to avoid the IO spike caused by an immediate checkpoint.

Nathan

[0] https://www.postgresql.org/docs/devel/backup-file.html
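
To make that concrete, the backup flow in question is roughly as below (the
option syntax in the comment is hypothetical; the actual grammar is whatever
the patch defines):

    -- issue a checkpoint right before taking the filesystem/volume snapshot,
    -- so recovery from the restored snapshot starts from a recent REDO point
    CHECKPOINT;                  -- today this is always an immediate checkpoint
    -- with the proposed options, something like: CHECKPOINT (IMMEDIATE FALSE);
    -- ... then take the storage-level snapshot of the data directory ...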



Re: Libpq support to connect to standby server as priority

2020-11-24 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Agreed.  If this is just a few hundred bytes of server-side local memory
>> per backend, it seems definitely worth it.

> Yeah, given the current set of GUC_REPORT variables, it's hard to see
> the storage for their last-reported values amounting to much.  The need
> for an extra pointer field in each GUC variable record might eat more
> space than the actually-live values :-(

Here's a v2 that does it like that.

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7c5f7c775b..34ed0e7558 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4229,6 +4229,9 @@ PostgresMain(int argc, char *argv[],
 pgstat_report_activity(STATE_IDLE, NULL);
 			}
 
+			/* Report any recently-changed GUC options */
+			ReportChangedGUCOptions();
+
 			ReadyForQuery(whereToSendOutput);
 			send_ready_for_query = false;
 		}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..245a3472bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4822,6 +4822,8 @@ static bool guc_dirty;			/* true if need to do commit/abort work */
 
 static bool reporting_enabled;	/* true to enable GUC_REPORT */
 
+static bool report_needed;		/* true if any GUC_REPORT reports are needed */
+
 static int	GUCNestLevel = 0;	/* 1 when in main transaction */
 
 
@@ -5452,6 +5454,7 @@ InitializeOneGUCOption(struct config_generic *gconf)
 	gconf->reset_scontext = PGC_INTERNAL;
 	gconf->stack = NULL;
 	gconf->extra = NULL;
+	gconf->last_reported = NULL;
 	gconf->sourcefile = NULL;
 	gconf->sourceline = 0;
 
@@ -5828,7 +5831,10 @@ ResetAllOptions(void)
 		gconf->scontext = gconf->reset_scontext;
 
 		if (gconf->flags & GUC_REPORT)
-			ReportGUCOption(gconf);
+		{
+			gconf->status |= GUC_NEEDS_REPORT;
+			report_needed = true;
+		}
 	}
 }
 
@@ -6215,7 +6221,10 @@ AtEOXact_GUC(bool isCommit, int nestLevel)
 
 			/* Report new value if we changed it */
 			if (changed && (gconf->flags & GUC_REPORT))
-ReportGUCOption(gconf);
+			{
+gconf->status |= GUC_NEEDS_REPORT;
+report_needed = true;
+			}
 		}		/* end of stack-popping loop */
 
 		if (stack != NULL)
@@ -6257,17 +6266,60 @@ BeginReportingGUCOptions(void)
 		if (conf->flags & GUC_REPORT)
 			ReportGUCOption(conf);
 	}
+
+	report_needed = false;
+}
+
+/*
+ * ReportChangedGUCOptions: report recently-changed GUC_REPORT variables
+ *
+ * This is called just before we wait for a new client query.
+ *
+ * By handling things this way, we ensure that a ParameterStatus message
+ * is sent at most once per variable per query, even if the variable
+ * changed multiple times within the query.  That's quite possible when
+ * using features such as function SET clauses.  Function SET clauses
+ * also tend to cause values to change intraquery but eventually revert
+ * to their prevailing values; ReportGUCOption is responsible for avoiding
+ * redundant reports in such cases.
+ */
+void
+ReportChangedGUCOptions(void)
+{
+	/* Quick exit if not (yet) enabled */
+	if (!reporting_enabled)
+		return;
+
+	/* Quick exit if no values have been changed */
+	if (!report_needed)
+		return;
+
+	/* Transmit new values of interesting variables */
+	for (int i = 0; i < num_guc_variables; i++)
+	{
+		struct config_generic *conf = guc_variables[i];
+
+		if ((conf->flags & GUC_REPORT) && (conf->status & GUC_NEEDS_REPORT))
+			ReportGUCOption(conf);
+	}
+
+	report_needed = false;
 }
 
 /*
  * ReportGUCOption: if appropriate, transmit option value to frontend
+ *
+ * We need not transmit the value if it's the same as what we last
+ * transmitted.  However, clear the NEEDS_REPORT flag in any case.
  */
 static void
 ReportGUCOption(struct config_generic *record)
 {
-	if (reporting_enabled && (record->flags & GUC_REPORT))
+	char	   *val = _ShowOption(record, false);
+
+	if (record->last_reported == NULL ||
+		strcmp(val, record->last_reported) != 0)
 	{
-		char	   *val = _ShowOption(record, false);
 		StringInfoData msgbuf;
 
 		pq_beginmessage(&msgbuf, 'S');
@@ -6275,8 +6327,19 @@ ReportGUCOption(struct config_generic *record)
 		pq_sendstring(&msgbuf, val);
 		pq_endmessage(&msgbuf);
 
-		pfree(val);
+		/*
+		 * We need a long-lifespan copy.  If strdup() fails due to OOM, we'll
+		 * set last_reported to NULL and thereby possibly make a duplicate
+		 * report later.
+		 */
+		if (record->last_reported)
+			free(record->last_reported);
+		record->last_reported = strdup(val);
 	}
+
+	pfree(val);
+
+	record->status &= ~GUC_NEEDS_REPORT;
 }
 
 /*
@@ -7695,7 +7758,10 @@ set_config_option(const char *name, const char *value,
 	}
 
 	if (changeVal && (record->flags & GUC_REPORT))
-		ReportGUCOption(record);
+	{
+		record->status |= GUC_NEEDS_REPORT;
+		report_needed = true;
+	}
 
 	return changeVal ? 1 : -1;
 }
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 073c8f3e06..6a20a3bcec 100644
--- a/src/include/utils/guc.h
++

RE: A few new options for CHECKPOINT

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Bossart, Nathan 
> It may be useful for backups taken with the "consistent snapshot"
> approach.  As noted in the documentation [0], running CHECKPOINT
> before taking the snapshot can reduce recovery time.  However, users
> might wish to avoid the IO spike caused by an immediate checkpoint.
> 
> [0] https://www.postgresql.org/docs/devel/backup-file.html

Ah, understood.  I agree that a slow or spread manual checkpoint would be good to 
have.


Regards
Takayuki Tsunakawa



Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Ashwin Agrawal
On Sun, Nov 22, 2020 at 5:23 PM Tomas Vondra 
wrote:

> I'm not entirely sure whether the "pmemdax" (i.e. unpatched instance
> with WAL on PMEM DAX device) is actually safe, but I included it anyway
> to see what difference is.


I am curious to learn more about this aspect. Kernels have provided support
for the "pmemdax" mode, so what part of the stack is unsafe?

Reading the numbers, it seems that only at smaller scales does the modified
PostgreSQL give an enhanced benefit over unmodified PostgreSQL with "pmemdax".
For most other cases the numbers are pretty close between these two setups, so
I am curious to learn: why even modify PostgreSQL if unmodified PostgreSQL can
provide a similar benefit with just DAX mode?


Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Andy Fan
On Wed, Nov 25, 2020 at 8:10 AM Andy Fan  wrote:

>
>
> On Tue, Nov 24, 2020 at 11:11 PM Alvaro Herrera 
> wrote:
>
>> On 2020-Nov-24, Andy Fan wrote:
>>
>> > then we modified the copy/read/out functions for this node.  In
>> > _readFuncExpr,
>> > we probably add something like
>>
>> > [ ... ]
>>
>> > Then we will get a compatible issue if we create a view with the node in
>> > the older version and access the view with the new binary.
>>
>> When nodes are modified, you have to increment CATALOG_VERSION_NO which
>> makes the new code incompatible with a datadir previously created
>
>
> Thank you Alvaro. I just think this issue can be avoided without causing
> the incompatibility. IIUC, once the new binary isn't compatible with
> the datadir, the user has to dump/restore the whole database or use
> pg_upgrade.  That is extra work.
>
>
> -- for precisely this reason.
>>
>
> I probably didn't get the real point of this,  sorry about that.
>
> --
> Best Regards
> Andy Fan
>


What I mean here is something like below.

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 8c1e39044c..c3eba00639 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -29,6 +29,7 @@

 /* Static state for pg_strtok */
 static const char *pg_strtok_ptr = NULL;
+static const char *pg_strtok_extend(int *length, bool testonly);

 /* State flag that determines how readfuncs.c should treat location fields */
 #ifdef WRITE_READ_PARSE_PLAN_TREES
@@ -102,6 +103,20 @@ stringToNodeWithLocations(const char *str)
 #endif


+const char*
+pg_strtok(int *length)
+{
+ return pg_strtok_extend(length, false);
+}
+
+/*
+ * Just peek at the next field name without changing the global state.
+ */
+const char*
+pg_peak_next_field(int *length)
+{
+ return pg_strtok_extend(length, true);
+}
 /*
  *
  * the lisp token parser
@@ -149,7 +164,7 @@ stringToNodeWithLocations(const char *str)
  * as a single token.
  */
 const char *
-pg_strtok(int *length)
+pg_strtok_extend(int *length,  bool testonly)
 {
  const char *local_str; /* working pointer to string */
  const char *ret_str; /* start of token to return */
@@ -162,7 +177,8 @@ pg_strtok(int *length)
  if (*local_str == '\0')
  {
  *length = 0;
- pg_strtok_ptr = local_str;
+ if (!testonly)
+ pg_strtok_ptr = local_str;
  return NULL; /* no more tokens */
  }

@@ -199,7 +215,8 @@ pg_strtok(int *length)
  if (*length == 2 && ret_str[0] == '<' && ret_str[1] == '>')
  *length = 0;

- pg_strtok_ptr = local_str;
+ if (!testonly)
+ pg_strtok_ptr = local_str;

  return ret_str;
 }


-- the below is a demo code to use it.
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 76ab5ae8b7..c19cd45793 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -689,13 +689,27 @@ _readFuncExpr(void)

  READ_OID_FIELD(funcid);
  READ_OID_FIELD(funcresulttype);
- READ_BOOL_FIELD(funcretset);
+ token = pg_peak_next_field(&length);
+ if (memcmp(token, ":funcretset", strlen(":funcretset")) == 0)
+ {
+ READ_BOOL_FIELD(funcretset);
+ }
+ else
+ local_node->funcretset = false;
+
  READ_BOOL_FIELD(funcvariadic);
  READ_ENUM_FIELD(funcformat, CoercionForm);
- READ_OID_FIELD(funccollid);
+   READ_OID_FIELD(funccollid);
  READ_OID_FIELD(inputcollid);
  READ_NODE_FIELD(args);
- READ_LOCATION_FIELD(location);
+
+token = pg_peak_next_field(&length);
+ if (memcmp(token, ":location", strlen(":location")) == 0)
+ {
+ READ_LOCATION_FIELD(location);
+ }
+ else
+ local_node->location = -1;

  READ_DONE();
 }


After writing it, I feel that this will waste a bit of performance
since we need to tokenize part of the string twice.  But the overhead
looks acceptable to me, and it can be avoided if we refactor the code again
with a "read_fieldname_or_nomove(char *fieldname, int *length)" function.


-- 
Best Regards
Andy Fan


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
Alvaro Herrera  writes:
>> On Wed, 25 Nov 2020 at 04:55, Tom Lane  wrote:
>>> walleye's been failing since this patchset went in:
>>> I have no idea what to make of that, but it looks more like a compiler bug
>>> than anything else.

> Apparently the bug was fixed days after it was reported,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86048
> but they haven't made a release containing the fix yet.

Wait ... the second part of that doesn't seem to be true.
According to

http://mingw-w64.org/doku.php/versions

mingw-w64 has made at least three releases since this
bug was fixed.  Surely they're shipping something newer
than 8.1.0 by now.

So maybe, rather than hacking up the attribute stuff for
a bug that might bite us again anyway in future, we ought
to press walleye's owner to install a more recent compiler.

regards, tom lane




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
> So maybe, rather than hacking up the attribute stuff for
> a bug that might bite us again anyway in future, we ought
> to press walleye's owner to install a more recent compiler.

I think that seems like a better idea.  I had thoughts about
installing a quick fix for now to give the owner of walleye a bit of time
for the upgrade.  From what I can tell, the latest version of minGW
comes with GCC 9.2 [1]

David

[1] https://osdn.net/projects/mingw/releases/




Re: About adding a new filed to a struct in primnodes.h

2020-11-24 Thread Tom Lane
Andy Fan  writes:
> What I mean here is something like below.

What exactly would be the value of that?

There is work afoot, or at least on people's to-do lists, to mechanize
creation of the outfuncs/readfuncs/etc code directly from the Node
struct declarations.  So special cases for particular fields are going
to be looked on with great disfavor, even if you can make a case that
there's some reason to do it.  (Which I'm not seeing.  We've certainly
never had to do it in the past.)

regards, tom lane




Re: [PoC] Non-volatile WAL buffer

2020-11-24 Thread Tomas Vondra
On 11/25/20 1:27 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> It's interesting that they only place the tail of the log on PMEM,
>> i.e. the PMEM buffer has limited size, and the rest of the log is
>> not on PMEM. It's a bit as if we inserted a PMEM buffer between our
>> wal buffers and the WAL segments, and kept the WAL segments on
>> regular storage. That could work, but I'd bet they did that because
>> at that time the NV devices were much smaller, and placing the
>> whole log on PMEM was not quite possible. So it might be
>> unnecessarily complicated, considering the PMEM device capacity is
>> much higher now.
>> 
>> So I'd suggest we simply try this:
>> 
>> clients -> buffers (DRAM) -> wal segments (PMEM)
>> 
>> I plan to do some hacking and maybe hack together some simple tools
>> to benchmarks various approaches.
> 
> I'm in favor of your approach.  Yes, Intel PMEM were available in
> 128/256/512 GB when I checked last year.  That's more than enough to
> place all WAL segments, so a small PMEM wal buffer is not necessary.
> I'm excited to see Postgres gain more power.
>

Cool. FWIW I'm not 100% sure it's the right approach, but I think it's
worth testing. In the worst case we'll discover that this architecture
does not allow fully leveraging PMEM benefits, or maybe it won't work
for some other reason and the approach proposed here will work better.
Let's play a bit and we'll see.

I have hacked a very simple patch doing this (essentially replacing
open/write/close calls in xlog.c with pmem calls). It's a bit rough but
seems good enough for testing/experimenting. I'll polish it a bit, do
some benchmarks, and share some numbers in a day or two.
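
To give an idea of the direction (illustration only, not the patch itself):
with PMDK's libpmem, a WAL segment on a DAX filesystem can be mapped and
written without write()/fsync(), roughly like this (the path is a
placeholder):

#include <stdio.h>
#include <string.h>
#include <libpmem.h>

#define WAL_SEG_SIZE	((size_t) 16 * 1024 * 1024)

int
main(void)
{
	size_t		mapped_len;
	int			is_pmem;
	const char *rec = "fake WAL record payload";
	char	   *seg;

	/* placeholder path for a segment under pg_wal on a DAX mount */
	seg = pmem_map_file("/mnt/pmem/pg_wal/000000010000000000000001",
						WAL_SEG_SIZE, PMEM_FILE_CREATE, 0600,
						&mapped_len, &is_pmem);
	if (seg == NULL)
	{
		perror("pmem_map_file");
		return 1;
	}

	if (is_pmem)
		pmem_memcpy_persist(seg, rec, strlen(rec));	/* copy + flush, no fsync */
	else
	{
		memcpy(seg, rec, strlen(rec));		/* plain mmap fallback */
		pmem_msync(seg, strlen(rec));
	}

	pmem_unmap(seg, mapped_len);
	return 0;
}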


regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread David Rowley
On Wed, 25 Nov 2020 at 14:35, David Rowley  wrote:
>
> On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
> > So maybe, rather than hacking up the attribute stuff for
> > a bug that might bite us again anyway in future, we ought
> > to press walleye's owner to install a more recent compiler.
>
> I think that seems like a better idea.  I had thoughts about
> installing a quick fix for now to give the owner of walleye a bit of time
> for the upgrade.  From what I can tell, the latest version of minGW
> comes with GCC 9.2 [1]

So, how about the attached today and I'll email Joseph about walleye
and see if he can upgrade to a newer minGW version.

David
diff --git a/src/include/c.h b/src/include/c.h
index 3d04749007..4257300601 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -200,6 +200,21 @@
 #define pg_noinline
 #endif
 
+/*
+ * For now, just define pg_attribute_cold and pg_attribute_hot to be empty
+ * macros on minGW 8.1.  There appears to be a compiler bug that results in
+ * compilation failure.  At this time, we still have at least one buildfarm
+ * animal running that compiler, so this should make that green again. It's
+ * likely this compiler is not popular enough to warrant keeping this code
+ * around forever, so let's just remove it once the last buildfarm animal
+ * upgrades.
+ */
+#if defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1
+
+#define pg_attribute_cold
+#define pg_attribute_hot
+
+#else
 /*
  * Marking certain functions as "hot" or "cold" can be useful to assist the
  * compiler in arranging the assembly code in a more efficient way.
@@ -216,6 +231,7 @@
 #define pg_attribute_hot
 #endif
 
+#endif							/* defined(__MINGW64__) && __GNUC__ == 8 && __GNUC_MINOR__ == 1 */
 /*
  * Mark a point as unreachable in a portable fashion.  This should preferably
  * be something that the compiler understands, to aid code generation.


Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> On Wed, 25 Nov 2020 at 14:28, Tom Lane  wrote:
>> So maybe, rather than hacking up the attribute stuff for
>> a bug that might bite us again anyway in future, we ought
>> to press walleye's owner to install a more recent compiler.

> I think that seems like a better idea.  I had thoughts about
> installing a quick for now to give the owner of walleye a bit of time
> for the upgrade.  From what I can tell, the latest version of minGW
> comes with GCC 9.2 [1]

mingw and mingw-w64 seem to be distinct projects with separate
release schedules.  The latter's webpage isn't too clear about
which gcc version is in each of their releases.  But they seem
to be organized enough to put out releases roughly annually,
so I'm supposing they aren't falling too far behind gcc upstream.

regards, tom lane




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-24 Thread Craig Ringer
On Wed, Nov 25, 2020 at 2:43 AM Alexey Kondratov 
wrote:

> On 2020-11-24 06:52, Bharath Rupireddy wrote:
> > Thanks for the review comments.
> >
> > On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
> >  wrote:
> >>
> >> > v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
> >>
> >> This patch looks pretty straightforward for me, but there are some
> >> things to be addressed IMO:
> >>
> >> +   server = GetForeignServerByName(servername, true);
> >> +
> >> +   if (server != NULL)
> >> +   {
> >>
> >> Yes, you return a false if no server was found, but for me it worth
> >> throwing an error in this case as, for example, dblink does in the
> >> dblink_disconnect().
> >>
> >
> > dblink_disconnect() "Returns status, which is always OK (since any
> > error causes the function to throw an error instead of returning)."
> > This behaviour doesn't seem okay to me.
> >
> > Since we throw true/false, I would prefer to throw a warning(with a
> > reason) while returning false over an error.
> >
>
> I thought about something a bit more sophisticated:
>
> 1) Return 'true' if there were open connections and we successfully
> closed them.
> 2) Return 'false' in the no-op case, i.e. there were no open
> connections.
> 3) Rise an error if something went wrong. And non-existing server case
> belongs to this last category, IMO.
>
> That looks like a semantically correct behavior, but let us wait for any
> other opinion.
>
> >
> >>
> >> + result = disconnect_cached_connections(FOREIGNSERVEROID,
> >> +hashvalue,
> >> +false);
> >>
> >> +   if (all || (!all && cacheid == FOREIGNSERVEROID &&
> >> +   entry->server_hashvalue == hashvalue))
> >> +   {
> >> +   if (entry->conn != NULL &&
> >> +   !all && cacheid == FOREIGNSERVEROID &&
> >> +   entry->server_hashvalue == hashvalue)
> >>
> >> These conditions look bulky for me. First, you pass FOREIGNSERVEROID
> >> to
> >> disconnect_cached_connections(), but actually it just duplicates 'all'
> >> flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when
> >> it
> >> is '-1', then 'all == true'. That is all, there are only two calls of
> >> disconnect_cached_connections(). That way, it seems that we should
> >> keep
> >> only 'all' flag at least for now, doesn't it?
> >>
> >
> > I added cachid as an argument to disconnect_cached_connections() for
> > reusability. Say, someone wants to use it with a user mapping then
> > they can pass cacheid USERMAPPINGOID, hash value of user mapping. The
> > cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue can
> > be added to disconnect_cached_connections().
> >
>
> Yeah, I have got your point and motivation to add this argument, but how
> we can use it? To disconnect all connections belonging to some specific
> user mapping? But any user mapping is hard bound to some foreign server,
> AFAIK, so we can pass serverid-based hash in this case.
>
> In the case of pgfdw_inval_callback() this argument makes sense, since
> syscache callbacks work that way, but here I can hardly imagine a case
> where we can use it. Thus, it still looks as a preliminary complication
> for me, since we do not have plans to use it, do we? Anyway, everything
> seems to be working fine, so it is up to you to keep this additional
> argument.
>
> >
> > v1-0003-postgres_fdw-server-level-option-keep_connection.patch
> > This patch adds a new server level option, keep_connection, default
> > being on, when set to off, the local session doesn't cache the
> > connections associated with the foreign server.
> >
>
> This patch looks good to me, except one note:
>
> (entry->used_in_current_xact &&
> -   !keep_connections))
> +   (!keep_connections || !entry->keep_connection)))
> {
>
> Following this logic:
>
> 1) If keep_connections == true, then per-server keep_connection has a
> *higher* priority, so one can disable caching of a single foreign
> server.
>
> 2) But if keep_connections == false, then it works like a global switch
> off indifferently of per-server keep_connection's, i.e. they have a
> *lower* priority.
>
> It looks fine for me, at least I cannot propose anything better, but
> maybe it should be documented in 0004?
>
> >
> >
> v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
> > This patch adds the tests and documentation related to this feature.
> >
>
> I have not read all texts thoroughly, but what caught my eye:
>
> +   A GUC, postgres_fdw.keep_connections, default
> being
> +   on, when set to off, the local
> session
>
> I think that GUC acronym is used widely only in the source code and
> Postgres docs tend to do not use it at all, except from acronyms list
> and a couple of 'GUC parameters' collocation usage. And it never used in
> a singular form the

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

2020-11-24 Thread Tom Lane
David Rowley  writes:
> So, how about the attached today and I'll email Joseph about walleye
> and see if he can upgrade to a newer minGW version.

WFM.  (Note I already cc'd Joseph on this thread.)

regards, tom lane



