Re: adding partitioned tables to publications

2019-10-21 Thread Amit Langote
Hi Petr,

Thanks for your comments.

On Sun, Oct 13, 2019 at 5:01 AM Petr Jelinek  wrote:
> On 07/10/2019 02:55, Amit Langote wrote:
> > One cannot currently add partitioned tables to a publication.
> >
> > create table p (a int, b int) partition by hash (a);
> > create table p1 partition of p for values with (modulus 3, remainder 0);
> > create table p2 partition of p for values with (modulus 3, remainder 1);
> > create table p3 partition of p for values with (modulus 3, remainder 2);
> >
> > create publication publish_p for table p;
> > ERROR:  "p" is a partitioned table
> > DETAIL:  Adding partitioned tables to publications is not supported.
> > HINT:  You can add the table partitions individually.
> >
> > One can do this instead:
> >
> > create publication publish_p1 for table p1;
> > create publication publish_p2 for table p2;
> > create publication publish_p3 for table p3;
>
> Or just create publication publish_p for table p1, p2, p3;

Yep, facepalm! :)

So, one doesn't really need as many publication objects as there are
partitions, as my version suggested, which is good.  Still, as you can
tell, a user would need to manually keep the set of published
partitions up to date, for example when new partitions are added.
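
A minimal sketch of that maintenance step, reusing Petr's single
publication from above (p_new is a hypothetical partition attached
later); the proposed patch is what would make this manual step
unnecessary:

-- p_new is a hypothetical partition attached to p after publish_p was created.
-- Without automatic handling, every new partition has to be published by hand:
alter publication publish_p add table p_new;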

> > but maybe that's too much code to maintain for users.
> >
> > I propose that we make this command:
> >
> > create publication publish_p for table p;
> >
>
> +1
>
> > automatically add all the partitions to the publication.  Also, any
> > future partitions should also be automatically added to the
> > publication.  So, publishing a partitioned table automatically
> > publishes all of its existing and future partitions.  Attached patch
> > implements that.
> >
> > What doesn't change with this patch is that the partitions on the
> > subscription side still have to match one-to-one with the partitions
> > on the publication side, because the changes are still replicated as
> > being made to the individual partitions, not as the changes to the
> > root partitioned table.  It might be useful to implement that
> > functionality on the publication side, because it allows users to
> > define the replication target any way they need to, but this patch
> > doesn't implement that.
> >
>
> Yeah for that to work subscription would need to also need to be able to
> write to partitioned tables, so it needs both sides to add support for
> this.

Ah, I didn't know that the subscription code doesn't support tuple
routing out of the box.  Indeed, we will need to fix that.

> I think if we do both what you did and the transparent handling of
> root only, we'll need new keyword to differentiate the two. It might
> make sense to think about if we want your way to need an extra keyword
> or the transparent one will need it.

I didn't think about that but maybe you are right.

> One issue that I see reading the patch is following set of commands:
>
> CREATE TABLE foo ...;
> CREATE PUBLICATION mypub FOR TABLE foo;
>
> CREATE TABLE bar ...;
> ALTER PUBLICATION mypub ADD TABLE bar;
>
> ALTER TABLE foo ATTACH PARTITION bar ...;
> ALTER TABLE foo DETACH PARTITION bar ...;
>
> This will end up with bar not being in any publication even though it
> was explicitly added.

I tested, and bar continues to be in the publication with the above steps:

create table foo (a int) partition by list (a);
create publication mypub for table foo;
create table bar (a int);
alter publication mypub add table bar;
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Publications:
    "mypub"

alter table foo attach partition bar for values in (1);
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Partition of: foo FOR VALUES IN (1)
Publications:
    "mypub"

-- can't now drop bar from mypub (its membership is no longer standalone)
alter publication mypub drop table bar;
ERROR:  cannot drop partition "bar" from an inherited publication
HINT:  Drop the parent from publication instead.

alter table foo detach partition bar;

-- bar is still in mypub (now a standalone member)
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Publications:
    "mypub"

-- ok to drop now from mypub
alter publication mypub drop table bar;

Thanks,
Amit




Re: dropdb --force

2019-10-21 Thread Pavel Stehule
Hi


> When you say 'without any problem', do you mean to say that the drop
> database is successful?  In your tests were those sessions idle?  If
> so, I think we should try with busy sessions.  Also, if you write any
> script to do these tests, kindly share the same so that others can
> also repeat those tests.
>

I ran pgbench on the database with -i -S 100 and -c 1000, which is the
maximum for my notebook.  There was a high load (about 50), and DROP
DATABASE FORCE still worked without any problems.
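
For reference, a minimal sketch of the command form being exercised (the
database name is hypothetical; the exact grammar is in the
drop_database.sgml hunk quoted later in this message):

-- Hypothetical usage: terminate other sessions connected to "mydb", then drop it.
-- Per the documentation change below, active prepared transactions or logical
-- replication slots still make the command fail.
DROP DATABASE mydb WITH (FORCE);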



> >(under low load (only pg_sleep was called).
> >
>
> I guess this is also possible because immediately after
> TerminateOtherDBBackends, we call CountOtherDBBackends which again
> give 5s to allow active connections to die. I am wondering why not we
> call CountOtherDBBackends from TerminateOtherDBBackends after sending
> the SIGTERM to all the sessions that are connected to the database
> being dropped?  Currently, it looks odd that first, we wait for 100ms
> after sending the signal and then separately wait for 5s in another
> function.
>

I checked the code, and I would prefer not to change the calls.  I think the
code is readable now and makes logical sense.  But we can decrease the sleep
in TerminateOtherDBBackends from 100 ms to 5 ms (just to increase the chance
that all the killed processes are done, so that the 100 ms wait in
CountOtherDBBackends will not be hit).


> Other comments:
> 1.
> diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
> index 26a0bcf718..c8f545be18 100644
> --- a/src/test/regress/sql/psql.sql
> +++ b/src/test/regress/sql/psql.sql
> @@ -1182,3 +1182,11 @@ drop role regress_partitioning_role;
>
>  -- \d on toast table (use pg_statistic's toast table, which has a known
> name)
>  \d pg_toast.pg_toast_2619
> +
> +--
> +-- DROP DATABASE FORCE test of syntax (should not to raise syntax error)
> +--
> +drop database not_exists (force);
> +drop database not_exists with (force);
> +drop database if exists not_exists (force);
> +drop database if exists not_exists with (force);
>
> I don't think psql.sql is the right place to add such tests.
> Actually, there doesn't appear to be any database specific test file.
> However, if we want to add to any existing file, then maybe
> drop_if_exits.sql could be a better place for these tests as compare
> to psql.sql.
>

moved


> 2.
> + if (nprepared > 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_IN_USE),
> + errmsg_plural("database \"%s\" is used by %d prepared transaction",
> +"database \"%s\" is used by %d prepared transactions",
> +nprepared,
> +get_database_name(databaseId), nprepared)));
>
> I think it is better if we mimic exactly what we have in the failure
> of  CountOtherDBBackends.
>

done

Regards

Pavel


> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 3ac06c984a..9b9cabe71c 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ IF EXISTS ] name [ [ WITH ] ( option [, ...] ) ] 
+
+where option can be one of:
+
+FORCE
 
  
 
@@ -32,9 +36,11 @@ DROP DATABASE [ IF EXISTS ] name
DROP DATABASE drops a database. It removes the
catalog entries for the database and deletes the directory
containing the data.  It can only be executed by the database owner.
-   Also, it cannot be executed while you or anyone else are connected
-   to the target database.  (Connect to postgres or any
-   other database to issue this command.)
+   Also, it cannot be executed while you are connected to the target database.
+   (Connect to postgres or any other database to issue this
+   command.)
+   If anyone else is connected to the target database, this command will fail
+   unless you use the FORCE option described below.
   
 
   
@@ -64,6 +70,28 @@ DROP DATABASE [ IF EXISTS ] name
  
 

+
+   
+FORCE
+
+ 
+  Attempt to terminate all existing connections to the target database.
+  It doesn't terminate prepared transactions or active logical replication
+  slot(s).
+ 
+ 
+  This will fail, if current user has no permissions to terminate other
+  connections. Required permissions are the same as with 
+  pg_terminate_backend, described
+  in .
+
+  This will fail if we are not able to terminate connections or
+  when there are active prepared transactions or active logical replication
+  slots.
+ 
+
+   
+
   
  
 
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 01d66212e9..8e62359b4d 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -811,7 +811,7 @@ createdb_failure_callback(int code, Datum arg)
  * DROP DATABASE
  */
 void
-dropdb(const char *dbname, bool missing_ok)
+dropdb(const char *dbname, bool missing_ok, bool force)
 {

[BUG] standby node can not provide service even after it replays all log files

2019-10-21 Thread Thunder
Hi hackers,
I found this issue when restarting the standby node and then trying to connect to it.
It returns "psql: FATAL:  the database system is starting up".


The steps to reproduce this issue:
1.  Create a session to run uncommit_trans.sql
2.  Create another session to do a checkpoint.
3.  Restart the standby node.
4.  The standby node can not provide service even after it has replayed all log files.


I think the issue is in ProcArrayApplyRecoveryInfo function.
The standby state is in STANDBY_SNAPSHOT_PENDING, but the lastOverflowedXid is 
not committed.


Any idea to fix this issue?
Thanks.



uncommit_trans.sql
Description: Binary data


Incorrect relation locked at beginning of REINDEX CONCURRENTLY

2019-10-21 Thread Michael Paquier
Hi all,

While digging into the issues reported lately about REINDEX
CONCURRENTLY, I have bumped into the following, independent, issue:
/* Now open the relation of the new index, a lock is also needed on it */
newIndexRel = index_open(indexId, ShareUpdateExclusiveLock)

In this code path, indexId is the OID of the old index being copied, and
newIndexId is the OID of the new index created.  So that's clearly
incorrect, and the comment even states the intention.  This causes, for
example, the same session lock to be taken twice on the old index, with
the new index remaining unprotected.

Any objections if I fix this issue as per the attached?
--
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e9da06a9fa..824e221a28 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2960,8 +2960,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	indexId,
 	concurrentName);
 
-		/* Now open the relation of the new index, a lock is also needed on it */
-		newIndexRel = index_open(indexId, ShareUpdateExclusiveLock);
+		/*
+		 * Now open the relation of the new index, a lock is also needed
+		 * on it.
+		 */
+		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);
 
 		/*
 		 * Save the list of OIDs and locks in private context




Re: [BUG] standby node can not provide service even after it replays all log files

2019-10-21 Thread Thunder
Can we fix this issue like the following patch?


$git diff src/backend/access/transam/xlog.c
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 49ae97d4459..0fbdf6fd64a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8365,7 +8365,7 @@ CheckRecoveryConsistency(void)
 	 * run? If so, we can tell postmaster that the database is consistent now,
 	 * enabling connections.
 	 */
-	if (standbyState == STANDBY_SNAPSHOT_READY &&
+	if ((standbyState == STANDBY_SNAPSHOT_READY || standbyState == STANDBY_SNAPSHOT_PENDING) &&
 		!LocalHotStandbyActive &&
 		reachedConsistency &&
 		IsUnderPostmaster)






At 2019-10-21 15:40:24, "Thunder"  wrote:

Hi hackers,
I found this issue when restarting the standby node and then trying to connect to it.
It returns "psql: FATAL:  the database system is starting up".


The steps to reproduce this issue:
1.  Create a session to run uncommit_trans.sql
2.  Create another session to do a checkpoint.
3.  Restart the standby node.
4.  The standby node can not provide service even after it has replayed all log files.


I think the issue is in ProcArrayApplyRecoveryInfo function.
The standby state is in STANDBY_SNAPSHOT_PENDING, but the lastOverflowedXid is 
not committed.


Any idea to fix this issue?
Thanks.






 

Re: dropdb --force

2019-10-21 Thread Amit Kapila
On Mon, Oct 21, 2019 at 12:24 PM Pavel Stehule  wrote:
>
> On Mon, 21 Oct 2019 at 8:38, Amit Kapila wrote:
>>
>> > If we don't wait in TerminateOtherDBBackends, then probably there should 
>> > be necessary some cycles inside CountOtherDBBackends. I think so code like 
>> > is correct
>> >
>> > 1. send SIGTERM to target processes
>> > 2. put some time to processes for logout (100ms)
>> > 3. check in loop (max 5 sec) on logout of all processes
>> >
>> > Maybe my feeling is wrong, but I think so it is good to wait few time 
>> > instead to call CountOtherDBBackends immediately - the first iteration 
>> > should to fail, and then first iteration is useless without chance on 
>> > success.
>> >
>>
>> I think the way I am suggesting by skipping the second step will allow
>> sleeping only when required.  Consider a case where there are just one
>> or two sessions connected to the database and they immediately exited
>> after the signal is sent.  In such a case you don't need to sleep at
>> all whereas, under your proposal, it will always sleep.  In the case
>> where a large number of sessions are present and the first 100ms are
>> not sufficient, we anyway need to wait dynamically.  So, I think the
>> second step not only looks odd but also seems to be redundant.
>
>
> I checked the code, and I think so calling  CountOtherDBBackends from 
> TerminateOtherDBBackends is not good idea. CountOtherDBBackends should be 
> called anywhere, TerminateOtherDBBackends only with FORCE flag. So I wouldn't 
> to change code.
>

Sorry, but I am not able to understand the reason.  Are you worried
about the comment atop CountOtherDBBackends which says it is used in
Drop Database and related commands?

> But I can (and I have not any problem with it) remove or significantly 
> decrease sleeping time in TerminateOtherDBBackends.
>
> 100 ms is maybe very much - but zero is maybe too low. If there will not be 
> any time between TerminateOtherDBBackends and CountOtherDBBackends, then 
> probably CountOtherDBBackends hit waiting 100ms.
>
> What about only 5 ms sleeping in TerminateOtherDBBackends?
>

I am not completely sure about what is the most appropriate thing to
do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
there is nothing broken with the logic.  Anyone else wants to weigh in
here?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Fix comment in XLogFileInit()

2019-10-21 Thread Amit Kapila
On Mon, Oct 21, 2019 at 12:28 PM Fujii Masao  wrote:
>
> I found that the argument name of XLogFileInit() is wrong in its comment.
> Attached is the patch that fixes that typo.
>

LGTM.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Obsolete comment in partbounds.c

2019-10-21 Thread Etsuro Fujita
On Sat, Oct 19, 2019 at 5:56 PM Etsuro Fujita  wrote:
> On Fri, Oct 18, 2019 at 6:56 PM Alvaro Herrera  
> wrote:
> > On 2019-Oct-18, Etsuro Fujita wrote:
> > > While reviewing the partitionwise-join patch, I noticed $Subject,ie,
> > > this in create_list_bounds():
> > >
> > > /*
> > >  * Never put a null into the values array, flag instead 
> > > for
> > >  * the code further down below where we construct the 
> > > actual
> > >  * relcache struct.
> > >  */
> > > if (null_index != -1)
> > > elog(ERROR, "found null more than once");
> > > null_index = i;
> > >
> > > "the code further down below where we construct the actual relcache
> > > struct" isn't in the same file anymore by refactoring by commit
> > > b52b7dc25.  How about modifying it like the attached?
> >
> > Yeah, agreed.  Instead of "the null comes from" I would use "the
> > partition that stores nulls".
>
> I think your wording is better than mine.  Thank you for reviewing!

I applied the patch down to PG12.

Best regards,
Etsuro Fujita




Re: dropdb --force

2019-10-21 Thread Pavel Stehule
On Mon, 21 Oct 2019 at 10:25, Amit Kapila wrote:

> On Mon, Oct 21, 2019 at 12:24 PM Pavel Stehule 
> wrote:
> >
> > On Mon, 21 Oct 2019 at 8:38, Amit Kapila wrote:
> >>
> >> > If we don't wait in TerminateOtherDBBackends, then probably there
> should be necessary some cycles inside CountOtherDBBackends. I think so
> code like is correct
> >> >
> >> > 1. send SIGTERM to target processes
> >> > 2. put some time to processes for logout (100ms)
> >> > 3. check in loop (max 5 sec) on logout of all processes
> >> >
> >> > Maybe my feeling is wrong, but I think so it is good to wait few time
> instead to call CountOtherDBBackends immediately - the first iteration
> should to fail, and then first iteration is useless without chance on
> success.
> >> >
> >>
> >> I think the way I am suggesting by skipping the second step will allow
> >> sleeping only when required.  Consider a case where there are just one
> >> or two sessions connected to the database and they immediately exited
> >> after the signal is sent.  In such a case you don't need to sleep at
> >> all whereas, under your proposal, it will always sleep.  In the case
> >> where a large number of sessions are present and the first 100ms are
> >> not sufficient, we anyway need to wait dynamically.  So, I think the
> >> second step not only looks odd but also seems to be redundant.
> >
> >
> > I checked the code, and I think so calling  CountOtherDBBackends from
> TerminateOtherDBBackends is not good idea. CountOtherDBBackends should be
> called anywhere, TerminateOtherDBBackends only with FORCE flag. So I
> wouldn't to change code.
> >
>
> Sorry, but I am not able to understand the reason.  Are you worried
> about the comments atop CountOtherDBBackends which says it is used in
> Drop Database and related commands?
>

No, it's just that right now the code in dropdb looks like

if (force)
TerminateOtherDBBackends(...);

CountOtherDBBackends(...);

If I call CountOtherDBBackends from TerminateOtherDBBackends, then the code
will look like

if (force)
  TerminateOtherDBBackends(...);
else
  CountOtherDBBackends(...);

That makes it look like CountOtherDBBackends is not called when the FORCE
clause is active, which is not true.

So I prefer the current relationship between the routines.




> > But I can (and I have not any problem with it) remove or significantly
> decrease sleeping time in TerminateOtherDBBackends.
> >
> > 100 ms is maybe very much - but zero is maybe too low. If there will not
> be any time between TerminateOtherDBBackends and CountOtherDBBackends, then
> probably CountOtherDBBackends hit waiting 100ms.
> >
> > What about only 5 ms sleeping in TerminateOtherDBBackends?
> >
>
> I am not completely sure about what is the most appropriate thing to
> do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
> there is nothing broken with the logic.  Anyone else wants to weigh in
> here?
>

OK.  But when I remove it, wouldn't it be better to set the wait in
CountOtherDBBackends to something smaller than 100 ms?

Pavel

>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Andrey Borodin
Hi!

> On 18 Oct 2019, at 13:21, Dilip Kumar wrote:
> 
> On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  wrote:
>> 
>> 
>> I think we can do it in general as adding some check for parallel
>> vacuum there will look bit hackish.
> I agree with that point.
> It is not clear if we get enough
>> benefit by keeping it for cleanup phase of the index as discussed in
>> emails above.  Heikki, others, let us know if you don't agree here.
> 
> I have prepared a first version of the patch.  Currently, I am
> performing an empty page deletion for all the cases.

I've taken a look at the patch, and cannot understand one simple thing...
We should not call gistvacuum_delete_empty_pages() for the same gist_stats twice.
Put another way, once the function has been called we should somehow update or
zero empty_leaf_set.
Does this invariant hold in your patch?

Best regards, Andrey Borodin.



Re: pglz performance

2019-10-21 Thread Andrey Borodin



> On 28 Sept 2019, at 10:29, Andrey Borodin wrote:
> 
> I hope to benchmark decompression on Silesian corpus soon.

I've done it, and the results are somewhat contradictory.
The dataset adds 12 payloads to our 5. The payloads have relatively high entropy;
in many cases pglz cannot compress them at all, so decompression is a no-op and
the data is stored as is.

Decompressor pglz_decompress_hacked result 48.281747
Decompressor pglz_decompress_hacked8 result 33.868779
Decompressor pglz_decompress_vanilla result 42.510165

Tested on Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

With the Silesian corpus, pglz_decompress_hacked actually decreases performance
on high-entropy data.
Meanwhile, pglz_decompress_hacked8 is still faster than the usual pglz_decompress.
In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer
option.

I've updated the test suite [0], and anyone interested can verify the benchmarks.

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud

[0] https://github.com/x4m/test_pglz



Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
>
> Hi!
>
> > On 18 Oct 2019, at 13:21, Dilip Kumar wrote:
> >
> > On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  
> > wrote:
> >>
> >>
> >> I think we can do it in general as adding some check for parallel
> >> vacuum there will look bit hackish.
> > I agree with that point.
> > It is not clear if we get enough
> >> benefit by keeping it for cleanup phase of the index as discussed in
> >> emails above.  Heikki, others, let us know if you don't agree here.
> >
> > I have prepared a first version of the patch.  Currently, I am
> > performing an empty page deletion for all the cases.
>
> I've took a look into the patch, and cannot understand one simple thing...
> We should not call gistvacuum_delete_empty_pages() for same gist_stats twice.
> Another way once the function is called we should somehow update or zero 
> empty_leaf_set.
> Does this invariant hold in your patch?
>
Thanks for looking into the patch.  With this patch, GistBulkDeleteResult
is now local to a single gistbulkdelete or gistvacuumcleanup call.  Since
we are no longer sharing GistBulkDeleteResult across the calls, I am not
sure how it could be called twice for the same gist_stats.  Am I missing
something here?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-21 Thread Amit Kapila
On Mon, Oct 21, 2019 at 10:48 AM Dilip Kumar  wrote:
>
> On Fri, Oct 18, 2019 at 5:32 PM Amit Kapila  wrote:
>
> > 3.
> > @@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> > ReorderBufferTXN *txn)
> >
> >   /* update the statistics */
> >   rb->spillCount += 1;
> > - rb->spillTxns += txn->serialized ? 1 : 0;
> > + rb->spillTxns += txn->serialized ? 0 : 1;
> >   rb->spillBytes += size;
> >
> > Why is this change required?  Shouldn't we increase the spillTxns
> > count only when the txn is serialized?
>
> Prior to this change it was increasing the rb->spillTxns, every time
> we try to serialize the changes of the transaction.  Now, only we
> increase first time when it is not yet serialized.
>
> >
> > 3.
> > ReorderBufferSerializeTXN()
> > {
> > ..
> > /* update the statistics */
> > rb->spillCount += 1;
> > rb->spillTxns += txn->serialized ? 0 : 1;
> > rb->spillBytes += size;
> >
> > Assert(spilled == txn->nentries_mem);
> > Assert(dlist_is_empty(&txn->changes));
> > txn->nentries_mem = 0;
> > txn->serialized = true;
> > ..
> > }
> >
> > I am not able to understand the above code.  We are setting the
> > serialized parameter a few lines after we check it and increment the
> > spillTxns count. Can you please explain it?
>
> Basically, when the first time we attempt to serialize a transaction,
> txn->serialized will be false, that time we will increment the
> rb->spillTxns and after that set txn->serialized to true.  From next
> time onwards if we try to serialize the same transaction we will not
> increment the rb->spillTxns so that we count each transaction only
> once.
>

Your explanation for both the above comments makes sense to me.  Can
you please add some comments along these lines because it is not
apparent why one wants to increase the spillTxns counter when
txn->serialized is false?

> >
> > Also, isn't spillTxns count bit confusing, because in some cases it
> > will include subtransactions and other cases (where the largest picked
> > transaction is a subtransaction) it won't include it?
>
> I did not understand your comment completely.  Basically,  every
> transaction which we are serializing we will increase the count first
> time right? whether it is the main transaction or the sub-transaction.
>

It was not clear to me earlier whether we always increase the
spillTxns counter for subtransactions or not.  But now, looking at the
code carefully, it is clear that it is getting increased in every
case.  In short, you don't need to do anything for this comment.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Andrey Borodin



> On 21 Oct 2019, at 11:12, Dilip Kumar wrote:
> 
> On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
>> 
>> I've took a look into the patch, and cannot understand one simple thing...
>> We should not call gistvacuum_delete_empty_pages() for same gist_stats twice.
>> Another way once the function is called we should somehow update or zero 
>> empty_leaf_set.
>> Does this invariant hold in your patch?
>> 
> Thanks for looking into the patch.   With this patch now
> GistBulkDeleteResult is local to single gistbulkdelete call or
> gistvacuumcleanup.  So now we are not sharing GistBulkDeleteResult,
> across the calls so I am not sure how it will be called twice for the
> same gist_stats?  I might be missing something here?

Yes, you are right, sorry for the noise.
Currently we are doing both gistvacuumscan() and 
gistvacuum_delete_empty_pages() in both gistbulkdelete() and 
gistvacuumcleanup(). Is it supposed to be so? Functions gistbulkdelete() and 
gistvacuumcleanup() look very similar and share some comments. This is what 
triggered my attention.

Thanks!

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 2:50 PM Amit Kapila  wrote:
>
> On Mon, Oct 21, 2019 at 10:48 AM Dilip Kumar  wrote:
> >
> > On Fri, Oct 18, 2019 at 5:32 PM Amit Kapila  wrote:
> >
> > > 3.
> > > @@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> > > ReorderBufferTXN *txn)
> > >
> > >   /* update the statistics */
> > >   rb->spillCount += 1;
> > > - rb->spillTxns += txn->serialized ? 1 : 0;
> > > + rb->spillTxns += txn->serialized ? 0 : 1;
> > >   rb->spillBytes += size;
> > >
> > > Why is this change required?  Shouldn't we increase the spillTxns
> > > count only when the txn is serialized?
> >
> > Prior to this change it was increasing the rb->spillTxns, every time
> > we try to serialize the changes of the transaction.  Now, only we
> > increase first time when it is not yet serialized.
> >
> > >
> > > 3.
> > > ReorderBufferSerializeTXN()
> > > {
> > > ..
> > > /* update the statistics */
> > > rb->spillCount += 1;
> > > rb->spillTxns += txn->serialized ? 0 : 1;
> > > rb->spillBytes += size;
> > >
> > > Assert(spilled == txn->nentries_mem);
> > > Assert(dlist_is_empty(&txn->changes));
> > > txn->nentries_mem = 0;
> > > txn->serialized = true;
> > > ..
> > > }
> > >
> > > I am not able to understand the above code.  We are setting the
> > > serialized parameter a few lines after we check it and increment the
> > > spillTxns count. Can you please explain it?
> >
> > Basically, when the first time we attempt to serialize a transaction,
> > txn->serialized will be false, that time we will increment the
> > rb->spillTxns and after that set txn->serialized to true.  From next
> > time onwards if we try to serialize the same transaction we will not
> > increment the rb->spillTxns so that we count each transaction only
> > once.
> >
>
> Your explanation for both the above comments makes sense to me.  Can
> you please add some comments along these lines because it is not
> apparent why one wants to increase the spillTxns counter when
> txn->serialized is false?
Ok, I will add comments in the next patch.
>
> > >
> > > Also, isn't spillTxns count bit confusing, because in some cases it
> > > will include subtransactions and other cases (where the largest picked
> > > transaction is a subtransaction) it won't include it?
> >
> > I did not understand your comment completely.  Basically,  every
> > transaction which we are serializing we will increase the count first
> > time right? whether it is the main transaction or the sub-transaction.
> >
>
> It was not clear to me earlier whether we always increase the
> spillTxns counter for subtransactions or not.  But now, looking at
> code carefully, it is clear that is it is getting increased in every
> case.  In short, you don't need to do anything for this comment.
ok

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 2:58 PM Andrey Borodin  wrote:
>
>
>
> > On 21 Oct 2019, at 11:12, Dilip Kumar wrote:
> >
> > On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
> >>
> >> I've took a look into the patch, and cannot understand one simple thing...
> >> We should not call gistvacuum_delete_empty_pages() for same gist_stats 
> >> twice.
> >> Another way once the function is called we should somehow update or zero 
> >> empty_leaf_set.
> >> Does this invariant hold in your patch?
> >>
> > Thanks for looking into the patch.   With this patch now
> > GistBulkDeleteResult is local to single gistbulkdelete call or
> > gistvacuumcleanup.  So now we are not sharing GistBulkDeleteResult,
> > across the calls so I am not sure how it will be called twice for the
> > same gist_stats?  I might be missing something here?
>
> Yes, you are right, sorry for the noise.
> Currently we are doing both gistvacuumscan() and 
> gistvacuum_delete_empty_pages() in both gistbulkdelete() and 
> gistvacuumcleanup(). Is it supposed to be so?

There was an issue discussed in the parallel vacuum thread [1], and to
solve it, it was discussed in this thread [2] that we can delete empty
pages in the bulkdelete phase itself.  But that does not mean we can
remove it from the gistvacuumcleanup phase, because if gistbulkdelete
is not called at all in a vacuum pass, then gistvacuumcleanup will
perform both gistvacuumscan and gistvacuum_delete_empty_pages.  In
short, in whichever pass we detect an empty page, we delete the empty
page in that same pass.

> Functions gistbulkdelete() and gistvacuumcleanup() look very similar
> and share some comments. This is what triggered my attention.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JEQ2y3uNucNopDjK8pse6xSe5%3D_oknoWfRQvAF%3DVqsBA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/69EF7B88-F3E7-4E09-824D-694CF39E5683%40iki.fi

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Amit Kapila
On Sat, Oct 19, 2019 at 9:14 PM Stephen Frost  wrote:
>
> * Andres Freund (and...@anarazel.de) wrote:
>
> > Especially not when the person suggesting to do so isn't
> > even doing the leg work to estimate the portability issues.
>
> I figured it was common knowledge that gcc/clang supported it just fine,
> which covers something like 90% of the buildfarm.  I haven't got easy
> access to check others.
>

I have tried {} on Windows (MSVC-2017) and it gives a compilation error:

>\src\backend\access\transam\commit_ts.c(425): error C2059: syntax error: '}'
1>\src\backend\access\transam\commit_ts.c(426): error C2059: syntax error: '}'

The changed code looks like below:
Datum
pg_last_committed_xact(PG_FUNCTION_ARGS)
{
..
Datum values[2] = {};
bool nulls[2] = {};
..
}

Does this put an end to the option of using {} or do we want to
investigate something more?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: configure fails for perl check on CentOS8

2019-10-21 Thread Andrew Dunstan


On 10/20/19 7:36 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 10/20/19 1:23 PM, Tom Lane wrote:
>>> The right way to fix it, likely, is to add CFLAGS_SL while performing this
>>> particular autoconf test, as that would replicate the switches used for
>>> plperl (and it turns out that adding -fPIC does solve this problem).
>>> But the configure script doesn't currently know about CFLAGS_SL, so we'd
>>> have to do some refactoring to approach it that way.  Moving that logic
>>> from the platform-specific Makefiles into configure doesn't seem
>>> unreasonable, but it would make the patch bigger.
>> Sounds like a plan. I agree it's annoying to have to do something large
>> for something so trivial.
> Turns out it's not really that bad.  We just have to transfer the
> responsibility for setting CFLAGS_SL from the platform Makefiles
> to the platform template files.  (As a bonus, it'd be possible to
> allow users to override CFLAGS_SL during configure, as they can
> do for CFLAGS.  But I didn't mess with that here.)
>
> I checked that this fixes the Fedora build problem, but I've not
> really tested it on any other platform.  Still, there's not that
> much to go wrong, one would think.
>
>   



LGTM


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Arrays of domains

2019-10-21 Thread Andres Freund
Hi,

On 2017-09-29 13:10:35 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 09/28/2017 05:44 PM, Tom Lane wrote:
> >> Assuming that that's going to happen for v11, I'm inclined to leave the
> >> optimization problem until the dust settles around CaseTestExpr.
> >> Does anyone feel that this can't be committed before that's addressed?
> 
> > I'm Ok with it as long as we don't forget to revisit this.
> 
> I decided to go ahead and build a quick optimization for this case,
> as per the attached patch that applies on top of what we previously
> discussed.  It brings us back to almost par with HEAD:
> 
>       HEAD            Patch           + 04.patch
> 
> Q1    5453.235 ms     5440.876 ms     5407.965 ms
> Q2    9340.670 ms     10191.194 ms    9407.093 ms
> Q3    19078.457 ms    18967.279 ms    19050.392 ms
> Q4    48196.338 ms    58547.531 ms    48696.809 ms
> 
> Unless Andres feels this is too ugly to live, I'm inclined to commit
> the patch with this addition.  If we don't get around to revisiting
> CaseTestExpr, I think this is OK, and if we do, this will make sure
> that we consider this case in the revisit.

I didn't see this at the time, unfortunately. I'm architecturally
bothered by recursively invoking expression evaluation, but not really
by using CaseTestExpr. I've spent a lot of energy making expression
evaluation non-recursive, and it's also a requirement for a number of
further improvements.

On a read of the thread I didn't find anything along those lines, but
did you consider not using a separate expression state for the
per-element conversion? Something like

EEOP_ARRAYCOERCE_UNPACK
... conversion operations
... including
EEOP_CASE_TESTVAL
... and other things
EEOP_ARRAYCOERCE_PACK

where _UNPACK would set up the ArrayMapState, newly including an
array_iter, and stage the "source" array element for the CaseTest. _PACK
would put processed element into the values array. If _PACK sees there's
further elements, it sets up the new value for the TESTVAL, and jumps to
the step after UNPACK. Otherwise it builds the array and continues.

While that means we'd introduce backward jumps, it'd avoid needing an
expression eval startup for each element, which is a quite substantial
win.  It also avoids needing memory from two different expression
contexts, which is what I'd like to avoid right now.

It seems to me that we practically can be certain that the
EEOP_CASE_TESTVAL will be the first step after the
EEOP_ARRAYCOERCE_UNPACK, and that we therefore actually wouldn't ever
need it. The only reason to have the CaseTestExpr really is that it's
otherwise hard to represent the source of the conversion expression in
the expression tree form. At least I don't immediately see a good way to
do so without it.  I wonder if it's worth just optimizing it away
during expression "compilation"; it's actually easier to understand that
way.

Greetings,

Andres Freund




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Andrew Dunstan


On 10/21/19 2:07 AM, Tomas Vondra wrote:
> On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:
>>
>>> I think the general premise of this thread is that the application
>>> developer does not realize that may be necessary, because it's a bit
>>> surprising behavior, particularly when having more experience with
>>> other
>>> databases that behave differently. It's also pretty easy to not notice
>>> this issue for a long time, resulting in significant data loss.
>>>
>>> Let's say you're used to the MSSQL or MySQL behavior, you migrate your
>>> application to PostgreSQL or whatever - how do you find out about this
>>> behavior? Users are likely to visit
>>>
>>>    https://www.postgresql.org/docs/12/functions-json.html
>>>
>>> but that says nothing about how jsonb_set works with NULL values :-(
>>
>>
>>
>> We should certainly fix that. I accept some responsibility for the
>> omission.
>>
>
> +1
>
>


So let's add something to the JSON funcs page  like this:


Note: All the above functions except for json_build_object,
json_build_array, json_to_recordset, json_populate_record, and
json_populate_recordset and their jsonb equivalents are strict
functions. That is, if any argument is NULL the function result will be
NULL and the function won't even be called. Particular care should
therefore be taken to avoid passing NULL arguments to those functions
unless a NULL result is expected. This is particularly true of the
jsonb_set and jsonb_insert functions.



(We do have a heck of a lot of Note: sections on that page)
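
To make the pitfall concrete, a minimal hypothetical sketch (the table and
key names are made up); because the function is strict, an SQL NULL new
value nulls out the whole result, and COALESCE is one way a careful caller
can guard against that:

-- jsonb_set() is strict, so a NULL new value makes the whole result NULL and
-- this UPDATE silently replaces every document in the column with NULL:
UPDATE accounts SET payload = jsonb_set(payload, '{phone}', NULL);

-- One possible guard: map a possibly-NULL value to a JSON null before the call.
UPDATE accounts
   SET payload = jsonb_set(payload, '{phone}', coalesce(NULL::jsonb, 'null'::jsonb));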


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Adrian Klaver

On 10/20/19 11:07 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:




True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).



Can you explain the above to me as I thought there are exception blocks 
in stored functions and now sub-transactions in stored procedures.




--
Adrian Klaver
adrian.kla...@aklaver.com




intermittent test failure on Windows

2019-10-21 Thread Andrew Dunstan


Bowerbird (Visual Studio 2017 / Windows 10 pro) just had a failure on
the pg_ctl test :



There was a similar failure 17 days ago.


I surmise that what's happening here is that the test is trying to read
current_logfiles while the server is writing it, so there's a race
condition.


Perhaps what we need to do is have slurp_file sleep a bit and try again
on Windows if it gets EPERM, or else we need to have the pg_ctl test
wait a bit before calling slurp_file. But we have seen occasional
similar failures on other tests in Windows so a more systemic approach
might be better.


Thoughts?


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Declaring a strict function returns not null / eval speed

2019-10-21 Thread Andres Freund
Hi,

On 2019-10-20 10:27:19 -0400, Tom Lane wrote:
> "RETURNS NOT NULL", perhaps?  That'd have the advantage of not requiring
> any new keyword.

That could work.


> I'm a little bit skeptical of the actual value of adding this additional
> level of complexity, but I suppose we can't determine that reliably
> without doing most of the work :-(

Depends a bit on what case we're concerned about improving.  What brought
me onto this was the concern that a good bit of the overhead of
computing aggregate transition functions is often the check for whether the
transition value has become NULL, and that for a lot of the more common
aggregates that's unnecessary, as they'll never do so.  That case is
pretty easy to test: we can just stop generating the relevant expression
step and run a few micro-benchmarks.
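
For instance, a hedged sketch of such a micro-benchmark (the table name and
row count are made up): a query like the one below is dominated by per-row
transition calls, and for aggregates such as count() and sum() a non-NULL
transition value never goes back to NULL, so that per-row check is what
could be skipped:

-- Hypothetical micro-benchmark: per-row aggregate transition work dominates.
-- Run with \timing in psql; the table and row count are arbitrary.
CREATE TABLE agg_bench AS
    SELECT g AS x FROM generate_series(1, 10000000) AS g;

SELECT count(x), sum(x) FROM agg_bench;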

Obviously for the planner taking advantage of that fact, it's more work...

Greetings,

Andres Freund




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Joe Nelson
> > I don't understand why this is an issue worth deviating from the
> > standard for.
> 
> Because this use and the way the standard is defined in this case is
> confusing and could lead later hackers to misunderstand what's going on
> and end up creating bugs-

The two possible misunderstandings seem to be:

1. how 0 is interpreted in various contexts such as bool
2. that the x in {x} applies to only the first element

IMHO we should expect people to be familiar with (1), and we have the
INIT_ALL_ELEMS_ZERO macro to avoid (2). However the more I look at the
code using that macro the less I like it. The {0} initializer is more
idiomatic.

My vote would be to use {0} everywhere and avoid constructions like
{false} which might exacerbate misunderstanding (2).

> I figured it was common knowledge that gcc/clang supported it just fine,
> which covers something like 90% of the buildfarm.  I haven't got easy
> access to check others.

As Amit pointed out, {} doesn't work with MSVC-2017, nor is there any
reason it should, given that it isn't part of the C standard.




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread David G. Johnston
On Sun, Oct 20, 2019 at 3:51 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

> I'm not arguing against the idea of improving the situation. But I am
> arguing against a minimal fix that will not provide much of value to a
> careful app developer. i.e. I want to do more to support app devs.
> Ideally they would not need to use wrapper functions. There will be
> plenty of situations where it is mighty inconvenient to catch an
> exception thrown by jsonb_set(). And catching exceptions can be
> expensive. You want to avoid that if possible in your
> performance-critical plpgsql code.
>

As there is pretty much nothing that can be done at runtime if this
exception is raised, actually "catching" it anywhere deeper than near the
top of the application code is largely pointless.  It's more like a
NullPointerException in Java - if the application raises it, there should be
a last line of defense error handler that basically says "the developer
made a mistake somewhere and needs to fix it - tell them this happened".

Performance-critical subsections (and pretty much the whole) of the
application can just raise the error to the caller using the normal
mechanisms for "SQLException" propagation.

David J.


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Chapman Flack
On 10/21/19 11:25 AM, Joe Nelson wrote:
> we have the
> INIT_ALL_ELEMS_ZERO macro to avoid (2). However the more I look at the
> code using that macro the less I like it. The {0} initializer is more
> idiomatic.

If faced with the two questions:

1. which of a or b is more "clear" ?
2. which of a or b is more "idiomatic" ?

I think I would feel on more solid ground opining on (1),
where wrt (2) I would feel a little muzzier trying to say
what the question means.

It seems to me that idioms are common bits of usage that take off
because they're widely recognized as saying a specific thing
efficiently and clearly.

On that score, I'm not sure {0} really makes a good idiom ... indeed,
it seems this conversation is largely about whether it /looks/ too
much like an idiom, and to some readers could appear to be saying
something efficiently and clearly but that isn't quite what it means.

I would favor {} in a heartbeat if it were standard, because that
sucker is an idiom.

Failing that, though, I think I still favor the macro, because
question (1) seems less fuzzy than question (2), and on "clear",
the macro wins.

Regards,
-Chap




Re: Re: SQL/JSON: functions

2019-10-21 Thread Andrew Alsup

On 9/27/19 9:42 PM, Nikita Glukhov wrote:

Attached 39th version of the patches rebased onto current master.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

I am unable to cleanly apply this patch.
I've tried applying it to the current master (4f4061b2dd).
I've also tried applying it to commits b81a9c2fc5 back through 709d003fbd,
with no success.


Can you please tell me to which commit I can apply the patch? Is the 
following command correct for applying the patch files?


`patch -p1 < ~/Downloads/0001-Jsonpath-support-for-json-v39.patch`

Thanks for your help,
Andrew Alsup





Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Steven Pousty
On Sun, Oct 20, 2019 at 4:31 PM raf  wrote:

> Steven Pousty wrote:
>
> > I would think though that raising an exception is better than a
> > default behavior which deletes data.
>
> I can't help but feel the need to make the point that
> the function is not deleting anything. It is just
> returning null. The deletion of data is being performed
> by an update statement that uses the function's return
> value to set a column value.
>
> I don't agree that raising an exception in the function
> is a good idea (perhaps unless it's valid to assume
> that this function will only ever be used in such a
> context). Making the column not null (as already
> suggested) and having the update statement itself raise
> the exception seems more appropriate if an exception is
> desirable. But that presumes an accurate understanding
> of the behaviour of jsonb_set.
>
> Really, I think the best fix would be in the
> documentation so that everyone who finds the function
> in the documentation understands its behaviour
> immediately.
>
>
>
Hey Raf

In a perfect world I would agree with you.  But often users do not read ALL
the documentation before they use the function in their code, OR they are
not sure that the condition applies to them (until it does).  Turning a JSON
null into an SQL null, and thereby "deleting" the data, is not the path of
least surprise.

So while we could say reading the documentation is the proper path, it is
not the most helpful path.  I am not arguing against documenting the behavior
no matter what we decide on.  What I am saying is that an exception is better
than the current situation if we can't agree on any other solution.  An
exception is better than documentation alone, but probably not the best
solution.  (And it seems that's what most other people have said as well, but
the lag on the mailing list is causing us to overlap.)

I see people saying Null pointer exceptions are not helpful. I mostly
agree, they are not the most helpful kind of exception BUT they are better
than some alternatives. So I think it would be better to say NPEs are not
as helpful as they possibly could be.


Re: SQL/JSON: functions

2019-10-21 Thread Nikita Glukhov



On 21.10.2019 19:00, Andrew Alsup wrote:

On 9/27/19 9:42 PM, Nikita Glukhov wrote:

Attached 39th version of the patches rebased onto current master.

I am unable to cleanly apply this patch.
I've tried applying to the current master (4f4061b2dd)
I've also tried apply to commit b81a9c2fc5 back through 709d003fbd, 
with no success.


Can you please tell me to which commit I can apply the patch? Is the 
following command correct for applying the patch files?


`patch -p1 < ~/Downloads/0001-Jsonpath-support-for-json-v39.patch`

Thanks for your help,
Andrew Alsup


v39 patch is based on 5ee96b3e2221d154ffcb719bd2dee1179c53f821

Use the following git command to apply patches:

git am ~/Downloads/0001-Jsonpath-support-for-json-v39.patch

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-21 Thread Robert Haas
On Sat, Oct 19, 2019 at 10:53 AM Andrew Dunstan
 wrote:
> > FWIW I'm not sure the "we don't want to upgrade application code at the
> > same time as the database" is really tenable.
>
> I'm -1 for exactly this reason.

-1 from me, too, also for this reason. I bet if we started looking
we'd find many changes every year that we could justify partially or
completely back-porting on similar grounds, and if we start doing
that, we'll certainly screw it up sometimes, turning what should have
been a smooth minor release upgrade process into one that breaks. And
we'll still not satisfy the people who don't want to upgrade the
application and the database at the same time, because there will
always be changes where nothing like this is remotely reasonable.

Also, we'll then have a lot more behavior differences between minor
releases, which sounds like a bad thing to me. In particular, nobody
will be happy if a pg_dump taken on version X.Y fails to restore on
version X.Z. But even apart from that, it just doesn't sound like a
good idea to have the user-facing behavior vary significantly across
minor releases...
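
For reference, a minimal sketch of the syntax at issue: on v12 the
MATERIALIZED keyword forces the CTE to be evaluated separately rather than
inlined, while servers without the feature reject the keyword with a syntax
error, which is exactly the kind of dump/restore hazard described above:

-- Accepted on v12 and later; a server without the (hypothetical) backport
-- fails with a syntax error at MATERIALIZED.
WITH w AS MATERIALIZED (
    SELECT 1 AS x
)
SELECT x FROM w;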

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [BUG] standby node can not provide service even after it replays all log files

2019-10-21 Thread Robert Haas
On Mon, Oct 21, 2019 at 4:13 AM Thunder  wrote:
> Can we fix this issue like the following patch?
>
> $git diff src/backend/access/transam/xlog.c
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 49ae97d4459..0fbdf6fd64a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -8365,7 +8365,7 @@ CheckRecoveryConsistency(void)
>  	 * run? If so, we can tell postmaster that the database is consistent now,
>  	 * enabling connections.
>  	 */
> -	if (standbyState == STANDBY_SNAPSHOT_READY &&
> +	if ((standbyState == STANDBY_SNAPSHOT_READY || standbyState == STANDBY_SNAPSHOT_PENDING) &&
> 		!LocalHotStandbyActive &&
> 		reachedConsistency &&
> 		IsUnderPostmaster)

I think that the issue you've encountered is design behavior.  In
other words, it's intended to work that way.

The comments for the code you propose to change say that we can allow
connections once we've got a valid snapshot. So presumably the effect
of your change would be to allow connections even though we don't have
a valid snapshot.

That seems bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Isaac Morland
On Mon, 21 Oct 2019 at 11:46, Chapman Flack  wrote:

>
> I would favor {} in a heartbeat if it were standard, because that
> sucker is an idiom.
>
> Failing that, though, I think I still favor the macro, because
> question (1) seems less fuzzy than question (2), and on "clear",
> the macro wins.
>

Is it possible to define the macro to be {} where supported and {0} where
needed? Something like:

#if ...
#define INIT_ALL_ELEMS_ZERO {}
#else
#define INIT_ALL_ELEMS_ZERO {0}
#endif

Then it's clear the 0 is just there to make certain compilers happy and
doesn't have any actual meaning.


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Joe Nelson
> Is it possible to define the macro to be {} where supported and {0} 
> where needed? Something like:

If it's being put behind a macro then *stylistically* it shouldn't
matter whether {} or {0} is chosen, right? In which case {0} would
be a better choice because it's supported everywhere.




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Stephen Frost
Greetings,

* Joe Nelson (j...@begriffs.com) wrote:
> > Is it possible to define the macro to be {} where supported and {0} 
> > where needed? Something like:
> 
> If it's being put behind a macro then *stylistically* it shouldn't
> matter whether {} or {0} is chosen, right? In which case {0} would
> be a better choice because it's supported everywhere.

The problem with {0} in the first place is that it doesn't actually work
in all cases...  Simple cases, yes, but not more complex ones.  It's
unfortunate that there isn't a general solution here that works across
platforms (even if it involved macros..), but that seems to be the case.

Thanks,

Stephen




Re: intermittent test failure on Windows

2019-10-21 Thread Tom Lane
Andrew Dunstan  writes:
> Bowerbird (Visual Studio 2017 / Windows 10 pro) just had a failure on
> the pg_ctl test :
> 

> I surmise that what's happening here is that the test is trying to read
> current_logfiles while the server is writing it, so there's a race
> condition.

Hmm ... the server tries to replace current_logfiles atomically
with rename(), so this says that rename isn't atomic on Windows,
which we knew already.  Previous discussion (cf. commit d611175e5)
implies that an even worse failure condition is possible: the server
might fail to rename current_logfiles.tmp into place, just because
somebody is trying to read current_logfiles.  Ugh.

I found a thread about trying to make a really bulletproof rename()
for Windows:

https://www.postgresql.org/message-id/flat/CAPpHfds7dyuGZt%2BPF2GL9qSSVV0OZnjNwqiCPjN7mirDw882tA%40mail.gmail.com

but it looks like we gave up in disgust.

regards, tom lane




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Tom Lane
Stephen Frost  writes:
> * Joe Nelson (j...@begriffs.com) wrote:
>> If it's being put behind a macro then *stylistically* it shouldn't
>> matter whether {} or {0} is chosen, right? In which case {0} would
>> be a better choice because it's supported everywhere.

> The problem with {0} in the first place is that it doesn't actually work
> in all cases...  Simple cases, yes, but not more complex ones.  It's
> unfortunate that there isn't a general solution here that works across
> platforms (even if it involved macros..), but that seems to be the case.

There is a general solution that works across platforms; it's called
memset() and it's what we're using today.  I'm beginning to think that
we should just reject this patch.  It's certainly not enough of an
improvement to justify the amount of discussion that's gone into it.

regards, tom lane




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-21 Thread Andres Freund
On 2019-10-21 15:04:36 -0400, Tom Lane wrote:
> There is a general solution that works across platforms; it's called
> memset() and it's what we're using today.  I'm beginning to think that
> we should just reject this patch.  It's certainly not enough of an
> improvement to justify the amount of discussion that's gone into it.

bikeshedding vs reality of programming & efficiency: 1 : 0.




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Tomas Vondra

On Mon, Oct 21, 2019 at 08:06:46AM -0700, Adrian Klaver wrote:

On 10/20/19 11:07 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:




True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).



Can you explain the above to me as I thought there are exception 
blocks in stored functions and now sub-transactions in stored 
procedures.




Sorry for the confusion - I've not been particularly careful when
writing that response.

Let me illustrate the issue with this example:

   CREATE TABLE t (a int);

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 -- SAVEPOINT s1;
 INSERT INTO t VALUES (1);
 -- COMMIT;
   EXCEPTION
 WHEN others THEN
   msg := SUBSTR(SQLERRM, 1, 100);
   RAISE NOTICE 'error: %', msg;
   END; $$;

   CALL test();

If you uncomment the SAVEPOINT, you get

   NOTICE:  error: unsupported transaction command in PL/pgSQL

because savepoints are not allowed in stored procedures. Fine.

If you uncomment the COMMIT, you get

   NOTICE:  error: cannot commit while a subtransaction is active

which happens because the EXCEPTION block creates a subtransaction, and
we can't commit when it's active.

But we can commit outside the exception block:

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 BEGIN
   INSERT INTO t VALUES (1);
 EXCEPTION
   WHEN others THEN
 msg := SUBSTR(SQLERRM, 1, 100);
 RAISE NOTICE 'error: %', msg;
  END;
  COMMIT;
   END; $$;


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Adrian Klaver

On 10/21/19 12:50 PM, Tomas Vondra wrote:

On Mon, Oct 21, 2019 at 08:06:46AM -0700, Adrian Klaver wrote:

On 10/20/19 11:07 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:




True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).



Can you explain the above to me as I thought there are exception 
blocks in stored functions and now sub-transactions in stored procedures.




Sorry for the confusion - I've not been particularly careful when
writing that response.

Let me illustrate the issue with this example:

    CREATE TABLE t (a int);

    CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
    DECLARE
   msg TEXT;
    BEGIN
  -- SAVEPOINT s1;
  INSERT INTO t VALUES (1);
  -- COMMIT;
    EXCEPTION
  WHEN others THEN
    msg := SUBSTR(SQLERRM, 1, 100);
    RAISE NOTICE 'error: %', msg;
    END; $$;

    CALL test();

If you uncomment the SAVEPOINT, you get

    NOTICE:  error: unsupported transaction command in PL/pgSQL

because savepoints are not allowed in stored procedures. Fine.

If you uncomment the COMMIT, you get

    NOTICE:  error: cannot commit while a subtransaction is active

which happens because the EXCEPTION block creates a subtransaction, and
we can't commit when it's active.

But we can commit outside the exception block:

    CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
    DECLARE
   msg TEXT;
    BEGIN
  BEGIN
    INSERT INTO t VALUES (1);
  EXCEPTION
    WHEN others THEN
  msg := SUBSTR(SQLERRM, 1, 100);
  RAISE NOTICE 'error: %', msg;
   END;
   COMMIT;
    END; $$;


You can do something like the below though:

CREATE TABLE t (a int PRIMARY KEY);

CREATE OR REPLACE PROCEDURE public.test()
 LANGUAGE plpgsql
AS $procedure$
   DECLARE
  msg TEXT;
   BEGIN
 BEGIN
   INSERT INTO t VALUES (1);
 EXCEPTION
   WHEN others THEN
 msg := SUBSTR(SQLERRM, 1, 100);
 RAISE NOTICE 'error: %', msg;
 UPDATE t set a = 2;
  END;
  COMMIT;
   END; $procedure$

test_(postgres)# CALL test();
CALL
test_(postgres)# select * from t;
 a
---
 1
(1 row)

test_(postgres)# CALL test();
NOTICE:  error: duplicate key value violates unique constraint "t_pkey"
CALL
test_(postgres)# select * from t;
 a
---
 2
(1 row)





regards




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread raf
Steven Pousty wrote:

> On Sun, Oct 20, 2019 at 4:31 PM raf  wrote:
> 
> > Steven Pousty wrote:
> >
> > > I would think though that raising an exception is better than a
> > > default behavior which deletes data.
> >
> > I can't help but feel the need to make the point that
> > the function is not deleting anything. It is just
> > returning null. The deletion of data is being performed
> > by an update statement that uses the function's return
> > value to set a column value.
> >
> > I don't agree that raising an exception in the function
> > is a good idea (perhaps unless it's valid to assume
> > that this function will only ever be used in such a
> > context). Making the column not null (as already
> > suggested) and having the update statement itself raise
> > the exception seems more appropriate if an exception is
> > desirable. But that presumes an accurate understanding
> > of the behaviour of jsonb_set.
> >
> > Really, I think the best fix would be in the
> > documentation so that everyone who finds the function
> > in the documentation understands its behaviour
> > immediately.
> >
> Hey Raf
> 
> In a perfect world I would agree with you. But often users do not read ALL
> the documentation before they use the function in their code OR they are
> not sure that the condition applies to them (until it does).

I'm well aware of that, hence the statement that this
information needs to appear at the place in the
documentation where the user is first going to
encounter the function (i.e. in the table where its
examples are). Even putting it in a note box further
down the page might not be enough (but hopefully it
will be).

cheers,
raf





logical replication empty transactions

2019-10-21 Thread Jeff Janes
After setting up logical replication of a slowly changing table using the
built-in pub/sub facility, I noticed way more network traffic than made
sense.  Looking into it, I see that every transaction in that database on the
master gets sent to the replica.  99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables.  Is doing it this way necessary for some reason?  Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C' drop both of them?
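
(To make the idea concrete, the deferral could live in the output plugin
callbacks, roughly as sketched below; the struct, flag name and elided
message-writing steps are invented for illustration, this is not pgoutput's
actual code.)

    #include "postgres.h"
    #include "replication/logical.h"
    #include "replication/reorderbuffer.h"
    #include "utils/relcache.h"

    /* hypothetical plugin-private state, allocated in the startup callback */
    typedef struct DeferBeginState
    {
        bool        sent_begin;     /* has 'B' been written for this xact? */
    } DeferBeginState;

    static void
    sketch_begin_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
    {
        DeferBeginState *state = (DeferBeginState *) ctx->output_plugin_private;

        state->sent_begin = false;      /* defer the 'B' message */
    }

    static void
    sketch_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                     Relation relation, ReorderBufferChange *change)
    {
        DeferBeginState *state = (DeferBeginState *) ctx->output_plugin_private;

        if (!state->sent_begin)
        {
            /* first published change: emit the deferred 'B' now */
            OutputPluginPrepareWrite(ctx, false);
            /* ... serialize the BEGIN message into ctx->out ... */
            OutputPluginWrite(ctx, false);
            state->sent_begin = true;
        }
        /* ... serialize the change itself, as today ... */
    }

    static void
    sketch_commit_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                     XLogRecPtr commit_lsn)
    {
        DeferBeginState *state = (DeferBeginState *) ctx->output_plugin_private;

        if (!state->sent_begin)
            return;                     /* empty xact: both 'B' and 'C' dropped */

        OutputPluginPrepareWrite(ctx, true);
        /* ... serialize the COMMIT message ... */
        OutputPluginWrite(ctx, true);
    }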

There is a comment for WalSndPrepareWrite which seems to foreshadow such a
thing, but I don't really see how to use it in this case.  I want to drop
two messages, not one.

 * Don't do anything lasting in here, it's quite possible that nothing will
be done
 * with the data.

This applies to all versions which have support for pub/sub, including the
recent commits to 13dev.

I've searched through the voluminous mailing list threads for when this
feature was being presented to see if it was already discussed, but since
every word I can think to search on occurs in virtually every message in
the threads in some context or another, I didn't have much luck.

Cheers,

Jeff


Re: Re: SQL/JSON: functions

2019-10-21 Thread Andrew Alsup

On 10/21/19 12:44 PM, Nikita Glukhov wrote:


v39 patch is based on 5ee96b3e2221d154ffcb719bd2dee1179c53f821

Use the following git command to apply patches:

git am ~/Downloads/0001-Jsonpath-support-for-json-v39.patch



Thank you. The patch applied fine, with no errors.

Is this the type of testing that would be helpful? I plan to construct a 
number of separate queries to test more nuanced edge cases. This test 
simply compares the output of 4 separate sub-queries that should provide 
the same results.


SELECT
  CASE WHEN jop = jp_expr AND jop = jval AND jop = jval_path
    THEN 'pass'
    ELSE 'fail'
  END
FROM
  (
    -- jsonb operator
    SELECT count(*)
    FROM testjsonb
    WHERE j->>'abstract' LIKE 'A%'
  ) as jop,
  (
    -- jsonpath expression
    SELECT count(*)
    FROM testjsonb
    WHERE j @? '$.abstract ? (@ starts with "A")'
  ) as jp_expr,
  (
    -- json_value()
    SELECT count(*)
    FROM testjsonb
    WHERE JSON_VALUE(j, 'lax $.abstract') LIKE 'A%'
  ) as jval,
  (
    -- json_value(jsonpath)
    SELECT count(*)
    FROM testjsonb
    WHERE JSON_VALUE(j, 'lax $.abstract ? (@ starts with "A")') IS NOT NULL
  ) as jval_path;


If I'm completely off base for how testing is normally conducted, please 
let me know.


Thanks,
Andrew Alsup




Re: dropdb --force

2019-10-21 Thread Amit Kapila
On Mon, Oct 21, 2019 at 2:15 PM Pavel Stehule  wrote:
>
> po 21. 10. 2019 v 10:25 odesílatel Amit Kapila  
> napsal:
>>
>>
>> Sorry, but I am not able to understand the reason.  Are you worried
>> about the comments atop CountOtherDBBackends which says it is used in
>> Drop Database and related commands?
>
>
> no, just now the code in dropdb looks like
>
> if (force)
> TerminateOtherDBBackends(...);
>
> CountOtherDBBackends(...);
>
> if I call CountOtherDBBackends from TerminateOtherDBBackends, then code will 
> look like
>
> if (force)
>   TerminateOtherDBBackends(...);
> else
>   CountOtherDBBackends(...);
>
> That makes it look as though CountOtherDBBackends is not called when the force
> clause is active, which is not true.
>

Hmm, can't we pass force as a parameter to TerminateOtherDBBackends()
and then take the decision inside that function?  That will make the
code of dropdb function a bit simpler.

> So I prefer current relations between routines.
>
>
>
>>
>> > But I can (and I have not any problem with it) remove or significantly 
>> > decrease sleeping time in TerminateOtherDBBackends.
>> >
>> > 100 ms is maybe very much - but zero is maybe too low. If there will not 
>> > be any time between TerminateOtherDBBackends and CountOtherDBBackends, 
>> > then probably CountOtherDBBackends hit waiting 100ms.
>> >
>> > What about only 5 ms sleeping in TerminateOtherDBBackends?
>> >
>>
>> I am not completely sure about what is the most appropriate thing to
>> do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
>> there is nothing broken with the logic.  Anyone else wants to weigh in
>> here?
>
>
> ok. But when I remove it, would it not be better to set the wait in
> CountOtherDBBackends to some number smaller than 100ms?
>

CountOtherDBBackends is called from other places as well, so I don't
think it is advisable to change the sleep time in that function.
Also, I don't want to add a parameter for it.  I think you have a
point that in some cases we might end up sleeping for 100ms when we
could do with less sleeping time, but I think it is true to some
extent today as well.  I think we can anyway change it in the future
if there is a problem with the sleep timing, but for now, I think we
can just call CountOtherDBBackends after sending SIGTERM and call it
good.  You might want to add a note in the code about improving this in the future.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Amit Kapila
On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
>
> I have prepared a first version of the patch.  Currently, I am
> performing an empty page deletion for all the cases.
>

Few comments:
--
1.
-/*
- * State kept across vacuum stages.
- */
 typedef struct
 {
- IndexBulkDeleteResult stats; /* must be first */
+ IndexBulkDeleteResult *stats; /* kept across vacuum stages. */

  /*
- * These are used to memorize all internal and empty leaf pages in the 1st
- * vacuum stage.  They are used in the 2nd stage, to delete all the empty
- * pages.
+ * These are used to memorize all internal and empty leaf pages. They are
+ * used for deleting all the empty pages.
  */
  IntegerSet *internal_page_set;
  IntegerSet *empty_leaf_set;

Now, if we don't want to share the remaining stats across
gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
information of internal and empty leaf pages as part of GistVacState?
Also, I think it is better to call gistvacuum_delete_empty_pages from
function gistvacuumscan as that will avoid it calling from multiple
places.

2.
- gist_stats->page_set_context = NULL;
- gist_stats->internal_page_set = NULL;
- gist_stats->empty_leaf_set = NULL;

Why have you removed this initialization?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-21 Thread Dilip Kumar
On Fri, Oct 18, 2019 at 5:32 PM Amit Kapila  wrote:
>
> On Mon, Oct 14, 2019 at 3:09 PM Dilip Kumar  wrote:
> >
> > On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra
> >  wrote:
> > >
> > >
> > > Sure, I wasn't really proposing to adding all stats from that patch,
> > > including those related to streaming.  We need to extract just those
> > > related to spilling. And yes, it needs to be moved right after 0001.
> > >
> > I have extracted the spilling related code to a separate patch on top
> > of 0001.  I have also fixed some bugs and review comments and attached
> > as a separate patch.  Later I can merge it to the main patch if you
> > agree with the changes.
> >
>
> Few comments
> -
> 0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer
> 1.
> + {
> + {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM,
> + gettext_noop("Sets the maximum memory to be used for logical decoding."),
> + gettext_noop("This much memory can be used by each internal "
> + "reorder buffer before spilling to disk or streaming."),
> + GUC_UNIT_KB
> + },
>
> I think we can remove 'or streaming' from above sentence for now.  We
> can add it later with later patch where streaming will be allowed.
Done
>
> 2.
> @@ -206,6 +206,18 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name   
>  
> 
> +
> +   
> +work_mem (integer)
> +
> + 
> +  Limits the amount of memory used to decode changes on the
> +  publisher.  If not specified, the publisher will use the default
> +  specified by logical_decoding_work_mem. When
> +  needed, additional data are spilled to disk.
> + 
> +
> +   
>
> It is not clear why we need this parameter at least with this patch?
> I have raised this multiple times [1][2].

I have moved it out as a separate patch (0003), so that if we need it
for the streaming of transactions we can keep it.
>
> bugs_and_review_comments_fix
> 1.
> },
>   &logical_decoding_work_mem,
> - -1, -1, MAX_KILOBYTES,
> - check_logical_decoding_work_mem, NULL, NULL
> + 65536, 64, MAX_KILOBYTES,
> + NULL, NULL, NULL
>
> I think the default value should be 1MB similar to
> maintenance_work_mem.  The same was true before this change.
The default value for maintenance_work_mem is also 64MB. Did you mean the min value?
>
> 2. -#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use
> maintenance_work_mem
> +i#logical_decoding_work_mem = 64MB # min 64kB
>
> It seems the 'i' is a leftover character in the above change.  Also,
> change the default value considering the previous point.
oops, fixed.
>
> 3.
> @@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
>
>   /* update the statistics */
>   rb->spillCount += 1;
> - rb->spillTxns += txn->serialized ? 1 : 0;
> + rb->spillTxns += txn->serialized ? 0 : 1;
>   rb->spillBytes += size;
>
> Why is this change required?  Shouldn't we increase the spillTxns
> count only when the txn is serialized?
Already agreed to this in a previous mail, so I have added comments.
>
> 0002-Track-statistics-for-spilling
> 1.
> +
> + spill_txns
> + integer
> + Number of transactions spilled to disk after the memory used by
> +  logical decoding exceeds logical_work_mem. The
> +  counter gets incremented both for toplevel transactions and
> +  subtransactions.
> +  
> +
>
> The parameter name is wrong here. /logical_work_mem/logical_decoding_work_mem
done
>
> 2.
> +
> + spill_txns
> + integer
> + Number of transactions spilled to disk after the memory used by
> +  logical decoding exceeds logical_work_mem. The
> +  counter gets incremented both for toplevel transactions and
> +  subtransactions.
> +  
> +
> +
> + spill_count
> + integer
> + Number of times transactions were spilled to disk. Transactions
> +  may get spilled repeatedly, and this counter gets incremented on every
> +  such invocation.
> +  
> +
> +
> + spill_bytes
> + integer
> + Amount of decoded transaction data spilled to disk.
> +  
> +
>
> In all the above cases, the explanation text starts immediately after
>  tag, but the general coding practice is to start from the next
> line, see the explanation of nearby parameters.
It seems it's mixed, for example, you can see
   Timeline number of last write-ahead log location received and
  flushed to disk, the initial value of this field being the timeline
  number of the first log location used when WAL receiver is started
 

or
Timeline number of last write-ahead log location received and
  flushed to disk, the initial value of this field being the timeline
  number of the first log location used when WAL receiver is started
 

>
> It seems these parameters are added in pg-stat-wal-receiver-view in
> the docs, but in the code they are present as part of pg_stat_replication.
> It seems doc needs 

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-21 Thread Amit Kapila
On Thu, Oct 3, 2019 at 1:18 PM Dilip Kumar  wrote:
>
> I have attempted to test the performance of (Stream + Spill) vs
> (Stream + BGW pool) and I can see the similar gain what Alexey had
> shown[1].
>
> In addition to this, I have rebased the latest patchset [2] without
> the two-phase logical decoding patch set.
>
> Test results:
> I have repeated the same test as Alexey [1] for 1kk and 3kk data and
> here is my result
> Stream + Spill
> N   time on master(sec)   Total xact time (sec)
> 1kk   6   21
> 3kk 18   55
>
> Stream + BGW pool
> N  time on master(sec)  Total xact time (sec)
> 1kk  6  13
> 3kk19  35
>

I think the test results for the master are missing.  Also, how about
running these tests over a network (meaning master and subscriber are
not on the same machine)?  In general, yours and Alexey's test results
show that there is merit in having workers apply such transactions.
OTOH, as noted above [1], we are also worried about the performance
of Rollbacks if we follow that approach.  I am not sure how much we
need to worry about Rollbacks if commits are faster, but can we think
of recording the changes in memory and only writing them to a file if the
changes are above a certain threshold?  I think that might help save
I/O in many cases.  I am not very sure how much additional workers can
help if we do that, but they might still help.  I think we need to do
some tests and experiments to figure out what the best approach is.
What do you think?

Tomas, Alexey, do you have any thoughts on this matter?  I think it is
important that we figure out the way to proceed in this patch.

[1] - 
https://www.postgresql.org/message-id/b25ce80e-f536-78c8-d5c8-a5df3e230785%40postgrespro.ru

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Tue, Oct 22, 2019 at 9:10 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
> >
> > I have prepared a first version of the patch.  Currently, I am
> > performing an empty page deletion for all the cases.
> >
>
> Few comments:
> --
> 1.
> -/*
> - * State kept across vacuum stages.
> - */
>  typedef struct
>  {
> - IndexBulkDeleteResult stats; /* must be first */
> + IndexBulkDeleteResult *stats; /* kept across vacuum stages. */
>
>   /*
> - * These are used to memorize all internal and empty leaf pages in the 1st
> - * vacuum stage.  They are used in the 2nd stage, to delete all the empty
> - * pages.
> + * These are used to memorize all internal and empty leaf pages. They are
> + * used for deleting all the empty pages.
>   */
>   IntegerSet *internal_page_set;
>   IntegerSet *empty_leaf_set;
>
> Now, if we don't want to share the remaining stats across
> gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
> information of internal and empty leaf pages as part of GistVacState?

Basically, only IndexBulkDeleteResult is now shared across the stages,
so can we move all the members to GistVacState and completely get rid of
GistBulkDeleteResult?

IndexBulkDeleteResult *stats
IntegerSet *internal_page_set;
IntegerSet *empty_leaf_set;
MemoryContext page_set_context;
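
That is, roughly (a sketch only; the other existing members of GistVacState
are elided here):

    #include "postgres.h"
    #include "access/genam.h"       /* IndexBulkDeleteResult */
    #include "lib/integerset.h"     /* IntegerSet */

    typedef struct GistVacState
    {
        /* ... existing members elided ... */
        IndexBulkDeleteResult *stats;
        IntegerSet *internal_page_set;
        IntegerSet *empty_leaf_set;
        MemoryContext page_set_context;
    } GistVacState;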


> Also, I think it is better to call gistvacuum_delete_empty_pages from
> function gistvacuumscan as that will avoid it calling from multiple
> places.
Yeah we can do that.
>
> 2.
> - gist_stats->page_set_context = NULL;
> - gist_stats->internal_page_set = NULL;
> - gist_stats->empty_leaf_set = NULL;
>
> Why have you removed this initialization?
This was a post-cleanup reset: since we were returning gist_stats, it
was better to clean it up, but now we are not returning it, so I
thought we don't need to clean this.  But I think now we can free
gist_stats itself.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Amit Kapila
On Tue, Oct 22, 2019 at 10:50 AM Dilip Kumar  wrote:
>
> On Tue, Oct 22, 2019 at 9:10 AM Amit Kapila  wrote:
> >
> > On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
> > >
> > > I have prepared a first version of the patch.  Currently, I am
> > > performing an empty page deletion for all the cases.
> > >
> >
> > Few comments:
> > --
> > 1.
> > -/*
> > - * State kept across vacuum stages.
> > - */
> >  typedef struct
> >  {
> > - IndexBulkDeleteResult stats; /* must be first */
> > + IndexBulkDeleteResult *stats; /* kept across vacuum stages. */
> >
> >   /*
> > - * These are used to memorize all internal and empty leaf pages in the 1st
> > - * vacuum stage.  They are used in the 2nd stage, to delete all the empty
> > - * pages.
> > + * These are used to memorize all internal and empty leaf pages. They are
> > + * used for deleting all the empty pages.
> >   */
> >   IntegerSet *internal_page_set;
> >   IntegerSet *empty_leaf_set;
> >
> > Now, if we don't want to share the remaining stats across
> > gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
> > information of internal and empty leaf pages as part of GistVacState?
>
> Basically, only IndexBulkDeleteResult is now shared across the stages,
> so can we move all the members to GistVacState and completely get rid of
> GistBulkDeleteResult?
>

Yes, something like that would be better.  Let's try and see how it comes out.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-21 Thread Dilip Kumar
On Tue, Oct 22, 2019 at 10:46 AM Amit Kapila  wrote:
>
> On Thu, Oct 3, 2019 at 1:18 PM Dilip Kumar  wrote:
> >
> > I have attempted to test the performance of (Stream + Spill) vs
> > (Stream + BGW pool) and I can see the similar gain what Alexey had
> > shown[1].
> >
> > In addition to this, I have rebased the latest patchset [2] without
> > the two-phase logical decoding patch set.
> >
> > Test results:
> > I have repeated the same test as Alexey [1] for 1kk and 3kk data and
> > here is my result
> > Stream + Spill
> > N   time on master(sec)   Total xact time (sec)
> > 1kk   6   21
> > 3kk 18   55
> >
> > Stream + BGW pool
> > N  time on master(sec)  Total xact time (sec)
> > 1kk  6  13
> > 3kk19  35
> >
>
> I think the test results for the master are missing.
Yeah, at that time I was planning to compare spill vs. bgworker.

> Also, how about
> running these tests over a network (means master and subscriber are
> not on the same machine)?

Yeah, we should do that; it will show the merit of streaming
in-progress transactions.

> In general, yours and Alexey's test results
> show that there is merit in having workers apply such transactions.
> OTOH, as noted above [1], we are also worried about the performance
> of Rollbacks if we follow that approach.  I am not sure how much we
> need to worry about Rollbacks if commits are faster, but can we think
> of recording the changes in memory and only writing them to a file if the
> changes are above a certain threshold?  I think that might help save
> I/O in many cases.  I am not very sure how much additional workers can
> help if we do that, but they might still help.  I think we need to do
> some tests and experiments to figure out what the best approach is.
> What do you think?
I agree with the point.  I think we might need to make some small
changes and run tests to see what the best method is for handling the
streamed changes at the subscriber end.

>
> Tomas, Alexey, do you have any thoughts on this matter?  I think it is
> important that we figure out the way to proceed in this patch.
>
> [1] - 
> https://www.postgresql.org/message-id/b25ce80e-f536-78c8-d5c8-a5df3e230785%40postgrespro.ru
>


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




pgbench - refactor init functions with buffers

2019-10-21 Thread Fabien COELHO


Hello,

While developing pgbench to allow partitioned tables, I reproduced the
string management style used in the corresponding functions, but was
pretty unhappy with this kind of pattern:


snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), ...)

However adding a feature is not the place for refactoring.

This patch refactors the initialization functions to use PQExpBuffer
where appropriate, which simplifies and clarifies the code. Each SQL command
is generated by accumulating its parts into a buffer, in order, before
executing it. I also added a more generic function that executes a statement
and fails if the result is not the expected one.
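
As a toy illustration of the pattern change (not code from the patch; the
attached diff shows the real thing applied to createPartitions() and
friends):

    #include "postgres_fe.h"
    #include "pqexpbuffer.h"

    static void
    build_query_both_ways(void)
    {
        /* before: every append repeats the strlen()/sizeof() bookkeeping */
        char        buf[256];

        buf[0] = '\0';
        snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "create table t (");
        snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "a int)");

        /* after: an expandable PQExpBuffer does the bookkeeping for us */
        PQExpBufferData query;

        initPQExpBuffer(&query);
        appendPQExpBufferStr(&query, "create table t (");
        appendPQExpBufferStr(&query, "a int)");
        /* ... pass query.data to executeStatement(), then ... */
        termPQExpBuffer(&query);
    }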


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e72ad0036e..5450eba4ac 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -599,7 +599,7 @@ static void doLog(TState *thread, CState *st,
   StatsData *agg, bool skipped, double latency, double lag);
 static void processXactStats(TState *thread, CState *st, instr_time *now,
 			 bool skipped, StatsData *agg);
-static void append_fillfactor(char *opts, int len);
+static void append_fillfactor(PQExpBuffer query);
 static void addScript(ParsedScript script);
 static void *threadRun(void *arg);
 static void finishCon(CState *st);
@@ -1139,17 +1139,24 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag)
 
 /* call PQexec() and exit() on failure */
 static void
+executeStatementExpect(PGconn *con, const char *sql, const ExecStatusType expected)
+{
+	PGresult   *res;
+
+	res = PQexec(con, sql);
+	if (PQresultStatus(res) != expected)
+	{
+		fprintf(stderr, "%s", PQerrorMessage(con));
+		exit(1);
+	}
+	PQclear(res);
+}
+
+/* execute sql command, which must work, or die if not */
+static void
 executeStatement(PGconn *con, const char *sql)
 {
-	PGresult   *res;
-
-	res = PQexec(con, sql);
-	if (PQresultStatus(res) != PGRES_COMMAND_OK)
-	{
-		fprintf(stderr, "%s", PQerrorMessage(con));
-		exit(1);
-	}
-	PQclear(res);
+	executeStatementExpect(con, sql, PGRES_COMMAND_OK);
 }
 
 /* call PQexec() and complain, but without exiting, on failure */
@@ -3631,30 +3638,26 @@ initDropTables(PGconn *con)
 static void
 createPartitions(PGconn *con)
 {
-	char		ff[64];
-
-	ff[0] = '\0';
-
-	/*
-	 * Per ddlinfo in initCreateTables, fillfactor is needed on table
-	 * pgbench_accounts.
-	 */
-	append_fillfactor(ff, sizeof(ff));
+	PQExpBufferData		query;
 
 	/* we must have to create some partitions */
 	Assert(partitions > 0);
 
 	fprintf(stderr, "creating %d partitions...\n", partitions);
 
+	initPQExpBuffer(&query);
+
 	for (int p = 1; p <= partitions; p++)
 	{
-		char		query[256];
-
 		if (partition_method == PART_RANGE)
 		{
 			int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
-			char		minvalue[32],
-		maxvalue[32];
+
+			printfPQExpBuffer(&query,
+			  "create%s table pgbench_accounts_%d\n"
+			  "  partition of pgbench_accounts\n"
+			  "  for values from (",
+			  unlogged_tables ? " unlogged" : "", p);
 
 			/*
 			 * For RANGE, we use open-ended partitions at the beginning and
@@ -3663,34 +3666,39 @@ createPartitions(PGconn *con)
 			 * scale, it is more generic and the performance is better.
 			 */
 			if (p == 1)
-sprintf(minvalue, "minvalue");
+appendPQExpBufferStr(&query, "minvalue");
 			else
-sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+appendPQExpBuffer(&query, INT64_FORMAT, (p - 1) * part_size + 1);
+
+			appendPQExpBufferStr(&query, ") to (");
 
 			if (p < partitions)
-sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+appendPQExpBuffer(&query, INT64_FORMAT, p * part_size + 1);
 			else
-sprintf(maxvalue, "maxvalue");
+appendPQExpBufferStr(&query, "maxvalue");
 
-			snprintf(query, sizeof(query),
-	 "create%s table pgbench_accounts_%d\n"
-	 "  partition of pgbench_accounts\n"
-	 "  for values from (%s) to (%s)%s\n",
-	 unlogged_tables ? " unlogged" : "", p,
-	 minvalue, maxvalue, ff);
+			appendPQExpBufferChar(&query, ')');
 		}
 		else if (partition_method == PART_HASH)
-			snprintf(query, sizeof(query),
-	 "create%s table pgbench_accounts_%d\n"
-	 "  partition of pgbench_accounts\n"
-	 "  for values with (modulus %d, remainder %d)%s\n",
-	 unlogged_tables ? " unlogged" : "", p,
-	 partitions, p - 1, ff);
+			printfPQExpBuffer(&query,
+			  "create%s table pgbench_accounts_%d\n"
+			  "  partition of pgbench_accounts\n"
+			  "  for values with (modulus %d, remainder %d)",
+			  unlogged_tables ? " unlogged" : "", p,
+			  partitions, p - 1);
 		else	/* cannot get there */
 			Assert(0);
 
-		executeStatement(con, query);
+		/*
+		 * Per ddlinfo in initCreateTables, fillfactor is needed on table
+		 * pgbench_accounts.
+		 */
+		append_fillfactor(&query);
+
+		executeStatement(con, query.data);
 	}
+
+	termPQExpBuffer(&query);
 }
 
 /*
@@ -3743,27 +3751,30 @@ initCreateTables(PGconn *con)
 			1