Re: pg_promote() can cause busy loop

2019-09-05 Thread Fujii Masao
On Thu, Sep 5, 2019 at 11:10 AM Michael Paquier  wrote:
>
> On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote:
> > It's OK to use PG_RETURN_BOOL(false) instead of breaking out of the loop
> > in that case, which would make the code simpler.
>
> Okay.  I would have done so FWIW.
>
> > But I don't think it's worth warning about postmaster death here
> > because a backend will emit a FATAL message like "terminating connection
> > due to unexpected postmaster exit" in secure_read() after
> > pg_promote() returns false.
>
> Good point, that could be equally confusing.

So, barring any objection, I will commit the attached patch.

Regards,

-- 
Fujii Masao


pg_promote_and_postmaster_death_v2.patch
Description: Binary data


Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Pavel Stehule
On Thu, Sep 5, 2019 at 8:39 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:

> Dear hackers,
>
> I found that such a statement would get 0 in PL/pgSQL.
>
> PREPARE smt_del(int) AS DELETE FROM t1;
> EXECUTE 'EXECUTE smt_del(100)';
> GET DIAGNOSTICS j = ROW_COUNT;
>
> In fact, this is a problem with SPI: it does not support getting the result
> of the EXECUTE command. I made a little enhancement that reports the
> number of rows processed when INSERT/UPDATE/DELETE statements are
> executed dynamically.
>

Is there some use case for supporting this feature?

Regards

Pavel


> Regards,
> Quan Zongliang
>


Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-09-05 Thread Paul Guo
On Tue, Sep 3, 2019 at 11:58 PM Alvaro Herrera 
wrote:

> On 2019-Aug-22, Anastasia Lubennikova wrote:
>
> > 22.08.2019 16:13, Paul Guo wrote:
> > > Thanks. I updated the patch to v5. It passes install-check testing and
> > > recovery testing.
> > Hi,
> > Thank you for working on this fix.
> > The overall design of the latest version looks good to me.
> > But during the review, I found a bug in the current implementation.
> > New behavior must apply to crash-recovery only, now it applies to
> > archiveRecovery too.
>
> Hello
>
> Paul, Kyotaro, are you working on updating this bugfix?  FWIW the latest
> patch submitted by Paul is still current and CFbot says it passes its
> own test, but from Anastasia's email it still needs a bit of work.
>
> Also: it would be good to have this new bogus scenario described by
> Anastasia covered by a new TAP test.  Anastasia, can we enlist you to
> write that?  Maybe Kyotaro?
>
>
Thanks Anastasia and Alvaro for the comments and suggestions. Sorry, I've
been busy working on some non-PG stuff recently. I've never worked on
archive recovery, so I expect to need a bit more time once I'm free
(hopefully in several days) to take a look.
Of course, Kyotaro, Anastasia, or anyone else should feel free to address
the concern before then.


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-05 Thread Etsuro Fujita
Hi,

On Thu, Sep 5, 2019 at 1:24 PM amul sul  wrote:
> On Wed, Sep 4, 2019 at 2:40 AM Alvaro Herrera  
> wrote:
>> CFbot complains that Fujita-san submitted a patch that doesn't apply,
>> which makes sense since the necessary previous patch was only referred
>> to without being resubmitted.  I suggest to always post all patches
>> together with each resubmission so that it can be checked automatically
>> by the cf bot: http://commitfest.cputube.org/etsuro-fujita.html

> Understood and sorry about that.

Sorry about that.

>> I'm not clear on who the author of this patch is, now.  Also, I'm not
>> sure what the *status* is.  Are we waiting for Fujita-san to review this
>> patch?

> Yes, we are waiting for Fujita-san to review.  Fujita-san has started a review
> and proposed some enhancement which I reviewed in the last update.
>
> I think soon Fujita-san might post the complete patch including his changes.

I've been a bit busy with something else recently, but I'll do that ASAP.
And I'll continue to review the other part of the patch.

Best regards,
Etsuro Fujita




Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-05 Thread Michael Paquier
On Thu, Sep 05, 2019 at 08:59:41AM +0900, Michael Paquier wrote:
> There is an argument to allow libpq to find out a service file for
> a connection from the executable path.  Note that oid2name can use a
> connection string for connection, but not vacuumlo, so I somewhat
> missed that.

If you think that's not worth considering for now, I am fine to drop
that part.  What do you think about the updated patch attached then?
--
Michael
diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c
index df4f9c062f..fa1e7959e7 100644
--- a/contrib/oid2name/oid2name.c
+++ b/contrib/oid2name/oid2name.c
@@ -11,6 +11,7 @@
 
 #include "catalog/pg_class_d.h"
 
+#include "common/logging.h"
 #include "fe_utils/connect.h"
 #include "libpq-fe.h"
 #include "pg_getopt.h"
@@ -85,6 +86,7 @@ get_opts(int argc, char **argv, struct options *my_opts)
 	const char *progname;
 	int			optindex;
 
+	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 
 	/* set the defaults */
@@ -328,8 +330,8 @@ sql_conn(struct options *my_opts)
 
 		if (!conn)
 		{
-			fprintf(stderr, "%s: could not connect to database %s\n",
-	"oid2name", my_opts->dbname);
+			pg_log_error("could not connect to database %s",
+		 my_opts->dbname);
 			exit(1);
 		}
 
@@ -347,8 +349,8 @@ sql_conn(struct options *my_opts)
 	/* check to see that the backend connection was successfully made */
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
-		fprintf(stderr, "%s: could not connect to database %s: %s",
-"oid2name", my_opts->dbname, PQerrorMessage(conn));
+		pg_log_error("could not connect to database %s: %s",
+	 my_opts->dbname, PQerrorMessage(conn));
 		PQfinish(conn);
 		exit(1);
 	}
@@ -356,8 +358,8 @@ sql_conn(struct options *my_opts)
 	res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, "oid2name: could not clear search_path: %s\n",
-PQerrorMessage(conn));
+		pg_log_error("could not clear search_path: %s",
+	 PQerrorMessage(conn));
 		PQclear(res);
 		PQfinish(conn);
 		exit(-1);
@@ -390,8 +392,8 @@ sql_exec(PGconn *conn, const char *todo, bool quiet)
 	/* check and deal with errors */
 	if (!res || PQresultStatus(res) > 2)
 	{
-		fprintf(stderr, "oid2name: query failed: %s\n", PQerrorMessage(conn));
-		fprintf(stderr, "oid2name: query was: %s\n", todo);
+		pg_log_error("query failed: %s", PQerrorMessage(conn));
+		pg_log_error("query was: %s", todo);
 
 		PQclear(res);
 		PQfinish(conn);
diff --git a/contrib/vacuumlo/vacuumlo.c b/contrib/vacuumlo/vacuumlo.c
index 73c06a043e..533e2ce33c 100644
--- a/contrib/vacuumlo/vacuumlo.c
+++ b/contrib/vacuumlo/vacuumlo.c
@@ -23,6 +23,7 @@
 
 #include "catalog/pg_class_d.h"
 
+#include "common/logging.h"
 #include "fe_utils/connect.h"
 #include "libpq-fe.h"
 #include "pg_getopt.h"
@@ -109,8 +110,7 @@ vacuumlo(const char *database, const struct _param *param)
 		conn = PQconnectdbParams(keywords, values, true);
 		if (!conn)
 		{
-			fprintf(stderr, "Connection to database \"%s\" failed\n",
-	database);
+			pg_log_error("connection to database \"%s\" failed", database);
 			return -1;
 		}
 
@@ -129,8 +129,8 @@ vacuumlo(const char *database, const struct _param *param)
 	/* check to see that the backend connection was successfully made */
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
-		fprintf(stderr, "Connection to database \"%s\" failed:\n%s",
-database, PQerrorMessage(conn));
+		pg_log_error("connection to database \"%s\" failed: %s",
+	 database, PQerrorMessage(conn));
 		PQfinish(conn);
 		return -1;
 	}
@@ -145,8 +145,7 @@ vacuumlo(const char *database, const struct _param *param)
 	res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, "Failed to set search_path:\n");
-		fprintf(stderr, "%s", PQerrorMessage(conn));
+		pg_log_error("failed to set search_path: %s", PQerrorMessage(conn));
 		PQclear(res);
 		PQfinish(conn);
 		return -1;
@@ -165,8 +164,7 @@ vacuumlo(const char *database, const struct _param *param)
 	res = PQexec(conn, buf);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, "Failed to create temp table:\n");
-		fprintf(stderr, "%s", PQerrorMessage(conn));
+		pg_log_error("failed to create temp table: %s", PQerrorMessage(conn));
 		PQclear(res);
 		PQfinish(conn);
 		return -1;
@@ -182,8 +180,7 @@ vacuumlo(const char *database, const struct _param *param)
 	res = PQexec(conn, buf);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, "Failed to vacuum temp table:\n");
-		fprintf(stderr, "%s", PQerrorMessage(conn));
+		pg_log_error("failed to vacuum temp table: %s", PQerrorMessage(conn));
 		PQclear(res);
 		PQfinish(conn);
 		return -1;
@@ -212,8 +209,7 @@ vacuumlo(const char *database, const struct _param *param)
 	res = PQexec(conn, buf);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, "Failed to find OID columns:\n");
-		fprintf(stderr, "%s", PQerrorMessage(conn));

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-05 Thread Paul Guo
> It seems there's minor breakage in the build, per CFbot.  Can you
> please rebase this?

There is a code conflict. See attached for the new version. Thanks.


v5-0001-Extact-common-functions-from-pg_basebackup-into-s.patch
Description: Binary data


v5-0002-Add-option-to-write-recovery-configuration-inform.patch
Description: Binary data


v5-0003-Ensure-target-clean-shutdown-at-the-beginning-of-.patch
Description: Binary data


Re: pg_promote() can cause busy loop

2019-09-05 Thread Michael Paquier
On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote:
> So, barring any objection, I will commit the attached patch.

LGTM.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Quan Zongliang

On 2019/9/5 15:09, Pavel Stehule wrote:
> On Thu, Sep 5, 2019 at 8:39 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>> Dear hackers,
>>
>> I found that such a statement would get 0 in PL/pgSQL.
>>
>> PREPARE smt_del(int) AS DELETE FROM t1;
>> EXECUTE 'EXECUTE smt_del(100)';
>> GET DIAGNOSTICS j = ROW_COUNT;
>>
>> In fact, this is a problem with SPI: it does not support getting the result
>> of the EXECUTE command. I made a little enhancement that reports the
>> number of rows processed when INSERT/UPDATE/DELETE statements are
>> executed dynamically.
>
> Is there some use case for supporting this feature?

A user deleted data in PL/pgSQL using the above method, hoping to do
more processing according to the number of rows affected, and found that
it always returned 0.


Sample code:
PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
EXECUTE 'EXECUTE smt_del(100)';
GET DIAGNOSTICS j = ROW_COUNT;

IF j=1 THEN
  do something
ELSIF j=0 THEN
  do something

Here j is always equal to 0.
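
For reference, ROW_COUNT is populated when the DML statement itself is
executed dynamically, so a sketch of the usual workaround (assuming the same
t1/c schema as above) looks like this:

DO $$
DECLARE
    j bigint;
BEGIN
    EXECUTE 'DELETE FROM t1 WHERE c = $1' USING 100;
    GET DIAGNOSTICS j = ROW_COUNT;
    RAISE NOTICE 'deleted % rows', j;
END
$$;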

Regards

> Regards
>
> Pavel
>
>> Regards,
>> Quan Zongliang







Re: unexpected rowlock mode when trigger is on the table

2019-09-05 Thread Tomáš Záluský
Thanks for response.

> I think there should be no overlap (PK is column "id", not modified)

The update command sets the detail_id column which has unique constraint.
If I read the documentation correctly (13.3.2. Row-level Locks), unique columns
also count among the columns whose presence in an UPDATE statement causes the
FOR UPDATE lock to be chosen.

What is unclear to me is why FOR NO KEY UPDATE is chosen when there is no trigger.
Perhaps the execution path to ExecUpdateLockMode is somehow different?
And if FOR NO KEY UPDATE is correct, how can it also be achieved with a trigger?

Tomáš


__
> From: "Alvaro Herrera"
> To: "Tomáš Záluský"
> Date: 05.09.2019 00:52
> Subject: Re: unexpected rowlock mode when trigger is on the table
>
> CC:
>On 2019-Sep-03, Tomáš Záluský wrote:
>
>> postgres=# begin;
>> BEGIN
>> postgres=# update master set detail_id=null, name='y' where id=1000;
>> UPDATE 1
>> 
>> In another psql console, I run:
>> 
>> postgres=# select * from pgrowlocks('master');
>>  locked_row | locker | multi | xids  |  modes   | pids
>> ++---+---+--+---
>>  (0,3)  |564 | f | {564} | {Update} | {138}
>> (1 row)
>
>Hmm, so I'm guessing that this tuple lock comes from GetTupleForTrigger
>called from ExecBRUpdateTriggers, which uses ExecUpdateLockMode() in
>order to figure out the lockmode to use, depending on whether the
>modified columns by the update overlap columns indexed by any unique
>index.  I think there should be no overlap (PK is column "id", not modified)
>but I may be missing something.
>
>(gdb) bt
>#0  heap_lock_tuple (relation=relation@entry=0x7eff2157b4d8, 
>tuple=tuple@entry=0x7ffe570db3e0, cid=0, 
>mode=mode@entry=LockTupleExclusive, 
>wait_policy=wait_policy@entry=LockWaitBlock, 
>follow_updates=follow_updates@entry=0 '\000', buffer=0x7ffe570db3cc, 
>hufd=0x7ffe570db3d0)
>at /pgsql/source/REL9_6_STABLE/src/backend/access/heap/heapam.c:4577
>#1  0x5648b1d52f15 in GetTupleForTrigger (
>estate=estate@entry=0x5648b3894110, 
>epqstate=epqstate@entry=0x5648b3894750, tid=tid@entry=0x7ffe570db674, 
>lockmode=LockTupleExclusive, newSlot=0x7ffe570db498, 
>relinfo=, relinfo=)
>at /pgsql/source/REL9_6_STABLE/src/backend/commands/trigger.c:2709
>#2  0x5648b1d579a0 in ExecBRUpdateTriggers (
>estate=estate@entry=0x5648b3894110, 
>epqstate=epqstate@entry=0x5648b3894750, 
>relinfo=relinfo@entry=0x5648b3894260, 
>tupleid=tupleid@entry=0x7ffe570db674, 
>fdw_trigtuple=fdw_trigtuple@entry=0x0, slot=slot@entry=0x5648b3896670)
>at /pgsql/source/REL9_6_STABLE/src/backend/commands/trigger.c:2432
>#3  0x5648b1d8ddc2 in ExecUpdate (tupleid=tupleid@entry=0x7ffe570db674, 
>oldtuple=oldtuple@entry=0x0, slot=slot@entry=0x5648b3896670, 
>planSlot=planSlot@entry=0x5648b3895998, 
>epqstate=epqstate@entry=0x5648b3894750, 
>estate=estate@entry=0x5648b3894110, canSetTag=1 '\001')
>at /pgsql/source/REL9_6_STABLE/src/backend/executor/nodeModifyTable.c:850
>
>Maybe we're passing an argument wrong somewhere.  Unclear ...
>
>-- 
>Álvaro Herrera                https://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
>




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-05 Thread Masahiko Sawada
On Thu, Sep 5, 2019 at 10:41 AM Michael Paquier  wrote:
>
> On Wed, Sep 04, 2019 at 04:50:45PM -0400, Alvaro Herrera wrote:
> > According to CFbot, the Windows build fails with this patch.  Please
> > fix.
>
> To save a couple of clicks:
> "C:\projects\postgresql\pageinspect.vcxproj" (default target) (56) ->
> (Link target) ->
>   heapfuncs.obj : error LNK2001: unresolved external symbol
>   pg_popcount32 [C:\projects\postgresql\pageinspect.vcxproj]
> .\Release\pageinspect\pageinspect.dll : fatal error LNK1120: 1
> unresolved externals [C:\projects\postgresql\pageinspect.vcxproj]
>
> I think that it would be more simple to just use pg_popcount().
> That's what other contrib modules do (for example ltree or intarray).

Thanks. I hope the attached new patch fixes this issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v4-0001-Introduce-heap_infomask_flags-to-decode-infomask-.patch
Description: Binary data


RE: [PATCH] Speedup truncates of relation forks

2019-09-05 Thread Jamison, Kirk
On Tuesday, September 3, 2019 9:44 PM (GMT+9), Fujii Masao wrote:
> Thanks for the patch!

Thank you as well for the review!

> -smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
> +smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks,
> bool isRedo)
> 
> smgrdounlinkfork() is dead code. Per the discussion [1], this unused function
> was left intentionally. But it's still dead code since 2012, so I'd like to
> remove it. Or, even if we decide to keep that function for some reasons, I
> don't think that we need to update that so that it can unlink multiple forks
> at once. So, what about keeping
> smgrdounlinkfork() as it is?
> 
> [1]
> https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us 

I also asked in my first post whether we can just remove this dead code.
If not, the function would need to be modified, because it would also
need nforks as an input argument when calling DropRelFileNodeBuffers. I kept my
changes in the latest patch.
So should I remove the function now, or keep my changes?


> + for (int i = 0; i < nforks; i++)
> 
> The variable "i" should not be declared in for loop per PostgreSQL coding
> style.

Fixed.
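
For reference, a sketch of the requested style, with the loop variable
declared alongside the block's other locals rather than in the for statement
(the loop body is just a placeholder):

	int		i;

	for (i = 0; i < nforks; i++)
	{
		/* per-fork truncation work */
	}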


> + /* Check with the lower bound block number and skip the loop */ if
> + (bufHdr->tag.blockNum < minBlock) continue; /* skip checking the
> + buffer pool scan */
> 
> Because of the above code, the following source comment in bufmgr.c should
> be updated.
> 
> * We could check forkNum and blockNum as well as the rnode, but the
> * incremental win from doing so seems small.
> 
> And, first of all, is this check really useful for performance?
> Since firstDelBlock for FSM fork is usually small, minBlock would also be
> small. So I'm not sure how much this is helpful for performance.

This was a suggestion from Sawada-san in the previous email,
but he also thought that the performance benefit might be small,
so I just removed the related code block in this patch.


> When relation is completely truncated at all (i.e., the number of block to
> delete first is zero), can RelationTruncate() and smgr_redo()  just call
> smgrdounlinkall() like smgrDoPendingDeletes() does, instead of calling
> MarkFreeSpaceMapTruncateRel(), visibilitymap_truncate_prepare() and
> smgrtruncate()? ISTM that smgrdounlinkall() is faster and simpler.

I haven't applied this in my patch yet.
If my understanding is correct, smgrdounlinkall() is used for deleting
relation forks. However, we only truncate (not delete) relations
in RelationTruncate() and smgr_redo(). I'm not sure it's correct to
use it here. Could you expand on your idea of using smgrdounlinkall()?


Regards,
Kirk Jamison


v6-0001-Speedup-truncates-of-relation-forks.patch
Description: v6-0001-Speedup-truncates-of-relation-forks.patch


Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Quan Zongliang

On 2019/9/5 16:31, Pavel Stehule wrote:
> On Thu, Sep 5, 2019 at 10:25 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>> On 2019/9/5 15:09, Pavel Stehule wrote:
>>> On Thu, Sep 5, 2019 at 8:39 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>>>> Dear hackers,
>>>>
>>>> I found that such a statement would get 0 in PL/pgSQL.
>>>>
>>>> PREPARE smt_del(int) AS DELETE FROM t1;
>>>> EXECUTE 'EXECUTE smt_del(100)';
>>>> GET DIAGNOSTICS j = ROW_COUNT;
>>>>
>>>> In fact, this is a problem with SPI: it does not support getting the result
>>>> of the EXECUTE command. I made a little enhancement that reports the
>>>> number of rows processed when INSERT/UPDATE/DELETE statements are
>>>> executed dynamically.
>>>
>>> Is there some use case for supporting this feature?
>>
>> A user deleted data in PL/pgSQL using the above method, hoping to do
>> more processing according to the number of rows affected, and found that
>> it always returned 0.
>>
>> Sample code:
>> PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
>> EXECUTE 'EXECUTE smt_del(100)';
>> GET DIAGNOSTICS j = ROW_COUNT;
>
> This makes no sense in plpgsql. Why do you use a PREPARE statement explicitly?


Yes, I told him to do it another way, and the problem has been solved.

Under psql, we can get this result:

flying=# EXECUTE smt_del(100);
DELETE 1

So I think this may be an oversight in SPI, and it would be better to
handle it there as well.




>> IF j=1 THEN
>>     do something
>> ELSIF j=0 THEN
>>     do something
>>
>> Here j is always equal to 0.
>>
>> Regards
>>
>>> Regards
>>>
>>> Pavel
>>>
>>> Regards,
>>> Quan Zongliang
 >







Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Dilip Kumar
On Thu, Sep 5, 2019 at 2:12 PM Amit Langote  wrote:

Thanks for the patch. I was almost about to press the send button on
my patch, but this looks similar to my version.
>
> On Wed, Sep 4, 2019 at 8:53 AM Tom Lane  wrote:
> >
> > Amit Langote  writes:
> > > [ v2-0001-Use-root-parent-s-permissions-when-read-child-tab.patch ]
> >
> > I took a quick look through this.  I have some cosmetic thoughts and
> > also a couple of substantive concerns:
>
> Thanks a lot for reviewing this.
>
> > * As a rule, patches that add fields at the end of a struct are wrong.
> > There is almost always some more-appropriate place to put the field
> > based on its semantics.  We don't want our struct layouts to be historical
> > annals; they should reflect a coherent design.  In this case I'd be
> > inclined to put the new field next to the regular relid field.  It
> > should have a name that's less completely unrelated to relid, too.
>
> I've renamed the field to inh_root_relid and placed it next to relid.
>
> > * It might make more sense to define the new field as "top parent's relid,
> > or self if no parent", rather than "... or zero if no parent".  Then you
> > don't need if-tests like this:
> >
> > +if (rel->inh_root_parent > 0)
> > +rte = 
> > planner_rt_fetch(rel->inh_root_parent,
> > +   root);
> > +else
> > +rte = planner_rt_fetch(rel->relid, 
> > root);
>
> Agreed, done this way.
>
> > * The business about reverse mapping Vars seems quite inefficient, but
> > what's much worse is that it only accounts for one level of parent.
> > I'm pretty certain this will give the wrong answer for multiple levels
> > of partitioning, if the column numbers aren't all the same.
>
> Indeed.  We need to be checking the root parent's permissions, not the
> immediate parent's which might be a child itself.
>
> > * To fix the above, you probably need to map back one inheritance level
> > at a time, which suggests that you could just use the AppendRelInfo
> > parent-rel info and not need any addition to RelOptInfo at all.  This
> > makes the efficiency issue even worse, though.  I don't immediately have a
> > great idea about doing better.  Making it faster might require adding more
> > info to AppendRelInfos, and I'm not quite sure if it's worth the tradeoff.
>
> Hmm, yes.  If AppendRelInfos had contained a reverse translation list
> that maps Vars of a given child to the root parent's, this patch would
> end up being much simpler and not add too much cost to the selectivity
> code.  However building such a map would not be free and the number of
> places where it's useful would be significantly smaller where the
> existing parent-to-child translation list is used.
>
> Anyway, I've fixed the above-mentioned oversights in the current code for now.
>
> > * I'd be inclined to use an actual test-and-elog not just an Assert
> > for the no-mapping-found case.  For one reason, some compilers are
> > going to complain about a set-but-not-used variable in non-assert
> > builds.  More importantly, I'm not very convinced that it's impossible
> > to hit the no-mapping case.  The original proposal was to fall back
> > to current behavior (test the child-table permissions) if we couldn't
> > match the var to the top parent, and I think that that is still a
> > sane proposal.
>
> OK, I've removed the Assert.  For child Vars that can't be translated
> to root parent's, permissions are checked with the child relation,
> like before.

Instead of falling back to the child, wouldn't it make more sense to
check the permissions on the parent up to which we could translate (it
may not be the root parent)?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Dilip Kumar
On Thu, Sep 5, 2019 at 2:48 PM Dilip Kumar  wrote:
>
> On Thu, Sep 5, 2019 at 2:12 PM Amit Langote  wrote:
>
> Thanks for the patch, I was almost about to press the send button with
> my patch.  But, this looks similar to my version.
> >
> > On Wed, Sep 4, 2019 at 8:53 AM Tom Lane  wrote:

>
> Instead of falling back to the child, isn't it make more sense to
> check the permissions on the parent upto which we could translate (it
> may not be the root parent)?
>

  /*
+ * For inheritance child relations, we also need to remember
+ * the root parent.
+ */
+ if (parent->rtekind == RTE_RELATION)
+ rel->inh_root_relid = parent->inh_root_relid > 0 ?
+ parent->inh_root_relid :
+ parent->relid;
+ else
+ /* Child relation of flattened UNION ALL subquery. */
+ rel->inh_root_relid = relid;

With the current changes, parent->inh_root_relid will always be > 0, so
the (parent->inh_root_relid > 0) condition doesn't make sense. Right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Amit Langote
Hi Dilip,

Thanks for checking.

On Thu, Sep 5, 2019 at 6:18 PM Dilip Kumar  wrote:
> On Thu, Sep 5, 2019 at 2:12 PM Amit Langote  wrote:
> Thanks for the patch, I was almost about to press the send button with
> my patch.  But, this looks similar to my version.

Good to hear that.

> > On Wed, Sep 4, 2019 at 8:53 AM Tom Lane  wrote:
> > > * I'd be inclined to use an actual test-and-elog not just an Assert
> > > for the no-mapping-found case.  For one reason, some compilers are
> > > going to complain about a set-but-not-used variable in non-assert
> > > builds.  More importantly, I'm not very convinced that it's impossible
> > > to hit the no-mapping case.  The original proposal was to fall back
> > > to current behavior (test the child-table permissions) if we couldn't
> > > match the var to the top parent, and I think that that is still a
> > > sane proposal.
> >
> > OK, I've removed the Assert.  For child Vars that can't be translated
> > to root parent's, permissions are checked with the child relation,
> > like before.
>
> Instead of falling back to the child, isn't it make more sense to
> check the permissions on the parent upto which we could translate (it
> may not be the root parent)?

Hmm, in that case, the parent up to which we might be able to
translate would still be a child and might have different permissions
than the table mentioned in the query (what's being called "root" in
this context).  Would it be worth further complicating this code if
that's the case?

Thanks,
Amit




Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Pavel Stehule
On Thu, Sep 5, 2019 at 10:57 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:

> On 2019/9/5 16:31, Pavel Stehule wrote:
> >
> >
> > čt 5. 9. 2019 v 10:25 odesílatel Quan Zongliang
> >  > > napsal:
> >
> > On 2019/9/5 15:09, Pavel Stehule wrote:
> >  >
> >  >
> >  > čt 5. 9. 2019 v 8:39 odesílatel Quan Zongliang
> >  >  > 
> >  >  > >> napsal:
> >  >
> >  > Dear hackers,
> >  >
> >  > I found that such a statement would get 0 in PL/pgSQL.
> >  >
> >  > PREPARE smt_del(int) AS DELETE FROM t1;
> >  > EXECUTE 'EXECUTE smt_del(100)';
> >  > GET DIAGNOSTICS j = ROW_COUNT;
> >  >
> >  > In fact, this is a problem with SPI, it does not support
> > getting result
> >  > of the EXECUTE command. I made a little enhancement. Support
> > for the
> >  > number of rows processed when executing INSERT/UPDATE/DELETE
> > statements
> >  > dynamically.
> >  >
> >  >
> >  > Is there some use case for support this feature?
> >  >
> > A user deletes the data in PL/pgSQL using the above method, hoping
> > to do
> > more processing according to the number of rows affected, and found
> > that
> > each time will get 0.
> >
> > Sample code:
> > PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
> > EXECUTE 'EXECUTE smt_del(100)';
> > GET DIAGNOSTICS j = ROW_COUNT;
> >
> >
> > This has not sense in plpgsql. Why you use PREPARE statement explicitly?
> >
> Yes, I told him to do it in other ways, and the problem has been solved.
>
> Under psql, we can get this result
>
> flying=# EXECUTE smt_del(100);
> DELETE 1
>
> So I think this may be the negligence of SPI, it should be better to
> deal with it.
>

Personally, I would not support features that allow bad code.

Pavel

>
> >
> > IF j=1 THEN
> > do something
> > ELSIF j=0 THEN
> > do something
> >
> > Here j is always equal to 0.
> >
> >
> >
> > Regards
> >
> >  > Regards
> >  >
> >  > Pavel
> >  >
> >  >
> >  > Regards,
> >  > Quan Zongliang
> >  >
> >
>
>


Specifying attribute slot for storing/reading statistics

2019-09-05 Thread Esteban Zimanyi
Dear all

We are developing MobilityDB, an open source PostgreSQL/PostGIS extension
that provides temporal and spatio-temporal types. The source code, manuals,
and related publications are available at the address
https://github.com/ULB-CoDE-WIT/MobilityDB/


In MobilityDB temporal types are types derived from PostgreSQL/PostGIS
types to which a time dimension is added. MobilityDB provides the following
temporal types: tbool (temporal boolean), tint (temporal int), tfloat
(temporal float), ttext (temporal text), tgeompoint (temporal geometric
points) and tgeogpoint (temporal geographic points). For example, we can
define a tfloat and a tgeompoint as follows

SELECT tfloat '[1.5@2000-01-01, 2.5@2000-01-02, 1.5@2000-01-03]';
SELECT tgeompoint '[Point(0 0)@2000-01-01 08:00, Point(1 0)@2000-01-02
08:05, Point(1 1)@2000-01-03 08:10]';

We are developing the analyze/selectivity functions for those types. Our
approach is to use the standard PostgreSQL/PostGIS functions for the value
and the time dimensions where the slots starting from 0 will be used for
the value dimension, and the slots starting from 2 will be used for the
time dimension. For example, for tfloat we use range_typanalyze and related
functions for
* collecting in slots 0 and 1, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the float ranges of the value
dimension
*  collecting in slots 2 and 3, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the periods (similar to
tstzranges) of the time dimension

However, we end up copying several PostgreSQL functions to which we only
add an additional parameter stating the slot number from which the specific
statistic kind should be found (either 0 or 2)

bool
get_attstatsslot_mobdb(AttStatsSlot *sslot, HeapTuple statstuple,
int reqkind, Oid reqop, int flags, int startslot)
{
[...]
for (i = startslot; i < STATISTIC_NUM_SLOTS; i++)
{
if ((&stats->stakind1)[i] == reqkind &&
(reqop == InvalidOid || (&stats->staop1)[i] == reqop))
break;
}
[...]
}

double
var_eq_const_mobdb(VariableStatData *vardata, Oid operator, Datum constval,
bool negate, int startslot)
{
[...]
}
Selectivity
scalarineqsel_mobdb(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
VariableStatData *vardata, Datum constval, Oid consttype,
int startslot)
{
[...]
}

static Selectivity
mcv_selectivity_mobdb(VariableStatData *vardata, FmgrInfo *opproc,
Datum constval, Oid atttype, bool varonleft,
double *sumcommonp, int startslot)
{
[...]
}
static double
ineq_histogram_selectivity_mobdb(PlannerInfo *root, VariableStatData *
vardata,
FmgrInfo *opproc, bool isgt, bool iseq, Datum constval,
Oid consttype, int startslot)
{
[...]
}

in addition to copying other functions needed by the above functions since
they are not exported (defined as static)

static bool
get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
Oid sortop, Datum *min, Datum *max)

static bool
get_actual_variable_endpoint(Relation heapRel,
Relation indexRel, ScanDirection indexscandir,
ScanKey scankeys, int16 typLen,
bool typByVal, MemoryContext outercontext,
Datum *endpointDatum)

[...]

Is there a better way to do this?

Is there any chance that the API for accessing the typanalyze and
selectivity functions will be enhanced in a future release?

Regards

Esteban

-- 

Prof. Esteban Zimanyi
Department of Computer & Decision Engineering  (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezima...@ulb.ac.be
Internet: http://code.ulb.ac.be/



Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Pavel Stehule
On Thu, Sep 5, 2019 at 10:25 AM, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:

> On 2019/9/5 15:09, Pavel Stehule wrote:
> >
> >
> > čt 5. 9. 2019 v 8:39 odesílatel Quan Zongliang
> >  > > napsal:
> >
> > Dear hackers,
> >
> > I found that such a statement would get 0 in PL/pgSQL.
> >
> > PREPARE smt_del(int) AS DELETE FROM t1;
> > EXECUTE 'EXECUTE smt_del(100)';
> > GET DIAGNOSTICS j = ROW_COUNT;
> >
> > In fact, this is a problem with SPI, it does not support getting
> result
> > of the EXECUTE command. I made a little enhancement. Support for the
> > number of rows processed when executing INSERT/UPDATE/DELETE
> statements
> > dynamically.
> >
> >
> > Is there some use case for support this feature?
> >
> A user deletes the data in PL/pgSQL using the above method, hoping to do
> more processing according to the number of rows affected, and found that
> each time will get 0.
>
> Sample code:
> PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
> EXECUTE 'EXECUTE smt_del(100)';
> GET DIAGNOSTICS j = ROW_COUNT;
>

This makes no sense in plpgsql. Why do you use a PREPARE statement explicitly?


> IF j=1 THEN
>do something
> ELSIF j=0 THEN
>do something
>
> Here j is always equal to 0.
>



>
> Regards
>
> > Regards
> >
> > Pavel
> >
> >
> > Regards,
> > Quan Zongliang
> >
>
>


Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Amit Langote
On Wed, Sep 4, 2019 at 8:53 AM Tom Lane  wrote:
>
> Amit Langote  writes:
> > [ v2-0001-Use-root-parent-s-permissions-when-read-child-tab.patch ]
>
> I took a quick look through this.  I have some cosmetic thoughts and
> also a couple of substantive concerns:

Thanks a lot for reviewing this.

> * As a rule, patches that add fields at the end of a struct are wrong.
> There is almost always some more-appropriate place to put the field
> based on its semantics.  We don't want our struct layouts to be historical
> annals; they should reflect a coherent design.  In this case I'd be
> inclined to put the new field next to the regular relid field.  It
> should have a name that's less completely unrelated to relid, too.

I've renamed the field to inh_root_relid and placed it next to relid.

> * It might make more sense to define the new field as "top parent's relid,
> or self if no parent", rather than "... or zero if no parent".  Then you
> don't need if-tests like this:
>
> +if (rel->inh_root_parent > 0)
> +rte = 
> planner_rt_fetch(rel->inh_root_parent,
> +   root);
> +else
> +rte = planner_rt_fetch(rel->relid, root);

Agreed, done this way.

> * The business about reverse mapping Vars seems quite inefficient, but
> what's much worse is that it only accounts for one level of parent.
> I'm pretty certain this will give the wrong answer for multiple levels
> of partitioning, if the column numbers aren't all the same.

Indeed.  We need to be checking the root parent's permissions, not the
immediate parent's which might be a child itself.

> * To fix the above, you probably need to map back one inheritance level
> at a time, which suggests that you could just use the AppendRelInfo
> parent-rel info and not need any addition to RelOptInfo at all.  This
> makes the efficiency issue even worse, though.  I don't immediately have a
> great idea about doing better.  Making it faster might require adding more
> info to AppendRelInfos, and I'm not quite sure if it's worth the tradeoff.

Hmm, yes.  If AppendRelInfos had contained a reverse translation list
that maps Vars of a given child to the root parent's, this patch would
end up being much simpler and not add too much cost to the selectivity
code.  However building such a map would not be free and the number of
places where it's useful would be significantly smaller where the
existing parent-to-child translation list is used.

Anyway, I've fixed the above-mentioned oversights in the current code for now.

> * I'd be inclined to use an actual test-and-elog not just an Assert
> for the no-mapping-found case.  For one reason, some compilers are
> going to complain about a set-but-not-used variable in non-assert
> builds.  More importantly, I'm not very convinced that it's impossible
> to hit the no-mapping case.  The original proposal was to fall back
> to current behavior (test the child-table permissions) if we couldn't
> match the var to the top parent, and I think that that is still a
> sane proposal.

OK, I've removed the Assert.  For child Vars that can't be translated
to root parent's, permissions are checked with the child relation,
like before.


> As for how to test, it doesn't seem like it should be that hard to devise
> a situation where you'll get different plan shapes depending on whether
> the planner has an accurate or default selectivity estimate.

I managed to find a test case by trial-and-error, but it may be more
convoluted than it has to be.

-- On HEAD
create table permtest_parent (a int, b text, c text) partition by list (a);
create table permtest_child (b text, a int, c text) partition by list (b);
create table permtest_grandchild (c text, b text, a int);
alter table permtest_child attach partition permtest_grandchild for
values in ('a');
alter table permtest_parent attach partition permtest_child for values in (1);
insert into permtest_parent select 1, 'a', i || 'x' || i * 10 from
generate_series(1, 1000) i;
analyze permtest_parent;
explain (costs off) select * from permtest_parent p1 inner join
permtest_parent p2 on p1.a = p2.a and p1.c like '4x5%';
   QUERY PLAN
-
 Nested Loop  (cost=0.00..47.00 rows=1000 width=28)
   Join Filter: (p1.a = p2.a)
   ->  Seq Scan on permtest_grandchild p1  (cost=0.00..18.50 rows=1 width=14)
 Filter: (c ~~ '4x5%'::text)
   ->  Seq Scan on permtest_grandchild p2  (cost=0.00..16.00 rows=1000 width=14)
(5 rows)

create role regress_no_child_access;
revoke all on permtest_grandchild from regress_no_child_access;
grant all on permtest_parent to regress_no_child_access;
set session authorization regress_no_child_access;
explain (costs off) select * from permtest_parent p1 inner join
permtest_parent p2 on p1.a = p2.a

Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Amit Langote
On Thu, Sep 5, 2019 at 6:33 PM Dilip Kumar  wrote:
>   /*
> + * For inheritance child relations, we also need to remember
> + * the root parent.
> + */
> + if (parent->rtekind == RTE_RELATION)
> + rel->inh_root_relid = parent->inh_root_relid > 0 ?
> + parent->inh_root_relid :
> + parent->relid;
> + else
> + /* Child relation of flattened UNION ALL subquery. */
> + rel->inh_root_relid = relid;
>
> With the current changes, parent->inh_root_relid will always be > 0 so
> (parent->inh_root_relid > 0) condition doesn't make sence. Right?

Oops, you're right.  It should be:

if (parent->rtekind == RTE_RELATION)
rel->inh_root_relid = parent->inh_root_relid;
else
rel->inh_root_relid = relid;

Updated patch attached.

Thanks,
Amit


v4-0001-Use-root-parent-s-permissions-when-reading-child-.patch
Description: Binary data


Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Dilip Kumar
On Thu, Sep 5, 2019 at 3:26 PM Amit Langote  wrote:
>
> On Thu, Sep 5, 2019 at 6:33 PM Dilip Kumar  wrote:
> >   /*
> > + * For inheritance child relations, we also need to remember
> > + * the root parent.
> > + */
> > + if (parent->rtekind == RTE_RELATION)
> > + rel->inh_root_relid = parent->inh_root_relid > 0 ?
> > + parent->inh_root_relid :
> > + parent->relid;
> > + else
> > + /* Child relation of flattened UNION ALL subquery. */
> > + rel->inh_root_relid = relid;
> >
> > With the current changes, parent->inh_root_relid will always be > 0 so
> > (parent->inh_root_relid > 0) condition doesn't make sence. Right?
>
> Oops, you're right.  It should be:
>
> if (parent->rtekind == RTE_RELATION)
> rel->inh_root_relid = parent->inh_root_relid;
> else
> rel->inh_root_relid = relid;
>
Right!

> Updated patch attached.
>

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: BUG #15977: Inconsistent behavior in chained transactions

2019-09-05 Thread Peter Eisentraut
On 2019-09-04 16:49, fn ln wrote:
> I made another patch for suggested behavior (COMMIT/ROLLBACK AND CHAIN
> now gives us an error when used in an implicit block).

I'm content with this patch.  Better disable questionable cases now and
maybe re-enable them later if someone wants to make a case for it.
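
A rough sketch of the case being disabled (the table name is only for
illustration; psql's \; separator sends the statements as a single
simple-query message, which is what creates the implicit block):

    INSERT INTO t VALUES (1) \; COMMIT AND CHAIN \; INSERT INTO t VALUES (2);

With the patch, the AND CHAIN variant is rejected with an error here rather
than exhibiting the inconsistent behavior reported in the bug.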

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-05 Thread Peter Eisentraut
On 2019-09-05 01:59, Michael Paquier wrote:
> On Wed, Sep 04, 2019 at 02:37:12PM +0200, Peter Eisentraut wrote:
>>>> Do we need set_pglocale_pgservice() calls here if we're not doing NLS?
>>>> Does the logging stuff require it?  I'm not sure.
>>>
>>> The logging part does not require it, but this can be used for
>>> PGSYSCONFDIR, no?
>>
>> How does PGSYSCONFDIR come into play here?
> 
> There is an argument to allow libpq to find out a service file for
> a connection from the executable path.  Note that oid2name can use a
> connection string for connection, but not vacuumlo, so I somewhat
> missed that.

Oh I see what's going on.  PGSYSCONFDIR is used by libpq, and
set_pglocale_pgservice() does some path mangling on PGSYSCONFDIR if set
for Windows.  Shouldn't libpq do that itself?  Worth looking into but
probably unrelated to your patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unexpected "shared memory block is still in use"

2019-09-05 Thread Peter Eisentraut
On 2019-09-04 16:59, Tom Lane wrote:
>> A related point, perhaps we should change the key printed into
>> postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
>> what ipcs prints.
> Hmm, that depends on whose ipcs you use :-(.  A quick survey
> of my machines says it's
> 
>             key       shmid
> 
> Linux:      hex       decimal
> FreeBSD:    decimal   decimal
> NetBSD:     decimal   decimal
> OpenBSD:    decimal   decimal
> macOS:      hex       decimal
> HPUX:       hex       (not printed)
> 
> There's certainly room to argue that hex+decimal is most popular,
> but I'm not sure that that outweighs possible compatibility issues
> from changing postmaster.pid contents.  (Admittedly, it's not real
> clear that anything would be paying attention to the shmem key,
> so maybe there's no compatibility issue.)

Let's just leave it decimal then.  At least then it's easier to compare
it to ls -i output.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-05 Thread Alvaro Herrera
Thanks for rebasing.

I didn't like 0001 very much.

* It seems now would be the time to stop pretending we're using a file
called recovery.conf; I know we still support older server versions that
use that file, but it sounds like we should take the opportunity to
rename the function to be less misleading once those versions vanish out
of existence.

* disconnect_and_exit seems a bit out of place compared to the other
parts of this new module.  I think you only put it there so that the
'conn' can be a global, and that you can stop passing 'conn' as a
variable to GenerateRecoveryConf.  It seems more modular to me to keep
it as a separate variable in each program and have it passed down to the
routine that writes the file.

* From modularity also seems better to me to avoid a global variable
'recoveryconfcontents' and instead return the string from
GenerateRecoveryConf to pass as a param to WriteRecoveryConf.
(In fact, I wonder why the current structure is like it is, namely to
have ReceiveAndUnpackTarFile write the file; why wouldn't be its caller
be responsible for writing it?)

I wonder about putting this new file in src/fe_utils instead of keeping
it in pg_basebackup and symlinking to pg_rewind.  Maybe if we make it a
true module (recovery_config_gen.c) it makes more sense there.

0002 seems okay as far as it goes.


0003:

I still don't understand why we need a command-line option to do this.
Why should it not be the default behavior?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-05 Thread Alvaro Herrera
Patch looks good to me, please push.

Generally speaking I find the 'progname' handling a bit silly (since we
have it both as a variable in each program and also in logging.c
separately), but that's not the fault of this patch, and this patch
doesn't make it worse.  That said, I think some other messages in
vacuumlo can be made pg_log_somelevel(), based on occurrences of
'progname'.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: unexpected rowlock mode when trigger is on the table

2019-09-05 Thread Alvaro Herrera
On 2019-Sep-05, Tomáš Záluský wrote:

> Thanks for response.
> 
> > I think there should be no overlap (PK is column "id", not modified)
> 
> The update command sets the detail_id column which has unique constraint.

Oh, I see, yeah that explains it.

> What is unclear to me, why FOR NO KEY UPDATE is chosen when there is no 
> trigger.
> Perhaps the execution path to ExecUpdateLockMode is somehow different?

heap_update on its own uses a slightly different method to determine
which columns are modified -- see HeapDetermineModifiedColumns.  In this
case, since the old value is NULL and the updated value is NULL, that
function decides that the column has not changed and thus it doesn't
need the stronger lock.  I bet it would work differently if you had a
different detail_id originally, or if you set it to a different value
afterwards.
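
A quick way to test that bet, assuming the master table from upthread (sketch
only):

  BEGIN;
  -- the new value actually differs from the old one this time
  UPDATE master SET detail_id = 42, name = 'y' WHERE id = 1000;  -- 42: any existing detail id

  -- in a second session:
  SELECT locked_row, modes FROM pgrowlocks('master');

If the above is right, the no-trigger case should now also report {Update}
(i.e. FOR UPDATE) rather than {No Key Update}, because
HeapDetermineModifiedColumns sees the unique column as modified.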

> And if FOR NO KEY UPDATE is correct, how to achieve it also with trigger?

Not sure that's feasible, short of patching the Pg source.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SegFault on 9.6.14

2019-09-05 Thread Amit Kapila
On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila  wrote:
>
> On Fri, Aug 9, 2019 at 6:29 PM Robert Haas  wrote:
> >
> >
> > But beyond that, the issue here is that the Limit node is shutting
> > down the Gather node too early, and the right fix must be to stop
> > doing that, not to change the definition of what it means to shut down
> > a node, as this patch does.  So maybe a possible approach here - which
> > I think is more or less what Tom is proposing - is:
> >
> > 1. Remove the code from ExecLimit() that calls ExecShutdownNode().
> >
>
> Attached patch does that.  I have also added one test as a separate
> patch so that later if we introduce shutting down resources in Limit
> node, we don't break anything.  As of now, I have kept it separate for
> easy verification, but if we decide to go with this approach and test
> appears fine, we can merge it along with the fix.
>

I have merged the code change and the test case patch, as I felt it is
good to cover this case.  I have slightly changed the test case to
make its output predictable (made the inner scan ordered so that the
query always produces the same result).  One more thing: I am not able
to come up with a predictable test case for the 9.6 branch, as it
doesn't support Gather Merge, which is required for this particular
test to always produce predictable output.  There could be some better
way to write this test, so any input in that regard (or otherwise) is
welcome.  So, if we commit this patch, the containing test case will be
for branches HEAD~10, but the code will be for HEAD~9.6.

> > 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode()
> > gets called at the very end of execution, at least when execute_once
> > is set, before exiting parallel mode.
> >
>
> I am not sure if I completely understand this point.  AFAICS, the
> ExecuteShutdownNode is called whenever we are done getting the tuples.
> One place where it is not there in that function is when we assume
> destination is closed, basically below code:
> ExecutePlan()
> {
> ..
> if (!dest->receiveSlot(slot, dest))
> break;
> ..
> }
>
> Do you expect this case to be also dealt or you have something else in
> mind?
>

It still appears problematic, but I couldn't come up with a test case
to reproduce the problem.  I'll try some more on this, but I think
this anyway can be done separately once we have a test to show the
problem.

>  The other possibility could be that we move the shutdown of the
> node at the end of the function when we exit parallel mode but doing
> that lead to some regression failure on my machine.  I will
> investigate the same.
>

This was failing because the use_parallel_mode flag in
ExecutePlan() won't be set for workers, and hence they won't get a
chance to accumulate their stats.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Forbid-Limit-node-to-shutdown-resources.patch
Description: Binary data


Re: Specifying attribute slot for storing/reading statistics

2019-09-05 Thread Tom Lane
Esteban Zimanyi  writes:
> We are developing the analyze/selectivity functions for those types. Our
> approach is to use the standard PostgreSQL/PostGIS functions for the value
> and the time dimensions where the slots starting from 0 will be used for
> the value dimension, and the slots starting from 2 will be used for the
> time dimension. For example, for tfloat we use range_typanalyze and related
> functions for
> * collecting in slots 0 and 1, STATISTIC_KIND_BOUNDS_HISTOGRAM
> and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the float ranges of the value
> dimension
> *  collecting in slots 2 and 3, STATISTIC_KIND_BOUNDS_HISTOGRAM
> and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the periods (similar to
> tstzranges) of the time dimension

IMO this is fundamentally wrong, or at least contrary to the design
of pg_statistic.  It is not supposed to matter which "slot" a given
statistic type is actually stored in; rather, readers are supposed to
search for the desired statistic type using the stakindN, staopN and
(if relevant) stacollN fields.

In this case it seems like it'd be reasonable to rely on the staop
fields to distinguish between the value and time dimensions, since
(IIUC) they're of different types.
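
To illustrate, the lookup then keys on the stakind/staop pair instead of on
slot position; sketch only, where VALUE_DIM_LT_OPR stands for whatever
operator OID the typanalyze function stored in staop for the value (float
range) dimension:

	AttStatsSlot sslot;

	if (get_attstatsslot(&sslot, vardata->statsTuple,
						 STATISTIC_KIND_BOUNDS_HISTOGRAM,
						 VALUE_DIM_LT_OPR,
						 ATTSTATSSLOT_VALUES))
	{
		/* sslot.values[] now holds the bounds histogram for that dimension */
		free_attstatsslot(&sslot);
	}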

Another idea is to invent your own slot kind identifiers instead of
using built-in ones.  I'm not sure that there's any point in using
the built-in kind values, since (a) none of the core selectivity code
is likely to get called on your data and (b) even if it were, it'd
likely do the wrong thing.  See the comments in pg_statistic.h,
starting about line 150, about assignment of non-built-in slot kinds.

> Is there any chance that the API for accessing the typanalyze and
> selectivity functions will be enhanced in a future release ?

Well, maybe you could convince us that the stakind/staop scheme for
identifying statistics is inadequate so we need another identification
field (corresponding to a component of the column being described,
perhaps).  I'd be strongly against assigning any semantic meaning
to the slot numbers, though.  That's likely to break code that's
written according to existing conventions.

regards, tom lane




changing wal segsize with pg_resetwal

2019-09-05 Thread Alvaro Herrera
On 2019-Aug-28, Pavel Demidov wrote:

> I hear that it is not recommended to use pg_resetwal with --wal-segsize to
> increase the WAL segment size. Is there any more detailed information about
> this? What effects could it have? Is it possible to change it while fully offline?

The manual contains a warning about it:

   While pg_resetwal will set the WAL starting address
   beyond the latest existing WAL segment file, some segment size changes
   can cause previous WAL file names to be reused.  It is recommended to
   use -l together with this option to manually set the
   WAL starting address if WAL file name overlap will cause problems with
   your archiving strategy.
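
For illustration, combining the two switches might look like this (the
segment size, WAL file name, and data directory are placeholders):

    pg_resetwal --wal-segsize=64 -l 000000010000000100000080 /path/to/datadir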

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
Matheus, any replies to this?   I've marked the patch as Waiting on
Author for now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH][PROPOSAL] Add enum releation option type

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
After looking closer once again, I don't like this patch.

I think the FOOBAR_ENUM_DEF defines serve no purpose, other than
source-code placement next to the enum value definitions.  I think for
example check_option, living in reloptions.c, should look like this:

	{
		{
			"check_option",
			"View has WITH CHECK OPTION defined (local or cascaded).",
			RELOPT_KIND_VIEW,
			AccessExclusiveLock
		},
		{
			{ "local", VIEW_OPTION_CHECK_OPTION_LOCAL },
			{ "cascaded", VIEW_OPTION_CHECK_OPTION_CASCADED },
			{ NULL }
		},
		"Valid values are \"local\" and \"cascaded\"."
	},

Note the relopt_enum is pretty much the same as what you have, except we also
have a const char *valid_values_errdetail; and the ugly #define no
longer exists, because we now put the values directly in enumRelOpts.

rel.h ends up like this:

/*
 * Values for ViewOptions->check_option.
 */
typedef enum
{
	VIEWOPTIONS_CHECK_OPTION_NOTSET,
	VIEWOPTIONS_CHECK_OPTION_LOCAL,
	VIEWOPTIONS_CHECK_OPTION_CASCADED
} ViewOpts_CheckOptionValues;

/*
 * ViewOptions
 *		Contents of rd_options for views
 */
typedef struct ViewOptions
{
	int32		vl_len_;		/* varlena header (do not touch directly!) */
	bool		security_barrier;
	ViewOpts_CheckOptionValues check_option;
} ViewOptions;

I'm marking this Waiting on Author.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Tom Lane
Amit Langote  writes:
> On Thu, Sep 5, 2019 at 6:18 PM Dilip Kumar  wrote:
>> Instead of falling back to the child, isn't it make more sense to
>> check the permissions on the parent upto which we could translate (it
>> may not be the root parent)?

> Hmm, in that case, the parent up to which we might be able to
> translate would still be a child and might have different permissions
> than the table mentioned in the query (what's being called "root" in
> this context).  Would it be worth further complicating this code if
> that's the case?

I think that checking intermediate levels would be an actively bad idea
--- it would make the behavior too confusing.  We should preferably check
the table actually named in the query, or if we can't then check the
table we are using the stats of; nothing else.

Another idea that we should consider, though, is to allow the access if
*either* of those two tables allows it.  The main reason that that's
attractive is that it's certain not to break any case that works today.
But also, it would mean that in many practical cases we'd not have to
try to map Vars back up to the original parent, thus avoiding the
performance penalty.  (That is, check the target table as we do now,
and only if we find it lacks permissions do we start mapping back.)

regards, tom lane




Re: Proposal: roll pg_stat_statements into core

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Sep-02, Euler Taveira wrote:

> At least if pg_stat_statements
> was in core you could load it by default and have a GUC to turn it
> on/off without restarting the server (that was Magnus proposal and
> Andres agreed). I support this idea.

Actually this is possible without moving it into core, too -- change
things so that putting it in shared_preload_libraries reserves the shmem
area but does not enable collection, then have a separate GUC that
enables/disables collection.  A bit like archive_mode=on with
archive_command=/bin/true, which allows changing archive_command
without restarting.
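
Roughly, in postgresql.conf terms (the pg_stat_statements GUC below is
hypothetical; it only sketches the shape of the proposal):

    # existing analogue: machinery reserved at startup, work toggled on the fly
    archive_mode = on                                 # requires restart
    archive_command = '/bin/true'                     # effectively off; changeable without restart

    # proposed shape
    shared_preload_libraries = 'pg_stat_statements'   # reserves shared memory; requires restart
    pg_stat_statements.collect = off                  # hypothetical switch, no restart needed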

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: accounting for memory used for BufFile during hash joins

2019-09-05 Thread Melanie Plageman
On Tue, Sep 3, 2019 at 9:36 AM Alvaro Herrera 
wrote:

> On 2019-Jul-11, Tomas Vondra wrote:
>
> > On Wed, Jul 10, 2019 at 04:51:02PM -0700, Melanie Plageman wrote:
>
> > > I think implementing support for parallel hashjoin or explicitly
> > > disabling it would be the bare minimum for this patch, which is why I
> > > made 2 its own item. I've marked it as returned to author for this
> > > reason.
> >
> > OK. I'm a bit confused / unsure what exactly our solution to the various
> > hashjoin issues is. I have not been paying attention to all the various
> > threads, but I thought we kinda pivoted to the BNL approach, no? I'm not
> > against pushing this patch (the slicing one) forward and then maybe add
> > BNL on top.
>
> So what's a good way forward for this patch?  Stalling forever like a
> glacier is not an option; it'll probably end up melting.  There's a lot
> of discussion on this thread which I haven't read, and it's not
> immediately clear to me whether this patch should just be thrown away in
> favor of something completely different, or it can be considered a first
> step in a long road.
>

So, I have been working on the fallback to block nested loop join
patch--latest non-parallel version posted here [1]. I am currently
still working on the parallel version but don't have a complete
working patch yet. I am hoping to finish it and solicit feedback in
the next couple weeks.

My patch chunks up a bad inner side batch and processes it a chunk
at a time. I haven't spent too much time yet thinking about Hubert's
suggestion proposed upthread. In the past I had asked Tomas about the
idea of splitting up only "bad batches" to avoid having other batches
which are very small. It seemed like this introduced additional
complexity for future spilled tuples finding a home, however, I had
not considered the hash function chain method Hubert is mentioning.

Even if we implemented additional strategies like the one Hubert is
suggesting, I still think that both the slicing patch originally
proposed in this thread as well as a BNLJ fallback option could all
work together, as I believe they solve slightly different problems.

If Tomas or someone else has time to pick up and modify the BufFile
accounting patch, committing that still seems like the next logical
step.

I will work on getting a complete (parallel-aware) BNLJ patch posted
soon.

[1]
https://www.postgresql.org/message-id/CAAKRu_ZsRU%2BnszShs3AGVorx%3De%2B2jYkL7X%3DjiNO6%2Bqbho7vRpw%40mail.gmail.com

-- 
Melanie Plageman


Re: range_agg

2019-09-05 Thread Jeff Davis
On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote:
> @+ and @- and @* (I dunno why but I kind of like it. We already have
> @> and <@.)

I think I like this proposal best; it reminds me of perl. Though some
might say that's an argument against it.

Regards,
Jeff Davis






Re: [PATCH] Connection time for \conninfo

2019-09-05 Thread Rodrigo Ramírez Norambuena
On Tue, Sep 3, 2019 at 11:06 PM Michael Paquier  wrote:
>
> You can do basically the same thing by looking at
> pg_stat_activity.backend_start and compare it with the clock
> timestamp.  Doing it at SQL level the way you want has also the
> advantage to offer you a modular format output.

Hi Michael,

Thanks you for paying attention, I appreciate this.

What you suggest seems a bit more cumbersome than what I propose in this
patch, because you would need to filter out which row belongs to your own
connection, and if the connection goes through a connection pooler there
could be more than one.

The main goal here is to get this information purely on the client side
(psql), as the frontend.

Regards!
-- 
Rodrigo Ramírez Norambuena
http://www.rodrigoramirez.com/




Re: [PATCH] Connection time for \conninfo

2019-09-05 Thread Rodrigo Ramírez Norambuena
On Wed, Sep 4, 2019 at 11:04 AM Alvaro Herrera  wrote:
>
> The only thing that seems wrong about this proposal is that the time
> format is a bit too verbose.  I would have it do "N days 12:23:34".
>

Hi Alvaro!,

I attach a second version with this change.

-- 
Rodrigo Ramírez Norambuena
http://www.rodrigoramirez.com/
From 3526b58e836f92da6c5da6b105c6748948b99365 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rodrigo=20Ram=C3=ADrez=20Norambuena?= 
Date: Tue, 3 Sep 2019 17:15:39 -0400
Subject: [PATCH] Add to  the \conninfo for time of current connection:

Add extra information for \conninfo psql command to show the current
time lapsed of the connection.

The time will be reestablished again if the connection is reset and
reconnected successfully with the backend.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c0a7a5566e..8b0e87211d 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -263,6 +263,42 @@ HandleSlashCmds(PsqlScanState scan_state,
 }
 
 
+static void
+printTimeConnected(void)
+{
+
+	int days = 0, hours = 0 , minutes = 0 , seconds = 0;
+	int secs = time(NULL) - pset.connected;
+
+	#define SECOND (1)
+	#define MINUTE (SECOND * 60)
+	#define HOUR (MINUTE * 60)
+	#define DAY (HOUR * 24)
+
+	if (secs > DAY) {
+		days = (secs / DAY);
+		secs -= (days * DAY);
+	}
+	if (secs > HOUR) {
+		hours = (secs / HOUR);
+		secs -= (hours * HOUR);
+	}
+	if (secs > MINUTE) {
+		minutes = (secs / MINUTE);
+		secs -= (minutes * MINUTE);
+	}
+	if (secs > 0)
+		seconds = secs;
+
+	if (days > 0)
+		printf(ngettext("The time connection is %d day %02d:%02d:%02d\n",
+"The time connection is %d days %02d:%02d:%02d\n", days),
+days, hours, minutes, seconds);
+	else
+		printf(_("The time connection is %02d:%02d:%02d\n"),
+			hours, minutes, seconds);
+}
+
 /*
  * Subroutine to actually try to execute a backslash command.
  *
@@ -624,6 +660,7 @@ exec_command_conninfo(PsqlScanState scan_state, bool active_branch)
 			}
 			printSSLInfo();
 			printGSSInfo();
+			printTimeConnected();
 		}
 	}
 
@@ -3318,6 +3355,7 @@ SyncVariables(void)
 	pset.encoding = PQclientEncoding(pset.db);
 	pset.popt.topt.encoding = pset.encoding;
 	pset.sversion = PQserverVersion(pset.db);
+	pset.connected =  time(NULL);
 
 	SetVariable(pset.vars, "DBNAME", PQdb(pset.db));
 	SetVariable(pset.vars, "USER", PQuser(pset.db));
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 44a782478d..c9d4eaf0e7 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -408,7 +408,10 @@ CheckConnection(void)
 			UnsyncVariables();
 		}
 		else
+		{
+			pset.connected = time(NULL);
 			fprintf(stderr, _("Succeeded.\n"));
+		}
 	}
 
 	return OK;
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 5be5091f0e..ed3ab368f5 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -141,6 +141,7 @@ typedef struct _psqlSettings
 	const char *prompt3;
 	PGVerbosity verbosity;		/* current error verbosity level */
 	PGContextVisibility show_context;	/* current context display level */
+	int connected; /* unixtime for connected init time */
 } PsqlSettings;
 
 extern PsqlSettings pset;
-- 
2.21.0



Re: range_agg

2019-09-05 Thread Paul A Jungwirth
On Thu, Sep 5, 2019 at 10:15 AM Jeff Davis  wrote:
>
> On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote:
> > @+ and @- and @* (I dunno why but I kind of like it. We already have
> > @> and <@.)
>
> I think I like this proposal best; it reminds me of perl. Though some
> might say that's an argument against it.

Thanks Jeff, it's my favorite too. :-) Strangely it feels the hardest
to justify. Right now I have + and - and * implemented but I'll change
them to @+ and @- and @* so that I can support `range R range =
multirange` too.

Btw is there any reason to send a "preview" patch with my current
progress, since we're starting a new commit fest? Here is what I have
left to do:

- Change those three operators.
- Write range_intersect_agg. (range_agg is done but needs some tests
before I commit it.)
- Write documentation.
- Add multiranges to resolve_generic_type, and figure out how to test
that (see the other thread about two latent range-related bugs there).
- Rebase on current master. (It should be just a few weeks behind right now.)
- Run pgindent to make sure I'm conforming to whitespace/style guidelines.
- Split it up into a few separate patch files.

Right now I'm planning to do all that before sending a patch. I'm
happy to send something in-progress too, but I don't want to
waste any reviewers' time. If folks want an early peek though let me
know. (You can also find my messy progress at
https://github.com/pjungwir/postgresql/tree/multirange)

Also here are some other items that won't be in my next patch, but
should probably be done (maybe by someone else but I'm happy to figure
it out too) before this is really committed:

- typanalyze
- selectivity
- gist support
- spgist support

If anyone would like to help with those, let me know. :-)

Yours,
Paul




Re: basebackup.c's sendFile() ignores read errors

2019-09-05 Thread Robert Haas
On Fri, Aug 30, 2019 at 7:05 AM Jeevan Ladhe
 wrote:
>> Fixed both comments in the attached patch.
>
> Thanks, the patch looks good to me.

Here's a version of the patch with a further change to the wording of
the comment.  I hope this is clearer.

I think this needs to be back-patched all the way back to 9.4, and it
doesn't seem to apply cleanly before v11.  Any chance you could
prepare a version for the older branches?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fread-error-check-v3.patch
Description: Binary data


Re: [PATCH] Connection time for \conninfo

2019-09-05 Thread Tom Lane
=?UTF-8?Q?Rodrigo_Ram=C3=ADrez_Norambuena?=  writes:
> On Tue, Sep 3, 2019 at 11:06 PM Michael Paquier  wrote:
>> You can do basically the same thing by looking at
>> pg_stat_activity.backend_start and compare it with the clock
>> timestamp.  Doing it at SQL level the way you want has also the
>> advantage to offer you a modular format output.

> What you suggest seems a bit more cumbersome than what I propose in this
> patch, because you would need to filter out which row belongs to your own
> connection, and if the connection goes through a connection pooler there
> could be more than one.
> The main goal here is to get this information purely on the client side
> (psql), as the frontend.

A couple of thoughts on this ---

* Michael's right that it's *possible* to get the session start time in
SQL, but having to find one's session's entry in pg_stat_activity is
pretty awkward.  I wonder if we should invent pg_backend_start_time()
analogous to pg_backend_pid().  Most of the other columns in
pg_stat_activity either already have similar functions (e.g. CURRENT_USER)
or don't seem especially useful to be able to retrieve for one's own
session, but this one maybe is.  That's somewhat orthogonal to the
proposed patch but we should probably be considering it in the same
discussion.
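
For reference, the workaround available today, which a pg_backend_start_time()
function would make unnecessary:

    SELECT backend_start,
           now() - backend_start AS connection_age
    FROM pg_stat_activity
    WHERE pid = pg_backend_pid();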

* It's not quite clear what the use-case is for displaying this in
\conninfo.  That's easy for humans to look at, but I don't really
understand why a human would care, and it's just about useless for
any programmatic application.

* I wonder why the proposal is to display duration of connection rather
than start time.  I don't especially like the idea that \conninfo
would change every time you look at it, so I'd tend to lean to the
latter, but maybe there are other arguments for one or the other.

* The patch appears to be tracking time measured at the client side,
which is going to be just enough different from the server's idea of
the session start time to be confusing/annoying.  Do we really want
to do it that way?

* There are other definitional questions like what happens when you
reconnect (either due to intentional \connect or momentary connection
loss), or what should happen when you have no connection at all thanks
to a non-recoverable connection loss.  I suppose the patch's code
provides some answers but I'm not sure that it's consistent or well
thought out.  (Commit aef362385 provides some precedent you should look
at in this regard, plus I think this patch has to be rebased over it
anyway.)

BTW, the result type of time(2) is not "int", so this is just wrong:

+   int connected; /* unixtime for connected init time */

This field name is pretty poorly chosen as well; it sounds like
it's a boolean "am I connected" state, and there's certainly no
hint that it's a time value.

regards, tom lane




Re: tableam vs. TOAST

2019-09-05 Thread Robert Haas
On Thu, Sep 5, 2019 at 10:52 AM Alvaro Herrera  wrote:
> I agree, and can we move forward with this 0001?  The idea here is to
> change no code (as also suggested by Tom elsewhere), and it's the
> largest patch in this series by a mile.  I checked --color-moved=zebra
> and I think the patch looks fine, and also it compiles fine.  I ran
> src/tools/pginclude/headerscheck on it and found no complaints.
>
> So here's a rebased version, where the DETOAST_H whitespace has been
> removed.  No other changes from your original.  Will you please push
> soon?

Done, thanks. Here's the rest again with the additional rename added
to 0003 (formerly 0004). I think it's probably OK to go ahead with
that stuff, too, but I'll wait a bit to see if anyone wants to raise
more objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v5-0003-Rename-some-toasting-functions-based-on-whether-t.patch
Description: Binary data


v5-0002-Allow-TOAST-tables-to-be-implemented-using-table-.patch
Description: Binary data


v5-0001-Create-an-API-for-inserting-and-deleting-rows-in-.patch
Description: Binary data


Re: range_agg

2019-09-05 Thread Jeff Davis
On Thu, 2019-09-05 at 10:45 -0700, Paul A Jungwirth wrote:
> Right now I'm planning to do all that before sending a patch. I'm
> happy to send something something in-progress too, but I don't want
> to
> waste any reviewers' time. If folks want an early peak though let me
> know. (You can also find my messy progress at
> https://github.com/pjungwir/postgresql/tree/multirange)

Sounds good. The rule I use is: "will the feedback I get be helpful, or
just tell me about obvious problems I already know about".

Regards,
Jeff Davis






AtEOXact_Snapshot timing

2019-09-05 Thread Robert Haas
Way back in 2011, commit 57eb009092684e6e1788dd0dae641ccee1668b10
moved AbortTransaction's AtEOXact_Snapshot call to CleanupTransaction
to fix a problem when a ROLLBACK statement was prepared at the
protocol level and executed in a transaction with REPEATABLE READ or
higher isolation. RevalidateCachedQuery would attempt to obtain a
snapshot and end up failing.  At the time, it was judged that
plancache.c was behaving correctly and this logic was rejiggered to
make that coding pattern safe. However, commit
ac63dca607e8e22247defbc8fe03b6baa3628c42 subsequently taught
RevalidateCachedQuery not to obtain a snapshot for such commands after
all while fixing an unrelated bug, and there now seems to be no case
in which we obtain a snapshot in an aborted transaction.

I'd like to propose that we upgrade that practice to a formal rule.
We've already taken some steps in this direction; notably, commit
42c80c696e9c8323841180029cc62741c21bd356 added an assertion to the
effect that we never perform catcache lookups outside of a valid,
non-aborted transaction. However, right now, if you made the mistake
of trying to access the database through some means other than a
catcache lookup in an aborted transaction, it might appear to work.
Actually, it would be terribly unsafe, because (1) you might've failed
after inserting a heap tuple and before inserting all the
corresponding index tuples and (2) any random subset of the tuples
inserted by prior commands in your transaction might have been pruned
by other backends after you removed your XID from the ProcArray, while
others would remain visible.  In short, such a snapshot is not really
a snapshot at all.

The best way that I've been able to come up with to enforce this rule
after a little bit of thought is to add Assert(IsTransactionState())
to a bunch of functions in snapmgr.c, most notably
GetTransactionSnapshot and GetCatalogSnapshot. The attached patch does
that.  It also makes the comments in RevalidateCachedQuery more
explicit about this issue, and it moves the AtEOXact_Snapshot call
back to AbortTransaction, on the theory (or hope?) that it's better to
dispose of resources sooner, especially resources that might look
superficially valid but really are not.
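
To make the rule concrete, here is a minimal standalone illustration of the
proposed guard (the patch itself presumably just adds the Assert at the top
of the existing snapmgr.c functions):

    #include "postgres.h"
    #include "access/xact.h"
    #include "utils/snapmgr.h"

    /* wrapper used only to illustrate the rule being proposed */
    static Snapshot
    get_snapshot_checked(void)
    {
        /* refuse to hand out a snapshot in an aborted (or no) transaction */
        Assert(IsTransactionState());

        return GetTransactionSnapshot();
    }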

You may (or may not) wonder why I'm poking at this apparently-obscure
topic.  The answer is "undo."  Without getting into the gory details,
it's better for undo if as much of the cleanup work as possible
happens at AbortTransaction() time and as little as possible is left
until CleanupTransaction().  That seems like a good idea on general
principle too, though, so I'm proposing this as an independent
cleanup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-Move-AtEOXact_Snapshot-back-to-AbortTransaction.patch
Description: Binary data


Re: tableam vs. TOAST

2019-09-05 Thread Andres Freund
Hi,

On 2019-09-05 13:42:40 -0400, Robert Haas wrote:
> Done, thanks. Here's the rest again with the additional rename added
> to 0003 (formerly 0004). I think it's probably OK to go ahead with
> that stuff, too, but I'll wait a bit to see if anyone wants to raise
> more objections.

Well, I still dislike making the toast chunk size configurable in a
halfhearted manner.

Greetings,

Andres Freund




Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Robert Haas
On Wed, Sep 4, 2019 at 9:03 PM Michael Paquier  wrote:
> For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
> uses its code paths at the beginning, but then things get more
> complicated, particularly with finish_heap_swap() which calls directly
> reindex_table().  6f97457 includes one progress update at point which
> can be a problem per its shared nature in reindex_relation() with
> PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  This last part is wrong IMO,
> why should cluster reporting take priority in this code path,
> enforcing anything else?

Oops.  Yeah, that's bogus (as are some of the other things you
mention).  I think we're going to have to fix this by passing down
some flags to these functions to tell them what kind of progress
updates to do (or to do none).  Or else pass down a callback function
and a context object, but that seems like it might be overkill.

> On top of those issues, I see some problems with the current state of
> affairs, and I am repeating myself:
> - It is possible that pgstat_progress_update_param() is called for a
> given command for a code path taken by a completely different
> command, and that we have a real risk of showing a status completely
> buggy as the progress phases share the same numbers.
> - We don't consider wisely end and start progress handling for
> cascading calls, leading to a risk where we start command A, but
> for shared code paths where we assume that only command B can run then
> the processing abruptly ends for command A.

Those are just weaknesses of the infrastructure.  Perhaps there is a
better solution, but there's no intrinsic reason that we can't avoid
them by careful coding.

> - Is it actually fine to report information about a command completely
> different than the one provided by the client?  It is for example
> possible to call CLUSTER, but show up to the user progress report
> about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
> actually make sense if we are able to handle cascading progress
> reports.

Well, it might be OK to do that if we're clear that this is the index
progress-reporting view and the command is CLUSTER but it happens to
be building an index now so we're showing it here.  But I don't see
how it would work anyway: you can't reported cascading progress
reports in shared memory because you've only got a fixed amount of
space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: tableam vs. TOAST

2019-09-05 Thread Robert Haas
On Thu, Sep 5, 2019 at 3:10 PM Andres Freund  wrote:
> On 2019-09-05 13:42:40 -0400, Robert Haas wrote:
> > Done, thanks. Here's the rest again with the additional rename added
> > to 0003 (formerly 0004). I think it's probably OK to go ahead with
> > that stuff, too, but I'll wait a bit to see if anyone wants to raise
> > more objections.
>
> Well, I still dislike making the toast chunk size configurable in a
> halfhearted manner.

So, I'd be willing to look into that some more.  But how about if I
commit the next patch in the series first?  I think this comment is
really about the second patch in the series, "Allow TOAST tables to be
implemented using table AMs other than heap," and it's fair to point
out that, since that patch extends table AM, we're somewhat committed
to it once we put it in.  But "Create an API for inserting and
deleting rows in TOAST tables." is just refactoring, and I don't see
what we gain from waiting to commit that part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: tableam vs. TOAST

2019-09-05 Thread Tom Lane
Andres Freund  writes:
> Well, I still dislike making the toast chunk size configurable in a
> halfhearted manner.

It's hard to make it fully configurable without breaking our on-disk
storage, because of the lack of any explicit representation of the chunk
size in TOAST data.  You have to "just know" how big the chunks are
supposed to be.

However, it's reasonable to ask why we should treat it as an AM property,
especially a fixed AM property as this has it.  If somebody does
reimplement toast logic in some other AM, they might well decide it's
worth the storage cost to be more flexible about the chunk size ... but
too bad, this design won't let them do it.

I don't entirely understand why relation_toast_am is a callback
while toast_max_chunk_size isn't, either.  Why would they not both
be callbacks?  That would at least let an AM set a per-relation
max chunk size, if it wanted.

It seems like this design throws away most of the benefit of a fixed
chunk size (mostly, being able to do relevant modulo arithmetic with
shifts and masks rather than full-fledged integer division) without
getting much of anything in return.
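
To spell out the arithmetic in question (the 2048 below is a hypothetical
power-of-two size and int32 is assumed from postgres.h; the real TOAST chunk
size is derived from the block size and is not a power of two):

    #define FIXED_CHUNK_SIZE 2048

    static inline void
    locate_chunk_fixed(int32 offset, int32 *chunk_no, int32 *chunk_off)
    {
        *chunk_no = offset >> 11;                     /* offset / FIXED_CHUNK_SIZE */
        *chunk_off = offset & (FIXED_CHUNK_SIZE - 1); /* offset % FIXED_CHUNK_SIZE */
    }

    /* with a chunk size known only at run time, the compiler must emit a
     * full integer division and modulo instead */
    static inline void
    locate_chunk_variable(int32 offset, int32 chunk_size,
                          int32 *chunk_no, int32 *chunk_off)
    {
        *chunk_no = offset / chunk_size;
        *chunk_off = offset % chunk_size;
    }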

regards, tom lane




Re: Index Skip Scan

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Sep-05, Dmitry Dolgov wrote:

> Here is the version in which stepping between the pages works better. It seems
> sufficient to fix the case you've mentioned before, but for that we need to
> propagate keepPrev logic through `_bt_steppage` & `_bt_readnextpage`, and I
> can't say I like this solution. I have an idea that maybe it would be simpler
> to teach the code after index_skip to not do `_bt_next` right after one skip
> happened before. It should immediately eliminate several hacks from index 
> skip
> itself, so I'll try to pursue this idea.

Cool.

I think multiplying two ScanDirections to watch for a negative result is
pretty ugly:

/*
                 * If advancing direction is different from index direction, we must
 * skip right away, but _bt_skip requires a starting point.
 */
if (direction * indexscan->indexorderdir < 0 &&
!node->iss_FirstTupleEmitted)

Surely there's a better way to code that?
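
One possible spelling without the multiplication, assuming both values really
are plain ScanDirections (it could of course be wrapped in a small macro):

    if (!node->iss_FirstTupleEmitted &&
        ((ScanDirectionIsForward(direction) &&
          ScanDirectionIsBackward(indexscan->indexorderdir)) ||
         (ScanDirectionIsBackward(direction) &&
          ScanDirectionIsForward(indexscan->indexorderdir))))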

I think "scanstart" needs more documentation, both in the SGML docs as
well as the code comments surrounding it.

Please disregard my earlier comment about FirstTupleEmitted.  I was
thinking that index_skip would itself emit a tuple (ie. call some
"getnext" internally) rather than just repositioning.  There might still
be some more convenient way to represent this, but I have no immediate
advice.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: add a MAC check for TRUNCATE

2019-09-05 Thread Yuli Khodorkovskiy
On Mon, Sep 2, 2019 at 10:58 AM Kohei KaiGai  wrote:
>
> Hello Yuli,

Hello KaiGai,

>
> 2019年7月25日(木) 3:52 Yuli Khodorkovskiy :
> > Since all DAC checks should have corresponding MAC, this patch adds a
> > hook to allow extensions to implement a MAC check on TRUNCATE. I have
> > also implemented this access check in the sepgsql extension.
> >
> > One important thing to note is that refpolicy [1] and Redhat based
> > distributions do not have the SELinux permission for db_table {truncate}
> > implemented.
> >
> How is the db_table:{delete} permission different from truncate?
> From the standpoint of data access, TRUNCATE is equivalent to DELETE
> without WHERE, isn't it?

To echo Stephen's reply, since TRUNCATE has a dedicated privilege in
the GRANT system, there should be a MAC based permission as well.
Increased granularity for an integrator to add least privileged policy
is a good idea in my view.

> Of course, there are some differences between them. TRUNCATE takes
> exclusive locks and eliminates underlying data blocks; on the other hand,
> DELETE removes rows in an MVCC manner. However, both of them
> eventually remove data from the target table.
>
> I like to recommend to reuse "db_table:{delete}" permission for TRUNCATE.
> How about your opinions?

Now that I think about it, using "db_table { delete }" would be fine,
and that would remove the CIL requirement that I stated earlier. Thank
you for the suggestion. I'll send a v2 patch using the delete
permission.

Thank you,

Yuli


>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei 




Re: tableam vs. TOAST

2019-09-05 Thread Robert Haas
On Thu, Sep 5, 2019 at 3:36 PM Tom Lane  wrote:
> Andres Freund  writes:
> > Well, I still dislike making the toast chunk size configurable in a
> > halfhearted manner.
>
> It's hard to make it fully configurable without breaking our on-disk
> storage, because of the lack of any explicit representation of the chunk
> size in TOAST data.  You have to "just know" how big the chunks are
> supposed to be.

There was a concrete proposal about this from Andres here, down at the
bottom of the email:

http://postgr.es/m/20190802224251.lsxw4o5ebn2ng...@alap3.anarazel.de

Basically, detoasting would tolerate whatever chunk size it finds, and
the slice-fetching logic would get complicated.

> However, it's reasonable to ask why we should treat it as an AM property,
> especially a fixed AM property as this has it.  If somebody does
> reimplement toast logic in some other AM, they might well decide it's
> worth the storage cost to be more flexible about the chunk size ... but
> too bad, this design won't let them do it.

Fair complaint.  The main reason I want to treat it as an AM property
is that TOAST_TUPLE_THRESHOLD is defined in terms of heap-specific
constants, and having other AMs include heap-specific header files
seems like a thing we should try hard to avoid. Once you're indirectly
including htup_details.h in every AM in existence, it's going to be
hard to be sure that you've got no other dependencies on the current
heap AM. But I agree that making it not a fixed value could be useful.
One benefit of it would be that you could just change the value, even
for the current heap, without breaking access to already-toasted data.

> It seems like this design throws away most of the benefit of a fixed
> chunk size (mostly, being able to do relevant modulo arithmetic with
> shifts and masks rather than full-fledged integer division) without
> getting much of anything in return.

I don't think you're really getting that particular benefit, because
TOAST_TUPLE_THRESHOLD and TOAST_TUPLE_TARGET are not going to end up
as powers of two.  But you do get the benefit of working with
constants instead of a value determined at runtime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: tableam vs. TOAST

2019-09-05 Thread Andres Freund
On 2019-09-05 15:27:28 -0400, Robert Haas wrote:
> On Thu, Sep 5, 2019 at 3:10 PM Andres Freund  wrote:
> > On 2019-09-05 13:42:40 -0400, Robert Haas wrote:
> > > Done, thanks. Here's the rest again with the additional rename added
> > > to 0003 (formerly 0004). I think it's probably OK to go ahead with
> > > that stuff, too, but I'll wait a bit to see if anyone wants to raise
> > > more objections.
> >
> > Well, I still dislike making the toast chunk size configurable in a
> > halfhearted manner.
> 
> So, I'd be willing to look into that some more.  But how about if I
> commit the next patch in the series first?  I think this comment is
> really about the second patch in the series, "Allow TOAST tables to be
> implemented using table AMs other than heap," and it's fair to point
> out that, since that patch extends table AM, we're somewhat committed
> to it once we put it in.  But "Create an API for inserting and
> deleting rows in TOAST tables." is just refactoring, and I don't see
> what we gain from waiting to commit that part.

Yea, makes sense to me.




Re: ERROR: multixact X from before cutoff Y found to be still running

2019-09-05 Thread Bossart, Nathan
On 9/4/19, 9:03 PM, "Thomas Munro"  wrote:
> Both patches prevent mxactLimit from being newer than the oldest
> running multixact.  The v1 patch uses the most aggressive setting
> possible: the oldest running multi; the v2 uses the least aggressive
> of the 'safe' and oldest running multi.  At first glance it seems like
> the second one is better: it only does something different if we're in
> the dangerous scenario you identified, but otherwise it sticks to the
> safe limit, which generates less IO.

Thanks for taking a look!

Right, the v2 patch will effectively ramp-down the freezemin as your
freeze_max_age gets smaller, while the v1 patch will set the effective
freezemin to zero as soon as your multixact age passes the threshold.
I think what is unclear to me is whether this ramp-down behavior is
the intended functionality or we should be doing something similar to
what we do for regular transaction IDs (i.e. force freezemin to zero
right after it hits the "oldest xmin is far in the past" threshold).
The comment above MultiXactMemberFreezeThreshold() explains things
pretty well, but AFAICT it is more geared towards influencing
autovacuum scheduling.  I agree that v2 is safer from the standpoint
that it changes as little as possible, though.

Nathan



PostgreSQL 12 Beta 4

2019-09-05 Thread Jonathan S. Katz
Hi,

PostgreSQL 12 Beta 4 will be released on 2019-09-12. Please make sure
that fixes for bugs and other open items[1] are committed by the end of
the weekend.

Thanks for all of your efforts in getting PostgreSQL 12 ready for
general availability!

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items



signature.asc
Description: OpenPGP digital signature


Re: Index Skip Scan

2019-09-05 Thread Dmitry Dolgov
> On Mon, Sep 2, 2019 at 3:28 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Wed, Aug 28, 2019 at 9:32 PM Floris Van Nee  
> > wrote:
> >
> > I'm afraid I did manage to find another incorrect query result though
>
> Yes, it's an example of what I was mentioning before, that the current 
> modified
> implementation of `_bt_readpage`  wouldn't work well in case of going between
> pages. So far it seems that the only problem we can have is when previous and
> next items located on a different pages. I've checked how this issue can be
> avoided, I hope I will post a new version relatively soon.

Here is the version in which stepping between the pages works better. It seems
sufficient to fix the case you've mentioned before, but for that we need to
propagate keepPrev logic through `_bt_steppage` & `_bt_readnextpage`, and I
can't say I like this solution. I have an idea that maybe it would be simpler
to teach the code after index_skip to not do `_bt_next` right after one skip
happened before. It should immediately eliminate several hacks from index skip
itself, so I'll try to pursue this idea.

> On Wed, Sep 4, 2019 at 10:45 PM Alvaro Herrera  
> wrote:

Thank you for checking it out!

> Surely it isn't right to add members prefixed with "ioss_" to
> struct IndexScanState.

Yeah, sorry. I've incorporated IndexScan support originally only to show that
it's possible (with some limitations), but after that forgot to clean up. Now
those fields are renamed.

> I'm surprised about this "FirstTupleEmitted" business.  Wouldn't it make
> more sense to implement index_skip() to return the first tuple if the
> scan is just starting?  (I know little about executor, apologies if this
> is a stupid question.)

I'm not entirely sure which part exactly you mean. Right now the first tuple is
returned by `_bt_first`; how would it help if index_skip returned it?

> It would be good to get more knowledgeable people to review this patch.
> It's clearly something we want, yet it's been there for a very long
> time.

Sure, that would be nice.


v25-0001-Index-skip-scan.patch
Description: Binary data


Re: fix for BUG #3720: wrong results at using ltree

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Jul-09, Oleg Bartunov wrote:

> On Mon, Jul 8, 2019 at 7:22 AM Thomas Munro  wrote:
> >
> > On Sun, Apr 7, 2019 at 3:46 AM Tom Lane  wrote:

> > > In short, I'm wondering if we should treat this as a documentation
> > > bug not a code bug.  But to do that, we'd need a more accurate
> > > description of what the code is supposed to do, because the statement
> > > quoted above is certainly not a match to the actual behavior.

> > Teodor, Oleg, would you like to offer an opinion here?

> We are currently very busy and will look at the problem (and dig into
> our memory) later.

Hi Oleg, Teodor.  Did you find time to refresh your memory on these things?
It would be good to have these bugfixes sorted out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] Invalid permission check in pg_stats for functional indexes

2019-09-05 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm.  This seems to create a large performance drop.

Yeah.  It's also flat out wrong for indexes that depend on whole-row
variables.  For that case, we really need to insist that the user
have select privilege on all the table columns, but this won't
accomplish that.  It ignores pg_depend entries with refobjsubid = 0,
and there's no good way to expand those even if it didn't.

> You forgot to add a condition `pg_depend.classid =
> 'pg_catalog.pg_class'::pg_catalog.regclass` in your subquery (fixing
> that probably improves the plan a lot); but more generally I'm not sure
> that querying pg_depend is an acceptable way to go about this.

pg_depend isn't ideal for this, I agree.  It serves other masters.

> I have
> to admit I don't see any other way to get a list of columns involved in
> an expression, though.  Maybe we need to introduce a function that
> returns the set of columns involved in an index (which should include
> the column in a WHERE clause if any, I suppose.)

I agree that some C function that inspects the index definition is
probably needed here.  Not sure exactly what it should look like.

We might be well advised to bury the whole business in a function
like "has_index_column_privilege(index_oid, col, priv_type)" rather
than implementing that partly in SQL and partly in C.  The performance
would be better, and we'd have more flexibility to fix issues without
forcing new initdb's.

On the other hand, a SQL function that just parses the index definition
and returns relevant column number(s) might be useful for other
purposes, so maybe we should write that alongside this.

> What about relkind='m'?

As coded, this certainly breaks pg_stat for those, and for foreign tables
as well.  Likely better to write something like
"case when relkind = 'i' then do-something-for-indexes else old-code end".

Actually ... maybe we don't need to change the view definition at all,
but instead just make has_column_privilege() do something different
for indexes than it does for other relation types.  It's dubious that
applying that function to an index yields anything meaningful today,
so we could redefine what it returns without (probably) breaking
anything.  That would at least give us an option to back-patch, too,
though the end result might be complex enough that we don't care to
risk it.

I wonder which of the other has_xxx_privilege tests are likewise
in need of rethink for indexes.

> I'm not sure about a writing a test for this.  Do we have any tests for
> privileges here?

I don't think we have any meaningful tests for the info-schema views
as such.  However, if we redefine the problem as "has_column_privilege
on an index does the wrong thing", then privileges.sql is a natural
home for testing that because it already has test cases for that
function.

regards, tom lane




Re: Misleading comment in tuplesort_set_bound

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Aug-26, Tom Lane wrote:

> James Coleman  writes:

> I think the comment is fine as-is.  Perhaps the code would be clearer
> though, if we merged those two asserts into one?
> 
>   /* Assert we're called before loading any tuples */
>   Assert(state->status == TSS_INITIAL &&
>  state->memtupcount == 0);

Makes sense to me.  James, do you want to submit a new patch?

> I'm not totally sure about the usefulness/relevance of the two
> assertions following these, but they could likely do with their
> own comment(s), because this one surely isn't covering them.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Misleading comment in tuplesort_set_bound

2019-09-05 Thread James Coleman
Yes, planning on it, just a bit behind right now so will likely be a
few more days at least.

On Thu, Sep 5, 2019 at 4:57 PM Alvaro Herrera from 2ndQuadrant
 wrote:
>
> On 2019-Aug-26, Tom Lane wrote:
>
> > James Coleman  writes:
>
> > I think the comment is fine as-is.  Perhaps the code would be clearer
> > though, if we merged those two asserts into one?
> >
> >   /* Assert we're called before loading any tuples */
> >   Assert(state->status == TSS_INITIAL &&
> >  state->memtupcount == 0);
>
> Makes sense to me.  James, do you want to submit a new patch?
>
> > I'm not totally sure about the usefulness/relevance of the two
> > assertions following these, but they could likely do with their
> > own comment(s), because this one surely isn't covering them.
>
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BUG #15977: Inconsistent behavior in chained transactions

2019-09-05 Thread Andres Freund
Hi,


On 2019-09-05 14:16:11 +0200, Peter Eisentraut wrote:
> On 2019-09-04 16:49, fn ln wrote:
> > I made another patch for suggested behavior (COMMIT/ROLLBACK AND CHAIN
> > now gives us an error when used in an implicit block).
> 
> I'm content with this patch.

Would need tests.


> Better disable questionable cases now and maybe re-enable them later
> if someone wants to make a case for it.

I do think the fact that COMMIT in multi-statement implicit transaction
has some usecase, is an argument for just implementing it properly...


Greetings,

Andres Freund




Re: AtEOXact_Snapshot timing

2019-09-05 Thread Andres Freund
Hi,

On 2019-09-05 14:50:50 -0400, Robert Haas wrote:
> The best way that I've been able to come up with to enforce this rule
> after a little bit of thought is to add Assert(IsTransactionState())
> to a bunch of functions in snapmgr.c, most notably
> GetTransactionSnapshot and GetCatalogSnapshot.

Wonder if there's a risk that callers might still have a snapshot
around, in independent memory?  It could make sense to add such an
assert to a visibility routine or two, maybe?

I suspect that we should add a non-assert check in a central place
like GetSnapshotData(). It's not hard to imagine extensions just using
that directly, and that we'd never notice that with assert only
testing. It's also hard to imagine a single if
(unlikely(IsTransactionState())) to be expensive enough to matter
compared to GetSnapshotData()'s own cost.

I wonder, not just because of the previous paragraph, whether it could
be worthwhile to expose enough xact.c state to allow
IsTransactionState() to be done without a function call. ISTM a few
Assert(IsTransactionState()) type checks really are important enough to
be done in production builds too. Some of the relevant scenarios aren't
even close to be covered fully, and you'll get bad results if there's
such a problem.
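
Purely as a sketch of that idea (TransState and TRANS_INPROGRESS currently
live privately in xact.c, so this assumes moving the enum, or at least the
relevant value, into xact.h; the variable name is made up):

    extern PGDLLIMPORT TransState CurrentTransactionStateValue;  /* hypothetical */

    #define IsTransactionStateInline() \
        (CurrentTransactionStateValue == TRANS_INPROGRESS)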

> @@ -2732,6 +2732,18 @@ AbortTransaction(void)
>   pgstat_report_xact_timestamp(0);
>   }
>  
> + /*
> +  * Any snapshots taken by this transaction were unsafe to use even at 
> the
> +  * time when we entered AbortTransaction(), since we might have, for
> +  * example, inserted a heap tuple and failed while inserting index 
> tuples.
> +  * They were even more unsafe after ProcArrayEndTransaction(), since 
> after
> +  * that point tuples we inserted could be pruned by other backends.
> +  * However, we postpone the cleanup until this point in the sequence
> +  * because it has to be done after ResourceOwnerRelease() has finished
> +  * unregistering snapshots.
> +  */
> + AtEOXact_Snapshot(false, true);
> +
>   /*
>* State remains TRANS_ABORT until CleanupTransaction().
>*/

Hm. This means that
if (is_parallel_worker)
CallXactCallbacks(XACT_EVENT_PARALLEL_ABORT);
else
CallXactCallbacks(XACT_EVENT_ABORT);
which, together with a few of the other functions, could plausibly try
to use snapshot related logic, may end up continuing to use an existing
snapshot without us detecting the problem? I think?  Especially with the
asserts present ISTM that we really should kill the existing snapshots
directly adjacent to ProcArrayEndTransaction(). As you say, after that
the snapshots aren't correct anymore.  And with the right assertions we
should be safe againsts accidental reintroduction of catalog access in
the following code?


Greetings,

Andres Freund




Re: ERROR: multixact X from before cutoff Y found to be still running

2019-09-05 Thread Thomas Munro
On Fri, Sep 6, 2019 at 6:32 AM Jeremy Schneider  wrote:
> It really appears that it was the autovacuum process itself that was 
> providing the oldest running multixact which caused errors on yesterday's 
> attempts to vacuum other tables - even though I thought vacuum processes were 
> ignored by that code.  I'll have to take another look at some point.

Ah, that seems plausible.  If the backend ever called
GetMultiXactIdMembers() and thence MultiXactIdSetOldestVisible() at a
time when there were live multixacts, it would set its own
OldestVisibleMXactID[] slot, and then GetOldestMultiXactId() would
return that value for the rest of the transaction (unless there was an
even older one to return, but in the case you're describing there
wasn't).  GetOldestMultiXactId() doesn't have a way to ignore vacuum
backends, like GetOldestXmin() does.  That doesn't seem to be a
problem in itself.

(I am not sure why GetOldestMultiXactId() needs to consider
OldestVisibleMXactId[] at all for this purpose, and not just
OldestMemberXactId[], but I suppose it has to do with simultaneously
key-share-locked and updated tuples or something, it's too early and I
haven't had enough coffee.)

-- 
Thomas Munro
https://enterprisedb.com




Re: Fix XML handling with DOCTYPE

2019-09-05 Thread Chapman Flack
Hi Alvaro,

On 08/03/19 12:15, Alvaro Herrera wrote:

>> I don't know if it's too late to get in the upcoming minor releases,
>> but maybe it can, if it looks ok, or the next ones, if that's too rushed.
> 
> Hmm, I'm travelling back home from a conference the weekend, so yeah I
> think it would be rushed for me to handle for the upcoming set.  But I
> can look at it before the *next* set.

Are these on your radar to maybe backpatch in this round of activity?

The latest patches I did for 11 and 10 are in
https://www.postgresql.org/message-id/5D45A44F.8010803%40anastigmatix.net

Cheers,
-Chap




Re: libpq debug log

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
Hello,

Nice patch.  I think there's pretty near consensus that this is
something we want, and that the output format of one trace msg per libpq
msg is roughly okay.  (I'm not sure there's 100% agreement on this last
point, but it seems close enough.)

> I changed like this:
> 
> 2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT 
> pg_catalog.set_config('search_path', '', false)"

The "59" there seems quite odd, though.

* What is this in pg_conn?  I don't understand why this array is of size
  MAXPGPATH.
+   PGFrontendLogMsgEntry frontend_entry[MAXPGPATH];
  This seems to make pg_conn much larger than we'd like.

Minor review points:

* Do we need COMMAND_B_MIN etc to reduce the number of zeroes at the
beginning of the tables?

* The place where you define TRACELOG_TIME_SIZE seems like it'll be
  unavoidably out of date if somebody changes the format string.  I'd put
  that #define next to where the fmt appears.

* "output a a null-terminated" has duplicated "a"

* We don't add braces around single-statement blocks (pqPutMsgEnd)

* Please pgindent.


Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause PERCENT option

2019-09-05 Thread Tom Lane
Surafel Temesgen  writes:
> [ percent-incremental-v8.patch ]

I took a quick look through this.

* Why is this adding new functionality in tuplestore.c?  Especially
functionality to get out a different tuple representation than what was
put in?  I can't see a reason why nodeLimit should need that, and for
sure I don't see a reason why it would need it for this feature if it
didn't before.

* The business with query.limitCount having different data types
depending on LimitOption seems entirely bletcherous and bug-inducing.
(There are live bugs of that sort in what you have now: notably, that
kluge in adjust_limit_rows_costs will crash and burn if float8 is
pass-by-reference.  Breaking the Datum abstraction is bad practice.)
I wonder if we could get away with making limitCount be float8 all the
time.  This would mean that you'd lose exactness for counts above 2^53
(or so, depending on whether float8 is IEEE or not), but I doubt that
anybody would ever notice.

* Names like "PERCENTAGE" seem mighty generic to be exposing as globally
known symbols; also you're disregarding a fairly widespread PG
convention that members of an enum should have names derived from the
enum type's name.  So I'd be inclined to make the values of enum
LimitOption be LIMIT_OPTION_COUNT and LIMIT_OPTION_PERCENT.  (I don't
especially like EXACT_NUMBER, first because it's going to be quite long
with the prefix, and second because it suggests that maybe there's an
INEXACT_NUMBER alternative.)

* The "void *limitOption" business in gram.y also seems pretty ugly;
it'd be better for that to be enum LimitOption.  I gather you want
the null to indicate "no option given", but likely it'd be better
to invent a LIMIT_OPTION_DEFAULT enum alternative for that purpose.

* An alternative to the three points above is just to separate
Query.limitCount (an int8) from Query.limitPercent (a float8), with the
understanding that at most one of the two can be non-null, and use
similar representations at other steps of the processing chain.  Then
you don't need enum limitOption at all.  That might not scale especially
nicely to additional variants, but trying to overload limitCount even
further isn't going to be nice either.

* The proposed change in grouping_planner() seems like a complete hack.
Why would you not change the way that limit_tuples was computed earlier,
instead?  That is, it seems like the part of the planner to teach about
this feature is preprocess_limit (and limit_needed), not someplace else.
It is surely not sane that only this one planner decision would react to
a percentage-based limit.

* I didn't really study the changes in nodeLimit.c, but doing
"tuplestore_rescan" in ExecReScanLimit is surely just wrong.  You
probably want to delete and recreate the tuplestore, instead, since
whatever data you already collected is of no further use.  Maybe, in
the case where no rescan of the child node is needed, you could re-use
the data already collected; but that would require a bunch of additional
logic.  I'm inclined to think that v1 of the patch shouldn't concern
itself with that sort of optimization.
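
A minimal sketch of the delete-and-recreate approach, assuming the tuplestore
lives in a field named something like node->tupleStore (the field name is a
guess, not necessarily the patch's):

    /* in ExecReScanLimit(): throw away whatever was collected previously */
    if (node->tupleStore != NULL)
    {
        tuplestore_end(node->tupleStore);
        node->tupleStore = tuplestore_begin_heap(false, false, work_mem);
    }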

* I don't see any delta in ruleutils.c, but surely that needs to know
about this feature so that it can correctly print views using it.

regards, tom lane




Re: FETCH FIRST clause WITH TIES option

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation.  Also, rename EXACT_NUMBER, per that thread.

As far as I can tell, this concerns feature F867.  I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.

Earlier in the thread, Tomas Vondra said:

> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so
> directly but use ExecGetResultType). But when it creates the extra slot,
> it uses ops extracted from the outer plan. That's strange, I guess ...
>
> And then it extracts the descriptor from the outer plan and uses it when
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
> the last_slot, which is using a descriptor from the limitstate.
>
> IMHO all of this should use descriptor/ops from the outer plan, no? It
> probably does not change anything because limit does not project, but it
> seems confusing.

and you replied:

> agree

... yet this doesn't appear to have resulted in any change in the code,
or I just missed it.  Are you going to update the patch per that?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [bug fix] Produce a crash dump before main() on Windows

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Jul-01, Tsunakawa, Takayuki wrote:

> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
> > Tsunakawa/Haribabu - By reading this thread briefly, it seems we need
> > some more inputs from other developers on whether to fix this or not,
> > so ideally the status of this patch should be 'Needs Review'.  Why it
> > is in 'Waiting on Author' state?  If something is required from
> > Author, then do we expect to see the updated patch in the next few
> > days?
> 
> Thank you for paying attention to this.  I think the patch is good,
> but someone else may have a different solution.  So I marked it as
> needs review.

Tsunakawa-san, there's been some discussion downthread.  Could you
please submit an updated version of the patch?  I would like to get it
pushed before beta4 next week, if possible.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Problem during Windows service start

2019-09-05 Thread Alvaro Herrera from 2ndQuadrant
On 2019-Jul-24, Kyotaro Horiguchi wrote:

> > Please find the proposed patch for review. I will attach it to
> > commitfest as well
> 
> Pacemaker suffers the same thing. We suggest our customers that "start
> server alone to perform recovery then start pacemaker if it is
> expected to take a long time for recovery so that reaches time out".
> 
> I don't think it is a good thing to let the status be SERVICE_RUNNING although
> it actually is not (yet). I think the right direction here is that, if
> pg_ctl returns by timeout, pgwin32_ServiceMain kills the starting
> server then report something like "timedout and server was stopped,
> please make sure the server not to take a long time to perform
> recovery.".

I'm not sure that's a great reaction; it makes total recovery time
even longer.  How would the user ensure that recovery takes a shorter
time?  We'd be forcing them to start the service over and over, until
recovery completes.

Can't we have pg_ctl just continue to wait indefinitely?  So we'd set
SERVICE_START_PENDING when wait_for_postmaster is out of patience, then
loop again -- until recovery completes.  Exiting pg_ctl on timeout seems
reasonable for interactive use, but maybe for service use it's not
reasonable.
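
Roughly something like the following inside pgwin32_ServiceMain(); the
status/hStatus/postmasterPID names follow pg_ctl.c only loosely,
wait_for_postmaster() and its result codes are assumed to behave as in the
current pg_ctl.c, and failure handling is omitted:

    status.dwCurrentState = SERVICE_START_PENDING;
    status.dwWaitHint = 60000;      /* tell the SCM slow progress is expected */
    SetServiceStatus(hStatus, &status);

    for (;;)
    {
        if (wait_for_postmaster(postmasterPID, true) == POSTMASTER_READY)
            break;                  /* recovery finished */

        /* still starting up: bump the checkpoint so the SCM keeps waiting */
        status.dwCheckPoint++;
        SetServiceStatus(hStatus, &status);
    }

    status.dwCurrentState = SERVICE_RUNNING;
    status.dwWaitHint = 0;
    SetServiceStatus(hStatus, &status);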

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Quan Zongliang

On 2019/9/5 17:33, Pavel Stehule wrote:



čt 5. 9. 2019 v 10:57 odesílatel Quan Zongliang napsal:


On 2019/9/5 16:31, Pavel Stehule wrote:
 >
 > čt 5. 9. 2019 v 10:25 odesílatel Quan Zongliang
 > <zongliang.q...@postgresdata.com> napsal:
 >
 >     On 2019/9/5 15:09, Pavel Stehule wrote:
 >      >
 >      > čt 5. 9. 2019 v 8:39 odesílatel Quan Zongliang
 >      > <zongliang.q...@postgresdata.com> napsal:
 >      >
 >      >     Dear hackers,
 >      >
 >      >     I found that such a statement would get 0 in PL/pgSQL.
 >      >
 >      >     PREPARE smt_del(int) AS DELETE FROM t1;
 >      >     EXECUTE 'EXECUTE smt_del(100)';
 >      >     GET DIAGNOSTICS j = ROW_COUNT;
 >      >
 >      >     In fact, this is a problem with SPI, it does not support
 >     getting result
 >      >     of the EXECUTE command. I made a little enhancement.
Support
 >     for the
 >      >     number of rows processed when executing
INSERT/UPDATE/DELETE
 >     statements
 >      >     dynamically.
 >      >
 >      >
 >      > Is there some use case for support this feature?
 >      >
 >     A user deletes the data in PL/pgSQL using the above method,
hoping
 >     to do
 >     more processing according to the number of rows affected, and
found
 >     that
 >     each time will get 0.
 >
 >     Sample code:
 >     PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
 >     EXECUTE 'EXECUTE smt_del(100)';
 >     GET DIAGNOSTICS j = ROW_COUNT;
 >
 >
 > This has not sense in plpgsql. Why you use PREPARE statement
explicitly?
 >
Yes, I told him to do it in other ways, and the problem has been solved.

Under psql, we can get this result

flying=# EXECUTE smt_del(100);
DELETE 1

So I think this may be the negligence of SPI, it should be better to
deal with it.


Personally, I would not to support features that allows bad code.

My code actually follows the existing handling of the CREATE TABLE AS SELECT
and COPY statements. In spi.c, they look like this:


if (IsA(stmt->utilityStmt, CreateTableAsStmt))  // original code
...
else if (IsA(stmt->utilityStmt, CopyStmt))  // original code
...
else if (IsA(stmt->utilityStmt, ExecuteStmt))  // my code

My patch was not developed for this PL/pgSQL approach; I only noticed the
problem because of it.




Pavel


 >
 >     IF j=1 THEN
 >         do something
 >     ELSIF j=0 THEN
 >         do something
 >
 >     Here j is always equal to 0.
 >
 >
 >
 >     Regards
 >
 >      > Regards
 >      >
 >      > Pavel
 >      >
 >      >
 >      >     Regards,
 >      >     Quan Zongliang
 >      >
 >








Re: progress report for ANALYZE

2019-09-05 Thread Tatsuro Yamada

Hi Alvaro,


There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.


 
Thanks for fixing that. :)

I'll test it later.


I think we have to address the following comment from Robert. Right?
Do you have any ideas?



I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.



Hmm... I understand, but it's a little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(see below). So it would be possible to set all four fields together if the
inh flag were given as a parameter to those functions, I suppose. But I'm
not sure whether it's okay to add the parameter to both functions or not.
Do you have any ideas?


# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);
else
numrows = (*acquirefunc) (onerel, elevel,
  rows, targrows,
  &totalrows, &totaldeadrows);


# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
   /* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
&trows, &tdrows)
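
Just to make the idea concrete, if the inh flag were passed down, the
scan-table phase could then be reported with all four values in one step,
along these lines. This is only a rough sketch: the PROGRESS_ANALYZE_*
index names are assumed, not taken from the patch, and only
pgstat_progress_update_multi_param() is existing infrastructure.

{
	const int	index[] = {
		PROGRESS_ANALYZE_PHASE,				/* assumed name */
		PROGRESS_ANALYZE_INCLUDE_CHILDREN,	/* assumed name */
		PROGRESS_ANALYZE_SCAN_RELID,		/* assumed name */
		PROGRESS_ANALYZE_TOTAL_BLOCKS		/* assumed name */
	};
	const int64	val[] = {
		PROGRESS_ANALYZE_PHASE_SCAN_TABLE,	/* assumed name */
		inh,
		RelationGetRelid(onerel),
		RelationGetNumberOfBlocks(onerel)
	};

	/* update all four counters atomically, so readers never see a
	 * half-filled progress row */
	pgstat_progress_update_multi_param(4, index, val);
}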



Thanks,
Tatsuro Yamada










Re: enhance SPI to support EXECUTE commands

2019-09-05 Thread Pavel Stehule
On Fri, Sep 6, 2019 at 3:36 AM Quan Zongliang <
zongliang.q...@postgresdata.com> wrote:

> On 2019/9/5 17:33, Pavel Stehule wrote:
> >
> > On Thu, Sep 5, 2019 at 10:57 AM Quan Zongliang  wrote:
> >
> > On 2019/9/5 16:31, Pavel Stehule wrote:
> >  >
> >  > On Thu, Sep 5, 2019 at 10:25 AM Quan Zongliang  wrote:
> >  >
> >  > On 2019/9/5 15:09, Pavel Stehule wrote:
> >  >  >
> >  >  > On Thu, Sep 5, 2019 at 8:39 AM Quan Zongliang  wrote:
> >  >  >
> >  >  > Dear hackers,
> >  >  >
> >  >  > I found that such a statement would get 0 in PL/pgSQL.
> >  >  >
> >  >  > PREPARE smt_del(int) AS DELETE FROM t1;
> >  >  > EXECUTE 'EXECUTE smt_del(100)';
> >  >  > GET DIAGNOSTICS j = ROW_COUNT;
> >  >  >
> >  >  > In fact, this is a problem with SPI, it does not support getting
> >  >  > result of the EXECUTE command. I made a little enhancement.
> >  >  > Support for the number of rows processed when executing
> >  >  > INSERT/UPDATE/DELETE statements dynamically.
> >  >  >
> >  >  > Is there some use case for support this feature?
> >  >  >
> >  > A user deletes the data in PL/pgSQL using the above method, hoping
> >  > to do more processing according to the number of rows affected, and
> >  > found that each time will get 0.
> >  >
> >  > Sample code:
> >  > PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
> >  > EXECUTE 'EXECUTE smt_del(100)';
> >  > GET DIAGNOSTICS j = ROW_COUNT;
> >  >
> >  > This has not sense in plpgsql. Why you use PREPARE statement
> >  > explicitly?
> >  >
> > Yes, I told him to do it in other ways, and the problem has been solved.
> >
> > Under psql, we can get this result
> >
> > flying=# EXECUTE smt_del(100);
> > DELETE 1
> >
> > So I think this may be the negligence of SPI, it should be better to
> > deal with it.
> >
> >
> > Personally, I would not to support features that allows bad code.
> >
> My code is actually a way to continue the CREATE AS SELECT and COPY
> statements. In spi.c, they look like this:
>
> if (IsA(stmt->utilityStmt, CreateTableAsStmt))  // original code
> ...
> else if (IsA(stmt->utilityStmt, CopyStmt))  // original code
> ...
> else if (IsA(stmt->utilityStmt, ExecuteStmt))  // my code
>
> My patch was not developed for this PL/pgSQL approach. I just because it
> found this problem.
>

OK, I can understand this - but your example usage is not good.

Pavel


>
> > Pavel
> >
> >
> >  >
> >  > IF j=1 THEN
> >  > do something
> >  > ELSIF j=0 THEN
> >  > do something
> >  >
> >  > Here j is always equal to 0.
> >  >
> >  >
> >  >
> >  > Regards
> >  >
> >  >  > Regards
> >  >  >
> >  >  > Pavel
> >  >  >
> >  >  >
> >  >  > Regards,
> >  >  > Quan Zongliang
> >  >  >
> >  >
> >
>
>
>


Re: [bug fix] Produce a crash dump before main() on Windows

2019-09-05 Thread Tom Lane
Alvaro Herrera from 2ndQuadrant  writes:
> Tsunakawa-san, there's been some discussion downthread.  Could you
> please submit an updated version of the patch?  I would like to get it
> pushed before beta4 next week, if possible.

[ confused... ]  I haven't been paying attention to this thread,
but is it really something we'd push into v12 at this point?

regards, tom lane




Re: POC: Cleaning up orphaned files using undo logs

2019-09-05 Thread vignesh C
Hi Thomas,

While testing one of the recovery scenarios I found one issue:
FailedAssertion("!(logno == context->recovery_logno)

The details are as follows:
The context's try_location was not being set in
UndoLogAllocateInRecovery; in PrepareUndoInsert the try_location was
only advanced by the undo record size.
In the subsequent UndoLogAllocateInRecovery call, because try_location
was never initialized but only advanced by the size, the logno will
never match when recovery_logno is non-zero, and the assertion fails.

Fixed by setting the try_location in UndoLogAllocateInRecovery,
similar to try_location setting in UndoLogAllocate.
Patch for the same is attached.

Please have a look and include the changes in one of the upcoming versions.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


On Mon, Sep 2, 2019 at 9:53 AM Thomas Munro  wrote:
>
> On Fri, Aug 30, 2019 at 8:27 PM Kuntal Ghosh  
> wrote:
> > I'm getting the following assert failure while performing the recovery
> > with the same.
> > "TRAP: FailedAssertion("slot->meta.status == UNDO_LOG_STATUS_FULL",
> > File: "undolog.c", Line: 997)"
> >
> > I found that we don't emit an WAL record when we update the
> > slot->meta.status as UNDO_LOG_STATUS_FULL. If we don't that, after
> > crash recovery, some new transaction may use that undo log which is
> > wrong, IMHO. Am I missing something?
>
> Thanks, right, that status logging is wrong, will fix in next version.
>
> --
> Thomas Munro
> https://enterprisedb.com
>
>
From e69a9eea71562434cc0512871a9b6d7424ce93ec Mon Sep 17 00:00:00 2001
From: Vigneshwaran c 
Date: Fri, 30 Aug 2019 11:45:03 +0530
Subject: [PATCH] FailedAssertion("!(logno == context->recovery_logno) fix

try_location was not updated in UndoLogAllocateInRecovery, in
PrepareUndoInsert the try_location was updated with the undo record size.
In the subsequent UndoLogAllocateInRecovery as the value for try_location
was not initialized but only updated with the size the logno will always
not match if the recovery_logno is non zero and the assert fails. Fixed
by setting the try_location in UndoLogAllocateInRecovery, similar to
try_location setting in UndoLogAllocate.

Patch by Vigneshwaran C, reviewed by Dilip Kumar.
---
 src/backend/access/undo/undolog.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/backend/access/undo/undolog.c b/src/backend/access/undo/undolog.c
index 073221c..9acd570 100644
--- a/src/backend/access/undo/undolog.c
+++ b/src/backend/access/undo/undolog.c
@@ -960,6 +960,14 @@ UndoLogAllocateInRecovery(UndoLogAllocContext *context,
 			*need_xact_header =
 context->try_location == InvalidUndoRecPtr &&
 slot->meta.unlogged.insert == slot->meta.unlogged.this_xact_start;
+
+			/*
+			 * If no try_location was passed in, or if we switched logs, then we'll
+			 * return the current insertion point.
+			 */
+			if (context->try_location == InvalidUndoRecPtr)
+context->try_location = MakeUndoRecPtr(slot->logno, slot->meta.unlogged.insert);
+
 			*last_xact_start = slot->meta.unlogged.last_xact_start;
 			context->recovery_logno = slot->logno;
 
-- 
1.8.3.1



Re: REINDEX filtering in the backend

2019-09-05 Thread Michael Paquier
On Wed, Sep 04, 2019 at 01:54:53PM +0200, Peter Eisentraut wrote:
> Right.  We should aim to get per-object collation version tracking done.
>  And then we might want to have a REINDEX variant that processes exactly
> those indexes with an out-of-date version number -- and then updates
> that version number once the reindexing is done.  I think that project
> is achievable for PG13.
> 
> I think we can keep this present patch in our back pocket for, say, the
> last commit fest if we don't make sufficient progress on those other
> things.  Right now, it's hard to review because the target is moving,
> and it's unclear what guidance to give users.

Okay, agreed.  I have marked the patch as returned with feedback
then.
--
Michael


signature.asc
Description: PGP signature


Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-05 Thread Michael Paquier
On Thu, Sep 05, 2019 at 09:59:51AM -0400, Alvaro Herrera wrote:
> Patch looks good to me, please push.

Thanks, applied without the calls to set_pglocale_pgservice().

> Generally speaking I find the 'progname' handling a bit silly (since we
> have it both as a variable in each program and also in logging.c
> separately), but that's not the fault of this patch, and this patch
> doesn't make it worse.  That said, I think some other messages in
> vacuumlo can be made pg_log_somelevel(), based on occurrences of
> 'progname'.

That would mean moving some of the messages from stdout to stderr.  We
could do that.
--
Michael


signature.asc
Description: PGP signature


Re: Avoiding hash join batch explosions with extreme skew and weird stats

2019-09-05 Thread Thomas Munro
On Wed, Jul 31, 2019 at 6:47 AM Melanie Plageman
 wrote:
> So, I've rewritten the patch to use a BufFile for the outer table
> batch file tuples' match statuses and write bytes to and from the file
> which start as 0 and, upon encountering a match for a tuple, I set its
> bit in the file to 1 (also rebased with current master).
>
> It, of course, only works for parallel-oblivious hashjoin -- it relies
> on deterministic order of tuples encountered in the outer side batch
> file to set the right match bit and uses a counter to decide which bit
> to set.
>
> I did the "needlessly dumb implementation" Robert mentioned, though,
> I thought about it and couldn't come up with a much smarter way to
> write match bits to a file. I think there might be an optimization
> opportunity in not writing the current_byte to the file each time that
> the outer tuple matches and only doing this once we have advanced to a
> tuple number that wouldn't have its match bit in the current_byte. I
> didn't do that to keep it simple, and, I suspect there might be a bit
> of gymnastics needed to make sure that that byte is actually written
> to the file in case we exit from some other state before we encounter
> the tuple represented in the last bit in that byte.

Thanks for working on this! I plan to poke at it a bit in the next few weeks.

> I plan to work on a separate implementation for parallel hashjoin
> next--to understand what is required. I believe the logic to decide
> when to fall back should be fairly easy to slot in at the end once
> we've decided what that logic is.

Seems like a good time for me to try to summarise what I think the
main problems are here:

1.  The match-bit storage problem already discussed.  The tuples that
each process receives while reading from SharedTupleStore are
non-deterministic (like other parallel scans).  To use a bitmap-based
approach, I guess we'd need to invent some way to give the tuples a
stable identifier within some kind of densely packed number space that
we could use to address the bitmap, or take the IO hit and write all
the tuples back.  That might involve changing the way SharedTupleStore
holds data.
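
For concreteness, the bitmap addressing itself is trivial once a stable
tuple number exists -- something like the sketch below. The function name
and error handling are illustrative only, not taken from the posted patch.

/* Illustrative sketch: set the match bit for the tuple numbered 'tupleno'
 * in a BufFile whose bytes were initialized to zero. */
static void
set_match_bit(BufFile *statusfile, int64 tupleno)
{
	off_t	byteoff = tupleno / 8;
	int		bitno = tupleno % 8;
	char	byte;

	if (BufFileSeek(statusfile, 0, byteoff, SEEK_SET) != 0 ||
		BufFileRead(statusfile, &byte, 1) != 1)
		elog(ERROR, "could not read match-status file");
	byte |= (char) (1 << bitno);
	if (BufFileSeek(statusfile, 0, byteoff, SEEK_SET) != 0 ||
		BufFileWrite(statusfile, &byte, 1) != 1)
		elog(ERROR, "could not write match-status file");
}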

2.  Tricky problems relating to barriers and flow control.  First, let
me explain why PHJ doesn't support full/right outer joins yet.  At
first I thought it was going to be easy, because, although the shared
memory hash table is read-only after it has been built, it seems safe
to weaken that only slightly and let the match flag be set by any
process during probing: it's OK if two processes clobber each other's
writes, as the only transition is a single bit going strictly from 0
to 1, and there will certainly be a full memory barrier before anyone
tries to read those match bits.  Then during the scan for unmatched,
you just have to somehow dole out hash table buckets or ranges of
buckets to processes on a first-come-first-served basis.  But then
I crashed into the following problem:

* You can't begin the scan for unmatched tuples until every process
has finished probing (ie until you have the final set of match bits).
* You can't wait for every process to finish probing, because any
process that has emitted a tuple might never come back if there is
another node that is also waiting for all processes (ie deadlock
against another PHJ doing the same thing), and probing is a phase that
emits tuples.

Generally, it's not safe to emit tuples while you are attached to a
Barrier, unless you're only going to detach from it, not wait at it,
because emitting tuples lets the program counter escape your control.
Generally, it's not safe to detach from a Barrier while accessing
resources whose lifetime it controls, such as a hash table, because
then it might go away underneath you.

The PHJ plans that are supported currently adhere to that programming
rule and so don't have a problem: after the Barrier reaches the
probing phase, processes never wait for each other again so they're
free to begin emitting tuples.  They just detach when they're done
probing, and the last to detach cleans up (frees the hash table etc).
If there is more than one batch, they detach from one batch and attach
to another when they're ready (each batch has its own Barrier), so we
can consider the batches to be entirely independent.

There is probably a way to make a scan-for-unmatched-inner phase work,
possibly involving another Barrier or something like that, but I ran
out of time trying to figure it out and wanted to ship a working PHJ
for the more common plan types.  I suppose PHLJ will face two variants
of this problem: (1) you need to synchronise the loops (you can't dump
the hash table in preparation for the next loop until all have
finished probing for the current loop), and yet you've already emitted
tuples, so you're not allowed to wait for other processes and they're
not allowed to wait for you, and (2) you can't start the
scan-for-unmatched-outer until all the probe loops belonging to one
batch are done.  Th

Re: pg_promote() can cause busy loop

2019-09-05 Thread Fujii Masao
On Thu, Sep 5, 2019 at 4:52 PM Michael Paquier  wrote:
>
> On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote:
> > So, barring any objection, I will commit the attached patch.
>
> LGTM.  Thanks!

Committed. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Michael Paquier
On Thu, Sep 05, 2019 at 03:17:51PM -0400, Robert Haas wrote:
> Oops.  Yeah, that's bogus (as are some of the other things you
> mention).  I think we're going to have to fix this by passing down
> some flags to these functions to tell them what kind of progress
> updates to do (or to do none).  Or else pass down a callback function
> and a context object, but that seems like it might be overkill.

One idea I had was to pass the command ID as an extra argument to the
update routine.  I am not completely sure either whether we need this
level of complication.

> Those are just weaknesses of the infrastructure.  Perhaps there is a
> better solution, but there's no intrinsic reason that we can't avoid
> them by careful coding.

Perhaps.  The current infrastructure allows adding a progress report
in code paths which are isolated from other things.  For CLUSTER, most
things are fine as long as the progress is updated in cluster_rel();
the rest is too internal.

> Well, it might be OK to do that if we're clear that this is the index
> progress-reporting view and the command is CLUSTER but it happens to
> be building an index now so we're showing it here.  But I don't see
> how it would work anyway: you can't reported cascading progress
> reports in shared memory because you've only got a fixed amount of
> space.

I don't see exactly why we could not switch to a fixed number of
slots, say 8, with one code path to start a progress which adds an
extra report on the stack, one to remove one entry from the stack, and
a new one to reset the whole thing for a backend.  This would not need
much restructuration of course.
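
To sketch the shape that would take (purely hypothetical -- none of these
functions or structures exist today, only ProgressCommandType and
PGSTAT_NUM_PROGRESS_PARAM are existing definitions):

/* Hypothetical stacked-progress API, for discussion only. */
#define PROGRESS_STACK_DEPTH	8

typedef struct ProgressStackEntry
{
	ProgressCommandType cmdtype;	/* e.g. PROGRESS_COMMAND_CLUSTER */
	Oid			relid;
	int64		param[PGSTAT_NUM_PROGRESS_PARAM];
} ProgressStackEntry;

/* push a nested report (e.g. an index build running inside CLUSTER) */
extern void pgstat_progress_push(ProgressCommandType cmdtype, Oid relid);
/* pop the innermost report when the nested operation ends */
extern void pgstat_progress_pop(void);
/* drop everything for this backend, e.g. on error cleanup */
extern void pgstat_progress_reset_backend(void);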

Finally comes the question of what we do for v12.  I am adding Peter in
CC, Alvaro being already present, as they have been involved in the
commits for CREATE INDEX and REINDEX.  It would be sad to revert this
feature, but I'd rather do that now than regret later having released
the feature as it is currently shaped.  Let's see what the others
think.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Michael Paquier
On Fri, Sep 06, 2019 at 02:44:18PM +0900, Michael Paquier wrote:
> I don't see exactly why we could not switch to a fixed number of
> slots, say 8, with one code path to start a progress which adds an
> extra report on the stack, one to remove one entry from the stack, and
> a new one to reset the whole thing for a backend.  This would not need
> much restructuration of course.

Wake up, Neo.  Your last sentence is confusing.  I meant that this
would need more design efforts, so that's not in scope for v12. 
--
Michael


signature.asc
Description: PGP signature


Re: [bug fix] Produce a crash dump before main() on Windows

2019-09-05 Thread Michael Paquier
On Thu, Sep 05, 2019 at 11:49:03PM -0400, Tom Lane wrote:
> Alvaro Herrera from 2ndQuadrant  writes:
>> Tsunakawa-san, there's been some discussion downthread.  Could you
>> please submit an updated version of the patch?  I would like to get it
>> pushed before beta4 next week, if possible.
> 
> [ confused... ]  I haven't been paying attention to this thread,
> but is it really something we'd push into v12 at this point?

The last patch submitted is here:
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8ECF73@G01JPEXMBYT05
And based on the code paths it touches I would recommend to not play
with REL_12_STABLE at this stage.
--
Michael


signature.asc
Description: PGP signature


Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2019-09-05 Thread Amit Langote
On Fri, Sep 6, 2019 at 12:53 AM Tom Lane  wrote:
>
> Amit Langote  writes:
> > On Thu, Sep 5, 2019 at 6:18 PM Dilip Kumar  wrote:
> >> Instead of falling back to the child, isn't it make more sense to
> >> check the permissions on the parent upto which we could translate (it
> >> may not be the root parent)?
>
> > Hmm, in that case, the parent up to which we might be able to
> > translate would still be a child and might have different permissions
> > than the table mentioned in the query (what's being called "root" in
> > this context).  Would it be worth further complicating this code if
> > that's the case?
>
> I think that checking intermediate levels would be an actively bad idea
> --- it would make the behavior too confusing.  We should preferably check
> the table actually named in the query, or if we can't then check the
> table we are using the stats of; nothing else.

Agreed.

> Another idea that we should consider, though, is to allow the access if
> *either* of those two tables allows it.  The main reason that that's
> attractive is that it's certain not to break any case that works today.
> But also, it would mean that in many practical cases we'd not have to
> try to map Vars back up to the original parent, thus avoiding the
> performance penalty.  (That is, check the target table as we do now,
> and only if we find it lacks permissions do we start mapping back.)

Ah, that sounds like a good idea.

Patch updated that way.
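
To spell the rule out (a rough sketch only, not the attached patch;
root_relid and translate_attno_to_root() are assumed names introduced
here for illustration):

/* Check the table whose stats we are using first, as today. */
vardata->acl_ok =
	(pg_class_aclcheck(rte->relid, GetUserId(),
					   ACL_SELECT) == ACLCHECK_OK) ||
	(pg_attribute_aclcheck(rte->relid, var->varattno, GetUserId(),
						   ACL_SELECT) == ACLCHECK_OK);

/* Only if that fails, map back to the table named in the query. */
if (!vardata->acl_ok && root_relid != rte->relid)
{
	AttrNumber	root_attno = translate_attno_to_root(var, rte, root_relid);

	vardata->acl_ok =
		(pg_class_aclcheck(root_relid, GetUserId(),
						   ACL_SELECT) == ACLCHECK_OK) ||
		(pg_attribute_aclcheck(root_relid, root_attno, GetUserId(),
							   ACL_SELECT) == ACLCHECK_OK);
}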

Thanks,
Amit


v5-0001-Use-root-parent-s-permissions-when-reading-child-.patch
Description: Binary data


Re: basebackup.c's sendFile() ignores read errors

2019-09-05 Thread Jeevan Chalke
On Thu, Sep 5, 2019 at 11:40 PM Robert Haas  wrote:

> On Fri, Aug 30, 2019 at 7:05 AM Jeevan Ladhe
>  wrote:
> >> Fixed both comments in the attached patch.
> >
> > Thanks, the patch looks good to me.
>
> Here's a version of the patch with a further change to the wording of
> the comment.  I hope this is clearer.
>

Yep.


>
> I think this needs to be back-patched all the way back to 9.4, and it
> doesn't seem to apply cleanly before v11.  Any chance you could
> prepare a version for the older branches?
>

Attached is a patch for v10 and earlier; the same v10 patch applies cleanly
on those branches.

Changes related to the page checksum verification are not present on v10 and
earlier, and thus those changes are not applicable, so I removed them.  Also,
wal_segment_size is XLogSegSize over there; adjusted that.


>
> Thanks,
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Thanks
-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ba4937b..dfb0f47 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -85,6 +85,18 @@ static char *statrelpath = NULL;
  */
 #define THROTTLING_FREQUENCY	8
 
+/*
+ * Checks whether we encountered any error in fread().  fread() doesn't give
+ * any clue what has happened, so we check with ferror().  Also, neither
+ * fread() nor ferror() set errno, so we just throw a generic error.
+ */
+#define CHECK_FREAD_ERROR(fp, filename) \
+do { \
+	if (ferror(fp)) \
+		ereport(ERROR, \
+(errmsg("could not read from file \"%s\"", filename))); \
+} while (0)
+
 /* The actual number of bytes, transfer of which may cause sleep. */
 static uint64 throttling_sample;
 
@@ -509,6 +521,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 	break;
 			}
 
+			CHECK_FREAD_ERROR(fp, pathbuf);
+
 			if (len != XLogSegSize)
 			{
 CheckXLogRemoved(segno, tli);
@@ -1245,6 +1259,8 @@ sendFile(char *readfilename, char *tarfilename, struct stat *statbuf,
 		}
 	}
 
+	CHECK_FREAD_ERROR(fp, readfilename);
+
 	/* If the file was truncated while we were sending it, pad it with zeros */
 	if (len < statbuf->st_size)
 	{


Re: amcheck verification for GiST

2019-09-05 Thread Andrey Borodin
Hi!

> On Sep 4, 2019, at 2:13, Alvaro Herrera  wrote:
> 
> On 2019-Mar-29, Andrey Borodin wrote:
> 
>> Here's updated patch with AccessShareLock.
>> I've tried to stress this with combination of random inserts, vaccuums and 
>> checks. This process neither failed, nor deadlocked.
>> The patch intentionally contains one superflous line to make gist logically 
>> broken. This triggers regression test of amcheck.
> 
> How close are we to this being a committable patch?  CF bot complains
> that it doesn't apply anymore (please rebase), but from past discussion
> it seems pretty close to ready.

Here's the rebased version. Changes in v9:
* adjust to usage of table_open
* update new extension version
* check for main fork presence in GiST check too

Thanks!

Best regards, Andrey Borodin.


0001-GiST-verification-function-for-amcheck-v9.patch
Description: Binary data


RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-09-05 Thread Smith, Peter
-Original Message-
From: Masahiko Sawada  Sent: Thursday, 15 August 2019 
7:10 PM

> BTW I've created PoC patch for cluster encryption feature. Attached patch set 
> has done some items of TODO list and some of them can be used even for finer 
> granularity encryption. Anyway, the implemented components are followings:

Hello Sawada-san,

I guess your original patch code may be getting a bit out-dated by the ongoing 
TDE discussions, but I have done some code review of it anyway. 

Hopefully a few comments below can still be of use going forward:

---

REVIEW COMMENTS

* src/backend/storage/encryption/enc_cipher.c – The functions 
EncryptionCipherValue/String should perhaps log warnings for unexpected values 
instead of silently assigning the defaults 0/”off”.

* src/backend/storage/encryption/enc_cipher.c – For function 
EncryptionCipherString, the purpose of returning ”unknown” is unclear, because 
that will map back to “off” again anyway via EncryptionCipherValue. Why not 
just return "off" (with a warning logged)?

* src/include/storage/enc_common.h – Typo in comment: "Encrypton".

* src/include/storage/encryption.h - The macro DataEncryptionEnabled may be 
better to be using enum TDE_ENCRYPTION_OFF instead of magic number 0

* src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will report 
error if USE_OPENSSL is not defined. The check seems premature because it would 
fail even if the user is not using encryption. Shouldn't the lack of openssl be 
OK when user is not using TDE at all (i.e. when encryption is "none")?

* src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest 
better to check if (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) 
using enum instead of the magic number 0.

* src/backend/storage/encryption/kmgr.c - The function 
run_cluster_passphrase_command function seems mostly a clone of an existing 
run_ssl_passphrase_command function. Is it possible to refactor to share the 
common code?

* src/backend/storage/encryption/kmgr.c - The function derive_encryption_key 
declares a char key_len. Why char? It seems int everywhere else.

* src/backend/bootstrap/bootstrap.c - Suggest better if variable declaration 
bootstrap_data_encryption_cipher = 0 uses enum TDE_ENCRYPTION_OFF instead of 
magic number 0

* src/backend/utils/misc/guc.c - It looks like the default value for GUC 
variable data_encryption_cipher is AES128. Wouldn't "off" be the more 
appropriate default value? Otherwise it seems inconsistent with the logic of 
initdb (which insists that the -e option is mandatory if you wanted any 
encryption).

* src/backend/utils/misc/guc.c - There is a missing entry in the 
config_group_names[]. The patch changed the config_group[] in guc_tables.h, so 
I think there needs to be a matching item in the config_group_names.

* src/bin/initdb/initdb.c - The function check_encryption_cipher would disallow 
an encryption value of "none". Although maybe it is not very useful to say -e 
none, it does seem inconsistent to reject it, given that "none" was a valid 
value for the GUC variable data_encryption_cipher.

* contrib/bloom/blinsert.c - In function btbuildempty the arguments for 
PageEncryptionInPlace seem in the wrong order (forknum should be 2nd).

* src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the 
arguments for PageEncryptionInPlace seem in the wrong order (forknum should be 
2nd).

* src/backend/access/spgist/spginsert.c - In function spgbuildempty the 
arguments for PageEncryptionInPlace seem in the wrong order (forknum should be 
2nd). This error looks repeated 3X.

* in multiple files - The encryption enums have equivalent strings ("off", 
"aes-128", "aes-256") but they are scattered as string literals in many places 
(e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). Suggest it would be 
better to declare those as string constants in one place only.
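
To illustrate that last point, something as small as a shared header would do
(illustrative only; the header location and constant names are made up):

/* enc_common.h (illustrative): single definition of the cipher name strings */
#define TDE_CIPHER_NAME_OFF		"off"
#define TDE_CIPHER_NAME_AES128	"aes-128"
#define TDE_CIPHER_NAME_AES256	"aes-256"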

---

Kind Regards,

Peter Smith
Fujitsu Australia