Re: New vacuum option to do only freezing

2019-02-28 Thread Masahiko Sawada
On Thu, Feb 28, 2019 at 2:46 AM Bossart, Nathan  wrote:
>
> On 2/27/19, 2:08 AM, "Masahiko Sawada"  wrote:
> >> +   if (skip_index_vacuum)
> >> +       appendStringInfo(&buf, ngettext("%.0f tuple is left as dead.\n",
> >> +                                       "%.0f tuples are left as dead.\n",
> >> +                                       nleft),
> >> +                        nleft);
> >>
> >> I think we could emit this metric for all cases, not only when
> >> DISABLE_INDEX_CLEANUP is used.
> >
> > I think that tups_vacuumed shows the total number of vacuumed tuples and
> > is already shown in the log message. The 'nleft' counts the total
> > number of recorded dead tuples but does not count tuples removed during
> > HOT pruning. Is this valuable for users in the non-DISABLE_INDEX_CLEANUP
> > case?
>
> I think it is valuable.  When DISABLE_INDEX_CLEANUP is not used or it
> is used for a relation with no indexes, it makes it clear that no
> tuples were left marked as dead.  Also, it looks like all of the other
> information here is provided regardless of the options used.  IMO it
> is good to list all of the stats so that users have the full picture
> of what VACUUM did.
>

I see your point. That seems good to me.

Attached is the updated version of the patch. I've incorporated all the
review comments I got and have changed the number of tuples reported as
'removed tuples'. With this option, the only tuples that are completely
removed are those marked as unused during HOT pruning; other dead tuples
are left behind. So we count those tuples during HOT pruning and report
that number as the removed tuples.
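For illustration, the new option would be used roughly like this (the option
name follows the attached patch; the table name is made up):

    -- Skip index cleanup: only tuples already made unused by HOT pruning are
    -- removed; other dead tuples are left for a later vacuum.
    VACUUM (DISABLE_INDEX_CLEANUP, VERBOSE) my_table;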

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v7-0001-Add-DISABLE_INDEX_CLEANUP-option-to-VACUUM-comman.patch
Description: Binary data


v7-0002-Add-diable-index-cleanup-option-to-vacuumdb.patch
Description: Binary data


Re: Prevent extension creation in temporary schemas

2019-02-28 Thread Sergei Kornilov
Hi

> I found that this strange error appears after making
> temporary tables.
>
> test=> CREATE TEMPORARY TABLE temp (id int);
> CREATE TABLE
> test=> CREATE EXTENSION file_fdw WITH SCHEMA pg_temp_3;
> ERROR: function file_fdw_handler() does not exist
>
> I would try to understand this problem for community and
> my experience.

This behavior seems not to be related to the extensions infrastructure:

postgres=#  CREATE TEMPORARY TABLE temp (id int);
CREATE TABLE
postgres=# set search_path to 'pg_temp_3';
SET
postgres=# create function foo() returns int as 'select 1' language sql;
CREATE FUNCTION
postgres=# select pg_temp_3.foo();
 foo 
-----
   1
(1 row)

postgres=# select foo();
ERROR:  function foo() does not exist
LINE 1: select foo();
   ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
postgres=# show search_path ;
 search_path 
-------------
 pg_temp_3
(1 row)

regards, Sergei



RE: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-28 Thread Tsunakawa, Takayuki
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> Robert used the phrase "attractive nuisance", which maybe sounds like a
> good thing to have to a non native speaker, but it actually isn't -- he
> was saying we should avoid a GUC at all, and I can see the reason for
> that.  I think we should have a VACUUM option and a reloption, but no
> GUC.

Uh, thanks.  I've just realized I didn't know the meaning of "nuisance," so I 
looked it up in the dictionary.  A nuisance is something like a troublemaker...

Why do you think it's better for the VACUUM command to have the option?  I 
think it's a table property whose value is determined by the application 
workload, not per VACUUM execution.  Rather, I think a GUC is more useful for 
determining the behavior of the entire database and/or application.

If we want to change the behavior for a given execution of VACUUM, we can 
ALTER TABLE SET, VACUUM, and then ALTER TABLE SET back.
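For example, a sketch of that sequence (the reloption name here is only a 
placeholder for whatever this patch ends up calling it):

    ALTER TABLE big_table SET (vacuum_truncate = off);
    VACUUM big_table;                               -- this run skips the truncation
    ALTER TABLE big_table RESET (vacuum_truncate);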


Regards
Takayuki Tsunakawa




RE: Problem with default partition pruning

2019-02-28 Thread Imai, Yoshikazu
Hosoya-san

On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> > From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> > Sent: Wednesday, February 27, 2019 11:22 AM
> >
> > Hosoya-san,
> >
> > On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> > > Hi,
> > >
> > > I found the bug of default partition pruning when executing a range
> query.
> > >
> > > -
> > > postgres=# create table test1(id int, val text) partition by range
> > > (id); postgres=# create table test1_1 partition of test1 for values
> > > from (0) to (100); postgres=# create table test1_2 partition of
> > > test1 for values from (150) to (200); postgres=# create table
> > > test1_def partition of test1 default;
> > >
> > > postgres=# explain select * from test1 where id > 0 and id < 30;
> > >QUERY PLAN
> > > 
> > >  Append  (cost=0.00..11.83 rows=59 width=11)
> > >->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> > >  Filter: ((id > 0) AND (id < 30))
> > >->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> > >  Filter: ((id > 0) AND (id < 30))
> > > (5 rows)
> > >
> > > There is no need to scan the default partition, but it's scanned.
> > > -
> > >
> > > In the current implement, whether the default partition is scanned
> > > or not is determined according to each condition of given WHERE
> > > clause at get_matching_range_bounds().  In this example,
> > > scan_default is set true according to id > 0 because id >= 200
> > > matches the default partition.  Similarly, according to id < 30,
> scan_default is set true.
> > > Then, these results are combined according to AND/OR at
> perform_pruning_combine_step().
> > > In this case, final result's scan_default is set true.
> > >
> > > The modifications I made are as follows:
> > > - get_matching_range_bounds() determines only offsets of range bounds
> > >   according to each condition
> > > - These results are combined at perform_pruning_combine_step()
> > > - Whether the default partition is scanned or not is determined at
> > >   get_matching_partitions()
> > >
> > > Attached the patch.  Any feedback is greatly appreciated.
> >
> > Thank you for reporting.  Can you please add this to March CF in Bugs
> > category so as not to lose
> track
> > of this?
> >
> > I will try to send review comments soon.
> >
> Thank you for your reply.  I added this to March CF.

I tested with a simple use case and confirmed that it works correctly, as shown below.

In case using between clause:
postgres=# create table test1(id int, val text) partition by range (id);
postgres=# create table test1_1 partition of test1 for values from (0) to (100);
postgres=# create table test1_2 partition of test1 for values from (150) to (200);
postgres=# create table test1_def partition of test1 default;

[HEAD]
postgres=# explain analyze select * from test1 where id between 0 and 50;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
   ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005 rows=0 loops=1)
         Filter: ((id >= 0) AND (id <= 50))
   ->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: ((id >= 0) AND (id <= 50))


[patched]
postgres=# explain analyze select * from test1 where id between 0 and 50;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..29.08 rows=6 width=36) (actual time=0.006..0.006 rows=0 loops=1)
   ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.004..0.005 rows=0 loops=1)
         Filter: ((id >= 0) AND (id <= 50))



I considered another use case. If the default partition contains rows with 
id = 300 and we then add another partition with constraints like id >= 300 
and id < 400, I thought we would no longer scan those rows. But I noticed we 
simply can't add such a partition.

postgres=# insert into test1 values (300);
INSERT 0 1
postgres=# create table test1_3 partition of test1 for values from (300) to (400);
ERROR:  updated partition constraint for default partition "test1_def" would be violated by some row


So I haven't come up with bad cases so far :)

--
Yoshikazu Imai 





Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-28 Thread Laurenz Albe
Tsunakawa, Takayuki wrote:
> Why do you think it's better for the VACUUM command to have the option?  I
> think it's a table property whose value is determined by the application
> workload, not per VACUUM execution.  Rather, I think a GUC is more useful
> for determining the behavior of the entire database and/or application.

I cannot speak for Alvaro, but I think that many people think that a global 
setting
is too dangerous (I personally don't think so).

And if we don't have a GUC, an option to VACUUM would be convenient for one-time
clean-up of a table where taking a truncation lock would be too disruptive.

> If we want to change the behavior for a given execution of VACUUM, we can
> ALTER TABLE SET, VACUUM, and then ALTER TABLE SET back.

True. That ALTER TABLE would probably need a SHARE UPDATE EXCLUSIVE lock on the 
table,
and that's no worse than VACUUM itself.

Yours,
Laurenz Albe




extension patch of CREATE OR REPLACE TRIGGER

2019-02-28 Thread Osumi, Takamichi
Dear hackers

Hi there!
A past thread about introducing CREATE OR REPLACE TRIGGER into the syntax
stopped without the discussion of lock levels being completed.

The past thread is here; I'd like to pick it up again.
https://www.postgresql.org/message-id/flat/0B4917A40C80E34BBEC4BE1A7A9AB7E276F5D9%40g01jpexmbkw05#39f3c956d549c134474724d2b775399c
In that thread, Tom Lane mentioned that this change requires really careful study.

First of all, please note that I am not discussing the DO clause in this thread.
Secondly, Surafel Temesgen pointed out a bug there, but it no longer appears.

Anyway, let's go back to the main topic.
From my perspective, CREATE OR REPLACE TRIGGER should work as follows:
when there is no existing trigger to be replaced,
CREATE TRIGGER is processed with SHARE ROW EXCLUSIVE LOCK as usual.

On the other hand, when there is one,
the REPLACE TRIGGER path is executed with ACCESS EXCLUSIVE LOCK.

My reasoning is that acquiring ACCESS EXCLUSIVE LOCK when a trigger is replaced
provides data consistency between transactions and protects a concurrent pg_dump.

As a first step toward this, I've made a patch that adds CREATE OR REPLACE
TRIGGER, with some basic tests in triggers.sql.
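To illustrate the intended usage (the table, function, and trigger names are
made up; the syntax itself is what the patch adds):

    CREATE TRIGGER audit_update AFTER UPDATE ON accounts
        FOR EACH ROW EXECUTE FUNCTION audit_account_update();

    -- Later, redefine the trigger in place instead of DROP TRIGGER + CREATE TRIGGER.
    -- Per the proposal above, this replacement path would take ACCESS EXCLUSIVE LOCK.
    CREATE OR REPLACE TRIGGER audit_update AFTER UPDATE ON accounts
        FOR EACH ROW
        WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
        EXECUTE FUNCTION audit_account_update();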

However, I'm still wondering which lock level in this patch should be raised to
ACCESS EXCLUSIVE LOCK.
Could anyone give me advice on how to protect the trigger replacement process
in the way I suggested above?


Takamichi Osumi


CREATE_OR_REPLACE_TRIGGER_v01.patch
Description: CREATE_OR_REPLACE_TRIGGER_v01.patch


RE: Protect syscache from bloating with negative cache entries

2019-02-28 Thread Ideriha, Takeshi
>From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> I measured the memory context accounting overhead using Tomas's tool
>> palloc_bench, which he made a while ago in a similar discussion.
>> https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz
>>
>> This tool is a little bit outdated, so I fixed it, but basically I
>> followed his approach.
>> Things I did:
>> - make one MemoryContext
>> - run both palloc() and pfree() for 32kB area 1,000,000 times.
>> - And measure this time

>And are you sure you didn't enable assert checking?
Ah, sorry.. I misconfigured it. 

>I'm afraid the measurement is not correct.  First, the older discussion below 
>shows
>that the accounting overhead is much, much smaller, even with a more complex
>accounting.
>Second, allocation/free of memory > 8 KB calls malloc()/free().  I guess the
>accounting overhead will be more likely to be hidden under the overhead of 
>malloc()
>and free().  What we'd like to know the overhead when malloc() and free() are 
>not
>called.

Here is the average of 50 measurements: palloc/pfree of an 800-byte area
1,000,000 times, and of a 32 kB area 1,000,000 times.
I checked with gdb that malloc() is not called at size=800.

[Size=800, iter=1,000,000]
Master |15.763
Patched|16.262 (+3%)

[Size=32768, iter=1,000,000]
Master |61.3076
Patched|62.9566 (+2%)

At least compared to the previous HashAgg version, the overhead is smaller.
There is some overhead, but it's only an increase of 2 or 3%.

Regards,
Takeshi Ideriha


Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Fabien COELHO


Hello Andres,


Note that my concern is not about the page size, but rather that as more
commands may change the cluster status by editing the control file, it would
be better that a postmaster does not start while a pg_rewind or enable
checksum or whatever is in progress, and currently there is a possible race
condition between the read and write that can induce an issue, at least
theoretically.


Seems odd to bring this up in this thread, it really has nothing to do
with the topic.


Indeed. I raised it here because it is in the same area of code and 
Michaël was looking at it.


If we were to want to do more here, ISTM the right approach would use 
the postmaster pid file, not the control file.


ISTM that this just means re-inventing a manual, poorly featured, 
race-condition-prone lock API around another file (one which seems to be 
created more or less only by "pg_ctl"), while some other commands already use 
the control file (e.g. pg_rewind, AFAICS).


--
Fabien.

Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Ramanarayana
>
> Hi,

I have tested the three issues fixed in patch 001. The array indexes
issue is still there. Running the following query returns ERROR: more than
one value returned by column XPath expression

SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL XMLTABLE('/ROWS/ROW'
PASSING data
COLUMNS
country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
size_text float PATH 'SIZE/text()',
size_text_1 float PATH 'SIZE/text()[1]',
size_text_2 float PATH 'SIZE/text()[2]',
"SIZE" float, size_xml xml PATH 'SIZE')

The other two issues are resolved by this patch.


-- 
Cheers
Ram 4.0


2019-03 Starts Tomorrow

2019-02-28 Thread David Steele

Hackers,

The 2019-03 CF is almost upon us.  The CF will officially start at 00:00 
AoE (12:00 UTC) on Friday, March 1st.


Any large, new patches submitted at the last moment will likely be 
labelled as targeting PG13 and may be pushed off to the 2019-07 CF as 
well.  With the new labeling scheme it is not quite as important to keep 
PG13 patches out of this CF but we'll see how the community feels about 
that as we go.


If you have a patch that has been Waiting on Author without any 
discussion since before the last CF ended then you should submit a new 
patch for the start of the 2019-03 CF.


Happy Hacking!

--
-David
da...@pgmasters.net



Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Pavel Stehule
On Thu, 28 Feb 2019 at 9:58, Ramanarayana  wrote:

> Hi,
>
> I have tested the three issues fixed in patch 001. Array Indexes
> issue is still there. Running the following query returns ERROR: more than
> one value returned by column XPath expression
>
> SELECT xmltable.*
> FROM (SELECT data FROM xmldata) x,
> LATERAL XMLTABLE('/ROWS/ROW'
> PASSING data
> COLUMNS
> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
> size_text float PATH 'SIZE/text()',
> size_text_1 float PATH 'SIZE/text()[1]',
> size_text_2 float PATH 'SIZE/text()[2]',
> "SIZE" float, size_xml xml PATH 'SIZE')
>
> The other two issues are resolved by this patch.
>

Which patches did you use?

Regards

Pavel


> --
> Cheers
> Ram 4.0
>


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2019-02-28 Thread Etsuro Fujita

Hi Andres,

(2019/02/28 5:33), Andres Freund wrote:

On 2015-05-12 14:24:34 -0400, Tom Lane wrote:

I did a very basic update of your postgres_fdw patch to test this with,
and attach that so that you don't have to repeat the effort.  I'm not sure
whether we want to try to convert that into something committable.  I'm
afraid that the extra round trips involved in doing row locking this way
will be so expensive that no one really wants it for postgres_fdw.  It's
more credible that FDWs operating against local storage would have use
for it.


Fujita-san, do you know of any FDWs that use this?


No, I don't.


I'm currently
converting the EPQ machinery to slots, and in the course of that I (with
Horiguchi-san's help) converted RefetchForeignRow to return a slot. But
there's currently no in-core user of this facility...  I guess I can
rebase the preliminary postgres_fdw patch here, but it bitrotted
significantly.


I'll rebase that patch and help with the testing, if you want me to.


I also feel like there should be some test coverage for
an API in a nontrivial part of the code...


Yeah, but as mentioned above, the row-locking API is provided for FDWs 
operating against local storage, which we don't have in core, unfortunately.


Best regards,
Etsuro Fujita




Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Ramanarayana
Hi,
I applied the following patches:

0001-XML-XPath-comments-processing-instructions-array-ind.patch
0002-XML-avoid-xmlStrdup-if-possible.patch

Can you let me know what fix is done in patch 002? I will test that as well.

Regards,
Ram.

On Thu, 28 Feb 2019 at 15:01, Pavel Stehule  wrote:

>
>
> On Thu, 28 Feb 2019 at 9:58, Ramanarayana  wrote:
>
>> Hi,
>>
>> I have tested the three issues fixed in patch 001. Array Indexes
>> issue is still there. Running the following query returns ERROR: more
>> than one value returned by column XPath expression
>>
>> SELECT xmltable.*
>> FROM (SELECT data FROM xmldata) x,
>> LATERAL XMLTABLE('/ROWS/ROW'
>> PASSING data
>> COLUMNS
>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
>> size_text float PATH 'SIZE/text()',
>> size_text_1 float PATH 'SIZE/text()[1]',
>> size_text_2 float PATH 'SIZE/text()[2]',
>> "SIZE" float, size_xml xml PATH 'SIZE')
>>
>> The other two issues are resolved by this patch.
>>
>
> Which patches did you use?
>
> Regards
>
> Pavel
>
>
>> --
>> Cheers
>> Ram 4.0
>>
>

-- 
Cheers
Ram 4.0


Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Pavel Stehule
On Thu, 28 Feb 2019 at 10:49, Ramanarayana  wrote:

> Hi,
> I applied the following patches
>
> 0001-XML-XPath-comments-processing-instructions-array-ind.patch
> 
>
> 0002-XML-avoid-xmlStrdup-if-possible.patch
> 
>
>
> Can you let me know what fix is done in patch 002. I will test that as
> well?
>

I'm afraid this patch set was not finished, and it is not in the current
commitfest.

please, check this set https://commitfest.postgresql.org/22/1872/

Regards

Pavel



> Regards,
> Ram.
>
> On Thu, 28 Feb 2019 at 15:01, Pavel Stehule 
> wrote:
>
>>
>>
>> On Thu, 28 Feb 2019 at 9:58, Ramanarayana  wrote:
>>
>>> Hi,
>>>
>>> I have tested the three issues fixed in patch 001. Array Indexes
>>> issue is still there. Running the following query returns ERROR: more
>>> than one value returned by column XPath expression
>>>
>>> SELECT xmltable.*
>>> FROM (SELECT data FROM xmldata) x,
>>> LATERAL XMLTABLE('/ROWS/ROW'
>>> PASSING data
>>> COLUMNS
>>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
>>> size_text float PATH 'SIZE/text()',
>>> size_text_1 float PATH 'SIZE/text()[1]',
>>> size_text_2 float PATH 'SIZE/text()[2]',
>>> "SIZE" float, size_xml xml PATH 'SIZE')
>>>
>>> The other two issues are resolved by this patch.
>>>
>>
>> Which patches did you use?
>>
>> Regards
>>
>> Pavel
>>
>>
>>> --
>>> Cheers
>>> Ram 4.0
>>>
>>
>
> --
> Cheers
> Ram 4.0
>


Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Surafel Temesgen
On Sun, Feb 24, 2019 at 12:27 AM Tomas Vondra 
wrote:

>
> I'm sorry, I still don't understand what the supposed problem is. I
> don't think it's all that different from what nodeMaterial.c does, for
> example.
>
>
Sorry for the noise. Attached is the complete patch for the incremental approach.

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e3ce4d7e36 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..ca957901e3 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -44,6 +46,9 @@ ExecLimit(PlanState *pstate)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
+	TupleDesc   tupleDescriptor;
+	slot = node->subSlot;
+	tupleDescriptor = node->ps.ps_ResultTupleDesc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -81,7 +86,15 @@ ExecLimit(PlanState *pstate)
 			/*
 			 * Check for empty window; if so, treat like empty subplan.
 			 */
-			if (node->count <= 0 && !node->noCount)
+			if (node->limitOption == PERCENTAGE)
+			{
+if (node->percent == 0.0)
+{
+	node->lstate = LIMIT_EMPTY;
+	return NULL;
+}
+			}
+			else if (node->count <= 0 && !node->noCount)
 			{
 node->lstate = LIMIT_EMPTY;
 return NULL;
@@ -107,6 +120,15 @@ ExecLimit(PlanState *pstate)
 	break;
 			}
 
+			/*
+			 * We may needed this tuple in backward scan so put it into tuplestore.
+			 */
+			if (node->limitOption == PERCENTAGE)
+			{
+tuplestore_puttupleslot(node->tuple_store, slot);
+tuplestore_advance(node->tuple_store, true);
+			}
+
 			/*
 			 * Okay, we have the first tuple of the window.
 			 */
@@ -130,6 +152,72 @@ ExecLimit(PlanState *pstate)
  * advancing the subplan or the position variable; but change
  * the state machine state to record having done so.
  */
+
+/*
+ * In case of coming back from backward scan the tuple is already
+ * in tuple store.
+ */
+if (node->limitOption == PERCENTAGE && node->backwardPosition > 0)
+{
+	slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+	if (tuplestore_gettupleslot(node->tuple_store, true, false, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+		node->backwardPosition--;
+		return slot;
+	}
+	else
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+
+/*
+ * In PERCENTAGE case no need of executing outerPlan multiple times.
+ */
+if (node->limitOption ==  PERCENTAGE && node->reachEnd)
+{
+	node->lstate = LIMIT_WINDOWEND;
+
+	/*
+	 * If we know we won't need to back up, we can release
+	 * resources at this point.
+	 */
+	if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+		(void) ExecShutdownNode(outerPlan);
+
+	return NULL;
+}
+
+/*
+ * When in percentage mode, we need to see if we can get any
+ * additional rows from the subplan (enough to increase the
+ * node->count value).
+ */
+if (node->limitOption == PERCENTAGE)
+{
+	/* loop until the node->count increments */
+	while (node->position - node->offset >= node->count)
+	{
+		int64	cnt;
+
+		slot = ExecProcNode(outerPlan);
+		if (TupIsNull(slot))
+		{
+			node->reachEnd = true;
+			break;
+		}
+
+		tuplestore_puttupleslot(node->tuple_store, slot);
+
+		cnt = tuplestore_tuple_count(node->tuple_store);
+
+		node->count = ceil(node->percent * cnt / 100.0);
+	}
+}
+
 if (!node->noCount &&
 	node->position - node->offs

Re: extension patch of CREATE OR REPLACE TRIGGER

2019-02-28 Thread David Rowley
On Thu, 28 Feb 2019 at 21:44, Osumi, Takamichi
 wrote:
> I've made a patch to add CREATE OR REPLACE TRIGGER with some basic tests in 
> triggers.sql.

Hi,

I see there are two patch entries in the commitfest for this.  Is that
a mistake? If so can you "Withdraw" one of them?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Kyotaro HORIGUCHI
Hello.

At Sat, 23 Feb 2019 22:27:44 +0100, Tomas Vondra  
wrote in <81a5c0e9-c17d-28f3-4647-8a4659cdf...@2ndquadrant.com>
> 
> 
> On 2/23/19 8:53 AM, Surafel Temesgen wrote:
> > 
> > 
> > On Sun, Feb 10, 2019 at 2:22 AM Tomas Vondra
> > mailto:tomas.von...@2ndquadrant.com>> wrote:
> >  
> > 
> > 
> > I'm not sure I understand - are you saying every time the user does a
> > FETCH, we have to run the outer plan from scratch? I don't see why would
> > that be necessary? And if it is, how come there's no noticeable
> > performance difference?
> > 
> > Can you share a patch implementing the incremental approach, and a query
> > demonstrating the issue?
> > 
> > 
> > I didn't implement it, but it's obvious that it doesn't work similarly
> > to the previous approach.
> > 
> 
> Sure, but that's hardly a sufficient argument for the current approach.
> 
> > We need a different implementation. My plan was to use a tuplestore per
> > call and clear it after returning a tuple, but I see that the plan will
> > not go far, mainly because the last returned slot is not the last slot
> > we get from the outerPlan execution.
> > 
> 
> I'm sorry, I still don't understand what the supposed problem is. I
> don't think it's all that different from what nodeMaterial.c does, for
> example.
> 
> As I explained before, having to execute the outer plan till completion
> before returning any tuples is an issue. So either it needs fixing or an
> explanation why it's not an issue.

The biggest issue seems to be that we don't know the total number of
outer tuples before actually reading a null tuple. I doubt there is a
general shortcut for that. It also seems to prevent the limit node
from simply using a materialized outer plan.
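For example, with the syntax the patch adds (sketch only):

    SELECT * FROM t ORDER BY id FETCH FIRST 10 PERCENT ROWS ONLY;

Ten percent of the result cannot be converted into a concrete row count until
the outer plan has been read to completion, so the node cannot start returning
rows early the way a plain FETCH FIRST 10 ROWS ONLY can.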

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Kyotaro HORIGUCHI
At Thu, 28 Feb 2019 13:32:17 +0300, Surafel Temesgen  
wrote in 
> On Sun, Feb 24, 2019 at 12:27 AM Tomas Vondra 
> wrote:
> 
> >
> > I'm sorry, I still don't understand what the supposed problem is. I
> > don't think it's all that different from what nodeMaterial.c does, for
> > example.
> >
> >
> sorry for the noise .Attache is complete patch for incremental approach

Mmm. It seems to just move the
read-everything-before-returning-the-first-tuple behavior from
LIMIT_INITIAL to LIMIT_RESCAN. If I read it correctly, node->count is
not correct, since it is calculated as "the percentage of the number of
tuples read *so far*, excluding the offset portion".


|  while (node->position - node->offset >= node->count)
|  {
| slot = ExecProcNode(outerPlan);
| (if Tupleis Null then break;)
| tuplestore_puttupleslot(node->tuple_store, slot);
| cnt = tuplestore_tuple_count(node->tuple_store);
| node->count = ceil(node->percent * cnt / 100.0);
|  }

Unless I'm missing something, this seems almost the same as the following.

  while ((position - offset) * percent / 100 >=
 (position - offset) * percent / 100) {}


In the first place, the patch breaks nodeLimit.c and the build fails.


-* previous time we got a different result.
+* previous time we got a different result.In PERCENTAGE option there 
are 
+* no bound on the number of output tuples */
 */


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Joe Conway
On 2/27/19 7:54 PM, Michael Paquier wrote:
> On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
>> It seems to me that OpenTransientFile() is more appropriate. Patch done
>> that way attached.
> 
> Works for me, thanks for sending a patch!  While on it, could you
> clean up the comment on top of get_controlfile()?  "char" is mentioned
> instead of "const char" for DataDir which is incorrect.  I would
> remove the complete set of arguments from the description and just
> keep the routine name.

Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
> Sure, will do. What are your thoughts on backpatching? This seems
> unlikely to be a practical concern in the field, so my inclination is a
> master only fix.

I agree that this would be unlikely to become an issue, as an error on the
control file would most likely be a PANIC when updating it, so fixing
only HEAD sounds fine to me.  Thanks for asking!
--
Michael


signature.asc
Description: PGP signature


Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Pavel Stehule
On Thu, 28 Feb 2019 at 10:31, Pavel Stehule  wrote:

>
>
> On Thu, 28 Feb 2019 at 9:58, Ramanarayana  wrote:
>
>> Hi,
>>
>> I have tested the three issues fixed in patch 001. Array Indexes
>> issue is still there. Running the following query returns ERROR: more
>> than one value returned by column XPath expression
>>
>> SELECT xmltable.*
>> FROM (SELECT data FROM xmldata) x,
>> LATERAL XMLTABLE('/ROWS/ROW'
>> PASSING data
>> COLUMNS
>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
>> size_text float PATH 'SIZE/text()',
>> size_text_1 float PATH 'SIZE/text()[1]',
>> size_text_2 float PATH 'SIZE/text()[2]',
>> "SIZE" float, size_xml xml PATH 'SIZE')
>>
>> The other two issues are resolved by this patch.
>>
>
I tested xmltable-xpath-result-processing-bugfix-6.patch

and it is working

postgres=# SELECT xmltable.*
postgres-#   FROM (SELECT data FROM xmldata) x,
postgres-#        LATERAL XMLTABLE('/ROWS/ROW'
postgres(#                         PASSING data
postgres(#                         COLUMNS
postgres(#                           country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
postgres(#                           size_text float PATH 'SIZE/text()',
postgres(#                           size_text_1 float PATH 'SIZE/text()[1]',
postgres(#                           size_text_2 float PATH 'SIZE/text()[2]',
postgres(#                           "SIZE" float, size_xml xml PATH 'SIZE');
┌──────────────┬───────────┬─────────────┬─────────────┬──────┬──────────┐
│ country_name │ size_text │ size_text_1 │ size_text_2 │ SIZE │ size_xml │
╞══════════════╪═══════════╪═════════════╪═════════════╪══════╪══════════╡
│ Australia    │         ∅ │           ∅ │           ∅ │    ∅ │ ∅        │
│ China        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅        │
│ HongKong     │         ∅ │           ∅ │           ∅ │    ∅ │ ∅        │
│ India        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅        │
│ Japan        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅        │
│ Singapore    │       791 │         791 │           ∅ │  791 │ 791      │
└──────────────┴───────────┴─────────────┴─────────────┴──────┴──────────┘
(6 rows)

Regards

Pavel


>
> what patches you are used?
>
> Regards
>
> Pavel
>
>
>> --
>> Cheers
>> Ram 4.0
>>
>


Re: Libpq support to connect to standby server as priority

2019-02-28 Thread Dave Cramer
> Now I will add another parameter, target_server_type, to choose
> primary, standby or prefer-standby,
> as discussed upthread, with a new GUC variable.
>


So, just to further confuse things, here is a use case for "preferPrimary".

This is from the pgjdbc list.

"if the master instance fails, we would like the driver to communicate with
the secondary instance for read-only operations before the failover process
is commenced. The second use-case is when the master instance is
deliberately shut down for maintenance reasons and we do not want to fail
over to the secondary instance, but instead allow it to process user
queries throughout the maintenance."


see this for the thread.
https://www.postgresql.org/message-id/VI1PR05MB5295AE43EF9525EACC9E57ECBC750%40VI1PR05MB5295.eurprd05.prod.outlook.com

Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: pgbench MAX_ARGS

2019-02-28 Thread David Rowley
On Wed, 27 Feb 2019 at 01:57, Simon Riggs  wrote:
> I've put it as 256 args now.

I had a look at this and I see you've added some docs to mention the
number of parameters that are allowed; good.

+   pgbench supports up to 256 variables in one
+   statement.

However, the code does not allow 256 variables as the documents claim.
Per >= in:

  if (cmd->argc >= MAX_ARGS)
  {
  fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n",

For it to be 256 that would have to be > MAX_ARGS.

I also don't agree with this change:

- MAX_ARGS - 1, cmd->lines.data);
+ MAX_ARGS, cmd->lines.data);

The 0th element of the argv array was for the sql, per:

cmd->argv[0] = sql;

then the 9 others were for the variables, so the MAX_ARGS - 1 was
correct originally. I think some comments in the area to explain the
0th is for the sql would be a good idea too, that might stop any
confusion in the future. I see that's documented in the struct header
comment, but maybe worth a small note around that error message just
to confirm the - 1 is not a mistake, and neither is the >= MAX_ARGS.

Probably it's fine to define MAX_ARGS to 256 then put back the
MAX_ARGS - 1 code so that we complain if we get more than 255
unless 256 is really needed, of course, in which case MAX_ARGS will
need to be 257.

The test also seems to test that 256 variables in a statement gives an
error. That contradicts the documents that have been added, which say
256 is the maximum allowed.

Setting to WoA

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Online verification of checksums

2019-02-28 Thread Fabien COELHO



Hallo Michael,


So I have now changed behaviour so that short writes count as skipped
files and pg_verify_checksums no longer bails out on them. When this
occurs, a warning is written to stderr and their overall count is also
reported at the end. However, unless there are other blocks with bad
checksums, the exit status is kept at zero.


This seems fair when online; however, I'm wondering whether it is when 
offline. I'd say that the whole retry logic should be skipped in that 
case, i.e. "if (block_retry || !online) { error message and continue }"
on both short read & checksum failure retries.


New patch attached.


Patch applies cleanly, compiles, global & local make check ok.

I'm wondering whether it should exit(1) on "lseek" failures. Would it make 
sense to skip the file and report it as such? Should it be counted as a 
skipped file?


WRT the final status, ISTM that skipped blocks & files could warrant an 
error when offline, although they might be OK when online?


--
Fabien.



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Robert Haas
On Wed, Feb 27, 2019 at 10:41 PM Tom Lane  wrote:
> In short, this situation may look fine from the perspective of a committer
> with a relatively short timeline to commit, but it's pretty darn awful for
> everybody else.  The only way to avoid a ~ 50% failure rate is to choose
> OIDs above 6K, and once everybody starts doing it like that, things are
> going to get very unpleasant very quickly.

The root problem here from the perspective of a non-committer is not
that they might have to renumber OIDs a few times over the year or two
it takes to get their patch merged, but rather that it takes a year or
two to get their patch merged.  That's not to say that I have no
sympathy with people in that situation or don't want to make their
lives easier, but I'm not really convinced that burdening committers
with additional manual steps is the right way to get patches merged
faster.  This seems like a big piece of new mechanism being invented
to solve an occasional annoyance. Your statistics are not convincing
at all: you're arguing that this is a big problem because 2-3% of
pending patches current have an issue here, and some others have in
the past, but that's a really small percentage, and the time spent
doing OID renumbering must be a tiny percentage of the total time
anyone spends hacking on PostgreSQL.

I think that the problem here is that we have a very limited range of OIDs
(10k) which can be used for this purpose, and the number of OIDs that
are used in that space is now a significant fraction of the total
(>4.5k), and the problem is further complicated by the desire to keep
the OIDs assigned near the low end of the available numbering space
and/or near to other OIDs used for similar purposes.  The sheer fact
that the address space is nearly half-used means that conflicts are
likely even if people choose OIDs at random, and when people choose
OIDs non-randomly -- lowest, round numbers, near to other OIDs -- the
chances of conflicts just go up.

We could fix that problem by caring less about keeping all the numbers
gapless and increasing the size of the reserved space to say 100k, but
just as a thought, what if we stopped assigning manual OIDs for new
catalog entries altogether, except for once at the end of each release
cycle?  Make a way that people can add an entry to pg_proc.dat or
whatever without fixing an OID, and let the build scripts generate
one.  As many patches as happen during a release cycle will add new
such entries and they'll just all get some OID or other.  Then, at the
end of the release cycle, we'll run a script that finds all of those
catalog entries and rewrites the .dat files, adding a permanent OID
assignment to each one, so that those OIDs will then be fixed for all
future releases (unless we drop the entries or explicitly change
something).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: BUG #15623: Inconsistent use of default for updatable view

2019-02-28 Thread Dean Rasheed
On Thu, 28 Feb 2019 at 07:47, Amit Langote
 wrote:
>
> +if (attrno == 0)
> +elog(ERROR, "Cannot set value in column %d to
> DEFAULT", i);
>
> Maybe: s/Cannot/cannot/g
>

Ah yes, you're right. That is the convention.


> +Assert(list_length(sublist) == numattrs);
>
> Isn't this Assert useless, because we're setting numattrs to
> list_length() and transformInsertStmt ensures that all
> sublists are same length?
>

Well possibly I'm being over-paranoid, but given that it may have come
via a previous invocation of rewriteValuesRTE() that may have
completely rebuilt the lists, it seemed best to be sure that it hadn't
done something unexpected. It's about to use that to read from the
attrnos array, so it might read beyond the array bounds if any of the
prior logic was faulty.

Thanks for looking.

Regards,
Dean



Re: Removing unneeded self joins

2019-02-28 Thread Alexander Kuzmenkov

Hi Tom,

Thanks for the update.


On 2/22/19 03:25, Tom Lane wrote:

* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly.  I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:



Changed them to conform to project style. After a couple of segfaults I 
remembered why I wrote them the way I did: with our style of loops you can't 
use plain 'continue', and you also have to update prev, which I can't seem to 
remember to do.




I notice though that there's one unexplained plan change remaining
in join.out:

@@ -4365,11 +4365,13 @@ explain (costs off)
  select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
-QUERY PLAN
---
+   QUERY PLAN
+
   Result
 One-Time Filter: false
-(2 rows)
+   ->  Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
  
  -- bug 5255: this is not optimizable by join removal

  begin;

That sure looks like a bug.  I don't have time to look for the
cause right now.



David also asked about this before. This is the same plan you'd get for 
'select * from parent where k = 1 and k = 2', and this plan is exposed 
by join removal. So this is not a bug in join removal itself.
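For readers following along, the kind of query the patch targets looks like
this (schema is made up, with k unique as in the test above):

    -- Self join on a unique column; with the patch the join is removed and a
    -- single scan of parent remains.
    EXPLAIN (COSTS OFF)
    SELECT p1.* FROM parent p1 JOIN parent p2 ON p1.k = p2.k;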




I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird.  It's just cosmetic of course, but
why is that?



Can you point me to a query that looks wrong?



Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h.  This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.



Moved.



I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c.



We also prove and cache uniqueness for subqueries, for which the 
UniqueIndexInfo is not relevant; that's why it was optional and stored 
in a parallel list. Now I have changed it to UniqueRelInfo, which always has 
outerrelids and optionally the unique index.



I also fixed a bug where references in the HAVING clause were not updated. 
The new version is attached.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 5430a9ef35f5e21dade56fc5573ed7f377dce757 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Thu, 28 Feb 2019 17:06:45 +0300
Subject: [PATCH v11] Remove self joins on a unique column

---
 src/backend/nodes/outfuncs.c  |  19 +-
 src/backend/optimizer/path/indxpath.c |  26 +-
 src/backend/optimizer/path/joinpath.c |   6 +-
 src/backend/optimizer/plan/analyzejoins.c | 968 +++---
 src/backend/optimizer/plan/planmain.c |   7 +-
 src/backend/optimizer/util/pathnode.c |   3 +-
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/pathnodes.h |  21 +-
 src/include/optimizer/pathnode.h  |   5 +
 src/include/optimizer/paths.h |   4 +-
 src/include/optimizer/planmain.h  |  10 +-
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 219 ++-
 src/test/regress/sql/equivclass.sql   |  17 +
 src/test/regress/sql/join.sql |  94 +++
 16 files changed, 1359 insertions(+), 99 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 65302fe..5ffe3ba 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2261,7 +2261,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2334,6 +2335,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 }
 
 static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(clauses);
+	}
+}
+
+static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
 	/*
@@ -4085,6 +4099,9 @@ outNode(StringInfo str, const void *obj)
 			

Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Wed, Feb 27, 2019 at 6:03 PM Joe Conway  wrote:
> Patch for discussion attached.

So... you're just going to replace ALL error messages of any kind with
"ERROR: missing error text" when this option is enabled?  That sounds
unusable.  I mean if I'm reading it right this would get not only
messages from SQL-callable functions but also things like "deadlock
detected" and "could not read block %u in file %s" and "database is
not accepting commands to avoid wraparound data loss in database with
OID %u".  You can't even shut it off conveniently, because the way
you've designed it it has to be PGC_POSTMASTER to avoid TOCTTOU
vulnerabilities.  Maybe I'm misreading the patch?

I don't think it would be crazy to have a mode where we try to redact
the particular error messages that might leak information, but I think
we'd need to make it only those.  A wild idea might be to let
proleakproof take on three values: yes, no, and maybe.  When 'maybe'
functions are involved, we tell them whether or not the current query
involves any security barriers, and if so they self-censor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drop type "smgr"?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro  wrote:
> The type smgr has only one value 'magnetic disk'.  ~15 years ago it
> also had a value 'main memory', and in Berkeley POSTGRES 4.2 there was
> a third value 'sony jukebox'.  Back then, all tables had an associated
> block storage manager, and it was recorded as an attribute relsmgr of
> pg_class (or pg_relation as it was known further back).  This was the
> type of that attribute, removed by Bruce in 3fa2bb31 (1997).
>
> Nothing seems to break if you remove it (except for some tests using
> it in an incidental way).  See attached.

FWIW, +1 from me.  I thought about arguing to remove this a number of
years ago when I was poking around in this area for some reason, but
it didn't seem important enough to be worth arguing about then.  Now,
because we're actually going to maybe-hopefully get some more smgrs
that do interesting things, it seems worth the arguing...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimze usage of immutable functions as relation

2019-02-28 Thread Alexander Kuzmenkov

On 2/18/19 03:20, Tom Lane wrote:

The dummy-relation stuff I referred to has now been merged, so there's
really no good reason not to revise the patch along that line.



I'll try to post the revised implementation soon.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread Fujii Masao
On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe  wrote:
>
> Fujii Masao wrote:
> > So, let me clarify the situations;
> >
> > (3) If backup_label.pending exists but recovery.signal doesn't, the server
> >ignores (or removes) backup_label.pending and do the recovery
> >starting the pg_control's REDO location. This case can happen if
> >the server crashes while an exclusive backup is in progress.
> >So crash-in-the-middle-of-backup doesn't prevent the recovery from
> >starting in this idea
>
> That's where I see the problem with your idea.
>
> It is fairly easy for someone to restore a backup and then just start
> the server without first creating "recovery.signal", and that would
> lead to data corruption.

Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of the lack of recovery.signal, no archived WAL files are replayed and
the database may not reach the latest point.

Regards,

-- 
Fujii Masao



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 9:12 AM, Robert Haas wrote:
> On Wed, Feb 27, 2019 at 6:03 PM Joe Conway  wrote:
>> Patch for discussion attached.
> 
> So... you're just going to replace ALL error messages of any kind with
> "ERROR: missing error text" when this option is enabled?  That sounds
> unusable.  I mean if I'm reading it right this would get not only
> messages from SQL-callable functions but also things like "deadlock
> detected" and "could not read block %u in file %s" and "database is
> not accepting commands to avoid wraparound data loss in database with
> OID %u".  You can't even shut it off conveniently, because the way
> you've designed it it has to be PGC_POSTMASTER to avoid TOCTTOU
> vulnerabilities.  Maybe I'm misreading the patch?

You have it correct.

I completely disagree that it is unusable though. The way I envision
this is that you enable force_leakproof on your development machine
without suppress_client_messages being turned on. Do your debugging there.

On production, both are turned on. You still get full unredacted
messages in your pg log. The client on a prod system does not need these
details. If you *really* need to, you can restart to turn it on for a
short while on prod, but hopefully you have a non-prod system where you
reproduce issues for debugging anyway.
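A concrete sketch of that setup, using the GUC names described above (ALTER
SYSTEM plus a restart is just one way to set them, given that the patch as
posted makes them PGC_POSTMASTER):

    -- development machine: force_leakproof on, client messages not suppressed
    ALTER SYSTEM SET force_leakproof = on;

    -- production: both on; full errors still go to the server log
    ALTER SYSTEM SET force_leakproof = on;
    ALTER SYSTEM SET suppress_client_messages = on;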

I am not married to making this only changeable via restart though --
that's why I posted the patch for discussion. Perhaps a superuserset
would be better so debugging could be done on one session only on the
prod machine.

> I don't think it would be crazy to have a mode where we try to redact
> the particular error messages that might leak information, but I think
> we'd need to make it only those.  A wild idea might be to let
> proleakproof take on three values: yes, no, and maybe.  When 'maybe'
> functions are involved, we tell them whether or not the current query
> involves any security barriers, and if so they self-censor.

Again, I disagree. See above -- you have all you need in the server logs.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Dean Rasheed
On Thu, 28 Feb 2019 at 14:13, Robert Haas  wrote:
> A wild idea might be to let
> proleakproof take on three values: yes, no, and maybe.  When 'maybe'
> functions are involved, we tell them whether or not the current query
> involves any security barriers, and if so they self-censor.
>

Does self-censoring mean that they might still throw an error for some
inputs, but that error won't reveal any information about the input
values? That's not entirely consistent with my understanding of the
definition of leakproof, but maybe there are situations where that
amount of information leakage would be OK. So maybe we could have
"strictly leakproof" functions that never throw errors and "weakly
leakproof" functions (needs a better name) that can throw errors, as
long as those errors don't include data values. Then we could allow
strict and weak security barriers on a per-table basis, depending on
how sensitive the data is in each table (I'm not a fan of using GUCs
to control this).

Regards,
Dean



Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread Stephen Frost
Greetings,

* Fujii Masao (masao.fu...@gmail.com) wrote:
> On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe  wrote:
> >
> > Fujii Masao wrote:
> > > So, let me clarify the situations;
> > >
> > > (3) If backup_label.pending exists but recovery.signal doesn't, the server
> > >ignores (or removes) backup_label.pending and do the recovery
> > >starting the pg_control's REDO location. This case can happen if
> > >the server crashes while an exclusive backup is in progress.
> > >So crash-in-the-middle-of-backup doesn't prevent the recovery from
> > >starting in this idea
> >
> > That's where I see the problem with your idea.
> >
> > It is fairly easy for someone to restore a backup and then just start
> > the server without first creating "recovery.signal", and that would
> > lead to data corruption.
> 
> Isn't this case problematic even when a backup was taken by pg_basebackup?
> Because of the lack of recovery.signal, no archived WAL files are replayed and
> the database may not reach the latest point.

There are a number of problems, imv, that I think have just been
ignored/missed when it comes to the recovery.conf removal and tools like
pg_basebackup; see this msg also:

https://postgr.es/m/20181127153405.gx3...@tamriel.snowman.net

At this point, I think we need to put these issues on the v12 open items
page so that we don't forget about them and get these things fixed
before v12 comes out.

Thanks!

Stephen


signature.asc
Description: PGP signature


Add exclusive backup deprecation notes to documentation

2019-02-28 Thread David Steele

Hackers,

It has been noted on multiple threads, such as [1], that it would be 
good to have additional notes in the documentation to explain why 
exclusive backups have been deprecated and why they should be avoided 
when possible.


This patch attempts to document the limitations of the exclusive mode.
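For reference, the non-exclusive sequence that these notes steer users toward
looks roughly like this (the third argument of pg_start_backup is
exclusive => false, and the session must stay connected while the file-level
copy runs):

    SELECT pg_start_backup('nightly', false, false);
    -- copy the data directory with an external tool while this session stays open
    SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false, true);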

--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net
From 68beedac7e529a81ebf58ff40d30b28e404b30f5 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 28 Feb 2019 16:55:42 +0200
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.

Update the exclusive backup documentation to explain the limitations of the 
exclusive backup mode and make it clear that the feature is deprecated.
---
 doc/src/sgml/backup.sgml | 42 
 1 file changed, 42 insertions(+)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..de1a8a5826 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,48 @@ SELECT * FROM pg_stop_backup(false, true);


 Making an exclusive low level backup
+
+  
+   
+ The exclusive backup method is deprecated and should be avoided in favor
+ of the non-exclusive backup method or
+ pg_basebackup.
+   
+
+   
+ The primary issue with the exclusive method is that the
+ backup_label file is written into the data directory
+ when pg_start_backup is called and remains until
+ pg_stop_backup is called.  If
+ PostgreSQL or the host terminates abnormally
+ then backup_label will be left in the data directory
+ and PostgreSQL will not start. A log message
+ recommends that backup_label be removed if not
+ restoring from a backup.
+   
+
+   
+ However, if backup_label is present because a restore
+ is actually in progress, then removing it will result in corruption.  For
+ this reason it is not recommended to automate the removal of
+ backup_label.
+   
+
+   
+ Another issue with exclusive backup mode is that it will continue until
+ pg_stop_backup is called, even if the calling process
+ is no longer performing the backup.  The next time
+ pg_start_backup is called it will fail unless
+ pg_stop_backup is called manually first.
+   
+
+   
+ Finally, only one exclusive backup may be run at a time.  However, it is
+ possible to run an exclusive backup at the same time as any number of
+ non-exclusive backups.
+   
+  
+
 
  The process for an exclusive backup is mostly the same as for a
  non-exclusive one, but it differs in a few key steps. This type of backup
-- 
2.17.2 (Apple Git-113)



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Chapman Flack
On 2/28/19 9:52 AM, Dean Rasheed wrote:

> Does self-censoring mean that they might still throw an error for some
> inputs, but that error won't reveal any information about the input
> values? That's not entirely consistent with my understanding of the
> definition of leakproof

That's the question I was also preparing to ask ... I understood the
definition to exclude even the possibility that some inputs could
produce errors.

> amount of information leakage would be OK. So maybe we could have
> "strictly leakproof" functions that never throw errors and "weakly
> leakproof" functions (needs a better name) that can throw errors, as
> long as those errors don't include data values. Then we could allow
> strict and weak security barriers on a per-table basis

Interesting idea. I wonder if the set { strictly, weakly } would be
better viewed as a user-definable set (a site might define "leakproof
wrt HIPAA", "leakproof wrt FERPA", etc.), and then configure which
combination of leakproof properties must apply where.

OTOH, I'd have to wonder about the feasibility of auditing code for
leakproofness at that kind of granularity.

Regards,
-Chap



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joshua Brindle
On Thu, Feb 28, 2019 at 9:12 AM Robert Haas  wrote:

Hi, Robert, it has been a while :)

>
> So... you're just going to replace ALL error messages of any kind with
> "ERROR: missing error text" when this option is enabled?  That sounds
> unusable.  I mean if I'm reading it right this would get not only
> messages from SQL-callable functions but also things like "deadlock
> detected" and "could not read block %u in file %s" and "database is
> not accepting commands to avoid wraparound data loss in database with
> OID %u".  You can't even shut it off conveniently, because the way

This makes complete sense to me. The client side of a client/server
protocol doesn't have any way to fix 'could not read block %u in file
%s', the client doesn't need that kind of detailed information about a
server, and in fact that information could be security sensitive.
Imagine connecting to a webserver with a web browser and getting a
similar message.

When something critical happens, the server's logs can be viewed and the
error can be addressed there, on the server.



Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread David Steele

On 2/28/19 4:44 PM, Fujii Masao wrote:

On Wed, Feb 27, 2019 at 4:35 PM Laurenz Albe  wrote:


Fujii Masao wrote:

So, let me clarify the situations;

(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and does the recovery
starting from the pg_control's REDO location. This case can happen if
the server crashes while an exclusive backup is in progress.
So crash-in-the-middle-of-backup doesn't prevent the recovery from
starting in this idea


That's where I see the problem with your idea.

It is fairly easy for someone to restore a backup and then just start
the server without first creating "recovery.signal", and that would
lead to data corruption.


Isn't this case problematic even when a backup was taken by pg_basebackup?
Because of the lack of recovery.signal, no archived WAL files are replayed and
the database may not reach the latest point.


It is problematic because we have a message encouraging users to delete 
backup_label when in fact they should be creating recovery.signal.


If the required WAL is present the cluster will not be corrupted.  It 
just may not play forward as far as you would prefer.


--
-David
da...@pgmasters.net



Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Feb 28, 2019 at 7:37 PM Tom Lane  wrote:
>> Thomas Munro  writes:
>>> Our current thinking is that smgropen() should know how to map a small
>>> number of special database OIDs to different smgr implementations

>> Hmm.  Maybe mapping based on tablespaces would be a better idea?

> In the undo log proposal (about which more soon) we are using
> tablespaces for their real purpose, so we need that OID.  If you SET
> undo_tablespaces = foo then future undo data created by your session
> will be written there, which might be useful for putting that IO on
> different storage.

Meh.  That's a point, but it doesn't exactly seem like a killer argument.
Just in the abstract, it seems much more likely to me that people would
want per-database special rels than per-tablespace special rels.  And
I think your notion of a GUC that can control this is probably pie in
the sky anyway: if we can't afford to look into the catalogs to resolve
names at this code level, how are we going to handle a GUC?

The real reason I'm concerned about this, though, is that for either
a database or a tablespace, you can *not* get away with having a magic
OID just hanging in space with no actual catalog row matching it.
If nothing else, you need an entry there to prevent someone from
reusing the OID for another purpose.  And a pg_database row that
doesn't correspond to a real database is going to break all kinds of
code, starting with pg_upgrade and the autovacuum launcher.  Special
rows in pg_tablespace are much less likely to cause issues, because
of the precedent of pg_global and pg_default.

In short, I think you're better off equating smgrs to magic
tablespaces, and if you need some way of letting users control
where those map to storage-wise, control it in some other way.

regards, tom lane



Re: Prevent extension creation in temporary schemas

2019-02-28 Thread Tom Lane
Sergei Kornilov  writes:
>> test=> CREATE EXTENSION file_fdw WITH SCHEMA pg_temp_3;
>> ERROR: function file_fdw_handler() does not exist

> This behavior seems as not related to extensions infrastructure:

Yeah, I think it's just because we won't search the pg_temp schema
for function or operator names, unless the calling SQL command
explicitly writes "pg_temp.foo(...)" or equivalent.  That's an
ancient security decision, which we're unlikely to undo.  It
certainly puts a crimp in the usefulness of putting extensions into
pg_temp, but I don't think it totally destroys the usefulness.
You could still use an extension to package, say, the definitions
of a bunch of temp tables and views that you need to create often.

regards, tom lane



Re: [HACKERS] Incomplete startup packet errors

2019-02-28 Thread Christoph Berg
Re: Magnus Hagander 2016-04-13 

> > >>> It's fairly common to see a lot of "Incomplete startup packet" in the
> > >>> logfiles caused by monitoring or healthcheck connections.
> >
> > >> I've also seen it caused by port scanning.
> >
> > > Yes, definitely. Question there might be if that's actually a case when
> > we
> > > *want* that logging?
> >
> > I should think someone might.  But I doubt we want to introduce another
> > GUC for this.  Would it be okay to downgrade the message to DEBUG1 if
> > zero bytes were received?
> >
> >
> Yeah, that was my suggestion - I think that's a reasonable compromise.  And
> yes, I agree that a separate GUC for it would be a huge overkill.

There have been numerous complaints about that log message, and the
usual reply is always something like what Pavel said recently:

"It is garbage. Usually it means nothing, but better to work live
without this garbage." [1]

[1] 
https://www.postgresql.org/message-id/CAFj8pRDtwsxj63%3DLaWSwA8u7NrU9k9%2BdJtz2gB_0f4SxCM1sQA%40mail.gmail.com

Let's get rid of it.

Christoph
From 19b4e489fdc974ad8414e960a721694b2c2ee9d5 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Thu, 28 Feb 2019 16:12:46 +0100
Subject: [PATCH] Demote "incomplete startup packet" to DEBUG1

---
 src/backend/postmaster/postmaster.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ccea231e98..2dfb7b6e0a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1907,7 +1907,7 @@ ProcessStartupPacket(Port *port, bool SSLdone)
 		 * don't clutter the log with a complaint.
 		 */
 		if (!SSLdone)
-			ereport(COMMERROR,
+			ereport(DEBUG1,
 	(errcode(ERRCODE_PROTOCOL_VIOLATION),
 	 errmsg("incomplete startup packet")));
 		return STATUS_ERROR;
@@ -1919,7 +1919,7 @@ ProcessStartupPacket(Port *port, bool SSLdone)
 	if (len < (int32) sizeof(ProtocolVersion) ||
 		len > MAX_STARTUP_PACKET_LENGTH)
 	{
-		ereport(COMMERROR,
+		ereport(DEBUG1,
 (errcode(ERRCODE_PROTOCOL_VIOLATION),
  errmsg("invalid length of startup packet")));
 		return STATUS_ERROR;
@@ -1938,7 +1938,7 @@ ProcessStartupPacket(Port *port, bool SSLdone)
 
 	if (pq_getbytes(buf, len) == EOF)
 	{
-		ereport(COMMERROR,
+		ereport(DEBUG1,
 (errcode(ERRCODE_PROTOCOL_VIOLATION),
  errmsg("incomplete startup packet")));
 		return STATUS_ERROR;
-- 
2.20.1



PostgreSQL Participates in GSoC 2019!

2019-02-28 Thread Stephen Frost
Greetings,

I'm pleased to announce that we have been accepted by Google to
participate in the Summer of Code (GSoC) 2019 program. This will be the
12th time that the PostgreSQL Project will provide mentorship for
students to help develop new features for PostgreSQL. We have the chance
to accept student projects that will be developed from May to August.

If you are a student, and want to participate in this year's GSoC,
please watch this Wiki page: https://wiki.postgresql.org/wiki/GSoC

If you are interested in mentoring a student, you can add your own idea
to the project list. Please reach out to the PG GSoC admins, listed
here: https://wiki.postgresql.org/wiki/GSoC

And finally, we ask everyone to reach out to students and universities
and let them know about GSoC.

Thanks!

Stephen
PostgreSQL GSoC 2019 Administrator


signature.asc
Description: PGP signature


Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
> This seems like a big piece of new mechanism being invented
> to solve an occasional annoyance. Your statistics are not convincing
> at all: you're arguing that this is a big problem because 2-3% of
> pending patches currently have an issue here, and some others have in
> the past, but that's a really small percentage, and the time spent
> doing OID renumbering must be a tiny percentage of the total time
> anyone spends hacking on PostgreSQL.

TBH, I find this position utterly baffling.  It's true that only a
small percentage of patches have an issue here, because only a small
percentage of patches dabble in manually-assigned OIDs at all.  But
*among those that do*, there is a huge problem.  I had not actually
realized how bad it is until I gathered those stats, but it's bad.

I don't understand the objection to inventing a mechanism that will
help those patches and has no impact whatever when working on patches
that don't involve manually-assigned OIDs.

And, yeah, I'd like us not to have patches hanging around for years
either, but that's a reality that's not going away.

> We could fix that problem by caring less about keeping all the numbers
> gapless and increasing the size of the reserved space to say 100k,

We already had this discussion.  Moving FirstNormalObjectId is infeasible
without forcing a dump/reload, which I don't think anyone wants to do.

> but
> just as a thought, what if we stopped assigning manual OIDs for new
> catalog entries altogether, except for once at the end of each release
> cycle?

And that's another idea without any basis in reality.  What are you
going to do instead?  What mechanism will you use to track these
OIDs so you can clean up later?  Who's going to write the code that
will support this?  Not me.  I think the proposal that is on the
table is superior.

regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 10:09 AM Tom Lane  wrote:
> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose.  And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher.  Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.

My first intuition was the same as yours -- that we should use the
tablespace to decide which smgr is relevant -- but I now think that
intuition was wrong.  Even if you use the tablespace OID to select the
smgr, it doesn't completely solve the problem you're worried about
here.  You still have to put SOMETHING in the database OID field, and
that's going to be just as fake as it was before.  I guess you could
use the OID for pg_global for things like undo and SLRU data, but then
how is GetRelationPath() going to work?  You don't have enough bits
left anywhere to specify both a tablespace and a location within that
tablespace in a reasonable way, and I think it's far from obvious how
you would build a side channel that could carry that information.

Also, I don't see why we'd need a fake pg_database row in the first
place.  IIUC, the OID counter wraps around to FirstNormalObjectId, so
nobody should have a database with an OID less than that value.

If we were using smgrs to represent other kinds of ways of storing
user tables, e.g. network.c, cloud.c, magtape.c, punchcard.c, etc. -
then I think we'd have to find some way to let each user-defined
tablespace pick an smgr, and I don't really know how that would work
given the fact that we can't really use catalog lookups to figure out
which one to use, but actually the direction here is to store files
internal to the system using special-purpose smgrs so that we can pull
more things into shared_buffers, and for that purpose the database OID
seems cleaner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Continue work on changes to recovery.conf API

2019-02-28 Thread David Steele

On 11/27/18 4:36 PM, Peter Eisentraut wrote:

On 27/11/2018 13:21, David Steele wrote:

I would prefer a specific file that will be auto-included into
postgresql.conf when present but be ignored when not present.  Some
settings are generally ephemeral (recovery_target_time) and it would be
nice for them to go away.  When recovery is complete the file would be
renamed to .done just as recovery.conf is now.


That might be a useful facility, but it wouldn't really address the
pg_basebackup -R issue, because that creates settings that you don't
want going away in this manner.  You'd then need two separate such
files, one for targeted recovery that goes away when recovery ends, and
one that is automatically included that pg_basebackup can overwrite at will.


I'm not sure why that's true.  Some settings you might prefer to be 
persistent and others not.  If pg_basebackup -R is treating them the 
same then that seems like a limitation.


Endlessly appending settings into postgresql.auto.conf seems confusing 
even if it makes sense to postgres (i.e. last setting wins).


Regards,
--
-David
da...@pgmasters.net



Re: Re: Continue work on changes to recovery.conf API

2019-02-28 Thread David Steele

Hi Peter,

On 11/27/18 5:34 PM, Stephen Frost wrote:

Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:

On 27/11/2018 13:21, David Steele wrote:

I would prefer a specific file that will be auto-included into
postgresql.conf when present but be ignored when not present.  Some
settings are generally ephemeral (recovery_target_time) and it would be
nice for them to go away.  When recovery is complete the file would be
renamed to .done just as recovery.conf is now.


That might be a useful facility, but it wouldn't really address the
pg_basebackup -R issue, because that creates settings that you don't
want going away in this manner.  You'd then need two separate such
files, one for targeted recovery that goes away when recovery ends, and
one that is automatically included that pg_basebackup can overwrite at will.


I've been thinking about this also and I agree that there's some
challenges when it comes to having another file- what happens if someone
does an ALTER SYSTEM on primary_conninfo while it's in the
'recovery.auto.conf' (or whatever) file?  Does that go into
postgresql.auto.conf, or somewhere else?  What if they do a RESET?

Then there's the other side of things- do we really want things like
recovery_target_time being kept around in postgresql.auto.conf after a
promotoion?  Do we want to keep appending primary_conninfo into
postgresql.auto.conf?  I haven't looked but I'm also concerned that
something like ALTER SYSTEM RESET isn't really prepared to find
duplicates in postgresql.auto.conf...

Maybe thinking through what we want to have happen in each scenario
would be good.  If you perform a pg_basebackup -R and there's already
something set in postgresql.auto.conf for primary conninfo- what should
happen?  If we reach the end of recovery and promote while
postgresql.auto.conf has recovery_target_time set, what should happen?
If third-party tools want to do what pg_basebackup -R does and modify
things in postgresql.auto.conf, how should they do that?

Here's my thoughts on answers to the above:

- pg_basebackup -R should INSERT .. ON CONFLICT UPDATE the settings that
   it wants to set in postgresql.auto.conf

- When we reach the end of recovery and promote, we should DELETE from
   the postgresql.auto.conf the recovery target settings.

- External tools should either be told that they can modify
   postgresql.auto.conf and given guideance on how to do so, or we should
   provide a tool which allows them to do so (or maybe both).

As we already have a section for recovery target settings that clearly
has them as independent, hopefully this will make sense to users.  Open
to other ideas too, of course, but I don't think we can continue to just
append things to the end of postgresql.auto.conf when pg_basebackup is
run with -R.


I'd be interested to get your take to these questions.

Just about every third-party backup and HA tool out there writes 
recovery.conf files and automates restores.  This is a huge change.


I personally would prefer to have something like 
postgresql.recovery.conf file that is automatically included if it is 
present.  This simplifies the issue of how to maintain recovery settings 
in postgresql.auto.conf.  The file could be renamed to 
postgresql.recovery.conf.done similar to how recovery.conf is now.


Of course, some settings like primary_conninfo could just stay in 
postgresql.conf or postgresql.auto.conf since they are far less subject 
to change.  Or they could be in postgresql.recovery.conf for HA 
environments.


I don't see why this won't work with pg_basebackup -R -- it just may be 
the case that some settings get overridden.  In HA scenarios it's hard 
to see how pg_basebackup would have the correct info for something like 
primary_conninfo anyway.


I like the flexibility of recovery options being available as GUCs but 
I'm not sure the ramifications of these changes have been completely 
thought through.


Regards,
--
-David
da...@pgmasters.net



Re: Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Tom Lane
Joshua Brindle  writes:
> On Thu, Feb 28, 2019 at 9:12 AM Robert Haas  wrote:
>> So... you're just going to replace ALL error messages of any kind with
>> "ERROR: missing error text" when this option is enabled?  That sounds
>> unusable.  I mean if I'm reading it right this would get not only
>> messages from SQL-callable functions but also things like "deadlock
>> detected" and "could not read block %u in file %s" and "database is
>> not accepting commands to avoid wraparound data loss in database with
>> OID %u".  You can't even shut it off conveniently, because the way

> This makes complete sense to me. The client side of a client/server
> protocol doesn't have any way to fix 'could not read block %u in file
> %s', the client doesn't need that kind of detailed information about a
> server, and in fact that information could be security sensitive.

I agree with Robert that this idea is a nonstarter.  Not only is it
a complete disaster from a usability standpoint, but *it does not
fix the problem*.  The mere fact that an error occurred, or didn't,
is already an information leak.  Sure, you can only extract one bit
per query that way, but a slow leak is still a leak.  See the Spectre
vulnerability for a pretty exact parallel.

The direction I think we're going to need to go in is to weaken our
standards for what we'll consider a leakproof function, and/or try
harder to make common WHERE-clause operators leakproof.  The thread
over at
https://www.postgresql.org/message-id/flat/7DF52167-4379-4A1E-A957-90D774EBDF21%40winand.at
raises the question of why we don't expect that *all* indexable
operators are leakproof, at least with respect to the index column
contents.  (Failing to distinguish which of the inputs can be
leaked seems like a pretty fundamental mistake in hindsight.
For instance, some opclasses can directly index regex operators,
and one would not wish to give up the ability for a regex operator
to complain about an invalid pattern.  But it could be leakproof
with respect to the data side.)

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 10:27 AM Tom Lane  wrote:
> Robert Haas  writes:
> > This seems like a big piece of new mechanism being invented
> > to solve an occasional annoyance. Your statistics are not convincing
> > at all: you're arguing that this is a big problem because 2-3% of
> > pending patches currently have an issue here, and some others have in
> > the past, but that's a really small percentage, and the time spent
> > doing OID renumbering must be a tiny percentage of the total time
> > anyone spends hacking on PostgreSQL.
>
> TBH, I find this position utterly baffling.  It's true that only a
> small percentage of patches have an issue here, because only a small
> percentage of patches dabble in manually-assigned OIDs at all.  But
> *among those that do*, there is a huge problem.  I had not actually
> realized how bad it is until I gathered those stats, but it's bad.
>
> I don't understand the objection to inventing a mechanism that will
> help those patches and has no impact whatever when working on patches
> that don't involve manually-assigned OIDs.
>
> And, yeah, I'd like us not to have patches hanging around for years
> either, but that's a reality that's not going away.

I don't think this is the worst proposal ever.  However, I also think
that it's not unreasonable to raise the issue that writing OR
reviewing OR committing a patch already involves adhering to a thicket
of undocumented rules.  When somebody fails to adhere to one of those
rules, they get ignored or publicly shamed.  Now you want to add yet
another step to the process - really two.  If you want to submit a
patch that requires new catalog entries, you must know that you're
supposed to put those OIDs in this new range that we're going to set
aside for such things, and if you want to commit one, you must know
that you're supposed to renumber those OID assignments into some other
range.  And people are going to screw it up - submitters are going to
fail to know about this new policy (which will probably be documented
nowhere, just like all the other ones) - and committers are going to
fail to remember to renumber things.  So, I suspect that for every
unit of work it saves somebody, it's probably going to generate about
one unit of extra work for somebody else.

A lot of projects have a much less painful process for getting patches
integrated than we do.  I don't know how those projects maintain
adequate code quality, but I do know that making it easy to get a
patch accepted makes people more likely to contribute patches, and
increases overall development velocity.  It is not even vaguely
unreasonable to worry about whether making this more complicated is
going to hurt more than it helps, and I don't know why you think
otherwise.

> > but
> > just as a thought, what if we stopped assigning manual OIDs for new
> > catalog entries altogether, except for once at the end of each release
> > cycle?
>
> And that's another idea without any basis in reality.  What are you
> going to do instead?  What mechanism will you use to track these
> OIDs so you can clean up later?

Right now every entry in pg_proc.dat includes an OID assignment.  What
I'm proposing is that we would also allow entries that did not have
one, and the build process would assign one while processing the .dat
files.  Then later, somebody could use a script that went through and
rewrote the .dat file to add OID assignments to any entries that
lacked them.  Since the topic of having tools for automated rewrite of
those files has been discussed at length, and since we already have a
script called reformat_dat_file.pl in the tree which contains
comments indicating that it could be modified for bulk editing, said
script having been committed BY YOU, I don't understand why you think
that bulk editing is infeasible.

> Who's going to write the code that
> will support this?  Not me.  I think the proposal that is on the
> table is superior.

OK.  Well, I think that doing nothing is superior to this proposal,
for reasons similar to what Peter Eisentraut has already articulated.
And I think rather than blasting forward with your own preferred
alternative in the face of disagreement, you should be willing to
discuss other possible options.  But if you're not willing to do that,
I can't make you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Aggregation push-down

2019-02-28 Thread Antonin Houska
Tom Lane  wrote:

> Antonin Houska  writes:
> > Michael Paquier  wrote:
> >> Latest patch set fails to apply, so moved to next CF, waiting on
> >> author.
> 
> > Rebased.
> 
> This is in need of rebasing again :-(.  I went ahead and pushed the 001
> part, since that seemed fairly uncontroversial.

ok, thanks.

> I did not spend a whole lot of time looking at the patch today, but
> I'm still pretty distressed at the data structures you've chosen.
> I remain of the opinion that a grouped relation and a base relation
> are, er, unrelated, even if they happen to share the same relid set.
> So I do not see the value of the RelOptInfoSet struct you propose here,

ok. As you suggested upthread, I now try to reuse the join_rel_list /
join_rel_hash structures; see v11-001-Introduce_RelInfoList.patch.

> and I definitely don't think there's any value in having, eg,
> create_seqscan_path or create_index_path dealing with this stuff.

Originally I tried to aggregate any path that ever gets passed to agg_path(),
but that's probably not worth the code complexity. Now the partial aggregation
is only applied to paths that survived agg_path() on the plain relation.

> I also don't like changing create_nestloop_path et al to take a PathTarget
> rather than using the RelOptInfo's pathtarget; IMO, it's flat out wrong
> for a path to generate a tlist different from what its parent RelOptInfo
> says that the relation produces.

Likewise, the current patch version is less invasive, so create_nestloop_path
et al are not touched at all.

> I think basically the way this ought to work is that we generate baserel
> paths pretty much the same as today, and then we run through the baserels
> and see which ones are potentially worth doing partial aggregation on,
> and for each one that is, we create a separate "upper relation" RelOptInfo
> describing that.  The paths for this RelOptInfo would be
> partial-aggregation paths using paths from the corresponding baserel as
> input.  Then we'd run a join search that only considers joining grouped
> rels with plain rels (I concur that joining two grouped rels is not worth
> coping with, at least for now).

make_join_rel() is the core of my implementation: besides joining two plain
relations, it tries to join a plain relation to a grouped one, and also to
aggregate the join of the two plain relations. I consider this approach less
invasive and more efficient than running the whole standard_join_search again
for the grouped rels.
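
To illustrate the query shape this targets (a hedged example; the table and
column names are invented, not taken from the patch):

SELECT c.name, sum(o.amount)
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.name;

Here the orders side can be partially aggregated by customer_id before the
join, so the join sees one row per customer rather than one row per order,
and the aggregation above the join only combines the partial states.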

The problem of numeric-like data types (i.e. types for which equality of two
values of the grouping key does not justify putting them into the same group,
because information like scale would be discarded that way) remains open. My
last idea was to add a boolean flag to the operator class which tells that
equality implies "bitwise equality", and to disallow aggregate push-down if
SortGroupClause.eqop is in an opclass that has this field set to FALSE. I'd
like to hear your opinion before I do any coding.
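
As a concrete illustration of that problem (a hedged example, not from the
patch): numeric equality ignores the display scale, so two values that print
differently would land in the same group and one representation would be lost:

SELECT 1.0::numeric = 1.00::numeric;   -- true, although '1.0' and '1.00' print differently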

-- 
Antonin Houska
https://www.cybertec-postgresql.com

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6b96e7de0a..4484fb4fbc 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4906,7 +4906,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 	 */
 	Assert(fpinfo->relation_index == 0);	/* shouldn't be set yet */
 	fpinfo->relation_index =
-		list_length(root->parse->rtable) + list_length(root->join_rel_list);
+		list_length(root->parse->rtable) +
+		list_length(root->join_rel_list->items);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 65302fe65b..8b333f069a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2272,6 +2272,14 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 }
 
 static void
+_outRelInfoList(StringInfo str, const RelInfoList *node)
+{
+	WRITE_NODE_TYPE("RELOPTINFOLIST");
+
+	WRITE_NODE_FIELD(items);
+}
+
+static void
 _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 {
 	WRITE_NODE_TYPE("INDEXOPTINFO");
@@ -4031,6 +4039,9 @@ outNode(StringInfo str, const void *obj)
 			case T_RelOptInfo:
 _outRelOptInfo(str, obj);
 break;
+			case T_RelInfoList:
+_outRelInfoList(str, obj);
+break;
 			case T_IndexOptInfo:
 _outIndexOptInfo(str, obj);
 break;
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index e07bab831e..82b4f5c56a 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -92,11 +92,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	 *
 	 * join_rel_level[] shouldn't be in use, so just Assert it isn't.
 	 */
-	savelength = list_length(root->join_rel_list);
-	savehash = root->join_rel_hash;
+	savelength = list_length(root->join_rel_list->items);
+	savehash = root->join_rel_list->hash;
 	Assert(root->join_rel_level == NULL);
 
-	root-

Re: Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joshua Brindle
On Thu, Feb 28, 2019 at 10:49 AM Tom Lane  wrote:
>
> Joshua Brindle  writes:
> > On Thu, Feb 28, 2019 at 9:12 AM Robert Haas  wrote:
> >> So... you're just going to replace ALL error messages of any kind with
> >> "ERROR: missing error text" when this option is enabled?  That sounds
> >> unusable.  I mean if I'm reading it right this would get not only
> >> messages from SQL-callable functions but also things like "deadlock
> >> detected" and "could not read block %u in file %s" and "database is
> >> not accepting commands to avoid wraparound data loss in database with
> >> OID %u".  You can't even shut it off conveniently, because the way
>
> > This makes complete sense to me. The client side of a client/server
> > protocol doesn't have any way to fix 'could not read block %u in file
> > %s', the client doesn't need that kind of detailed information about a
> > server, and in fact that information could be security sensitive.
>
> I agree with Robert that this idea is a nonstarter.  Not only is it
> a complete disaster from a usability standpoint, but *it does not
> fix the problem*.  The mere fact that an error occurred, or didn't,
> is already an information leak.  Sure, you can only extract one bit
> per query that way, but a slow leak is still a leak.  See the Spectre
> vulnerability for a pretty exact parallel.

How is leakproof defined WRT Postgres? Generally speaking, a 1-bit
error path would be considered a covert channel on most systems, and
is relatively slow even compared to, e.g., timing channels.

Redacting error information, outside of the global leakproof setting,
seems useful to prevent data leakage to a client on another system,
such as Robert's example above "could not read block %u in file %s".

Although, and Joe may hate me for saying this, I think only the
non-constants should be redacted to keep some level of usability for
regular SQL errors. Maybe system errors like the above should be
removed from client messages in general.

> The direction I think we're going to need to go in is to weaken our
> standards for what we'll consider a leakproof function, and/or try
> harder to make common WHERE-clause operators leakproof.  The thread
> over at
> https://www.postgresql.org/message-id/flat/7DF52167-4379-4A1E-A957-90D774EBDF21%40winand.at
> raises the question of why we don't expect that *all* indexable
> operators are leakproof, at least with respect to the index column
> contents.  (Failing to distinguish which of the inputs can be
> leaked seems like a pretty fundamental mistake in hindsight.
> For instance, some opclasses can directly index regex operators,
> and one would not wish to give up the ability for a regex operator
> to complain about an invalid pattern.  But it could be leakproof
> with respect to the data side.)
>
> regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
> My first intuition was the same as yours -- that we should use the
> tablespace to decide which smgr is relevant -- but I now think that
> intuition was wrong.  Even if you use the tablespace OID to select the
> smgr, it doesn't completely solve the problem you're worried about
> here.  You still have to put SOMETHING in the database OID field, and
> that's going to be just as fake as it was before.

My thought was that that could be zero if not relevant ... isn't that
what we do now for buffer tags for shared rels?

> I guess you could
> use the OID for pg_global for things like undo and SLRU data, but then
> how is GetRelationPath() going to work?  You don't have enough bits
> left anywhere to specify both a tablespace and a location within that
> tablespace in a reasonable way, and I think it's far from obvious how
> you would build a side channel that could carry that information.

It's certainly possible/likely that we're going to end up needing to
widen buffer tags to represent the smgr explicitly, because some use
cases are going to need a real database spec, some are going to need
a real tablespace spec, and some might need both.  Maybe we should
just bite that bullet.

regards, tom lane



Re: Add exclusive backup deprecation notes to documentation

2019-02-28 Thread Laurenz Albe
David Steele wrote:
> This patch attempts to document the limitations of the exclusive mode.

Thanks!

> +   
> + The primary issue with the exclusive method is that the
> + backup_label file is written into the data 
> directory
> + when pg_start_backup is called and remains until
> + pg_stop_backup is called.  If
> + PostgreSQL or the host terminates abnormally

There should be a comma at the end of this line.

> + then backup_label will be left in the data 
> directory
> + and PostgreSQL will not start. A log message

You should say "*may* not start", because it will if the WAL segment is still 
there.

> + recommends that backup_label be removed if not
> + restoring from a backup.
> +   

Yours,
Laurenz Albe




Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Chapman Flack
On 2/28/19 11:03 AM, Joshua Brindle wrote:

> How is leakproof defined WRT Postgres? Generally speaking a 1 bit

From the CREATE FUNCTION reference page:

LEAKPROOF indicates that the function has no side effects. It reveals no
information about its arguments other than by its return value. For
example, a function which *throws an error message for some argument
values but not others*, or which includes the argument values in any
error message, is not leakproof.

(*emphasis* mine.)
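
For reference, the marking itself lives in pg_proc.proleakproof and can be
changed by a superuser; a minimal sketch (my_func is a hypothetical user
function, and the built-ins below are only examples):

SELECT proname, proleakproof
FROM pg_proc
WHERE proname IN ('texteq', 'textregexeq');

ALTER FUNCTION my_func(int) LEAKPROOF;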

Regards,
-Chap



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:03 AM, Joshua Brindle wrote:
> On Thu, Feb 28, 2019 at 10:49 AM Tom Lane  wrote:
>>
>> Joshua Brindle  writes:
>> > On Thu, Feb 28, 2019 at 9:12 AM Robert Haas  wrote:
>> >> So... you're just going to replace ALL error messages of any kind with
>> >> "ERROR: missing error text" when this option is enabled?  That sounds
>> >> unusable.  I mean if I'm reading it right this would get not only
>> >> messages from SQL-callable functions but also things like "deadlock
>> >> detected" and "could not read block %u in file %s" and "database is
>> >> not accepting commands to avoid wraparound data loss in database with
>> >> OID %u".  You can't even shut it off conveniently, because the way
>>
>> > This makes complete sense to me. The client side of a client/server
>> > protocol doesn't have any way to fix 'could not read block %u in file
>> > %s', the client doesn't need that kind of detailed information about a
>> > server, and in fact that information could be security sensitive.
>>
>> I agree with Robert that this idea is a nonstarter.  Not only is it
>> a complete disaster from a usability standpoint, but *it does not
>> fix the problem*.  The mere fact that an error occurred, or didn't,
>> is already an information leak.  Sure, you can only extract one bit
>> per query that way, but a slow leak is still a leak.  See the Spectre
>> vulnerability for a pretty exact parallel.
> 
> How is leakproof defined WRT Postgres? Generally speaking, a 1-bit
> error path would be considered a covert channel on most systems, and
> is relatively slow even compared to, e.g., timing channels.

Yes, I am pretty sure there are plenty of very security conscious
environments that would be willing to make this tradeoff in order to get
reliable RLS performance.

> Redacting error information, outside of the global leakproof setting,
> seems useful to prevent data leakage to a client on another system,
> such as Robert's example above "could not read block %u in file %s".

True

> Although, and Joe may hate me for saying this, I think only the
> non-constants should be redacted to keep some level of usability for
> regular SQL errors. Maybe system errors like the above should be
> removed from client messages in general.

I started down this path and it looked fragile. I guess if there is
generally enough support to think this might be viable I could open up
that door again, but I don't want to waste time if the approach is
really a non-starter as stated upthread :-/.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joshua Brindle
On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
>
> > Although, and Joe may hate me for saying this, I think only the
> > non-constants should be redacted to keep some level of usability for
> > regular SQL errors. Maybe system errors like the above should be
> > removed from client messages in general.
>
> I started down this path and it looked fragile. I guess if there is
> generally enough support to think this might be viable I could open up
> that door again, but I don't want to waste time if the approach is
> really a non-starter as stated upthread :-/.
>

The only non-starter for Tom was weakening leakproof, right? Can we
keep the suppression, and work on strengthening leakproof as a
separate activity?



Re: Index INCLUDE vs. Bitmap Index Scan

2019-02-28 Thread Tom Lane
I wrote:
> Tomas Vondra  writes:
>> I do recall a discussion about executing expressions on index tuples
>> during IOS (before/without inspecting the heap tuple). I'm too lazy to
>> search for the thread now, but I recall it was somehow related to
>> leak-proof-ness. And AFAICS none of the "div" procs are marked as
>> leak-proof, so perhaps that's one of the reasons?

> Leak-proof-ness is kind of related, perhaps, but it's not quite the property
> we're after here --- at least not to my mind.  It might be an interesting
> discussion exactly what the relationship is.  Meanwhile, we don't have any
> separate concept of functions that are safe to apply to any index entry;
> opclass membership is it.

The other thread about RLS helped me to crystallize the vague feelings
I had about this.  I now think that this is what we're actually assuming:
an indexable operator must be leakproof *with respect to its index-key
input*.  If it is not, it might throw errors or otherwise reveal the
existence of index entries for dead rows, which would be a usability fail
whether or not you are excited about security as such.

On the other hand, it's okay to throw errors that only reveal information
about the non-index input.  For instance, it's not a problem for pg_trgm
to treat regex-match operators as indexable, even though those will throw
error for a malformed pattern input.
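
For illustration, the error below depends only on the pattern argument
supplied in the query, never on any indexed value, so it reveals nothing
about the table's contents:

SELECT 'abc' ~ '(';   -- fails with an "invalid regular expression" error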

So this is indeed related to leakproof-ness, but our current definition
of "leakproof" is too simple to capture the property exactly.

Getting back to the question of this thread, I think we'd have to restrict
any filtering done in advance of heap liveness checks to fully leakproof
functions, since we don't want the filter expression to possibly throw
an error, regardless of which input(s) came from the index.

regards, tom lane



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
> > Although, and Joe may hate me for saying this, I think only the
> > non-constants should be redacted to keep some level of usability for
> > regular SQL errors. Maybe system errors like the above should be
> > removed from client messages in general.
>
> I started down this path and it looked fragile. I guess if there is
> generally enough support to think this might be viable I could open up
> that door again, but I don't want to waste time if the approach is
> really a non-starter as stated upthread :-/.

Hmm.  It seems to me that if there's a function that sometimes throws
an error and other times does not, and if that behavior is dependent
on the input, then even redacting the error message down to 'ERROR:
error' does not remove the leak.  So it seems to me that regardless of
what one thinks about the proposal from a usability perspective, it's
probably not correct from a security standpoint.  Information that
couldn't be leaked under present rules would leak with this change,
when the new GUCs were turned on.

Am I wrong?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:37 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
>> > Although, and Joe may hate me for saying this, I think only the
>> > non-constants should be redacted to keep some level of usability for
>> > regular SQL errors. Maybe system errors like the above should be
>> > removed from client messages in general.
>>
>> I started down this path and it looked fragile. I guess if there is
>> generally enough support to think this might be viable I could open up
>> that door again, but I don't want to waste time if the approach is
>> really a non-starter as stated upthread :-/.
> 
> Hmm.  It seems to me that if there's a function that sometimes throws
> an error and other times does not, and if that behavior is dependent
> on the input, then even redacting the error message down to 'ERROR:
> error' does not remove the leak.  So it seems to me that regardless of
> what one thinks about the proposal from a usability perspective, it's
> probably not correct from a security standpoint.  Information that
> couldn't be leaked until present rules would leak with this change,
> when the new GUCs were turned on.
> 
> Am I wrong?


No, and Tom stated as much too, but life is all about tradeoffs. Some
people will find this an acceptable compromise. For those that don't
they don't have to use it. IMHO we tend toward too much nannyism too often.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:44 AM Joe Conway  wrote:
> No, and Tom stated as much too, but life is all about tradeoffs. Some
> people will find this an acceptable compromise. For those that don't
> they don't have to use it. IMHO we tend toward too much nannyism too often.

Well, I agree with that, too.

Hmm.  I don't think there's anything preventing you from implementing
this in "userspace," is there?  A logging hook could suppress all
error message text, and you could just mark all functions leakproof
after that, and you'd have this exact behavior in an existing release
with no core code changes, I think.

If you do that, or just stick this patch into your own distro, I would
be interested to hear some experiences from customers (and those who
support them) after some time had gone by.  I find it hard to imagine
delivering customer support in an environment configured this way, but
sometimes my imagination is limited.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:50 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:44 AM Joe Conway  wrote:
>> No, and Tom stated as much too, but life is all about tradeoffs. Some
>> people will find this an acceptable compromise. For those that don't
>> they don't have to use it. IMHO we tend toward too much nannyism too often.
> 
> Well, I agree with that, too.
> 
> Hmm.  I don't think there's anything preventing you from implementing
> this in "userspace," is there?  A logging hook could suppress all
> error message text, and you could just mark all functions leakproof
> after that, and you'd have this exact behavior in an existing release
> with no core code changes, I think.

I think that would affect the server logs too, no? Worth thinking about
though...

Also manually marking all functions leakproof is far less convenient
than turning off the check as this patch effectively allows. You would
want to keep track of the initial condition and be able to restore it if
needed. Doable but much uglier. Perhaps we could tolerate a hook that
would allow an extension to do this though?
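
A rough, untested sketch of what that manual approach could look like (the
table name leakproof_backup is invented for illustration; it assumes
superuser privileges and a release with pg_proc.prokind, and only touches
plain functions):

CREATE TABLE leakproof_backup AS
  SELECT oid::regprocedure AS func, proleakproof FROM pg_proc;

DO $$
DECLARE f regprocedure;
BEGIN
  FOR f IN SELECT oid::regprocedure FROM pg_proc
           WHERE NOT proleakproof AND prokind = 'f'
  LOOP
    EXECUTE format('ALTER FUNCTION %s LEAKPROOF', f);
  END LOOP;
END $$;

Restoring the initial condition means replaying ALTER FUNCTION ... NOT
LEAKPROOF for the rows recorded in leakproof_backup, which is exactly the
extra bookkeeping that makes this uglier than a simple switch.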

> If you do that, or just stick this patch into your own distro, I would
> be interested to hear some experiences from customers (and those who
> support them) after some time had gone by.  I find it hard to imagine
> delivering customer support in an environment configured this way, but
> sometimes my imagination is limited.

Again, remember that the actual messages are available in the server
logs. The presumption is that the server logs are kept secure, and it is
ok to leak information into them. How the customer does or does not
decide to pass some of that information on to a support group becomes a
problem to deal with on a case by case basis.

Also, as mentioned up-thread, in many cases there is or should be a
non-production instance available to use for reproducing problems to
debug them. Presumably the data on such a system is faked or has already
been cleaned up for a wider audience.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 12:05 PM Joe Conway  wrote:
> I think that would affect the server logs too, no? Worth thinking about
> though...

Yeah, I suppose so, although there might be a way to work around that.

> Also manually marking all functions leakproof is far less convenient
> than turning off the check as this patch effectively allows. You would
> want to keep track of the initial condition and be able to restore it if
> needed. Doable but much uglier. Perhaps we could tolerate a hook that
> would allow an extension to do this though?

Yeah, possibly.  I guess we'd have to see how ugly that looks, but

> Again, remember that the actual messages are available in the server
> logs. The presumption is that the server logs are kept secure, and it is
> ok to leak information into them. How the customer does or does not
> decide to pass some of that information on to a support group becomes a
> problem to deal with on a case by case basis.
>
> Also, as mentioned up-thread, in many cases there is or should be a
> non-production instance available to use for reproducing problems to
> debug them. Presumably the data on such a system is faked or has already
> been cleaned up for a wider audience.

Mmmph.  If your customers always have a non-production instance where
problems from production can be easily reproduced, your customers are
not much like our customers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 12:28 PM, Robert Haas wrote:
> Mmmph.  If your customers always have a non-production instance where
> problems from production can be easily reproduced, your customers are
> not much like our customers.

Well I certainly did not mean to imply that this is always the case ;-)

But I think it is fair to tell customers that have these tradeoffs in
front of them that having such an instance would be even wiser if they
decide to use this capability.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Drop type "smgr"?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:06 AM Tom Lane  wrote:
> It's certainly possible/likely that we're going to end up needing to
> widen buffer tags to represent the smgr explicitly, because some use
> cases are going to need a real database spec, some are going to need
> a real tablespace spec, and some might need both.  Maybe we should
> just bite that bullet.

Well, Andres will probably complain about that.  He thinks, IIUC, that
the buffer tags are too wide already and that it's significantly
hurting performance on very very common operations - like buffer
lookups.  I haven't verified that myself, but I tend to think he knows
what he's talking about.

Anyway, given that your argument started off from the premise that we
had to have a pg_database row, I think we'd better look a little
harder at whether that premise is correct before getting too excited
here.  As I said in my earlier reply, I think that we probably don't
need to have a pg_database row given that we wrap around to
FirstNormalObjectId; any value we hard-code would be less than that.
If there are other reasons why we'd need that, it might be useful to
hear about them.

However, all we really need to decide on this thread is whether we
need 'smgr' exposed as an SQL type.  And I can't see why we need that
no matter how all of the rest of this turns out.  Nobody is currently
proposing to give users a choice of smgrs, just to use them for
internal stuff.  Even if that changed later, it doesn't necessarily
mean we'd add back an SQL type, or that if we did it would look like
the one we have today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro  wrote:
>> Nothing seems to break if you remove it (except for some tests using
>> it in an incidental way).  See attached.

> FWIW, +1 from me.

To be clear, I'm not objecting to the proposed patch either.  I was
just wondering where we plan to go from here, given that smgr.c wasn't
getting removed.

BTW, there is stuff in src/backend/storage/smgr/README that is
already obsoleted by this patch, and more that might be obsoleted
if development proceeds as discussed here.  So that needs a look.

regards, tom lane



plpgsql variable named as SQL keyword

2019-02-28 Thread Pavel Stehule
Hi

one user of plpgsql_check reported interesting error message

create or replace function omega.foo(a int)
returns int as $$
declare offset integer := 0;
begin
  return offset + 1;
end;
$$ language plpgsql;

postgres=# select omega.foo(10);
ERROR:  query "SELECT offset + 1" returned 0 columns
CONTEXT:  PL/pgSQL function omega.foo(integer) line 4 at RETURN

Maybe we should disallow variables named after SQL reserved keywords.
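
For what it's worth, renaming the variable is enough to avoid the failure
(a minimal sketch, identical to the example above except for the variable
name):

create or replace function omega.foo(a int)
returns int as $$
declare v_offset integer := 0;
begin
  return v_offset + 1;
end;
$$ language plpgsql;

The original fails because the generated query "SELECT offset + 1" is parsed
with OFFSET as the reserved keyword that starts an OFFSET clause, leaving an
empty target list, hence the "returned 0 columns" error.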

Regards

Pavel


Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Jeff Janes
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes  wrote:

> I've moved this to the hackers list, and added Teodor and Alexander of the
> bloom extension, as I would like to hear their opinions on the costing.
>

My previous patch had accidentally included a couple lines of a different
thing I was working on (memory leak, now-committed), so this patch removes
that diff.

I'm adding it to the commitfest targeting v13.  I'm more interested in
feedback on the conceptual issues rather than stylistic ones, as I would
probably merge the two functions together before proposing something to
actually be committed.

Should we be trying to estimate the false positive rate and charging
cpu_tuple_cost, cpu_operator_cost, and the IO costs for visiting the table to
recheck and reject those?  I don't think other index types do that, and I'm
inclined to think the burden should be on the user not to create silly
indexes that produce an overwhelming number of false positives.
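
For reference, the textbook approximation for a Bloom signature's false
positive rate (a hedged sketch; the extension's per-column signature math
differs in its details) is

    p ≈ (1 - e^(-k*n/m))^k

where m is the signature size in bits, k the number of bits set per element,
and n the number of elements hashed into the signature.  An estimate along
those lines is what would feed the recheck costing asked about above.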

Cheers,

Jeff


bloom_cost_v3.patch
Description: Binary data


Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
Hi,

On 2019-02-28 12:36:50 -0500, Robert Haas wrote:
> Well, Andres will probably complain about that.  He thinks, IIUC, that
> the buffer tags are too wide already and that it's significantly
> hurting performance on very very common operations - like buffer
> lookups.

Correct. Turns out especially comparing the keys after the hash match is
pretty expensive. It also is a significant factor influencing the size
of the hashtable, which influences how much of it can be in cache.

My plan is still to move to a two-tiered system, where we have one
unordered data structure to map from (db, tablespace, oid) to a secondary
ordered data structure that then maps from (block number) to an actual
offset. With the first being cached somewhere in RelationData, and therefore
not being performance critical. But while I hope to work on that in 13,
I don't think making other large projects depend on it would be smart.

Greetings,

Andres Freund



Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread David Steele

On 2/27/19 8:22 PM, Christophe Pettus wrote:




On Feb 26, 2019, at 11:38, Magnus Hagander  wrote:
That should not be a wiki page, really, that should be part of the main 
documentation.


I was just suggesting using a wiki page to draft it before we drop it into the 
main documentation.  I'm open to other suggestions, of course!


It seems to me that the best way to discuss this is via a patch to the 
main documentation.  I've written something to get us started:


https://commitfest.postgresql.org/22/2042/

--
-David
da...@pgmasters.net



Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Dear SMEs

I have finally decided to move forward after enjoying great hospitality on
version 9.2.24 :-)

First I attempted to upgrade from 9.2.24 to 10.7, but it failed with the
following error during check mode.

could not load library "$libdir/hstore": ERROR:  could not access file
"$libdir/hstore": No such file or directory
could not load library "$libdir/adminpack": ERROR:  could not access file
"$libdir/adminpack": No such file or directory
could not load library "$libdir/uuid-ossp": ERROR:  could not access file
"$libdir/uuid-ossp": No such file or directory

Observation: the above libraries are present in 9.2 whereas they are missing
in 10.7, so I decided to go with a lower version.

Second, I tried to upgrade from 9.2.24 to 9.6.12, 9.4, and 9.3, but it failed
with the following error during check mode.

could not load library "$libdir/pg_reorg":
ERROR:  could not access file "$libdir/pg_reorg": No such file or directory

Observation: in this case, pg_reorg is not present on either the source or
the target, but strangely it still fails.
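
For reference, the functions that still point at a given library can be
listed per database with a query like this (a hedged sketch):

SELECT oid::regprocedure, probin FROM pg_proc WHERE probin LIKE '%pg_reorg%';

For hstore, adminpack, and uuid-ossp the contrib package matching the target
version typically has to be installed so the new $libdir contains those files.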


Method Used : pg_upgrade

Could you please shed some light here on how to get rid of this library issue?

Thanks in advance,
Raju


Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Shawn Debnath  writes:
> On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
>> Also, I don't see why we'd need a fake pg_database row in the first
>> place.  IIUC, the OID counter wraps around to FirstNormalObjectId, so
>> nobody should have a database with an OID less than that value.

> We have scripts under catalog directory that can check to ensure OIDs 
> aren't re-used accidentally. However, we still have to define an entry 
> in a catalog somewhere and I was proposing creating a new one, 
> pg_storage_managers?, to track these entries.

That would fail to capture the property that the smgr OIDs mustn't
conflict with database OIDs, so the whole thing still seems like an
ugly kluge from here.

> Another thought: my colleague Anton Shyrabokau suggested potentially
> re-using forknumber to differentiate smgrs. We are using 32 bits to
> map 5 entries today.

Yeah, that seems like it might be a workable answer.

I really do think that relying on magic database OIDs is a bad idea;
if you think you aren't going to need a real database ID in there in
the future, you're being short-sighted.  And I guess the same argument
establishes that relying on magic tablespace OIDs would also be a bad
idea, even if there weren't a near-term proposal on the table for
needing real tablespace IDs in an alternate smgr.  So we need to find
some bits somewhere else, and the fork number field is the obvious
candidate.  Since, per this discussion, the smgr IDs need not really
be OIDs at all, we just need a few bits for them.

regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
Hi,

On 2019-02-28 10:02:46 -0800, Shawn Debnath wrote:
> We have scripts under catalog directory that can check to ensure OIDs 
> aren't re-used accidentally. However, we still have to define an entry 
> in a catalog somewhere and I was proposing creating a new one, 
> pg_storage_managers?, to track these entries. See [1] for previous
> discussion on this topic. We wouldn't need to do catalog lookups for
> being able to use the smgrs as the OIDs will be hardcoded in C, but the
> data will be available for posterity and OID reservation.

I'm inclined to just put them in pg_am, with a new type 's' (we already
have amtype = i for indexes, I'm planning to add 't' for tables
soon). While not a perfect fit for storage managers, it seems to fit
well enough.


> Another thought: my colleague Anton Shyrabokau suggested potentially
> re-using forknumber to differentiate smgrs. We are using 32 bits to
> map 5 entries today. This approach would be similar to how we split up
> the segment numbers and use the higher bits to identify forget or
> unlink requests for checkpointer.

That could probably be done, without incurring too much overhead
here. I'm not sure that the added complexity around the tree is worth it
however.

I personally would just go with the DB oid for the near future, where we
don't need per-database storage for those. And then plan for buffer
manager improvements.
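
To illustrate the shape of that stopgap, a hypothetical sketch of routing by
a reserved "database" OID stored in the buffer tag. The OID constants and the
SMGR_* values are invented here; today smgrsw[] in smgr.c has only the md.c
entry.

/* hypothetical: pick a storage manager based on a reserved dbNode value */
static int
smgr_which(RelFileNode rnode)
{
	switch (rnode.dbNode)
	{
		case UNDO_DB_OID:		/* reserved OID below FirstNormalObjectId */
			return SMGR_UNDO;
		case SLRU_DB_OID:		/* reserved OID below FirstNormalObjectId */
			return SMGR_SLRU;
		default:
			return SMGR_MD;		/* ordinary relations, md.c */
	}
}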

Greetings,

Andres Freund



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2019-02-28 Thread Andres Freund


Hi,

Thanks for the quick response.

On 2019-02-28 18:28:37 +0900, Etsuro Fujita wrote:
> > I'm currently
> > converting the EPQ machinery to slots, and in course of that I (with
> > Horiguchi-san's help), converted RefetchForeignRow to return a slot. But
> > there's currently no in-core user of this facility...  I guess I can
> > rebase the preliminary postgres_fdw patch here, but it bitrotted
> > significantly.
> 
> I'll rebase that patch and help the testing, if you want me to.

That'd be awesome.


> > I also feel like there should be some test coverage for
> > an API in a nontrivial part of the code...
> 
> Yeah, but as mentioned above, the row-locking API is provided for FDWs
> operating against local storage, which we don't have in core, unfortunately.

Yea. file_fdw exists, but doesn't support modifications...

Greetings,

Andres Freund



Re: plpgsql variable named as SQL keyword

2019-02-28 Thread Tom Lane
Pavel Stehule  writes:
> Maybe we should to disallow variables named as sql reserved keyword.

That would just break existing code.  There are lots of other
examples where you can get away with such things.

We've expended quite a lot of sweat to avoid reserving more names than
we had to in plpgsql.  I'm disinclined to throw that away just because
somebody found an error message confusing.  It's not like reserving
"offset" would cause this case to work.

regards, tom lane



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Mahendra Singh
Hi
Please try with the commands below.

Say we want to upgrade v6 to v11.
Note: I installed my binaries inside the result folder.

export OLDCLUSTER=./6_EDBAS/EDBAS/result
export NEWCLUSTER=./11_EDBAS/EDBAS/result
./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin
--new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data
--new-datadir=$NEWCLUSTER/bin/data

Note: the old server should be running and the new server should not be
running.

Thanks and Regards
Mahendra

On Thu, 28 Feb 2019 at 23:44, Perumal Raj  wrote:

> Dear SMEs
>
> I have finally decided to move forward after great hospitality in Version
> 9.2.24 :-)
>
> First i attempted to upgrade from 9.2.24 to 10.7, but its failed with
> following error during Check Mode.
>
> could not load library "$libdir/hstore": ERROR:  could not access file
> "$libdir/hstore": No such file or directory
> could not load library "$libdir/adminpack": ERROR:  could not access file
> "$libdir/adminpack": No such file or directory
> could not load library "$libdir/uuid-ossp": ERROR:  could not access file
> "$libdir/uuid-ossp": No such file or directory
>
> Observation : the above Libraries are present in 9.2 whereas its mising in
> 10.7. So i decided to go with lower version.
>
> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
> its failed with following error during Check Mode.
>
> could not load library "$libdir/pg_reorg":
> ERROR:  could not access file "$libdir/pg_reorg": No such file or directory
>
> Observation : In this case , pg_reorg is not present on both Source and
> Target . But strange its failing.
>
>
> Method Used : pg_upgrade
>
> Could you please share some light here to get rid of  library issue .
>
> Thanks, in advance ,
> Raju
>
>


Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> Shawn Debnath  writes:
> > On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
> >> Also, I don't see why we'd need a fake pg_database row in the first
> >> place.  IIUC, the OID counter wraps around to FirstNormalObjectId, so
> >> nobody should have a database with an OID less than that value.
> 
> > We have scripts under catalog directory that can check to ensure OIDs 
> > aren't re-used accidentally. However, we still have to define an entry 
> > in a catalog somewhere and I was proposing creating a new one, 
> > pg_storage_managers?, to track these entries.
> 
> That would fail to capture the property that the smgr OIDs mustn't
> conflict with database OIDs, so the whole thing still seems like an
> ugly kluge from here.

It's definitely a kludge, but it doesn't seem that bad for now. Delaying
a nice answer till we have a more efficient bufmgr representation
doesn't seem crazy to me.

I don't think there's a real conflict risk given that we don't allow for
duplicated oids across catalogs at bootstrap time, and this is
definitely a bootstrap time issue.


> > Another thought: my colleague Anton Shyrabokau suggested potentially
> > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > map 5 entries today.
> 
> Yeah, that seems like it might be a workable answer.

Yea, if we just split that into two 16-bit entries, there'd not be much
lost. Some mild performance regression due to more granular
memory->register reads/writes, but I can't even remotely see that
mattering.
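
For illustration only, a sketch of what splitting the existing 32-bit fork
field into two 16-bit fields might look like; the field names are made up,
and the current BufferTag in buf_internals.h stores a full ForkNumber.

typedef struct buftag
{
	RelFileNode rnode;		/* physical relation identifier */
	int16		smgrid;		/* which storage manager owns the block */
	int16		forknum;	/* fork number, previously a 32-bit ForkNumber */
	BlockNumber blockNum;	/* block number relative to start of relation */
} BufferTag;

The tag stays the same size (and so does the buffer mapping hash table), so
the lookup and key-comparison costs discussed above are unchanged.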


> Since, per this discussion, the smgr IDs need not really be OIDs at
> all, we just need a few bits for them.

Personally I find having them as oids elegant enough to outweigh the
distaste of misusing the database oid for a while, but I think it's fair to
disagree here.

Greetings,

Andres Freund



Re: plpgsql variable named as SQL keyword

2019-02-28 Thread Pavel Stehule
On Thu, Feb 28, 2019 at 7:20 PM Tom Lane  wrote:

> Pavel Stehule  writes:
> > Maybe we should to disallow variables named as sql reserved keyword.
>
> That would just break existing code.  There are lots of other
> examples where you can get away with such things.
>
> We've expended quite a lot of sweat to avoid reserving more names than
> we had to in plpgsql.  I'm disinclined to throw that away just because
> somebody found an error message confusing.  It's not like reserving
> "offset" would cause this case to work.
>

I partially solved it with a new warning in plpgsql_check:

https://github.com/okbob/plpgsql_check/commit/5b9ef57d570c1d11fb92b9cff76655a03767f662

postgres=# select * from plpgsql_check_function('omega.foo(int, int, int)');
+------------------------------------------------------------------------------+
|                            plpgsql_check_function                             |
+------------------------------------------------------------------------------+
| warning:0:3:statement block:name of variable "offset" is reserved keyword     |
| Detail: The reserved keyword was used as variable name.                       |
| error:42601:4:RETURN:query "SELECT offset + 1" returned 0 columns             |
+------------------------------------------------------------------------------+
(3 rows)

I understand that there is no simple solution (or perhaps no solution at
all). I reported it more or less for the record.

Thank you for reply

Pavel


> regards, tom lane
>


Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Thanks, Mahendra, for the quick response.

I have followed the same procedure; the only difference is that I didn't
bring up the source (9.2), but I'm not sure how that would resolve the
library issue. I only tried --check mode.

Thanks,


On Thu, Feb 28, 2019 at 10:23 AM Mahendra Singh  wrote:

> Hi
> Please try with below commands.
>
> Let we want to upgrade v6 to v11.
> Note: I installed my binary inside result folder.
>
> export OLDCLUSTER=./6_EDBAS/EDBAS/result
> export NEWCLUSTER=./11_EDBAS/EDBAS/result
> ./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin
> --new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data
> --new-datadir=$NEWCLUSTER/bin/data
>
> Note: old server should be in running state and new server should not be
> in running state.
>
> Thanks and Regards
> Mahendra
>
> On Thu, 28 Feb 2019 at 23:44, Perumal Raj  wrote:
>
>> Dear SMEs
>>
>> I have finally decided to move forward after great hospitality in Version
>> 9.2.24 :-)
>>
>> First i attempted to upgrade from 9.2.24 to 10.7, but its failed with
>> following error during Check Mode.
>>
>> could not load library "$libdir/hstore": ERROR:  could not access file
>> "$libdir/hstore": No such file or directory
>> could not load library "$libdir/adminpack": ERROR:  could not access file
>> "$libdir/adminpack": No such file or directory
>> could not load library "$libdir/uuid-ossp": ERROR:  could not access file
>> "$libdir/uuid-ossp": No such file or directory
>>
>> Observation : the above Libraries are present in 9.2 whereas its mising
>> in 10.7. So i decided to go with lower version.
>>
>> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
>> its failed with following error during Check Mode.
>>
>> could not load library "$libdir/pg_reorg":
>> ERROR:  could not access file "$libdir/pg_reorg": No such file or
>> directory
>>
>> Observation : In this case , pg_reorg is not present on both Source and
>> Target . But strange its failing.
>>
>>
>> Method Used : pg_upgrade
>>
>> Could you please share some light here to get rid of  library issue .
>>
>> Thanks, in advance ,
>> Raju
>>
>>


Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Sergei Kornilov
Hello

pgsql-hackers seems like the wrong list for such a question.

> could not load library "$libdir/hstore": ERROR:  could not access file 
> "$libdir/hstore": No such file or directory
> could not load library "$libdir/adminpack": ERROR:  could not access file 
> "$libdir/adminpack": No such file or directory
> could not load library "$libdir/uuid-ossp": ERROR:  could not access file 
> "$libdir/uuid-ossp": No such file or directory
>
> Observation : the above Libraries are present in 9.2 whereas its mising in 
> 10.7. So i decided to go with lower version.

These are contrib modules. They can be shipped in a separate package, 
postgresql10-contrib.x86_64 for example (in the CentOS repo).

> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but its 
> failed with following error during Check Mode.
>
> could not load library "$libdir/pg_reorg":
> ERROR:  could not access file "$libdir/pg_reorg": No such file or directory
>
> Observation : In this case , pg_reorg is not present on both Source and 
> Target . But strange its failing.

This is a third-party extension. The best way would be to drop this extension 
on the old cluster and then perform the upgrade. pg_reorg has been abandoned 
for years; pg_repack is a live fork if you need such a tool.

regards, Sergei



Re: partitioned tables referenced by FKs

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Amit Langote wrote:

> Hi Alvaro,
> 
> I looked at the latest patch and most of the issues/bugs that I was going
> to report based on the late January version of the patch seem to have been
> taken care of; sorry that I couldn't post sooner which would've saved you
> some time.   The patch needs to be rebased on top of ff11e7f4b9 which
> touched ri_triggers.c.

Rebased to current master.  I'll reply later to handle the other issues
you point out.

Given the comments on 0002 in this thread and elsewhere, I'm inclined to
push that one today with minor tweaks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1a0b9c9a699b2c80ebc295513e07db362b39fe0d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 28 Nov 2018 11:52:00 -0300
Subject: [PATCH v5 1/5] Rework deleteObjectsInList to allow objtype-specific
 checks

This doesn't change any functionality yet.
---
 src/backend/catalog/dependency.c | 41 +++-
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2048d71535b..0b4c47b808c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -230,29 +230,38 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	int			i;
 
 	/*
-	 * Keep track of objects for event triggers, if necessary.
+	 * Invoke pre-deletion callbacks and keep track of objects for event
+	 * triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
+	for (i = 0; i < targetObjects->numrefs; i++)
 	{
-		for (i = 0; i < targetObjects->numrefs; i++)
+		const ObjectAddress *thisobj = &targetObjects->refs[i];
+		Oid			objectClass = getObjectClass(thisobj);
+
+		if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 		{
-			const ObjectAddress *thisobj = &targetObjects->refs[i];
-			const ObjectAddressExtra *extra = &targetObjects->extras[i];
-			bool		original = false;
-			bool		normal = false;
-
-			if (extra->flags & DEPFLAG_ORIGINAL)
-original = true;
-			if (extra->flags & DEPFLAG_NORMAL)
-normal = true;
-			if (extra->flags & DEPFLAG_REVERSE)
-normal = true;
-
-			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			if (EventTriggerSupportsObjectClass(objectClass))
 			{
+bool		original = false;
+bool		normal = false;
+const ObjectAddressExtra *extra = &targetObjects->extras[i];
+
+if (extra->flags & DEPFLAG_ORIGINAL)
+	original = true;
+if (extra->flags & DEPFLAG_NORMAL ||
+	extra->flags & DEPFLAG_REVERSE)
+	normal = true;
+
 EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
+
+		/* Invoke class-specific pre-deletion checks */
+		switch (objectClass)
+		{
+			default:
+break;
+		}
 	}
 
 	/*
-- 
2.17.1

>From f353acaffaf275df623e70c296d5833bb9f318e7 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 22 Jan 2019 18:00:31 -0300
Subject: [PATCH v5 2/5] index_get_partition

---
 src/backend/catalog/partition.c  | 35 
 src/backend/commands/tablecmds.c | 40 +---
 src/include/catalog/partition.h  |  1 +
 3 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3ccdaff8c45..f9282587f8b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -145,6 +145,41 @@ get_partition_ancestors_worker(Relation inhRel, Oid relid, List **ancestors)
 	get_partition_ancestors_worker(inhRel, parentOid, ancestors);
 }
 
+/*
+ * Return the OID of the index, in the given partition, that is a child of the
+ * given index or InvalidOid if there isn't one.
+ */
+Oid
+index_get_partition(Relation partition, Oid indexId)
+{
+	List	   *idxlist = RelationGetIndexList(partition);
+	ListCell   *l;
+
+	foreach(l, idxlist)
+	{
+		Oid			partIdx = lfirst_oid(l);
+		HeapTuple	tup;
+		Form_pg_class classForm;
+		bool		ispartition;
+
+		tup = SearchSysCache1(RELOID, ObjectIdGetDatum(partIdx));
+		if (!tup)
+			elog(ERROR, "cache lookup failed for relation %u", partIdx);
+		classForm = (Form_pg_class) GETSTRUCT(tup);
+		ispartition = classForm->relispartition;
+		ReleaseSysCache(tup);
+		if (!ispartition)
+			continue;
+		if (get_partition_parent(lfirst_oid(l)) == indexId)
+		{
+			list_free(idxlist);
+			return partIdx;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * map_partition_varattnos - maps varattno of any Vars in expr from the
  * attno's of 'from_rel' to the attno's of 'to_rel' partition, each of which
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a93b13c2fe4..251c5cd3fa1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15454,36 +15454,18 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 st

Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Thank you very much, Sergei.

Yes, I want to get rid of the old extension. Could you please share the query
to find which databases have the pg_reorg extension installed?

Regards,




On Thu, Feb 28, 2019 at 10:27 AM Sergei Kornilov  wrote:

> Hello
>
> pgsql-hackers seems wrong list for such question.
>
> > could not load library "$libdir/hstore": ERROR:  could not access file
> "$libdir/hstore": No such file or directory
> > could not load library "$libdir/adminpack": ERROR:  could not access
> file "$libdir/adminpack": No such file or directory
> > could not load library "$libdir/uuid-ossp": ERROR:  could not access
> file "$libdir/uuid-ossp": No such file or directory
> >
> > Observation : the above Libraries are present in 9.2 whereas its mising
> in 10.7. So i decided to go with lower version.
>
> This is contrib modules. They can be shipped in separate package,
> postgresql10-contrib.x86_64 for example (in centos repo)
>
> > Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
> its failed with following error during Check Mode.
> >
> > could not load library "$libdir/pg_reorg":
> > ERROR:  could not access file "$libdir/pg_reorg": No such file or
> directory
> >
> > Observation : In this case , pg_reorg is not present on both Source and
> Target . But strange its failing.
>
> This is 3rd-party extension. Best way would be drop this extension on old
> cluster and perform upgrade. pg_reorg is abandoned for years, pg_repack is
> live fork if you need such tool.
>
> regards, Sergei
>


Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Tom Lane
Jeff Janes  writes:
> Should we be trying to estimate the false positive rate and charging
> cpu_tuple_cost and cpu_operator_cost plus the IO costs for visiting the table to
> recheck and reject those?  I don't think other index types do that, and I'm
> inclined to think the burden should be on the user not to create silly
> indexes that produce an overwhelming number of false positives.

Heap-access costs are added on in costsize.c, not in the index
cost estimator.  I don't remember at the moment whether there's
any explicit accounting for lossy indexes (i.e. false positives).
Up to now, there haven't been cases where we could estimate the
false-positive rate with any accuracy, so we may just be ignoring
the effect.  But if we decide to account for it, I'd rather have
costsize.c continue to add on the actual cost, perhaps based on
a false-positive-rate fraction returned by the index estimator.
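
To make that concrete, here is a rough, purely hypothetical sketch of how such
a charge could be computed; the false-positive fraction is an invented output
of amcostestimate and none of this exists in costsize.c today.

/*
 * Hypothetical sketch, not existing PostgreSQL code: extra cost of heap
 * visits caused by a lossy index, given an estimated false-positive rate
 * that an index AM's cost estimator might some day report.
 */
static double
lossy_recheck_cost(double tuples_fetched, double false_positive_rate,
				   double cpu_tuple_cost, double recheck_cost_per_tuple)
{
	/* rows fetched from the heap only to be rejected by the recheck */
	double		extra_tuples = tuples_fetched * false_positive_rate;

	return extra_tuples * (cpu_tuple_cost + recheck_cost_per_tuple);
}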

regards, tom lane



Re: [RFC] [PATCH] Flexible "partition pruning" hook

2019-02-28 Thread Mike Palmiotto
On Wed, Feb 27, 2019 at 12:36 PM Peter Eisentraut
 wrote:
> 
> To rephrase this: You have a partitioned table, and you have a RLS
> policy that hides certain rows, and you know based on your business
> logic that under certain circumstances entire partitions will be hidden,
> so they don't need to be scanned.  So you want a planner hook that would
> allow you to prune those partitions manually.
>
> That sounds pretty hackish to me.  We should give the planner and
> executor the appropriate information to make these decisions, like an
> additional partition constraint.

Are you thinking of a partition pruning step for FuncExpr's or
something else? I was considering an implementation where FuncExpr's
were marked for execution-time pruning, but wanted to see if this
patch got any traction first.

> If this information is hidden in
> user-defined functions in a way that cannot be reasoned about, who is
> enforcing these constraints and how do we know they are actually correct?

The author of the extension which utilizes the hook would have to be
sure they use the hook correctly. This is not a new or different
concept to any other existing hook. This hook in particular would be
used by security extensions that have some understanding of the
underlying security model being implemented by RLS.

Thanks,

--
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Sergei Kornilov
Hi

> Yes, i want to get rid of old extension, Could you please share the query to 
> find extension which is using pg_reorg.

pg_reorg is the name of both the tool and the extension.
Check every database in the cluster with, for example, the psql command "\dx", 
or read pg_dumpall -s output for CREATE EXTENSION statements to find all 
installed extensions.

regards, Sergei



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
here is the data,

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

template1=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

postgres=# \c nagdb
You are now connected to database "nagdb" as user "postgres".
nagdb=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

nagdb=# \c archive_old
You are now connected to database "books_old" as user "postgres".
books_old=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

archive_old=# \c production
You are now connected to database "blurb_production" as user "postgres".
production=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 hstore             | 1.1     | public     | data type for storing sets of (key, value) pairs
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 uuid-ossp          | 1.0     | public     | generate universally unique identifiers (UUIDs)
(4 rows)


Thanks,



On Thu, Feb 28, 2019 at 11:04 AM Sergei Kornilov  wrote:

> Hi
>
> > Yes, i want to get rid of old extension, Could you please share the
> query to find extension which is using pg_reorg.
>
> pg_reorg is name for both tool and extension.
> Check every database in cluster with, for example, psql command "\dx" or
> read pg_dumpall -s output for some CREATE EXTENSION statements to find all
> installed extensions.
>
> regards, Sergei
>


Re: POC: converting Lists into arrays

2019-02-28 Thread Tom Lane
David Rowley  writes:
> On Thu, 28 Feb 2019 at 09:26, Tom Lane  wrote:
>> 0001 below does this.  I found a couple of places that could use
>> forfive(), as well.  I think this is a clear legibility and
>> error-proofing win, and we should just push it.

> I've looked over this and I agree that it's a good idea.  Reducing the
> number of lnext() usages seems like a good idea in order to reduce the
> footprint of the main patch.

I've pushed that; thanks for reviewing!

>> 0002 below does this.  I'm having a hard time deciding whether this
>> part is a good idea or just code churn.  It might be more readable
>> (especially to newbies) but I can't evaluate that very objectively.

> I'm less decided on this.

Yeah, I think I'm just going to drop that idea.  People didn't seem
very sold on list_cell_is_last() being a readability improvement,
and it certainly does nothing to reduce the footprint of the main
patch.

I now need to rebase the main patch over what I pushed; off to do
that next.

regards, tom lane



Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Michael Paquier wrote:

> On Wed, Feb 27, 2019 at 03:48:08PM -0300, Alvaro Herrera wrote:
> > I just happened to come across the result of this rationale in
> > pg_partition_tree() (an SRF) while developing a new related function,
> > pg_partition_ancestors(), and find the resulting behavior rather absurd
> > -- it returns one row with all NULL columns, rather than no rows.  I
> > think the sensible behavior would be to do SRF_RETURN_DONE() before
> > stashing any rows to the output, so that we get an empty result set
> > instead.
> 
> Hmm.  Going through the thread again NULL was decided to make the
> whole experience consistent, now by returning nothing we would get
> a behavior as consistent as when NULL is used in input, so point taken
> to tune the behavior for unsupported relkinds and undefined objects.

Right, thanks.

> Does the attached look fine to you?

Yeah, looks good, please push.

What about legacy inheritance?  I see that pg_partition_tree handles
that case perfectly well -- it seems to return the complete hierarchy
rooted at the given relation.  However, it seems odd that it works at
all, don't you think?  Consider this:

create table t1 (a int);
create table t11 () inherits (t1);
create table t2 (b int);
create table t111() inherits (t1, t2);

alvherre=# select * from pg_partition_tree('t1');
 relid | parentrelid | isleaf | level 
---+-++---
 t1| t   | t  | 0
 t11   | t1  | t  | 1
 t111  | t1  | t  | 1
(3 rows)

OK so far... but look at t2's tree:

alvherre=# select * from pg_partition_tree('t2');
 relid | parentrelid | isleaf | level 
---+-++---
 t2| t   | t  | 0
 t111  | t1  | t  | 2

I think this one is just weird -- t1 is not listed as "relid" anywhere,
and why does t111 have level=2?

I would opt for returning the empty set for legacy inheritance too.

More generally, I think we should return the empty set for anything that is
neither RELKIND_PARTITIONED_TABLE nor has relispartition set.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: some ri_triggers.c cleanup

2019-02-28 Thread Peter Eisentraut
On 2019-02-25 17:17, Corey Huinker wrote:
> Right, this makes a lot of sense, similar to how ri_restrict() combines
> RESTRICT and NO ACTION.
> 
> 
> I'm pretty sure that's where I got the idea, yes. 

Committed, including your patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-28 Thread Tomas Vondra
Hi,

Attached is an updated version of this patch series. I've decided to
rebase and send both parts (MCV and histograms), although we've agreed
to focus on the MCV part for now. I don't want to let the histogram part lag
behind, because (a) it'd then be much more work to update it, and (b) I think
it provides useful feedback about likely future changes.

This should address most of the issues pointed out by David in his
recent reviews. Briefly:

1) It fixes/updates a number of comments and docs on various places,
removes redundant comments etc. In most cases I've simply adopted the
wording proposed by David, with minor tweaks in a couple of cases.

2) Reverts changes that exposed analyze_mcv_list - this was a leftover
from the attempt to reuse the single-column algorithm, but we've since
agreed it's not the right approach. So this change is unnecessary.

3) I've tweaked the code to accept RelabelType nodes as supported,
similarly to what examine_variable() does. Previously I concluded we
can't support RelabelType, but it seems that reasoning was bogus. I've
slightly tweaked the regression tests by changing one of the columns to
varchar, so that the queries actually trigger this.

4) I've tweaked a couple of places (UpdateStatisticsForTypeChange,
statext_clauselist_selectivity and estimate_num_groups_simple) per
David's suggestions. Those were fairly straightforward simplifications.

5) I've removed mcv_count from statext_mcv_build(). As David pointed
out, this was not actually needed - it was another remnant of the
attempt to re-use analyze_mcv_list(), which needs such an array. But without
it we can access the groups directly.

6) One of the review questions was about the purpose of this code:

  for (i = 0; i < nitems; i++)
  {
  if (groups[i].count < mincount)
  {
  nitems = i;
  break;
  }
  }

It's quite simple - we want to include groups with more occurrences than
mincount, and the groups are sorted by the count (in descending order).
So we simply find the first group with count below mincount, and the
index is the number of groups to keep. I've tried to explain that in a
comment.

7) I've fixed a bunch of format patterns in statext_mcv_deserialize(),
particularly those that confused %d and %u. We can't however use %d for
VARSIZE_ANY_EXHDR, because that macro expands into offsetof() etc. So
that would trigger compiler warnings.

8) Yeah, pg_stats_ext_mcvlist_items was broken. The issue was that one
of the output parameters is defined as boolean[], but the function was
building just a string. Originally it used BuildTupleFromCStrings(), but
then it switched to heap_form_tuple() without building a valid array.

I've decided to simply revert back to BuildTupleFromCStrings(). It's not
going to be used very frequently, so the small performance difference is
not important.

I've also fixed the formatting issues, pointed out by David.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20190228.patch.gz
Description: application/gzip


0002-multivariate-histograms-20190228.patch.gz
Description: application/gzip


Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Tomas Vondra
On 2/28/19 12:26 PM, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> At Sat, 23 Feb 2019 22:27:44 +0100, Tomas Vondra 
>  wrote in 
> <81a5c0e9-c17d-28f3-4647-8a4659cdf...@2ndquadrant.com>
>>
>>
>> On 2/23/19 8:53 AM, Surafel Temesgen wrote:
>>>
>>>
>>> On Sun, Feb 10, 2019 at 2:22 AM Tomas Vondra  wrote:
>>>  
>>>
>>>
>>> I'm not sure I understand - are you saying every time the user does a
>>> FETCH, we have to run the outer plan from scratch? I don't see why would
>>> that be necessary? And if it is, how come there's no noticeable
>>> performance difference?
>>>
>>> Can you share a patch implementing the incremental approach, and a query
>>> demonstrating the issue?
>>>
>>>
>>> I didn't implement it but its obvious that it doesn't work similarly
>>> with previous approach.
>>>
>>
>> Sure, but that's hardly a sufficient argument for the current approach.
>>
>>> We need different implementation and my plan was to use tuplestore per
>>> call and clear
>>>
>>> it after returning tuple but I see that the plan will not go far because
>>> mainly the last returned
>>>
>>> slot is not the last slot we get from outerPlan execution
>>>
>>
>> I'm sorry, I still don't understand what the supposed problem is. I
>> don't think it's all that different from what nodeMaterial.c does, for
>> example.
>>
>> As I explained before, having to execute the outer plan till completion
>> before returning any tuples is an issue. So either it needs fixing or an
>> explanation why it's not an issue.
> 
> One of the biggest issues seems to be that we don't know the total number
> of outer tuples before actually reading a null tuple. I doubt there is a
> general shortcut for that. It also seems to prevent the limit node from
> just using the materialized outer plan.
> 

Sure, if you actually want all tuples, you'll have to execute the outer
plan till completion. But that's not what I'm talking about - what if we
only ever need to read one row from the limit?

To give you an (admittedly somewhat contrived and artificial) example:

SELECT * FROM t1 WHERE id IN (
  SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
);

Maybe this example is bogus and/or does not really matter in practice. I
don't know, but I've been unable to convince myself that's the case.


regards


-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2019-02-28 Thread Robert Haas
On Wed, Feb 27, 2019 at 3:16 AM Ideriha, Takeshi
 wrote:
> I'm afraid I may be quibbling about it.
> What about users who understand the performance drop but don't want to
> add memory or decrease concurrency?
> I think PostgreSQL has parameters which most users don't mind and leave
> at their defaults, but which a few users want to change.
> In this case, as you said, introducing a hard-limit parameter can decrease
> performance significantly, so how about adding a detailed caution to the
> documentation, like the planner cost parameters?

There's nothing wrong with a parameter that is useful to some people
and harmless to everyone else, but the people who are proposing that
parameter still have to demonstrate that it has those properties.
This email thread is really short on clear demonstrations that X or Y
is useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hopefully we'll get Tom's patch that addresses the failure-to-truncate
> > issues in pg12.
> 
> Hm, are you speaking of the handwaving I did in
> https://www.postgresql.org/message-id/2348.1544474...@sss.pgh.pa.us
> ?
> 
> I wasn't really working on that for v12 --- I figured it was way
> too late in the cycle to be starting on such a significant change.

Oh, well, it certainly seems far too late *now*.  However, what about
the idea in 
https://postgr.es/m/1255.1544562...@sss.pgh.pa.us
namely that we write out the buffers involved?  That sounds like it
might be backpatchable, and thus it's not too late for it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Segfault when restoring -Fd dump on current HEAD

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-27, Dmitry Dolgov wrote:

> > On Wed, Feb 27, 2019 at 1:32 PM Alvaro Herrera  
> > wrote:
> >
> > > > I think it would be better to just put back the .defn = "" (etc) to the
> > > > ArchiveEntry calls.
> > >
> > > Then we should do this not only for defn, but for owner and dropStmt too.
> >
> > Yeah, absolutely.
> 
> Done, please find the attached patch.

Pushed, thanks.  I added the reminder comment I mentioned.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-02-28 Thread Robert Haas
On Tue, Feb 26, 2019 at 5:10 PM Robert Haas  wrote:
> Aside from these problems, I think I have spotted a subtle problem in
> 0001. I'll think about that some more and post another update.

0001 turned out to be guarding against the wrong problem.  It supposed
that if we didn't get a coherent view of the system catalogs due to
concurrent DDL, we could just AcceptInvalidationMessages() and retry.
But that turns out to be wrong, because there's a (very) narrow window
after a process removes itself from the ProcArray and before it sends
invalidation messages.  It wasn't difficult to engineer an alternative
solution that works, but unfortunately it's only good enough to handle
the ATTACH case, so this is another thing that will need more thought
for concurrent DETACH.  Anyway, the updated 0001 contains that code
and some explanatory comments.  The rest of the series is
substantially unchanged.

I'm not currently aware of any remaining correctness issues with this
code, although certainly there may be some.  There has been a certain
dearth of volunteers to review any of this.  I do plan to poke at it a
bit to see whether it has any significant performance impact, but not
today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v4-0004-Reduce-the-lock-level-required-to-attach-a-partit.patch
Description: Binary data


v4-0002-Ensure-that-repeated-PartitionDesc-lookups-return.patch
Description: Binary data


v4-0001-Teach-RelationBuildPartitionDesc-to-cope-with-con.patch
Description: Binary data


v4-0003-Teach-runtime-partition-pruning-to-cope-with-conc.patch
Description: Binary data


Re: Segfault when restoring -Fd dump on current HEAD

2019-02-28 Thread Dmitry Dolgov
> On Thu, Feb 28, 2019 at 9:24 PM Alvaro Herrera  
> wrote:
>
> Pushed, thanks.  I added the reminder comment I mentioned.

Thank you, and sorry for the trouble.



Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 7:24 AM Andres Freund  wrote:
> On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> > Shawn Debnath  writes:
> > > Another thought: my colleague Anton Shyrabokau suggested potentially
> > > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > > map 5 entries today.
> >
> > Yeah, that seems like it might be a workable answer.
>
> Yea, if we just split that into two 16bit entries, there'd not be much
> lost. Some mild mild performance regression due to more granular
> memory->register reads/writes, but I can't even remotely see that
> matter.

Ok, that's an interesting way to include it in BufferTag without making
it wider.  But then how about the SMGR interface?  I think that value
would need to be added to the smgropen() interface, and all existing
callers would pass in (say) SMGR_RELATION (meaning "use md.c"), and
new code would use SMGR_UNDO, SMGR_SLRU etc.  That seems OK to me.
Ancient POSTGRES had an extra argument like that and would say eg
smgropen(rd->rd_rel->relsmgr, rd), but in this new idea I think it'd
always be a constant or a value from a BufferTag, and the BufferTag
would have been set with a constant, since the code reading these
buffers would always be code that knows which it wants.  We'd be using
this new argument not as a modifier to control storage location as
they did back then, but rather as a whole new namespace for
RelFileNode values, that also happens to be stored differently; that
might be a hint that it could also go into RelFileNode (but I'm not
suggesting that).
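
A minimal sketch of what that widened interface might look like, assuming an
enum of built-in storage managers; the names below simply follow the ones
used in this thread and nothing here is settled.

typedef enum SmgrId
{
	SMGR_RELATION,				/* ordinary relations, md.c */
	SMGR_UNDO,					/* undo logs */
	SMGR_SLRU					/* SLRUs: clog, multixact, ... */
} SmgrId;

extern SMgrRelation smgropen(SmgrId smgrid, RelFileNode rnode, BackendId backend);

/* existing callers would simply pass the default: */
SMgrRelation reln = smgropen(SMGR_RELATION, rd->rd_node, rd->rd_backend);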

> > Since, per this discussion, the smgr IDs need not really be OIDs at
> > all, we just need a few bits for them.
>
> Personally I find having them as oids more elegant than the distaste
> from misusing the database oid for a wihle, but I think it's fair to
> disagree here.

It sounds like your buffer mapping redesign would completely change
the economics and we could reconsider much of this later without too
much drama; that was one of the things that made me feel that the
magic database OID approach was acceptable at least in the short term.

-- 
Thomas Munro
https://enterprisedb.com



Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Andres Freund
Hi,

On 2019-02-28 09:54:48 +0100, Fabien COELHO wrote:
> > If we were to want to do more here, ISTM the right approach would use
> > the postmaster pid file, not the control file.
> 
> ISTM that this just means re-inventing a manual poor-featured
> race-condition-prone lock API around another file, which seems to be created
> more or less only by "pg_ctl", while some other commands use the control
> file (eg pg_rewind, AFAICS).

Huh? Postmaster.pid is written by the backend; pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters from starting, etc. It's also what,
say, pg_resetwal checks to protect against a concurrently running cluster
(albeit in a racy way). If we want to make things more bulletproof,
that's the place.  The control file is constantly written to, sometimes
by different processes, so it'd just not be a good file for such lockout
mechanisms.
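
For reference, the existing check in pg_resetwal is roughly of this shape (a
paraphrased sketch, not the exact source):

	/* refuse to run if postmaster.pid exists in the data directory */
	if ((fd = open("postmaster.pid", O_RDONLY, 0)) >= 0)
	{
		close(fd);
		fprintf(stderr, "lock file \"postmaster.pid\" exists\n");
		fprintf(stderr, "Is a server running?  If not, delete the lock file and try again.\n");
		exit(1);
	}
	/* note the race: a postmaster could still start right after this check */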

Greetings,

Andres Freund



Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Joe Conway
On 2/28/19 7:20 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
>> Sure, will do. What are your thoughts on backpatching? This seems
>> unlikely to be a practical concern in the field, so my inclination is a
>> master only fix.
> 
> I agree that this would unlikely become an issue as an error on the
> control file would most likely be a PANIC when updating it, so fixing
> only HEAD sounds fine to me.  Thanks for asking!


Committed and pushed that way.

By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worthwhile to search for and fix
those instances?
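
The kind of change being suggested would look roughly like this in, say,
SlruPhysicalReadPage() -- a sketch against the two-argument form of
OpenTransientFile(), assuming no other flag changes are needed:

	/* before: a read-only code path opens the segment read-write */
	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);

	/* after: request only the access the code actually needs */
	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);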

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 4:09 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane  wrote:
> >> Thomas Munro  writes:
> >>> Our current thinking is that smgropen() should know how to map a small
> >>> number of special database OIDs to different smgr implementations
>
> >> Hmm.  Maybe mapping based on tablespaces would be a better idea?
>
> > In the undo log proposal (about which more soon) we are using
> > tablespaces for their real purpose, so we need that OID.  If you SET
> > undo_tablespaces = foo then future undo data created by your session
> > will be written there, which might be useful for putting that IO on
> > different storage.
>
> Meh.  That's a point, but it doesn't exactly seem like a killer argument.
> Just in the abstract, it seems much more likely to me that people would
> want per-database special rels than per-tablespace special rels.  And
> I think your notion of a GUC that can control this is probably pie in
> the sky anyway: if we can't afford to look into the catalogs to resolve
> names at this code level, how are we going to handle a GUC?

I have this working like so:

* undo logs have a small amount of meta-data in shared memory, stored
in a file at checkpoint time, with all changes WAL logged, visible to
users in pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point
here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for
a RelFileNode (the path via tablespace symlinks is derivable from
spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a
zheap page to the relevant blocks on disk without any catalog access;
this should work even in the apparently (but not actually) circular
case of a pg_tablespace catalog that is stored in zheap (not something
we can do right now, but hypothetically speaking), and has undo data
that is stored in some non-default tablespace that must be consulted
while scanning the catalog (not that I'm suggesting that would
necessarily be a good idea to put catalogs in non-default
tablespaces; I'm just addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions
that are writing, when selecting (or creating) an undo log to attach
to and begin writing into; those sessions have no trouble reading the
catalog to do so without problematic circularities, as above

Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots
or rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevent the tablespace from being dropped with
a clear message to that effect).

It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace though.  It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU
support should just not support that.

> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose.  And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher.  Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.

GetNewObjectId() never returns values < FirstNormalObjectId.

I don't think it's impossible for someone to want to put SMGRs in a
catalog of some kind some day.  Even though the ones for clog, undo
etc would still probably need special hard-coded treatment as
discussed, I suppose it's remotely possible that someone might some
day figure out a useful way to allow extensions that provide different
block storage (nvram?  zfs zvols?  encryption? (see Haribabu's reply))
but I don't have any specific ideas about that or feel inclined to
design something for unknown future use.

-- 
Thomas Munro
https://enterprisedb.com



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2019-02-28 Thread Peter Moser
Dear all,
we rebased our temporal normalization patch on top of 
554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28.


On 9/7/18 1:02 PM, Peter Moser wrote:
> The syntax is
> SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

Please find all information about our decisions and current state within the 
previous email.

> What we like to discuss now is:
> - Is sort_inner_and_outer the correct place to perform this split?
> - How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
>   mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
>   not an equality operator, but if we use RANGE_EQ it assumes that the
>   right-hand-side of the operator must be a range as well.
> - Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
>   comparison, or keep RANGE_EQ and fix pathkeys later?
> - How do we update equivalence classes, and pathkeys
>   when changing the inner relation's data type from "int4range" to "int"
>   in the query tree inside "sort_inner_and_outer" to get the correct
>   ordering and data types

I will also add this prototype (WIP) patch to the March commitfest, as
suggested by two developers we met at FOSDEM a few weeks ago.


Best regards,
Anton, Johann, Michael, Peter

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 2a1d000b03..a309596fa1 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -99,6 +99,106 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 
+// XXX TEMPORAL NORMALIZE PEMOSER 
+// !!! THis is just for prototyping, delete asap...
+
+#include "catalog/pg_operator.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/fmgroids.h"
+#include "utils/rangetypes.h"
+#include "utils/typcache.h"
+#include "access/htup_details.h"/* for heap_getattr */
+#include "nodes/print.h"/* for print_slot */
+#include "utils/datum.h"/* for datumCopy */
+
+
+
+#define TEMPORAL_DEBUG
+/*
+ * #define TEMPORAL_DEBUG
+ * XXX PEMOSER Maybe we should use execdebug.h stuff here?
+ */
+#ifdef TEMPORAL_DEBUG
+static char*
+datumToString(Oid typeinfo, Datum attr)
+{
+	Oid typoutput;
+	booltypisvarlena;
+	getTypeOutputInfo(typeinfo, &typoutput, &typisvarlena);
+	return OidOutputFunctionCall(typoutput, attr);
+}
+
+#define TPGdebug(...)   { printf(__VA_ARGS__); printf("\n"); fflush(stdout); }
+#define TPGdebugDatum(attr, typeinfo)   TPGdebug("%s = %s %ld\n", #attr, datumToString(typeinfo, attr), attr)
+#define TPGdebugSlot(slot)  { printf("Printing Slot '%s'\n", #slot); print_slot(slot); fflush(stdout); }
+
+#else
+#define datumToString(typeinfo, attr)
+#define TPGdebug(...)
+#define TPGdebugDatum(attr, typeinfo)
+#define TPGdebugSlot(slot)
+#endif
+
+TypeCacheEntry *testmytypcache;
+#define setSweepline(datum) \
+	node->sweepline = datumCopy(datum, node->datumFormat->attbyval, node->datumFormat->attlen)
+
+#define freeSweepline() \
+	if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline))
+
+ /*
+  * slotGetAttrNotNull
+  *  Same as slot_getattr, but throws an error if NULL is returned.
+  */
+static Datum
+slotGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+
+	result = slot_getattr(slot, attnum, &isNull);
+
+	if(isNull)
+		ereport(ERROR,
+(errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+		"adjustment not possible.",
+ NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+ attnum)));
+
+	return result;
+}
+
+/*
+ * heapGetAttrNotNull
+ *  Same as heap_getattr, but throws an error if NULL is returned.
+ */
+static Datum
+heapGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+	HeapTuple tuple;
+
+	tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+	result = heap_getattr(tuple,
+		  attnum,
+		  slot->tts_tupleDescriptor,
+		  &isNull);
+	if(isNull)
+		ereport(ERROR,
+(errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+		"adjustment not possible.",
+		NameStr(TupleDescAttr(slot->tts_tupleDescriptor,
+attnum - 1)->attname),
+		attnum)));
+
+	return result;
+}
+
+// XXX TEMPORAL NORMALIZE PEMOSER END 
+
 
 /*
  * States of the ExecMergeJoin state machine
@@ -138,6 +238,10 @@ typedef struct MergeJoinClauseData
 	 * stored here.
 	 */
 	SortSupportData ssup;
+
+	/* needed for Temporal Normalization */
+	bool isnormalize;
+	TypeCacheEntry  *range_typcache;
 }			MergeJoinClauseData;
 
 /* Result type for MJEvalOuterValues and MJEvalInnerValues */
@@ -152,6 +256,59 @@ typedef enum
 #define MarkInnerTuple(innerTupleSlot, mergestate) \
 	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))
 
+/*
+ * temporalAdjust

Re: Index Skip Scan

2019-02-28 Thread Jeff Janes
On Wed, Feb 20, 2019 at 11:33 AM Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Fri, Feb 1, 2019 at 8:24 PM Jesper Pedersen <
> jesper.peder...@redhat.com> wrote:
> >
> > Dmitry and I will look at this and take it into account for the next
> > version.
>
> In the meantime, just to not forget, I'm going to post another version
> with a
> fix for cursor fetch backwards, which was crashing before.


This version of the patch can return the wrong answer.

create index on pgbench_accounts (bid, aid);
begin; declare c  cursor for select distinct on (bid) bid, aid from
pgbench_accounts order by bid, aid;
fetch 2 from c;
 bid |   aid
-----+---------
   1 |       1
   2 | 100,001

fetch backward 1 from c;
 bid |   aid
-----+---------
   1 | 100,000

Without the patch, instead of getting a wrong answer, I get an error:

ERROR:  cursor can only scan forward
HINT:  Declare it with SCROLL option to enable backward scan.

If I add "SCROLL", then I do get the right answer with the patch.

Cheers,

Jeff


  1   2   >