Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-04-01 Thread David Rowley
On 16 March 2018 at 04:01, Tom Lane  wrote:
> I hadn't been paying much attention to this thread, but I've now taken
> a quick look at the 2018-02-19 patch, and I've got to say I do not like
> it much.  The changes in createplan.c in particular seem like hack-and-
> slash rather than anything principled or maintainable.

Thanks for looking at this.  I didn't manage to discover any other
working solutions to when the Vars can be replaced. If we don't do
this in createplan.c then it's going to cause problems in places such
as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c
gets hold of the plan.

> The core issue here is that Paths involving the appendrel and higher
> will contain Vars referencing the appendrel's varno, whereas the child
> is set up to emit Vars containing its own varno, and somewhere we've got
> to match those up.  I don't think though that this problem is exactly
> specific to single-member Appends, and so I would rather we not invent a
> solution that's specific to that.  A nearly identical issue is getting
> rid of no-op SubqueryScan nodes.  I've long wished we could simply not
> emit those in the first place, but it's really hard to do because of
> the fact that Vars inside the subquery have different varnos from those
> outside.  (I've toyed with the idea of globally flattening the rangetable
> before we start planning, not at the end, but haven't made it happen yet;
> and anyway that would be only one step towards such a goal.)

I'm not quite sure why you think the solution I came up with is
specific to single-member Appends. The solution merely uses
single-member Append paths as a proxy path for the solution which I've
tried to make generic to any node type. For example, the patch also
resolves the issue for MergeAppend, so certainly nothing in there is
specific to single-member Appends.  I could have made the proxy any
other path type, it's just that you had suggested Append would be
better than inventing ProxyPath, which is what I originally proposed.

> It might be worth looking at whether we couldn't fix the single-member-
> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
> get rid of them.  That's not the most beautiful solution perhaps, but
> it'd be very localized and low-risk.

It might be possible, but wouldn't that only solve 1 out of 2
problems? The problem of the planner not generating the most optimal
plan is ignored with this approach. For example, it does not make much
sense to bolt a Materialize node on top of an IndexScan node in order
to provide the IndexScan with mark/restore capabilities... IndexScans
already support that.

> In general setrefs.c is the right place to deal with variable-matching
> issues.  So even if you don't like that specific idea, it'd probably be
> worth thinking about handling this by recording instructions telling
> setrefs what to do, instead of actually doing it at earlier stages.

From what I can see, setrefs.c is too late. ERRORs are generated
before setrefs.c gets hold of the plan if we don't replace Vars.

I'm not opposed to finding a better way to do this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: csv format for psql

2018-04-01 Thread Pavel Stehule
2018-04-01 8:30 GMT+02:00 Fabien COELHO :

>
> Hello Isaac,
>
>>> Personally I do not have any problem with CSV defaulting to '|' separator,
>>> given that anyway people often use anything but a comma for the purpose,
>>> including '|'.
>>>
>>> However Pavel wants to block the patch on this point. Too bad.
>>>
>>
>> OK, mostly trying to avoid commenting because I doubt I have much to add.
>> But. If I ask for CSV and don't specify any overrides, I expect to get
>> "C"omma separated values, not some other character. More specifically, if I
>> say --csv I expect to get files that are identical with what I would get if
>> I used COPY ... CSV.
>>
>
> My summary was incomplete. The --csv option implementation by Daniel
> already does that.
>
> The issue Pavel is complaining about is that in interactive mode "\pset
> format csv" does not do the same: it triggers the csv-rule string-escaping
> mechanism, but does not reset the "fieldsep" variable (eh, it sets the
> "format" variable) so the default separator under this interactive use is
> "|" if the "fieldsep" variable is shared.
>
> I have suggested a "\csv" interactive command to set both as a convenient
> shorthand for "\pset format csv & \pset fieldsep ','", similarly to --csv,
> but Pavel still wants "\pset format csv" to trigger the full csv output.
>
> A consequence I forgot about adding "fieldsep_csv", is that it probably
> has to duplicate the "_zero" ugly hack to be consistent with existing
> "*sep" variables, or else be inconsistent. Sigh.
>

There is no relation between fieldsep_csv and fieldsep_zero.

The root of this issue is the flawed concept of fieldsep. It is based on
the idea that one value can be good enough for all formats. That worked
while the only format using this variable was ONE format - unaligned.

A similar issue exists with the tuples_only option - see Isaac's mail.

Different defaults for different formats are more consistent - the other
formats already do this (they don't use fieldsep) - but internal
consistency is lost, because it happens quietly.

We can have different ideas about users' expectations - but we should
share the opinion that a correct solution has to be consistent. If I
choose some format, then the behaviour should be the same regardless of
the activation mechanism.

So \csv should behave the same as \pset format csv or --csv.
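
To make the inconsistency concrete, something like this (a sketch of the
two activation paths as described above; exact output is approximate):

  $ psql --csv -c "select 1 as a, 2 as b"
  a,b
  1,2

  postgres=# \pset format csv
  postgres=# select 1 as a, 2 as b;
  a|b
  1|2

The command-line option sets both the format and the ',' separator, while
the interactive \pset format csv keeps the shared fieldsep default of '|'.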

I don't share the opinion that the CSV format should be exactly the same
as COPY ... CSV. COPY is designed for backups - and a header is not too
important there. When I have seen CSV used in practice, a header was
usually included.

If I look for a precedent, it is \pset linestyle and the \pset
unicode_*** options.

So we could have

\pset format xxx

and a set of per-format defaults that can be changed locally:

\pset csv_fieldsep ,
\pset csv_tuplesonly on
\pset unaligned_fieldsep |
\pset unaligned_tuplesonly off

The fieldsep and tuples_only variables can then just be links to the
currently used _fieldsep and _tuplesonly.

This is consistent - the defaults can be correct for Isaac, and I can
paste into my .psqlrc

\pset csv_tuplesonly off

and the semantics are clean: this option will be active just for csv and
doesn't depend on the current format - so it can be used in .psqlrc.

If it works well for linestyle, then it can work for the other formats too.

Regards

Pavel






>
> --
> Fabien.
>


Re: some last patches breaks plan cache

2018-04-01 Thread Pavel Stehule
2018-04-01 1:00 GMT+02:00 Tomas Vondra :

>
>
> On 03/31/2018 08:28 PM, Tomas Vondra wrote:
> >
> >
> > On 03/31/2018 07:56 PM, Tomas Vondra wrote:
> >> On 03/31/2018 07:38 PM, Pavel Stehule wrote:
> >>> Hi
> >>>
> >>> CREATE OR REPLACE PROCEDURE public.proc(a integer, INOUT b integer, c
> >>> integer)
> >>>  LANGUAGE plpgsql
> >>> AS $procedure$
> >>> begin
> >>>   b := a + c;
> >>> end;
> >>> $procedure$
> >>>
> >>> CREATE OR REPLACE PROCEDURE public.testproc()
> >>>  LANGUAGE plpgsql
> >>> AS $procedure$
> >>> declare r int;
> >>> begin
> >>>   call proc(10, r, 20);
> >>> end;
> >>> $procedure$
> >>>
> >>> postgres=# call testproc();
> >>> CALL
> >>> postgres=# call testproc();
> >>> ERROR:  SPI_execute_plan_with_paramlist failed executing query "CALL
> >>> proc(10, r, 20)": SPI_ERROR_ARGUMENT
> >>> CONTEXT:  PL/pgSQL function testproc() line 4 at CALL
> >>> postgres=#
> >>>
> >>> second call fails
> >>
> >> Yeah.
> >>
> >> d92bc83c48bdea9888e64cf1e2edbac9693099c9 seems to have broken this :-/
> >>
> >
> > FWIW it seems the issue is somewhere in exec_stmt_call, which does this:
> >
> > /*
> >  * Don't save the plan if not in atomic context.  Otherwise,
> >  * transaction ends would cause warnings about plan leaks.
> >  */
> > exec_prepare_plan(estate, expr, 0, estate->atomic);
> >
> > When executed outside transaction, CALL has estate->atomic=false, and so
> > calls exec_prepare_plan() with keepplan=false. And on the second call it
> > gets bogus Plan, of course (with the usual 0x7f7f7f7f7f7f7f7f patterns).
> >
> > When in a transaction, it sets keepplan=true, and everything works fine.
> >
> > So either estate->atomic is not sufficient on its own, or we need to
> > reset the expr->plan somewhere.
> >
>
> The attached patch fixes this, but I'm not really sure it's the right
> fix - I'd expect there to be a more principled way, e.g. resetting the
> plan pointer when 'plan->saved == false'.
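
[ The idea mentioned here - resetting the plan pointer when it was not
saved - would look roughly like this in exec_stmt_call; a sketch only,
not a tested fix, and it assumes the plan's "saved" flag is visible to
pl_exec.c:

    /*
     * Sketch: a transient (non-saved) SPI plan is released at transaction
     * end, so forget our pointer to it instead of reusing a dangling one
     * on the next CALL.
     */
    if (expr->plan != NULL && !expr->plan->saved)
        expr->plan = NULL;
]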
>
>
It fixes some issues, but not all.

I see changes in the plpgsql_check regression tests:

CREATE OR REPLACE PROCEDURE public.testproc()
 LANGUAGE plpgsql
AS $procedure$
declare r int;
begin
  call proc(10, r + 10, 20);
end;
$procedure$

postgres=# call testproc();
ERROR:  argument 2 is an output argument but is not writable
CONTEXT:  PL/pgSQL function testproc() line 4 at CALL
postgres=# call testproc();
ERROR:  SPI_execute_plan_with_paramlist failed executing query "CALL
proc(10, r + 10, 20)": SPI_ERROR_ARGUMENT
CONTEXT:  PL/pgSQL function testproc() line 4 at CALL




> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


tab complete for procedures for \sf and \ef commands

2018-04-01 Thread Pavel Stehule
Hi

small bugfix patch

Regards

Pavel
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6926ca132e..f6f7c52bb0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3685,7 +3685,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_relations, NULL);
 
 	else if (TailMatchesCS1("\\ef"))
-		COMPLETE_WITH_VERSIONED_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_routines, NULL);
 	else if (TailMatchesCS1("\\ev"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_views, NULL);
 
@@ -3794,7 +3794,7 @@ psql_completion(const char *text, int start, int end)
 			COMPLETE_WITH_LIST_CS3("default", "verbose", "terse");
 	}
 	else if (TailMatchesCS1("\\sf*"))
-		COMPLETE_WITH_VERSIONED_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_routines, NULL);
 	else if (TailMatchesCS1("\\sv*"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_views, NULL);
 	else if (TailMatchesCS1("\\cd|\\e|\\edit|\\g|\\i|\\include|"


Re: hot_standby_feedback vs excludeVacuum and snapshots

2018-04-01 Thread Simon Riggs
On 31 March 2018 at 14:21, Amit Kapila  wrote:
> On Thu, Mar 29, 2018 at 4:47 PM, Greg Stark  wrote:
>> I'm poking around to see debug a vacuuming problem and wondering if
>> I've found something more serious.
>>
>> As far as I can tell the snapshots on HOT standby are built using a
>> list of running xids that the primary builds and puts in the WAL and
>> that seems to include all xids from transactions running in all
>> databases. The HOT standby would then build a snapshot and eventually
>> send the xmin of that snapshot back to the primary in the hot standby
>> feedback and that would block vacuuming tuples that might be visible
>> to the standby.
>>
>> Many ages ago Alvaro sweated blood to ensure vacuums could run for
>> long periods of time without holding back the xmin horizon and
>> blocking other vacuums from cleaning up tuples. That's the purpose of
>> the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible
>> because we know vacuums won't insert any tuples that queries might try
>> to view and also vacuums won't try to perform any sql queries on other
>> tables.
>>
>> I can't find anywhere that the standby snapshot building mechanism
>> gets this same information about which xids are actually vacuums that
>> can be ignored when building a snapshot.
>>
>
> I think the vacuum assigns xids only if it needs to truncate some of
> the pages in the relation which happens towards the end of vacuum.
> So, it shouldn't hold back the xmin horizon for long.

Yes, that's the reason. I recall VACUUMs giving lots of problems
during development of Hot Standby.

VACUUM FULL was the thing that needed to be excluded in the past
because it needed an xid to move rows.

Greg's concern is a good one and his noticing that we hadn't
specifically excluded VACUUMs is valid, so we should exclude them.
Well spotted, Greg.

So although this doesn't have the dramatic effect it might have had,
there is still the possibility of some effect and I think we should
treat it as a bug.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


ignore_lazy_vacuums_in_RunningTransactionData.v1.patch
Description: Binary data


Re: Online enabling of checksums

2018-04-01 Thread Magnus Hagander
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra 
wrote:

> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> > mailto:tomas.von...@2ndquadrant.com>>
> wrote:
> >
> > ...
> >
> > I do think just waiting for all running transactions to complete is
> > fine, and it's not the first place where we use it - CREATE
> SUBSCRIPTION
> > does pretty much exactly the same thing (and CREATE INDEX
> CONCURRENTLY
> > too, to some extent). So we have a precedent / working code we can
> copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want to make the checksumming proceed, knowing
> the oldest running XID is useless for that. If we log waitforxid, it can
> be used to query pg_stat_activity and interrupt the sessions somehow.
>

Yeah, makes sense. Updated.
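
For reference, once waitforxid is in the log, a DBA could find and interrupt
the offending sessions with something along these lines (illustrative only;
the comparison against the logged xid is done by eye here):

  -- list sessions still holding an xid or xmin; compare backend_xid /
  -- backend_xmin against the logged waitforxid
  SELECT pid, backend_xid, backend_xmin, state, query
    FROM pg_stat_activity
   WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;

  -- and, if appropriate, interrupt one of them (pid taken from above)
  SELECT pg_terminate_backend(12345);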



> > > And if you try this with a temporary table (not hidden in
> transaction,
> > > so the bgworker can see it), the worker will fail with this:
> > >
> > >   ERROR:  cannot access temporary tables of other sessions
> > >
> > > But of course, this is just another way how to crash without
> updating
> > > the result for the launcher, so checksums may end up being
> enabled
> > > anyway.
> > >
> > >
> > > Yeah, there will be plenty of side-effect issues from that
> > > crash-with-wrong-status case. Fixing that will at least make things
> > > safer -- in that checksums won't be enabled when not put on all
> pages.
> > >
> >
> > Sure, the outcome with checksums enabled incorrectly is a
> consequence of
> > bogus status, and fixing that will prevent that. But that wasn't my
> main
> > point here - not articulated very clearly, though.
> >
> > The bigger question is how to handle temporary tables gracefully, so
> > that it does not terminate the bgworker like this at all. This might
> be
> > even bigger issue than dropped relations, considering that temporary
> > tables are pretty common part of applications (and it also includes
> > CREATE/DROP).
> >
> > For some clusters it might mean the online checksum enabling would
> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> > Unfortunately, try_relation_open() won't fix this, as the error comes
> > from ReadBufferExtended. And it's not a matter of simply creating a
> > ReadBuffer variant without that error check, because temporary tables
> > use local buffers.
> >
> > I wonder if we could just go and set the checksums anyway, ignoring
> the
> > local buffers. If the other session does some changes, it'll
> overwrite
> > our changes, this time with the correct checksums. But it seems
> pretty
> > dangerous (I mean, what if they're writing stuff while we're updating
> > the checksums? Considering the various short-cuts for temporary
> tables,
> > I suspect that would be a boon for race conditions.)
> >
> > Another option would be to do something similar to running
> transactions,
> > i.e. wait until all temporary tables (that we've seen at the
> beginning)
> > disappear. But we're starting to wait on more and more stuff.
> >
> > If we do this, we should clearly log which backends we're waiting
> for,
> > so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty simple.
> >
> > Making the worker simply ignore temporary tables would also be easy.
> >
> > One of the bigger issues here is temporary tables are *session* scope
> > and not transaction, so we'd actually need the other session to finish,
> > not just the transaction.
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed when the
> > worker started in this particular database (basically anything that we
> > got in our scan). Once we have processed the complete database, keep
> > re-scanning pg_class until those particular tables are gone (search by
> oid).
> >
> > That means that any temporary tables that are created *while* we are
> > processing a database are ignored, but they should already be receiving
> > checksums.
> >
> > It definitely leads to a potential issue with long running temp tables.
> > But as long as we look at the *actual tables* (by oid), we should be
> > able to handle long-running sessions once they have dropped their temp
> > tables.
> >
> > Does that sound workable to you?
> >
>
> Yes, that's pretty much what I meant b

Diagonal storage model

2018-04-01 Thread Konstantin Knizhnik

Hi hackers,

Vertical (columnar) storage is most optimal for analytics and this is why 
it is widely used in databases oriented towards OLAP, such as Vertica, HyPer, KDB,...
In Postgres we have the cstore extension, which is not able to provide all the benefits 
of the vertical model because of the lack of support for vector operations in the executor.
The situation can change once we have a pluggable storage API with support for 
vectorized execution.

But the vertical model is not so good for updates and data loading (because data is 
mostly imported in horizontal format).
This is why in most of the existing systems data is present in both formats (at 
least for some time).

I want to announce a new model, "diagonal storage", which combines the benefits of 
both approaches.
The idea is very simple: we first store column 1 of the first record, then column 2 
of the second record, ... and so on until we reach the last column.
After that we store the second column of the first record, the third column of the second 
record, ...

Profiling of TPC-H queries shows that most of the query execution time 
(about 17%) is spent in heap_deform_tuple.
The new format will allow us to significantly reduce the time spent deforming heap 
tuples, because there is just one column of the particular record in each tile.
Moreover, we can perform deforming of many tuples in parallel, which is 
especially efficient on quantum computers.

Attached please find a patch with the first prototype implementation. It provides about 
3.14 times improvement of performance on most of the TPC-H queries.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



diagonal.patch.gz
Description: GNU Zip compressed data


Re: new function for tsquery creartion

2018-04-01 Thread Aleksandr Parfenov

Hello hackers,

On 2018-03-28 12:21, Aleksander Alekseev wrote:

It doesn't sound right to me to accept any input as a general rule but
sometimes return errors nevertheless. That API would be complicated for
the users. Thus I suggest to accept any garbage and try our best to
interpret it.


I agree with Aleksander about silencing all errors in 
websearch_to_tsquery().


In the attachment is a revised patch with an attempt to introduce the 
ability to ignore syntax errors in gettoken_tsvector().
I've also read through the patch and all the code looks good to me except 
one thing.
The name of the enum ts_parsestate looks more like the name of a function 
than the name of a type.
In my version, I renamed it to QueryParserState, but you can change it if I'm 
wrong.


--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index ea5947a3a8..bdf05236cf 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -390,7 +390,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+			  int16 weight, bool prefix, bool force_phrase)
 {
 	int32		count = 0;
 	ParsedText	prs;
@@ -423,7 +424,12 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
 	/* put placeholders for each missing stop word */
 	pushStop(state);
 	if (cntpos)
-		pushOperator(state, data->qoperator, 1);
+	{
+		if (force_phrase)
+			pushOperator(state, OP_PHRASE, 1);
+		else
+			pushOperator(state, data->qoperator, 1);
+	}
 	cntpos++;
 	pos++;
 }
@@ -464,7 +470,10 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
 			if (cntpos)
 			{
 /* distance may be useful */
-pushOperator(state, data->qoperator, 1);
+if (force_phrase)
+	pushOperator(state, OP_PHRASE, 1);
+else
+	pushOperator(state, data->qoperator, 1);
 			}
 
 			cntpos++;
@@ -490,6 +499,7 @@ to_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
+		  false,
 		  false);
 
 	PG_RETURN_TSQUERY(query);
@@ -520,7 +530,8 @@ plainto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
-		  true);
+		  true,
+		  false);
 
 	PG_RETURN_POINTER(query);
 }
@@ -551,7 +562,8 @@ phraseto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
-		  true);
+		  true,
+		  false);
 
 	PG_RETURN_TSQUERY(query);
 }
@@ -567,3 +579,36 @@ phraseto_tsquery(PG_FUNCTION_ARGS)
 		ObjectIdGetDatum(cfgId),
 		PointerGetDatum(in)));
 }
+
+Datum
+websearch_to_tsquery_byid(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_PP(1);
+	MorphOpaque	data;
+	TSQuery		query = NULL;
+
+	data.cfg_id = PG_GETARG_OID(0);
+
+	data.qoperator = OP_AND;
+
+	query = parse_tsquery(text_to_cstring(in),
+		  pushval_morph,
+		  PointerGetDatum(&data),
+		  false,
+		  true);
+
+	PG_RETURN_TSQUERY(query);
+}
+
+Datum
+websearch_to_tsquery(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_PP(0);
+	Oid			cfgId;
+
+	cfgId = getTSCurrentConfig(true);
+	PG_RETURN_DATUM(DirectFunctionCall2(websearch_to_tsquery_byid,
+		ObjectIdGetDatum(cfgId),
+		PointerGetDatum(in)));
+
+}
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 1ccbf79030..00e6218691 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -32,12 +32,24 @@ const int	tsearch_op_priority[OP_COUNT] =
 	3			/* OP_PHRASE */
 };
 
+/*
+ * parser's states
+ */
+typedef enum
+{
+	WAITOPERAND = 1,
+	WAITOPERATOR = 2,
+	WAITFIRSTOPERAND = 3,
+	WAITSINGLEOPERAND = 4,
+	INQUOTES = 5 /* for quoted phrases in web search */
+} QueryParserState;
+
 struct TSQueryParserStateData
 {
 	/* State for gettoken_query */
 	char	   *buffer;			/* entire string we are scanning */
 	char	   *buf;			/* current scan point */
-	int			state;
+	QueryParserState state;
 	int			count;			/* nesting count, incremented by (,
  * decremented by ) */
 
@@ -57,12 +69,6 @@ struct TSQueryParserStateData
 	TSVectorParseState valstate;
 };
 
-/* parser's states */
-#define WAITOPERAND 1
-#define WAITOPERATOR	2
-#define WAITFIRSTOPERAND 3
-#define WAITSINGLEOPERAND 4
-
 /*
  * subroutine to parse the modifiers (weight and prefix flag currently)
  * part, like ':AB*' of a query.
@@ -197,6 +203,21 @@ err:
 	return buf;
 }
 
+/*
+ * Parse OR operator used in websearch_to_tsquery().
+ */
+static bool
+parse_or_operator(char 

Re: Add default role 'pg_access_server_files'

2018-04-01 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sun, Mar 25, 2018 at 09:43:25PM -0400, Stephen Frost wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> On Thu, Mar 08, 2018 at 10:15:11AM +0900, Michael Paquier wrote:
> >> > Other than that the patch looks in pretty good shape to me.
> >> 
> >> The regression tests of file_fdw are blowing up because of an error
> >> string patch 2 changes.
> > 
> > Fixed in the attached.
> 
> Thanks for the updated version.  This test is fixed.

Thanks for checking.  Attached is an updated version which also includes
the changes for adminpack, done in a similar manner to how pgstattuple
was updated, as discussed.  Regression tests updated and extended a bit,
doc updates also included.
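
Just to make the resulting model concrete, delegation then works with plain
GRANTs, e.g. (illustrative; "ops_role" is a made-up role name):

  -- patch 1: the hard-coded superuser() checks are gone, so the individual
  -- functions can simply be granted to a role
  GRANT EXECUTE ON FUNCTION pg_ls_dir(text) TO ops_role;
  GRANT EXECUTE ON FUNCTION pg_read_file(text) TO ops_role;

  -- patch 2: the new default roles cover server-side COPY and friends
  GRANT pg_read_server_files TO ops_role;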

If you get a chance to take a look, that'd be great.  I'll do my own
review of it again also after stepping away for a day or so.

Thanks!

Stephen
From 296b407863a7259a04e5e8cfc19f9b8ea124777c Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Wed, 7 Mar 2018 06:42:42 -0500
Subject: [PATCH 1/3] Remove explicit superuser checks in favor of ACLs

This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file().  Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
---
 src/backend/catalog/system_views.sql | 14 ++
 src/backend/utils/adt/genfile.c  | 20 
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5e6e8a64f6..559610b12f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1149,6 +1149,20 @@ REVOKE EXECUTE ON FUNCTION lo_export(oid, text) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_ls_logdir() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_ls_waldir() FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_read_file(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_read_file(text,bigint,bigint,boolean) FROM public;
+
+REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text,bigint,bigint) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_read_binary_file(text,bigint,bigint,boolean) FROM public;
+
+REVOKE EXECUTE ON FUNCTION pg_stat_file(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_file(text,boolean) FROM public;
+
+REVOKE EXECUTE ON FUNCTION pg_ls_dir(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
+
 --
 -- We also set up some things as accessible to standard roles.
 --
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index d9027fc688..a4c0f6d5ca 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -195,11 +195,6 @@ pg_read_file(PG_FUNCTION_ARGS)
 	char	   *filename;
 	text	   *result;
 
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- (errmsg("must be superuser to read files";
-
 	/* handle optional arguments */
 	if (PG_NARGS() >= 3)
 	{
@@ -236,11 +231,6 @@ pg_read_binary_file(PG_FUNCTION_ARGS)
 	char	   *filename;
 	bytea	   *result;
 
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- (errmsg("must be superuser to read files";
-
 	/* handle optional arguments */
 	if (PG_NARGS() >= 3)
 	{
@@ -313,11 +303,6 @@ pg_stat_file(PG_FUNCTION_ARGS)
 	TupleDesc	tupdesc;
 	bool		missing_ok = false;
 
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- (errmsg("must be superuser to get file information";
-
 	/* check the optional argument */
 	if (PG_NARGS() == 2)
 		missing_ok = PG_GETARG_BOOL(1);
@@ -399,11 +384,6 @@ pg_ls_dir(PG_FUNCTION_ARGS)
 	directory_fctx *fctx;
 	MemoryContext oldcontext;
 
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- (errmsg("must be superuser to get directory listings";
-
 	if (SRF_IS_FIRSTCALL())
 	{
 		bool		missing_ok = false;
-- 
2.14.1


From 76ba6f1eef402070ca1ff37f74e5dcfc639f6837 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Sun, 31 Dec 2017 14:01:12 -0500
Subject: [PATCH 2/3] Add default roles for file/program access

This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as).  Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or with a program,
a

Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-04-01 Thread David Rowley
On 3 February 2018 at 03:26, Tom Lane  wrote:
> Tomas Vondra  writes:
>> ISTM this patch got somewhat stuck as we're not quite sure the
>> transformation is correct in all cases. Is my impression correct?
>
> Yeah, that's the core issue.
>
>> If yes, how to we convince ourselves? Would some sort of automated testing
>> (generating data and queries) help? I'm willing to spend some cycles on
>> that, if considered helpful.
>
> I'm not sure if that would be enough to convince doubters.  On the other
> hand, if it found problems, that would definitely be useful.

I've read over this thread and as far as I can see there is concern
that the UNION on the ctids might not re-uniquify the rows again. Tom
mentioned a problem with FULL JOINs, but the concern appears to have
been invalidated due to wrong thinking about how join removals work.

As of now, I'm not quite sure who exactly is concerned. Tom thought
there was an issue but quickly corrected himself.

As far as I see it, there are a few cases where we'd need to disable the optimization:

1. Query contains target SRFs (the same row might get duplicated, we
don't want to UNION these duplicates out again, they're meant to be
there)
2. Query has setops (ditto)
3. Any base rels are not RELKIND_RELATION (we need the ctid to
uniquely identify rows)
4. Query has volatile functions (don't want to evaluate volatile
functions more times than requested)

As far as the DISTINCT clause doing the right thing for joins, I see
no issues, even with FULL JOINs. In both branches of the UNION the
join condition will be the same so each side of the join always has
the same candidate row to join to.  I don't think the optimization is
possible if there are OR clauses in the join condition, but that's not
being proposed.

FULL JOINs appear to be fine as the row is never duplicated on a
non-match, so there will only be one version of (t1.ctid, NULL::tid) or
(NULL::tid, t2.ctid), and the ctids in the distinctClauses cannot all be
NULL at once.

I used the following SQL to help my brain think through this. There
are two versions of each query, one with DISTINCT and one without. If
the DISTINCT returns fewer rows than the one without then we need to
disable this optimization for that case. I've written queries for 3 of
the above 4 cases. I saw from reading the thread that case #4 is
already disabled:

drop table if exists t1,t2;
create table t1 (a int);
create table t2 (a int);

insert into t1 values(1),(1),(2),(4);
insert into t2 values(1),(1),(3),(3),(4),(4);

select t1.ctid,t2.ctid from t1 full join t2 on t1.a = t2.a;

select distinct t1.ctid,t2.ctid from t1 full join t2 on t1.a = t2.a;

-- case 1: must disable in face of tSRFs

select ctid from (select ctid,generate_Series(1,2) from t1) t;

select distinct ctid from (select ctid,generate_Series(1,2) from t1) t;

-- case 2: must disable optimization with setops.

select ctid from (select ctid from t1 union all select ctid from t1) t;

select distinct ctid from (select ctid from t1 union all select ctid from t1) t;

-- case 3: must disable if we join to anything other than a RELKIND_RELATION (no ctid)

select ctid from (select t1.ctid from t1 inner join (values(1),(1))
x(x) on t1.a = x.x) t;

select distinct ctid from (select t1.ctid from t1 inner join
(values(1),(1)) x(x) on t1.a = x.x) t;

I've not read the patch yet, but I understand what it's trying to
achieve. My feelings about the patch are that it would be useful to
have. I think if someone needs this then they'll be very happy that
we've added it. I also think there should be a GUC to disable it, and
it should be enabled through the entire alpha/beta period, and we
should consider what the final value for it should be just before RC1.
It's a bit sad to exclude foreign tables, and I'm not too sure what
hurdles this leaves for pluggable storage. No doubt we'll need to
disable the optimization for those too unless they can provide us with
some row identifier.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-04-01 Thread David Rowley
On 30 March 2018 at 15:05, Andres Freund  wrote:
>> + * To allow join removal to happen, we can't reference the CTID column
>> + * of an otherwise-removable relation.
>
> A brief hint why wouldn't hurt.

Maybe something like:

/*
 * Join removal is only ever possible when no columns of the to-be-removed
 * relation are referenced.  If we added the CTID here then we could
 * inadvertently disable join removal.  We'll need to delay adding the CTID
 * until after join removal takes place.
 */

(I've not read the code, just the thread)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Diagonal storage model

2018-04-01 Thread Дмитрий Воронин
Hi, Konstantin!

Thank you for working on new pluggable storage API.

Your patch in the attachment is 505 bytes and contains only a diff from explain.c. Is 
that right?

01.04.2018, 15:48, "Konstantin Knizhnik" :
> Hi hackers,
>
> Vertical (columnar) storage is most optimal for analytics and this is why 
> it is widely used in databases oriented towards OLAP, such as Vertica, 
> HyPer, KDB,...
> In Postgres we have the cstore extension, which is not able to provide all 
> the benefits of the vertical model because of the lack of support for 
> vector operations in the executor.
> The situation can change once we have a pluggable storage API with support 
> for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because 
> data is mostly imported in horizontal format).
> This is why in most of the existing systems data is present in both formats 
> (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the 
> benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then 
> column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column 
> of the second record, ...
>
> Profiling of TPC-H queries shows that most of the query execution time 
> (about 17%) is spent in heap_deform_tuple.
> The new format will allow us to significantly reduce the time spent 
> deforming heap tuples, because there is just one column of the particular 
> record in each tile.
> Moreover, we can perform deforming of many tuples in parallel, which is 
> especially efficient on quantum computers.
>
> Attached please find a patch with the first prototype implementation. It 
> provides about 3.14 times improvement of performance on most of the TPC-H 
> queries.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards, Dmitry Voronin




json(b)_to_tsvector with numeric values

2018-04-01 Thread Dmitry Dolgov
Hi,

We've just noticed that the current implementation of `json(b)_to_tsvector` can be
confusing sometimes if the target document contains numeric values. In this case
we just drop them, and only string values will contribute to the result:

select to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::jsonb);
   to_tsvector
-----------------
 'fat':2 'rat':3
(1 row)

The result would be less surprising if all values that can be converted to a
string representation (so, strings and numeric values; nothing to do for null &
boolean) took part in it:

select to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::jsonb);
   to_tsvector
---------------------------
 '123':5 'fat':2 'rat':3
(1 row)

The attached patch contains the small fix that's necessary to get the described
behavior. The patch doesn't touch `ts_headline` though, because following the
same approach there would require changing the type of the element in the
resulting json(b).

Any opinions about this suggestion? Can it be considered a bug fix and
included in this release?


jsonb_to_tsvector_numeric_v1.patch
Description: Binary data


Re: [PATCH] Verify Checksums during Basebackups

2018-04-01 Thread Magnus Hagander
On Sat, Mar 31, 2018 at 2:54 PM, Michael Banck 
wrote:

> Hi,
>
> On Fri, Mar 30, 2018 at 07:46:02AM -0400, Stephen Frost wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > On Fri, Mar 30, 2018 at 5:35 AM, David Steele 
> wrote:
> > >
> > > > On 3/24/18 10:32 AM, Michael Banck wrote:
> > > > > Am Freitag, den 23.03.2018, 17:43 +0100 schrieb Michael Banck:
> > > > >> Am Freitag, den 23.03.2018, 10:54 -0400 schrieb David Steele:
> > > > >>> In my experience actual block errors are relatively rare, so
> there
> > > > >>> aren't likely to be more than a few in a file.  More common are
> > > > >>> overwritten or transposed files, rogue files, etc.  These
> produce a lot
> > > > >>> of output.
> > > > >>>
> > > > >>> Maybe stop after five?
> > > > >
> > > > > The attached patch does that, and outputs the total number of
> > > > > verification failures of that file after it got sent.
> > > > >
> > > > >> I'm on board with this, but I have the feeling that this is not a
> very
> > > > >> common pattern in Postgres, or might not be project style at
> all.  I
> > > > >> can't remember even seen an error message like that.
> > > > >>
> > > > >> Anybody know whether we're doing this in a similar fashion
> elsewhere?
> > > > >
> > > > > I tried to have look around and couldn't find any examples, so I'm
> not
> > > > > sure that patch should go in. On the other hand, we abort on
> checksum
> > > > > failures usually (in pg_dump e.g.), so limiting the number of
> warnings
> > > > > does makes sense.
> > > > >
> > > > > I guess we need to see what others think.
> > > >
> > > > Well, at this point I would say silence more or less gives consent.
> > > >
> > > > Can you provide a rebased patch with the validation retry and warning
> > > > limiting logic added? I would like to take another pass through it
> but I
> > > > think this is getting close.
> > >
> > > I was meaning to mention it, but ran out of cycles.
> > >
> > > I think this is the right way to do it, except the 5 should be a
> #define
> > > and not an inline hardcoded value :) We could argue whether it should
> be "5
> > > total" or "5 per file". When I read the emails I thought it was going
> to be
> > > 5 total, but I see the implementation does 5 / file. In a
> super-damanged
> > > system that will still lead to horrible amounts of logging, but I think
> > > maybe if your system is in that bad shape, then it's a lost cause
> anyway.
> >
> > 5/file seems reasonable to me as well.
> >
> > > I also think the "total number of checksum errors" should be logged if
> > > they're >0, not >5. And I think *that* one should be logged at the end
> of
> > > the entire process, not per file. That'd be the kind of output that
> would
> > > be the most interesting, I think (e.g. if I have it spread out with 1
> block
> > > each across 4 files, I want that logged at the end because it's easy to
> > > otherwise miss one or two of them that may have happened a long time
> apart).
> >
> > I definitely like having a total # of checksum errors included at the
> > end, if there are any at all.  When someone is looking to see why the
> > process returned a non-zero exit code, they're likely to start looking
> > at the end of the log, so having that easily available and clear as to
> > why the backup failed is definitely valuable.
> >
> > > I don't think we have a good comparison elsewhere, and that is as David
> > > mention because other codepaths fail hard when they run into something
> like
> > > that. And we explicitly want to *not* fail hard, per previous
> discussion.
> >
> > Agreed.
>
> Attached is a new and rebased patch which does the above, plus
> integrates the suggested changes by David Steele. The output is now:
>
> $ initdb -k --pgdata=data1 1> /dev/null 2> /dev/null
> $ pg_ctl --pgdata=data1 --log=pg1.log start > /dev/null
> $ dd conv=notrunc oflag=seek_bytes seek=4000 bs=8 count=1 if=/dev/zero
> of=data1/base/12374/2610 2> /dev/null
> $ for i in 4000 13000 21000 29000 37000 43000; do dd conv=notrunc
> oflag=seek_bytes seek=$i bs=8 count=1 if=/dev/zero
> of=data1/base/12374/1259; done 2> /dev/null
> $ pg_basebackup -v -h /tmp --pgdata=data2
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/260 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_13882"
> WARNING:  checksum verification failed in file "./base/12374/2610", block
> 0: calculated C2C9 but expected EC78
> WARNING:  checksum verification failed in file "./base/12374/1259", block
> 0: calculated 8BAE but expected 46B8
> WARNING:  checksum verification failed in file "./base/12374/1259", block
> 1: calculated E413 but expected 7701
> WARNING:  checksum verification failed in file "./base/12374/1259", block
> 2: calculated 5DA9 but expected D5AA
> WARNING:  checksum verification failed in file "./base/12374/1

Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-01 Thread Magnus Hagander
On Fri, Mar 23, 2018 at 3:45 PM, Julian Markwort <
julian.markw...@uni-muenster.de> wrote:

> On Sat, 2018-03-17 at 18:24 +0100, Magnus Hagander wrote:
>
> The error message "certificate authentication failed for user XYZ:
> client certificate contains no user name" is the result of calling
> CheckCertAuth when the user presented a certificate without a CN in it.
>
>
> That is arguably wrong, since it's actually password authentication that
> fails. That is the authentication type that was picked in pg_hba.conf. It's
> more "certificate validation" that failed.
>
>
> I think we got confused about this; maybe I didn't grasp it fully before:
> CheckCertAuth is currently only called when auth method cert is used. So it
> actually makes sense to say that certificate authentication failed, I think.
>
> I agree that the log message is useful. Though it could be good to clearly
> indicate that it was caused specifically because of the verify-full, to
> differentiate it from other cases of the same message.
>
> I've modified my patch so it still uses CheckCertAuth, but now a different
> message is written to the log when clientcert=verify-full was used.
> For auth method cert, the function should behave as before.
>
> For example, what about the scenario where I use GSSAPI authentication and
> clientcert=verify-full. Then we suddenly have three usernames (gssapi,
> certificate and specified) -- how is the user supposed to know which one
> came from the cert and which one came from gssapi for example?
>
> The user will only see what's printed in the auth_failed() function in
> auth.c with the addition of the logdetail string, which I don't touch with
> this patch.
> As you said, it makes sense that more detailed information is only
> available in the server's log.
>
> I've attached an updated version of the patch.
>

I assume this is a patch that's intended to be applied on top of the
previous patch? If so, please submit the complete patch to make sure the
correct combination ends up actually being reviewed.



> I'm not sure if it is preferred to keep patches as short as possible
> (mostly with respect to the changed lines in the documentation) or to
> organize changes so that the text matches the surrounding column width and
> text flow? Also, I've omitted mentions of the current usage 'clientcert=1'
> - this is still supported in code, but I think telling new users only about
> 'clientcert=verify-ca' and 'clientcert=verify-full' is clearer. Or am I
> wrong on this one?
>
>
I have not had time to look at the updated verison of the patch yet, but I
wanted to get a response in for your last question here.

Keeping patches as short as possible is not a good thing itself. The
important part is that the resulting code and documentation is the best
possible. Sometimes you might want to turn it into two patches submitted at
the same time if one is clearly just reorganisation, but avoiding code
restructure just to keep the lines of patch down is not helpful.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-01 Thread Julian Markwort
On 1. of April 2018 17:46:38 MESZ wrote Magnus Hagander :

>I assume this is a patch that's intended to be applied on top of the
>previous patch? If so, please submit the complete pach to make sure the
>correct combination ends up actually being reviewed.

The v02.patch attached to my last mail contains both source and documentation 
changes.

>Keeping patches as short as possible is not a good thing itself. The
>important part is that the resulting code and documentation is the best
>possible. Sometimes you might want to turn it into two patches
>submitted at
>the same time if one is clearly just reorganisation, but avoiding code
>restructure just to keep the lines of patch down is not helpful.

I think I've made the right compromises regarding readability of the 
documentation in my patch then.

A happy Easter, passover, or Sunday to you
Julian



Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-04-01 Thread Magnus Hagander
On Sun, Apr 1, 2018 at 6:01 PM, Julian Markwort <
julian.markw...@uni-muenster.de> wrote:

> On 1. of April 2018 17:46:38 MESZ wrote Magnus Hagander <
> mag...@hagander.net>:
>
> >I assume this is a patch that's intended to be applied on top of the
> >previous patch? If so, please submit the complete patch to make sure the
> >correct combination ends up actually being reviewed.
>
> The v02.patch attached to my last mail contains both source and
> documentation changes.
>

Hmm. I think I may have been looking at the wrong file. Sorry!


>Keeping patches as short as possible is not a good thing itself. The
> >important part is that the resulting code and documentation is the best
> >possible. Sometimes you might want to turn it into two patches
> >submitted at
> >the same time if one is clearly just reorganisation, but avoiding code
> >restructure just to keep the lines of patch down is not helpful.
>
> I think I've made the right compromises regarding readability of the
> documentation in my patch then.
>
> A happy Easter, passover, or Sunday to you
>

You, too!

(I shall return to reviewing it after the holidays are over)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-04-01 Thread Tomas Vondra
Hi,

The attached patch version modifies how the non-MCV selectivity is
computed, along the lines explained in the previous message.

The comments in statext_clauselist_selectivity() explain it in far more
detail, but in short we do this:

1) Compute selectivity using the MCV (s1).

2) To compute the non-MCV selectivity (s2) we do this:

2a) See how many top-level equalities are there (and compute ndistinct
estimate for those attributes).

2b) If there is an equality on each column, we know there can only be a
single matching item. If we found it in the MCV (i.e. s1 > 0) we're
done, and 's1' is the answer.

2c) If only some columns have equalities, we estimate the selectivity
for equalities as

s2 = ((1 - mcv_total_sel) / ndistinct)

If there are no remaining conditions, we're done.

2d) To estimate the non-equality clauses (on the non-MCV part only), we
either repeat the whole process by calling clauselist_selectivity() or
approximate s1 to the non-MCV part. This needs a bit of care to
prevent infinite loops.
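
To put some (invented) numbers on 2c, and assuming the final estimate is
simply the sum of the MCV and non-MCV parts as the step names suggest: if
the MCV list covers mcv_total_sel = 0.4 of the table, the ndistinct estimate
for the columns with equalities is 50, and the MCV lookup in step 1 gave
s1 = 0.02, then

    s2 = (1 - 0.4) / 50 = 0.012

and the combined estimate would be s1 + s2 = 0.032.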


Of course, with 0002 this changes slightly, because we may try using a
histogram to estimate the non-MCV part. But that's just an extra step
right before (2a).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20180401.patch.gz
Description: application/gzip


0002-multivariate-histograms-20180401.patch.gz
Description: application/gzip


Re: Planning counters in pg_stat_statements

2018-04-01 Thread legrand legrand
Hello,

When testing this patch on my WIN1252 database with my Java front end, using
an 11devel snapshot, I get

  org.postgresql.util.PSQLException: ERROR: character with byte sequence 0x90
  in encoding "WIN1252" has no equivalent in encoding "UTF8"

When using psql with client_encoding = WIN1252, query texts are truncated:

postgres=# select pg_stat_statements_reset();
 pg_stat_statements_reset
---------------------------

(1 row)


postgres=# show client_encoding;
 client_encoding
-----------------
 WIN1252
(1 row)


postgres=# select substr(query,1,20) from pg_stat_statements;
       substr
--------------------
 tatements_reset();
 ding;
(2 rows)

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Diagonal storage model

2018-04-01 Thread legrand legrand
Great idea!
Thank you, Konstantin



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Diagonal storage model

2018-04-01 Thread Alexander Korotkov
Hi!

On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> I want to announce new model, "diagonal storage" which combines benefits
> of both approaches.
> The idea is very simple: we first store column 1 of first record, then
> column 2 of second record, ... and so on until we reach the last column.
> After it we store second column of first record, third column of the
> second record,...
>

Sounds interesting.  Could "diagonal storage" be applied twice?  That is,
could we apply the diagonal transformation to the result of another diagonal
transformation?  I expect we should get a "square diagonal" transformation...

Attached please find a patch with the first prototype implementation. It provides
> about 3.14 times improvement of performance on most of the TPC-H queries.


Great, but with the square diagonal transformation we should get a 3.14^2
times improvement, which is even better!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Optimize Arm64 crc32c implementation in Postgresql

2018-04-01 Thread Andres Freund
Hi,

On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote:
> On 02/03/18 06:42, Andres Freund wrote:
> > On 2018-03-02 11:37:52 +1300, Thomas Munro wrote:
> > > So... that stuff probably needs either a configure check for the
> > > getauxval function and/or those headers, or an OS check?
> > 
> > It'd probably be better to not rely on OS-specific headers, and instead
> > directly access the capabilities.
> 
> Anyone got an idea on how to do that? I googled around a bit, but couldn't
> find any examples.

Similar...


> * Use compiler intrinsics instead of inline assembly.

+many


> * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine
> that I had available (not an emulator, but a VM on a shared ARM64 server).

Have you seen actual postgres performance benefits with the patch?

- Andres



Rethinking -L switch handling and construction of LDFLAGS

2018-04-01 Thread Tom Lane
I noticed that if I build with --with-libxml on my Mac platforms,
"make installcheck" stops working for certain contrib modules such
as postgres_fdw.  I finally got around to diagnosing the reason why,
and it goes like this:

1. --with-libxml causes configure to include

-L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib
in the LDFLAGS value put into Makefile.global.  That's because
"xml2-config --libs" emits that, and we do need it if we want to link
to the platform-supplied libxml2.

2. However, that directory also contains a symlink to the
platform-supplied libpq.

3. When we go to build postgres_fdw.so, the link command line looks like

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-Wno-unused-command-line-argument -g -O2  -bundle -multiply_defined suppress -o 
postgres_fdw.so postgres_fdw.o option.o deparse.o connection.o shippable.o  
-L../../src/port -L../../src/common 
-L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib
  -L/usr/local/ssl/lib -Wl,-dead_strip_dylibs   -L../../src/interfaces/libpq 
-lpq -bundle_loader ../../src/backend/postgres

The details of this might vary depending on your configure options,
but the key point is that the -L/Applications/... switch is before the
-L../../src/interfaces/libpq one.  This means that the linker resolves
"-lpq" to the platform-supplied libpq, not the one in the build tree.
We can confirm that with

$ otool -L postgres_fdw.so 
postgres_fdw.so:
/usr/lib/libpq.5.dylib (compatibility version 5.0.0, current version 
5.6.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 1252.50.4)

So, quite aside from any problems stemming from using a 9.3-vintage libpq
with HEAD client code, we are stuck with a libpq that uses Apple's idea of
the default socket location, rather than what the rest of our build uses.
That explains the failures seen in "make installcheck", which look like

2018-04-01 13:09:48.744 EDT [10758] ERROR:  could not connect to server 
"loopback"
2018-04-01 13:09:48.744 EDT [10758] DETAIL:  could not connect to server: No 
such file or directory
Is the server running locally and accepting
connections on Unix domain socket 
"/var/pgsql_socket/.s.PGSQL.5432"?

Of course, /var/pgsql_socket is *not* where my postmaster is putting
its socket.

In short, we need to deal more honestly with the positioning of -L
switches in link commands.  Somebody's idea that we could embed
both -L and -l into $(libpq), and then pay basically no attention to
where that ends up in the final link command, is just too simplistic.

I think that we want to establish an ironclad rule that -L switches
referencing directories in our own build tree must appear before -L
switches referencing external libraries.

I don't have a concrete patch to propose yet, but the design idea
I have in mind is to split LDFLAGS into two or more parts, so that
-L switches for the build tree are supposed to be put in the first
part and external -L switches in the second.  It'd be sufficient
to have Makefile.global do something like

ifdef PGXS
  LDFLAGS_INTERNAL = -L$(libdir)
else
  LDFLAGS_INTERNAL = -L$(top_builddir)/src/port -L$(top_builddir)/src/common
endif
LDFLAGS = $(LDFLAGS_INTERNAL) @LDFLAGS@

and then teach relevant places that they need to add $(libpq) to
LDFLAGS_INTERNAL not LDFLAGS.  (Perhaps "BUILD" would be a better keyword
than "INTERNAL" here?)  Not sure how that would play exactly with
Makefile.shlib's SHLIB_LINK, but maybe we need SHLIB_LINK_INTERNAL along
with SHLIB_LINK.  I'd also like to try to clean up the mess that is
$(libpq_pgport), though I'm not sure just how yet.
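
To illustrate with a module that links libpq, the per-module change might
end up looking something like this (just a sketch of the proposal, not a
worked-out patch):

  # e.g. contrib/postgres_fdw/Makefile: keep the build-tree library
  # reference in the _INTERNAL variant so it always precedes external
  # -L switches coming from @LDFLAGS@
  SHLIB_LINK_INTERNAL = $(libpq)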

Or we could try to create a full separation between -L and -l switches,
ending up with three or more parts for LDFLAGS not just two.  But I'm
not sure if that gains anything.

I have no idea whether the MSVC build infrastructure has comparable
problems, and would not be willing to fix it myself if it does.
But I am willing to try to fix this in the gmake infrastructure.

Comments, better ideas?

regards, tom lane



Re: Rethinking -L switch handling and construction of LDFLAGS

2018-04-01 Thread Andres Freund
Hi,

On 2018-04-01 13:38:15 -0400, Tom Lane wrote:
> In short, we need to deal more honestly with the positioning of -L
> switches in link commands.  Somebody's idea that we could embed
> both -L and -l into $(libpq), and then pay basically no attention to
> where that ends up in the final link command, is just too simplistic.

Sounds right.


> I don't have a concrete patch to propose yet, but the design idea
> I have in mind is to split LDFLAGS into two or more parts, so that
> -L switches for the build tree are supposed to be put in the first
> part and external -L switches in the second.  It'd be sufficient
> to have Makefile.global do something like
> 
> ifdef PGXS
>   LDFLAGS_INTERNAL = -L$(libdir)
> else
>   LDFLAGS_INTERNAL = -L$(top_builddir)/src/port -L$(top_builddir)/src/common
> endif
> LDFLAGS = $(LDFLAGS_INTERNAL) @LDFLAGS@

I'm not sure I like doing this in Makefile.global. We've various files
that extend LDFLAGS in other places, and that's going to be hard if it's
already mushed together again. We end up re-building it from parts in
those files too.

Why don't we change the link commands to reference LDFLAGS_INTERNAL
explicitly?  That seems like it'd be cleaner.

Greetings,

Andres Freund



Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-04-01 Thread Yura Sokolov
17.03.2018 03:36, Tomas Vondra writes:
> 
> On 03/17/2018 12:03 AM, Yura Sokolov wrote:
>> 16.03.2018 04:23, Tomas Vondra writes:
>>>
>>> ...
>>>
>>> OK, a few more comments.
>>>
>>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>>> my_log2 (that is, we could just call the existing function).
>>
>> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
>> more call be more expensive than loop itself?
>>
> 
> I very much doubt it there would be a measurable difference. Firstly,
> function calls are not that expensive, otherwise we'd be running around
> and removing function calls from the hot paths. Secondly, the call only
> happens with many XIDs, and in that case the cost should be out-weighted
> by faster lookups. And finally, the function does stuff that seems far
> more expensive than a single function call (for example allocation,
> which may easily trigger a malloc).
> 
> In fact, in the interesting cases it's pretty much guaranteed to hit a
> malloc, because the number of backend processes needs to be high enough
> (say, 256 or more), which means
> 
> GetMaxSnapshotSubxidCount()
> 
> will translate to something like
> 
> ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
> = (64 + 1) * 256
> = 16640
> 
> and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
> ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
> as separate blocks, that are always malloc-ed independently.
> 
> But I may be missing something, and the extra call actually makes a
> difference. But I think the onus of proving that is on you, and the
> default should be not to duplicate code.
> 
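(For reference, the loop in question is tiny.  A rough sketch of a
my_log2-style helper, only an illustration of its shape rather than the
actual dynahash.c source, looks like this:

static int
my_log2_sketch(long num)
{
	int			i = 0;
	long		limit = 1;

	/* smallest power-of-two exponent i such that 2^i >= num */
	while (limit < num)
	{
		i++;
		limit <<= 1;
	}
	return i;
}

So the cost being weighed is one extra function call versus inlining these
few instructions.)
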
>>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>>> store log2 of the two XID counts, and to remember if the hash table
>>> is built. That's a bit confusing (at least it should be explained
>>> in a comment) but most importantly it seems a bit unnecessary.
>>
>> Ok, I'll add comment.
>>
>>> I assume it was done to save space, but I very much doubt that
>>> makes any difference. So we could easily keep a separate flag. I'm
>>> pretty sure there are holes in the SnapshotData struct, so we could
>>> squeeze it the flags in one of those.
>>
>> There's hole just right after xhlog. But it will be a bit strange to 
>> move fields around.
>>
> 
> Is SnapshotData really that sensitive to size increase? I have my doubts
> about that, TBH. The default stance should be to make the code easy to
> understand. That is, we should only move fields around if it actually
> makes a difference.
> 
>>> But do we even need a separate flag? We could just as easily keep 
>>> the log2 fields set to 0 and only set them after actually building 
>>> the hash table.
>>
>> There is a need to signal that there is space for hash. It is not
>> always allocated. iirc, I didn't cover the case where the snapshot was
>> restored from file, and some other place or two.
>> Only if all places where the snapshot is allocated are properly modified
>> to allocate space for hash, then the flag could be omitted, and log2
>> itself used as a flag.
>>
> 
> Hmmm, that makes it a bit inconsistent, I guess ... why not to do the
> same thing on all those places?
> 
>>> Or even better, why not to store the mask so that XidInXip does not
>>> need to compute it over and over (granted, that's uint32 instead
>>> of just uint8, but I don't think SnapshotData is particularly
>>> sensitive to this, especially considering how much larger the xid
>>> hash table is).
>>
>> I don't like unnecessary sizeof struct increase. And I doubt that 
>> computation matters. I could be mistaken though, because it is hot
>> place. Do you think it will significantly improve readability?
>>
> 
> IMHO the primary goal is to make the code easy to read and understand,
> and only optimize struct size if it actually makes a difference. We have
> no such proof here, and I very much doubt you'll be able to show any
> difference because even with separate flags pahole says this:
> 
> struct SnapshotData {
> SnapshotSatisfiesFunc  satisfies;/* 0 8 */
> TransactionId  xmin; /* 8 4 */
> TransactionId  xmax; /*12 4 */
> TransactionId *xip;  /*16 8 */
> uint32 xcnt; /*24 4 */
> uint8  xhlog;/*28 1 */
> 
> /* XXX 3 bytes hole, try to pack */
> 
> TransactionId *subxip;   /*32 8 */
> int32  subxcnt;  /*40 4 */
> uint8  subxhlog; /*44 1 */
> bool   suboverflowed;/*45 1 */
> bool   takenDuringRecovery;  /*46 1 */
> bool   copied;   /*47 1 */
> CommandId   

Re: Rethinking -L switch handling and construction of LDFLAGS

2018-04-01 Thread Tom Lane
Andres Freund  writes:
> On 2018-04-01 13:38:15 -0400, Tom Lane wrote:
>> I don't have a concrete patch to propose yet, but the design idea
>> I have in mind is to split LDFLAGS into two or more parts, so that
>> -L switches for the build tree are supposed to be put in the first
>> part and external -L switches in the second.

> I'm not sure I like doing this in Makefile.global. We've various files
> that extend LDFLAGS in other places, and that's going to be hard if it's
> already mushed together again. We end up re-building it from parts in
> those files too.

Yeah, one of the things I'd like to fix is that some of the makefiles,
eg psql's, do

override LDFLAGS := -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) $(LDFLAGS)

which goes *directly* against this commandment in Makefile.global:

# We want -L for libpgport.a and libpgcommon.a to be first in LDFLAGS.  We
# also need LDFLAGS to be a "recursively expanded" variable, else adjustments
# to rpathdir don't work right.  So we must NOT do LDFLAGS := something,
# meaning this has to be done first and elsewhere we must only do LDFLAGS +=
# something.

It's a bit surprising that rpath works at all for these makefiles.
But with what I'm imagining here, I think we could replace that with

LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)

and thereby preserve the recursively-expanded virginity of both
LDFLAGS_INTERNAL and LDFLAGS.  But I've not tried to test anything yet.

> Why don't we change the link commands to reference LDFLAGS_INTERNAL
> explicitly?  That seems like it'd be cleaner.

I'm hesitant to do that because LDFLAGS is a name known to make's
default rules, and I don't want to bet that we're not relying on
those default rules anywhere.  I also disagree with the idea that using
"$(LDFLAGS_INTERNAL) $(LDFLAGS)" in every link command we have is better
or less error-prone than just "$(LDFLAGS)".  Especially not if we end up
with more than two parts.

regards, tom lane



Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-04-01 Thread Yura Sokolov
23.03.2018 17:59, Amit Kapila writes:
> On Sat, Mar 10, 2018 at 7:41 AM, Yura Sokolov  wrote:
>> 08.03.2018 03:42, Tomas Vondra writes:
>>> One reason against building the hash table in GetSnapshotData is that
>>> we'd build it even when the snapshot is never queried. Or when it is
>>> queried, but we only need to check xmin/xmax.
>>
>> Thank you for analyze, Tomas.
>>
>> Stephen is right about bug in snapmgr.c
>> Attached version fixes bug, and also simplifies XidInXip a bit.
>>
> 
> @@ -2167,8 +2175,7 @@ RestoreSnapshot(char *start_address)
>   /* Copy SubXIDs, if present. */
>   if (serialized_snapshot.subxcnt > 0)
>   {
> - snapshot->subxip = ((TransactionId *) (snapshot + 1)) +
> - serialized_snapshot.xcnt;
> + snapshot->subxip = ((TransactionId *) (snapshot + 1)) + xcnt;
>   memcpy(snapshot->subxip, serialized_xids + serialized_snapshot.xcnt,
> serialized_snapshot.subxcnt * sizeof(TransactionId));
>   }
> 
> 
> It is not clear why you want to change this in RestoreSnapshot when
> nothing related is changed in SerializeSnapshot?  Can you please add
> some comments to clarify it?
> 

I didn't change the serialized format, therefore there is no need to change
SerializeSnapshot.
But the in-memory representation was changed, so RestoreSnapshot is changed.

With regards,
Sokolov Yura.





Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-01 Thread Thomas Munro
On Fri, Mar 30, 2018 at 10:18 AM, Thomas Munro
 wrote:
> ... on Linux only.

Apparently I was too optimistic.  I had looked only at FreeBSD, which
keeps the page around and dirties it so we can retry, but the other
BSDs apparently don't (FreeBSD changed that in 1999).  From what I can
tell from the sources below, we have:

Linux, OpenBSD, NetBSD: retrying fsync() after EIO lies
FreeBSD, Illumos: retrying fsync() after EIO tells the truth

Maybe my drive-by assessment of those kernel routines is wrong and
someone will correct me, but I'm starting to think you might be better
to assume the worst on all systems.  Perhaps a GUC that defaults to
panicking, so that users on those rare OSes could turn that off?  Even
then I'm not sure if the failure mode will be that great anyway or if
it's worth having two behaviours.  Thoughts?

http://mail-index.netbsd.org/netbsd-users/2018/03/30/msg020576.html
https://github.com/NetBSD/src/blob/trunk/sys/kern/vfs_bio.c#L1059
https://github.com/openbsd/src/blob/master/sys/kern/vfs_bio.c#L867
https://github.com/freebsd/freebsd/blob/master/sys/kern/vfs_bio.c#L2631
https://github.com/freebsd/freebsd/commit/e4e8fec98ae986357cdc208b04557dba55a59266
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/os/bio.c#L441
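
To make the "assume the worst" option concrete, here is a minimal
standalone sketch (not a proposed patch, and ignoring the GUC question) of
treating any fsync() failure on a data file as fatal rather than
retryable:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch only: if fsync() reports a failure, give up instead of retrying,
 * because on several kernels a later retry can report success even though
 * the dirty data was already thrown away.  A real patch would use
 * ereport(PANIC, ...) here.
 */
static void
fsync_or_die(int fd, const char *path)
{
	if (fsync(fd) != 0)
	{
		fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
				path, strerror(errno));
		abort();
	}
}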

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Planning counters in pg_stat_statements

2018-04-01 Thread legrand legrand
I forgot to recompile core ...
now only utility statements (with 0 plans) seem truncated.

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: bulk typos

2018-04-01 Thread Tom Lane
Félix GERZAGUET  writes:
> On Sat, Mar 31, 2018 at 12:56 PM, Justin Pryzby 
> wrote:
>> I needed another distraction so bulk-checked for typos, limited to
>> comments in *.[ch].

> I think you introduced another one while changing "explcitly" to
> "expilcitly" instead of "explicitly" :-)

LGTM for the most part, except for this change:

- * Therefore, we do not whinge about no-such-process.
+ * Therefore, we do not whine about no-such-process.

I think that spelling is intentional, so I didn't change it.
Pushed the rest, with Felix's correction.

regards, tom lane



Re: bulk typos

2018-04-01 Thread Andres Freund
Hi,

On 2018-03-31 05:56:40 -0500, Justin Pryzby wrote:
> --- a/src/backend/jit/llvm/llvmjit_expr.c
> +++ b/src/backend/jit/llvm/llvmjit_expr.c
> @@ -1768,7 +1768,7 @@ llvm_compile_expr(ExprState *state)
>   
> b_compare_result,
>   b_null);
>  
> - /* build block analying the !NULL 
> comparator result */
> + /* build block analyzing the !NULL 
> comparator result */
>   LLVMPositionBuilderAtEnd(b, 
> b_compare_result);

Hah. I kinda like the previous way too ;)

Greetings,

Andres Freund




Re: Rethinking -L switch handling and construction of LDFLAGS

2018-04-01 Thread Andres Freund
Hi,

On 2018-04-01 13:55:05 -0400, Tom Lane wrote:
> > Why don't we change the link commands to reference LDFLAGS_INTERNAL
> > explicitly?  That seems like it'd be cleaner.
> 
> I'm hesitant to do that because LDFLAGS is a name known to make's
> default rules, and I don't want to bet that we're not relying on
> those default rules anywhere.

FWIW, postgres builds cleanly with -r -R in MAKEFLAGS.

Greetings,

Andres Freund



Re: Rethinking -L switch handling and construction of LDFLAGS

2018-04-01 Thread Tom Lane
Andres Freund  writes:
> On 2018-04-01 13:55:05 -0400, Tom Lane wrote:
>> I'm hesitant to do that because LDFLAGS is a name known to make's
>> default rules, and I don't want to bet that we're not relying on
>> those default rules anywhere.

> FWIW, postgres builds cleanly with -r -R in MAKEFLAGS.

That's pretty hard to believe.  Why would we bother to override every
default rule?  Even if it's true today, I would not accept it as project
policy that we must do so.  Perhaps more to the point, I would strongly
object to any design in which the standard Make variables don't mean
what the default rules expect them to mean.  That's just a recipe for
confusing people and creating hard-to-spot bugs.

regards, tom lane



Re: new function for tsquery creartion

2018-04-01 Thread Dmitry Ivanov

Hi Aleksandr,

I agree with Aleksander about silencing all errors in 
websearch_to_tsquery().


In the attachment is a revised patch with the attempt to introduce an
ability to ignore syntax errors in gettoken_tsvector().


Thanks for the further improvements! Yes, you're both right, the API has 
to be consistent. Unfortunately, I had to make some adjustments 
according to Oleg Bartunov's review. Here's a change log:


1. &, | and (), <-> are no longer considered operators in web search 
mode.
2. I've stumbled upon a bug: web search used to transform "pg_class" 
into 'pg <-> class', which is no longer the case.
3. I changed the behavior of gettoken_tsvector() as soon as I had heard 
from Aleksander Alekseev, so I decided to use my implementation in this 
revision of the patch. This is a good subject for discussion, though. 
Feel free to share your opinion.

4. As suggested by Theodor, I've replaced some bool args with bit flags.


The name of enum ts_parsestate looks more like a name of the function
than a name of a type.
In my version, it is renamed to QueryParserState, but you can fix it if 
I'm wrong.


True, but gettoken_query() returns ts_tokentype, so I decided to use 
this naming scheme.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index ea5947a3a8..6055fb6b4e 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -490,7 +490,7 @@ to_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
-		  false);
+		  0);
 
 	PG_RETURN_TSQUERY(query);
 }
@@ -520,7 +520,7 @@ plainto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
-		  true);
+		  P_TSQ_PLAIN);
 
 	PG_RETURN_POINTER(query);
 }
@@ -551,7 +551,7 @@ phraseto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(&data),
-		  true);
+		  P_TSQ_PLAIN);
 
 	PG_RETURN_TSQUERY(query);
 }
@@ -567,3 +567,35 @@ phraseto_tsquery(PG_FUNCTION_ARGS)
 		ObjectIdGetDatum(cfgId),
 		PointerGetDatum(in)));
 }
+
+Datum
+websearch_to_tsquery_byid(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_PP(1);
+	MorphOpaque	data;
+	TSQuery		query = NULL;
+
+	data.cfg_id = PG_GETARG_OID(0);
+
+	data.qoperator = OP_AND;
+
+	query = parse_tsquery(text_to_cstring(in),
+		  pushval_morph,
+		  PointerGetDatum(&data),
+		  P_TSQ_WEB);
+
+	PG_RETURN_TSQUERY(query);
+}
+
+Datum
+websearch_to_tsquery(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_PP(0);
+	Oid			cfgId;
+
+	cfgId = getTSCurrentConfig(true);
+	PG_RETURN_DATUM(DirectFunctionCall2(websearch_to_tsquery_byid,
+		ObjectIdGetDatum(cfgId),
+		PointerGetDatum(in)));
+
+}
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 1ccbf79030..695bdb89e9 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -32,14 +32,27 @@ const int	tsearch_op_priority[OP_COUNT] =
 	3			/* OP_PHRASE */
 };
 
+/*
+ * parser's states
+ */
+typedef enum
+{
+	WAITOPERAND = 1,
+	WAITOPERATOR = 2,
+	WAITFIRSTOPERAND = 3,
+	WAITSINGLEOPERAND = 4
+} ts_parserstate;
+
 struct TSQueryParserStateData
 {
 	/* State for gettoken_query */
 	char	   *buffer;			/* entire string we are scanning */
 	char	   *buf;			/* current scan point */
-	int			state;
 	int			count;			/* nesting count, incremented by (,
  * decremented by ) */
+	bool		in_quotes;		/* phrase in quotes "" */
+	bool		is_web;			/* is it a web search? */
+	ts_parserstate state;
 
 	/* polish (prefix) notation in list, filled in by push* functions */
 	List	   *polstr;
@@ -57,12 +70,6 @@ struct TSQueryParserStateData
 	TSVectorParseState valstate;
 };
 
-/* parser's states */
-#define WAITOPERAND 1
-#define WAITOPERATOR	2
-#define WAITFIRSTOPERAND 3
-#define WAITSINGLEOPERAND 4
-
 /*
  * subroutine to parse the modifiers (weight and prefix flag currently)
  * part, like ':AB*' of a query.
@@ -197,6 +204,26 @@ err:
 	return buf;
 }
 
+/*
+ * Parse OR operator used in websearch_to_tsquery().
+ */
+static bool
+parse_or_operator(TSQueryParserState state)
+{
+	char *buf = state->buf;
+
+	if (state->in_quotes)
+		return false;
+
+	return (t_iseq(&buf[0], 'o') || t_iseq(&buf[0], 'O')) &&
+		   (t_iseq(&buf[1], 'r') || t_iseq(&buf[1], 'R')) &&
+		   (buf[2] != '\0' &&
+!t_iseq(&buf[2], '-') &&
+!t_iseq(&buf[2], '_') &&
+!t_isalpha(&buf[2]) &&
+!t_isdigit(&buf[2]));
+}
+
 /*
  * token types for parsing
  */
@@ -219,10 +246,12 @@ typedef enum
  *
  */
 static ts_tokentype
-gettoken_query(TSQueryParserState state,
-			   int8 *operator,
-			   int *lenval, char **strval, int16 *weight, bool *prefix)
+gettoken_query(TSQueryParserState state, int8 *operator,
+			   int *len

Re: [PATCH] Logical decoding of TRUNCATE

2018-04-01 Thread Andres Freund
On 2018-01-25 14:21:15 +0100, Marco Nenciarini wrote:
> + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
> + {
> + 
> + /*
> +  * Check foreign key references.  In CASCADE mode, this should 
> be
> +  * unnecessary since we just pulled in all the references; but 
> as a
> +  * cross-check, do it anyway if in an Assert-enabled build.
> +  */
>   #ifdef USE_ASSERT_CHECKING
>   heap_truncate_check_FKs(rels, false);
> + #else
> + if (stmt->behavior == DROP_RESTRICT)
> + heap_truncate_check_FKs(rels, false);
>   #endif
> + }

That *can't* be right.

> + case REORDER_BUFFER_CHANGE_TRUNCATE:
> + appendStringInfoString(ctx->out, " TRUNCATE:");
> + 
> + if (change->data.truncate_msg.restart_seqs
> + || change->data.truncate_msg.cascade)
> + {
> + if (change->data.truncate_msg.restart_seqs)
> + appendStringInfo(ctx->out, " 
> restart_seqs");
> + if (change->data.truncate_msg.cascade)
> + appendStringInfo(ctx->out, " cascade");
> + }
> + else
> + appendStringInfoString(ctx->out, " (no-flags)");
> + break;
>   default:
>   Assert(false);
>   }

I know this has been discussed in the thread already, but it really
strikes me as wrong to basically do some mini DDL replication feature
via per-command WAL records.
> ***
> *** 111,116  CREATE PUBLICATION  class="parameter">name
> --- 111,121 
> and so the default value for this option is
> 'insert, update, delete'.
>
> +  
> +TRUNCATE is treated as a form of
> +DELETE for the purpose of deciding whether
> +to publish, or not.
> +  
>   
>  
> 

Why is this a good idea?


Hm, it seems logicaldecoding.sgml hasn't been updated?

> + void
> + ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
> + DropBehavior behavior, 
> bool restart_seqs)
> + {
> + List   *rels = list_copy(explicit_rels);

Why is this copied?


> +  * Write a WAL record to allow this set of actions to be logically 
> decoded.
> +  * We could optimize this away when !RelationIsLogicallyLogged(rel)
> +  * but that doesn't save much space or time.

What you're saying isn't that you're not logging anything, but that
you're allocating the header regardless? Because this certainly sounds
like you unconditionally log a WAL record.

> +  * Assemble an array of relids, then an array of seqrelids so we can 
> write
> +  * a single WAL record for the whole action.
> +  */
> + logrelids = palloc(maxrelids * sizeof(Oid));
> + foreach (cell, relids_logged)
> + {
> + nrelids++;
> + if (nrelids > maxrelids)
> + {
> + maxrelids *= 2;
> + logrelids = repalloc(logrelids, maxrelids * 
> sizeof(Oid));
> + }
> + logrelids[nrelids - 1] = lfirst_oid(cell);
> + }
> + 
> + foreach (cell, seq_relids_logged)
> + {
> + nseqrelids++;
> + if ((nrelids + nseqrelids) > maxrelids)
> + {
> + maxrelids *= 2;
> + logrelids = repalloc(logrelids, maxrelids * 
> sizeof(Oid));
> + }
> + logrelids[nrelids + nseqrelids - 1] = lfirst_oid(cell);
> + }

I'm confused. Why do we need the resizing here, when we know the max
upfront?
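
I.e. something like the following untested sketch, which sizes the array
once from the two list lengths:

	/* sketch: both list lengths are known, so size the array upfront */
	int			nrelids = list_length(relids_logged);
	int			nseqrelids = list_length(seq_relids_logged);
	Oid		   *logrelids = palloc((nrelids + nseqrelids) * sizeof(Oid));
	int			i = 0;
	ListCell   *cell;

	foreach(cell, relids_logged)
		logrelids[i++] = lfirst_oid(cell);
	foreach(cell, seq_relids_logged)
		logrelids[i++] = lfirst_oid(cell);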

> + /*
> +  * For truncate we list all truncated relids in an array, followed by all
> +  * sequence relids that need to be restarted, if any.
> +  * All rels are always within the same database, so we just list dbid once.
> +  */
> + typedef struct xl_heap_truncate
> + {
> + Oid dbId;
> + uint32  nrelids;
> + uint32  nseqrelids;
> + uint8   flags;
> + Oid relids[FLEXIBLE_ARRAY_MEMBER];
> + } xl_heap_truncate;

Given that the space is used anyway due to padding, I'd just make flags
32bit.

Greetings,

Andres Freund



Re: Optimizing nested ConvertRowtypeExpr execution

2018-04-01 Thread Andres Freund
Hi,

On 2018-02-26 17:20:05 +0530, Ashutosh Bapat wrote:
> In a multi-level partitioned table, a parent whole-row reference gets
> translated into nested ConvertRowtypeExpr with child whole-row
> reference as the leaf. During the execution, the child whole-row
> reference gets translated into all all intermediate parents' whole-row
> references, ultimately represented as parent's whole-row reference.
> AFAIU, the intermediate translations are unnecessary. The leaf child
> whole-row can be directly translated into top parent's whole-row
> reference. Here's a WIP patch which does that by eliminating
> intermediate ConvertRowtypeExprs during ExecInitExprRec().

Why is this done appropriately at ExecInitExpr() time, rather than at
plan time? Seems like eval_const_expressions() would be a bit more
appropriate (being badly named aside...)?

- Andres



Re: bulk typos

2018-04-01 Thread Gavin Flower

On 02/04/18 07:03, Tom Lane wrote:

> Félix GERZAGUET  writes:
>> On Sat, Mar 31, 2018 at 12:56 PM, Justin Pryzby 
>> wrote:
>>> I needed another distraction so bulk-checked for typos, limited to
>>> comments in *.[ch].
>
>> I think you introduced another one while changing "explcitly" to
>> "expilcitly" instead of "explicitly" :-)
>
> LGTM for the most part, except for this change:
>
> - * Therefore, we do not whinge about no-such-process.
> + * Therefore, we do not whine about no-such-process.
>
> I think that spelling is intentional, so I didn't change it.
> Pushed the rest, with Felix's correction.
>
> regards, tom lane


Me thinks some people are Whinging far too much!

see: https://en.oxforddictionaries.com/definition/whinge


Cheers,
Gavin




Re: [HACKERS] Insert values() per-statement overhead

2018-04-01 Thread Vladimir Sitnikov
Andres>I think the biggest overhead here is that the executor startup includes
Andres>too many indirect (linked lists) datastructures, that are allocated each
Andres>round

The case is very common: batch inserts are popular in Java, and ORMs use
batch API automatically.
However, there's high per-backend-message overhead, and that overhead is
very noticeable.

What is the approach to handle this?

Folding multiple DML statements into one with the help of a CTE does not work
either (see https://github.com/pgjdbc/pgjdbc/issues/1165 ):

CTE doc>Trying to update the same row twice in a single statement is not
supported. Only one of the modifications takes place, but it is not easy
(and sometimes not possible) to reliably predict which one

Vladimir


Re: Diagonal storage model

2018-04-01 Thread David Fetter
On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Hi hackers,
> 
> Vertical (columnar) storage mode is most optimal for analytics, which is why 
> it is widely used in databases oriented to OLAP, such as Vertica, 
> HyPer, KDB, ...
> In Postgres we have the cstore extension, which is not able to provide all 
> the benefits of the vertical model because of the lack of support for vector 
> operations in the executor.
> Situation can be changed if we will have pluggable storage API with support 
> of vectorized execution.
> 
> But the vertical model is not so good for updates and loading of data (because data 
> is mostly imported in horizontal format).
> This is why in most of the existing systems data is present in both formats (at 
> least for some time).
> 
> I want to announce new model, "diagonal storage" which combines benefits of 
> both approaches.
> The idea is very simple: we first store column 1 of first record, then column 
> 2 of second record, ... and so on until we reach the last column.
> After it we store second column of first record, third column of the second 
> record,...
> 
> Profiling of TPC-H queries shows that most of the query execution time 
> (about 17%) is spent in heap_deform_tuple.
> The new format will allow us to significantly reduce the time spent on heap 
> deforming, because there is just one column of the particular record in each tile.
> Moreover, we can perform deforming of many tuples in parallel, which is 
> especially efficient on quantum computers.
> 
> Attached please find a patch with a first prototype implementation. It provides 
> about a 3.14 times improvement of performance on most of the TPC-H queries.

You're sure it's not 3.14159265358979323...?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: WIP: Covering + unique indexes.

2018-04-01 Thread Peter Geoghegan
On Sun, Apr 1, 2018 at 10:09 AM, Alexander Korotkov
 wrote:
>> So? GIN doesn't have the same legacy at all. The GIN posting lists
>> *don't* have regular heap TID pointers at all. They started out
>> without them, and still don't have them.
>
>
> Yes, GIN never stored heap TID pointers in t_tid of index tuple.  But GIN
> assumes that heap TID pointer has at most 11 significant bits during
> posting list encoding.

I think that we should avoid assuming things, unless the cost of
representing them is too high, which I don't think applies here. The
more defensive general purpose code can be, the better.

I will admit to being paranoid here. But experience suggests that
paranoia is a good thing, if it isn't too expensive. Look at the
thread on XFS + fsync() for an example of things being wrong for a
very long time without anyone realizing, and despite the best efforts
of many smart people. As far as anyone can tell, PostgreSQL on Linux +
XFS is kinda, sorta broken, and has been forever. XFS was mature
before ext4 was, and is a popular choice, and yet this is the first
we're hearing about it being kind of broken. After many years.

Look at this check that made it into my amcheck patch, that was
committed yesterday:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/amcheck/verify_nbtree.c;h=a15fe21933b9a5b8baefedaa8f38e517d6c91877;hb=7f563c09f8901f6acd72cb8fba7b1bd3cf3aca8e#l745

As it says, nbtree is surprisingly tolerant of corrupt lp_len fields.
You may find it an interesting exercise to use pg_hexedit to corrupt
many lp_len fields in an index page. What's really interesting about
this is that it doesn't appear to break anything at all! We don't get
the length from there in most cases, so reads won't break at all. I
see that we use ItemIdGetLength() in a couple of rare cases (though
even those could be avoided) during a page split. You'd be lucky to
notice a problem if lp_len fields were regularly corrupt. When you
notice, it will probably have already caused big problems.

On a similar note, I've noticed that many of my experimental B-Tree
patches (that I never find time to finish) tend to almost work quite
early on, sometimes without my really understanding why. The whole L&Y
approach of recovering from problems that were detected (detecting
concurrent page splits, and moving right) makes the code *very*
forgiving. I hope that I don't sound trite, but everyone should try to
be modest about what they *don't* know when writing complex system
software with concurrency. It is not a platitude, even though it
probably seems that way. A tiny mistake can have big consequences, so
it's very important that we have a way to easily detect them after the
fact.

> I don't think we should use assertions, because they are typically disabled
> on production PostgreSQL builds.  But we can have some explicit check in
> some common path.  In the attached patch I've added such a check to
> _bt_compare().  Probably, together with amcheck, that would be sufficient.

Good idea -- a "can't happen" check in _bt_compare seems better, which
I see here:

> diff --git a/src/backend/access/nbtree/nbtsearch.c 
> b/src/backend/access/nbtree/nbtsearch.c
> index 51dca64e13..fcf9832147 100644
> --- a/src/backend/access/nbtree/nbtsearch.c
> +++ b/src/backend/access/nbtree/nbtsearch.c
> @@ -443,6 +443,17 @@ _bt_compare(Relation rel,
> if (!P_ISLEAF(opaque) && offnum == P_FIRSTDATAKEY(opaque))
> return 1;
>
> +   /*
> +* Check tuple has correct number of attributes.
> +*/
> +   if (!_bt_check_natts(rel, page, offnum))
> +   {
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INTERNAL_ERROR),
> +errmsg("tuple has wrong number of attributes in index 
> \"%s\"",
> +   RelationGetRelationName(rel))));
> +   }
> +
> itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));

It seems like it might be a good idea to make this accept an
IndexTuple, though, to possibly save some work. Also, perhaps this
should be an unlikely() condition, if only because it makes the intent
clearer (might actually matter in a tight loop like this too, though).
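
Roughly something like this, perhaps (untested, and the IndexTuple-taking
signature for _bt_check_natts is only hypothetical):

	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));

	/*
	 * Sketch only: reuse the already-fetched tuple and mark the corruption
	 * path as unlikely(), so the fast path stays obvious.
	 */
	if (unlikely(!_bt_check_natts(rel, itup, page, offnum)))
		ereport(ERROR,
				(errcode(ERRCODE_INTERNAL_ERROR),
				 errmsg("tuple has wrong number of attributes in index \"%s\"",
						RelationGetRelationName(rel))));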

Do you store an attribute number in the "minus infinity" item (the
leftmost one of internal pages)? I guess that that should be zero,
because it's totally truncated.

> OK, thank for the explanation.  I agree that check of offset is redundant
> here.

Cool.

>> The fact is that that information can go out of date almost
>> immediately, whereas high keys usually last forever. The only reason
>> that there is a heap TID in the high key is because we'd have to add
>> special code to remove it; not because it has any real value. I find
>> it very hard to imagine it being used in a forensic situation. If you
>> actually wanted to do this, the key itself is probably enough -- you
>> probably wouldn't need the TID.
>
>
> I don't know,  When I wrote my own implementation of B-tree and debug
> it, I found saving hikeys "as is" to 

Re: Diagonal storage model

2018-04-01 Thread Marko Tiikkaja
On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> I want to announce new model, "diagonal storage" which combines benefits
> of both approaches.
> The idea is very simple: we first store column 1 of first record, then
> column 2 of second record, ... and so on until we reach the last column.
> After it we store second column of first record, third column of the
> second record,...
>

I'm a little worried about the fact that even with this model we're still
limited to only two dimensions.  That's bound to cause problems sooner or
later.


.m


Re: [HACKERS] path toward faster partition pruning

2018-04-01 Thread Amit Langote
On 2018/03/30 22:41, David Rowley wrote:
> On 31 March 2018 at 02:00, David Rowley  wrote:
>> On 31 March 2018 at 01:18, David Rowley  wrote:
>>> I've noticed that there are no outfuncs or readfuncs for all the new
>>> Step types you've added.
>>>
>>> Also, the copy func does not properly copy the step_id in the base
>>> node type. This will remain at 0 after a copyObject()
>>
>> Attaching it as it may save you some time from doing it yourself.
>> Please check it though.
> 
> The attached might be slightly easier to apply. The previous version
> was based on top of some other changes I'd been making.

Thanks David.  I have merged this.

Regards,
Amit




Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

2018-04-01 Thread Jeevan Ladhe
Hi,

I noticed that there were no tests covering this case causing 4dba331cb3
> to not notice this failure in the first place.  I updated your patch to
> add a few tests.  Also, I revised the comment changed by your patch a bit.
>

1. A minor typo:

+-- check that violating rows are correctly reported when attching as the
s/attching/attaching


2. I think following part of the test is already covered:

+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for
values in (2);
+ERROR:  updated partition constraint for default partition
"defpart_attach_test_d" would be violated by some row
+drop table defpart_attach_test;

IIUC, the test in create_table covers the same scenario as of above:

-- check default partition overlap
INSERT INTO list_parted2 VALUES('X');
CREATE TABLE fail_part PARTITION OF list_parted2 FOR VALUES IN ('W', 'X',
'Y');
ERROR:  updated partition constraint for default partition
"list_parted2_def" would be violated by some row

Regards,
Jeevan Ladhe


Re: Add default role 'pg_access_server_files'

2018-04-01 Thread Michael Paquier
On Sun, Apr 01, 2018 at 09:39:02AM -0400, Stephen Frost wrote:
> Thanks for checking.  Attached is an updated version which also includes
> the changes for adminpack, done in a similar manner to how pgstattuple
> was updated, as discussed.  Regression tests updated and extended a bit,
> doc updates also included.
> 
> If you get a chance to take a look, that'd be great.  I'll do my own
> review of it again also after stepping away for a day or so.

I have spotted some issues mainly in patch 3.

I am not sure what has happened to your editor, but git diff --check is
throwing a dozen of warnings coming from adminpack.c.

c0cbe00 has stolen the OIDs your patch is using for the new roles, so
patch 2 needs a refresh.

@@ -68,6 +77,15 @@ convert_and_check_filename(text *arg, bool logAllowed)
[...]
+   /*
+* Members of the 'pg_read_server_files' role are allowed to access any
+* files on the server as the PG user, so no need to do any further checks
+* here.
+*/
+   if (is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+   return filename;
So...  If a user has loaded adminpack v1.0 with Postgres v11, then
convert_and_check_filename would actually be able to read paths out of
the data folder for a user within pg_read_server_files, while with
Postgres v10 only paths within the data folder were allowed.  And
that's actually fine because a superuser check happens before entering
this code path.

 pg_file_rename(PG_FUNCTION_ARGS)
+{
+   text   *file1;
+   text   *file2;
+   text   *file3;
+   boolresult;
+
+   if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
+   PG_RETURN_NULL();
+
+   file1 = PG_GETARG_TEXT_PP(0);
+   file2 = PG_GETARG_TEXT_PP(1);
+
+   if (PG_ARGISNULL(2))
+   file3 = NULL;
+   else
+   file3 = PG_GETARG_TEXT_PP(2);
+
+   requireSuperuser();
Here requireSuperuser() should be called before looking at the
argument values.
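
For instance (sketch only, reusing the pg_file_rename_internal helper from
the patch):

Datum
pg_file_rename(PG_FUNCTION_ARGS)
{
	text	   *file1;
	text	   *file2;
	text	   *file3;
	bool		result;

	/* privilege check first, before looking at any arguments */
	requireSuperuser();

	if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
		PG_RETURN_NULL();

	file1 = PG_GETARG_TEXT_PP(0);
	file2 = PG_GETARG_TEXT_PP(1);
	file3 = PG_ARGISNULL(2) ? NULL : PG_GETARG_TEXT_PP(2);

	result = pg_file_rename_internal(file1, file2, file3);
	PG_RETURN_BOOL(result);
}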

No refactoring for pg_file_unlink and its v1.1?

The argument checks are exactly the same for pg_file_rename and
pg_file_rename_v1_1.  What about just passing fcinfo around to simplify
the patch?

+CREATE OR REPLACE FUNCTION pg_catalog.pg_file_rename(text, text)
+RETURNS bool
+AS 'SELECT pg_catalog.pg_file_rename($1, $2, NULL::pg_catalog.text);'
+LANGUAGE SQL VOLATILE STRICT;
You forgot a REVOKE clause for pg_file_rename(text, text).

In adminpack.c, convert_and_check_filename is always called with false
as the second argument.  Why not drop it and use the version in
genfile.c instead?  As far as I can see, both functions are the same.

pg_read_file and pg_read_file_v2 could be refactored as well with an
internal routine.  Having to support v1 and v2 functions in the backend
code is not elegant.  Would it actually work to keep the v1 function in
adminpack.c and the v2 function in genfile.c even if adminpack 1.0 is
loaded?  This way you keep in core only one function.  What matters is
that the function name matches, right?

+int64 pg_file_write_internal(text *file, text *data, bool replace);
+bool pg_file_rename_internal(text *file1, text *file2, text *file3);
+Datum pg_logdir_ls_internal(FunctionCallInfo fcinfo)
Those three functions should be static.
--
Michael




Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

2018-04-01 Thread Amit Langote
Thanks Jeevan for reviewing.

On 2018/04/02 13:10, Jeevan Ladhe wrote:
> Hi,
> 
> I noticed that there were no tests covering this case causing 4dba331cb3
>> to not notice this failure in the first place.  I updated your patch to
>> add a few tests.  Also, I revised the comment changed by your patch a bit.
>>
> 
> 1. A minor typo:
> 
> +-- check that violating rows are correctly reported when attching as the
> s/attching/attaching

Oops, fixed.

> 2. I think following part of the test is already covered:
> 
> +-- trying to add a partition for 2 should fail because the default
> +-- partition contains a row that would violate its new constraint which
> +-- prevents rows containing 2
> +create table defpart_attach_test2 partition of defpart_attach_test for
> values in (2);
> +ERROR:  updated partition constraint for default partition
> "defpart_attach_test_d" would be violated by some row
> +drop table defpart_attach_test;
> 
> IIUC, the test in create_table covers the same scenario as of above:
> 
> -- check default partition overlap
> INSERT INTO list_parted2 VALUES('X');
> CREATE TABLE fail_part PARTITION OF list_parted2 FOR VALUES IN ('W', 'X',
> 'Y');
> ERROR:  updated partition constraint for default partition
> "list_parted2_def" would be violated by some row

Sorry, didn't realize that it was already covered in create_table.sql.
Removed this one.

Attached updated patch.  Adding this to the v11 open items list.

Thanks,
Amit
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c8da82217d..9d474ad5e2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14114,11 +14114,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
}
 
/*
-* Check whether default partition has a row that would fit the 
partition
-* being attached.
+* Check if the default partition contains a row that would belong in 
the
+* partition being attached.
 */
-   defaultPartOid =
-   get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
if (OidIsValid(defaultPartOid))
{
Relationdefaultrel;
diff --git a/src/test/regress/expected/alter_table.out 
b/src/test/regress/expected/alter_table.out
index a80d16a394..4712fab540 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3891,3 +3891,19 @@ ALTER TABLE attmp ALTER COLUMN i RESET 
(n_distinct_inherited);
 ANALYZE attmp;
 DROP TABLE attmp;
 DROP USER regress_alter_table_user1;
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values 
in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+ERROR:  partition constraint is violated by some row
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+INFO:  partition constraint for table "defpart_attach_test_d" is implied by 
existing constraints
+drop table defpart_attach_test;
diff --git a/src/test/regress/sql/alter_table.sql 
b/src/test/regress/sql/alter_table.sql
index 8198d1e930..c557b050af 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2565,3 +2565,21 @@ ANALYZE attmp;
 DROP TABLE attmp;
 
 DROP USER regress_alter_table_user1;
+
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values 
in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+
+drop table defpart_attach_test;


Re: PATCH: Configurable file mode mask

2018-04-01 Thread Michael Paquier
On Fri, Mar 30, 2018 at 01:27:11PM -0400, David Steele wrote:
> I have replaced data_directory_group_access with data_directory_mode.

That looks way better.  Thanks for considering it.

> I decided this made sense to do.  It was only a few lines in initdb.c
> using a very well established pattern.  It would be surprising if log
> files did not follow the mode of the rest of PGDATA after initdb -g,
> even if it is standard practice to relocate them.

Okay for me.

@@ -285,7 +286,7 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size 
request_size,
 * returning.
 */
flags = O_RDWR | (op == DSM_OP_CREATE ? O_CREAT | O_EXCL : 0);
-   if ((fd = shm_open(name, flags, 0600)) == -1)
+   if ((fd = shm_open(name, flags, PG_FILE_MODE_DEFAULT)) == -1)

Hm.  Sorry for the extra noise.  This one is actually incorrect as
shm_open will use the path specified by glibc, which is out of
pg_dynshmem so it does not matter for base backups, and we can keep
0600.  pg_dynshmem is used for mmap; still, this would map with the umask
set up by the postmaster as OpenTransientFile & friends are used.  sysv
uses IPCProtection but there is no need to care about it as well.  No
need for a comment perhaps..

pg_basebackup.c creates recovery.conf with 0600 all the time ;)

Except for those two nits, I am fine to pass down to a committer patch
number 1.  This has value in my opinion per the refactoring it does and
the umask handling of pg_rewind and pg_resetwal added.

Now for patch 2...

+  
+If the data directory allows group read access then certificate files may
+need to be located outside of the data directory in order to conform to the
+security requirements outlined above.  Generally, group access is enabled
+to allow an unprivileged user to backup the database, and in that case the
+backup software will not be able to read the certificate files and will
+likely error.
+  



Re: [PATCH] Logical decoding of TRUNCATE

2018-04-01 Thread Simon Riggs
On 1 April 2018 at 21:01, Andres Freund  wrote:

>> ***
>> *** 111,116  CREATE PUBLICATION > class="parameter">name
>> --- 111,121 
>> and so the default value for this option is
>> 'insert, update, delete'.
>>
>> +  
>> +TRUNCATE is treated as a form of
>> +DELETE for the purpose of deciding whether
>> +to publish, or not.
>> +  
>>   
>>  
>> 
>
> Why is this a good idea?

TRUNCATE removes rows, just as DELETE does, so anybody that wants to
publish the removal of rows will be interested in this.

This avoids unnecessary overcomplication of the existing interface.

>> +  * Write a WAL record to allow this set of actions to be logically 
>> decoded.
>> +  * We could optimize this away when !RelationIsLogicallyLogged(rel)
>> +  * but that doesn't save much space or time.
>
> What you're saying isn't that you're not logging anything, but that
> you're allocating the header regardless? Because this certainly sounds
> like you unconditionally log a WAL record.

It says that, yes, my idea - as explained.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: json(b)_to_tsvector with numeric values

2018-04-01 Thread Arthur Zakirov
Hello Dmitry,

Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Any opinions about this suggestion? Can it be considered as a bug fix and
> included into this release?
>

I think there is no chance to include it in v11. You can add the patch to
the 2018-09 commitfest.


-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


check_ssl_key_file_permissions should be in be-secure-common.c

2018-04-01 Thread Michael Paquier
Peter, Daniel,

The recent commit 8a3d9425 which has introduced SSL passphrase support
has also added be-secure-common.c, which works similarly to
fe-secure-common.c but for the backend.

I was just reading this code area, when I noticed that
check_ssl_key_file_permissions is called by be-secure-openssl.c but the
routine is defined in be-secure.c, causing some back-and-forth between
the two files.

It seems to me that this routine should logically be put into
be-secure-common.c so that future SSL implementations can use it.  This
makes the code more consistent with the frontend refactoring that
happened in f75a959.  I would not have bothered about this refactoring
if be-secure-openssl.c did not exist yet, but as it does I think that we
should bite the bullet, and do that for v11 so that a good base is in
place for the future.

A patch is attached.

Thanks,
--
Michael
From 907810a8b35362503f99876ad86599d66652582c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 2 Apr 2018 15:36:46 +0900
Subject: [PATCH] Make be-secure-common.c more consistent for future SSL
 implementations

Recent commit 8a3d9425 which introduced an implementation for SSL
passphrases has introduced at the same time be-secure-common.c which is
aimed at including backend-side APIs which can be used across any
OpenSSL implementations.  The logic behind this file is similar to
fe-secure-openssl.c for the frontend-side APIs.

However, this forgot to include check_ssl_key_file_permissions
in the move, which causes a double dependency between be-secure.c and
be-secure-openssl.c.

Refactor the code in a more logical way.  This brings to light an
API which is usable by future SSL implementations for permissions on SSL
key files.
---
 src/backend/libpq/be-secure-common.c | 74 
 src/backend/libpq/be-secure.c| 69 -
 src/include/libpq/libpq.h|  3 +-
 3 files changed, 76 insertions(+), 70 deletions(-)

diff --git a/src/backend/libpq/be-secure-common.c b/src/backend/libpq/be-secure-common.c
index d1740967f1..46f0b5f4a3 100644
--- a/src/backend/libpq/be-secure-common.c
+++ b/src/backend/libpq/be-secure-common.c
@@ -19,6 +19,9 @@
 
 #include "postgres.h"
 
+#include 
+#include 
+
 #include "libpq/libpq.h"
 #include "storage/fd.h"
 
@@ -118,3 +121,74 @@ error:
 	pfree(command.data);
 	return len;
 }
+
+
+/*
+ * Check permissions for SSL key files.
+ */
+bool
+check_ssl_key_file_permissions(const char *ssl_key_file, bool isServerStart)
+{
+	int			loglevel = isServerStart ? FATAL : LOG;
+	struct stat	buf;
+
+	if (stat(ssl_key_file, &buf) != 0)
+	{
+		ereport(loglevel,
+(errcode_for_file_access(),
+ errmsg("could not access private key file \"%s\": %m",
+		ssl_key_file)));
+		return false;
+	}
+
+	if (!S_ISREG(buf.st_mode))
+	{
+		ereport(loglevel,
+(errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("private key file \"%s\" is not a regular file",
+		ssl_key_file)));
+		return false;
+	}
+
+	/*
+	 * Refuse to load key files owned by users other than us or root.
+	 *
+	 * XXX surely we can check this on Windows somehow, too.
+	 */
+#if !defined(WIN32) && !defined(__CYGWIN__)
+	if (buf.st_uid != geteuid() && buf.st_uid != 0)
+	{
+		ereport(loglevel,
+(errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("private key file \"%s\" must be owned by the database user or root",
+		ssl_key_file)));
+		return false;
+	}
+#endif
+
+	/*
+	 * Require no public access to key file. If the file is owned by us,
+	 * require mode 0600 or less. If owned by root, require 0640 or less to
+	 * allow read access through our gid, or a supplementary gid that allows
+	 * to read system-wide certificates.
+	 *
+	 * XXX temporarily suppress check when on Windows, because there may not
+	 * be proper support for Unix-y file permissions.  Need to think of a
+	 * reasonable check to apply on Windows.  (See also the data directory
+	 * permission check in postmaster.c)
+	 */
+#if !defined(WIN32) && !defined(__CYGWIN__)
+	if ((buf.st_uid == geteuid() && buf.st_mode & (S_IRWXG | S_IRWXO)) ||
+		(buf.st_uid == 0 && buf.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)))
+	{
+		ereport(loglevel,
+(errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("private key file \"%s\" has group or world access",
+		ssl_key_file),
+ errdetail("File must have permissions u=rw (0600) or less if owned by the database user, or permissions u=rw,g=r (0640) or less if owned by root.")));
+		return false;
+	}
+#endif
+
+	return true;
+}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index fb1f6b5bbe..edfe2c0751 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -18,12 +18,10 @@
 
 #include "postgres.h"
 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #ifdef HAVE_NETINET_TCP_H
@@ -320,70 +318,3 @@ secure_raw_write(Port *port, const void *ptr, size_t len)
 
 	return n;
 }
-
-bool
-check_ss