Re: A small note on the portability of cmake

2019-01-20 Thread Andres Freund
Hi,

On 2019-01-19 23:39:37 -0800, Jesse Zhang wrote:
> On Sat, Jan 19, 2019 at 8:50 AM Tom Lane  wrote:
> >
> > Failed miserably.  It turns out cmake has a hard dependency on libuv
> > which (a) has a hard dependency on atomic ops and (b) according to its
> > own docs, doesn't really care about any platforms other than
> > Linux/macOS/Windows and maybe FreeBSD.>
> 
> I beg to disagree with point b above: the libuv project continually
> receives (and accepts) patches fixing bugs specifically for OpenBSD.
> Atomic ops (compare-and-exchange) might be a harder dependency to shed
> for libuv. Does the fallback onto compiler intrinsics
> (__sync_val_compare_and_swap, or on GCC 4.7+,
> __atomic_compare_exchange_n) not work here?

HPPA doesn't have hardware instructions for atomic ops other than
test-and-set IIRC.

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2019-01-20 Thread Surafel Temesgen
On Sun, Jan 20, 2019 at 2:24 AM Tomas Vondra 
wrote:

>
> Pushed, thanks for the patch.
>
> cheers
>
>
>
Thank you

regards
Surafel


Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Fabien COELHO


Hello Tom,


Maybe on OpenBSD pg should switch srandom to srandom_deterministic?


Dunno.  I'm fairly annoyed by their idea that they're smarter than POSIX.


Hmmm. I'm afraid that is not that hard.


However, for most of our uses of srandom, this behavior isn't awful;
it's only pgbench that has an expectation that the platform random()
can be made to behave deterministically.  And TBH I think that's just
an expectation that's going to bite us.

I'd suggest that maybe we should get rid of the use of both random()
and srandom() in pgbench, and go over to letting set_random_seed()
fill the pg_erand48 state directly.  In the integer-seed case you
could use something equivalent to pg_srand48.  (In the other cases
probably you could do better, certainly the strong-random case could
just fill all 6 bytes directly.)  That would get us to a place where
the behavior of --random-seed=N is not only deterministic but
platform-independent, which seems like an improvement.
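
A minimal sketch of that suggestion, assuming a pg_srand48-style xseed
layout (the helper name is made up, not taken from any patch):

    #include <stdint.h>

    /* Fill a pg_erand48-style 48-bit state directly from a 64-bit seed,
     * so that --random-seed=N behaves the same on every platform. */
    typedef struct RandomState
    {
        unsigned short xseed[3];
    } RandomState;

    static void
    seed_random_state(RandomState *rs, uint64_t seed)
    {
        rs->xseed[0] = (unsigned short) (seed & 0xFFFF);
        rs->xseed[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
        rs->xseed[2] = (unsigned short) ((seed >> 32) & 0xFFFF);
    }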


That's a point. Although I'm not fond of rand48, indeed having something 
platform independent for testing makes definite sense.


I'll look into it.


Here is a POC which defines an internal interface for a PRNG, and uses it 
within pgbench, with several possible implementations which default to 
rand48.


I must admit that I have a grudge against standard rand48:

 - it is a known poor PRNG which was designed at a time when LCGs were
   basically the only low cost PRNGs available. Newer designs were very
   recent when the standard was set.
 - it is an LCG, i.e. its low bits cycle quickly, so they should not be used.
 - thus the 48 bit state size is really only adequate for generating 32 bit
   ints and floats.
 - however it is used to generate more bits...
 - the double function uses all 48 bits, whereas 52 need to be filled...
 - and it is used to generate integers, which means that for large ranges
   some values are inaccessible.
 - a 3 * 16 bit integer state looks silly on 32/64 bit architectures.
 - ...

Given that postgres needs doubles (52 bits mantissa) and possibly 64 bit 
integers, IMO the internal state should be 64 bits as a bare minimum, 
which anyway is also the minimal size on 64 bit architectures, which is 
what is encountered in practice.
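
To illustrate the low-bits complaint concretely, here is a toy program (not
pgbench code): for an LCG with a power-of-two modulus and odd multiplier and
increment, the lowest state bit simply alternates with period 2.

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        /* drand48's LCG: x = (a*x + c) mod 2^48 */
        const uint64_t a = 0x5DEECE66DULL;
        const uint64_t c = 0xBULL;
        uint64_t x = 42;

        for (int i = 0; i < 16; i++)
        {
            x = (a * x + c) & ((1ULL << 48) - 1);
            putchar('0' + (int) (x & 1));   /* lowest bit of each state */
        }
        putchar('\n');                      /* prints 1010101010101010 */
        return 0;
    }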


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b5c75ce1c6..06f1083d5a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -185,7 +185,7 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
-/* random seed used when calling srandom() */
+/* random seed used to initialize the PRNG */
 int64		random_seed = -1;
 
 /*
@@ -279,14 +279,6 @@ typedef struct StatsData
 	SimpleStats lag;
 } StatsData;
 
-/*
- * Struct to keep random state.
- */
-typedef struct RandomState
-{
-	unsigned short xseed[3];
-} RandomState;
-
 /*
  * Connection state machine states.
  */
@@ -383,7 +375,7 @@ typedef struct
 	 * Separate randomness for each client. This is used for random functions
 	 * PGBENCH_RANDOM_* during the execution of the script.
 	 */
-	RandomState cs_func_rs;
+	pg_prng_state cs_func_rs;
 
 	int			use_file;		/* index in sql_script for this client */
 	int			command;		/* command number in script */
@@ -450,9 +442,9 @@ typedef struct
 	 * random state to make all of them independent of each other and therefore
 	 * deterministic at the thread level.
 	 */
-	RandomState ts_choose_rs;	/* random state for selecting a script */
-	RandomState ts_throttle_rs;	/* random state for transaction throttling */
-	RandomState ts_sample_rs;	/* random state for log sampling */
+	pg_prng_state ts_choose_rs;	/* random state for selecting a script */
+	pg_prng_state ts_throttle_rs;	/* random state for transaction throttling */
+	pg_prng_state ts_sample_rs;	/* random state for log sampling */
 
 	int64		throttle_trigger;	/* previous/next throttling (us) */
 	FILE	   *logfile;		/* where to log, or NULL */
@@ -832,30 +824,34 @@ strtodouble(const char *str, bool errorOK, double *dv)
 }
 
 /*
- * Initialize a random state struct.
+ * Initialize a pseudo-random state struct from the main PRNG.
+ * Doing this at the beginning of the process allows deterministic
+ * pgbench runs.
  */
 static void
-initRandomState(RandomState *random_state)
+pg_prng_state_init(pg_prng_state *rs)
 {
-	random_state->xseed[0] = random();
-	random_state->xseed[1] = random();
-	random_state->xseed[2] = random();
+	pg_prng_setstate_r(rs, pg_prng_u64());
 }
 
 /* random number generator: uniform distribution from min to max inclusive */
 static int64
-getrand(RandomState *random_state, int64 min, int64 max)
+getrand(pg_prng_state *random_state, int64 min, int64 max)
 {
 	/*
 	 * Odd coding is so that min and max have approximately the same chance of
 	 * being selected as do numbers between them.
 	 *
-	 * pg_erand48() is thread-safe and concurrent, which is why we use it
+	 * pg_prng_double_r() is thread-safe and concurrent, which is why we use it
 	 * rather than random()

Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-01-20 Thread Nikolay Shaplov
In a message of Thursday, January 17, 2019, 20:33:06 MSK, Alvaro 
Herrera wrote:

> You introduced new macros IsHeapRelation and IsViewRelation, but I don't
> want to introduce such API.  Such things have been heavily contested and
> I don't want one more thing to worry about for this patch, so please
> just put the relkind directly in the code.
Sorry.
I've been trying to avoid repeating

   (AssertMacro(relation->rd_rel->relkind == RELKIND_RELATION || \
                relation->rd_rel->relkind == RELKIND_MATVIEW), \

all the time.

Fixed.

Also I made the parallel_workers behavior in get_relation_info more relaxed, 
as we discussed in another thread.

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index cdf1f4a..b1fd67c 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -22,7 +22,7 @@
 #include "access/htup_details.h"
 #include "access/nbtree.h"
 #include "access/reloptions.h"
-#include "access/spgist.h"
+#include "access/spgist_private.h"
 #include "access/tuptoaster.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -46,9 +46,8 @@
  * upper and lower bounds (if applicable); for strings, consider a validation
  * routine.
  * (ii) add a record below (or use add__reloption).
- * (iii) add it to the appropriate options struct (perhaps StdRdOptions)
- * (iv) add it to the appropriate handling routine (perhaps
- * default_reloptions)
+ * (iii) add it to the appropriate options struct
+ * (iv) add it to the appropriate handling routine
  * (v) make sure the lock level is set correctly for that operation
  * (vi) don't forget to document the option
  *
@@ -1010,7 +1009,7 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
 		case RELKIND_PARTITIONED_TABLE:
-			options = heap_reloptions(classForm->relkind, datum, false);
+			options = relation_reloptions(classForm->relkind, datum, false);
 			break;
 		case RELKIND_VIEW:
 			options = view_reloptions(datum, false);
@@ -1343,63 +1342,69 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions.
+ * Option parsing definition for autovacuum. Used in toast and heap options.
+ */
+
+#define AUTOVACUUM_RELOPTIONS(OFFSET)\
+		{"autovacuum_enabled", RELOPT_TYPE_BOOL, \
+		OFFSET + offsetof(AutoVacOpts, enabled)},\
+		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_threshold)},   \
+		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, analyze_threshold)},  \
+		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_delay)},  \
+		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_limit)},  \
+		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_min_age)}, \
+		{"autovacuum_freeze_max_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_max_age)}, \
+		{"autovacuum_freeze_table_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, freeze_table_age)},   \
+		{"autovacuum_multixact_freeze_min_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_min_age)},   \
+		{"autovacuum_multixact_freeze_max_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_max_age)},   \
+		{"autovacuum_multixact_freeze_table_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_table_age)}, \
+		{"log_autovacuum_min_duration", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, log_min_duration)},   \
+		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_scale_factor)},\
+		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,\
+		OFFSET + offsetof(AutoVacOpts, analyze_scale_factor)}
+
+/*
+ * Option parser for heap
  */
 bytea *
-default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
+heap_reloptions(Datum reloptions, bool validate)
 {
 	relopt_value *options;
-	StdRdOptions *rdopts;
+	HeapRelOptions *rdopts;
 	int			numoptions;
 	static const relopt_parse_elt tab[] = {
-		{"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
-		{"autovacuum_enabled", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, enabled)},
-		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_threshold)},
-		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_threshold)},
-		{"autovacuum_

Re: Alternative to \copy in psql modelled after \g

2019-01-20 Thread Fabien COELHO



Bonjour Daniel,


:ROW_COUNT is incorrect after COPY TO STDOUT but this is a PG11-only
bug, :ROW_COUNT being a recent addition, so maybe we should deal with
this separately, since the current patch is supposed to address all versions?


ISTM that the patch is considered a bug fix, so it should be applied to 
pg11 and fix it?



I understand from the code that the COPY is really executed, so the ERROR
and so ROW_COUNT about the SQL should reflect that. Basically the change
makes the client believe that there is an SQL error whereas the error is
on the client.


Right, but whether COPY fails because psql can't write the output,
possibly half-way because of a disk full condition, or because the
query was cancelled or the server went down, are these distinctions
meaningful for a script?


It could if the SQL command has side effects, but probably this does not 
apply to COPY TO, which cannot have any.



If we say yes, how can the user know that the data fetched is
empty or incomplete? We don't have a CLIENT_SIDE_ERROR variable.


Yep. Maybe we should.

The documentation states that ERROR is about SQL, not psql internal stuff:

 ERROR true if the last SQL query failed, false if it succeeded.
   See also SQLSTATE.

And this is somehow the behavior on all other SQL commands, with or 
without your patch:


  sql> SELECT 1 \g /BAD
  /BAD: Permission denied

  sql> \echo :ERROR
  false

Basically, with your patch, the behavior becomes inconsistent between COPY 
and other SQL commands, so there is something to fix.


Given that, I see two options:

(1) document ERROR as being muddy, i.e. there has been some error which 
may be SQL or possibly client-side. Although SQLSTATE would still allow one 
to know whether an SQL error occurred, there are still no client-side 
expressions, and even if I had moved pgbench expressions to psql they 
would still need to be extended to handle strings. This suggests that maybe 
there could be an SQL_ERROR boolean which does store whether SQL succeeded 
or not, and possibly a CLIENT_ERROR on the side, with ERROR = SQL_ERROR OR 
CLIENT_ERROR.


(2) keep ERROR as is, i.e. about SQL, and add some CLIENT_ERROR, but then 
I see another issue: if it is *only* about the client error, then ERROR alone 
stays true only for SQL errors, thus checking whether there was any error 
becomes ERROR OR CLIENT_ERROR, which is muddy as well, especially as there 
are no client-side expressions in psql.



Also, the fact that psql retrieves the results when it doesn't have
a valid destination for them is an implementation detail.


Yep, but if there are side effects, a script could want to know if they 
occurred?



I think we could also cancel the query in a way that would be
technically an SQL error, as Ctrl+C would do.


Hmmm.


Hiding these details under a generic ERROR=true seems reasonable,
especially since we expose SQLSTATE for fine-grained error checking,
should that be needed.


Yep, but no expression.


ERROR=true and SQLSTATE empty is already mentioned as plausible
in SetResultVariables():


Indeed. This suggests that ERROR is already muddy, contrary to the 
documentation.



SetVariable(pset.vars, "ERROR", "true");
if (code == NULL)
code = "";
SetVariable(pset.vars, "SQLSTATE", code);


Overall, ISTM that it should point to solution (1):

 - document that ERROR is muddy, "some SQL or client error occurred"
 - add SQL_ERROR, which is always about SQL errors and nothing else
 - add CLIENT_ERROR, same
 - make the behavior consistent for all SQL commands, COPY & others
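
For instance, a script under that scheme might do the following (SQL_ERROR
and CLIENT_ERROR being the proposed, not yet existing, variables):

  sql> SELECT 1 \g /BAD
  /BAD: Permission denied

  sql> \if :CLIENT_ERROR
  sql> \echo 'client-side failure; SQL error:' :SQL_ERROR
  sql> \endif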


The latter part of the comment is already stated in the documentation; I'm
not sure it is worth repeating it here. I'd suggest a simpler /* handle
"COPY TO STDOUT \g ..." */


The bug we're fixing here is due to missing the point the comment is
making, so being thick seems fair.


I would not be, because ISTM that mentioning that COPY TO STDOUT is 
specially handled already points clearly to the previous issue. No big 
deal.


--
Fabien.



Re: Delay locking partitions during INSERT and UPDATE

2019-01-20 Thread Tomas Vondra
On 1/20/19 5:45 AM, John Naylor wrote:
> On Sat, Jan 19, 2019 at 10:59 AM Tomas Vondra
>  wrote:
>>
>> On 1/19/19 12:05 AM, John Naylor wrote:
>>> I used a similar test, but with unlogged tables, and "-c 2", and got:
>>>
>>> normal table: 32000tps
>>> 10k partitions / master: 82tps
>>> 10k partitions / patch: 7000tps
>>>
>>> So far I haven't gotten quite as good performance as you and Tomas,
>>> although it's still a ~85x improvement.
>>
>> What hardware are you running the tests on? I wouldn't be surprised if
>> you were hitting some CPU or I/O bottleneck, which we're not hitting.
> 
> 9 year-old laptop, Core i3. Side note, I miswrote my test parameters
> above -- should be "-c4 -j2".
> 

Hmmm, I wouldn't be surprised if it was getting throttled for some
reason (e.g. temperature).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: possible deadlock: different lock ordering for heap pages

2019-01-20 Thread Amit Kapila
On Sat, Jan 19, 2019 at 4:02 AM Nishant, Fnu  wrote:
>
> Hello,
>
> While locking heap pages (function RelationGetBufferForTuple() in file hio.c) 
> we acquire locks in increasing block number order to avoid deadlock. In 
> certain cases, we do have to drop and reacquire lock on heap pages (when we 
> have to pin visibility map pages) to avoid doing IO while holding exclusive 
> lock. The problem is we now acquire locks in bufferId order, which looks like 
> a slip and the intention was to grab it in block number order. Since it is a 
> trivial change, I am attaching a patch to correct it.
>

On a quick look, your analysis seems correct, but I am a bit surprised
to see how this survived so long without being caught.  I think you
need to change the below code as well if your analysis and fix are correct.
GetVisibilityMapPins()
{
..
Assert(buffer2 == InvalidBuffer || buffer1 <= buffer2);
..
}
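
Presumably the assertion would then compare block numbers instead, along
these lines (just a sketch, the actual fix may differ):

Assert(buffer2 == InvalidBuffer ||
   BufferGetBlockNumber(buffer1) <= BufferGetBlockNumber(buffer2));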

AFAICS, this code has been added by the commit below, so adding the author to the loop.

commit e16954f3d27fa8e16c379ff6623ae18d6250a39c
Author: Robert Haas 
Date:   Mon Jun 27 13:55:55 2011 -0400

Try again to make the visibility map crash safe.

My previous attempt was quite a bit less than half-baked with respect to
heap_update().

Robert, can you please take a look and see if we are missing anything here,
because to me the report and fix look genuine.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Alternative to \copy in psql modelled after \g

2019-01-20 Thread Fabien COELHO




I understand from the code that the COPY is really executed, so the ERROR
and so ROW_COUNT about the SQL should reflect that. Basically the change
makes the client believe that there is an SQL error whereas the error is
on the client.


Right, but whether COPY fails because psql can't write the output,
possibly half-way because of a disk full condition, or because the
query was cancelled or the server went down, are these distinctions
meaningful for a script?


It could if the SQL command has side effects, but probably this does not 
apply to COPY TO, which cannot have any.


Yes it can:

COPY (
  UPDATE pgbench_branches
SET bbalance = bbalance + 1
WHERE bid <= 5
  RETURNING *) TO STDOUT \g /BAD

The SQL command is executed but the backslash command fails.

--
Fabien.



Re: A small note on the portability of cmake

2019-01-20 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-19 23:39:37 -0800, Jesse Zhang wrote:
>> Atomic ops (compare-and-exchange) might be a harder dependency to shed
>> for libuv. Does the fallback onto compiler intrinsics
>> (__sync_val_compare_and_swap, or on GCC 4.7+,
>> __atomic_compare_exchange_n) not work here?

Nope, the failure manifests as

/usr/local/lib/libuv.so.1.0: undefined reference to 
`__sync_val_compare_and_swap_4'

when some dependent package tries to use the library.  So the build
failed to notice that the compiler intrinsics don't exist.  (I was
using OpenBSD's packaging of libuv, which I guess doesn't bother
running any test cases during build.)

> HPPA doesn't have hardware instructions for atomic ops other than
> test-and-set IIRC.

Indeed, the main reason why I'm interested in keeping this old dinosaur
going at all is that it is so different from other platforms in terms
of what we can assume about spinlocks and atomic ops.  Keeps us honest.
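
For reference, on a test-and-set-only machine even a plain compare-and-swap
has to be emulated with a spinlock, roughly as below (a sketch of the general
technique, not PostgreSQL's actual fallback code):

    #include <stdbool.h>
    #include <stdint.h>

    static volatile int cas_lock = 0;   /* NB: on HPPA the "free" value is
                                         * nonzero and the lock word needs
                                         * 16-byte alignment, the kind of
                                         * quirk discussed above */

    static bool
    emulated_cas(uint32_t *ptr, uint32_t expected, uint32_t newval)
    {
        bool    ok;

        while (__sync_lock_test_and_set(&cas_lock, 1))
            ;                           /* spin on test-and-set */
        ok = (*ptr == expected);
        if (ok)
            *ptr = newval;
        __sync_lock_release(&cas_lock);
        return ok;
    }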

regards, tom lane



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Chapman Flack
Hi,

On 01/20/19 00:26, Pavel Stehule wrote:
>>> column expressions are evaluated once per row, but XPath  expression is
>>> compiled per row too, if I remember well.
>>
>> I looked for evidence of that in the code, but did not find it; the
>> compilation appears to happen in XmlTableSetColumnFilter, which is
>> called from tfuncInitialize.
> 
> it is called per input row.

I could be misunderstanding what you mean by 'input row' here.

If I write a simple xmltable query where the row_expression produces
six rows:

SELECT *
FROM
  xmltable('pg_am/row'
PASSING table_to_xml('pg_am', true, false, '')
COLUMNS amname text PATH 'amname');

six rows are produced, though a breakpoint set on XmlTableSetColumnFilter
fires only once, from tfuncInitialize at the start of xmltable's execution.


By contrast, in the regression test example with PATH ''||lower(_path)||'c',
four rows are produced and the breakpoint fires four times.

However ... that isn't because one call to xmltable is producing four rows
and recomputing the column_expression each time.

It's because that xmltable is the RHS of a LATERAL, and the LHS of the
LATERAL is producing four tuples with different values of columns the RHS
depends on, so the RHS (xmltable) is being called four different times,
producing one row each time, still with XmlTableSetColumnFilter being called
only during initialization.


> sure - using any expression in PATH clause should to demonstrate this
> functionality.

Well, there seem to be two distinct features touched on in the docs:

1. The column_expression is allowed to be an expression, not restricted
   to a string literal as it is in the standard (and Oracle).

2. Not only is it an expression, but it's an expression whose evaluation
   is deferred and can happen more than once in the same xmltable call.

The example in the regression tests certainly demonstrates (1). Without (1),
it would be a syntax error.

I think the same example would produce the same output even with feature (2)
absent. It's LATERAL doing the magic there. So I am doubtful that it
demonstrates (2).

I put some effort last night into trying to even construct any query that
would demonstrate (2), and I came up short, but that could be my lack of
imagination. (Somewhere in that effort I happened to notice that xmltable
doesn't seem to be parsed successfully inside a ROWS FROM (...) construct,
which might be another issue for another time)

So, if you have a way to build a query that demonstrates (2), my aha! moment
might then arrive.

Regards,
-Chap



could not access status of transaction

2019-01-20 Thread chenhj
In our PG 10.2 (CentOS 7.3) database, the following error is reported when 
querying a table. We have already restored the production data from backup, 
but I want to confirm what may be the reason and how to avoid it in the future.

lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt;
ERROR:  could not access status of transaction 3250922107
DETAIL:  Could not open file "pg_xact/0C1C": No such file or directory.

Here are some related information

The CLOG files in the pg_xact directory are as follows:

0C4A(Last update date: 2018/12/29)
...
0D09(Last update date: 2019/01/13)

The stack when the error occurs is as follows

(gdb) bt
#0  SlruReportIOError (ctl=ctl@entry=0xca98e0 , 
pageno=pageno@entry=99210, xid=xid@entry=3250922107) at slru.c:901
#1  0x004e53a8 in SimpleLruReadPage (ctl=ctl@entry=0xca98e0 
, pageno=pageno@entry=99210, write_ok=write_ok@entry=1 '\001', 
xid=xid@entry=3250922107)
at slru.c:445
#2  0x004e5460 in SimpleLruReadPage_ReadOnly 
(ctl=ctl@entry=0xca98e0 , pageno=pageno@entry=99210, 
xid=xid@entry=3250922107) at slru.c:492
#3  0x004dde4d in TransactionIdGetStatus 
(xid=xid@entry=3250922107, lsn=lsn@entry=0x7ffeb4472660) at clog.c:411
#4  0x004e6e28 in TransactionLogFetch 
(transactionId=3250922107) at transam.c:79
#5  0x004e6e7f in TransactionIdDidCommit 
(transactionId=) at transam.c:129
#6  0x0084ce6a in HeapTupleSatisfiesMVCC (htup=, 
snapshot=0xf7f0c8, buffer=837) at tqual.c:1124
#7  0x004b106e in heapgetpage (scan=scan@entry=0x10bb1b0, 
page=page@entry=1078209) at heapam.c:439
#8  0x004b23ab in heapgettup_pagemode (key=0x0, nkeys=0, 
dir=1078209, scan=0x10bb1b0) at heapam.c:1034
#9  heap_getnext (scan=scan@entry=0x10bb1b0, 
direction=direction@entry=ForwardScanDirection) at heapam.c:1801
#10 0x0060fa93 in SeqNext (node=node@entry=0x108c360) at 
nodeSeqscan.c:81
#11 0x005f51ca in ExecScanFetch (recheckMtd=0x60fa40 
, accessMtd=0x60fa60 , node=0x108c360) at execScan.c:97
#12 ExecScan (node=0x108c360, accessMtd=0x60fa60 , 
recheckMtd=0x60fa40 ) at execScan.c:147
#13 0x005fad89 in ExecProcNode (node=0x108c360) at 
../../../src/include/executor/executor.h:250
#14 fetch_input_tuple (aggstate=aggstate@entry=0x108bcb8) at 
nodeAgg.c:695
#15 0x005fcdbf in agg_retrieve_direct (aggstate=0x108bcb8) at 
nodeAgg.c:2448
#16 ExecAgg (pstate=0x108bcb8) at nodeAgg.c:2158
#17 0x005ef612 in ExecProcNode (node=0x108bcb8) at 
../../../src/include/executor/executor.h:250
#18 ExecutePlan (execute_once=, dest=0x10c40e8, 
direction=, numberTuples=0, sendTuples=1 '\001', 
operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x108bcb8, 
estate=0x108baa8) at execMain.c:1722
#19 standard_ExecutorRun (queryDesc=0x10c7168, direction=, count=0, execute_once=) at execMain.c:363
#20 0x7f766427915d in pgss_ExecutorRun (queryDesc=0x10c7168, 
direction=ForwardScanDirection, count=0, execute_once=) at 
pg_stat_statements.c:889
#21 0x0071a28b in PortalRunSelect 
(portal=portal@entry=0xfac708, forward=forward@entry=1 '\001', count=0, 
count@entry=9223372036854775807, dest=dest@entry=0x10c40e8)
at pquery.c:932
#22 0x0071b63f in PortalRun (portal=portal@entry=0xfac708, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x10c40e8, 
altdest=altdest@entry=0x10c40e8, 
completionTag=completionTag@entry=0x7ffeb4472c80 "") at pquery.c:773
#23 0x007175b3 in exec_simple_query (query_string=0x102a4e8 
"select count(*) from tdm_ttk_site_on_way_rt ;") at postgres.c:1099
#24 0x007188ac in PostgresMain (argc=, 
argv=argv@entry=0xfa2d68, dbname=0xfa2c50 "lma", username=) at 
postgres.c:4088
#25 0x0047ad3c in BackendRun (port=0xfa49b0) at 
postmaster.c:4405
#26 BackendStartup (port=0xfa49b0) at postmaster.c:4077
#27 ServerLoop () at postmaster.c:1755
#28 0x006afacf in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0xf5fa20) at postmaster.c:1363
#29 0x0047bb6f in main (argc=3, argv=0xf5fa20) at main.c:228
(gdb) f 6
#6  0x0084ce6a in HeapTupleSatisfiesMVCC (htup=, 
snapshot=0xf7f0c8, buffer=837) at tqual.c:1124
1124if 
(!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
(gdb) p *tuple
$3 = {t_choice = {t_heap = {t_xmin = 3238223060, t_xmax = 3250922107, 
t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = -1056744236, 
datum_typmod = -1044045189, 
  datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 16, bi_lo = 
29633}, ip_posid = 3}, t_infomask2 = 8211, t_infomask

Re: could not access status of transaction

2019-01-20 Thread Tomas Vondra
On 1/20/19 5:07 PM, chenhj wrote:
> In our PG 10.2 (CentOS 7.3) database, the following error is reported when 
> querying a table. We have already restored the production data from 
> backup, but I want to confirm what may be the reason and how to avoid it in 
> the future.
> 
>   lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt;
>   ERROR:  could not access status of transaction 3250922107
>   DETAIL:  Could not open file "pg_xact/0C1C": No such file or directory.
> 
> Here are some related information
> 
> The CLOG files in the pg_xact directory are as follows:
> 
>   0C4A(Last update date: 2018/12/29)
> ...
> 0D09(Last update date: 2019/01/13)
> 

Yes, that very much looks like a data corruption, probably due to
truncating the clog too early or something like that.

> ...
> 
> A similar problem was reported in 9.0, but no cause was mentioned
> there.
> 
> https://www.postgresql.org/message-id/flat/1300970362.2349.27.camel%40stevie
> 

The symptoms are the same, but that's far from sufficient to conclude
it's the same root cause.

> Currently I suspect that it may be the same problem as the bug below. is it 
> possible?
> 
> The bug will cause some sessions to cache the wrong relfrozenxid of the 
> table. The session that may call vac_truncate_clog() will clean up the clog 
> after the actual relfrozenxid due to reading the wrong relfrozenxid.
> 
> https://www.postgresql.org/message-id/flat/20180809161148.GB22623%40momjian.us#a7cc4d41464064b7752a5574eb74a06d
> 

Maybe. But it'll be hard to confirm it's what happened. It also shows
why it's important to keep up with minor updates (you're running 10.2,
which is almost 1 year old).
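
If premature clog truncation of that kind is suspected, a generic diagnostic
(not something from this thread) is to compare frozen-xid ages across
relations and see whether some relfrozenxid is implausibly far behind:

  SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
  WHERE relkind IN ('r', 't', 'm')
  ORDER BY age(relfrozenxid) DESC
  LIMIT 10;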

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

2019-01-20 Thread Dave Cramer
Dave Cramer


On Tue, 15 Jan 2019 at 07:53, Dave Cramer  wrote:

>
> Dave Cramer
>
>
> On Sun, 13 Jan 2019 at 23:19, Craig Ringer  wrote:
>
>> On Mon, 3 Dec 2018 at 19:38, Dave Cramer  wrote:
>>
>>> Dmitry,
>>>
>>> Please see attached rebased patches
>>>
>>
>> I'm fine with patch 0001, though I find this comment a bit hard to follow:
>>
>> + * The send_data callback must enqueue complete CopyData messages to
>> libpq
>> + * using pq_putmessage_noblock or similar, since the walsender loop may
>> send
>> + * CopyDone then exit and return to command mode in response to a client
>> + * CopyDone between calls to send_data.
>>
>>
> I think it needs splitting up into a couple of sentences.
>>
>> Fair point; remember it was originally written by a non-English speaker
>

Thoughts on below ?

+/*
+ * Main loop of walsender process that streams the WAL over Copy messages.
+ *
+ * The send_data callback must enqueue complete CopyData messages to the
+ * client using pq_putmessage_noblock or similar.  In order to preserve the
+ * protocol it is necessary to send all of the existing messages still in
+ * the buffer, as the WalSender loop may send CopyDone then exit and return
+ * to command mode in response to a client CopyDone between calls to
+ * send_data.
+ */


>
>> In patch 0002, stopping during a txn. It's important that once we skip
>> any action, we continue skipping. In patch 0002 I'd like it to be clearer
>> that we will *always* skip the rb->commit callback
>> if rb->continue_decoding_cb() returned false and aborted the while loop. I
>> suggest storing the result of (rb->continue_decoding_cb == NULL ||
>> rb->continue_decoding_cb())  in an assignment within the while loop, and
>> testing that result later.
>>
>> e.g.
>>
>> (continue_decoding = (rb->continue_decoding_cb == NULL ||
>> rb->continue_decoding_cb()))
>>
>> and later
>>
>> if (continue_decoding) {
>> rb->commit(rb, txn, commit_lsn);
>> }
>>
>> Will do
>

Hmmm... I don't actually see how this is any different from what we have in
the patch now. Where below would the test occur?


> I don't actually think it's necessary to re-test the continue_decoding
>> callback and skip commit here. If we've sent all the of the txn
>> except the commit, we might as well send the commit, it's tiny and might
>> save some hassle later.
>>
>>
>> I think a comment on the skipped commit would be good too:
>>
>> /*
>>  * If we skipped any changes due to a client CopyDone we must not send a
>> commit
>>  * otherwise the client would incorrectly think it received a complete
>> transaction.
>>  */
>>
>> I notice that the fast-path logic in WalSndWriteData is removed by this
>> patch. It isn't moved; there's no comparison of last_reply_timestamp
>> and wal_sender_timeout now. What's the rationale there? You want to ensure
>> that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
>> client socket then still take the fast-path if it's not readable?
>>
>
> This may have been a mistake as I am taking this over from a very old
> patch that I did not originally write. I'll have a look at this
>

OK, I'm trying to decipher the original intent of this patch as well, since I
didn't write it.

There are some hints here
https://www.postgresql.org/message-id/CAFgjRd1LgVbtH%3D9O9_xvKQjvUP7aRF-edxqwKfaNs9hMFW_4gw%40mail.gmail.com

As to why the fast path logic was removed. Does it make sense to you?

Dave



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Pavel Stehule
ne 20. 1. 2019 v 17:06 odesílatel Chapman Flack 
napsal:

> Hi,
>
> On 01/20/19 00:26, Pavel Stehule wrote:
> >>> column expressions are evaluated once per row, but XPath  expression is
> >>> compiled per row too, if I remember well.
> >>
> >> I looked for evidence of that in the code, but did not find it; the
> >> compilation appears to happen in XmlTableSetColumnFilter, which is
> >> called from tfuncInitialize.
> >
> > it is called per input row.
>
> I could be misunderstanding what you mean by 'input row' here.
>

An input row means a row of the processed relation. The xml value from this
row can be transformed to 0..M output rows.

The column filter expressions are evaluated once per input row; default
expressions are evaluated when necessary - possibly once for each
output row.


>
> If I write a simple xmltable query where the row_expression produces
> six rows:
>
> SELECT *
> FROM
>   xmltable('pg_am/row'
> PASSING table_to_xml('pg_am', true, false, '')
> COLUMNS amname text PATH 'amname');
>
> six rows are produced, though a breakpoint set on XmlTableSetColumnFilter
> fires only once, from tfuncInitialize at the start of xmltable's execution.
>
>
> By contrast, in the regression test example with PATH
> ''||lower(_path)||'c',
> four rows are produced and the breakpoint fires four times.
>

it is expected - the input relation has four rows, so the function was
initialized 4 times. In this case 1 call of xmltable produces 1 row.


>
> However ... that isn't because one call to xmltable is producing four rows
> and recomputing the column_expression each time.
>
> It's because that xmltable is the RHS of a LATERAL, and the LHS of the
> LATERAL is producing four tuples with different values of columns the RHS
> depends on, so the RHS (xmltable) is being called four different times,
> producing one row each time, still with XmlTableSetColumnFilter being
> called
> only during initialization.
>
>
> > sure - using any expression in PATH clause should to demonstrate this
> > functionality.
>
> Well, there seem to be two distinct features touched on in the docs:
>
> 1. The column_expression is allowed to be an expression, not restricted
>to a string literal as it is in the standard (and Oracle).
>
> 2. Not only is it an expression, but it's an expression whose evaluation
>is deferred and can happen more than once in the same xmltable call.
>

yes - any expressions used there are evaluated once per call. If the input
relation has N rows, then xmltable is initialized N times - in other words,
xmltable is evaluated independently for each processed XML document.



> The example in the regression tests certainly demonstrates (1). Without
> (1),
> it would be a syntax error.
>
> I think the same example would produce the same output even with feature
> (2)
> absent. It's LATERAL doing the magic there. So I am doubtful that it
> demonstrates (2).
>

LATERAL is necessary because XMLTABLE can be used only in the FROM clause,
and in this case XMLTABLE has mutable parameters.


> I put some effort last night into trying to even construct any query that
> would demonstrate (2), and I came up short, but that could be my lack of
> imagination. (Somewhere in that effort I happened to notice that xmltable
> doesn't seem to be parsed successfully inside a ROWS FROM (...) construct,
> which might be another issue for another time)
>
> So, if you have a way to build a query that demonstrates (2), my aha!
> moment
> might then arrive.
>
> Regards,
> -Chap
>


Re: Thread-unsafe coding in ecpg

2019-01-20 Thread Michael Meskes
> > The question is, what do we do on those platforms? Use setlocale()
> > or
> > fallback to (a) and document that ecpg has to run in a C locale?
> 
> No, we shouldn't use setlocale(), because it clearly is hazardous
> even on platforms where it doesn't fail outright.  I don't see
> anything so wrong with just documenting the hazard.  The situation

Actually I meant: should we use setlocale() and document that it must not be
used with threads, or document that it must not be used with locales?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: shared-memory based stats collector

2019-01-20 Thread Tomas Vondra


Hi,

The patch needs rebasing, as it got broken by 285d8e1205, and there's
some other minor bitrot.

On 11/27/18 4:40 PM, Tomas Vondra wrote:
> On 11/27/18 9:59 AM, Kyotaro HORIGUCHI wrote:
>>>
>>> ...
>>> For the main workload there's pretty much no difference, but for
>>> selects from the stats catalogs there's ~20% drop in throughput.
>>> In absolute numbers this means drop from ~670tps to ~550tps. I
>>> haven't investigated this, but I suppose this is due to dshash
>>> seqscan being more expensive than reading the data from file.
>>
>> Thanks for finding that. The three seqscan loops in 
>> pgstat_vacuum_stat cannot take such a long time, I think. I'll 
>> investigate it.
>>
> 
> OK. I'm not sure this is related to pgstat_vacuum_stat - the
> slowdown happens while querying the catalogs, so why would that
> trigger vacuum of the stats? I may be missing something, of course.
> 
> FWIW, the "query statistics" test simply does this:
> 
>   SELECT * FROM pg_stat_all_tables;
>   SELECT * FROM pg_stat_all_indexes;
>   SELECT * FROM pg_stat_user_indexes;
>   SELECT * FROM pg_stat_user_tables;
>   SELECT * FROM pg_stat_sys_tables;
>   SELECT * FROM pg_stat_sys_indexes;
> 
> and the slowdown happened even when it was running on its own (nothing
> else running on the instance). Which mostly rules out concurrency
> issues with the hash table locking etc.
> 

Did you have time to investigate the slowdown?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable Storage - Andres's take

2019-01-20 Thread Dmitry Dolgov
> On Fri, Jan 18, 2019 at 11:22 AM Amit Khandekar  
> wrote:
>
> --- a/src/test/regress/expected/copy2.out
> +++ b/src/test/regress/expected/copy2.out
> @@ -1,3 +1,4 @@
> +\set HIDE_TABLEAM on
>
> I thought we wanted to avoid having to add this setting in individual
> regression tests. Can't we do this in pg_regress as a common setting ?

Yeah, you're probably right. Actually, I couldn't find anything that looks like
"common settings", and so far I've placed it into psql_start_test as a psql
argument. But I'm not sure; maybe there is a better place.

> + /* Access method info */
> + if (pset.sversion >= 12 && verbose && tableinfo.relam != NULL &&
> +!(pset.hide_tableam && tableinfo.relam_is_default))
> + {
> + printfPQExpBuffer(&buf, _("Access method: %s"),
> fmtId(tableinfo.relam));
>
> So this will make psql hide the access method if it's same as the
> default. I understand that this was kind of concluded in the other
> thread "Displaying and dumping of table access methods". But IMHO, if
> the hide_tableam is false, we should *always* show the access method,
> regardless of the default value. I mean, we can make it simple : off
> means never show table-access, on means always show table-access,
> regardless of the default access method. And this also will work with
> regression tests. If some regression test wants specifically to output
> the access method, it can have a "\SET HIDE_TABLEAM off" command.
>
> If we hide the method if it's default, then for a regression test that
> wants to forcibly show the table access method of all tables, it won't
> show up for tables that have default access method.

I can't imagine what kind of test would need to forcibly show the table access
method of all the tables. Even if you need to verify tableam for something,
maybe it's even easier just to select it from pg_am?
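
For example, something along these lines works regardless of any psql
setting, assuming the new pg_class.relam column:

  SELECT c.relname, a.amname
  FROM pg_class c
  JOIN pg_am a ON a.oid = c.relam
  WHERE c.relname = 'mytable';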

> + if (pset.sversion >= 12 && verbose && tableinfo.relam != NULL &&
>
> If the server does not support relam, tableinfo.relam will be NULL
> anyways. So I think sversion check is not needed.
> 
>
> + printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
> fmtId is not required.
> ---
>
> +  printfPQExpBuffer(&buf, _("Access method: %s"), 
> fmtId(tableinfo.relam));
> +  printTableAddFooter(&cont, buf.data);
> +   }
> +
> +
>  }
>
> Last two blank lines are not needed.

Right, fixed.

> + boolhide_tableam;
>  } PsqlSettings;
>
> These variables, it seems, are supposed to be grouped together by type.

Well, this grouping looks strange to me. But since I don't have a strong
opinion, I moved the variable.

> I believe you are going to add a new regression testcase for the change ?

Yep.


psql_describe_am_v2.patch
Description: Binary data


Re: Thread-unsafe coding in ecpg

2019-01-20 Thread Tom Lane
Michael Meskes  writes:
>> No, we shouldn't use setlocale(), because it clearly is hazardous
>> even on platforms where it doesn't fail outright.  I don't see
>> anything so wrong with just documenting the hazard.  The situation

> Actually I meant using setlocale() and documenting that it must not be
> used with threads, or document it must not be used with locales?

I tend to think that has more downside than upside, in situations where
people don't read the manual closely and try to do it anyway.

First, there's the probable crash if setlocale() is thread-unsafe.
(Though the lack of previous reports suggests that on most platforms,
it isn't.)

Second, if the program is indeed trying to run with non-C LC_NUMERIC,
using setlocale() will have unsynchronized, hard-to-debug side effects
on other threads.  Not using it will have no downside at all if ecpg
isn't trying to read numeric data, while if it does do so, the failures
will be reproducible and easy to understand/debug.

Admittedly, removing the setlocale() call will be a net negative for
single-threaded applications, which are likely the majority.  But
I don't know any safe way to tell whether the app is multi threaded.

On the third hand, the lack of previous reports suggests that maybe
this whole thing is seldom a problem in practice.  Maybe we should
just use uselocale() where available and otherwise hope it's OK
to keep on doing what we were doing.
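
For reference, the uselocale() pattern under discussion looks roughly like
this (a sketch of the POSIX API, not the ecpg patch itself):

    #include <locale.h>
    #include <stdlib.h>

    static double
    parse_double_c_locale(const char *s)
    {
        locale_t    clocale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
        locale_t    old = uselocale(clocale);   /* this thread only */
        double      val = strtod(s, NULL);      /* sees C decimal point */

        uselocale(old);         /* restore the thread's previous locale */
        freelocale(clocale);
        return val;
    }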

regards, tom lane



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Chapman Flack
On 01/20/19 11:55, Pavel Stehule wrote:
> input row mean a row of processed relation. The xml value from this row can
> be transformed to 0..M output rows.
> 
> The column filter expressions are evaluated once per input rows, default
> expressions are evaluated when it is necessary - possibly once for any
> output row
> ...
> it is expected - the input relation has four lines - the function was 4x
> initialized. In this case 1 call of xmltable produces 1 row.

Good ... I think we're converging on a shared understanding.

I am just used to speaking of xmltable as a simple function that executes
one row_expression one time against one supplied input, and generates 0..M
output rows from it. If there are multiple rows in a relation that you
want to apply xmltable to, that's a simple matter of calling xmltable
multiple times (which is just what SQL is doing when the xmltable is on
the RHS of an explicit or implied LATERAL).

The upshot seems to be that there is nothing necessarily special about
how xmltable treats its column_expressions: it compiles them once upon
entry to the function, as one would naïvely expect. (Or, if there is
anything more special about how the column_expression is being handled,
it seems not to be necessary, as the naïve behavior would be adequate.)

Accordingly, I think the paragraph beginning "Unlike regular PostgreSQL
functions," is more likely to perplex readers than to enlighten them.
What it says about column_expression does not seem to lead to any useful
difference from the behavior if it were "just like regular PostgreSQL
functions".

The part about usefully using volatile functions in default_expression
remains important to mention.

The statement in an earlier paragraph that "It is possible for a
default_expression to reference the value of output columns that appear
prior to it in the column list" still may need some rework, because it
does not seem possible to refer to prior columns /within xmltable's own
column list/ (though that could be useful, and I think it is intended
in the standard). It doesn't seem to work in Oracle either.

While it does seem possible to refer to columns supplied by
/earlier FROM items in the containing SELECT/, that simply results in
multiple calls of xmltable, just as in the column_expression case.

>> I think the same example would produce the same output even with feature
>> (2)
>> absent. It's LATERAL doing the magic there. So I am doubtful that it
>> demonstrates (2).
> 
> LATERAL is necessary, because  XMLTABLE can be used only in FROM clause,
> and in this case XMLTABLE has mutable parameters.

For what it's worth, if I repeat the query with the word LATERAL removed,
it works just the same. I think that's simply because the LATERAL behavior
is implied for a function-call FROM item, so the explicit word isn't needed.
The main thing is, evaluation proceeds in the way described under LATERAL
in the ref page for SELECT.

Regards,
-Chap



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Pavel Stehule
>
> Accordingly, I think the paragraph beginning "Unlike regular PostgreSQL
> functions," is more likely to perplex readers than to enlighten them.
> What it says about column_expression does not seem to lead to any useful
> difference from the behavior if it were "just like regular PostgreSQL
> functions".
>

Regular Postgres functions have all their arguments evaluated before their
own execution. I think this note relates much more to the expressions used
as defaults.


> The part about usefully using volatile functions in default_expression
> remains important to mention.
>
> The statement in an earlier paragraph that "It is possible for a
> default_expression to reference the value of output columns that appear
> prior to it in the column list" still may need some rework, because it
> does not seem possible to refer to prior columns /within xmltable's own
> column list/ (though that could be useful, and I think it is intended
> in the standard). Doesn't seem to work in Oracle either
>
> While it does seem possible to refer to columns supplied by
> /earlier FROM items in the containing SELECT/, that simply results in
> multiple calls of xmltable, just as in the column_expression case.
>
> >> I think the same example would produce the same output even with feature
> >> (2)
> >> absent. It's LATERAL doing the magic there. So I am doubtful that it
> >> demonstrates (2).
> >
> > LATERAL is necessary, because  XMLTABLE can be used only in FROM clause,
> > and in this case XMLTABLE has mutable parameters.
>
> For what it's worth, if I repeat the query with the word LATERAL removed,
> it works just the same. I think that's simply because the LATERAL behavior
> is implied for a function-call FROM item, so the explicit word isn't
> needed.
> The main thing is, evaluation proceeds in the way described under LATERAL
> in the ref page for SELECT.
>

In this case the LATERAL keyword is optional - with or without this keyword
it is a lateral join.

Regards

Pavel


>
> Regards,
> -Chap
>


Re: A small note on the portability of cmake

2019-01-20 Thread Andres Freund
Hi,

On 2019-01-20 10:15:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > HPPA doesn't hardware instructions for atomic ops other than
> > test-and-set IIRC.
> 
> Indeed, the main reason why I'm interested in keeping this old dinosaur
> going at all is that it is so different from other platforms in terms
> of what we can assume about spinlocks and atomic ops.  Keeps us honest.

FWIW, while that clearly is the policy right now, I quite doubt that
it's beneficial. It's not like there's going to be new hardware
platforms without at least cmpxchg / ll/sc support. So I'm not seeing
what not requiring them keeps us honest about.

Greetings,

Andres Freund



Re: explain plans with information about (modified) gucs

2019-01-20 Thread Tomas Vondra
Hi John,

On 1/16/19 10:08 PM, John Naylor wrote:
>> [v5]
> 
> Hi Tomas,
> Peter suggested upthread to use 'settings' rather than 'gucs' for the
> explain option (and output?), and you seemed to agree. Are you going
> to include that in a future version? Speaking of the output, v5's
> default text doesn't match that of formatted text ('GUCs' / 'GUC').
> 

Attached is v6 of the patch, adopting "settings" instead of "guc".

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 7b22927674..edc50f9368 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
 static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
+static bool auto_explain_log_settings = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
@@ -112,6 +113,17 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomBoolVariable("auto_explain.log_settings",
+			 "Log modified configuration parameters affecting query planning.",
+			 NULL,
+			 &auto_explain_log_settings,
+			 false,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_verbose",
 			 "Use EXPLAIN VERBOSE for plan logging.",
 			 NULL,
@@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->timing = (es->analyze && auto_explain_log_timing);
 			es->summary = es->analyze;
 			es->format = auto_explain_log_format;
+			es->settings = auto_explain_log_settings;
 
 			ExplainBeginOutput(es);
 			ExplainQueryText(es, queryDesc);
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 120b168d45..b0a129cfee 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -169,6 +169,23 @@ LOAD 'auto_explain';
 

 
+   
+
+ auto_explain.log_settings (boolean)
+ 
+  auto_explain.log_settings configuration parameter
+ 
+
+
+ 
+  auto_explain.log_settings controls whether information
+  about modified configuration options affecting query planning is logged
+  with the execution plan.  Only options modified at the database, user, client
+  or session level are considered modified.  This parameter is off by default.
+ 
+
+   
+

 
  auto_explain.log_format (enum)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 8dc0d7038a..a7dca5f208 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -39,6 +39,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
 ANALYZE [ boolean ]
 VERBOSE [ boolean ]
 COSTS [ boolean ]
+SETTINGS [ boolean ]
 BUFFERS [ boolean ]
 TIMING [ boolean ]
 SUMMARY [ boolean ]
@@ -152,6 +153,17 @@ ROLLBACK;
 

 
+   
+SETTINGS
+
+ 
+  Include information on configuration parameters.  Specifically, include
+  configuration parameters that were modified and are considered to affect
+  query planning.  This parameter defaults to FALSE.
+ 
+
+   
+

 BUFFERS
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ae7f038203..4309c8d137 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -30,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_tables.h"
 #include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -163,6 +164,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 			es->costs = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "settings") == 0)
+			es->settings = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -597,6 +600,68 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	ExplainCloseGroup("Query", NULL, true, es);
 }
 
+static void
+ExplainShowSettings(ExplainState *es)
+{
+	int		num;
+	struct config_generic **gucs;
+
+	/* bail out if GUC information not requested */
+	if (!es->settings)
+		return;
+
+	gucs = get_explain_guc_options(&num);
+
+	/* also bail out if there are no options */
+	if (!num)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		int		i;
+
+		ExplainOpenGroup("Settings", "Settings", false, es);
+
+		for (i = 0; i < num; i++)
+		{
+			char *setting;
+			struct config_generic *conf = gucs[i];
+
+			setting = GetConfigOptionByName(conf->name, NULL, true);
+
+			ExplainPropertyText(co

Re: A small note on the portability of cmake

2019-01-20 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-20 10:15:53 -0500, Tom Lane wrote:
>> Indeed, the main reason why I'm interested in keeping this old dinosaur
>> going at all is that it is so different from other platforms in terms
>> of what we can assume about spinlocks and atomic ops.  Keeps us honest.

> FWIW, while that clearly is the policy right now, I quite doubt that
> it's beneficial. It's not like there's going to be new hardware
> platforms without at least cmpxchg / ll/sc support. So I'm not seeing
> what not requiring them keeps us honest about.

I think you're being short-sighted.  I agree that any reasonable new
hardware platform would have that functionality in some form, but
it won't necessarily be exactly like x86_64 does it.  The particular
things I think HPPA is keeping us honest about have to do with the
size of spinlocks and whether they initialize to zero or not.
See e.g. 6b93fcd14 for an actual bug caught by that.

regards, tom lane



Re: A small note on the portability of cmake

2019-01-20 Thread Andres Freund
Hi,

On 2019-01-20 14:37:43 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-01-20 10:15:53 -0500, Tom Lane wrote:
> >> Indeed, the main reason why I'm interested in keeping this old dinosaur
> >> going at all is that it is so different from other platforms in terms
> >> of what we can assume about spinlocks and atomic ops.  Keeps us honest.
> 
> > FWIW, while that clearly is the policy right now, I quite doubt that
> > it's beneficial. It's not like there's going to be new hardware
> > platforms without at least cmpxchg / ll/sc support. So I'm not seeing
> > what not requiring them keeps us honest about.
> 
> I think you're being short-sighted.  I agree that any reasonable new
> hardware platform would have that functionality in some form, but
> it won't necessarily be exactly like x86_64 does it.

I agree on that part.  I can't see them doing something as weird as HPPA
however - yes there'll be some ll/sc and some cmpxchg style archs, and
perhaps their alignment requirements will be different, but not much
more than that. And that's largely what we already have covered.


> The particular things I think HPPA is keeping us honest about have to
> do with the size of spinlocks and whether they initialize to zero or
> not.  See e.g. 6b93fcd14 for an actual bug caught by that.

Sure.  But it also increases the test matrix, requires adding code for
fallbacks, makes dependencies more painful as evidenced here.

Greetings,

Andres Freund



Why does execReplication.c lock tuples?

2019-01-20 Thread Andres Freund
Hi,

Currently RelationFindReplTupleByIndex(), RelationFindReplTupleSeq()
lock the found tuple. I don't quite get what that achieves - why isn't
dealing with concurrency in the table_update/delete calls at the
callsites sufficient? As far as I can tell there's no meaningful
concurrency handling in the heap_lock_tuple() paths, so it's not like we
follow update chains or anything. As far as I can tell this just nearly
doubles the number of WAL records when replaying updates/deletes?

Greetings,

Andres Freund



Re: Thread-unsafe coding in ecpg

2019-01-20 Thread Tom Lane
I wrote:
> On the third hand, the lack of previous reports suggests that maybe
> this whole thing is seldom a problem in practice.  Maybe we should
> just use uselocale() where available and otherwise hope it's OK
> to keep on doing what we were doing.

If we go with that approach, I think we need to adapt the patch
as attached.  I autoconfiscated it and fixed a portability problem
(it didn't compile on macOS, which has these decls in <xlocale.h>).

I've verified that this fixes the problem I was seeing on OpenBSD 6.4.
I've not bothered to test on a platform lacking uselocale() --- I
think it's clear by inspection that the patch doesn't change anything
in that case.

Not sure if we need to document this or not.  On platforms with
uselocale(), it should fix the problem without any need for user
attention.  On platforms without, there's no change, and given
the lack of previous complaints I'm not sure it's really an issue.

regards, tom lane
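
In essence, the patch below arranges for the following thread-local pattern
(a minimal standalone sketch under POSIX-2008 assumptions, not the actual
ecpg code):

#include <locale.h>		/* on macOS these decls live in <xlocale.h> */
#include <stdio.h>

/* Format a double with the C (plain decimal point) locale for the
 * calling thread only, leaving the process-wide locale untouched. */
static int
format_double_c_locale(char *buf, size_t len, double v)
{
	locale_t	clocale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
	locale_t	old;
	int			n;

	if (clocale == (locale_t) 0)
		return -1;
	old = uselocale(clocale);	/* switches only this thread */
	n = snprintf(buf, len, "%g", v);
	uselocale(old);				/* restore the previous locale */
	freelocale(clocale);
	return n;
}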

diff --git a/configure b/configure
index 7602e65..1e69eda 100755
*** a/configure
--- b/configure
*** fi
*** 15209,15215 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15209,15215 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index d599ad8..556186c 100644
*** a/configure.in
--- b/configure.in
*** AC_CHECK_FUNCS(m4_normalize([
*** 1618,1623 
--- 1618,1624 
  	strsignal
  	symlink
  	sync_file_range
+ 	uselocale
  	utime
  	utimes
  	wcstombs_l
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 9d99816..2c899a1 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 691,696 
--- 691,699 
  /* Define to 1 if the system has the type `unsigned long long int'. */
  #undef HAVE_UNSIGNED_LONG_LONG_INT
  
+ /* Define to 1 if you have the `uselocale' function. */
+ #undef HAVE_USELOCALE
+ 
  /* Define to 1 if you have the `utime' function. */
  #undef HAVE_UTIME
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 8a560ef..3964433 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 545,550 
--- 545,553 
  /* Define to 1 if you have the `unsetenv' function. */
  /* #undef HAVE_UNSETENV */
  
+ /* Define to 1 if you have the `uselocale' function. */
+ /* #undef HAVE_USELOCALE */
+ 
  /* Define to 1 if you have the `utime' function. */
  #define HAVE_UTIME 1
  
diff --git a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
index 1c9bce1..22a0557 100644
*** a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
--- b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
***
*** 12,17 
--- 12,20 
  #ifndef CHAR_BIT
  #include <limits.h>
  #endif
+ #ifdef LOCALE_T_IN_XLOCALE
+ #include <xlocale.h>
+ #endif
  
  enum COMPAT_MODE
  {
*** struct statement
*** 61,67 
--- 64,75 
  	bool		questionmarks;
  	struct variable *inlist;
  	struct variable *outlist;
+ #ifdef HAVE_USELOCALE
+ 	locale_t	clocale;
+ 	locale_t	oldlocale;
+ #else
  	char	   *oldlocale;
+ #endif
  	int			nparams;
  	char	  **paramvalues;
  	PGresult   *results;
diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index 3f5034e..d3e32d2 100644
*** a/src/interfaces/ecpg/ecpglib/execute.c
--- b/src/interfaces/ecpg/ecpglib/execute.c
*** free_statement(struct statement *stmt)
*** 102,108 
--- 102,113 
  	free_variable(stmt->outlist);
  	ecpg_free(stmt->command);
  	ecpg_free(stmt->name);
+ #ifdef HAVE_USELOCALE
+ 	if (stmt->clocale)
+ 		freelocale(stmt->clocale);
+ #else
  	ecpg_free(stmt->oldlocale);
+ #endif
  	ecpg_free(stmt);
  }
  
*** ecpg_do_prologue(int lineno, const int c
*** 1771,1778 
  
  	/*
  	 * Make sure we do NOT honor the locale for numeric input/output since the
! 	 * database wants the standard decimal point
  	 */
  	stmt->oldlocale = ecpg_strdup(setlocale(LC_NUMERIC, NULL), lineno);
  	if (stmt->oldlocale == NULL)
  	

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Tom Lane
Fabien COELHO  writes:
>>> I'd suggest that maybe we should get rid of the use of both random()
>>> and srandom() in pgbench, and go over to letting set_random_seed()
>>> fill the pg_erand48 state directly.

> Here is a POC which defines an internal interface for a PRNG, and use it 
> within pgbench, with several possible implementations which default to 
> rand48.

I seriously dislike this patch.  pgbench's random support is quite
overengineered already IMO, and this proposes to add a whole batch of
new code and new APIs to fix a very small bug.

> I must admit that I have a grudge against standard rand48:

I think this is nonsense, particularly the claim that anything in PG
cares about the lowest-order bits of random doubles.  I'm aware that
there are applications where that does matter, but people aren't doing
high-precision weather simulations in pgbench.

BTW, did you look at the question of the range of zipfian?  I confirmed
here that as used in the test case, it's generating a range way smaller
than the other ones: repeating the insertion snippet 1000x produces stats
like this:

regression=# select seed,rand,min(val),max(val),count(distinct val) from 
seeded_random group by 1,2 order by 2,1;
    seed    |    rand     | min  | max  | count 
------------+-------------+------+------+-------
 1957482663 | exponential | 2000 | 2993 |   586
 1958556409 | exponential | 2000 | 2995 |   569
 1959867462 | exponential | 2000 | 2997 |   569
 1957482663 | gaussian| 3009 | 3997 |   493
 1958556409 | gaussian| 3027 | 3956 |   501
 1959867462 | gaussian| 3018 | 3960 |   511
 1957482663 | uniform | 1001 | 1999 |   625
 1958556409 | uniform | 1000 | 1999 |   642
 1959867462 | uniform | 1001 | 1999 |   630
 1957482663 | zipfian | 4000 | 4081 |19
 1958556409 | zipfian | 4000 | 4022 |18
 1959867462 | zipfian | 4000 | 4156 |23

I have no idea whether that indicates an actual bug, or just poor
choice of parameter in the test's call.  But the very small number
of distinct outputs is disheartening at least.

regards, tom lane
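
Concretely, the suggestion quoted at the top amounts to something like this
(a sketch assuming pg_erand48's unsigned short xseed[3] state; not the
committed code):

#include <stdint.h>

/* Derive the three 16-bit words of the rand48 state from a user seed,
 * so --random-seed=N behaves identically on every platform regardless
 * of srandom()/random() quirks. */
static void
seed_prng_state(uint64_t seed, unsigned short xseed[3])
{
	xseed[0] = (unsigned short) (seed & 0xFFFF);
	xseed[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
	xseed[2] = (unsigned short) ((seed >> 32) & 0xFFFF);
}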



Postgres doesn't remove useless join when using partial unique index

2019-01-20 Thread Kim Rose Carlsen
Hi

remove_useless_join does not prove uniqueness if the unique index is partial,
and therefore won't remove the join if no columns are referenced (see the
example at the bottom).

I have been looking around the source code, and from what I have
identified the problem seems to be that "check_index_predicates(..)" happens
after "remove_useless_join(..)", which therefore cannot see that the unique
index is actually covered by the join condition.

From analyzejoins.c:612, rel_supports_distinctness(..):

	if (ind->unique && ind->immediate &&
		(ind->indpred == NIL || ind->predOK))
		return true;

But ind->predOK is calculated in check_index_predicates(..), which
happens later, so ind->predOK is always false when checked here.

I have tried adding check_index_predicates(..) to rel_supports_distinctness(..)
and this produces the expected plan, but I have no idea of the implications
of calling check_index_predicates(..) earlier.

This is my first time looking at the Postgres source code, so I know the
attached "patch" is not the solution, but any pointers on where to go from
here would be appreciated.


Example:
CREATE TABLE a (
  id INTEGER PRIMARY KEY,
  sub_id INTEGER NOT NULL,
  deleted_at TIMESTAMP
);
CREATE UNIQUE INDEX ON a (sub_id) WHERE (deleted_at IS NULL);

ANALYZE a;

EXPLAIN SELECT 1 FROM a AS a LEFT JOIN a AS b ON a.id = b.sub_id AND 
b.deleted_at IS NULL;

Expected plan:
 QUERY PLAN
-
 Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)

Actual plan:
  QUERY PLAN
---
 Hash Left Join  (cost=14.76..48.13 rows=1850 width=4)
   Hash Cond: (a.id = b.sub_id)
   ->  Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)
   ->  Hash  (cost=14.65..14.65 rows=9 width=4)
 ->  Bitmap Heap Scan on a b  (cost=4.13..14.65 rows=9 width=4)
   Recheck Cond: (deleted_at IS NULL)
   ->  Bitmap Index Scan on a_sub_id_idx  (cost=0.00..4.13 rows=9 
width=0)
(7 rows)


diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 1593dbec21..12da689983 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -596,6 +596,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		return false;
 	if (rel->rtekind == RTE_RELATION)
 	{
+		check_index_predicates(root, rel);
 		/*
 		 * For a plain relation, we only know how to prove uniqueness by
 		 * reference to unique indexes.  Make sure there's at least one



Re: Delay locking partitions during INSERT and UPDATE

2019-01-20 Thread David Rowley
On Sat, 19 Jan 2019 at 12:05, John Naylor  wrote:
>
> On 11/22/18, David Rowley  wrote:
> > If required, such operations could LOCK TABLE the top partitioned
> > table to block the DML operation. There's already a risk of similar
> > deadlocks from such operations done on multiple separate tables when
> > the order they're done is not the same as the order the tables are
> > written in a query, although, in that case, the window for the
> > deadlock is likely to be much smaller.
>
> Is this something that would need documentation anywhere?

Probably at least the release notes. I'm unsure where else to mention
it.  I don't feel the workaround of using LOCK TABLE is special to
this case. The patch does, however, make it a possible requirement for
performing DDL on individual partitions where it was not previously.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-20 Thread Fabien COELHO



Hello Tom,


>> Here is a POC which defines an internal interface for a PRNG, and use it
>> within pgbench, with several possible implementations which default to
>> rand48.

> I seriously dislike this patch.  pgbench's random support is quite
> overengineered already IMO, and this proposes to add a whole batch of
> new code and new APIs to fix a very small bug.


My intention is rather to discuss postgres' PRNG, in passing. Full success 
on this point:-)



>> I must admit that I have a grudge against standard rand48:

> I think this is nonsense, particularly the claim that anything in PG
> cares about the lowest-order bits of random doubles.  I'm aware that
> there are applications where that does matter, but people aren't doing
> high-precision weather simulations in pgbench.


Sure. My point is not that it is an actual issue for pgbench, but as the 
same PRNG is used more or less everywhere in postgres, I think that it 
should be a good one rather than a known bad one.


E.g., about doubles:

  \set i debug(random(1, POWER(2,49)) % 2)

Always returns 1 because of the 48-bit precision, i.e. the output is never
even.


  \set i debug(random(1, POWER(2,48)) % 2)

Returns 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ... because it is an LCG.

  \set i debug(random(1, POWER(2,48)) % 4)

Cycles over (3 2 1 0)*

  \set i debug(random(1, power(2, 47)) % 4)

Cycles over (0 0 1 1 2 2 3 3)*, and so on.
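
To see the alternation concretely, here is a tiny standalone demo using the
standard rand48 multiplier and increment (an illustration of the LCG
property only, not pgbench code):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* rand48 recurrence: x' = (a*x + c) mod 2^48, with a and c both odd */
	const uint64_t a = 0x5DEECE66DULL;
	const uint64_t c = 0xB;
	uint64_t	x = 0x1234ABCD330EULL;	/* any 48-bit starting state */

	for (int i = 0; i < 8; i++)
	{
		x = (a * x + c) & ((1ULL << 48) - 1);
		printf("%d ", (int) (x & 1));	/* low bit strictly alternates */
	}
	putchar('\n');
	return 0;
}

Because a and c are odd, the low bit flips on every step; more generally
the low k bits of the state repeat with period 2^k, which is what the
modulo examples above are showing.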


> BTW, did you look at the question of the range of zipfian?


Yep.

> I confirmed here that as used in the test case, it's generating a range
> way smaller than the other ones: repeating the insertion snippet 1000x
> produces stats like this: [...]

> I have no idea whether that indicates an actual bug, or just poor
> choice of parameter in the test's call.  But the very small number
> of distinct outputs is disheartening at least.


The Zipf distribution is highly skewed, somewhat like an exponential. To
make the probabilities decrease more slowly, the parameter must be closer
to 1, e.g. 1.05 or so. However, as far as the test is concerned I do not
see this as a significant issue. I was rather planning to submit a
documentation improvement to provide more precise hints about how the
distribution behaves depending on the parameter, and possibly to reduce
the parameter used in the test in passing, but I see this as not very urgent.


--
Fabien.



Re: Postgres doesn't remove useless join when using partial unique index

2019-01-20 Thread David Rowley
On Mon, 21 Jan 2019 at 09:51, Kim Rose Carlsen  wrote:
> remove_useless_join does not prove uniqueness if the unique index is partial, 
> and therefore wont remove the join if no columns are referenced (see example 
> in bottom).
>
> I have been trying to look around the source code and from what I have 
> identified the problem seems to be that "check_index_predicates(..)" happens 
> after "remove_useless_join(..)", and therefore cannot see that the unique 
> index is actually covered by the join condition.

The main reason that join removal happens so early on in planning is
that we want to forego doing as much work as possible on a relation
that might get removed.

> From analyzejoins.c:612, rel_supports_distinctness(..)
>   if (ind->unique && ind->immediate &&
>       (ind->indpred == NIL || ind->predOK))
>       return true;

This is really just a precheck to see if there are any unique indexes
which may serve as proof that the join does not duplicate any rows
from the other side of the join. If this fails then the code only
knows not to bother looking any further. If it passes then more work
needs to be done to see if the relation supports distinctness.

> I have tried to add check_index_predicates(..) to 
> rel_supports_distinctness(..) and this produces the expected plan, but I have 
> no idea of the implication of doing check_index_predicates(..) earlier.

Looking at check_index_predicates() it makes use of
root->all_baserels, which only gets set in make_one_rel() which is
well after the join removal is performed. So it does look like there's
a bit of a chicken and egg problem there around which relations to use
in generate_join_implied_equalities(). Moving the
check_index_predicates() call earlier would cause it to miss using
these additional quals completely due to all_baserels being empty. I'm
unsure if it would be safe to join-remove a relation for which we've
found a qual during generate_join_implied_equalities(). I'm not even
sure it would be possible to remove such a relation.  That
would require a bit of research.

Maybe it might be worth thinking about making predOK have 3 possible
values, with the additional one being "Unknown". We could then
consider calling predicate_implied_by() in
relation_has_unique_index_for() for indexes that are unique and
immediate but predOK is still unknown.  That might reduce the
additional work to a level that might be acceptable.  The extra check
could either be done before or after the column matching code in
relation_has_unique_index_for(). I guess either one is equally as
likely to fail, but one may be cheaper than the other to perform.
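
For illustration, such a tri-state could look like this (a sketch: the
PREDOK_* names and the "restrictlist" variable are invented here;
predicate_implied_by() is the existing prover):

/* hypothetical tri-state replacing the boolean IndexOptInfo.predOK */
typedef enum PredOKStatus
{
	PREDOK_UNKNOWN,				/* predicate not yet tested */
	PREDOK_FALSE,				/* predicate proven not implied */
	PREDOK_TRUE					/* predicate implied by rel's quals */
} PredOKStatus;

/* ...and in relation_has_unique_index_for(), roughly: */
if (ind->unique && ind->immediate && ind->indpred != NIL &&
	ind->predOK == PREDOK_UNKNOWN)
	ind->predOK = predicate_implied_by(ind->indpred, restrictlist, false)
		? PREDOK_TRUE : PREDOK_FALSE;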

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Chapman Flack
On 01/20/19 12:48, Pavel Stehule wrote:
>>
>> Accordingly, I think the paragraph beginning "Unlike regular PostgreSQL
>> functions," is more likely to perplex readers than to enlighten them.
>> What it says about column_expression does not seem to lead to any useful
>> difference from the behavior if it were "just like regular PostgreSQL
>> functions".
> 
> regular Postgres' functions has evaluated all arguments before own
> execution. I think so this note is related much more to expressions used as
> defaults.

Sure, but again, is there an example, or can one easily be constructed,
that shows the default expressions working in such a way?

I am not able to use a default expression to refer to an earlier
column in the column list of the xmltable call.

I am able to use a default expression to refer to a column of an earlier
FROM item in the enclosing SELECT. But such a query ends up having LATERAL
form (whether or not the word LATERAL is used), and re-executes xmltable
whenever the referenced column value changes. In that case, whether the
default argument is evaluated at function entry or later doesn't seem
to matter: the function is re-executed, so evaluating the new default
at the time of entry is sufficient.

So, I have still not been able to construct a query that requires the
deferred evaluation behavior. But perhaps there is a way I haven't
thought of.

Regards,
-Chap



Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-20 Thread Tomas Vondra
On 1/17/19 3:15 PM, Arthur Zakirov wrote:
> I attached files of new version of the patch, I applied your tweaks.
> 
>> XXX All dictionaries, but only when there's invalid dictionary?
> 
> I've made a little optimization. I introduced hashvalue into
> TSDictionaryCacheEntry. Now released only DSM of altered or dropped
> dictionaries.
> 
>>  > /* XXX not really a pointer, so the name is misleading */
>>
>> I think we don't need DictPointerData struct anymore, because only
>> ts_dict_shmem_release function needs it (see comments above) and we only
>> need it to hash search. I'll move all fields of DictPointerData to
>> TsearchDictKey struct.
> 
> I was wrong, DictInitData also needs DictPointerData. I didn't remove
> DictPointerData, I renamed it to DictEntryData. Hope that it is a more
> appropriate name.
> 

Thanks. I've reviewed v17 today and I haven't discovered any new issues
so far. If everything goes fine and no one protests, I plan to get it
committed over the next week or so.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-20 Thread Andres Freund
On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
> On 1/17/19 3:15 PM, Arthur Zakirov wrote:
> > I attached files of new version of the patch, I applied your tweaks.
> > 
> >> XXX All dictionaries, but only when there's invalid dictionary?
> > 
> > I've made a little optimization. I introduced hashvalue into
> > TSDictionaryCacheEntry. Now released only DSM of altered or dropped
> > dictionaries.
> > 
> >>  > /* XXX not really a pointer, so the name is misleading */
> >>
> >> I think we don't need DictPointerData struct anymore, because only
> >> ts_dict_shmem_release function needs it (see comments above) and we only
> >> need it to hash search. I'll move all fields of DictPointerData to
> >> TsearchDictKey struct.
> > 
> > I was wrong, DictInitData also needs DictPointerData. I didn't remove
> > DictPointerData, I renamed it to DictEntryData. Hope that it is a more
> > appropriate name.
> > 
> 
> Thanks. I've reviewed v17 today and I haven't discovered any new issues
> so far. If everything goes fine and no one protests, I plan to get it
> committed over the next week or so.

There doesn't seem to be any docs about what's needed to be able to take
advantage of shared dicts, and how to prevent them from permanently
taking up a significant share of memory.

Greetings,

Andres Freund



Re: jsonpath

2019-01-20 Thread Alexander Korotkov
On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov
 wrote:
> I'll continue revising this patchset.  Nikita, could you please write
> tests for 3-argument versions of functions?  Also, please answer the
> question regarding "id" property.

I've some more notes regarding the function set provided in the jsonpath patch:
1) Function names don't follow the convention we have.  All our
functions dealing with jsonb have a "jsonb_" prefix.  Exclusions have
"jsonb" in another part of the name, for example, "to_jsonb(anyelement)".
We could name functions at the SQL level in the same way they are named in
C.  So, they would be jsonb_jsonpath_exists() etc.  But that's probably
too long.  What about jsonb_path_exists() etc.?
2) The jsonpath_query_wrapped() name seems counterintuitive to me.  What
about jsonpath_query_array()?
3) jsonpath_query_wrapped() behavior looks tricky to me.  It returns
NULL for no results.  When there is a single result item, it is returned
"as is".  When there are two or more result items, they are wrapped into
an array.  This representation of the result seems extremely inconvenient
to me.  It becomes hard to solve even trivial tasks with it: for instance,
iterating over all the items found.  Without extra assumptions on what is
going to be returned, it's also impossible for the caller to distinguish
a single array item found from multiple items found wrapped into an
array.  And that seems very bad.  I know this behavior is inspired by the
SQL/JSON standard.  But since these functions are our extension anyway, we
can implement them as we like.  So, I propose to make this function always
return an array of items regardless of the count of those items (even for
1 or 0 items).
4) If we change the behavior of jsonpath_query_wrapped() as above, we can
also implement a jsonpath_query_one() function, which would return the
first result item or NULL for no items.
Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Libpq support to connect to standby server as priority

2019-01-20 Thread Haribabu Kommi
On Fri, Jan 18, 2019 at 5:33 PM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
> > As for prefer-standby/prefer-read, if host parameter specifies
> host1,host2
> > in this order, and host1 is the primary with
> > default_transaction_read_only=true, does the app get a connection to
> host1?
> > I want to connect to host2 (standby) only if host1 is down.
>
> Oops, reverse -- I wanted to say "I want to connect to host1 (primary)
> only if host2 is down."
>

Thanks for finding the problem. How about the following way of checking
for prefer-read/prefer-standby (see the sketch below the list):

1. (default_transaction_read_only = true and recovery = true)
2. If none of the hosts satisfies the above scenario, then recovery = true
3. The last check is for default_transaction_read_only = true
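
A minimal sketch of that precedence as a classification helper
(hypothetical names, not actual libpq code):

typedef enum HostClass
{
	HOST_RO_STANDBY,			/* check 1: read-only and in recovery */
	HOST_STANDBY,				/* check 2: in recovery */
	HOST_READ_ONLY,				/* check 3: read-only */
	HOST_OTHER
} HostClass;

static HostClass
classify_host(bool xact_read_only, bool in_recovery)
{
	if (xact_read_only && in_recovery)
		return HOST_RO_STANDBY;
	if (in_recovery)
		return HOST_STANDBY;
	if (xact_read_only)
		return HOST_READ_ONLY;
	return HOST_OTHER;
}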

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Pluggable Storage - Andres's take

2019-01-20 Thread Haribabu Kommi
On Tue, Jan 15, 2019 at 6:05 PM Andres Freund  wrote:

> Hi,
>
> On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:
> > On Tue, Dec 11, 2018 at 1:13 PM Andres Freund 
> wrote:
> >
> > > Hi,
> > >
> > > On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
> > > Further tasks I'm not yet planning to tackle, that I'd welcome help on:
> > > - pg_upgrade testing
> > >
> >
> > I did the pg_upgrade testing from older version with some tables and
> views
> > exists,  and all of them are properly transformed into new server with
> heap
> > as the default access method.
> >
> > I will add the dimitry pg_dump patch and test the pg_upgrade to confirm
> > the proper access method is retained on the upgraded database.
> >
> >
> >
> > > - I think we should consider removing HeapTuple->t_tableOid, it should
> > >   imo live entirely in the slot
> > >
> >
> > I removed the t_tableOid from HeapTuple and during testing I found some
> > problems with triggers, will post the patch once it is fixed.
>
>
> Please note that I'm working on a heavily revised version of the patch
> right now, trying to clean up a lot of things (you might have seen some
> of the threads I started). I hope to post it ~Thursday.  Local-ish
> patches shouldn't be a problem though.
>

Yes, I am checking you other threads of refactoring and cleanups.
I will rebase this patch once the revised code is available.

I am not able to remove t_tableOid from HeapTuple completely,
because of its use in triggers: the slot is not available in triggers,
so I need to store the tableOid as part of the tuple as well.

Currently t_tableOid is set only when the tuple is formed from the
slot, and its use is otherwise replaced with the slot member.

comments?

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch
Description: Binary data


Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-20 Thread Tomas Vondra
On 1/20/19 11:21 PM, Andres Freund wrote:
> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>> On 1/17/19 3:15 PM, Arthur Zakirov wrote:
>>> I attached files of new version of the patch, I applied your tweaks.
>>>
>>>> XXX All dictionaries, but only when there's invalid dictionary?
>>>
>>> I've made a little optimization. I introduced hashvalue into
>>> TSDictionaryCacheEntry. Now released only DSM of altered or dropped
>>> dictionaries.
>>>
>>>>  > /* XXX not really a pointer, so the name is misleading */
>>>>
>>>> I think we don't need DictPointerData struct anymore, because only
>>>> ts_dict_shmem_release function needs it (see comments above) and we only
>>>> need it to hash search. I'll move all fields of DictPointerData to
>>>> TsearchDictKey struct.
>>>
>>> I was wrong, DictInitData also needs DictPointerData. I didn't remove
>>> DictPointerData, I renamed it to DictEntryData. Hope that it is a more
>>> appropriate name.
>>>
>>
>> Thanks. I've reviewed v17 today and I haven't discovered any new issues
>> so far. If everything goes fine and no one protests, I plan to get it
>> committed over the next week or so.
> 
> There doesn't seem to be any docs about what's needed to be able to take
> advantage of shared dicts, and how to prevent them from permanently
> taking up a significant share of memory.
> 

Yeah, those are good points. I agree the comments might be clearer, but
essentially ispell dictionaries are shared and everything else is not.

As for the memory consumption / unloading dicts - I agree that's
something we need to address. There used to be a way to specify a memory
limit and to unload dictionaries explicitly, but both features
have been ditched. The assumption was that UNLOAD would be introduced
later, but that does not seem to have happened.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Allowing extensions to supply operator-/function-specific info

2019-01-20 Thread Tom Lane
Over in the thread at [1], we realized that PostGIS has been thrashing
around trying to fake its way to having "special index operators", ie
a way to automatically convert WHERE clauses into lossy index quals.
That's existed in a non-extensible way inside indxpath.c for twenty
years come July.  Since the beginning I've thought we should provide
a way for extensions to do similar things, but it never got to the top
of the to-do queue.  Now I think it's time.

One low-effort answer is to add a hook call in indxpath.c that lets
extensions manipulate the sets of index clauses extracted from a
relation's qual clauses, but I don't especially like that: it dumps
all the work onto extensions, resulting in lots of code duplication,
plus they have a badly-documented and probably moving target for what
they have to do.

Another bit of technical debt that's even older is the lack of a way
to attach selectivity estimation logic to boolean-returning functions.
So that motivates me to think that whatever we do here should be easily
extensible to allow different sorts of function- or operator-related
knowledge to be supplied by extensions.  We already have oprrest,
oprjoin, and protransform hooks that allow certain kinds of knowledge
to be attached to operators and functions, but we need something a bit
more general.

What I'm envisioning therefore is that we allow an auxiliary function to
be attached to any operator or function that can provide functionality
like this, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes.  For example, we could say that the function takes a single
Node* argument, and that the type of Node tells it what to do, and if it
doesn't recognize the type of Node it should just return NULL indicating
"use default handling".  We'd start out with two relevant Node types,
one for the selectivity-estimation case and one for the extract-a-lossy-
index-qual case, and we could add more over time.
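
Concretely, a support function might have this shape (a sketch only: the
SupportRequestSelectivity / SupportRequestIndexCondition names below are
placeholders for node types such a proposal would have to invent, so this
does not compile against any released tree):

#include "postgres.h"
#include "fmgr.h"
#include "nodes/primnodes.h"

PG_FUNCTION_INFO_V1(my_function_support);

Datum
my_function_support(PG_FUNCTION_ARGS)
{
	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
	Node	   *ret = NULL;

	if (IsA(rawreq, SupportRequestSelectivity))
	{
		/* compute a selectivity estimate for WHERE my_function(...) */
	}
	else if (IsA(rawreq, SupportRequestIndexCondition))
	{
		/* derive a lossy index qual from the function call */
	}
	else if (IsA(rawreq, FuncExpr))
	{
		/* the equivalent of today's protransform: simplify the call */
	}

	/* returning NULL means "use default handling" */
	PG_RETURN_POINTER(ret);
}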

What we can do to attach such a support function to a target function
is to repurpose the pg_proc.protransform column to represent the
support function.  The existing protransform functions already have
nearly the sort of API I'm thinking about, but they only accept
FuncExpr* not any other node type.  It'd be easy to change them
though, because there's only about a dozen and they are all in core;
we never invented any way for extensions to access that functionality.
(So actually, the initial API spec here would include three
possibilities, the third one being equivalent to the current
protransform behavior.)

As for attaching support functions to operators, we could
consider widening the pg_operator catalog to add a new column.
But I think it might be a cleaner answer to just say "use the support
function attached to the operator's implementation function,
if there is one".  This would require that the support functions
be able to cope with either FuncExpr or OpExpr inputs, but that
does not seem like much of a burden as long as it's part of the
API spec from day one.

Since there isn't any SQL API for attaching support functions,
we'd have to add one, but adding another clause to CREATE FUNCTION
isn't all that hard.  (Annoyingly, we haven't created any cheaply
extensible syntax for CREATE FUNCTION, so this'd likely require
adding another keyword.  I'm not interested in doing more than
that right now, though.)

I'd be inclined to rename pg_proc.protransform to "prosupport"
to reflect its wider responsibility, and make the new CREATE FUNCTION
clause be "SUPPORT FUNCTION foo" or some such.  I'm not wedded
to that terminology, if anyone has a better idea.

One thing that's not entirely clear to me is what permissions would be
required to use that clause.  The support functions will have signature
"f(internal) returns internal", so creating them at all will require
superuser privilege, but it seems like we probably also need to restrict
the ability to attach one to a target function --- attaching one to
the wrong function could probably have bad consequences.  The easy way
out is to say "you must be superuser"; maybe that's enough for now,
since all the plausible use-cases for this are in extensions containing
C functions anyway.  (A support function would have to be coded in C,
although it seems possible that its target function could be something
else.)

Thoughts?  If we have agreement on this basic design, making it happen
seems like a pretty straightforward task.

regards, tom lane

PS: there is, however, a stumbling block that I'll address in a separate
message, as it seems independent of this infrastructure.

[1] 
https://www.postgresql.org/message-id/flat/CACowWR0TXXL0NfPMW2afCKzX++nHHBZLW3-BLusu_B0WjBB1=a...@mail.gmail.com



Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Tom Lane
In [1] I propose that we should allow extensions to get their hands
on the ability to transform function calls as per "protransform" and
to generate lossy index quals based on items in WHERE clauses.  The
APIs to give an extension control at the right points seem pretty
straightforward, but there's a problem standing in the way of actually
doing anything useful once you've got control.  In order to either
analyze a passed-in clause or generate a new one, the extension will
need to know the OIDs of the functions/operators it's working with.
And extension objects don't have fixed OIDs.

In principle this could be dealt with by looking up said OIDs from
the catalogs, but that's (1) complicated, (2) slow, (3) prone to
possibly-security-sensitive mistakes such as omitting a schema
specification, and (4) at risk of getting broken entirely by
user-accessible changes such as ALTER FUNCTION RENAME.  Point (2) can
be alleviated by caching, but that just makes (1) even worse, plus
there are lots of ways to do caching wrong.
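
For reference, the name-based runtime lookup an extension must write today
looks roughly like this (a sketch; the "myext" names are hypothetical), and
it illustrates points (1) through (4) above -- including that it silently
breaks under ALTER FUNCTION RENAME:

#include "postgres.h"
#include "catalog/namespace.h"
#include "catalog/pg_type.h"
#include "nodes/value.h"

static Oid
myext_func_oid(void)
{
	static Oid	cached = InvalidOid;

	if (!OidIsValid(cached))
	{
		Oid			argtypes[1] = {FLOAT8OID};

		/* schema qualification is easy to forget, cf. point (3) */
		cached = LookupFuncName(list_make2(makeString("myext"),
										   makeString("my_func")),
								1, argtypes, false);
	}
	return cached;
}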

I thought about extending the extension infrastructure to provide
some way of retrieving relevant OIDs.  We could imagine, for instance,
that an extension script has a way to say "this function is object
number three within this extension", and while running the script
we make a catalog entry showing that object number three has OID
thus-and-so, and then that catalog entry can be consulted to get
the right OID (by C code that has hard-wired knowledge that object
number three is the function it cares about).  This is still kind
of messy, because aside from the hand-assigned object numbers
you'd have to use the extension name as part of the lookup key,
making the name into something the C code critically depends on.
We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
it seems painful to say that we can never have it.

In the end it seems like possibly the cleanest answer is to change
things so that extensions *can* have fixed OIDs that their C code can
know, eliminating lookup costs and allowing coding conventions for this
sort of work to be the same as in the core backend.  We could raise
FirstNormalObjectId to create some unused OID space that we could then
assign chunks of to specific extensions on-request.  This is problematic
for relations, types, and roles, because pg_upgrade wants to preserve
OIDs of those objects across upgrades, so we could not ensure that the
"unused" space is free of such objects.  But it would work for all
other object types, and I think that it might well be sufficient if
an extension can have fixed OIDs for its functions, operators, and
opfamilies/opclasses.  (At need, it could find out the OID for one of
its types by looking up the argument or result types for one of its
functions.)

There are various places in pg_upgrade and postgres_fdw that
assume more than they perhaps should about the significance of
FirstNormalObjectId, but I think that that could be dealt with.

A larger issue is whether "hand out some OIDs on-demand" is a
sustainable strategy.  I think that it is, if we encourage extensions
to assign fixed OIDs only to objects they really need to.  In thirty-ish
years of core PG development, we've only used up ~4200 fixed OIDs,
and a lot of those are for functions that probably don't really need
fixed OIDs but got one because we give one to every built-in function.
However, if there's a big land rush to claim large chunks of OIDs,
we might have a problem.

We'd have to invent some SQL syntax whereby extension scripts can
actually apply their assigned OIDs to their objects.  I'm not very
enthused about adding an "OID nnn" option to every type of CREATE
command that might need this.  A quick-and-dirty answer is to create
support functions similar to binary_upgrade_set_next_pg_type_oid()
that set the OID to give to the next-created object of each category
we need to support.

There are various issues and bits of work around this, but the only one
that I've thought of that I haven't got an answer for is "how should
an extension upgrade script assign a fixed OID to an object that already
existed in the previous extension version, but without a fixed OID?".
We can't just change the recorded OID because that'll break
dependencies, view references, etc.  Conceivably we could write code
that runs through the catalogs and changes all references, but man
that'd be a mess.  Anyone have another idea?

Another question is whether you need any special permissions to assign
a fixed OID in this way.  The most conservative answer is to require
superuserness, which might be enough because the plausible use-cases
for fixed OIDs involve C code, which you'd need to be superuser to
install anyhow.  But it seems overkill somehow.  OTOH, it would be
annoying if a random user could eat up a "reserved" OID that later
prevented someone from installing an extension they wanted.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/15

Re: Changing SQL Inlining Behaviour (or...?)

2019-01-20 Thread Tom Lane
I wrote:
> I'll try to sketch up a more concrete plan soon.

I've posted some preliminary design ideas at

https://www.postgresql.org/message-id/15193.1548028...@sss.pgh.pa.us
and
https://www.postgresql.org/message-id/15289.1548028...@sss.pgh.pa.us

While there's a nontrivial amount of work needed to make that happen,
I think it's doable, and it would lead to a significantly better
solution than proceeding along the inlining path could do.  My
current feeling, therefore, is that we should reject this patch
(or at least stick it in the deep freeze) and go work on that plan.

regards, tom lane



RE: Delay locking partitions during INSERT and UPDATE

2019-01-20 Thread Kato, Sho
on Fri, 18 2019 at 19:41, David Rowley  wrote:
>Perhaps you're running with plan_cache_mode = force_custom_plan.
>You'll likely need it set to auto for these to pass.

Thank you.
I was running with plan_cache_mode = force_custom_plan.
When it is set to auto, all tests pass.

regards,

sho kato
> -Original Message-
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
> Sent: Friday, January 18, 2019 7:41 PM
> To: Kato, Sho/加藤 翔 
> Cc: PostgreSQL Hackers ; David
> Rowley 
> Subject: Re: Delay locking partitions during INSERT and UPDATE
> 
> On Fri, 18 Jan 2019 at 19:08, sho kato  wrote:
> > I confirmed that this patch improve performance by 10 times or more.
> 
> Thanks for testing this out.
> 
> > Also, I did make installcheck world, but test partition_prune failed.
> > However, this test case failed even before applying a patch, so this
> > patch is not a problem.
> > One of the results is as follows.
> >
> >  explain (analyze, costs off, summary off, timing off) execute ab_q1
> (2, 2, 3);
> > -   QUERY PLAN
> > --
> > +  QUERY PLAN
> > +--
> >   Append (actual rows=0 loops=1)
> > -   Subplans Removed: 6
> > ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
> > - Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
> > + Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
> > ->  Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
> > - Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
> > + Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
> > ->  Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
> > - Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
> > -(8 rows)
> > + Filter: ((a >= 2) AND (a <= 2) AND (b <= 3))
> > +(7 rows)
> 
> Perhaps you're running with plan_cache_mode = force_custom_plan.
> You'll likely need it set to auto for these to pass.
> 
> 
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> A larger issue is whether "hand out some OIDs on-demand" is a
 Tom> sustainable strategy.

No.

Not for any concerns over availability of oids, but simply from the fact
that we have no business whatsoever inserting ourselves into the
extension development process in this way.

-- 
Andrew (irc:RhodiumToad)



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Chapman Flack
On 01/20/19 18:50, Tom Lane wrote:
> we make a catalog entry showing that object number three has OID
> thus-and-so, and then that catalog entry can be consulted to get
> the right OID (by C code that has hard-wired knowledge that object
> number three is the function it cares about).  This is still kind
> of messy, because aside from the hand-assigned object numbers
> you'd have to use the extension name as part of the lookup key,
> making the name into something the C code critically depends on.
> We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
> it seems painful to say that we can never have it.

An extension *has* an OID, doesn't it? pg_extension has 'em.

If the extension script could somehow be informed at CREATE EXTENSION time
of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

Somehow, I find this first idea more aesthetically appealing than
actually trying to bind things in extensions to fixed OIDs for all time.

-Chap



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> A larger issue is whether "hand out some OIDs on-demand" is a
>  Tom> sustainable strategy.

> No.

> Not for any concerns over availability of oids, but simply from the fact
> that we have no business whatsoever inserting ourselves into the
> extension development process in this way.

I'm not exactly following this concern.  I wasn't imagining that we'd
assign each individual OID ourselves, but rather give out blocks of OIDs.
Admittedly, the blocks can't be huge, but it doesn't seem to me that
this'd create an impossible burden for either us or extension developers.

We could also reserve some range of OIDs for "local extensions", whereby
people who didn't intend to publish their extensions for widespread use
could just use some of those OIDs rather than having to ask for a
public assignment.

regards, tom lane



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Tom Lane
Chapman Flack  writes:
> On 01/20/19 18:50, Tom Lane wrote:
>> we make a catalog entry showing that object number three has OID
>> thus-and-so, and then that catalog entry can be consulted to get
>> the right OID (by C code that has hard-wired knowledge that object
>> number three is the function it cares about).  This is still kind
>> of messy, because aside from the hand-assigned object numbers
>> you'd have to use the extension name as part of the lookup key,
>> making the name into something the C code critically depends on.
>> We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
>> it seems painful to say that we can never have it.

> An extension *has* an OID, doesn't it? pg_extension has 'em.

Sure.

> If the extension script could somehow be informed at CREATE EXTENSION time
> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

And it remembers that where?

> Somehow, I find this first idea more aesthetically appealing than
> actually trying to bind things in extensions to fixed OIDs for all time.

I don't find it appealing particularly, but at least it hasn't got
any insurmountable-looking problems --- other than the "you can't
rename your extension" one.  If we can't make the fixed-OIDs approach
work, this might be a workable second choice.

regards, tom lane



RE: speeding up planning with partitions

2019-01-20 Thread Imai, Yoshikazu
Hi Amit-san,

On Wed, Jan 17, 2019 at 10:25 AM, Amit Langote wrote:
> Thank you Imai-san for testing.  Sorry it totally slipped my mind to reply to 
> this email.

Thanks for replying and sorry for my late reply. I've been undertaking 
on-the-job training last week.

> Are you saying that, when using auto mode, all executions of the query
> starting from 7th are slower than the first 5 executions?  That is, the
> latency of creating and using a custom plan increases *after* a generic
> plan is created and discarded on the 6th execution of the query?  If so,
> that is inexplicable to me.

Yes. And it's also inexplicable to me.

I'll check if this fact is really correct by measuring the time of the
first 5 queries before the generic plan is created and the other queries
after the generic plan is created.

Yoshikazu Imai 



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I'm not exactly following this concern. I wasn't imagining that
 Tom> we'd assign each individual OID ourselves, but rather give out
 Tom> blocks of OIDs. Admittedly, the blocks can't be huge, but it
 Tom> doesn't seem to me that this'd create an impossible burden for
 Tom> either us or extension developers.

Even that's not acceptable. There is no reason why someone should not be
able to create extensions freely without us ever knowing about them or
needing to.

In fact I suggest that "there shall be no registries of third parties"
be made a formal project policy.

 Tom> We could also reserve some range of OIDs for "local extensions",
 Tom> whereby people who didn't intend to publish their extensions for
 Tom> widespread use could just use some of those OIDs rather than
 Tom> having to ask for a public assignment.

That's not acceptable either; local extensions have a way of becoming
global.

Seriously, this whole idea is a lazy hack. Fixed assignments? really?

-- 
Andrew (irc:RhodiumToad)



RE: speeding up planning with partitions

2019-01-20 Thread Imai, Yoshikazu
Tsunakawa-san

On Thu, Jan 18, 2019 at 5:29 AM, Amit Langote wrote:
> On 2019/01/18 14:12, Tsunakawa, Takayuki wrote:
...
> > Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called
> > in every EXECUTE after the 6th one due to some unknown issue?
> 
> CheckCachedPlan is only called if choose_custom_plan() returns false
> resulting in generic plan being created/reused.  With plan_cache_mode
> = auto, I see it always returns true, because a custom plan containing
> a single partition to scan is way cheaper than the generic plan.
> 
> > Does choose_custom_plan() always return true after 6th EXECUTE?
> 
> Yes.

Yes.
I also checked that choose_custom_plan() always returns true except on the
6th EXECUTE, and that CheckCachedPlan() is only called on the 6th EXECUTE.

Yoshikazu Imai 



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Tom Lane
Andrew Gierth  writes:
> In fact I suggest that "there shall be no registries of third parties"
> be made a formal project policy.

You're a decade or two too late for that; see pg_statistic.h.

In any case, it's not like this issue applies to every extension anybody
might want to make.  Only quite advanced extensions would have any need
for the features that known-at-compile-time OIDs would help with, as shown
by the fact that everyone's gotten by without them so far.  And people who
have a reason to fly under the radar could always stick with the method of
doing object-name-based runtime lookups.

I also note that multiple people have asked for extensions to have stable
OIDs for other reasons.  Unfortunately, the most common reason is so that
client apps could hard-wire knowledge about type OIDs they see in query
results, and my proposal excludes being able to do that :-(.  But it's
not like nobody has wanted publicly-assigned OIDs before.

There may well be good technical reasons why we shouldn't go this route
(the extension upgrade problem in particular).  But your objection seems
basically political and I reject it as a valid argument.

> Seriously, this whole idea is a lazy hack. Fixed assignments? really?

Hardly lazy.  It's the most difficult approach (from our standpoint)
of the three I mentioned; but the flip side of that is it takes the
least work, and produces the most efficient code, for extension
developers.

regards, tom lane



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Chapman Flack
On 01/20/19 19:43, Tom Lane wrote:
>> If the extension script could somehow be informed at CREATE EXTENSION time
>> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?
> 
> And it remembers that where?

Top level answer: up to the extension author.

Next level answer: maybe not all extensions have libraries to load,
but for those that do (a good portion), wouldn't it be convenient
for _PG_init() to get the value, either passed as an argument, or
through some API?

After the extension is created, loading of the library is going to be
occasioned through the call of some function, right? That function
just got looked up, and it has an 'e' dependency recorded on the
extension, giving the extension OID.

I can't judge whether that's too much lookup action for the library
load machinery to be doing; how frequent are library loads, and how
much would that add to the cycles they already require?
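
The dependency chase itself is short (getExtensionOfObject() is the
existing helper in pg_depend.c; my_func here is hypothetical):

#include "postgres.h"
#include "fmgr.h"
#include "catalog/dependency.h"
#include "catalog/pg_proc.h"

PG_FUNCTION_INFO_V1(my_func);

Datum
my_func(PG_FUNCTION_ARGS)
{
	/* chase this function's own 'e' dependency to find its extension;
	 * yields InvalidOid if it does not belong to one */
	Oid			extoid = getExtensionOfObject(ProcedureRelationId,
											  fcinfo->flinfo->fn_oid);

	PG_RETURN_OID(extoid);
}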

Regards,
-Chap



Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Chapman Flack
On 01/20/19 17:13, Chapman Flack wrote:
> On 01/20/19 12:48, Pavel Stehule wrote:
>> regular Postgres' functions has evaluated all arguments before own
>> execution. I think so this note is related much more to expressions used as
>> defaults.
> 
> Sure, but again, is there an example, or can one easily be constructed,
> that shows the default expressions working in such a way?

To make my question more concrete, here is the current regression test
query that uses an SQL expression for a default:

SELECT xmltable.*
FROM
  xmltest2,
  xmltable(('/d/r/' || lower(_path) || 'c')
PASSING x
COLUMNS a int PATH 'x' DEFAULT ascii(_path) - 54);
 a  
----
 11
 12
 13
 14
(4 rows)


Here is the same query desugared into a call of the PL/Java "xmltable":

SELECT xmltable.*
FROM
  xmltest2,
  LATERAL (SELECT x AS ".",   ascii(_path) - 54 AS "A_DFT") AS p,
  "xmltable"(('/d/r/' || lower(_path) || 'c'),
PASSING => p,
COLUMNS => ARRAY[
  'let $a := x return xs:int(if (exists($a)) then $a else $A_DFT)'
]
  ) AS (a int);
 a  
----
 11
 12
 13
 14
(4 rows)


So the PL/Java version works and produces the same results. And yet
it certainly is a "regular PostgreSQL function" made with CREATE FUNCTION,
no special treatment of arguments, all evaluated before entry in the usual
way.

So it appears that this example does not depend on any special treatment
of the default_expression.

Is there an example that can be constructed that would depend on the
special treatment (in which case, the PL/Java implementation would be
unable to produce the same result)?

Regards,
-Chap



... the xs:int(...) cast above is needed for now, just because the PL/Java
implementation does not yet include the standard's algorithm to find
the right cast automatically.



Re: COPY FROM WHEN condition

2019-01-20 Thread Andres Freund
Hi,

On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> >>
> >>
> >> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> >> mailto:tomas.von...@2ndquadrant.com>> wrote:
> >>
> >>
> >> Can you also update the docs to mention that the functions called from
> >> the WHERE clause does not see effects of the COPY itself?
> >>
> >>
> >> /Of course, i  also add same comment to insertion method selection
> >> /
> > 
> > FWIW I've marked this as RFC and plan to get it committed this week.
> > 
> 
> Pushed, thanks for the patch.

While rebasing the pluggable storage patch ontop of this I noticed that
the qual appears to be evaluated in query context. Isn't that a bad
idea? ISTM it should have been evaluated a few lines above, before the:

/* Triggers and stuff need to be invoked in query context. */
MemoryContextSwitchTo(oldcontext);

Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?

Greetings,

Andres Freund
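
For concreteness, the reordering suggested above would look roughly like
this inside CopyFrom() (a simplified sketch against the committed patch,
not a tested change):

	ExecStoreHeapTuple(tuple, slot, false);

	if (cstate->whereClause)
	{
		econtext->ecxt_scantuple = slot;
		/* evaluate WHERE while still in the per-tuple memory context */
		if (!ExecQual(cstate->qualexpr, econtext))
			continue;		/* skip rows not satisfying the WHERE clause */
	}

	/* Triggers and stuff need to be invoked in query context. */
	MemoryContextSwitchTo(oldcontext);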



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread Tom Lane
Chapman Flack  writes:
> On 01/20/19 19:43, Tom Lane wrote:
>>> If the extension script could somehow be informed at CREATE EXTENSION time
>>> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

>> And it remembers that where?

> Top level answer: up to the extension author.

That isn't an answer, it's an admission of failure.

> Next level answer: maybe not all extensions have libraries to load,
> but for those that do (a good portion), wouldn't it be convenient
> for _PG_init() to get the value, either passed as an argument, or
> through some API?

There's no hard connection between libraries and extensions though.
In fact, right now there's no connection at all.

> After the extension is created, loading of the library is going to be
> occasioned through the call of some function, right?

There's LOAD, and there's also the possibility that the library supports
multiple extensions, or that some of its functions don't belong to an
extension.

A notable problem here is that at the point where the library (probably)
first gets loaded during CREATE EXTENSION, the extension doesn't exist
yet; or even if it does, the pg_depend entry linking the to-be-created
function certainly doesn't.

It is possible that an extension function could chase its 'e' dependency
when called (*not* when being loaded) to find out the OID of its
extension, but frankly I don't see typical extension authors doing that,
even if it didn't have failure cases.

Actually, behavior during CREATE EXTENSION seems like a bit of a problem
for the whole mapping idea --- partway through the script, you'd certainly
not know all the object IDs, so there would not be a complete map
available if one of the extension's functions gets called during that
script.  This could be worked around, but it makes things more complicated
for extension authors than I'd envisioned at first.  They can't just
assume that they have all the object OIDs available.  For the use-cases
described so far, it seems like it'd be OK to just fall back to doing
nothing if the map isn't ready yet, but we'd have to keep that restriction
in mind while defining future call scenarios.

regards, tom lane



Re: speeding up planning with partitions

2019-01-20 Thread David Rowley
On Mon, 21 Jan 2019 at 13:45, Imai, Yoshikazu
 wrote:
> On Wed, Jan 17, 2019 at 10:25 AM, Amit Langote wrote:
> > Are you saying that, when using auto mode, all executions of the query
> > starting from 7th are slower than the first 5 executions?  That is, the
> > latency of creating and using a custom plan increases *after* a generic
> > plan is created and discarded on the 6th execution of the query?  If so,
> > that is inexplicable to me.
>
> Yes. And it's also inexplicable to me.
>
> I'll check if this fact is really correct by measuring the time of the
> first 5 queries before the generic plan is created and the other queries
> after the generic plan is created.

It would be interesting to see the profiles of having the generic plan
being built on the 6th execution vs the 400,000th execution.

I'd thought maybe one difference would be the relcache hash table
having been expanded greatly after the generic plan was created, but I
see even the generic plan is selecting a random partition, so the
cache would have ended up with that many items eventually anyway, and
since we're talking on the order of 7.8k TPS with 4k partitions, it
would only have taken about 2-3 seconds out of the 60 second run
to hit most or all of those partitions anyway. So I doubt it could be
that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-01-20 Thread Tom Lane
Raúl Marín Rodríguez  writes:
>> Pushed with that correction.
> Thanks a lot!

Hm, so bowerbird (a Windows machine) has been failing the pgbench tests
since this went in, cf

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-01-20%2004%3A57%3A01

I'm just guessing, but I suspect that bowerbird is using a path spec that
includes a backslash directory separator and that's somehow bollixing
things.  If so, we might be able to fix it by converting backslashes to
forward slashes before applying quotemeta().

It's also possible that on Windows, "glob" handles backslashes
differently than it does elsewhere, which would be harder to fix
--- that would bring back my original fear that the appropriate
quoting rules are different for "glob" than for regexes.

Andrew, any insight?

regards, tom lane



Re: COPY FROM WHEN condition

2019-01-20 Thread Tomas Vondra




On 1/20/19 8:24 PM, Andres Freund wrote:
> Hi,
>
> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>
>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>> mailto:tomas.von...@2ndquadrant.com>> wrote:
>>>>
>>>>  Can you also update the docs to mention that the functions called from
>>>>  the WHERE clause does not see effects of the COPY itself?
>>>>
>>>> /Of course, i  also add same comment to insertion method selection
>>>> /
>>>
>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>
>> Pushed, thanks for the patch.
>
> While rebasing the pluggable storage patch ontop of this I noticed that
> the qual appears to be evaluated in query context. Isn't that a bad
> idea? ISTM it should have been evaluated a few lines above, before the:
>
> /* Triggers and stuff need to be invoked in query context. */
> MemoryContextSwitchTo(oldcontext);
>
> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?



Yes, I agree. It's a bit too late for me to hack and push stuff, but 
I'll fix that tomorrow.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FROM WHEN condition

2019-01-20 Thread Andres Freund
On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
> 
> 
> On 1/20/19 8:24 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> > > On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > > > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> > > > > 
> > > > > 
> > > > > On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
> > > > > 
> > > > > 
> > > > >  Can you also update the docs to mention that the functions 
> > > > > called from
> > > > >  the WHERE clause does not see effects of the COPY itself?
> > > > > 
> > > > > 
> > > > > Of course, I also added the same comment to the insertion method selection.
> > > > 
> > > > FWIW I've marked this as RFC and plan to get it committed this week.
> > > > 
> > > 
> > > Pushed, thanks for the patch.
> > 
> > While rebasing the pluggable storage patch on top of this I noticed that
> > the qual appears to be evaluated in query context. Isn't that a bad
> > idea? ISTM it should have been evaluated a few lines above, before the:
> > 
> > /* Triggers and stuff need to be invoked in query context. */
> > MemoryContextSwitchTo(oldcontext);
> > 
> > Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> > 
> 
> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
> fix that tomorrow.

NP. On second thought, the problem is probably smaller than I thought at
first, because ExecQual() switches to the econtext's per-tuple memory
context. But it's only reset once for each batch, so there's some
wastage. At least worth a comment.
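
Something like this, perhaps (just a sketch of the comment I have in
mind; the wording is mine, not from the patch):

	/*
	 * Note: the WHERE clause is evaluated in the econtext's per-tuple
	 * memory context (via ExecQual), which is only reset once per batch
	 * of buffered tuples rather than once per row, so its evaluation
	 * results accumulate until the next batch boundary.
	 */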

Greetings,

Andres Freund



Re: Allowing extensions to find out the OIDs of their member objects

2019-01-20 Thread David Fetter
On Mon, Jan 21, 2019 at 12:25:16AM +, Andrew Gierth wrote:
> > "Tom" == Tom Lane  writes:
> 
>  Tom> A larger issue is whether "hand out some OIDs on-demand" is a
>  Tom> sustainable strategy.
> 
> No.
> 
> Not for any concerns over availability of oids, but simply from the fact
> that we have no business whatsoever inserting ourselves into the
> extension development process in this way.

+1 for keeping our nose out of this business.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: COPY FROM WHEN condition

2019-01-20 Thread Andres Freund
On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
> > 
> > 
> > On 1/20/19 8:24 PM, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> > > > On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > > > > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> > > > > > 
> > > > > > 
> > > > > > On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
> > > > > > 
> > > > > > 
> > > > > >  Can you also update the docs to mention that the functions 
> > > > > > called from
> > > > > >  the WHERE clause does not see effects of the COPY itself?
> > > > > > 
> > > > > > 
> > > > > > Of course, I also added the same comment to the insertion method selection.
> > > > > 
> > > > > FWIW I've marked this as RFC and plan to get it committed this week.
> > > > > 
> > > > 
> > > > Pushed, thanks for the patch.
> > > 
> > > While rebasing the pluggable storage patch on top of this I noticed that
> > > the qual appears to be evaluated in query context. Isn't that a bad
> > > idea? ISTM it should have been evaluated a few lines above, before the:
> > > 
> > >   /* Triggers and stuff need to be invoked in query context. */
> > >   MemoryContextSwitchTo(oldcontext);
> > > 
> > > Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> > > 
> > 
> > Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
> > fix that tomorrow.
> 
> NP. On second thought, the problem is probably smaller than I thought at
> first, because ExecQual() switches to the econtext's per-tuple memory
> context. But it's only reset once for each batch, so there's some
> wastage. At least worth a comment.

I'm tired, but perhaps it's actually worse - what's being reset currently
is the EState's per-tuple context:

	if (nBufferedTuples == 0)
	{
		/*
		 * Reset the per-tuple exprcontext. We can only do this if the
		 * tuple buffer is empty. (Calling the context the per-tuple
		 * memory context is a bit of a misnomer now.)
		 */
		ResetPerTupleExprContext(estate);
	}

but the quals are evaluated in the ExprContext's:

ExecQual(ExprState *state, ExprContext *econtext)
...
ret = ExecEvalExprSwitchContext(state, econtext, &isnull);


which is created with:

/* Get an EState's per-output-tuple exprcontext, making it if first use */
#define GetPerTupleExprContext(estate) \
((estate)->es_per_tuple_exprcontext ? \
 (estate)->es_per_tuple_exprcontext : \
 MakePerTupleExprContext(estate))

and creates its own context:
	/*
	 * Create working memory for expression evaluation in this context.
	 */
	econtext->ecxt_per_tuple_memory =
		AllocSetContextCreate(estate->es_query_cxt,
							  "ExprContext",
							  ALLOCSET_DEFAULT_SIZES);

so this is currently just never reset. Seems just using
ExecQualAndReset() ought to be sufficient?
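
For reference, ExecQualAndReset() in src/include/executor/executor.h is
essentially ExecQual() followed by a reset of the per-tuple context,
roughly:

	static inline bool
	ExecQualAndReset(ExprState *state, ExprContext *econtext)
	{
		bool		ret = ExecQual(state, econtext);

		/* inline ResetExprContext, to avoid ordering issue in this file */
		MemoryContextReset(econtext->ecxt_per_tuple_memory);
		return ret;
	}

so switching to it would reset the ExprContext's own per-tuple memory
after every evaluation.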

Greetings,

Andres Freund



RE: Thread-unsafe coding in ecpg

2019-01-20 Thread Tsunakawa, Takayuki
On Windows, _configthreadlocale() enables us to restrict the effect of 
setlocale() only to the calling thread.  We can call it in 
ecpg_do_prolog/epilog().

https://docs.microsoft.com/en-us/cpp/parallel/multithreading-and-locales?view=vs-2017
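
A minimal sketch of the pattern (assuming the MSVC CRT; the exact
placement inside ecpg_do_prolog/epilog() would of course differ):

	#include <locale.h>
	#include <stdlib.h>
	#include <string.h>

	/* restrict locale changes to this thread, remembering the old mode */
	int   oldmode = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
	char *oldloc  = strdup(setlocale(LC_NUMERIC, NULL));

	setlocale(LC_NUMERIC, "C");		/* standard decimal point for I/O */
	/* ... execute the statement ... */
	setlocale(LC_NUMERIC, oldloc);	/* restore the caller's locale */
	_configthreadlocale(oldmode);
	free(oldloc);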

Regards
Takayuki Tsunakawa





Re: Thread-unsafe coding in ecpg

2019-01-20 Thread Tom Lane
"Tsunakawa, Takayuki"  writes:
> On Windows, _configthreadlocale() enables us to restrict the effect of 
> setlocale() only to the calling thread.  We can call it in 
> ecpg_do_prolog/epilog().

How far back does that exist?

regards, tom lane



Re: Feature: temporary materialized views

2019-01-20 Thread Andreas Karlsson

On 1/17/19 8:31 PM, Tom Lane wrote:

Creating the view object inside the rStartup callback is itself pretty
much of a kluge; you'd expect that to happen earlier.  I think the
reason it was done that way was it was easier to find out the view's
column set there, but I'm sure we can find another way --- doing the
object creation more like a regular view does it is the obvious approach.


Here is a stab at refactoring this so that the object creation does not 
happen in a callback. I am not that fond of the new API, but given how 
different all the various callers of CreateIntoRelDestReceiver() are I 
had no better idea.


The idea behind the patch is to always create the empty 
table/materialized view before executing the query, doing so in a single 
unified code path, and then populate it later unless NO DATA was 
specified. I feel this makes the code easier to follow.


This patch introduces a small behavioral change, as can be seen from the 
diff in the test suite: since the object creation is moved earlier, the 
CTAS query snapshot will, for example, see the newly created table. 
The new behavior seems more correct to me, but maybe I am missing some 
unintended consequences.


Andreas
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5947996d67..6ee96628cf 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -55,16 +55,15 @@ typedef struct
 {
 	DestReceiver pub;			/* publicly-known function pointers */
 	IntoClause *into;			/* target relation specification */
+	ObjectAddress reladdr;		/* address of rel, for intorel_startup */
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
-	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			hi_options;		/* heap_insert performance options */
 	BulkInsertState bistate;	/* bulk insert state */
 } DR_intorel;
 
-/* utility functions for CTAS definition creation */
-static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into);
+/* utility function for CTAS definition creation */
 static ObjectAddress create_ctas_nodata(List *tlist, IntoClause *into);
 
 /* DestReceiver routines for collecting data */
@@ -75,16 +74,18 @@ static void intorel_destroy(DestReceiver *self);
 
 
 /*
- * create_ctas_internal
+ * create_ctas_nodata
  *
- * Internal utility used for the creation of the definition of a relation
- * created via CREATE TABLE AS or a materialized view.  Caller needs to
- * provide a list of attributes (ColumnDef nodes).
+ * Create CTAS or materialized view without the data, starting from the
+ * targetlist of the SELECT or view definition.
  */
 static ObjectAddress
-create_ctas_internal(List *attrList, IntoClause *into)
+create_ctas_nodata(List *tlist, IntoClause *into)
 {
-	CreateStmt *create = makeNode(CreateStmt);
+	List	   *attrList;
+	ListCell   *t,
+			   *lc;
+	CreateStmt *create;
 	bool		is_matview;
 	char		relkind;
 	Datum		toast_options;
@@ -96,71 +97,6 @@ create_ctas_internal(List *attrList, IntoClause *into)
 	relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION;
 
 	/*
-	 * Create the target relation by faking up a CREATE TABLE parsetree and
-	 * passing it to DefineRelation.
-	 */
-	create->relation = into->rel;
-	create->tableElts = attrList;
-	create->inhRelations = NIL;
-	create->ofTypename = NULL;
-	create->constraints = NIL;
-	create->options = into->options;
-	create->oncommit = into->onCommit;
-	create->tablespacename = into->tableSpaceName;
-	create->if_not_exists = false;
-
-	/*
-	 * Create the relation.  (This will error out if there's an existing view,
-	 * so we don't need more code to complain if "replace" is false.)
-	 */
-	intoRelationAddr = DefineRelation(create, relkind, InvalidOid, NULL, NULL);
-
-	/*
-	 * If necessary, create a TOAST table for the target table.  Note that
-	 * NewRelationCreateToastTable ends with CommandCounterIncrement(), so
-	 * that the TOAST table will be visible for insertion.
-	 */
-	CommandCounterIncrement();
-
-	/* parse and validate reloptions for the toast table */
-	toast_options = transformRelOptions((Datum) 0,
-		create->options,
-		"toast",
-		validnsps,
-		true, false);
-
-	(void) heap_reloptions(RELKIND_TOASTVALUE, toast_options, true);
-
-	NewRelationCreateToastTable(intoRelationAddr.objectId, toast_options);
-
-	/* Create the "view" part of a materialized view. */
-	if (is_matview)
-	{
-		/* StoreViewQuery scribbles on tree, so make a copy */
-		Query	   *query = (Query *) copyObject(into->viewQuery);
-
-		StoreViewQuery(intoRelationAddr.objectId, query, false);
-		CommandCounterIncrement();
-	}
-
-	return intoRelationAddr;
-}
-
-
-/*
- * create_ctas_nodata
- *
- * Create CTAS or materialized view when WITH NO DATA is used, starting from
- * the targetlist of the SELECT or view definition.
- */
-static ObjectAddress
-create_ctas_nodata(List *tlist, I

Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Chapman Flack
On 01/20/19 20:07, Chapman Flack wrote:

> So it appears that this example does not depend on any special treatment
> of the default_expression.
> 
> Is there an example that can be constructed that would depend on the
> special treatment (in which case, the PL/Java implementation would be
> unable to produce the same result)?

I see that I briefly forgot one difference that does matter, namely the
timing of a call to a volatile function like nextval. That detail certainly
needs to remain in the docs.

I am left wondering if that might be the only effect of the deferred
argument evaluation that really does matter.

Regards,
-Chap



Re: Feature: temporary materialized views

2019-01-20 Thread Andreas Karlsson

On 1/18/19 8:32 PM, Mitar wrote:

On Fri, Jan 18, 2019 at 7:18 AM Andreas Karlsson  wrote:

These rules are usually pretty easy to add. Just take a look in
src/bin/psql/tab-complete.c to see how it is usually done.


Thanks. I have added the auto-complete and attached a new patch.


Hm, I do not think we should complete UNLOGGED MATERIALIZED VIEW even 
though it is valid syntax. If you try to create one you will just get an 
error. I am leaning towards removing the existing completion for this, 
because I do not see the point of completing to useless but technically 
valid syntax.


This is the one I think we should probably remove:

else if (TailMatches("CREATE", "UNLOGGED"))
COMPLETE_WITH("TABLE", "MATERIALIZED VIEW");


I might take a stab at refactoring this myself this weekend. Hopefully
it is not too involved.


That would be great! I can afterwards update the patch accordingly.


I have submitted a first shot at this. Let's see what others think of my 
patch.


Andreas



RE: Thread-unsafe coding in ecpg

2019-01-20 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> "Tsunakawa, Takayuki"  writes:
> > On Windows, _configthreadlocale() enables us to restrict the effect of
> setlocale() only to the calling thread.  We can call it in
> ecpg_do_prolog/epilog().
> 
> How far back does that exist?

I couldn't find the relevant doc, but I've just confirmed that I can use it 
with Visual Studio 2008 on Win7, which is the oldest combination I have at 
hand.  VS 2008 is already past its EOL, and support for Win7 will end next 
year, so that combination is sufficient in practice.

Regards
Takayuki Tsunakawa





Re: Thread-unsafe coding in ecpg

2019-01-20 Thread Tom Lane
"Tsunakawa, Takayuki"  writes:
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
>> How far back does that exist?

> I couldn't find the relevant doc, but I've just confirmed that I can use it 
> with Visual Studio 2008 on Win7, which is the oldest combination I have at 
> hand.  VS 2008 is already past its EOL, and support for Win7 will end next 
> year, so that combination is sufficient in practice.

Hm.  Well, I suppose we can figure that the buildfarm should tell us
if there's anything too old that we still care about.

So like this ...

regards, tom lane

diff --git a/configure b/configure
index 7602e65..1e69eda 100755
*** a/configure
--- b/configure
*** fi
*** 15209,15215 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15209,15215 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index d599ad8..556186c 100644
*** a/configure.in
--- b/configure.in
*** AC_CHECK_FUNCS(m4_normalize([
*** 1618,1623 
--- 1618,1624 
  	strsignal
  	symlink
  	sync_file_range
+ 	uselocale
  	utime
  	utimes
  	wcstombs_l
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 9d99816..2c899a1 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 691,696 
--- 691,699 
  /* Define to 1 if the system has the type `unsigned long long int'. */
  #undef HAVE_UNSIGNED_LONG_LONG_INT
  
+ /* Define to 1 if you have the `uselocale' function. */
+ #undef HAVE_USELOCALE
+ 
  /* Define to 1 if you have the `utime' function. */
  #undef HAVE_UTIME
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 8a560ef..3964433 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 545,550 
--- 545,553 
  /* Define to 1 if you have the `unsetenv' function. */
  /* #undef HAVE_UNSETENV */
  
+ /* Define to 1 if you have the `uselocale' function. */
+ /* #undef HAVE_USELOCALE */
+ 
  /* Define to 1 if you have the `utime' function. */
  #define HAVE_UTIME 1
  
diff --git a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
index 1c9bce1..41851d5 100644
*** a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
--- b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
***
*** 12,17 
--- 12,20 
  #ifndef CHAR_BIT
  #include <limits.h>
  #endif
+ #ifdef LOCALE_T_IN_XLOCALE
+ #include 
+ #endif
  
  enum COMPAT_MODE
  {
*** struct statement
*** 61,67 
--- 64,78 
  	bool		questionmarks;
  	struct variable *inlist;
  	struct variable *outlist;
+ #ifdef HAVE_USELOCALE
+ 	locale_t	clocale;
+ 	locale_t	oldlocale;
+ #else
  	char	   *oldlocale;
+ #ifdef WIN32
+ 	int			oldthreadlocale;
+ #endif
+ #endif
  	int			nparams;
  	char	  **paramvalues;
  	PGresult   *results;
diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index 3f5034e..f67d774 100644
*** a/src/interfaces/ecpg/ecpglib/execute.c
--- b/src/interfaces/ecpg/ecpglib/execute.c
*** free_statement(struct statement *stmt)
*** 102,108 
--- 102,113 
  	free_variable(stmt->outlist);
  	ecpg_free(stmt->command);
  	ecpg_free(stmt->name);
+ #ifdef HAVE_USELOCALE
+ 	if (stmt->clocale)
+ 		freelocale(stmt->clocale);
+ #else
  	ecpg_free(stmt->oldlocale);
+ #endif
  	ecpg_free(stmt);
  }
  
*** ecpg_do_prologue(int lineno, const int c
*** 1771,1778 
  
  	/*
  	 * Make sure we do NOT honor the locale for numeric input/output since the
! 	 * database wants the standard decimal point
  	 */
  	stmt->oldlocale = ecpg_strdup(setlocale(LC_NUMERIC, NULL), lineno);
  	if (stmt->oldlocale == NULL)
  	{
--- 1776,1806 
  
  	/*
  	 * Make sure we do NOT honor the locale for numeric input/output since the
! 	 * database wants the standard decimal point.  If available, use
! 	 * uselocale() for this because it's thread-safe.
  	 */
+ #ifdef HAVE_USELOCALE
+ 	stmt->clocale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
+ 	if (stmt->clocale

Re: COPY FROM WHEN condition

2019-01-20 Thread Tomas Vondra



On 1/21/19 3:12 AM, Andres Freund wrote:
> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>
>>>
>>> On 1/20/19 8:24 PM, Andres Freund wrote:
 Hi,

 On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>
>>>
>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
>>>
>>>
>>>  Can you also update the docs to mention that the functions called 
>>> from
>>>  the WHERE clause does not see effects of the COPY itself?
>>>
>>>
>>> Of course, I also added the same comment to the insertion method selection.
>>
>> FWIW I've marked this as RFC and plan to get it committed this week.
>>
>
> Pushed, thanks for the patch.

 While rebasing the pluggable storage patch on top of this I noticed that
 the qual appears to be evaluated in query context. Isn't that a bad
 idea? ISTM it should have been evaluated a few lines above, before the:

/* Triggers and stuff need to be invoked in query context. */
MemoryContextSwitchTo(oldcontext);

 Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?

>>>
>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>> fix that tomorrow.
>>
>> NP. On second thought, the problem is probably smaller than I thought at
>> first, because ExecQual() switches to the econtext's per-tuple memory
>> context. But it's only reset once for each batch, so there's some
>> wastage. At least worth a comment.
> 
> I'm tired, but perhaps it's actually worse - what's being reset currently
> is the EState's per-tuple context:
> 
> 	if (nBufferedTuples == 0)
> 	{
> 		/*
> 		 * Reset the per-tuple exprcontext. We can only do this if the
> 		 * tuple buffer is empty. (Calling the context the per-tuple
> 		 * memory context is a bit of a misnomer now.)
> 		 */
> 		ResetPerTupleExprContext(estate);
> 	}
> 
> but the quals are evaluated in the ExprContext's:
> 
> ExecQual(ExprState *state, ExprContext *econtext)
> ...
>   ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> 
> 
> which is created with:
> 
> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> #define GetPerTupleExprContext(estate) \
>   ((estate)->es_per_tuple_exprcontext ? \
>(estate)->es_per_tuple_exprcontext : \
>MakePerTupleExprContext(estate))
> 
> and creates its own context:
> 	/*
> 	 * Create working memory for expression evaluation in this context.
> 	 */
> 	econtext->ecxt_per_tuple_memory =
> 		AllocSetContextCreate(estate->es_query_cxt,
> 							  "ExprContext",
> 							  ALLOCSET_DEFAULT_SIZES);
> 
> so this is currently just never reset.

So context, much simple. Wow.

> Seems just using ExecQualAndReset() ought to be sufficient?
> 

Seems like it.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FROM WHEN condition

2019-01-20 Thread Tomas Vondra



On 1/21/19 3:12 AM, Andres Freund wrote:
> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>
>>>
>>> On 1/20/19 8:24 PM, Andres Freund wrote:
 Hi,

 On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>
>>>
>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
>>>
>>>
>>>  Can you also update the docs to mention that the functions called 
>>> from
>>>  the WHERE clause does not see effects of the COPY itself?
>>>
>>>
>>> Of course, I also added the same comment to the insertion method selection.
>>
>> FWIW I've marked this as RFC and plan to get it committed this week.
>>
>
> Pushed, thanks for the patch.

 While rebasing the pluggable storage patch on top of this I noticed that
 the qual appears to be evaluated in query context. Isn't that a bad
 idea? ISTM it should have been evaluated a few lines above, before the:

/* Triggers and stuff need to be invoked in query context. */
MemoryContextSwitchTo(oldcontext);

 Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?

>>>
>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>> fix that tomorrow.
>>
>> NP. On second thought, the problem is probably smaller than I thought at
>> first, because ExecQual() switches to the econtext's per-tuple memory
>> context. But it's only reset once for each batch, so there's some
>> wastage. At least worth a comment.
> 
> I'm tired, but perhaps it's actually worse - what's being reset currently
> is the EState's per-tuple context:
> 
> 	if (nBufferedTuples == 0)
> 	{
> 		/*
> 		 * Reset the per-tuple exprcontext. We can only do this if the
> 		 * tuple buffer is empty. (Calling the context the per-tuple
> 		 * memory context is a bit of a misnomer now.)
> 		 */
> 		ResetPerTupleExprContext(estate);
> 	}
> 
> but the quals are evaluated in the ExprContext's:
> 
> ExecQual(ExprState *state, ExprContext *econtext)
> ...
>   ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> 
> 
> which is created with:
> 
> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> #define GetPerTupleExprContext(estate) \
>   ((estate)->es_per_tuple_exprcontext ? \
>(estate)->es_per_tuple_exprcontext : \
>MakePerTupleExprContext(estate))
> 
> and creates its own context:
> 	/*
> 	 * Create working memory for expression evaluation in this context.
> 	 */
> 	econtext->ecxt_per_tuple_memory =
> 		AllocSetContextCreate(estate->es_query_cxt,
> 							  "ExprContext",
> 							  ALLOCSET_DEFAULT_SIZES);
> 
> so this is currently just never reset.

Actually, no. The ResetPerTupleExprContext boils down to

MemoryContextReset((econtext)->ecxt_per_tuple_memory)

and ExecEvalExprSwitchContext does this

MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

So it's resetting the right context, although only on batch boundary.
But now I see 31f38174 does this:

	else if (cstate->whereClause != NULL ||
			 contain_volatile_functions(cstate->whereClause))
	{
		...
		insertMethod = CIM_SINGLE;
	}

so it does not do batching with WHERE. But the condition seems wrong, I
guess it should be && instead of ||. Will investigate in the morning.
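
That is, presumably the intended check is (a sketch, pending
investigation):

	else if (cstate->whereClause != NULL &&
			 contain_volatile_functions(cstate->whereClause))
	{
		...
		insertMethod = CIM_SINGLE;
	}

With ||, any non-NULL WHERE clause forces CIM_SINGLE (the volatility
test is never even reached); with &&, only a volatile WHERE clause
disables batching.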

> Seems just using ExecQualAndReset() ought to be sufficient?
> 

That may still be the right thing to do.


cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: speeding up planning with partitions

2019-01-20 Thread Amit Langote
Imai-san,

On 2019/01/21 9:45, Imai, Yoshikazu wrote:
> I'll check if this fact is really correct by measuring the time of the
> first 5 queries before the generic plan is created and the other queries
> after the generic plan is created.

That would really help.  If you are able to recreate that behavior
consistently, that might be something to try to fix at some point.

Thanks,
Amit




Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-20 Thread David Rowley
On Tue, 15 Jan 2019 at 13:17, Tom Lane  wrote:
>
> David Rowley  writes:
> > I also did a quick benchmark of v6 and found the slowdown to be
> > smaller after the change made in build_simple_rel()
>
> Thanks for confirming.  I was not very sure that was worth the extra
> few bytes of code space, but if you see a difference too, then it's
> probably worthwhile.

It occurred to me that a common case where you'll hit the new code is
INSERT INTO ... VALUES.

I thought I'd better test this, so I carefully designed the following
table so it would have as little INSERT overhead as possible.

create table t();

With fsync=off and a truncate between each pgbench run.

insert.sql = insert into t default values;

Unpatched:

$ pgbench -n -f insert.sql -T 60 postgres
tps = 27986.757396 (excluding connections establishing)
tps = 28220.905728 (excluding connections establishing)
tps = 28234.331176 (excluding connections establishing)
tps = 28254.392421 (excluding connections establishing)
tps = 28691.946948 (excluding connections establishing)

Patched:

tps = 28426.183388 (excluding connections establishing)
tps = 28464.517261 (excluding connections establishing)
tps = 28505.178616 (excluding connections establishing)
tps = 28414.275662 (excluding connections establishing)
tps = 28648.103349 (excluding connections establishing)

The patch seems to average out slightly faster on those runs, but the
variance is around the noise level.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



RE: Thread-unsafe coding in ecpg

2019-01-20 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Hm.  Well, I suppose we can figure that the buildfarm should tell us if
> there's anything too old that we still care about.
> 
> So like this ...

How quick!  Thank you.  I've reviewed the code for both Unix and Windows, and 
it looks OK.  I haven't built the patch, but expect the buildfarm to do the 
test.


Regards
Takayuki Tsunakawa






RE: Libpq support to connect to standby server as priority

2019-01-20 Thread Tsunakawa, Takayuki
From: Laurenz Albe [mailto:laurenz.a...@cybertec.at]
> Tsunakawa, Takayuki wrote:
> > I'm sorry to repeat myself, but anyway, I think we need a method to connect
> to a standby
> > as the original desire, because the primary instance may be read only
> by default while
> > only limited users update data.  That's for reducing the burdon on the
> primary and
> > minimizing the impact on users who update data.  For example,
> >
> > * run data reporting on the standby
> > * backup the database from the standby
> > * cascade replication from the standby
> 
> I see.
> 
> But then the new value should not be called "prefer-read", because that
> would be misleading.  It would also not be related to the existing
> "read-write".
> 
> For what you have in mind, there should be the options "primary-required"
> and "standby-preferred", however we implement them.

Yes, that's what I'm proposing and expecting with a new parameter whose naming 
follows PgJDBC's. 

> Have there been a lot of complaints that the existing "read-write" is not
> good enough to detect replication primaries?

I haven't heard anything.  I guess almost nobody uses 
default_transaction_read_only.

Before that, see the description of target_session_attr:

https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-PARAMKEYWORDS

I'm afraid most users don't know whether they can connect to the 
primary/standby.  Just searching "primary", "master" or "standby" in this page 
doesn't show anything relevant.


> One use case I can think of is logical replication (or other replication
> methods like Slony).  You can use the feature by setting
> default_transaction_read_only = on on the standby.

I know that, but I suspect that's not really a practical use case.  Anyway, 
I'm OK with relying on target_session_attr to fulfill that need.


Regards
Takayuki Tsunakawa





Re: PostgreSQL vs SQL/XML Standards

2019-01-20 Thread Pavel Stehule
On Sun, Jan 20, 2019 at 23:13, Chapman Flack wrote:

> On 01/20/19 12:48, Pavel Stehule wrote:
> >>
> >> Accordingly, I think the paragraph beginning "Unlike regular PostgreSQL
> >> functions," is more likely to perplex readers than to enlighten them.
> >> What it says about column_expression does not seem to lead to any useful
> >> difference from the behavior if it were "just like regular PostgreSQL
> >> functions".
> >
> > Regular Postgres functions have all their arguments evaluated before
> > their own execution. I think this note relates much more to expressions
> > used as defaults.
>
> Sure, but again, is there an example, or can one easily be constructed,
> that shows the default expressions working in such a way?
>
> I am not able to use a default expression to refer to an earlier
> column in the column list of the xmltable call.
>


You can probably see the effect if you use some volatile function such as
random() or nextval().

I think the notice in the documentation was not meant as a motivation to
use it. It was an explanation of the implementation and a warning against
side effects.


>
> I am able to use a default expression to refer to a column of an earlier
> FROM item in the enclosing SELECT. But such a query ends up having LATERAL
> form (whether or not the word LATERAL is used), and re-executes xmltable
> whenever the referenced column value changes. In that case, whether the
> default argument is evaluated at function entry or later doesn't seem
> to matter: the function is re-executed, so evaluating the new default
> at the time of entry is sufficient.
>

It makes sense only for volatile functions, which is not a frequent case.
On the other hand, deferred evaluation should not be a problem; moreover,
the default is evaluated only when necessary, which seems more sensible
to me.


> So, I have still not been able to construct a query that requires the
> deferred evaluation behavior. But perhaps there is a way I haven't
> thought of.
>
> Regards,
> -Chap
>


RE: Libpq support to connect to standby server as priority

2019-01-20 Thread Tsunakawa, Takayuki
From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> Thanks for finding the problem; how about the following way of checking
> for prefer-read/prefer-standby?
> 
> 1. (default_transaction_read_only = true and recovery = true)
> 
> 2. If none of the hosts satisfies the above scenario, then recovery = true
> 3. Last check is for default_transaction_read_only = true

That would be fine.  But as I mentioned in another mail, I think "get a 
read-only session" and "connect to a standby" differ.  So I find it better 
to separate the parameters for those requests: target_session_attr and 
target_server_type.


Regards
Takayuki Tsunakawa




RE: Protect syscache from bloating with negative cache entries

2019-01-20 Thread Tsunakawa, Takayuki
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> 0003: Remote GUC setting
> 
> It is independent from the above two, and heavily arguable.
> 
> pg_set_backend_config(pid, name, value) changes the GUC <name> on the
> backend with <pid> to <value>.
> 

Not having looked at the code yet: why do you think this is necessary?  
Can't we always collect the cache stats?  Is it heavy due to some locking 
in shared memory, or due to sending the stats to the stats collector?


Regards
Takayuki Tsunakawa




Re: Protect syscache from bloating with negative cache entries

2019-01-20 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 18 Jan 2019 17:09:41 -0800, "and...@anarazel.de"  
wrote in <20190119010941.6ruftewah7t3k...@alap3.anarazel.de>
> Hi,
> 
> On 2019-01-18 19:57:03 -0500, Robert Haas wrote:
> > On Fri, Jan 18, 2019 at 4:23 PM and...@anarazel.de  
> > wrote:
> > > My proposal for this was to attach a 'generation' to cache entries. Upon
> > > access cache entries are marked to be of the current
> > > generation. Whenever existing memory isn't sufficient for further cache
> > > entries and, on a less frequent schedule, triggered by a timer, the
> > > cache generation is increased and th new generation's "creation time" is
> > > measured.  Then generations that are older than a certain threshold are
> > > purged, and if there are any, the entries of the purged generation are
> > > removed from the caches using a sequential scan through the cache.
> > >
> > > This outline achieves:
> > > - no additional time measurements in hot code paths

The timestamp is taken at every transaction start and stored as a
TimestampTz in this patch. So no additional time measurement is added,
but cache pruning won't happen if a transaction lives for a long time. A
time-driven generation value, maybe with a fixed 10s-1min interval, is a
possible option.
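
For instance (a hypothetical sketch; GEN_INTERVAL_USEC and the variable
names are mine, not from the patch):

	/* coarse clock-driven generation, e.g. 30-second buckets */
	#define GEN_INTERVAL_USEC	(30 * USECS_PER_SEC)

	uint32		catcache_generation =
		(uint32) (GetCurrentTimestamp() / GEN_INTERVAL_USEC);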

> > > - no need for a sequential scan of the entire cache when no generations
> > >   are too old

This patch doesn't precheck against the oldest generation, but that can
be easily calculated. (It is based not on the creation time but on the
last-access time.) (The attached applies over the
v7-0001-Remove-entries-..patch)

Using generation time, entries are purged even if they were recently
accessed. I think last-accessed time is more suitable for the purpose. On
the other hand, using last-accessed time, the oldest generation can
become stale due to later accesses.

> > > - both size and time limits can be implemented reasonably cheaply
> > > - overhead when feature disabled should be close to zero

Overhead when disabled is already zero, since scanning is inhibited when
cache_prune_min_age is a negative value.

> > Seems generally reasonable.  The "whenever existing memory isn't
> > sufficient for further cache entries" part I'm not sure about.
> > Couldn't that trigger very frequently and prevent necessary cache size
> > growth?
> 
> I'm thinking it'd just trigger a new generation, with its associated
> "creation" time (which is cheap to acquire in comparison to creating a
> number of cache entries). Depending on settings or just code policy we
> can decide up to which generation to prune the cache, using that
> creation time.  I'd imagine that we'd have some default cache-pruning
> time in the minutes, and for workloads where relevant one can make
> sizing configurations more aggressive - or something like that.

The current patch uses the last-accessed time, obtained by a
non-gettimeofday() method. The number of generations is fixed at 3, and
infrequently-accessed entries are removed sooner. The generation
interval is determined by cache_prune_min_age.

Although this doesn't put a hard cap on memory usage, it is indirectly
and softly limited by cache_prune_min_age and cache_memory_target, which
determine how large a cache can grow before pruning happens. They apply
on a per-cache basis.

If we prefer to set a budget on all the syscaches (or even including
other caches), it would be more complex.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 4a3b3094a0..8274704af7 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -859,6 +859,7 @@ InitCatCache(int id,
 	for (i = 0; i < nkeys; ++i)
 		cp->cc_keyno[i] = key[i];
 	cp->cc_tupsize = 0;
+	cp->cc_noprune_until = 0;
 
 	/*
 	 * new cache is initialized as far as we can go for now. print some
@@ -898,6 +899,7 @@ CatCacheCleanupOldEntries(CatCache *cp)
 	int			i;
 	int			nremoved = 0;
 	size_t		hash_size;
+	TimestampTz oldest_lastaccess = 0;
 #ifdef CATCACHE_STATS
 	/* These variables are only for debugging purpose */
 	int			ntotal = 0;
@@ -918,6 +920,10 @@ CatCacheCleanupOldEntries(CatCache *cp)
 	if (cache_prune_min_age < 0)
 		return false;
 
+	/* Return immediately if apparently no entry to remove */
+	if (cp->cc_noprune_until == 0 || catcacheclock <= cp->cc_noprune_until)
+		return false;
+
 	/*
 	 * Return without pruning if the size of the hash is below the target.
 	 */
@@ -939,6 +945,7 @@ CatCacheCleanupOldEntries(CatCache *cp)
 			CatCTup*ct = dlist_container(CatCTup, cache_elem, iter.cur);
 			long entry_age;
 			int us;
+			bool removed = false;
 
 
 			/*
@@ -982,12 +989,24 @@ CatCacheCleanupOldEntries(CatCache *cp)
 	{
 		CatCacheRemoveCTup(cp, ct);
 		nremoved++;
+		removed = true;
 	}
 }
 			}
+
+			/* Take the oldest lastaccess among survived entries */
+			if (!removed &&
+(oldest_lastaccess == 0 || ct->lastaccess < oldest_lastaccess))
+oldest_lastaccess = ct->lastaccess;
 		}
 	}
 
+	/* Calculate