Re: [HACKERS] Allow interrupts on waiting standby

2017-03-30 Thread Tsunakawa, Takayuki
Hi Michael, Simon,

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> > Oh, I see.  But how does the startup process respond quickly?  It seems
> that you need to call HandleStartupProcInterrupts() instead of
> CHECK_FOR_INTERRUPTS().  But I'm not sure whether
> HandleStartupProcInterrupts() can be called here.
> 
> Bah. Of course you are right. We don't care about SetLatch() here as signals
> are processed with a different code path than normal backends.

So, I understood you agreed that CHECK_FOR_INTERRUPTS() here does nothing.  But 
your patch still calls it:

+   /* An interrupt may have occurred while waiting */
+   CHECK_FOR_INTERRUPTS();

I got confused because the problem is not defined in this thread.  What problem 
does this patch address?  These ones?

* The startup process terminates as soon as postmaster dies.
* pg_stat_activity does not show the wait event of startup process waiting for 
a recovery conflict resolution.


My guess about why you decided to not call HandleStartupProcInterrupts() here 
is:

* Respond to postmaster death here.
* No need to reload the config file here because there's no parameter affecting 
this conflict wait.  But max_standby_{archive | streaming}_delay seems to 
affect the wait period.
* No need to handle SIGTERM and exit here, because the startup process doesn't 
wait for a conflict resolution here when it can terminate.

I think you can call HandleStartupProcInterrupts() here, instead of checking 
postmaster death.  Did you perform tests?
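To illustrate what I mean, here is a rough sketch (not actual PostgreSQL code;
the loop condition is a placeholder) of how the conflict wait loop could poll
HandleStartupProcInterrupts() instead of only checking postmaster death:

    /*
     * Hypothetical wait loop: HandleStartupProcInterrupts() already handles
     * SIGHUP, SIGTERM and postmaster death for the startup process, so no
     * separate PostmasterIsAlive() check would be needed here.
     */
    while (!conflict_resolved)          /* placeholder condition */
    {
        HandleStartupProcInterrupts();
        pg_usleep(standbyWait_us);      /* back off before re-checking */
    }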

Did Simon's committed patch solve the problem as expected?

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG

2017-03-30 Thread Ideriha, Takeshi
Thank you for the prompt check!

>As per above test steps, it doesn't produce the results and doesn't
>generate the error also. I feel this needs to be fixed.

>As we are at the end of commitfest, it is better you can move it
>to next one commitfest and provide an updated patch to solve the
>above problem.

I totally agree.
I have moved it to the next CF with the status "Waiting on Author".

Regards,
Ideriha Takeshi


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-30 Thread Fabien COELHO


Hello Tom,


Patch applies cleanly. Make check ok. Feature still works!


Idem for v30.

[...] Aside from cosmetic changes, I've made it behave reasonably for 
cases where \if is used on portions of a query, for instance


SELECT
\if :something
   var1
\else
   var2
\endif
FROM table;


This is commendable, but I would not have bothered, although it is more 
cpp-like with it.


A small issue I see is that I was planning to add such an if syntax to 
pgbench (well, at least if I succeed in getting boolean expressions and 
setting variables, which is just a maybe), but this kind of if in the 
middle of an expression does not make much sense for a pgbench script, where 
"if" must be evaluated at execution time, not parse time.



which as I mentioned a long time ago is something that people will
certainly expect to work.


I would not have expected it to work, but indeed other people could. 
Sometimes I try something with pg and it does not work as I hoped. That is 
life.


I also cleaned up a lot of corner-case discrepancies between how much 
text is consumed in active-branch and inactive-branch cases (OT_FILEPIPE 
is a particularly nasty case in that regard :-()


Indeed.


I plan to read this over again tomorrow and then push it, if there are
not objections/corrections.


My small objection is that an eventual if in pgbench, with separate 
parsing and execution, will not work in the middle of queries as this one does. 
Do you think that such a discrepancy would be admissible?


--
Fabien.




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Craig Ringer
On 30 March 2017 at 11:34, Craig Ringer  wrote:
> On 29 March 2017 at 23:13, Simon Riggs  wrote:
>> On 29 March 2017 at 10:17, Craig Ringer  wrote:
>>> On 29 March 2017 at 16:44, Craig Ringer  wrote:
>>>
 * Split oldestCatalogXmin tracking into separate patch
>>>
>>> Regarding this, Simon raised concerns about xlog volume here.
>>>
>>> It's pretty negligible.
>>>
>>> We only write a new record when a vacuum runs after catalog_xmin
>>> advances on the slot with the currently-lowest catalog_xmin (or, if
>>> vacuum doesn't run reasonably soon, when the bgworker next looks).
>>
>> I'd prefer to slow things down a little, not be so eager.
>>
>> If we hold back update of the catalog_xmin until when we run
>> GetRunningTransactionData() we wouldn't need to produce any WAL
>> records at all AND we wouldn't need to have VACUUM do
>> UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra
>> task.
>>
>> That would also make this patch about half the length it is.
>>
>> Let me know what you think.
>
> Good idea.
>
> We can always add a heuristic later to make xl_running_xacts get
> emitted more often at high transaction rates if it's necessary.
>
> Patch coming soon.

Attached.

A bit fiddlier than expected, but I like the result more.

In the process I identified an issue with both the prior patch and
this one where we don't check slot validity for slots that existed on
standby prior to promotion of standby to master. We were just assuming
that being the master was good enough, since it controls
replication_slot_catalog_xmin, but that's not true for pre-existing
slots.

Fixed by forcing update of the persistent safe catalog xmin after the
first slot is created on the master - which is now done by doing an
immediate LogStandbySnapshot() after assigning the slot's
catalog_xmin.
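For clarity, the shape of that change is roughly the following (a simplified
sketch, not the exact patch hunk; the surrounding context is abbreviated):

    /* after assigning the new slot's catalog_xmin ... */
    SpinLockAcquire(&slot->mutex);
    slot->data.catalog_xmin = xmin_horizon;
    slot->effective_catalog_xmin = xmin_horizon;
    SpinLockRelease(&slot->mutex);

    /* recompute the global catalog_xmin from all slots */
    ReplicationSlotsComputeRequiredXmin(false);

    /*
     * Immediately log a standby snapshot so the persistent safe
     * catalog_xmin advances right away instead of waiting for the next
     * periodic xl_running_xacts record.
     */
    if (!RecoveryInProgress())
        (void) LogStandbySnapshot();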

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 0df4f4ae04f8d37c623d3a533699966c3cc0479a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH v2] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in the next xl_running_xacts records, so
vacuum of catalogs may be held back up to 10 seconds when a replication slot
with catalog_xmin is holding down the global catalog_xmin.
---
 src/backend/access/heap/rewriteheap.c   |  3 +-
 src/backend/access/rmgrdesc/standbydesc.c   |  5 ++-
 src/backend/access/transam/varsup.c |  1 -
 src/backend/access/transam/xlog.c   | 26 ++-
 src/backend/replication/logical/logical.c   | 54 +++
 src/backend/replication/walreceiver.c   |  2 +-
 src/backend/replication/walsender.c | 13 ++
 src/backend/storage/ipc/procarray.c | 68 +++--
 src/backend/storage/ipc/standby.c   | 25 +++
 src/bin/pg_controldata/pg_controldata.c |  2 +
 src/include/access/transam.h| 11 +
 src/include/catalog/pg_control.h|  1 +
 src/include/storage/procarray.h |  3 +-
 src/include/storage/standby.h   |  6 +++
 src/include/storage/standbydefs.h   |  1 +
 src/test/recovery/t/006_logical_decoding.pl | 15 ++-
 16 files changed, 214 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 278546a..4aaae59 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -21,10 +21,11 @@ standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec)
 {
 	int			i;
 
-	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u",
+	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u oldestCatalogXmin %u",
 	 xlrec->nextXid,
 	 xlrec->latestCompletedXid,
-	 xlrec->oldestRunningXid);
+	 xlrec->oldestRunningXid,
+	 xlrec->oldestCatalogXmin);
 	if (xlrec->xcnt > 0)
 	{
 		appendStringInfo(buf, "; %d xacts:", xlrec->xcnt);
diff --git a/src/backend/access/transam/varsup.c b/src/b

[HACKERS] inconsistent page found on STANDBY server

2017-03-30 Thread Ashutosh Sharma
Hi All,

When running make installcheck on a master with wal consistency check
enabled, inconsistent page is detected on standby. I could see the
following FATAL message in the standby server logfile,

2017-03-30 07:31:10.101 BST [27994] LOG:  entering standby mode
2017-03-30 07:31:10.106 BST [27994] LOG:  redo starts at 0/224
2017-03-30 07:31:10.108 BST [27994] LOG:  consistent recovery state
reached at 0/2E4
2017-03-30 07:31:10.108 BST [27992] LOG:  database system is ready to
accept read only connections
2017-03-30 07:31:10.113 BST [27998] LOG:  started streaming WAL from
primary at 0/300 on timeline 1
2017-03-30 07:33:19.040 BST [27994] FATAL:  inconsistent page found,
rel 1663/13157/16391, forknum 0, blkno 0
2017-03-30 07:33:19.040 BST [27994] CONTEXT:  WAL redo at 0/351CF03C
for Hash/UPDATE_META_PAGE: ntuples -nan
2017-03-30 07:33:19.041 BST [27992] LOG:  startup process (PID 27994)
exited with exit code 1

Steps to reproduce:
===
1)PG v10 sources
2)Setup Master/SLAVE replication
3)run make installcheck on Master
4)Check database logs ,generated on SLAVE directory.

Please note that above issue is observed only on 32 bit LInux machine
and was offlist reported to me by Tushar Ahuja. Tushar also allowed me
to use his 32 bit Linux machine to analyse this problem. I also had a
small offlist discussion with Amit (included in this mail) when
analysing this problem.

RCA:

After debugging the hash index code for deletion, I could find that
while registering data for xlog record 'XLOG_HASH_UPDATE_META_PAGE' we
are not passing the correct length of data being registered and
therefore, data (xl_hash_update_meta_page) is not completely recorded
into the wal record.

Fix:
===
Attached patch fixes this issue.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..a5798bd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -691,7 +691,7 @@ loop_top:
 		xlrec.ntuples = metap->hashm_ntuples;
 
 		XLogBeginInsert();
-		XLogRegisterData((char *) &xlrec, sizeof(SizeOfHashUpdateMetaPage));
+		XLogRegisterData((char *) &xlrec, SizeOfHashUpdateMetaPage);
 
 		XLogRegisterBuffer(0, metabuf, REGBUF_STANDARD);
 



Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-30 Thread tushar

On 03/29/2017 12:06 AM, Tom Lane wrote:

Hm ... I don't see a crash here,

I am getting this issue only on Linux3.

  but I wonder whether you have parameters
set that would cause this query to be run as a parallel query?  Because
pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
probably insane.
No, I have not changed any parameters except logging_collector=on in the 
postgresql.conf file.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] ANALYZE command progress checker

2017-03-30 Thread Masahiko Sawada
On Wed, Mar 29, 2017 at 5:38 PM, vinayak
 wrote:
>
> On 2017/03/25 4:30, Robert Haas wrote:
>>
>> On Fri, Mar 24, 2017 at 3:41 AM, vinayak
>>  wrote:
>>>
>>> I have updated the patch.
>>
>> You can't change the definition of AcquireSampleRowsFunc without
>> updating the documentation in fdwhandler.sgml, but I think I don't
>> immediately understand why that's a thing we want to do at all if
>> neither file_fdw nor postgres_fdw are going to use the additional
>> argument.  It seems to be entirely pointless because the new value
>> being passed down is always zero and even if the callee were to update
>> it, do_analyze_rel() would just ignore the change on return.  Am I
>> missing something here?  Also, the whitespace-only changes to fdwapi.h
>> related to AcquireSampleRowsFunc going to get undone by pgindent, so
>> even if there's some reason to change that you should leave the lines
>> that don't need changing untouched.

I reviewed v7 patch.

When I ran the ANALYZE command on a table having 5 million rows with 3
child tables, the progress report showed a strange result. The
following result was output after all target rows had been sampled from the
child tables (i.e., after acquire_inherited_sample_rows finished). I think
the progress information should report 100% complete at this point.

#= select * from pg_stat_progress_analyze ;
  pid  | datid | datname  | relid |              phase               | num_target_sample_rows | num_rows_sampled
-------+-------+----------+-------+----------------------------------+------------------------+------------------
 81719 | 13215 | postgres | 16440 | collecting inherited sample rows |                    300 |              180
(1 row)

I guess the cause of this is that you always count the number of
sampled rows in acquire_sample_rows starting from 0 for each child
table. num_rows_sampled is reset whenever collecting sample rows starts.
Also, even if the table has child tables, the total number of rows to
sample does not change. I think the main difference between analyzing a
normal table and a partitioned table is where we fetch the sample rows
from. So I'm not sure if adding a "computing inherited statistics" phase
is desirable. If you want to report the progress of collecting sample rows
on child tables, I think you might want to add information
showing which child table is being analyzed.

--
pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
the number of rows the target table actually has. So if the table has
fewer rows than the target number of rows, num_rows_sampled never reaches
num_target_sample_rows.

--
@@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
}

samplerows += 1;
+
+   /* Report number of rows sampled so far */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
numrows);
}
}

I think that it's wrong to use the numrows variable to report the progress
of the number of rows sampled. As the following comment mentions, we
first save rows into rows[] and increment numrows while numrows <
targrows. After that we may replace a stored sample row with a newly sampled
row. That is, after collecting "numrows" rows, analyze can still take
a long time to fetch and replace sample rows. To compute the progress
of collecting sample rows, IMO reporting the number of blocks we have
scanned so far is better than the number of sample rows. Thoughts?
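To make the suggestion concrete, here is a minimal sketch (the
PROGRESS_ANALYZE_* identifiers are placeholders in the spirit of the patch
under review, not existing core symbols) of reporting per-block progress
inside acquire_sample_rows():

    /* report the total number of blocks once, before the main loop */
    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_BLOCKS, totalblocks);

    while (BlockSampler_HasMore(&bs))
    {
        BlockNumber targblock = BlockSampler_Next(&bs);

        /* ... read block targblock and feed its tuples into the reservoir ... */

        /* one update per block; monotonic, unlike the numrows counter */
        pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_SCANNED,
                                     ++blocks_scanned);
    }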

> /*
>  * The first targrows sample rows are simply copied into the
>  * reservoir. Then we start replacing tuples in the sample
>  * until we reach the end of the relation.  This algorithm is
>  * from Jeff Vitter's paper (see full citation below). It
>  * works by repeatedly computing the number of tuples to skip
>  * before selecting a tuple, which replaces a randomly chosen
>  * element of the reservoir (current set of tuples).  At all
>  * times the reservoir is a true random sample of the tuples
>  * we've passed over so far, so when we fall off the end of
>  * the relation we're done.
>  */


> I understood that we could not change the definition of
> AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
> In the last patch I modified the definition of AcquireSampleRowsFunc to
> handle the inheritance case.
> If the table being analyzed has one or more children, ANALYZE will gather
> statistics twice:
> once on the rows of the parent table only, and a second time on the rows of
> the parent table together with all of its children.
> So while reporting progress, the number of target sample rows
> and the number of rows sampled are mismatched.
> For a single relation it works fine.
>
> In the attached patch I have not changed the definition of
> AcquireSampleRowsFunc.
> I have updated the inheritance case in the documentation.
>
>>
>> +/* Reset rows processed to zero for the next column */
>> +

Re: [HACKERS] inconsistent page found on STANDBY server

2017-03-30 Thread Amit Kapila
On Thu, Mar 30, 2017 at 1:55 PM, Ashutosh Sharma  wrote:
>
> RCA:
> 
> After debugging the hash index code for deletion, I could find that
> while registering data for xlog record 'XLOG_HASH_UPDATE_META_PAGE' we
> are not passing the correct length of data being registered and
> therefore, data (xl_hash_update_meta_page) is not completely recorded
> into the wal record.
>
> Fix:
> ===
> Attached patch fixes this issue.
>

The fix looks good to me.  I have scanned the hash index code to see
if there is any other similar problem, but couldn't find any.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-30 Thread Ashutosh Bapat
This should be added to the open items list. I am not able to add it
myself, as I don't have "editor" privileges on the open items wiki. I have
requested those privileges.

On Thu, Mar 30, 2017 at 7:00 AM, Shinoda, Noriyoshi
 wrote:
> Hello,
>
> I tried this feature using the most recent snapshot. In the case of a PRIMARY 
> KEY constraint added to the partition, the INSERT ON CONFLICT DO NOTHING 
> statement failed with a segmentation fault.
> If the primary key constraint was not created on the partition, this 
> statement executed successfully.
>
> - Test
> postgres=> CREATE TABLE part1(c1 NUMERIC, c2 VARCHAR(10)) PARTITION BY RANGE 
> (c1) ;
> CREATE TABLE
> postgres=> CREATE TABLE part1p1 PARTITION OF part1 FOR VALUES FROM (100) TO 
> (200) ;
> CREATE TABLE
> postgres=> ALTER TABLE part1p1 ADD CONSTRAINT pk_part1p1 PRIMARY KEY (c1) ;
> ALTER TABLE
> postgres=> INSERT INTO part1 VALUES (100, 'init') ON CONFLICT DO NOTHING ;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q
>
> - Part of data/log/postgresql.log file
> 2017-03-30 10:20:09.161 JST [12323] LOG:  server process (PID 12337) was 
> terminated by signal 11: Segmentation fault
> 2017-03-30 10:20:09.161 JST [12323] DETAIL:  Failed process was running: 
> INSERT INTO part1 VALUES (100, 'init') ON CONFLICT DO NOTHING ;
> 2017-03-30 10:20:09.161 JST [12323] LOG:  terminating any other active server 
> processes
> 2017-03-30 10:20:09.163 JST [12345] FATAL:  the database system is in 
> recovery mode
> 2017-03-30 10:20:09.164 JST [12329] WARNING:  terminating connection because 
> of crash of another server process
> 2017-03-30 10:20:09.164 JST [12329] DETAIL:  The postmaster has commanded 
> this server process to roll back the current transaction and exit, because 
> another server process exited abnormally and possibly corrupted shared memory.
>
> - Environment
> OS: Red Hat Enterprise Linux 7 Update 1 (x86-64)
> Snapshot: 2017-03-29 20:30:05 with default configure.
>
> Best Regards,
>
> --
> Noriyoshi Shinoda
>
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org 
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Langote
> Sent: Tuesday, March 28, 2017 9:56 AM
> To: Robert Haas 
> Cc: Peter Geoghegan ; Simon Riggs ; 
> PostgreSQL Hackers ; Thom Brown 
> Subject: Re: [HACKERS] Partitioning vs ON CONFLICT
>
> On 2017/03/27 23:40, Robert Haas wrote:
>> On Thu, Mar 9, 2017 at 7:20 PM, Amit Langote
>>  wrote:
>>> On 2017/03/10 9:10, Amit Langote wrote:
 On 2017/03/09 23:25, Robert Haas wrote:
> On Fri, Feb 17, 2017 at 1:47 AM, Amit Langote wrote:
>> I updated the patch.  Now it's reduced to simply removing the
>> check in
>> transformInsertStmt() that prevented using *any* ON CONFLICT on
>> partitioned tables at all.
>
> This patch no longer applies.

 Rebased patch is attached.
>>>
>>> Oops, really attached this time,
>>
>> Committed with a bit of wordsmithing of the documentation.
>
> Thanks.
>
> Regards,
> Amit
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-30 Thread Rushabh Lathia
On Wed, Mar 29, 2017 at 8:38 AM, Tomas Vondra 
wrote:

> Hi,
>
> On 03/28/2017 11:07 AM, Rushabh Lathia wrote:
>
>> ...
>> I think we all agree that we should get rid of nreaders from the
>> GatherMergeState and need to do some code re-factor. But if I
>> understood correctly that Robert's concern was to do that re-factor
>> as separate commit.
>>
>>
> Maybe. It depends on how valuable it's to keep Gather and GatherMerge
> similar - having nreaders in one and not the other seems a bit weird. But
> maybe the refactoring would remove it from both nodes?
>
> Also, it does not really solve the issue that we're using 'nreaders' or
> 'nworkers_launched' to access array with one extra element.
>

With the latest patches, in gather_merge_init() there is one loop where we
initialize the tuple array (which holds the tuples that came from each worker)
and the slot (which holds the current tuple).

We keep a tuple array for each worker because when we read tuples from
workers, it's a good idea to read several at once for efficiency when possible:
this minimizes context-switching overhead.

For the leader we don't initialize a slot, because for the leader we directly
call ExecProcNode(), which returns the slot. Also, there is no point in holding
a tuple array for the leader.

With the latest patches, gather_merge_clear_slots() will only clean the
tuple table slots and the tuple array buffers for the workers
(nworkers_launched).

The only time we loop through "nworkers_launched + 1" entries is while reading
the tuple from each gather merge input. Personally I don't see any problem with
that, but I am very much open to suggestions.
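To make the initialization shape described above concrete, it looks roughly
like this (field and function names are approximations of the
nodeGatherMerge.c code, not verbatim):

    for (i = 0; i < gm_state->nworkers_launched; i++)
    {
        /* per-worker read-ahead buffer, to reduce context switching */
        gm_state->gm_tuple_buffers[i].nTuples = 0;
        gm_state->gm_tuple_buffers[i].readCounter = 0;

        /* slot i + 1 holds that worker's current tuple */
        gm_state->gm_slots[i + 1] = ExecInitExtraTupleSlot(estate);
    }
    /* slot 0 is the leader: filled directly by ExecProcNode(), no buffer */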



>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Rushabh Lathia


Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-30 Thread Amit Langote
On 2017/03/30 18:02, Ashutosh Bapat wrote:
> This should be added to the open items list. I am not able to add it
> myself, as I don't have "editor" privileges on open items wiki. I have
> requested for those privileges.

I am going to add it shortly, after I reply to Shinoda-san's report.  While the
crash can be fixed with a simple patch, I think we need to consider the
bigger question of whether ON CONFLICT processing on leaf partitions
should really occur.  Commit 8355a011a0 enabled specifying ON CONFLICT DO
NOTHING when inserting into a partitioned root table, but I think we
might need to reconsider.

Thanks,
Amit






Re: [HACKERS] ANALYZE command progress checker

2017-03-30 Thread vinayak


On 2017/03/30 17:39, Masahiko Sawada wrote:

On Wed, Mar 29, 2017 at 5:38 PM, vinayak
 wrote:

On 2017/03/25 4:30, Robert Haas wrote:

On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:

I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I think I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc going to get undone by pgindent, so
even if there's some reason to change that you should leave the lines
that don't need changing untouched.

I reviewed v7 patch.

Thank you for reviewing the patch.


When I ran ANALYZE command to the table having 5 millions rows with 3
child tables, the progress report I got shows the strange result. The
following result was output after sampled all target rows from child
tables (i.g, after finished acquire_inherited_sample_rows). I think
the progress information should report 100% complete at this point.

#= select * from pg_stat_progress_analyze ;
  pid  | datid | datname  | relid |              phase               | num_target_sample_rows | num_rows_sampled
-------+-------+----------+-------+----------------------------------+------------------------+------------------
 81719 | 13215 | postgres | 16440 | collecting inherited sample rows |                    300 |              180
(1 row)

I guess the cause of this is that you always count the number of
sampled rows in acquire_sample_rows staring from 0 for each child
table. num_rows_sampled is reset whenever starting collecting sample
rows.
Also, even if table has child table, the total number of sampling row
is not changed. I think that main differences between analyzing on
normal table and on partitioned table is where we fetch the sample row
from. So I'm not sure if adding "computing inherited statistics" phase
is desirable. If you want to report progress of collecting sample rows
on child table I think that you might want to add the information
showing which child table is being analyzed.

Yes. I think showing child table information would be good for the user/DBA.

--
pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
the number of rows the target table has actually. So If the table has
rows less than target number of rows, the num_rows_sampled never reach
to num_target_sample_rows.

--
@@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 }

 samplerows += 1;
+
+   /* Report number of rows sampled so far */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
numrows);
 }
 }

I think that it's wrong to use numrows variable to report the progress
of the number of rows sampled. As the following comment mentioned, we
first save row into rows[] and increment numrows until numrows <
targrows. And then we could replace stored sample row with new sampled
row. That is, after collecting "numrows" rows, analyze can still take
a long time to get and replace the sample row. To computing progress
of collecting sample row, IMO reporting the number of blocks we
scanned so far is better than number of sample rows. Thought?


 /*
  * The first targrows sample rows are simply copied into the
  * reservoir. Then we start replacing tuples in the sample
  * until we reach the end of the relation.  This algorithm is
  * from Jeff Vitter's paper (see full citation below). It
  * works by repeatedly computing the number of tuples to skip
  * before selecting a tuple, which replaces a randomly chosen
  * element of the reservoir (current set of tuples).  At all
  * times the reservoir is a true random sample of the tuples
  * we've passed over so far, so when we fall off the end of
  * the relation we're done.
  */
I think we can either report the number of blocks scanned so far or the number 
of sample rows.
But I agree with you that reporting the number of blocks scanned so far 
would be better than reporting the number of sample rows.

I Understood that we could not change the definition of
AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
In the last patch I have modified the definition of AcquireSampleRowsFunc to
handle the inheritance case.
If the table being analyzed has one or more children, ANALYZE will gather
statistics twice:
once on the rows of parent table only and second on the rows of the parent
table with all of its children.
So while reporting the progress the value of number of target sample rows
and number o

Re: [HACKERS] Partitioned tables and relfilenode

2017-03-30 Thread Amit Langote
On 2017/03/29 23:58, Robert Haas wrote:
> On Wed, Mar 29, 2017 at 4:21 AM, Amit Langote
>  wrote:
>> Looks correct, so incorporated in the attached updated patch.  Thanks.
> 
> This seems like a hacky way to limit the reloptions to just OIDs.
> Shouldn't we instead have a new RELOPT_KIND_PARTITIONED or something
> like that?

OK, I tried that in the updated patch.

DefineRelation() and ATExecSetRelOptions() still call heap_reloptions(),
but pass the new relopt_kind.  None of the options listed in
reloptions.c are of this kind as of now, so parseRelOptions() simply
emits the "unrecognized parameter %s" error in the case of partitioned tables
(except in some cases described below, but not because of the limitations
of this patch).  If and when partitioned tables start supporting the
existing parameters, we'll update the relopt_gen.kinds bitmask of those
parameters to allow them to be specified for partitioned tables.

With the patch:

create table parted (a int) partition by list (a) with (fillfactor = 10);
ERROR:  unrecognized parameter "fillfactor"

create table parted (a int) partition by list (a) with (toast.fillfactor =
10);
ERROR:  unrecognized parameter "fillfactor"

and:

create table parted (a int) partition by list (a) with (oids = true);
alter table parted set (fillfactor = 90);
ERROR:  unrecognized parameter "fillfactor"

but:

-- appears to succeed, whereas an error should have been reported (I think)
alter table parted set (toast.fillfactor = 10);
ALTER TABLE

-- even with views
create table foo (a int);
create view foov with (toast.fillfactor = 10) as select * from foo;
CREATE VIEW
alter view foov set (toast.fillfactor = 20);
ALTER VIEW

Nothing is stored in pg_class.reloptions really, but the validation that
should have occurred in parseRelOptions() doesn't.  This happens because
of the way transformRelOptions() works.  It will ignore the DefElem's that
don't apply to the specified relation based on the value of the namspace
parameter ("toast" or NULL).  That means it won't be included in the array
of options later received by parseRelOptions(), which is where the
validation occurs.
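The relevant filtering looks roughly like this (paraphrased from memory, not
the verbatim source):

    /*
     * Inside transformRelOptions(), per DefElem: ignore options that are
     * not in the requested namespace ("toast" or NULL), so they never reach
     * parseRelOptions() and are never validated.
     */
    if (namspace == NULL)
    {
        if (def->defnamespace != NULL)
            continue;
    }
    else if (def->defnamespace == NULL)
        continue;
    else if (pg_strcasecmp(def->defnamespace, namspace) != 0)
        continue;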

Also, alter table reset (xxx) never validates anything:

alter table foo reset (foo);
ALTER TABLE
alter table foo reset (foo.bar);
ALTER TABLE

Again, no pg_class.reloptions update occurs in this case. The reason this
happens is because transformRelOptions() never includes the options to be
reset in the array of options received by parseRelOptions(), so no
validation occurs.

But since both are existing behaviors, perhaps we can worry about it some
other time.

Thanks,
Amit
>From f150dec562985642b1af83236b65a42f7350033e Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 24 Jan 2017 14:22:34 +0900
Subject: [PATCH] Do not allocate storage for partitioned tables.

Currently, it is not possible to insert any data into a partitioned
table.  So they're empty at all times, which means it is wasteful to
allocate relfilenode and related storage objects.

Also, because there is no storage, it doesn't make sense to allow
specifying storage parameters such as fillfactor, etc.

Patch by: Amit Langote & Maksim Milyutin
Reviewed by: Kyotaro Horiguchi
---
 doc/src/sgml/ref/create_table.sgml |  6 ++
 src/backend/access/common/reloptions.c | 33 ++
 src/backend/catalog/heap.c | 20 ++
 src/include/access/reloptions.h|  3 ++-
 src/test/regress/expected/alter_table.out  |  5 +
 src/test/regress/expected/create_table.out |  5 +
 src/test/regress/sql/alter_table.sql   |  5 +
 src/test/regress/sql/create_table.sql  |  4 
 8 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 283d53e203..5f13f240f5 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1040,6 +1040,12 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 will use the table's parameter value.

 
+   
+Note that specifying the following parameters for partitioned tables is
+not supported.  You must specify them for individual leaf partitions if
+necessary.
+   
+

 

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 72e12532ab..f3f5ade38f 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1024,21 +1024,28 @@ parseRelOptions(Datum options, bool validate, relopt_kind kind,
 		if (relOpts[i]->kinds & kind)
 			numoptions++;
 
-	if (numoptions == 0)
+	/*
+	 * Even if we found no options of kind, it's better to raise the
+	 * "unrecognized parameter %s" error here, instead of at the caller.
+	 */
+	if (numoptions == 0 && !validate)
 	{
 		*numrelopts = 0;
 		return NULL;
 	}
 
-	reloptions = palloc(numoptions * sizeof(relopt_value));
-
-	for (i = 0, j = 0; relOpts[i]; i++)
+	if (numoptions > 0)

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-30 Thread Ashutosh Bapat
> On Wed, Mar 29, 2017 at 8:39 AM, Robert Haas  wrote:
> I don't think 0011 is likely to be acceptable in current form.  I
> can't imagine that we just went to the trouble of getting rid of
> AppendRelInfos for child partitioned rels only to turn around and put
> them back again.  If you just need the parent-child mappings, you can
> get that from the PartitionedChildRelInfo list.
>

Please refer to my earlier mails on this subject [1], [2]. For
multi-level partition-wise join, we need the RelOptInfo of a partitioned
table to contain the RelOptInfos of its immediate partitions. I have not
seen any counter-arguments to creating RelOptInfos for intermediate
partitioned tables. We create child RelOptInfos only for entries in
root->append_rel_list, i.e. only for those relations which have an
AppendRelInfo. Since we are not creating AppendRelInfos for
partitioned partitions, we do not create RelOptInfos for those. So, to
me it looks like we either have to have AppendRelInfos for partitioned
partitions or create RelOptInfos by traversing some other list like the
PartitionedChildRelInfo list. It looks odd to walk
root->append_rel_list as well as this new list for creating
RelOptInfos. But suppose for a moment that we have to walk this
other list. That other list is also lossy. It stores only the
topmost parent of any of the partitioned partitions, not the
immediate parent, which is required to add RelOptInfos of immediate children
to the RelOptInfo of a parent.

Coming back to the point of the PartitionedChildRelInfo list as a way to
maintain the parent-child relationship: all the code assumes that the
parent-child relationship is stored in AppendRelInfos linked as
root->append_rel_list, and walks that list to find the children of a given
parent or the parent(s) of a given child. We would have to modify all those
places to traverse two lists instead of one. Some of those places even return
an AppendRelInfo structure, and now they would sometimes return an
AppendRelInfo and sometimes a PartitionedChildRelInfo. That looks ugly.

Consider a case where P has partitions p1 and p2, which in turn have
partitions p11, p12 and p21, p22 resp. Another partitioned table Q has
partitions q1, q2. q1 is further partitioned into q11, q12 but q2 is
not partitioned. The partitioning scheme of P and Q matches. Also,
partitioning scheme of p1 and q1 matches. So, a partition-wise join
between P and Q would look like P J Q = append (p11 J q11, p12 J q12,
p2 J q2), p2 J q2 being append(p21, p22) J q2. When constructing the
restrictlist (and other clauses) for p2 J q2 we need to translate the
restrictlist applicable for P J Q. This translation requires
AppendRelInfo of p2 which does not exist today. We can not use
PartitionedChildRelInfo because it doesn't have information about how
to translate Vars of P to those of p2.

I don't see a way to avoid creating AppendRelInfos for partitioned partitions.

[1] 
https://www.postgresql.org/message-id/CAFjFpRefs5ZMnxQ2vP9v5zOtWtNPuiMYc01sb1SWjCOB1CT%3DuQ%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] increasing the default WAL segment size

2017-03-30 Thread Kuntal Ghosh
On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson  wrote:
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
>  wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 GB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
Thanks for splitting the patches.
01-add-XLogSegmentOffset-macro.patch is same as before and it looks good.

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.
looks good.

> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
Several methods are declared and defined in different tools to fetch
the WAL segment size.
In pg_standby.c,
RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */

In pg_basebackup/streamutil.c,
RetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
SHOW wal_segment_size */

In pg_waldump.c,
ReadXLogFromDir(char *archive_loc)
RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
location to set XLogSegsize from the first WAL file */

IMHO, it's better to define a single method in xlog.c that, based on the
strategy, retrieves XLogSegSize on behalf of the
different modules. I suggested the same in my first review and
I'll still vote for it. For example, in xlog.c, you could define
something like the following:
bool RetrieveXLogSegSize(RetrieveStrategy rs, void *ptr)

Now based on the RetrieveStrategy (say Conn, File, Dir), you can cast
the void pointer to the appropriate type. So, when a new tool needs to
retrieve XLogSegSize, it can just call this function instead of
defining a new RetrieveXLogSegSize method.
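A rough sketch of what I have in mind (all names here are illustrative,
including the per-strategy helpers, which don't exist yet):

    typedef enum RetrieveStrategy
    {
        RETRIEVE_FROM_CONN,     /* SHOW wal_segment_size over a connection */
        RETRIEVE_FROM_FILE,     /* read the size from a WAL file header */
        RETRIEVE_FROM_DIR       /* scan an archive dir for the first WAL file */
    } RetrieveStrategy;

    bool
    RetrieveXLogSegSize(RetrieveStrategy rs, void *ptr)
    {
        switch (rs)
        {
            case RETRIEVE_FROM_CONN:
                return RetrieveXLogSegSizeFromConn((PGconn *) ptr);
            case RETRIEVE_FROM_FILE:
                return RetrieveXLogSegSizeFromFile((const char *) ptr);
            case RETRIEVE_FROM_DIR:
                return RetrieveXLogSegSizeFromDir((const char *) ptr);
        }
        return false;
    }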

It's just a suggestion from my side. Is there anything I'm missing
which could cause the aforesaid approach not to work?
Apart from that, I've nothing to add here.

> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE
looks good.

>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
Nothing from me to add here.

I've nothing to add here for the attached set of patches. On top of
these, David's patch can be used for preserving LSNs in the file
naming for all segment sizes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-30 Thread Amit Langote
Shinoda-san,

Thanks a lot for testing.

On 2017/03/30 10:30, Shinoda, Noriyoshi wrote:
> Hello, 
> 
> I tried this feature using the most recent snapshot. In the case of a PRIMARY 
> KEY constraint added to the partition, the INSERT ON CONFLICT DO NOTHING 
> statement failed with a segmentation fault.
> If the primary key constraint was not created on the partition, this 
> statement executed successfully.
> 
> - Test
> postgres=> CREATE TABLE part1(c1 NUMERIC, c2 VARCHAR(10)) PARTITION BY RANGE 
> (c1) ;
> CREATE TABLE
> postgres=> CREATE TABLE part1p1 PARTITION OF part1 FOR VALUES FROM (100) TO 
> (200) ;
> CREATE TABLE
> postgres=> ALTER TABLE part1p1 ADD CONSTRAINT pk_part1p1 PRIMARY KEY (c1) ;
> ALTER TABLE
> postgres=> INSERT INTO part1 VALUES (100, 'init') ON CONFLICT DO NOTHING ;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !> \q

I found out why the crash occurs, but while I was trying to fix it, I
started growing doubtful about the way this is being handled currently.

The patch to fix the crash would be to pass 'true' instead of 'false' for
speculative when ExecSetupPartitionTupleRouting() calls ExecOpenIndices()
on leaf partitions.  That will initialize the information needed when
ExecInsert() wants to check for conflicts using the constraint-enforcing
indexes.  If we do initialize the speculative insertion info (which will
fix the crash), ExecCheckIndexConstraints() will be called on a given leaf
partition's index to check if there is any conflict.  But since the insert
was performed on the root table, conflicts should be checked across all
the partitions, which won't be the case.  Even though the action is
NOTHING, the check for conflicts still uses only that one leaf partition's
index, which seems insufficient.
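In other words, the change would look something like this (a sketch of the
idea, not a committed patch; leaf_part_rri stands for the per-partition
ResultRelInfo and the surrounding loop is abbreviated):

    /*
     * In ExecSetupPartitionTupleRouting(), when opening each leaf
     * partition's indexes.
     */
    ExecOpenIndices(leaf_part_rri, true);   /* was: speculative = false */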

Commit 8355a011a0 enabled specifying ON CONFLICT DO NOTHING when
inserting into a partitioned root table, but given the above, I think we
might need to reconsider it.

Thanks,
Amit






Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-03-30 Thread Venkata B Nagothi
On Thu, Mar 30, 2017 at 4:46 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Thu, 30 Mar 2017 15:59:14 +1100, Venkata B Nagothi 
> wrote in  xwu0j05x_j...@mail.gmail.com>
> > Yes, downloaded from the email on Windows and copied across to Linux and
> > did "git apply".
>
> The same works for me. But --keep-cr gave me the same result with
> you.
>
> > $ git am --keep-cr ~/work/patches/0001-Fix-a-bug-
> of-physical-replication-slot_a6f22e8.patch
> > Applying: Fix a bug of physical replication slot.
> > .git/rebase-apply/patch:13: trailing whitespace.
> > /*
>

for me too -

[dba@buildhost postgresql]$ git am --keep-cr
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch
Applying: Fix a bug of physical replication slot.
/data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:13: trailing
whitespace.
/*
/data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:14: trailing
whitespace.
 * This variable corresponds to restart_lsn in pg_replication_slots for a
/data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:15: trailing
whitespace.
 * physical slot. This has a valid value only when it differs from the
current
/data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:16: trailing
whitespace.
 * flush pointer.
/data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch:17: trailing
whitespace.
 */
error: patch failed: src/backend/replication/walsender.c:210
error: src/backend/replication/walsender.c: patch does not apply
Patch failed at 0001 Fix a bug of physical replication slot.
The copy of the patch that failed is found in:
   /data/PostgreSQL-GIT-Repo/postgresql/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



> https://git-scm.com/docs/git-am
>
> | --[no-]keep-cr
> |
> | With --keep-cr, call git mailsplit (see git-mailsplit[1]) with
> | the same option, to prevent it from stripping CR at the end of
> | lines. am.keepcr configuration variable can be used to specify
> | the default behaviour. --no-keep-cr is useful to override
> | am.keepcr.
>
> I don't know why it preserves CRs only for the lines, but anyway,
> don't you have am.keepcr in you configuration?
>

Maybe; I do not think I have am.keepcr in my configuration, but I am not 100%
sure of it.

I only did "git apply ...", which produced whitespace errors.

Regards,

Venkata B N
Database Consultant


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Dilip Kumar
On Wed, Mar 29, 2017 at 11:51 AM, Pavan Deolasee
 wrote:
> Thanks. I think your patch of tracking interesting attributes seems ok too
> after the performance issue was addressed. Even though we can still improve
> that further, at least Mithun confirmed that there is no significant
> regression anymore and in fact for one artificial case, patch does better
> than even master.

I was trying to compile these patches on the latest
head (f90d23d0c51895e0d7db7910538e85d3d38691f0) for some testing, but I
was not able to compile them.

make[3]: *** [postgres.bki] Error 1
make[3]: Leaving directory
`/home/dilip/work/pg_codes/pbms_final/postgresql/src/backend/catalog'
make[2]: *** [submake-schemapg] Error 2
make[2]: Leaving directory
`/home/dilip/work/pg_codes/pbms_final/postgresql/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/dilip/work/pg_codes/pbms_final/postgresql/src'
make: *** [all-src-recurse] Error 2

I tried doing maintainer-clean and deleting postgres.bki, but I still get the same error.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Supporting huge pages on Windows

2017-03-30 Thread Amit Kapila
On Thu, Mar 9, 2017 at 7:06 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Sharma
>> To start with, I ran the regression test-suite and didn't find any failures.
>> But, then I am not sure if huge_pages are getting used or not. However,
>> upon checking the settings for huge_pages and I found it as 'on'. I am
>> assuming, if huge pages is not being used due to shortage of large pages,
>> it should have fallen back to non-huge pages.
>
> You are right, the server falls back to non-huge pages when the large pages 
> run short.
>

The latest patch looks good to me apart from one Debug message, so I
have marked it as Ready For Committer.

+ ereport(DEBUG1,
+ (errmsg("disabling huge pages")));

I think this should be similar to what we display in sysv_shmem.c as below:

elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
allocsize);


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-30 Thread Amit Langote
On 2017/03/30 18:35, Ashutosh Bapat wrote:
>> On Wed, Mar 29, 2017 at 8:39 AM, Robert Haas  wrote:
>> I don't think 0011 is likely to be acceptable in current form.  I
>> can't imagine that we just went to the trouble of getting rid of
>> AppendRelInfos for child partitioned rels only to turn around and put
>> them back again.  If you just need the parent-child mappings, you can
>> get that from the PartitionedChildRelInfo list.
>>
> 
> Please refer to my earlier mails on this subject [1], [2]. For
> multi-level partition-wise join, we need RelOptInfo of a partitioned
> table to contain RelOptInfo of its immediate partitions. I have not
> seen any counter arguments not to create RelOptInfos for intermediate
> partitioned tables. We create child RelOptInfos only for entries in
> root->append_rel_list i.e. only for those relations which have an
> AppendRelInfo. Since we are not creating AppendRelInfos for
> partitioned partitions, we do not create RelOptInfos for those. So, to
> me it looks like we have to either have AppendRelInfos for partitioned
> partitions or create RelOptInfos by traversing some other list like
> PartitionedChildRelInfo list. It looks odd to walk
> root->append_rel_list as well as this new list for creating
> RelOptInfos. But for a moment, we assume that we have to walk this
> other list. But then that other list is also lossy. It stores only the
> topmost parent of any of the partitioned partitions and not the
> immediate parent as required to add RelOptInfos of immediate children
> to the RelOptInfo of a parent.

So, because we want to create an Append path for each partitioned table in
a tree separately, we'll need RelOptInfo for each one, which in turn
requires an AppendRelInfo.  Note that we do that only for those
partitioned child RTEs that have inh set to true, so that all the later
stages will treat it as the parent rel to create an Append path for.
There would still be partitioned child RTEs with inh set to false for
which, just like before, no AppendRelInfos and RelOptInfos are created;
they get added as the only member of partitioned_rels in the
PartitionedChildRelInfo of each partitioned table.  Finally, when the
Append path for the root parent is created, its subpaths list will contain
paths of leaf partitions of all levels and its partitioned_rels list
should contain the RT indexes of partitioned tables of all levels.

If we have the following partition tree:

      A
    / | \
   B  C  D
     / \
    E   F

The following RTEs will be created, in that order.  RTEs with inh=true are
shown with suffix _i.  RTEs that get an AppendRelInfo (& a RelOptInfo) are
shown with suffix _a.

A_i_a
A
B_a
C_i_a
C
E_a
F_a
D_a

Thanks,
Amit






Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila 
wrote:

> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
>  wrote:
> >
> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
> > wrote:
> >>
> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
> >> wrote:
> >
> > Then during recheck, we pass already compressed values to
> > index_form_tuple(). But my point is, the following code will ensure that
> we
> > don't compress it again. My reading is that the first check for
> > !VARATT_IS_EXTENDED will return false if the value is already compressed.
> >
>
> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.
>
>
Ok, thanks.


> >
> > TBH I couldn't find why the original index insertion code will always
> supply
> > uncompressed values.
> >
>
> Just try by inserting large value of text column ('aa.bbb')
> upto 2.5K.  Then have a breakpoint in heap_prepare_insert and
> index_form_tuple, and debug both the functions, you can find out that
> even though we compress during insertion in heap, the index will
> compress the original value again.
>
>
Ok, tried that. AFAICS index_form_tuple gets compressed values.


>
> Yeah probably you are right, but I am not sure if it is good idea to
> compare compressed values.
>
>
Again, I don't see a problem there.


> I think with this new changes in btrecheck, it would appear to be much
> costlier as compare to what you have few versions back.  I am afraid
> that it can impact performance for cases where there are few WARM
> updates in chain and many HOT updates as it will run recheck for all
> such updates.


My feeling is that the recheck could be costly for very fat indexes, but
not doing WARM could be costly too for such indexes. We can possibly
construct a worst case where
1. set up a table with a fat index.
2. do a WARM update to a tuple
3. then do several HOT updates to the same tuple
4. query the row via the fat index.


Initialisation:

-- Adjust parameters to force index scans
-- enable_seqscan to false
-- seq_page_cost = 1

DROP TABLE IF EXISTS pgbench_accounts;

CREATE TABLE pgbench_accounts (
aid text,
bid bigint,
abalance bigint,
filler1 text DEFAULT md5(random()::text),
filler2 text DEFAULT md5(random()::text),
filler3 text DEFAULT md5(random()::text),
filler4 text DEFAULT md5(random()::text),
filler5 text DEFAULT md5(random()::text),
filler6 text DEFAULT md5(random()::text),
filler7 text DEFAULT md5(random()::text),
filler8 text DEFAULT md5(random()::text),
filler9 text DEFAULT md5(random()::text),
filler10 text DEFAULT md5(random()::text),
filler11 text DEFAULT md5(random()::text),
filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=90);
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
<2300 chars string>, (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

-- Force a WARM update on one row
UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
repeat('abcdefghij', 2);

Test:
-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
 <2300 chars string> ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
 <2300 chars string>;
END;

I did 4 5-minutes runs with master and WARM and there is probably a 2-3%
regression.

(Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
scans on the fat index)
master:
txns  idx_scan
414117 828233
411109 822217
411848 823695
408424 816847

WARM:
txns   idx_scan
404139 808277
398880 797759
399949 799897
397927 795853

==

I then also repeated the tests, but this time using compressible values.
The regression in this case is much higher, maybe 15% or more.

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
repeat('abcdefghij', 2), (random()::bigint) % :scale, 0;

-- Fetch the row using the fat index. Since the row contains a
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
repeat('abcdefghij', 2) ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
repeat('abcdefghij', 2);
END;

(Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
scans on the fat index)
master:
txns   idx_scan
56976 113953
56822 113645
56915 113831
56865 113731

WARM:
txns  idx_scan
49044 98087
49020 98039
49007 98013
49006 98011

But TBH I believe this regression is coming from the changes
to heap_tuple_attr_equals where we are decompressing both old and new
values and then comparing them. For 200K bytes long values, that must be
something. Another reason why I think so is 

Re: [HACKERS] Partitioned tables and relfilenode

2017-03-30 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 30 Mar 2017 18:24:16 +0900, Amit Langote 
 wrote in 

> On 2017/03/29 23:58, Robert Haas wrote:
> > On Wed, Mar 29, 2017 at 4:21 AM, Amit Langote
> >  wrote:
> >> Looks correct, so incorporated in the attached updated patch.  Thanks.
> > 
> > This seems like a hacky way to limit the reloptions to just OIDs.
> > Shouldn't we instead have a new RELOPT_KIND_PARTITIONED or something
> > like that?
> 
> OK, I tried that in the updated patch.

The name RELOPT_KIND_PARTITIONED looks somewhat odd. RELKIND for
partitioned tables is RELKIND_PARTITIONED_TABLE, so would it be better
to name this ..._TABLE as well, even if a bit longer?

parseRelOptions seems to return a garbage pointer if no option of
that kind is available.

> DefineRelation() and ATExecSetRelOptions() still calls heap_reloptions(),
> but passes the new relopt_kind.  None of the options listed in
> reloptions.c are of this kind as of now, so parseRelOptions() simply
> outputs the "unrecognized parameter %s" in the case of partitioned tables
> (except in some cases described below, but not because of the limitations
> of this patch).  If and when partitioned tables start supporting the
> existing parameters, we'll update the relopt_gen.kinds bitmask of those
> parameters to allow them to be specified for partitioned tables.
> 
> With the patch:
> 
> create table parted (a int) partition by list (a) with (fillfactor = 10);
> ERROR:  unrecognized parameter "fillfactor"
> 
> create table parted (a int) partition by list (a) with (toast.fillfactor =
> 10);
> ERROR:  unrecognized parameter "fillfactor"
> 
> and:
> 
> create table parted (a int) partition by list (a) with (oids = true);
> alter table parted set (fillfactor = 90);
> ERROR:  unrecognized parameter "fillfactor"
> 
> but:
> 
> -- appears to succeed, whereas an error should have been reported (I think)
> alter table parted set (toast.fillfactor = 10);
> ALTER TABLE
> 
> -- even with views
> create table foo (a int);
> create view foov with (toast.fillfactor = 10) as select * from foo;
> CREATE VIEW
> alter view foov set (toast.fillfactor = 20);
> ALTER VIEW
> 
> Nothing is stored in pg_class.reloptions really, but the validation that
> should have occurred in parseRelOptions() doesn't.  This happens because
> of the way transformRelOptions() works.  It will ignore the DefElem's that
> don't apply to the specified relation based on the value of the namespace
> parameter ("toast" or NULL).  That means it won't be included in the array
> of options later received by parseRelOptions(), which is where the
> validation occurs.
> 
> Also, alter table reset (xxx) never validates anything:
> 
> alter table foo reset (foo);
> ALTER TABLE
> alter table foo reset (foo.bar);
> ALTER TABLE
> 
> Again, no pg_class.reloptions update occurs in this case. The reason this
> happens is because transformRelOptions() never includes the options to be
> reset in the array of options received by parseRelOptions(), so no
> validation occurs.
> 
> But since both are existing behaviors, perhaps we can worry about it some
> other time.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CORRESPONDING clause design

2017-03-30 Thread Surafel Temesgen
Hi,

Thank you very much for your help.
Here is the patch that fixes that issue as you suggested.

Regards

Surafel


On Tue, Mar 28, 2017 at 5:44 PM, Pavel Stehule 
wrote:

>
>
> 2017-03-28 14:18 GMT+02:00 Pavel Stehule :
>
>>
>>
>> 2017-03-28 13:58 GMT+02:00 Surafel Temesgen :
>>
>>> can you help with fixing it Pavel?
>>>
>>
>> There must be some new preanalyze stage - you have to know result columns
>> before you are starting a analyze
>>
>
> maybe some recheck after analyze stage to remove invalid columns can be
> good enough.
>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>>
>>> On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule >> > wrote:
>>>
 Hi

 fresh update - I enhanced Value node by location field as Tom proposal.

 Few more regress tests.

 But I found significant issue, that needs bigger fix - Surafel, please,
 can you fix it.

 It crash on

 SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
 UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS
 x6
 UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

 I'll mark this patch as waiting on author

 Regards

 Pavel



>>>
>>
>


corresponding_clause_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-30 Thread Ashutosh Bapat
The patch applies cleanly and compiles. make check in regress as well as in
postgres_fdw works fine. Here are a few comments.

local-join should be local join.

The comments should explain why.
+/* Should be unparameterized */
+Assert(outer_path->param_info == NULL);
+Assert(inner_path->param_info == NULL);

+ a suitable local join path, which can be used as the alternative local
Maybe we should reword this as "... which can be used to create an alternative
local ...". This rewording is needed even in the existing docs.

+/* outer_path should not require rels from inner_path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
Probably this should throw an error or return NULL in such a case rather than
asserting. This function is callable from any FDW, and that FDW may provide
such paths, maybe because of an internal bug. The same applies to
+/* Neither path should require rels from the other path */
+Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
+   !PATH_PARAM_BY_REL(inner_path, outer_path->parent));

While the comment below mentions ON true, the testcase you have added is for ON
false. Either the testcase or this comment should change. That raises another
question: what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true", there

Why is JOIN_RIGHT being treated differently from JOIN_LEFT? Shouldn't we be
able to create a nested loop join for JOIN_RIGHT?
+case JOIN_RIGHT:
+case JOIN_FULL:

On Thu, Mar 23, 2017 at 5:50 PM, Etsuro Fujita
 wrote:
> On 2017/03/21 18:40, Etsuro Fujita wrote:
>>
>> Ok, I'll update the patch.  One thing I'd like to revise in addition to
>> that is (1) add to JoinPathExtraData a flag member to indicate whether
>> to give the FDW a chance to consider a remote join, which will be set to
>> true if the joinrel's fdwroutine is not NULL and the fdwroutine's
>> GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
>> to create an alternative local join path, such as hashclauses and
>> mergeclauses proposed in the patch, into JoinPathExtraData in
>> add_paths_to_joinrel.  This would avoid useless overhead in saving such
>> info into JoinPathExtraData when we don't give the FDW that chance.
>
>
> Done.  Attached is a new version of the patch.
>
> Best regards,
> Etsuro Fujita



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-03-30 Thread Vitaly Burovoy
On 3/29/17, Vitaly Burovoy  wrote:
> On 3/29/17, Michael Paquier  wrote:
>> On Thu, Mar 30, 2017 at 11:18 AM, Vitaly Burovoy
>>  wrote:
>>> I think min_value and max_value should not be set to "1" or "-1" but
>>> to real min/max of the type by default.
>>
>> This is the default behavior for ages, since e8647c45 to be exact. So
>> you would change 20 years of history?
>
> ... is it a wrong way to keep historical minimum as "1" by
> default: it is not a minimum of any of supported type.

I've read the standard about "minvalue", "maxvalue" and "start".
OK, I was wrong. Since "start" should be equal to "minvalue" unless
defined explicitly, the only bug left from my first email here is
resetting "minvalue" back to 1 when the data type changes and the value
matches the bound of the old type (the last case there).
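
For example, roughly the kind of case I mean (a sketch only, written with the
patch's AS <type> syntax; the names are made up):

CREATE SEQUENCE s AS smallint MINVALUE -32768;  -- minvalue set to smallint's bound
ALTER SEQUENCE s AS integer;
-- per the above, minvalue now comes back as 1 instead of staying at -32768
-- (or being extended to the new type's minimum)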

P.S.: the same thing with "maxvalue" when "increment" is negative.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Amit Kapila
On Thu, Mar 30, 2017 at 4:07 PM, Pavan Deolasee
 wrote:
>
> On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila 
> wrote:
>>
>> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
>>  wrote:
>> >
>> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila 
>> > wrote:
>> >>
>> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila 
>> >> wrote:
>> >
>> > Then during recheck, we pass already compressed values to
>> > index_form_tuple(). But my point is, the following code will ensure that
>> > we
>> > don't compress it again. My reading is that the first check for
>> > !VARATT_IS_EXTENDED will return false if the value is already
>> > compressed.
>> >
>>
>> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.
>>
>
> Ok, thanks.
>
>>
>> >
>> > TBH I couldn't find why the original index insertion code will always
>> > supply
>> > uncompressed values.
>> >
>>
>> Just try by inserting large value of text column ('aa.bbb')
>> upto 2.5K.  Then have a breakpoint in heap_prepare_insert and
>> index_form_tuple, and debug both the functions, you can find out that
>> even though we compress during insertion in heap, the index will
>> compress the original value again.
>>
>
> Ok, tried that. AFAICS index_form_tuple gets compressed values.
>

How have you verified that?  Have you checked that in
heap_prepare_insert it has called toast_insert_or_update() and then
returned a tuple different from the input tup?  Basically, I
can easily see it, and even the reason why the heap and index
tuples will be different.  Let me try to explain:
toast_insert_or_update returns a new tuple which contains compressed
data, and this tuple is inserted into the heap, whereas the slot still
refers to the original (uncompressed) tuple which was passed to heap_insert.
Now, ExecInsertIndexTuples and the calls under it, like FormIndexDatum,
will refer to the tuple in the slot, which is uncompressed, and form the
values[] using the uncompressed value.  Try a simple case like the one below:

Create table t_comp(c1 int, c2 text);
Create index idx_t_comp_c2 on t_comp(c2);
Create index idx_t_comp_c1 on t_comp(c1);

Insert into t_comp values (1, ' ...aaa');

Repeat 'a' in the above line 2700 times or so.  You should then see what
I am explaining above.


>>
>>
>> Yeah probably you are right, but I am not sure if it is good idea to
>> compare compressed values.
>>
>
> Again, I don't see a problem there.
>
>>
>> I think with this new changes in btrecheck, it would appear to be much
>> costlier as compare to what you have few versions back.  I am afraid
>> that it can impact performance for cases where there are few WARM
>> updates in chain and many HOT updates as it will run recheck for all
>> such updates.
>
>
>
> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> <2300 chars string>, (random()::bigint) % :scale, 0;
>
> CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
> CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
> CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
> CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
> CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);
>
> -- Force a WARM update on one row
> UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
> repeat('abcdefghij', 2);
>
> Test:
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> <2300 chars string> ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> <2300 chars string>;
> END;
>
> I did 4 5-minutes runs with master and WARM and there is probably a 2-3%
> regression.
>

So IIUC, in the above test you have one WARM update during initialization
and then all updates during the actual test are HOT updates. In such a
case, won't the WARM chain be converted to HOT by vacuum, after which all
updates from thereon will be HOT with probably no rechecks?

> (Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
> scans on the fat index)
> master:
> txns  idx_scan
> 414117 828233
> 411109 822217
> 411848 823695
> 408424 816847
>
> WARM:
> txns   idx_scan
> 404139 808277
> 398880 797759
> 399949 799897
> 397927 795853
>
> ==
>
> I then also repeated the tests, but this time using compressible values. The
> regression in this case is much higher, may be 15% or more.
>

Sounds on higher side.

> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> repeat('abcdefghij', 2), (random()::bigint) % :scale, 0;
>
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> repeat('abcdefghij', 2) ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> repeat('abcdefghij', 2);
> END;
>
> (Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
> scans on the fat index)
> master:
> txns   idx_scan
> 56976 113953

Re: [HACKERS] Partitioned tables and relfilenode

2017-03-30 Thread Amit Langote
Thanks for the review.

On Thu, Mar 30, 2017 at 7:37 PM, Kyotaro HORIGUCHI wrote:
> At Thu, 30 Mar 2017 18:24:16 +0900, Amit Langote wrote:
>> On 2017/03/29 23:58, Robert Haas wrote:
>> > On Wed, Mar 29, 2017 at 4:21 AM, Amit Langote
>> >  wrote:
>> >> Looks correct, so incorporated in the attached updated patch.  Thanks.
>> >
>> > This seems like a hacky way to limit the reloptions to just OIDs.
>> > Shouldn't we instead have a new RELOPT_KIND_PARTITIONED or something
>> > like that?
>>
>> OK, I tried that in the updated patch.
>
> The name RELOPT_KIND_PARTITIONED looks somewhat odd. RELKIND for
> partitioned tables is RELKIND_PARTITIONED_TABLE, so is this
> better to be _TABLE, even if a bit longer?

Hm, OK.  Done.

> parseRelOptions seems to return garbage pointer if no option of
> the kind is available.

Oops, fixed that too.

Updated patch attached.

Thanks,
Amit


0001-Do-not-allocate-storage-for-partitioned-tables.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-03-30 Thread Peter Moser
2017-03-01 10:56 GMT+01:00 Peter Moser :
> A similar walkthrough for ALIGN will follow soon.
>
> We are thankful for any suggestion or ideas, to be used to write a
> good SGML documentation.

The attached README explains the ALIGN operation step-by-step with a
TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
input, show how we rewrite it during the parser stage, and show how the
final execution generates the result tuples.


Best regards,
Anton, Michael, Johann, Peter

ALIGN


  This is an example walkthrough of a temporal ALIGN operation, from the
  query input, through the query rewrite, to the result tuples produced during
  execution.



EXAMPLE


  These tables represent employees and projects in a company. An employee works
  for a department over a certain time period. Different projects get developed
  in a department. The start and end time points of each project are stored in
  the period column.

CREATE TABLE emp (empname VARCHAR, dept VARCHAR, period INT4RANGE);
INSERT INTO emp VALUES
('Ann', 'DB', '[1,5)'),
('Bea', 'DB', '[3,8)'),
('Ron', 'AI', '[6,9)');

CREATE TABLE prj (prjname VARCHAR, dept VARCHAR, period INT4RANGE);
INSERT INTO prj VALUES
('PR1', 'DB', '[3,7)'),
('PR2', 'DB', '[1,5)'),
('PR3', 'HW', '[2,8)');


Timeline representation
---

EMPNAME  DEPT  PERIOD
Ann  DB[1,5)    <- Tuple 1, valid from 1 to 5 excl.
Bea  DB[3,8)-   <- Tuple 2
Ron  AI[6,9)   ---  <- Tuple 3
  123456789 <- Timeline

PRJNAME  DEPT  PERIOD
PR1  DB[3,7)<- Tuple 1, valid from 3 to 7 excl.
PR2  DB[1,5)    <- Tuple 2
PR3  HW[2,8)   --   <- Tuple 3
  123456789 <- Timeline


TEMPORAL LEFT OUTER JOIN query
--

  Query: At each time point, to which projects is an  employee assigned, and
 when does an employee not have an assigned project?


WITH emp AS (SELECT period u, * FROM emp),
 prj AS (SELECT period v, * FROM prj)
SELECT empname, prjname, dept, period FROM (
emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
) emp_aligned
LEFT OUTER JOIN (
prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
) prj_aligned
USING(dept, period)
WHERE period = u * v OR u IS NULL OR v IS NULL;


Result
--

 empname | prjname | dept | period
-+-+--+
 Ann | PR2 | DB   | [1,5)   
 Ann | PR1 | DB   | [3,5) --
 Bea | PR1 | DB   | [3,7) 
 Bea | PR2 | DB   | [3,5) --
 Bea | | DB   | [7,8) -
 Ron | | AI   | [6,9)---
123456789  <- Timeline



STEP-BY-STEP EXPLANATION


  In this chapter we first describe the rewrite and processing of the two ALIGN
  clauses, which are:

( emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period)
) emp_aligned

  ...and...

( prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period)
) prj_aligned.

  Secondly, we explain what the above query does in general, and why we need two
  ALIGNs, the WHERE-clause and CTEs (WITH clauses).


ALIGN subquery processing
-

  After receiving the ALIGN query (see above) as input, we rewrite it into
  the following query:

SELECT emp.*, GREATEST(LOWER(emp.period), LOWER(prj.period)) p1,
 LEAST(UPPER(emp.period), UPPER(prj.period)) p2
FROM
(SELECT *, row_id() OVER () rn FROM emp) emp
LEFT OUTER JOIN
prj
ON emp.dept = prj.dept AND emp.period && prj.period
ORDER BY rn, p1, p2;

  Then, we rewrite the second ALIGN subquery analogously.
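
  For reference, that analogous rewrite (a sketch only, mirroring the query
  above with the roles of emp and prj swapped) looks like this:

SELECT prj.*, GREATEST(LOWER(prj.period), LOWER(emp.period)) p1,
 LEAST(UPPER(prj.period), UPPER(emp.period)) p2
FROM
(SELECT *, row_id() OVER () rn FROM prj) prj
LEFT OUTER JOIN
emp
ON emp.dept = prj.dept AND prj.period && emp.period
ORDER BY rn, p1, p2;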


Intermediate results: These are the inputs of our ALIGN execution nodes
---
  (See Appendix [1] for details)

  For emp ALIGN prj...

TUPID  empname | dept | period | rn | p1 | p2
-+--++++
  A  Ann | DB   | [1,5)  |  1 |  1 |  5
  B  Ann | DB   | [1,5)  |  1 |  3 |  5
  C  Bea | DB   | [3,8)  |  2 |  3 |  5
  D  Bea | DB   | [3,8)  |  2 |  3 |  7
  E  Ron | AI   | [6,9)  |  3 |  6 |  9


  For prj ALIGN emp...

TUPID  prjname | dept

Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-30 Thread Jesper Pedersen

Hi Ashutosh,

On 03/29/2017 09:16 PM, Ashutosh Sharma wrote:

This patch needs a rebase.


Please try applying these patches on top of [1]. I think you should be able
to apply it cleanly. Sorry, I think I forgot to mention this point in my
earlier mail.

[1] -
https://www.postgresql.org/message-id/CAE9k0P%3DV2LhtyeMXd295fhisp%3DNWUhRVJ9EZQCDowWiY9rSohQ%40mail.gmail.com



Thanks, that works !

As you have provided a patch addressing Robert's comments, and no other
reviews have been posted, I'm moving this patch to "Ready for Committer" for
additional committer feedback.


Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila 
wrote:

>
>
> How have you verified that?  Have you checked that in
> heap_prepare_insert it has called toast_insert_or_update() and then
> returned a tuple different from what the input tup is?  Basically, I
> am easily able to see it and even the reason why the heap and index
> tuples will be different.  Let me try to explain,
> toast_insert_or_update returns a new tuple which contains compressed
> data and this tuple is inserted in heap where as slot still refers to
> original tuple (uncompressed one) which is passed to heap_insert.
> Now, ExecInsertIndexTuples and the calls under it like FormIndexDatum
> will refer to the tuple in slot which is uncompressed and form the
> values[] using uncompressed value.


Ah, yes. You're right. Not sure why I saw things differently. That doesn't
change anything though, because during recheck we'll get the compressed value
and not do anything with it. In the index we already have the compressed value,
so we can compare them. Even if we decide to decompress everything and do the
comparison, that should be possible. So I don't see a problem as far as
correctness goes.


>
> So IIUC, in above test during initialization you have one WARM update
> and then during actual test all are HOT updates, won't in such a case
> the WARM chain will be converted to HOT by vacuum and then all updates
> from thereon will be HOT and probably no rechecks?
>

There is no autovacuum running; just 1 tuple being HOT updated out of 100 tuples.
Confirmed by looking at pg_stat_user_tables. Also made sure that the tuple
doesn't get non-HOT updated in between, which would break the WARM chain.
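
For reference, the kind of check I mean (just a sketch) is:

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';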


>
>
> >
> > I then also repeated the tests, but this time using compressible values.
> The
> > regression in this case is much higher, may be 15% or more.
> >
>
> Sounds on higher side.
>
>
Yes, definitely. If we can't reduce that, we might want to provide a table-level
option to explicitly turn WARM off for such tables.


> IIUC, by the time you are comparing tuple attrs to check for modified
> columns, you don't have the compressed values for new tuple.
>
>
I think it depends. If the value is not being modified, then we will get
both values as compressed. At least I confirmed that with your example,
running an update which only changes c1. I don't know if that holds for all
cases.


> >  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> today.
> >
>
> Yeah, but I am not sure if we can take Robert's statement as some sort
> of endorsement for what the patch does.
>
>
Sure.


> > We will figure out how to deal with it if we ever add support for
> different
> > compression algorithms or compression levels. And I also think this is
> kinda
> > synthetic use case and the fact that there is not much regression with
> > indexes as large as 2K bytes seems quite comforting to me.
> >
>
> I am not sure if we can consider it as completely synthetic because we
> might see some similar cases for json datatypes.  Can we once try to
> see the impact when the same test runs from multiple clients?


OK. It might become hard to control HOT behaviour though, or we will need to do
a mix of WARM/HOT updates. I will see if this is easily doable by setting a
high fillfactor etc.


>   For
> your information, I am also trying to setup some tests along with one
> of my colleague and we will report the results once the tests are
> complete.
>
>
That'll be extremely helpful, especially if it's something close to a
real-world scenario. Thanks for doing that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Mar 30, 2017 at 4:08 AM, Stephen Frost  wrote:
> >> > If there's a way to change the verbosity for just those scripts, I'd be
> >> > happy to do that, if we're unable to agree on reducing it across the
> >> > board..
> >>
> >> I'd rather silence only scripts that are overly verbose.
> >
> > I'm fine with that, but how?
> 
> The important point to me is not to show the tests that passed, it is
> to show the tests that are failing when running them. So I would
> suggest to just remove the --verbose flag in PROVE_FLAGS. If you do
> so, the test of pg_dump would show up like that, printing as well a
> count number while running:

Right, but you can't do that for just pg_dump.

I'd be fine with removing --verbose globally, as your patch does, but
there was some argument that we then would have long 'quiet' periods.
I haven't had a chance to go test if that's really the case yet though.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Michael Paquier
On Thu, Mar 30, 2017 at 9:52 PM, Stephen Frost  wrote:
> Michael,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Thu, Mar 30, 2017 at 4:08 AM, Stephen Frost  wrote:
>> >> > If there's a way to change the verbosity for just those scripts, I'd be
>> >> > happy to do that, if we're unable to agree on reducing it across the
>> >> > board..
>> >>
>> >> I'd rather silence only scripts that are overly verbose.
>> >
>> > I'm fine with that, but how?
>>
>> The important point to me is not to show the tests that passed, it is
>> to show the tests that are failing when running them. So I would
>> suggest to just remove the --verbose flag in PROVE_FLAGS. If you do
>> so, the test of pg_dump would show up like that, printing as well a
>> count number while running:
>
> Right, but you can't do that for just pg_dump.
>
> I'd be fine with removing --verbose globally, as your patch does, but
> there was some argument that we then would have long 'quiet' periods.
> I haven't had a chance to go test if that's really the case yet though.

I don't see much point in having --verbose enabled by default, but
if that's the consensus, I have nothing better to propose than adding
something like this to the command launching prove: $(if
$(PG_PROVE_FLAGS),$(PG_PROVE_FLAGS),--verbose)
Then it is possible to enforce the flag for noisy tests.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Alvaro Herrera
Stephen Frost wrote:

> I'd be fine with removing --verbose globally, as your patch does, but
> there was some argument that we then would have long 'quiet' periods.
> I haven't had a chance to go test if that's really the case yet though.

Michael said that a running counter was displayed, which perhaps is
enough of a progress indicator.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Craig Ringer
On 30 March 2017 at 20:57, Michael Paquier  wrote:

> I don't see much the point to have --verbose enabled by default, but
> if that's the consensus I have nothing better to propose than adding
> something like that to the command launching the prove command: $(if
> $(PG_PROVE_FLAGS),$(PG_PROVE_FLAGS),--verbose)
> And then it is possible to enforce the flag for noisy tests.

That's what I'd like to go for. Even default to it, so long as I can
override on command line.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Bruce Momjian
On Tue, Mar 28, 2017 at 11:35:17PM -0400, Tom Lane wrote:
> Well, formally speaking, the RMT declares the onset of feature freeze.
> I expressed an opinion that they'd probably do so at the end of March,
> but it's their call as to exactly when to do so, and whether to grant
> any limited extensions/exceptions.
> 
> My own thought is that there's room for at least a few days' slop in
> the end date of the final commitfest, depending on what patches remain
> open and what the prospects are for getting them done.  (In the past
> we've sometimes let the final fest stretch on indefinitely, which is
> clearly the Wrong Thing; but that doesn't mean that the Right Thing is
> to say that it ends at 2017-04-01 00:00 UTC no matter what.)  The RMT
> should look at things in another day or two and make a judgment call
> about that.

I agree we need to extend, and not wait any longer to do it.  We
have people at the NYC conference and I don't want their memory of the
conference to be that they were stressed trying to work on closing the
commit fest --- that could lead to bad memories and to them declining
future conference attendance.  If that delays our final release by a
week, it is worth it.

I propose we go for a one-week delay in closing the commit fest, and that we
decide right now.  Ideally I'd like to see delays in one-week increments
_and_ announced a week before each deadline.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo in libpq

2017-03-30 Thread Daniel Gustafsson
There seems to be a typo in libpq as per attached, “..we will loose error
messages” should probably be “..we will lose error messages”.

cheers ./daniel



typo-libpq-loose.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Robert Haas
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy  wrote:
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.

I think approach B is incorrect.  Suppose we have 1536 buckets and
hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193.  If I
understand correctly, each of these values should be mapped either to
bucket 0 or to bucket 1, and the goal of the sort is to put all of the
bucket 0 tuples before all of the bucket 1 tuples, so that we get
physical locality when inserting.  With approach A, the sort keys will
match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
before any of the inserts to bucket 1.  With approach B, we'll be
sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
alternating inserts to bucket 0 with inserts to bucket 1.

To put that another way, see this comment at the top of hashsort.c:

 * When building a very large hash index, we pre-sort the tuples by bucket
 * number to improve locality of access to the index, and thereby avoid
 * thrashing.  We use tuplesort.c to sort the given index tuples into order.

So, you can't just decide to sort on a random number, which is what
approach B effectively does.  Or, you can, but it completely misses
the point of sorting in the first place.
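
To make that concrete, here is a quick SQL sketch (my own illustration, not
something from the patch) of the keys each approach would produce for those
hash values, assuming 1536 buckets so that maxbucket = 1535, highmask = 2047
and lowmask = 1023 in the usual mask-based bucket mapping:

SELECT h AS hashvalue,
       CASE WHEN (h & 2047) > 1535 THEN h & 1023
            ELSE h & 2047 END AS bucket,      -- the real bucket; approach A's sort key
       h % 1536 AS approach_b_sortkey         -- what approach B would sort on
FROM (VALUES (2048), (2049), (4096), (4097),
             (6144), (6145), (8192), (8193)) AS t(h);

The bucket column comes out as 0, 1, 0, 1, ... while the approach B keys come
out as 512, 513, 1024, 1025, 0, 1, 512, 513, which is exactly the mismatch
described above.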

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
 wrote:
> I think we can fix that by comparing compressed values.  I know you had
> raised concerns, but Robert confirmed that (IIUC) it's not a problem today.

I'm not sure that's an entirely fair interpretation of what I said.
My point was that, while it may not be broken today, it might not be a
good idea to rely for correctness on it always being true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-03-30 Thread David Rowley
On 25 March 2017 at 07:35, Alvaro Herrera  wrote:

> As I said in another thread, I pushed parts 0002,0003,0004.  Tomas said
> he would try to rebase patches 0001,0005,0006 on top of what was
> committed.  My intention is to give that one a look as soon as it is
> available.  So we will have n-distinct and functional dependencies in
> PG10.  It sounds unlikely that we will get MCVs and histograms in, since
> they're each a lot of code.
>

I've been working on the MV functional dependencies part of the patch to
polish it up a bit. Tomas has been busy with a few other duties.

I've made some changes around how clauselist_selectivity() determines if it
should try to apply any extended stats. The solution I came up with was to
add two parameters to this function: one for the RelOptInfo in question,
and a bool to control whether we should try to apply any extended stats.
For clauselist_selectivity() usage involving join rels we just pass the rel
as NULL, that way we can skip all the extended stats stuff with very low
overhead. When we actually have a base relation to pass along we can do so,
along with a true tryextstats value to have the function attempt to use any
extended stats to assist with the selectivity estimation.

When adding these two parameters I had second thoughts about whether "tryextstats"
was required at all; we could just control this by whether the rel is a
base rel of kind RTE_RELATION. I ended up having to pass these parameters
further down, to clauselist_selectivity's singleton counterpart,
clause_selectivity(), because clause_selectivity() calls
clauselist_selectivity() for some clause types. I'm not entirely sure
this is actually required, but I can't see any reason for it to cause
problems.

I've also attempted to simplify some of the logic within
clauselist_selectivity and some other parts of clausesel.c to remove some
unneeded code and make it a bit more efficient. For example, we no longer
count the attributes in the clause list before calling a similar function
to retrieve the actual attnums. This is now done as a single step.

I've not yet quite gotten as far as I'd like with this. I'd quite like to
see clauselist_ext_split() gone, and instead we could build up a bitmapset
of clause list indexes to ignore when applying the selectivity of clauses
that couldn't use any extended stats. I'm planning on having a bit more of
a look at this tomorrow.

The attached patch should apply to master as
of f90d23d0c51895e0d7db7910538e85d3d38691f0.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mv_functional-deps_2017-03-31.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:27 PM, Robert Haas  wrote:

> On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
>  wrote:
> > I think we can fix that by comparing compressed values.  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem
> today.
>
> I'm not sure that's an entirely fair interpretation of what I said.
> My point was that, while it may not be broken today, it might not be a
> good idea to rely for correctness on it always being true.
>
>
I take that point. We have a choice of fixing it today or whenever we
support multiple compression techniques. We don't even know what that will
look like, or whether we will be able to look at compressed data and tell
whether two values were compressed in exactly the same way.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 9:41 AM, Bruce Momjian  wrote:
> I agree we need to extend, and not wait until any longer to do it.  We
> have people at the NYC conference and I don't want their memory of the
> conference being that they were stressed trying to work on closing the
> commit fest --- that could lead to bad memories and them declining
> future conference attendance.  If that delays our final release for a
> week, it is worth it.
>
> I propose we go for a week delay in closing the commit fest, and we
> decide right now.  Ideally I like to to see delay in one-week increments
> _and_ announce that a week before each deadline.

Summary of opinions on this thread:

- Tom thinks the RMT should consider extending by a day or two.
- Stephen agrees.
- Alvaro proposes allowing more time next time, but not to change the
dates for this time.
- Andres and Dave agree with Alvaro's surprise about the selected
date, but it's not clear whether they want to change things now or
just for next time.
- Robert thinks that the RMT shouldn't presume to vary the dates
without a broad consensus, and somewhat favors not extending.

Other views?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Amit Kapila
On Thu, Mar 30, 2017 at 5:55 PM, Pavan Deolasee
 wrote:
>
> On Thu, Mar 30, 2017 at 5:27 PM, Amit Kapila 
> wrote:
>>
>>
>>
>> How have you verified that?  Have you checked that in
>> heap_prepare_insert it has called toast_insert_or_update() and then
>> returned a tuple different from what the input tup is?  Basically, I
>> am easily able to see it and even the reason why the heap and index
>> tuples will be different.  Let me try to explain,
>> toast_insert_or_update returns a new tuple which contains compressed
>> data and this tuple is inserted in heap where as slot still refers to
>> original tuple (uncompressed one) which is passed to heap_insert.
>> Now, ExecInsertIndexTuples and the calls under it like FormIndexDatum
>> will refer to the tuple in slot which is uncompressed and form the
>> values[] using uncompressed value.
>
>
> Ah, yes. You're right. Not sure why I saw things differently. That doesn't
> anything though because during recheck we'll get compressed value and not do
> anything with it. In the index we already have compressed value and we can
> compare them. Even if we decide to decompress everything and do the
> comparison, that should be possible.
>

I think we should not treat compression and decompression as
free at this point in the code, because we hold a buffer lock during
recheck. Buffer locks are meant to be short-term locks (it is even
mentioned in storage/buffer/README); doing all the
compression/decompression/detoast work under these locks doesn't
sound advisable to me.  It can block many concurrent operations.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 10:08 AM, Amit Kapila  wrote:
> I think we should not consider doing compression and decompression as
> free at this point in code, because we hold a buffer lock during
> recheck. Buffer locks are meant for short-term locks (it is even
> mentioned in storage/buffer/README), doing all the
> compression/decompression/detoast stuff under these locks doesn't
> sound advisable to me.  It can block many concurrent operations.

Compression and decompression might cause performance problems, but
trying to access the TOAST table would be fatal; that would probably
have deadlock hazards among other problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 09:07, Craig Ringer  wrote:

> Attached.

* Cleaned up in 3 places
* Added code for faked up RunningTransactions in xlog.c
* Ensure catalog_xmin doesn't go backwards

All else looks good. Comments before commit?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


log-catalog-xmin-advances-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
On 2017-03-30 15:26:02 +0100, Simon Riggs wrote:
> On 30 March 2017 at 09:07, Craig Ringer  wrote:
> 
> > Attached.
> 
> * Cleaned up in 3 places
> * Added code for faked up RunningTransactions in xlog.c
> * Ensure catalog_xmin doesn't go backwards
> 
> All else looks good. Comments before commit?

Can you give me till after lunch?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 15:27, Andres Freund  wrote:
> On 2017-03-30 15:26:02 +0100, Simon Riggs wrote:
>> On 30 March 2017 at 09:07, Craig Ringer  wrote:
>>
>> > Attached.
>>
>> * Cleaned up in 3 places
>> * Added code for faked up RunningTransactions in xlog.c
>> * Ensure catalog_xmin doesn't go backwards
>>
>> All else looks good. Comments before commit?
>
> Can you give me till after lunch?

Sure, np

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 30, 2017 at 9:41 AM, Bruce Momjian  wrote:
>> I propose we go for a week delay in closing the commit fest, and we
>> decide right now.  Ideally I like to to see delay in one-week increments
>> _and_ announce that a week before each deadline.

> Summary of opinions on this thread:

> - Tom thinks the RMT should consider extending by a day or two.

FWIW, while I'm sure we had a reason (back at the Ottawa 2016 meeting)
for limiting the final 'fest to just a month, I'm quite sure we did
not allow for most of the senior hackers being at a conference during
the last week of that time.  So I now concur with Bruce's suggestion
that a week's delay would be suitable.

In past years this sort of decision would have been taken by -core,
and given the number of core members who have already weighed in
on this thread, I think Bruce's proposal would have passed easily.
But since we just delegated the setting of feature freeze to the RMT,
they are the ones who ought to do it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Petr Jelinek
On 30/03/17 16:04, Pavan Deolasee wrote:
> 
> 
> On Thu, Mar 30, 2017 at 7:27 PM, Robert Haas  > wrote:
> 
> On Thu, Mar 30, 2017 at 6:37 AM, Pavan Deolasee
> mailto:pavan.deola...@gmail.com>> wrote:
> > I think we can fix that by comparing compressed values.  I know you had
> > raised concerns, but Robert confirmed that (IIUC) it's not a problem 
> today.
> 
> I'm not sure that's an entirely fair interpretation of what I said.
> My point was that, while it may not be broken today, it might not be a
> good idea to rely for correctness on it always being true.
> 
> 
> I take that point. We have a choice of fixing it today or whenever to
> support multiple compression techniques. We don't even know how that
> will look like and whether we will be able to look at compressed data
> and tell whether two values are compressed by exact same way.
> 

While reading this thread I am wondering if we could just not do WARM on
TOASTed and compressed values, since we know there might be regressions there.
I mean, I've seen the problem WARM tries to solve mostly with timestamp or
boolean values, and sometimes counters, so it would still be helpful to
quite a lot of people even if we didn't handle TOASTed and compressed values
in v1. It's not like skipping WARM sometimes is somehow terrible; we'll
just fall back to the current behavior.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Thu, Mar 30, 2017 at 9:41 AM, Bruce Momjian  wrote:
> >> I propose we go for a week delay in closing the commit fest, and we
> >> decide right now.  Ideally I like to to see delay in one-week increments
> >> _and_ announce that a week before each deadline.
> 
> > Summary of opinions on this thread:
> 
> > - Tom thinks the RMT should consider extending by a day or two.
> 
> FWIW, while I'm sure we had a reason (back at the Ottawa 2016 meeting)
> for limiting the final 'fest to just a month, I'm quite sure we did
> not allow for most of the senior hackers being at a conference during
> the last week of that time.  So I now concur with Bruce's suggestion
> that a week's delay would be suitable.

Agreed, same here.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Petr Jelinek
On 30/03/17 16:54, Alvaro Herrera wrote:
> Robert Haas wrote:
> 
>> - Alvaro proposes allowing more time next time, but not to change the
>> dates for this time.
> 
> FWIW I didn't realize that the NY conference was ongoing, so count me
> for postponing the end of the current CF.
> 

+1, the conference makes it a bit inconvenient to have the freeze on exactly the 31st.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Magnus Hagander
On Thu, Mar 30, 2017 at 4:40 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Thu, Mar 30, 2017 at 9:41 AM, Bruce Momjian  wrote:
> >> I propose we go for a week delay in closing the commit fest, and we
> >> decide right now.  Ideally I like to to see delay in one-week increments
> >> _and_ announce that a week before each deadline.
>
> > Summary of opinions on this thread:
>
> > - Tom thinks the RMT should consider extending by a day or two.
>
> FWIW, while I'm sure we had a reason (back at the Ottawa 2016 meeting)
> for limiting the final 'fest to just a month, I'm quite sure we did
> not allow for most of the senior hackers being at a conference during
> the last week of that time.  So I now concur with Bruce's suggestion
> that a week's delay would be suitable.
>

+1.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Alvaro Herrera
Robert Haas wrote:

> - Alvaro proposes allowing more time next time, but not to change the
> dates for this time.

FWIW I didn't realize that the NY conference was ongoing, so count me
for postponing the end of the current CF.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Mithun Cy
On Thu, Mar 30, 2017 at 7:23 PM, Robert Haas  wrote:
> I think approach B is incorrect.  Suppose we have 1536 buckets and
> hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193.  If I
> understand correctly, each of these values should be mapped either to
> bucket 0 or to bucket 1, and the goal of the sort is to put all of the
> bucket 0 tuples before all of the bucket 1 tuples, so that we get
> physical locality when inserting.  With approach A, the sort keys will
> match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
> 1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
> before any of the inserts to bucket 1.  With approach B, we'll be
> sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
> alternating inserts to bucket 0 with inserts to bucket 1.

Oops, sorry, yes: the two denominators are different (one used in the insert
and another used for the sort keys), so we will end up with different bucket
numbers. In patch B, I should actually have taken the next power of 2 above
1536 as the denominator and taken the mod value; if the mod value falls beyond
the last bucket (>= 1536), reduce the denominator by half and retake the mod
to get a bucket within 1536. Which is effectively what Patch A is doing.
Approach B is a blunder, and I apologize for that mistake. I think Patch A
should be considered. If adding the members to struct Tuplesortstate is a
concern, I will rewrite Patch B as described above.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-30 Thread Fujii Masao
On Wed, Mar 29, 2017 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Wed, 29 Mar 2017 09:23:42 +0200, Michael Banck  
> wrote in <149077.18436.14.ca...@credativ.de>
>> Hi,
>>
>> Am Mittwoch, den 29.03.2017, 15:22 +0900 schrieb Michael Paquier:
>> > On Wed, Mar 29, 2017 at 3:56 AM, Fujii Masao  wrote:
>> > > If your need other information except START WAL LOCATION at the 
>> > > beginning of
>> > > base backup and they are very useful for many third-party softwares,
>> > > you can add them into that first result set. If you do this, you can
>> > > retrieve them
>> > > at the beginning even when WAL files are included in the backup.
>> >
>> > You mean in the result tuple of pg_start_backup(), right? Why not.
>>
>> The replication protocol chapter says: "When the backup is started, the
>> server will first send two ordinary result sets, followed by one or more
>> CopyResponse results. The first ordinary result set contains the
>> starting position of the backup, in a single row with two columns."
>>
>> However, I don't think it is very obvious to users (or at least it is
>> not to me) how to get at this from psql, if you want to script it.  If I

I don't think that using psql to run the BASE_BACKUP command is a good
approach. Instead, IMO you should basically implement a client program
which can handle BASE_BACKUP properly, or extend pg_basebackup
so that it can do what you want to do.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Tom Lane
Stephen Frost  writes:
> I'd be fine with removing --verbose globally, as your patch does, but
> there was some argument that we then would have long 'quiet' periods.
> I haven't had a chance to go test if that's really the case yet though.

I've been running it like this lately:

make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount'

and it is still pretty noisy ;-).  The only place where it sits for
more than a couple of seconds without printing anything is at the very
start, which I believe to be the initial "make temp-install" step,
which would be unaffected by prove verbosity anyway.

So I'd be +1 for just removing --verbose by default.  Anybody who really
wants it can put it back via PROVE_FLAGS.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-03-30 Thread Alexander Korotkov
On Tue, Mar 28, 2017 at 10:12 AM, Pavel Stehule 
wrote:

> 2017-03-27 13:59 GMT+02:00 Alexander Korotkov :
>
>> On Fri, Mar 10, 2017 at 6:06 PM, Pavel Stehule 
>> wrote:
>>
>>> 2017-03-10 16:00 GMT+01:00 Alexander Korotkov >> >:
>>>
 On Fri, Mar 10, 2017 at 5:16 PM, Stephen Frost 
 wrote:

> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > On 2/24/17 16:32, Pavel Stehule wrote:
> > > set EXTENDED_DESCRIBE_SORT size_desc
> > > \dt+
> > > \l+
> > > \di+
> > >
> > > Possible variants: schema_table, table_schema, size_desc,
> size_asc
> >
> > I can see this being useful, but I think it needs to be organized a
> > little better.
> >
> > Sort key and sort direction should be separate settings.
> >
> > I'm not sure why we need to have separate settings to sort by schema
> > name and table name.  But if we do, then we should support that for
> all
> > object types.  I think maybe that's something we shouldn't get into
> > right now.
> >
> > So I would have one setting for sort key = {name|size} and on for
> sort
> > direction = {asc|desc}.
>
> Perhaps I'm trying to be overly cute here, but why not let the user
> simply provide a bit of SQL to be put at the end of the query?
>
> That is, something like:
>
> \pset EXTENDED_DESCRIBE_ORDER_LIMIT 'ORDER BY 5 DESC LIMIT 10'
>

 I think that's the question of usability.  After all, one can manually
 type corresponding SQL instead of \d* commands.  However, it's quite
 cumbersome to do this every time.
 I found quite useful to being able to switch between different sortings
 quickly.  For instance, after seeing tables sorted by name, user can
 require them sorted by size descending, then sorted by size ascending,
 etc...
 Therefore, I find user-defined SQL clause to be cumbersome.  Even psql
 variable itself seems to be cumbersome for me.
 I would propose to add sorting as second optional argument to \d*
 commands.  Any thoughts?

>>>
>>> This proposal was here already - maybe two years ago. The psql command
>>> parser doesn't allow any complex syntax - more - the more parameters in one
>>> psql commands is hard to remember, hard to read.
>>>
>>
>> Could you please provide a link to this discussion.  Probably working
>> with multiple parameters in psql commands require some rework, but that's
>> definitely doable.
>>
>
> http://grokbase.com/t/postgresql/pgsql-hackers/
> 137nt5p6s0/proposal-psql-show-longest-tables/oldest
> https://www.postgresql.org/message-id/AANLkTikyaeJ0XdKDzxSvqPE8kaRRT
> iuqjqhwnj8ec...@mail.gmail.com
>

I took a look at these threads, but I didn't find the place where the difficulties
of adding extra arguments to psql commands are pointed out.
Could you please point me to the particular messages about it?
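
For context, this is roughly the sort of statement one has to hand-type today
to get, say, tables ordered by size (a rough sketch using the standard catalog
functions, not the exact query psql runs for \dt+):

SELECT n.nspname AS schema, c.relname AS name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
     JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC;

which is exactly the kind of typing a built-in sort option would save.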

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-30 Thread Kevin Grittner
On Wed, Mar 29, 2017 at 11:17 PM, Mengxing Liu
 wrote:
> Thanks, I've updated the proposal. Just one issue:
> I agree that we can make skip list a general data structure.  But
> can we use the fixed-level skip list as a Plan B? Or a quick attempt
> before the general data structure ?
> Because I am not familiar with shared memory structure and tricks
> used in it, and I cannot estimate how much time it would take.

It's not really too bad for fixed allocation shared memory, and I
can help with that.  If I thought it would save much I could see
doing a prototype without generalization, but you would still have
most of the same shared memory issues, since the structure *must*
live in shared memory.
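
To give a feel for what "fixed allocation" means here, a rough sketch of the
usual *ShmemSize/*ShmemInit pair (all of the skip-list names below are made up
purely for illustration):

Size
RWConflictSkipListShmemSize(void)
{
	Size		size = 0;

	size = add_size(size, MAXALIGN(sizeof(SkipListHeader)));
	size = add_size(size, mul_size(MAX_SKIPLIST_NODES, sizeof(SkipListNode)));
	return size;
}

void
RWConflictSkipListShmemInit(void)
{
	bool		found;

	skipList = (SkipListHeader *)
		ShmemInitStruct("RWConflict skip list",
						RWConflictSkipListShmemSize(), &found);
	if (!found)
		memset(skipList, 0, RWConflictSkipListShmemSize());	/* first backend initializes */
}

The maximum size has to be decided at startup either way, which is why
generalizing the node layout doesn't add much to the shared memory work.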

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Teodor Sigaev

-   IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
+   IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE

I think your syntax would read no worse, possibly even better, if you
just used the existing INCLUDING keyword.
There was a discussion in this thread about naming, and both databases which
support covering indexes use the INCLUDE keyword.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Teodor Sigaev
I had a look at the patch and played with it; it seems to look fine. I split it into
two patches: core changes (+bloom index fix) and btree itself. All docs are left
in the first patch - I'm too lazy to rewrite documentation which is changed in
the second patch.

Any objection from reviewers to push both patches?


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


0001-covering-core.patch
Description: binary/octet-stream


0002-covering-btree.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Andres Freund
Hi,

On 2017-03-30 16:43:41 +0530, Pavan Deolasee wrote:
> Looks like OID conflict to me.. Please try rebased set.

Pavan, Alvaro, everyone: I know you guys are working very hard on this,
but I think at this point it's too late to commit this for v10.  This is a
patch that affects the on-disk format, in quite subtle
ways.  Committing it just at the end of the development cycle / shortly
before feature freeze seems too dangerous to me.

Let's commit this right at the beginning of the next cycle, so we have time to
shake out the bugs.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-30 Thread Tom Lane
Fabien COELHO  writes:
>> [...] Aside from cosmetic changes, I've made it behave reasonably for 
>> cases where \if is used on portions of a query, for instance

> A small issue I see is that I was planning to add such an if syntax to 
> pgbench (well, at least if I succeed in getting boolean expressions and 
> setting variables, which is just a maybe), but this kind of if in the 
> middle of expression does not make much sense for a pgbench script where 
> "if" must be evaluated at execution time, not parse time.

Well, it's not really clear to me why that would be true.  If it actually
is impossible to give pgbench equivalent behavior, we'll just have to live
with the discrepancy, but ISTM it could probably be made to work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"

2017-03-30 Thread Kevin Grittner
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai
 wrote:

> My name is Shubham Barai and I am a final year student at Maharashtra
> Institute of Technology, Pune, India. I am very interested in contributing
> Postgresql this year through GSoC project.

Welcome!  Sorry I didn't spot this post earlier -- I'm behind on
reading the community lists and this didn't trigger any of the phrases
that pop things out to my attention right away.

> I am particularly interested in working on the project "Explicitly support
> predicate locks in index access methods besides btree". I have gone through
> some research papers which were recommended on
> https://wiki.postgresql.org/wiki/GSoC_2017 to understand the concept of
> Serializable Snapshot Isolation used in PostgreSQL. I have also browsed
> through the codebase to get some idea of different access methods for gin,
> gist, and hash indexes. I want to discuss my proposal to get some feedback
> before I write my final proposal. Sorry, I am discussing my proposal little
> late. I was really busy in my academics.

Understandable, but please be careful to get your final proposal in by
the deadline.  Deadlines in GSoC are not flexible.

> Currently, only B+-trees support page-level predicate locking. For other
> indexes, a relation-level lock is acquired, which can lead to unnecessary
> serialization failures due to rw dependencies caused by any insert into the
> index. So, the main task of this project is to support page-level predicate
> locking for the remaining indexes.

Right.

> [calls out several places that specific calls to predicate locking functions 
> are needed]

> There may be a lot of other places where we need to insert function calls
> for predicate locking that I haven't included yet. I didn't go into details
> of every index AM.

That will be about half the work of the project.  It is fine to
identify examples for your proposal, to show that you know what to
look for, but tracking down every last location can be completed after
the proposal is accepted.  The other half of the work will be testing
and responding to issues others might raise.

> can anyone help me find existing tests for b-tree?

I think this should be it:

kgrittn@kevin-desktop:~/pg/master$ find src/backend/access/nbtree
-type f -name '*.c' | grep -v '/tmp_check/' | grep -v '/Debug/' |
xargs egrep -n 'PredicateLock|SerializableConflict'
src/backend/access/nbtree/nbtinsert.c:201:
CheckForSerializableConflictIn(rel, NULL, buf);
src/backend/access/nbtree/nbtinsert.c:402:
 CheckForSerializableConflictIn(rel, NULL, buf);
src/backend/access/nbtree/nbtinsert.c:784:
PredicateLockPageSplit(rel,
src/backend/access/nbtree/nbtsearch.c:1040:
PredicateLockRelation(rel, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1052:
PredicateLockPage(rel, BufferGetBlockNumber(buf),
src/backend/access/nbtree/nbtsearch.c:1483:
 PredicateLockPage(rel, blkno, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1578:
 PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf),
scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1869:
PredicateLockRelation(rel, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1874: PredicateLockPage(rel,
BufferGetBlockNumber(buf), scan->xs_snapshot);
src/backend/access/nbtree/nbtpage.c:1410:
PredicateLockPageCombine(rel, leafblkno, leafrightsib);

-- 
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Robert Haas
On Wed, Mar 29, 2017 at 8:03 AM, Mithun Cy  wrote:
> Thanks, Amit for a detailed review.

I think that the macros in hash.h need some more work:

- Pretty much any time you use the argument of a macro, you need to
parenthesize it in the macro definition to avoid surprises if the
macro is called using an expression (see the example below this list).
That isn't done consistently here.

- The macros make extensive use of magic numbers like 1, 2, and 3.  I
suggest something like:

#define SPLITPOINT_PHASE_BITS 2
#define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)

And then use SPLITPOINT_PHASE_BITS any place where you're currently
saying 2.  The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

- Many of these macros are only used in one place.  Maybe just move
the computation to that place and get rid of the macro.  For example,
_hash_spareindex() could be written like this:

if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
return splitpoint_group;

/* account for single-phase groups */
splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;

/* account for completed groups */
splitpoint += (splitpoint_group -
SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;

/* account for phases within current group */
splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
SPLITPOINT_PHASE_MASK;

return splitpoint;

That eliminates the only use of two complicated macros and is in my
opinion more clear than what you've currently got.

- Some of these macros lack clear comments explaining their purpose.

- Some of them don't include HASH anywhere in the name, which is
essential for a header that may easily be included by non-hash index
code.

- The names don't all follow a consistent format.  Maybe that's too
much to hope for at some level, but I think they could be more
consistent than they are.
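
To illustrate the parenthesization point at the top of this list, a made-up
example (not taken from the patch):

/* fragile: HALVE_FRAGILE(a + b) expands to  a + b / 2 , i.e.  a + (b / 2)  */
#define HALVE_FRAGILE(x)	x / 2

/* safe: every use of the argument is parenthesized */
#define HALVE_SAFE(x)		((x) / 2)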

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-03-30 Thread Pavel Stehule
>
>

 This proposal was here already - maybe two years ago. The psql command
 parser doesn't allow any complex syntax; moreover, having more parameters in
 one psql command makes it hard to remember and hard to read.

>>>
>>> Could you please provide a link to this discussion.  Probably working
>>> with multiple parameters in psql commands require some rework, but that's
>>> definitely doable.
>>>
>>
>> http://grokbase.com/t/postgresql/pgsql-hackers/137nt5p6s0/
>> proposal-psql-show-longest-tables/oldest
>> https://www.postgresql.org/message-id/AANLkTikyaeJ0XdKDzxSvq
>> pe8karrtiuqjqhwnj8ec...@mail.gmail.com
>>
>
> I took a look at these threads, but I didn't find the place where the
> difficulties of adding extra arguments to psql commands are pointed out.
> Could you please point me to the particular messages about it?
>

I am sorry - maybe my memory doesn't serve me well

Pavel


>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 11:41 AM, Andres Freund  wrote:
> On 2017-03-30 16:43:41 +0530, Pavan Deolasee wrote:
>> Looks like OID conflict to me.. Please try rebased set.
>
> Pavan, Alvaro, everyone: I know you guys are working very hard on this,
> but I think at this point it's too late to commit this for v10.  This is a
> patch that affects the on-disk format, in quite subtle
> ways.  Committing it just at the end of the development cycle / shortly
> before feature freeze seems too dangerous to me.
>
> Let's commit this right at the beginning of the next cycle, so we have time to
> shake out the bugs.

+1, although I think it should also have substantially more review first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"

2017-03-30 Thread Kevin Grittner
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai
 wrote:

> * Hash Index

> Currently, I am trying to understand how the splitting of a bucket works.
> Should we acquire a predicate lock on every page of the old and new buckets in
> case there is a predicate lock on a page of the old bucket?

For page level locking in hash indexes, I think it might be fine to
lock the first page of a bucket.
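
A rough sketch of where such a call might sit (the variable names are only
illustrative; the call itself mirrors the per-page calls btree already makes):

/* pin the bucket's primary page and take the predicate lock on it only */
buf = _hash_getbuf(rel, bucket_blkno, HASH_READ, LH_BUCKET_PAGE);
PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);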

Also, in case you missed it, here are some guidelines I suggested.
There weren't any comments, which means they probably didn't offend
anyone else too badly.  They're just my opinion, but you might want
to consider them:

Each GSoC student proposal should be a PDF file of 6 to 8 pages.  In
the end, Google will publish these documents on a web page, so the
student should make each proposal something which they will be happy
to have future potential employers review.

Some ideas for desirable content:

  - A resume or CV of the student, including any prior GSoC work
  - Their reasons for wanting to participate
  - What else they have planned for the summer, and what their time
commitment to the GSoC work will be
  - A clear statement that there will be no intellectual property
problems with the work they will be doing -- that the PostgreSQL
community will be able to use their work without encumbrances
(e.g., there should be no agreements related to prior or
ongoing work which might assign the rights to the work they do
to someone else)
  - A description of what they will do, and how
  - Milestones with dates
  - What they consider to be the test that they have successfully
completed the project

-- 
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PDF build is broken

2017-03-30 Thread Devrim Gündüz

Hi Peter,

On Sun, 2017-03-26 at 15:05 -0400, Peter Eisentraut wrote:
> Fixed. 

(Sorry for the late response): Thanks, it builds fine.

>  But I also suggest that you try out the FOP based builds,
> because the jadetex-based builds will probably go away soon.

Can you please let me know how I can do that? Are there any docs somewhere?

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Aleksander Alekseev
Hi Teodor,

> I had a look at the patch and played with it; it seems to look fine. I split
> it into two patches: core changes (+bloom index fix) and btree itself. All
> docs are left in the first patch - I'm too lazy to rewrite documentation which
> is changed in the second patch.
> Any objection from reviewers to push both patches?

These patches look OK. Definitely no objections from me.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-30 Thread Fabien COELHO


Hello Tom,


pgbench (well, at least if I succeed in getting boolean expressions and
setting variables, which is just a maybe), but this kind of if in the
middle of expression does not make much sense for a pgbench script where
"if" must be evaluated at execution time, not parse time.


Well, it's not really clear to me why that would be true.


For example, how can you PREPARE a possibly combinatorial thing?

SELECT
  \if ... XX \else YY \endif
FROM
  \if ... ZZ \else WW \endif
WHERE
  \if ... AA \else BB \endif
 ;

Or the kind of operation:

  \if ...
SELECT *
  \else
DELETE
  \endif
  FROM table WHERE condition;

Even the structure can be changed somehow:

  SELECT
\if ...
  1 ;
  SELECT 2
\endif
  ;

If it actually is impossible to give pgbench equivalent behavior, we'll 
just have to live with the discrepancy,


Yep.


but ISTM it could probably be made to work.


Even if it could somehow, I do not see it as a useful feature for pgbench. 
I also lack a good use case for psql for this feature.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Allow vacuums to report oldestxmin

2017-03-30 Thread Fujii Masao
On Wed, Mar 29, 2017 at 3:31 PM, Masahiko Sawada  wrote:
> On Wed, Mar 29, 2017 at 1:32 AM, Fujii Masao  wrote:
>> On Tue, Mar 28, 2017 at 1:06 AM, Masahiko Sawada  
>> wrote:
>>> On Sun, Mar 26, 2017 at 2:26 AM, Masahiko Sawada  
>>> wrote:
 On Sun, Mar 26, 2017 at 1:37 AM, Fujii Masao  wrote:
> On Mon, Mar 6, 2017 at 9:37 PM, Masahiko Sawada  
> wrote:
>> On Fri, Mar 3, 2017 at 10:50 PM, Simon Riggs  
>> wrote:
>>> Allow vacuums to report oldestxmin
>>>
>>> Allow VACUUM and Autovacuum to report the oldestxmin value they
>>> used while cleaning tables, helping to make better sense out of
>>> the other statistics we report in various cases.
>>>
>>> Branch
>>> --
>>> master
>>>
>>> Details
>>> ---
>>> http://git.postgresql.org/pg/commitdiff/9eb344faf54a849898d9be012ddfa8204cfeb57c
>>>
>>> Modified Files
>>> --
>>> src/backend/commands/vacuumlazy.c | 9 +
>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>>
>>
>> Should we change the example in vacuum.sgml file as well? Attached patch.
>
> "tuples" in the above should be "row versions"?
> We should review not only this line but also all the lines in the example
> of VERBOSE output, I think.

 Right. These verbose log messages are out of date. I ran
 VACUUM(VERBOSE, ANALYZE) with same scenario as current example as
 possible. Attached patch updates verbose log messages.


>>>
>>> Surprisingly the changes "tuples" -> "row versions" in vacuumlazy.c is
>>> introduced by commit feb4f44d296b88b7f0723f4a4f3945a371276e0b in 2003.
>>
>> This is the evidence that no one cares about the details of VACUUM VERBOSE
>> output example. So I'm tempted to simplify the example (please see the
>> attached patch) instead of keeping updating the example.
>
> Yes. I agree.

Pushed. I back-patched to all supported versions according to
Alvaro's comment upthread.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Robert Haas
On Thu, Mar 30, 2017 at 11:26 AM, Teodor Sigaev  wrote:
> I had a look at the patch and played with it; it seems to look fine. I split
> it into two patches: core changes (+bloom index fix) and btree itself. All
> docs are left in the first patch - I'm too lazy to rewrite documentation which
> is changed in the second patch.
> Any objection from reviewers to push both patches?

Has this really had enough review and testing?  The last time it was
pushed, it didn't go too well.  And laziness is not a very good excuse
for not dividing up patches properly.

It seems highly surprising to me that CheckIndexCompatible() only gets
a one line change in this patch.  That seems unlikely to be correct.

Has anybody done some testing of this patch with the WAL consistency
checker?  Like, create some tables with indexes that have INCLUDE
columns, set up a standby, enable consistency checking, pound the
master, and see if the standby bails?

Has anybody tested this patch with amcheck?  Does it break amcheck?

A few minor comments:

-foreach(lc, constraint->keys)
+else foreach(lc, constraint->keys)

That doesn't look like a reasonable way of formatting the code.

+/* Here is some code duplication. But we do need it. */

That is not a very informative comment.

+* NOTE It is not crutial for reliability in present,

Spelling, punctuation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-03-30 Thread Kevin Grittner
On Thu, Mar 23, 2017 at 11:36 PM, Thomas Munro
 wrote:

> One more thought: should this be allowed?
>
> postgres=# create table mytab (i int) partition by list (i);
> CREATE TABLE
> postgres=# create table mytab1 partition of mytab for values in (42);
> CREATE TABLE
> postgres=# create function my_trigger_function() returns trigger as $$
> begin end; $$ language plpgsql;
> CREATE FUNCTION
> postgres=# create trigger my_trigger after update on mytab referencing
> old table as my_old for each statement execute procedure
> my_trigger_function();
> CREATE TRIGGER

> Perhaps the moral equivalent should be possible for statement triggers
> with transition tables, and that already works with your patch as far
> as I know.  So I think your patch probably just needs to reject them
> on partitioned tables.

> [patch provided]

Yeah, that looks good.  Included in next patch version.


On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro
 wrote:

> BTW I had to make the following change to your v12 because of commit b8d7f053:

Yeah, I ran into that, too, and used exactly the same fix.


On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro
 wrote:
> On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro

>> When PlanCacheRelCallback runs, I don't think it understands that
>> these named tuplestore RangeTblEntry objects are dependent on the
>> subject table.  Could that be fixed like this?
>>
>> @@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node,
>> PlannerInfo *context)
>> if (rte->rtekind == RTE_RELATION)
>> context->glob->relationOids =
>>
>> lappend_oid(context->glob->relationOids, rte->relid);
>> +   else if (rte->rtekind == RTE_NAMEDTUPLESTORE)
>> +   context->glob->relationOids =
>> +
>> lappend_oid(context->glob->relationOids, [subject table's OID]);
>
> I'm not sure if this is the right approach and it may have style
> issues, but it does fix the crashing in the ALTER TABLE case I
> reported: see attached patch which applies on top of your v12.

I had been working along similar lines, but had not gotten it
working.  Merged your version and mine, taking the best of both.
:-)

Thanks for the reviews and the fixes!

New version attached.  It needs some of these problem cases added to
the testing, and a mention in the docs that only C and plpgsql
triggers can use the feature so far.  I'll add those tomorrow.

--
Kevin Grittner


transition-v13.diff.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-30 Thread Tom Lane
Fabien COELHO  writes:
>> If it actually is impossible to give pgbench equivalent behavior, we'll 
>> just have to live with the discrepancy,

> Yep.

>> but ISTM it could probably be made to work.

> Even if it could somehow, I do not see it as a useful feature for pgbench. 

Perhaps not.

> I also lack a good use case for psql for this feature.

It doesn't seem very outlandish to me: the alternative would be to repeat
all of a query that might span dozens of lines, in order to change one or
two lines.  That's not very readable or maintainable.

I'm prepared to believe that extremely long queries aren't ever going
to be common in pgbench scripts, so that there's not much need to support
the equivalent behavior in pgbench.  So maybe it's fine.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schedule and Release Management Team for PG10

2017-03-30 Thread Joe Conway
On 03/30/2017 10:59 AM, Magnus Hagander wrote:
> On Thu, Mar 30, 2017 at 4:40 PM, Tom Lane  wrote:
> 
> Robert Haas  writes:
> > On Thu, Mar 30, 2017 at 9:41 AM, Bruce Momjian  wrote:
> >> I propose we go for a week delay in closing the commit fest, and we
> >> decide right now.  Ideally I'd like to see delay in one-week increments
> >> _and_ announce that a week before each deadline.
> 
> > Summary of opinions on this thread:
> 
> > - Tom thinks the RMT should consider extending by a day or two.
> 
> FWIW, while I'm sure we had a reason (back at the Ottawa 2016 meeting)
> for limiting the final 'fest to just a month, I'm quite sure we did
> not allow for most of the senior hackers being at a conference during
> the last week of that time.  So I now concur with Bruce's suggestion
> that a week's delay would be suitable.
> 
> 
> +1.

+1


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


[HACKERS] Use \if ... \elif ... \else ... \endif to simplify alternative in psql scripting.

2017-03-30 Thread Andres Freund
On 2017-03-30 16:59:39 +, Tom Lane wrote:
> Support \if ... \elif ... \else ... \endif in psql scripting.

Could we use this to simplify maintenance of some of the larger
alternative output files?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use \if ... \elif ... \else ... \endif to simplify alternative in psql scripting.

2017-03-30 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-30 16:59:39 +, Tom Lane wrote:
>> Support \if ... \elif ... \else ... \endif in psql scripting.

> Could we use this to simplify maintenance of some of the larger
> alternative output files?

Worth thinking about, for sure.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
>   SetTransactionIdLimit(checkPoint.oldestXid, 
> checkPoint.oldestXidDB);
>  
>   /*
> +  * There can be no concurrent writers to oldestCatalogXmin 
> during
> +  * recovery, so no need to take ProcArrayLock.
> +  */
> + ShmemVariableCache->oldestCatalogXmin = 
> checkPoint.oldestCatalogXmin;

s/writers/writes/?

> @@ -9731,6 +9748,15 @@ xlog_redo(XLogReaderState *record)
> 
> checkPoint.oldestXid))
>   SetTransactionIdLimit(checkPoint.oldestXid,
> 
> checkPoint.oldestXidDB);
> +
> + /*
> +  * There can be no concurrent writers to oldestCatalogXmin 
> during
> +  * recovery, so no need to take ProcArrayLock.
> +  */
> + if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin,
> + 
> checkPoint.oldestCatalogXmin)
> + ShmemVariableCache->oldestCatalogXmin = 
> checkPoint.oldestCatalogXmin;

dito.



> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>  
>   ReplicationSlotsComputeRequiredXmin(true);
>  
> + /*
> +  * If this is the first slot created on the master we won't have a
> +  * persistent record of the oldest safe xid for historic snapshots yet.
> +  * Force one to be recorded so that when we go to replay from this slot 
> we
> +  * know it's safe.
> +  */
> + force_standby_snapshot =
> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);

s/first slot/first logical slot/. Also, the reference to master doesn't
seem right.


>   LWLockRelease(ProcArrayLock);
>  
> + /* Update ShmemVariableCache->oldestCatalogXmin */
> + if (force_standby_snapshot)
> + LogStandbySnapshot();

The comment and code don't quite square to me - it's far from obvious
that LogStandbySnapshot does something like that. I'd even say it's a
bad idea to have it do that.


>   /*
>* tell the snapshot builder to only assemble snapshot once reaching the
>* running_xact's record with the respective xmin.
> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>   start_lsn = slot->data.confirmed_flush;
>   }
>  
> + EnsureActiveLogicalSlotValid();

This seems like it should be in a separate patch, and seperately
reviewed. It's code that's currently unreachable (and thus untestable).


> +/*
> + * Test to see if the active logical slot is usable.
> + */
> +static void
> +EnsureActiveLogicalSlotValid(void)
> +{
> + TransactionId shmem_catalog_xmin;
> +
> + Assert(MyReplicationSlot != NULL);
> +
> + /*
> +  * A logical slot can become unusable if we're doing logical decoding 
> on a
> +  * standby or using a slot created before we were promoted from standby
> +  * to master.

Neither of those is currently possible...


> If the master advanced its global catalog_xmin past the
> +  * threshold we need it could've removed catalog tuple versions that
> +  * we'll require to start decoding at our restart_lsn.
> +  *
> +  * We need a barrier so that if we decode in recovery on a standby we
> +  * don't allow new decoding sessions to start after redo has advanced
> +  * the threshold.
> +  */
> + if (RecoveryInProgress())
> + pg_memory_barrier();

I don't think this is a meaningful locking protocol.  It's a bad idea to
use lock-free programming without need, especially when the concurrency
protocol isn't well defined.  With what other barrier does this pair
with?  What prevents the data being out of date by the time we actually
do the check?


> diff --git a/src/backend/replication/walsender.c 
> b/src/backend/replication/walsender.c
> index cfc3fba..cdc5f95 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
>* be energy wasted - the worst lost information can do here is give us
>* wrong information in a statistics view - we'll just potentially be 
> more
>* conservative in removing files.
> +  *
> +  * We don't have to do any effective_xmin / effective_catalog_xmin 
> testing
> +  * here either, like for LogicalConfirmReceivedLocation. If we received
> +  * the xmin and catalog_xmin from downstream replication slots we know 
> they
> +  * were already confirmed there,
>*/
>  }

This comment reads as if LogicalConfirmReceivedLocation had
justification for not touching / checking catalog_xmin. But it does.



>   /*
> +  * Update our knowledge of the oldest xid we can safely create historic
> +  * snapshots for.
> +  *
> +  

Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
On 2017-03-29 08:01:34 +0800, Craig Ringer wrote:
> On 28 March 2017 at 23:22, Andres Freund  wrote:
> 
> >> --- a/doc/src/sgml/protocol.sgml
> >> +++ b/doc/src/sgml/protocol.sgml
> >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
> >>   
> >>Drops a replication slot, freeing any reserved server-side 
> >> resources. If
> >>the slot is currently in use by an active connection, this command 
> >> fails.
> >> +  If the slot is a logical slot that was created in a database other 
> >> than
> >> +  the database the walsender is connected to, this command fails.
> >>   
> >>   
> >>
> >
> > Shouldn't the docs in the drop database section mention this?
> 
> DROP DATABASE doesn't really discuss all the resources it drops, but
> I'm happy to add mention of replication slots handling.

I don't think that's really comparable, because the other things aren't
global objects, which replication slots are.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-30 Thread Simon Riggs
On 29 March 2017 at 21:42, Dave Page  wrote:
> On Wed, Mar 29, 2017 at 2:51 PM, Stephen Frost  wrote:
>>
>> Dave's currently hacking on a new patch based on our discussion, so I'd
>> suggest waiting another hour or so anyway until he's done.
>>
>> Might be a bit longer as he's trying to do it in a hallway at
>> PGConf.US...
>
> Thanks Stephen.
>
> Here's an updated patch, and description of the changes. Simon,
> Stephen and Robert have looked at the description and are all happy
> with it \o/. Thank you to them for taking the time out of the
> conference to go through it with me.

Moving to commit this over the next hour. Last chance...

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Variable substitution in psql backtick expansion

2017-03-30 Thread Tom Lane
Awhile back in the discussion about the \if feature for psql,
I'd pointed out that you shouldn't really need very much in
the way of boolean-expression evaluation smarts, because you
ought to be able to use a backtick shell escape:

\if `expr :foo \> :bar`
\echo :foo is greater than :bar
\endif

Now that the basic feature is in, I went to play around with this,
and was disappointed to find out that it doesn't work.  The reason
is not far to seek: we do not do variable substitution within the
text between backticks.  psqlscanslash.l already foresaw that some
day we'd want to do that:

/*
 * backticked text: copy everything until next backquote, then evaluate.
 *
 * XXX Possible future behavioral change: substitute for :VARIABLE?
 */

I think today is that day, because it's going to make a material
difference to the usability of this feature.

I propose extending backtick processing so that

1. :VARIABLE is replaced by the contents of the variable

2. :'VARIABLE' is replaced by the contents of the variable,
single-quoted according to Unix shell conventions.  (So the
processing would be a bit different from what it is for the
same notation in SQL contexts.)

This doesn't look like it would take very much new code to do.
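
To make point 2 concrete, the shell-style quoting rule amounts to wrapping the
value in single quotes and splicing any embedded quote as '\''.  A minimal
sketch (the helper name is hypothetical, not existing psql code):

/* append txt to buf, single-quoted per Unix shell conventions */
static void
appendShellQuoted(PQExpBuffer buf, const char *txt)
{
	const char *p;

	appendPQExpBufferChar(buf, '\'');
	for (p = txt; *p; p++)
	{
		if (*p == '\'')
			appendPQExpBufferStr(buf, "'\\''");	/* close, escaped quote, reopen */
		else
			appendPQExpBufferChar(buf, *p);
	}
	appendPQExpBufferChar(buf, '\'');
}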

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-30 Thread Arthur Zakirov

On 29.03.2017 20:14, Arthur Zakirov wrote:


I wanted to implement subscripting for ltree or hstore extensions.
Subscripting for ltree looks more interesting. Especially with slicing.
But I haven't done it yet. I hope that I will implement it tomorrow.



ltree
-----

I've implemented fetching ltree elements using subscripting, but haven't 
implemented assigning ltree elements yet. I'll send a second patch after 
implementing assignment.


Now you can execute the following query:

SELECT ('Top.Science.Astronomy.Astrophysics'::ltree)[1:2];
    ltree
-------------
 Top.Science
(1 row)

Comments
--------

But I've noticed about some points.

In array_subscript_parse(), uninitialized values of the "typesource" 
and "typeneeded" variables are passed to coerce_to_target_type().



+   typesource = exprType(assignExpr);
+   typesource = is_slice ? sbsref->refcontainertype : sbsref->refelemtype;


Here is the bug: the second variable should be "typeneeded". Moreover, these 
assignments should be moved up before the first coerce_to_target_type() call.



+   foreach(l, sbsref->reflowerindexpr)
+   {
+   List *expr_ai = (List *) lfirst(l);
+   A_Indices *ai = (A_Indices *) lfirst(list_tail(expr_ai));
+
+   subexpr = (Node *) lfirst(list_head(expr_ai));


This code looks like magic. This happens because an A_Indices node is 
appended to lowerIndexpr:



-   subexpr = NULL;
+   lowerIndexpr = lappend(lowerIndexpr, list_make2(subexpr, ai));
}


And this A_Indices is used only when slicing is not used, in order to make a 
constant 1. Maybe there is another way?


Also, it would be better if "refevalfunc" and "refnestedfunc" held pointers 
to functions rather than values of Oid type. Currently you need to create 
"..._fetch" and "..._assign" functions in the catalog, and in the "..._parse" 
function you need to get their Oid using the to_regproc() function.


Can we use the IndexAmRoutine-style approach, where only pointers to the 
necessary functions are used? You can see an example in the blhandler() 
function in blutils.c.
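
A minimal sketch of that idea (every name below is hypothetical, just
mirroring the amhandler pattern):

/* returned by a type's subscripting handler, as blhandler() returns IndexAmRoutine */
typedef struct SubscriptRoutine
{
	/* evaluate a fetch: container[indexes] */
	Datum		(*fetch) (Datum container, Datum *upper, Datum *lower,
						  bool *isnull);
	/* evaluate an assignment: container[indexes] := newval */
	Datum		(*assign) (Datum container, Datum newval, Datum *upper,
						   Datum *lower, bool *isnull);
} SubscriptRoutine;

The parser would then call the handler once and keep the function pointers,
instead of looking up "..._fetch"/"..._assign" by name with to_regproc().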


The last point is about the tutorial. As Tom pointed out, it is not useful 
if the tutorial doesn't execute. That happens because there is no "custom" 
type in subscripting.sql. It also contradicts the README of the tutorials.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-30 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 04:04:58PM -0400, Bruce Momjian wrote:
> On Tue, Mar 21, 2017 at 04:56:16PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > On Tue, Mar 21, 2017 at 04:43:58PM -0300, Alvaro Herrera wrote:
> > > > Bruce Momjian wrote:
> > > > 
> > > > > I don't think it makes sense to try and save bits and add complexity
> > > > > when we have no idea if we will ever use them,
> > > > 
> > > > If we find ourselves in dire need of additional bits, there is a known
> > > > mechanism to get back 2 bits from old-style VACUUM FULL.  I assume that
> > > > the reason nobody has bothered to write the code for that is that
> > > > there's no *that* much interest.
> > > 
> > > We have no way of tracking if users still have pages that used the bits
> > > via pg_upgrade before they were removed.
> > 
> > Yes, that's exactly the code that needs to be written.
> 
> Yes, but once it is written it will take years before those bits can be
> used on most installations.

Actually, the 2 bits from old-style VACUUM FULL could be reused if one of
the WARM bits is set when they are checked.  The WARM bits will all be zero
on pre-9.0.  The check would have to test the old-style VACUUM FULL bit and
also check that a WARM bit is set.
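
A sketch of such a check (HEAP_MOVED is the existing mask covering the two
old-style VACUUM FULL bits; HEAP_WARM_TUPLE is only a placeholder for whatever
infomask2 bit the WARM patch ends up defining):

/*
 * Pages written before 9.0 can have a MOVED bit set but never a WARM bit,
 * so a MOVED bit may only be given a new meaning when a WARM bit is set too.
 */
if ((htup->t_infomask & HEAP_MOVED) != 0 &&
	(htup->t_infomask2 & HEAP_WARM_TUPLE) != 0)
{
	/* safe to interpret the old VACUUM FULL bit under its new meaning */
}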

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Logical replication support for initial data copy

2017-03-30 Thread Fujii Masao
On Thu, Mar 23, 2017 at 9:59 PM, Peter Eisentraut  wrote:
> Logical replication support for initial data copy

+ case T_SQLCmd:
+ if (MyDatabaseId == InvalidOid)
+ ereport(ERROR,
+ (errmsg("not connected to database")));

This error message doesn't seem to follow the error message style in the docs.
It also seems a bit unclear to me. So what about replacing it with
something like the following?

ERROR:  must connect to database to execute command \"%s\"

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-30 Thread Daniel Verite
Vaishnavi Prabakaran wrote:

> Hmm, With batch mode, after sending COPY command to server(and server
> started processing the query and goes into COPY state) , client does not
> immediately read the result , but it keeps sending other queries to the
> server. By this time, server already encountered the error
> scenario(Receiving different message during COPY state) and sent error
> messages

IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 18:29, Simon Riggs  wrote:

> Moving to commit this over the next hour. Last chance...

Done. Great work Dave, thanks everybody.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-30 Thread Mithun Cy
Thanks Robert, I have tried to address the comments as described below.

On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas  wrote:
> I think that the macros in hash.h need some more work:
>
> - Pretty much any time you use the argument of a macro, you need to
> parenthesize it in the macro definition to avoid surprises if the
> macro is called using an expression.  That isn't done consistently
> here.

-- I have tried to fix this consistently in the latest patch.

> - The macros make extensive use of magic numbers like 1, 2, and 3.  I
> suggest something like:
>
> #define SPLITPOINT_PHASE_BITS 2
> #define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)
>
> And then use SPLITPOINT_PHASE_BITS any place where you're currently
> saying 2.  The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

-- Taken; modified the same in the latest patch.

> - Many of these macros are only used in one place.  Maybe just move
> the computation to that place and get rid of the macro.  For example,
> _hash_spareindex() could be written like this:
>
> if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
> return splitpoint_group;
>
> /* account for single-phase groups */
> splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;
>
> /* account for completed groups */
> splitpoint += (splitpoint_group -
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;
>
> /* account for phases within current group */
> splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
> SPLITPOINT_PHASE_MASK;
>
> return splitpoint;
>
> That eliminates the only use of two complicated macros and is in my
> opinion more clear than what you've currently got.

-- Taken; also rewrote _hash_get_totalbuckets along similar lines.

With that, we will end up with only 2 macros which contain some computation:
+/* defines max number of splitpoint phases a hash index can have */
+#define HASH_MAX_SPLITPOINT_GROUP 32
+#define HASH_MAX_SPLITPOINTS \
+ (((HASH_MAX_SPLITPOINT_GROUP - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) * \
+  HASH_SPLITPOINT_PHASES_PER_GRP) + \
+ HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE)
+
+/* given a splitpoint phase get its group */
+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+   HASH_SPLITPOINT_PHASE_BITS) + \
+  HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

> - Some of these macros lack clear comments explaining their purpose.

-- I have written some comments to explain the use of the macros.

> - Some of them don't include HASH anywhere in the name, which is
> essential for a header that may easily be included by non-hash index
> code.

-- Fixed, all MACROS are prefixed with HASH

> - The names don't all follow a consistent format.  Maybe that's too
> much to hope for at some level, but I think they could be more
> consistent than they are.

-- Fixed, apart from old HASH_MAX_SPLITPOINTS rest all have a prefix
HASH_SPLITPOINT.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


yet_another_expand_hashbucket_efficiently_12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Tomas Vondra

Hi,

While doing some benchmarking, I've run into a fairly strange issue with 
OOM breaking LaunchParallelWorkers() after the restart. What I see 
happening is this:


1) a query is executed, and at the end of LaunchParallelWorkers we get

nworkers=8 nworkers_launched=8

2) the query does a Hash Aggregate, but ends up eating much more memory 
due to n_distinct underestimate (see [1] from 2015 for details), and 
gets killed by OOM


3) the server restarts, the query is executed again, but this time we 
get in LaunchParallelWorkers


nworkers=8 nworkers_launched=0

There's nothing else running on the server, and there definitely should 
be free parallel workers.


4) The query gets killed again, and on the next execution we get

nworkers=8 nworkers_launched=8

again, although not always. I wonder whether the exact impact depends on 
OOM killing the leader or worker, for example.


regards


[1] 
https://www.postgresql.org/message-id/flat/CAFWGqnsxryEevA5A_CqT3dExmTaT44mBpNTy8TWVsSVDS71QMg%40mail.gmail.com


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 18:16, Andres Freund  wrote:

>>  /*
>>   * Each page of XLOG file has a header like this:
>>   */
>> -#define XLOG_PAGE_MAGIC 0xD097   /* can be used as WAL version 
>> indicator */
>> +#define XLOG_PAGE_MAGIC 0xD100   /* can be used as WAL version 
>> indicator */
>
> We normally only advance this by one, it's not tied to the poistgres version.

That was my addition. I rounded it up cos this is release 10. No biggie.

(Poistgres? Is that the Manhattan spelling?)

> I'm sorry, but to me this patch isn't ready.  I'm also doubtful that it
> makes a whole lot of sense on its own, without having finished the
> design for decoding on a standby - it seems quite likely that we'll have
> to redesign the mechanisms in here a bit for that.  For 10 this seems to
> do nothing but add overhead?

I'm sure we can reword the comments.

We've been redesigning the mechanisms for 2 years now, so it seems
unlikely that further redesign will be required. If it is required,
this patch is fairly low touch and change is possible later,
incremental development etc. As regards overhead, this adds a small
amount of time to a background process executed every 10 secs, and
generates no new WAL records.

So I don't see any reason not to commit this feature, after the minor
corrections.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Thomas Munro
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
 wrote:
> Hi,
>
> While doing some benchmarking, I've run into a fairly strange issue with OOM
> breaking LaunchParallelWorkers() after the restart. What I see happening is
> this:
>
> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>
> nworkers=8 nworkers_launched=8
>
> 2) the query does a Hash Aggregate, but ends up eating much more memory due
> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
> by OOM
>
> 3) the server restarts, the query is executed again, but this time we get in
> LaunchParallelWorkers
>
> nworkers=8 nworkers_launched=0
>
> There's nothing else running on the server, and there definitely should be
> free parallel workers.
>
> 4) The query gets killed again, and on the next execution we get
>
> nworkers=8 nworkers_launched=8
>
> again, although not always. I wonder whether the exact impact depends on OOM
> killing the leader or worker, for example.

I don't know what's going on but I think I have seen this once or
twice myself while hacking on test code that crashed.  I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
the DSM control is somehow confused?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Andres Freund
On 2017-03-30 18:26:05 +0300, Teodor Sigaev wrote:
> Any objection from reviewers to push both patches?


> diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
> index f2eda67..59029b9 100644
> --- a/contrib/bloom/blutils.c
> +++ b/contrib/bloom/blutils.c
> @@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS)
>   amroutine->amclusterable = false;
>   amroutine->ampredlocks = false;
>   amroutine->amcanparallel = false;
> + amroutine->amcaninclude = false;

That name doesn't strike me as very descriptive.


> +  INCLUDE
> +  
> +   
> +An optional INCLUDE clause allows a list of columns to be
> +specified which will be included in the non-key portion of the index.
> +Columns which are part of this clause cannot also exist in the
> +key columns portion of the index, and vice versa. The
> +INCLUDE columns exist solely to allow more queries to 
> benefit
> +from index-only scans by including certain columns in 
> the
> +index, the value of which would otherwise have to be obtained by 
> reading
> +the table's heap. Having these columns in the INCLUDE 
> clause
> +in some cases allows PostgreSQL to skip the heap read
> +completely. This also allows UNIQUE indexes to be 
> defined on
> +one set of columns, which can include another set of columns in the
> +   INCLUDE clause, on which the uniqueness is not enforced.
> +It's the same with other constraints (PRIMARY KEY and EXCLUDE). This 
> can
> +also can be used for non-unique indexes as any columns which are not 
> required
> +for the searching or ordering of records can be used in the
> +INCLUDE clause, which can slightly reduce the size of 
> the index.
> +Currently, only the B-tree access method supports this feature.
> +Expressions as included columns are not supported since they cannot 
> be used
> +in index-only scans.
> +   
> +  
> + 

This could use some polishing.


> +/*
> + * Reform index tuple. Truncate nonkey (INCLUDE) attributes.
> + */
> +IndexTuple
> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
> +{
> + TupleDesc   itupdesc = RelationGetDescr(idxrel);
> + Datum   values[INDEX_MAX_KEYS];
> + boolisnull[INDEX_MAX_KEYS];
> + IndexTuple  newitup;
> + int indnatts = IndexRelationGetNumberOfAttributes(idxrel);
> + int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
> +
> + Assert(indnatts <= INDEX_MAX_KEYS);
> + Assert(indnkeyatts > 0);
> + Assert(indnkeyatts < indnatts);
> +
> + index_deform_tuple(olditup, itupdesc, values, isnull);
> +
> + /* form new tuple that will contain only key attributes */
> + itupdesc->natts = indnkeyatts;
> + newitup = index_form_tuple(itupdesc, values, isnull);
> + newitup->t_tid = olditup->t_tid;
> +
> + itupdesc->natts = indnatts;

Uh, isn't this a *seriously* bad idea?  If index_form_tuple errors out,
this'll corrupt the tuple descriptor.
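
One way to avoid that hazard (just a sketch, not necessarily the right fix)
would be to mutate a private copy of the descriptor instead of the relation's:

	/* build the key-only tuple against a throwaway copy of the descriptor */
	TupleDesc	keydesc = CreateTupleDescCopy(itupdesc);

	keydesc->natts = indnkeyatts;
	newitup = index_form_tuple(keydesc, values, isnull);
	newitup->t_tid = olditup->t_tid;
	FreeTupleDesc(keydesc);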


Maybe also rename the function to index_build_key_tuple()?

>   * Construct a string describing the contents of an index entry, in the
>   * form "(key_name, ...)=(key_value, ...)".  This is currently used
> - * for building unique-constraint and exclusion-constraint error messages.
> + * for building unique-constraint and exclusion-constraint error messages,
> + * so only key columns of index are checked and printed.

s/index/the index/


> @@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation,
>   {
>   int j;
>  
> - for (j = 0; j < irel->rd_index->indnatts; j++)
> + for (j = 0; j < 
> IndexRelationGetNumberOfAttributes(irel); j++)

>   {
>   if (key[i].sk_attno == 
> irel->rd_index->indkey.values[j])
>   {
> @@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation,
>   break;
>   }
>   }
> - if (j == irel->rd_index->indnatts)
> + if (j == IndexRelationGetNumberOfAttributes(irel))
>   elog(ERROR, "column is not in index");
>   }

Not that it matters overly much, but why are we doing this for all
attributes, rather than just key attributes?


> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -600,7 +600,7 @@ boot_openrel(char *relname)
>relname, (int) ATTRIBUTE_FIXED_PART_SIZE);
>  
>   boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock);
> - numattr = boot_reldesc->rd_rel->relnatts;
> + numattr = RelationGetNumberOfAttributes(boot_reldesc);
>   for (i = 0; i < numattr; i++)
>   {
>   if (attrtypes[i] == NULL)

That seems a bit unrelated.


> @@ -2086

Re: [HACKERS] [COMMITTERS] pgsql: Default monitoring roles

2017-03-30 Thread Tom Lane
Simon Riggs  writes:
> On 30 March 2017 at 19:31, Erik Rijkers  wrote:
>> The buildfarm is showing red (the same errors that I get...):
>> pgrowlocks.c: In function ‘pgrowlocks’:
>> pgrowlocks.c:105:65: error: expected ‘)’ before ‘;’ token
>> is_member_of_role(GetUserId(), DEFAULT_ROLE_STAT_SCAN_TABLES);

> Weird. make check-world just skipped that directory. I guess for Dave also.

If you just did check-world, it probably didn't build contrib modules that
don't have tests, because the "check" target wouldn't do anything without
tests to run.

AFAICS, there isn't any top-level make target that will build all of
contrib except "make world", which also builds the docs and therefore
isn't very fast.  I wonder if we should create some more convenient
testing target that builds all code but not the docs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CORRESPONDING clause design

2017-03-30 Thread Pavel Stehule
Hi

2017-03-30 13:11 GMT+02:00 Surafel Temesgen :

> hi
>
> Thank you very much for your help .
> here is the patch fix that issue as you suggest
>

The crash is fixed

I did a rebase + a few more regress tests.

Is the following use case defined in the standard?

postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
   UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a,
0 AS x6, -1 AS x6
   UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
┌───┐
│ a │
╞═══╡
│ 1 │
│ 3 │
│ 6 │
└───┘
(3 rows)

It depends on the order of evaluation:

if we do (T1 U T2) U T3 ---> then the result is correct,
but if we do T1 U (T2 U T3) ---> then it should fail

I am not sure if this result is expected (correct). I would rather expect a
syntax error, because CORRESPONDING BY is not filled in.


>
> Regards
>
> Surafel
>
>
> On Tue, Mar 28, 2017 at 5:44 PM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2017-03-28 14:18 GMT+02:00 Pavel Stehule :
>>
>>>
>>>
>>> 2017-03-28 13:58 GMT+02:00 Surafel Temesgen :
>>>
 can you help with fixing it Pavel?

>>>
>>> There must be some new preanalyze stage - you have to know result
>>> columns before you are starting a analyze
>>>
>>
>> maybe some recheck after analyze stage to remove invalid columns can be
>> good enough.
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Regards
>>>
>>> Pavel
>>>

 On Mon, Mar 27, 2017 at 11:48 AM, Pavel Stehule <
 pavel.steh...@gmail.com> wrote:

> Hi
>
> fresh update - I enhanced Value node by location field as Tom proposal.
>
> Few more regress tests.
>
> But I found significant issue, that needs bigger fix - Surafel,
> please, can you fix it.
>
> It crash on
>
> SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
> UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS
> x6
> UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;
>
> I'll mark this patch as waiting on author
>
> Regards
>
> Pavel
>
>
>

>>>
>>
>
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 30792f45f1..2d60718ff1 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1601,6 +1601,9 @@ SELECT DISTINCT ON (expression , EXCEPT
   
   
+   CORRESPONDING
+  
+  
set union
   
   
@@ -1617,9 +1620,9 @@ SELECT DISTINCT ON (expression , 
-query1 UNION ALL query2
-query1 INTERSECT ALL query2
-query1 EXCEPT ALL query2
+query1 UNION ALL CORRESPONDING BY query2
+query1 INTERSECT ALL CORRESPONDING BY query2
+query1 EXCEPT ALL CORRESPONDING BY query2
 
query1 and
query2 are queries that can use any of
@@ -1659,14 +1662,31 @@ SELECT DISTINCT ON (expression , 
 
   
-   In order to calculate the union, intersection, or difference of two
-   queries, the two queries must be union compatible,
-   which means that they return the same number of columns and
-   the corresponding columns have compatible data types, as
-   described in .
+   EXCEPT returns all rows that are in the result of
+   query1 but not in the result of
+   query2.  (This is sometimes called the
+   difference between two queries.)  Again, duplicates
+   are eliminated unless EXCEPT ALL is used.
   
- 
 
+  
+   CORRESPONDING returns all columns that are in both 
+   query1 and query2 with the same name.
+  
+
+  
+   CORRESPONDING BY returns all columns in the column list 
+   that are also in both query1 and 
+   query2 with the same name. The names in column list
+   must be unique.
+  
+
+  
+   The names of columns in result when CORRESPONDING or
+   CORRESPONDING BY clause is used must be unique in
+   query1 and query2.
+  
+ 
 
  
   Sorting Rows
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index 57396d7c24..f98c22e696 100644
--- a/doc/src/sgml/sql.sgml
+++ b/doc/src/sgml/sql.sgml
@@ -859,7 +859,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressioncondition ]
 [ GROUP BY expression [, ...] ]
 [ HAVING condition [, ...] ]
-[ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
+[ { UNION | INTERSECT | EXCEPT } [ ALL ] [ CORRESPONDING [ BY ( expression ) ] ] select ]
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start ]
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1c88d601bd..11e0590eec 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2995,6 +2995,7 @@ _copySelectStmt(const SelectStmt *from)
 	COPY_NODE_FIELD(withClause);
 	COPY_SCALAR_FIELD(op);
 	COPY_SCALAR_FIELD(all);
+	COPY_NODE_FIELD(correspondingClause);
 	COPY_NODE_FIELD(larg);
 	COPY_NODE_FIELD(rarg);
 
@@ -3010,6 +3011,8 @@ _copySetOperationStmt(const SetOperationStmt *from)
 	COPY_SCALAR_FIELD(all);
 	COPY_NODE_FIELD(larg);
 	COPY_NODE_FIELD(rarg);
+	COPY_NODE_FIELD(correspondingColumns);
+	COPY_SCALAR_FIELD(hasCorrespondingBy);
 	COPY_NODE_FIELD(colTypes);
 	COPY_NODE_FIELD(co

Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Aleksander Alekseev
Hi Robert,

> Has anybody done some testing of this patch with the WAL consistency
> checker?  Like, create some tables with indexes that have INCLUDE
> columns, set up a standby, enable consistency checking, pound the
> master, and see if the standby bails?

I've decided to run such a test. It looks like there is a bug indeed.

Steps to reproduce:

0. Apply the patch.
1. Build PostgreSQL using quick-build.sh [1]
2. Install the master and the replica using install.sh [2]
3. Download test.sql [3]
4. Run: `cat test.sql | psql`
5. In the replica's logfile:

```
FATAL:  inconsistent page found, rel 1663/16384/16396, forknum 0, blkno 1
```

> Has anybody tested this patch with amcheck?  Does it break amcheck?

Amcheck doesn't complain.

[1] https://github.com/afiskon/pgscripts/blob/master/quick-build.sh
[2] https://github.com/afiskon/pgscripts/blob/master/install.sh
[3] http://afiskon.ru/s/88/93c544e6cf_test.sql

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Monitoring roles patch

2017-03-30 Thread Dave Page
On Thu, Mar 30, 2017 at 2:24 PM, Simon Riggs  wrote:
> On 30 March 2017 at 18:29, Simon Riggs  wrote:
>
>> Moving to commit this over the next hour. Last chance...
>
> Done. Great work Dave, thanks everybody.

Thanks Simon.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CORRESPONDING clause design

2017-03-30 Thread Tom Lane
Pavel Stehule  writes:
> Is following use case defined in standard?

> postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
>UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS a,
> 0 AS x6, -1 AS x6
>UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
> ┌───┐
> │ a │
> ╞═══╡
> │ 1 │
> │ 3 │
> │ 6 │
> └───┘
> (3 rows)

> It depends on order of implementation

> if we do (T1 U T2) U T3 ---> then result is correct,
> but if we do T1 U (T2 U T3) ---> than it should to fail

UNION ALL should associate left-to-right, just like most other binary
operators, so this looks fine to me.  Did you check that you get an
error if you put in parens to force the other order?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-30 Thread Pavel Stehule
2017-03-29 20:11 GMT+02:00 Jan Michálek :

>
>
> 2017-03-27 19:41 GMT+02:00 Jan Michálek :
>
>>
>>
>> 2017-03-23 17:26 GMT+01:00 Pierre Ducroquet :
>>
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:   tested, passed
>>> Spec compliant:   tested, passed
>>> Documentation:tested, passed
>>>
>>> Hi
>>>
>>> This is my first review (Magnus said in his presentation in PGDay Paris
>>> that volunteers should just come and help, so here I am), so please notify
>>> me for any mistake I do when using the review tools...
>>>
>>> The feature seems to work as expected, but I don't claim to be a
>>> markdown and rst expert.
>>> Some minor issues with the code itself :
>>> - some indentation issues (documentation and code itself with mix
>>> between space based and tab based indentation) and a few trailing spaces in
>>> code
>>>
>>
>> corrected
>>
>>
>>> - typographic issues in the documentation :
>>>   - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown
>>> and rst formats" ==> duplicated and
>>>
>>
>> corrected
>>
>>>   - "Sets the output format to one of unaligned, aligned, wrapped, html,
>>> asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or
>>> troff-ms." ==> extra comma at the end of the list
>>> - the comment " dont add line after last row, because line is added
>>> after every row" is misleading, it should warn that it's only for rst
>>> - there is a block of commented out code left
>>> - in the print_aligned_vertical function, there is a mix between
>>> "cont->opt->format == PRINT_RST" and "format == &pg_rst" and I don't see
>>> any obvious reason for that
>>>
>> corrected
>>
>>> - the documentation doesn't mention (but ok, it's kind of obvious) that
>>> the linestyle option will not work with rst and markdown
>>>
>>>
>> In this patch the indentation issues are corrected (I hope; I had the
>> correct changes in vimrc). Please look at whether it is OK (I mean the
>> indentation) and at the other minor errors. And it should work on current
>> master (probably).
>>
>
> Added the \x option for markdown
> Multiline cells now work in markdown (the newline is replaced)
> regression tests passed
>

\pset format rst
\x
select 10
crashes with a segfault:

Program received signal SIGSEGV, Segmentation fault.
0x7f77673a866c in vfprintf () from /lib64/libc.so.6
(gdb) bt
#0  0x7f77673a866c in vfprintf () from /lib64/libc.so.6
#1  0x7f77673b1574 in fprintf () from /lib64/libc.so.6
#2  0x00437bc5 in print_aligned_vertical (cont=0x7fffade43da0,
fout=,
is_pager=) at print.c:1755
#3  0x0043a70d in printTable (cont=cont@entry=0x7fffade43da0,
fout=,
fout@entry=0x7f77677255e0 <_IO_2_1_stdout_>, is_pager=,
is_pager@entry=0 '\000',
flog=flog@entry=0x0) at print.c:3466
#4  0x0043c37f in printQuery (result=result@entry=0x9c4b60,
opt=opt@entry=0x7fffade43f00,
fout=0x7f77677255e0 <_IO_2_1_stdout_>, is_pager=is_pager@entry=0
'\000', flog=0x0) at print.c:3551
#5  0x0040da6d in PrintQueryTuples (results=0x9c4b60) at
common.c:808
#6  PrintQueryResults (results=0x9c4b60) at common.c:1140
#7  SendQuery (query=0x9c1700 "select 10;") at common.c:1317
#8  0x0041c3d4 in MainLoop (source=0x7f77677248a0 <_IO_2_1_stdin_>)
at mainloop.c:319
#9  0x00405d5d in main (argc=, argv=)
at startup.c:396
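
For what it's worth, a crash inside vfprintf() like this usually means a
NULL format string reached fprintf(). Below is a minimal standalone sketch
of that failure mode and of a defensive guard; it is only a guess at the
cause, assuming the patched print_aligned_vertical() hands a
format-specific string (for example a rule or separator that the new rst
format never sets) straight to fprintf(). The helper name
print_rule_or_empty is made up for illustration and is not part of the
patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): fall back to an empty string
 * instead of passing a NULL pointer to fprintf(), which is what the
 * backtrace above suggests happened inside print_aligned_vertical().
 */
static void
print_rule_or_empty(FILE *fout, const char *rule)
{
	fprintf(fout, "%s", rule ? rule : "");
}

int
main(void)
{
	/* e.g. a line element that the new rst format never defines */
	const char *unset_rule = NULL;

	print_rule_or_empty(stdout, unset_rule);	/* safe: prints nothing */
	/* fprintf(stdout, unset_rule); would crash in vfprintf, as in the report */
	return 0;
}
```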

Regards

Pavel


>
>
> Jan
>
>
>
>>
>> Have nice day
>>
>> Jan
>>
>>
>>> Thanks !
>>>
>>> The new status of this patch is: Waiting on Author
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>>
>>
>>
>> --
>> Jelen
>> Senior farmhand of the data barn
>>
>
>
>
> --
> Jelen
> Senior farmhand of the data barn
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] New CORRESPONDING clause design

2017-03-30 Thread Pavel Stehule
2017-03-30 21:43 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > Is following use case defined in standard?
>
> > postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
> >UNION ALL CORRESPONDING BY(a,b) SELECT 4 AS b, 0 AS x4, 3 AS
> a,
> > 0 AS x6, -1 AS x6
> >UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa;
> > ┌───┐
> > │ a │
> > ╞═══╡
> > │ 1 │
> > │ 3 │
> > │ 6 │
> > └───┘
> > (3 rows)
>
> > It depends on order of implementation
>
> > if we do (T1 U T2) U T3 ---> then result is correct,
> > but if we do T1 U (T2 U T3) ---> than it should to fail
>
> UNION ALL should associate left-to-right, just like most other binary
> operators, so this looks fine to me.  Did you check that you get an
> error if you put in parens to force the other order?
>

yes - it fails

 postgres=# SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
UNION ALL CORRESPONDING BY(a,b) (SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6,
-1 AS x6 UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS a, -100 AS aa);
ERROR:  column name "b" can not be used in CORRESPONDING BY list
LINE 1: ...b, 0 AS x3, -1 AS x3 UNION ALL CORRESPONDING BY(a,b) (SELECT...
 ^
HINT:  UNION queries with a CORRESPONDING BY clause must contain column
names from both tables.
Time: 1,135 ms

Regards

Pavel

>
> regards, tom lane
>


Re: [HACKERS] [COMMITTERS] pgsql: Default monitoring roles

2017-03-30 Thread Tom Lane
I wrote:
> Simon Riggs  writes:
>> Weird. make check-world just skipped that directory. I guess for Dave also.

> If you just did check-world, it probably didn't build contrib modules that
> don't have tests, because the "check" target wouldn't do anything without
> tests to run.

I wondered why this is --- wouldn't "check" depend on "all", so that
things get built even if there are no tests to run?

The answer is, no it doesn't.  "check" depends on "temp-install", and
then temp-install indirectly depends on "all".  So when you say
"make check-world", the reason that any code gets built at all is that
we're trying to install all the executables into the temporary
installation.  However, the basic temp-install target only installs
(and therefore only forces building of) the core code.  In a contrib
subdirectory, what is supposed to happen is that this line in
pgxs.mk causes that contrib subdirectory to also get built/installed:

temp-install: EXTRA_INSTALL+=$(subdir)

But if you look around a bit further, you discover that that line is
inside an "ifdef REGRESS" block --- therefore, unless the module
defines at least one REGRESS-style test, it's not built by "make check".

I thought that the attached patch would be a narrow fix for this,
just moving that dependency out of the "ifdef REGRESS" block.
However, it seems to send "make" into some infinite loop, at
least with make 3.81 which I'm using here.  Not sure why.

I also experimented with just changing "check: temp-install"
in Makefile.global to be "check: all temp-install", and that
seemed to work, but since I don't understand why it wasn't
like that already, I'm a bit afraid of that solution.

regards, tom lane

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index c27004e..c156806 100644
*** a/src/makefiles/pgxs.mk
--- b/src/makefiles/pgxs.mk
*** check:
*** 282,291 
  else
  check: submake $(REGRESS_PREP)
  	$(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
  
  temp-install: EXTRA_INSTALL+=$(subdir)
  endif
- endif # REGRESS
  
  
  # STANDARD RULES
--- 282,293 
  else
  check: submake $(REGRESS_PREP)
  	$(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
+ endif
+ endif # REGRESS
  
+ ifndef PGXS
  temp-install: EXTRA_INSTALL+=$(subdir)
  endif
  
  
  # STANDARD RULES

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-03-30 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I'd be fine with removing --verbose globally, as your patch does, but
> > there was some argument that we then would have long 'quiet' periods.
> > I haven't had a chance to go test if that's really the case yet though.
> 
> I've been running it like this lately:
> 
> make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount'
> 
> and it is still pretty noisy ;-).  The only place where it sits for
> more than a couple of seconds without printing anything is at the very
> start, which I believe to be the initial "make temp-install" step,
> which would be unaffected by prove verbosity anyway.
> 
> So I'd be +1 for just removing --verbose by default.  Anybody who really
> wants it can put it back via PROVE_FLAGS.

Yeah, I'm feeling the same way on this, and it'd reduce the size of the
logs on the buildfarm too, which I believe is a good thing.

Unless people wish to object, I'll use Michael's patch to remove
--verbose from the top level tomorrow.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro
 wrote:
> On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
>  wrote:
>> Hi,
>>
>> While doing some benchmarking, I've run into a fairly strange issue with OOM
>> breaking LaunchParallelWorkers() after the restart. What I see happening is
>> this:
>>
>> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> 2) the query does a Hash Aggregate, but ends up eating much more memory due
>> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
>> by OOM
>>
>> 3) the server restarts, the query is executed again, but this time we get in
>> LaunchParallelWorkers
>>
>> nworkers=8 nworkers_launched=0
>>
>> There's nothing else running on the server, and there definitely should be
>> free parallel workers.
>>
>> 4) The query gets killed again, and on the next execution we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> again, although not always. I wonder whether the exact impact depends on OOM
>> killing the leader or worker, for example.
>
> I don't know what's going on but I think I have seen this once or
> twice myself while hacking on test code that crashed.  I wonder if the
> DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
> the DSM control is somehow confused?
>
I think I've run into the same problem while working on parallelizing
plans containing InitPlans. You can reproduce that scenario with the
following steps:

1. Put an Assert(0) in ParallelQueryMain(), start the server, and execute
any parallel query.
 In LaunchParallelWorkers you can see
   nworkers = n, nworkers_launched = n (n > 0)
but all the workers will crash because of the assert statement.
2. The server restarts automatically and re-initializes
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in shared memory.
After that, it calls ForgetBackgroundWorker, which increments
parallel_terminate_count. In LaunchParallelWorkers we then hit the
following condition:
if ((BackgroundWorkerData->parallel_register_count -
     BackgroundWorkerData->parallel_terminate_count) >=
    max_parallel_workers)
DO NOT launch any parallel worker.
Hence nworkers = n, nworkers_launched = 0. A standalone sketch of this
accounting problem follows below.
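
Here is a minimal standalone sketch of that accounting problem (not the
actual PostgreSQL code; the real counters live in BackgroundWorkerData in
bgworker.c, and the constant and variable names below are made up for
illustration). It assumes the counters are unsigned, as the shared-memory
fields are: the register count is reset to zero after the crash restart
while the terminate count is bumped for each dead worker, so the
difference wraps around and the >= max_parallel_workers test always
refuses to launch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PARALLEL_WORKERS 8

int
main(void)
{
	/* Shared-memory counters, re-initialized to zero after the crash restart. */
	uint32_t parallel_register_count = 0;
	uint32_t parallel_terminate_count = 0;

	/*
	 * ForgetBackgroundWorker() then runs for each worker that died in the
	 * crash and bumps only the terminate count.
	 */
	for (int i = 0; i < 8; i++)
		parallel_terminate_count++;

	/*
	 * The launch-time check quoted above: with unsigned arithmetic,
	 * 0 - 8 wraps around to a huge value, so the condition is always true
	 * and no parallel worker is ever launched.
	 */
	if (parallel_register_count - parallel_terminate_count >=
		MAX_PARALLEL_WORKERS)
		printf("refusing to launch: %" PRIu32 " workers assumed active\n",
			   parallel_register_count - parallel_terminate_count);
	else
		printf("workers may be launched\n");

	return 0;
}
```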

At the time I thought the parallel workers were crashing because of my own
mistake, and that this was supposed to happen. Sorry for that.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

