Re: csv format for psql

2018-11-11 Thread Daniel Verite
Michael Paquier wrote:

> - The experience is confusing, as the psql format uses different options
> than the backend to do the same things:
> -- tuples_only instead of HEADER.
> -- fieldsep_csv instead of DELIMITER
> -- null is an equivalent of the one with the same name, which is
> actually consistent.
> -- encoding is also an equivalent of ENCODING.

That conveniently ignores the fact that "\pset format somename"
as a command doesn't take any additional options, contrary to COPY.
We can't do \pset format csv (delimiter=';')
If we chose "delimiter" as an option name, it would have to exist
among 20 other names in the \pset namespace, where it would be
too vague, whereas "fieldsep_csv" makes it clear what it applies to.
"tuples_only" is preexisting, and I don't see how the comment that it's
not called "header" could be actionable.

Overall, you seem to posit that we should mimic the entire
COPY TO interface to implement 'psql --csv'.
But the starting point is that 'psql --csv' is just a slightly
different (safer) variant of 'psql -At', which is not concerned
at all with being consistent with COPY.
The model of the csv output format is the unaligned output format,
it's just not COPY.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re:Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-11-11 Thread chenhj
Hi


Before, we only discussed the connection hang on the primary; the connection
hang on the standby database appears to be a separate problem.
When the recovery process replays GIN's delete WAL record, it acquires the
LWLocks in a different order than the select process does, resulting in a
deadlock. According to information from gcore, the details are as follows:


## select process


   2246 (rootBlkno=2246)
  |
3254 (1. Held buffer=366260, LWLock=0x2d50e2e4)  --rightlink-->  483 (2. Acquire buffer=2201739, LWLock=0x2aaab45158a4)


The ginStepRight() function in the select process acquires the LWLocks in
left-to-right order.


## recovery process


   2246 (2. Held buffer=7034160, LWLock=0x2aaac6c081e4, rootBlkno)
  |
3254 (3. Acquire buffer=366260, LWLock=0x2d50e2e4)  --rightlink-->  483 (1. Held buffer=2201739, LWLock=0x2aaab45158a4)


But the ginRedoDeletePage() function in the recovery process acquires the
LWLocks in the order current, then parent, then left.
So I think the inconsistent lock order in ginRedoDeletePage() is the reason
for the hang on the standby.


static void
ginRedoDeletePage(XLogReaderState *record)
{
    XLogRecPtr  lsn = record->EndRecPtr;
    ginxlogDeletePage *data = (ginxlogDeletePage *) XLogRecGetData(record);
    Buffer      dbuffer;
    Buffer      pbuffer;
    Buffer      lbuffer;
    Page        page;

    if (XLogReadBufferForRedo(record, 0, &dbuffer) == BLK_NEEDS_REDO)
    {
        page = BufferGetPage(dbuffer);
        Assert(GinPageIsData(page));
        GinPageGetOpaque(page)->flags = GIN_DELETED;
        PageSetLSN(page, lsn);
        MarkBufferDirty(dbuffer);
    }

    if (XLogReadBufferForRedo(record, 1, &pbuffer) == BLK_NEEDS_REDO)
    {
        page = BufferGetPage(pbuffer);
        Assert(GinPageIsData(page));
        Assert(!GinPageIsLeaf(page));
        GinPageDeletePostingItem(page, data->parentOffset);
        PageSetLSN(page, lsn);
        MarkBufferDirty(pbuffer);
    }

    if (XLogReadBufferForRedo(record, 2, &lbuffer) == BLK_NEEDS_REDO)
    {
        page = BufferGetPage(lbuffer);
        Assert(GinPageIsData(page));
        GinPageGetOpaque(page)->rightlink = data->rightLink;
        PageSetLSN(page, lsn);
        MarkBufferDirty(lbuffer);
    }

    if (BufferIsValid(lbuffer))
        UnlockReleaseBuffer(lbuffer);
    if (BufferIsValid(pbuffer))
        UnlockReleaseBuffer(pbuffer);
    if (BufferIsValid(dbuffer))
        UnlockReleaseBuffer(dbuffer);
}




The lock acquisition order in ginRedoDeletePage() should perhaps be changed
from "dbuffer->pbuffer->lbuffer" to "lbuffer->dbuffer->pbuffer". Is this right?




## How to reproduce this issue


1. Modify ginxlog.c (add "sleep(60)") and rebuild


    if (XLogReadBufferForRedo(record, 1, &pbuffer) == BLK_NEEDS_REDO)
    {
        page = BufferGetPage(pbuffer);
        Assert(GinPageIsData(page));
        Assert(!GinPageIsLeaf(page));
        GinPageDeletePostingItem(page, data->parentOffset);
        PageSetLSN(page, lsn);
        MarkBufferDirty(pbuffer);
    }
==>
    if (XLogReadBufferForRedo(record, 1, &pbuffer) == BLK_NEEDS_REDO)
    {
        page = BufferGetPage(pbuffer);
        Assert(GinPageIsData(page));
        Assert(!GinPageIsLeaf(page));
        GinPageDeletePostingItem(page, data->parentOffset);
        PageSetLSN(page, lsn);
        MarkBufferDirty(pbuffer);
        sleep(60);              /* added for debugging */
    }




2. Run the test SQL on the primary


create table tb1(id int);
create index ON tb1 using gin(id);
insert into tb1 select 1 from generate_series(1,100)id;
delete from tb1;


3. Check that the recovery process on the standby has entered the sleep() branch


$ ps -ef|grep reco
postgres 13418 13417  0 22:23 ?        00:00:00 postgres: startup process   recovering 0001005E
postgres 13425 31505  0 22:23 pts/8    00:00:00 grep --color=auto reco
$ pstack 13418
#0  0x7f2166d39650 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x7f2166d39504 in sleep () from /lib64/libc.so.6
#2  0x0048614f in ginRedoDeletePage (record=0x127cbe8) at ginxlog.c:480
#3  gin_redo (record=0x127cbe8) at ginxlog.c:732
#4  0x004efec3 in StartupXLOG ()
#5  0x00697c51 in StartupProcessMain ()
#6  0x004fd22a in AuxiliaryProcessMain ()
#7  0x00694e49 in StartChildProcess ()
#8  0x00697665 in PostmasterMain ()
#9  0x004766e1 in main ()


4. execute select SQL


set enable_seqscan = false;
select count(*) from tb1 where id =2;


5. Check the result


The recovery process blocks on an LWLock:


$ pstack 13418
#0  0x7f216775779b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x7f216775782f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x7f21677578cb in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00685df2 in PGSemaphoreLock ()
#4  0x006edd64 in LWLockAcquire ()
#5  0x004fa66a in XLogReadBufferForRedoExtended ()
#6  0x00486161 in ginRedoDeletePage (record=0x127cbe8) at ginxlog.c:483
#7  gin_redo (record=0x127cbe8) at ginxlog.c:732
#8  0x004efec3 in StartupXLOG ()
#9  0x00697c51 in StartupProcessMain ()
#10 0x004fd22a in AuxiliaryProcessMain ()
#11 0x000

Re: zheap: a new storage format for PostgreSQL

2018-11-11 Thread Kuntal Ghosh
On Sat, Nov 10, 2018 at 8:51 PM Daniel Westermann
 wrote:
>
> >>Thanks. Initializing the variable seems like the right fix here.
>
> ... just had a warning when recompiling from the latest sources on CentOS 7:
>
> labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing 
> -fwrapv -fexcess-precision=standard -O2 -I../../../../src/include  
> -D_GNU_SOURCE -I/usr/include/libxml2   -c -o tpd.o tpd.c
> tpd.c: In function ‘TPDFreePage’:
> tpd.c:1003:15: warning: variable ‘curblkno’ set but not used 
> [-Wunused-but-set-variable]
>   BlockNumber  curblkno = InvalidBlockNumber;
>^
Thanks Daniel for testing zheap and reporting the issue. We'll push a
fix for the same.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: COPY FROM WHEN condition

2018-11-11 Thread Tomas Vondra




On 11/9/18 4:51 PM, Adam Berlin wrote:
> As a newcomer to this patch, when I read this example:
> 
> COPY table_name WHEN (some_condition)
> 
> .. I expect COPY to only be run when the condition is true, and I do
not expect the WHEN clause to filter rows. I'm curious what you think about:
> 

Hmmm. Currently we have WHEN clause in three places:

1) triggers, where it specifies arbitrary expression on rows,
determining whether the trigger should be invoked at all

2) event triggers, where it does about the same thing as for regular
triggers, but only expressions "column IN (...)" are allowed

3) CASE WHEN ... THEN  ...

I'd say 3 is rather unrelated to this discussion, and 1+2 seem to support
your objection to using WHEN for COPY, as it kind of specifies whether the
command itself should be executed.

> COPY table_name WHERE (some_condition)
> 
> Users should already be familiar with the idea that WHERE performs a filter.
> 

So, what about using FILTER here? We already use it for aggregates when
filtering rows to process.

That being said, I have no strong feelings either way. I'd be OK with
both WHEN and WHERE.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH v2] Add and report the new "session_read_only" GUC pseudo-variable.

2018-11-11 Thread Melanie Plageman
On Mon, Sep 24, 2018 at 9:28 AM Elvis Pranskevichus 
wrote:

> On Thursday, March 1, 2018 4:25:16 AM EDT Andres Freund wrote:
> > A CF entry for this patch has been created yesterday, without any
> > changes having happend since the above status update.  Given that
> > there's been no progress for multiple commitfests, and this is the
> > last CF, this doesn't seem appropriate. Therefore marked as returned
> > with feedback.
>
> Sorry for the long silence on this.  I'm attaching a rebased and
> cleaned-up patch with the earlier review issues addressed.  I've checked
> all relevant tests and verified manually.
>
>  Elvis


After taking a look at the updated patch, it appears to address the review
feedback which I provided. However, the patch didn't apply cleanly for me,
so it likely needs a quick look along with a rebase.

After looking at the patch [1] to add a "prefer-read" option to
target-session-attrs, which is also up for consideration this commitfest, I
was wondering: assuming it is desirable to have both "prefer-read" and
"read-only" modes, does it make sense for the "prefer-read" option to also
be implemented with a ParameterStatus message instead of using polling?

[1] https://commitfest.postgresql.org/20/1677/


Re:Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-11-11 Thread chjis...@163.com
There's a mistake in the last mail.  About how to reproduce the issue, the
SQL in step 4:

4. execute select SQL
    set enable_seqscan = false;
    select count(*) from tb1 where id = 2;

should be:

    set enable_seqscan = false;
    select count(*) from tb1 where id = 1;

Regards,
Chen Huajun

RE: [PROPOSAL]a new data type 'bytea' for ECPG

2018-11-11 Thread Matsumura, Ryo
> From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
> 
> I think the host variable data type that corresponds to the server-side bytea
> should be bytea.  As the following pages state or imply, it would be better
> to create standard-compliant LOB types someday, and use the keyword BLOB in
> ECPG for that type.  The server-side data types should have the names BLOB,
> CLOB and NCLOB.  Those types should handle data larger than 1 GB and have the
> locator feature defined in the SQL standard.  Maybe we should also add
> advanced LOB features like Oracle's SecureFiles LOB and SQL Server's
> FileTables.

Tsunakawa-san, thanks for your advice.
I understand that C type definition of client-side bytea is not constrained by 
the standard BLOB.

What should I do next?
For now, I attach a patch with the noise removed (pgindent/typedef.list).

P.S.
The patch does not support ECPG.bytea in the sqltype of "struct
sqlvar_struct", for compatibility reasons.

Regards
Ryo Matsumura


ecpg_bytea_v1_1.patch
Description: ecpg_bytea_v1_1.patch


Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-11 Thread Amit Langote
On 2018/11/10 7:33, Tom Lane wrote:
> Alvaro Herrera  writes:
>> On 2018-Nov-09, Jürgen Strobel wrote:
>>> Regarding your example, what I expected is that *both* inserts would
>>> consistently result in a tuple of (1, 42) since p should route the
>>> insert to p1 and use p1's defaults. The current inconsistent behavior is
>>> the heart of the bug.
> 
>> I think that would be sensible behavior, as long as the partition
>> doesn't override values for the partitioning column -- i.e. if the
>> default values don't re-route the tuple to another partition, I think we
>> should use the partition's default rather than the parent.  This says we
>> should expand defaults after routing.
> 
> I'd argue not, actually.  I think there is plausible precedent in
> updatable views, where what we use is the defaults associated with the
> view, not the underlying table.  Correspondingly, what ought to govern
> in a partitioned insert is the defaults associated with the table actually
> named in the insert command, never mind what its partitions might say.
> That is useful for many situations, and it avoids all the logical
> inconsistencies you get into if you find that the defaults associated
> with some partition would force re-routing elsewhere.
> 
> In any case, you can't make this happen without basically blowing up
> default processing altogether.  Defaults are currently expanded by the
> planner, and pretty early too.  To make it work like the OP wishes,
> we'd have to insert them sometime late in the executor.

Considering multi-level partitioning, that'd mean the tuple being routed
would need to be filled with the defaults of every table on the way to a
leaf partition, including that of the leaf partition where the tuple
finally ends up.  If we re-route upon the final leaf partition's defaults
leading to partition constraint violation, it's possible that someone
might end up setting up an infinite re-routing loop with that behavior.

create table p (a int) partition by list (a);
create table p1 partition of p (a default 2) for values in (1);
create table p2 partition of p (a default 1) for values in (2);

It won't be fun for the server to try to prevent that kind of thing.

IOW, it might be a good idea to call the ability to set partition-level
defaults a deprecated feature?

Thanks,
Amit




Re: zheap: a new storage format for PostgreSQL

2018-11-11 Thread Amit Kapila
On Sun, Nov 11, 2018 at 11:55 PM Kuntal Ghosh
 wrote:
>
> On Sat, Nov 10, 2018 at 8:51 PM Daniel Westermann
>  wrote:
> >
> > >>Thanks. Initializing the variable seems like the right fix here.
> >
> > ... just had a warning when recompiling from the latest sources on CentOS 7:
> >
> > labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing 
> > -fwrapv -fexcess-precision=standard -O2 -I../../../../src/include  
> > -D_GNU_SOURCE -I/usr/include/libxml2   -c -o tpd.o tpd.c
> > tpd.c: In function ‘TPDFreePage’:
> > tpd.c:1003:15: warning: variable ‘curblkno’ set but not used 
> > [-Wunused-but-set-variable]
> >   BlockNumber  curblkno = InvalidBlockNumber;
> >^

This variable is used only for Asserts, so we need to use
PG_USED_FOR_ASSERTS_ONLY while declaring it.

> Thanks Daniel for testing zheap and reporting the issue. We'll push a
> fix for the same.
>

Pushed the fix now.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-11 Thread Tom Lane
Amit Langote  writes:
> On 2018/11/10 7:33, Tom Lane wrote:
>> I'd argue not, actually.  I think there is plausible precedent in
>> updatable views, where what we use is the defaults associated with the
>> view, not the underlying table.  Correspondingly, what ought to govern
>> in a partitioned insert is the defaults associated with the table actually
>> named in the insert command, never mind what its partitions might say.
>> That is useful for many situations, and it avoids all the logical
>> inconsistencies you get into if you find that the defaults associated
>> with some partition would force re-routing elsewhere.

> ...
> IOW, it might be a good idea to call the ability to set partition-level
> defaults a deprecated feature?

Not necessarily.  They'd apply when you insert directly into a particular
partition by name.

regards, tom lane



Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2018-11-11 Thread Michael Paquier
On Fri, Nov 09, 2018 at 12:42:27PM +, Andrew Gierth wrote:
> Seems simple enough - the LSN on the page is actually two 4-byte values
> with the most significant one first, regardless of platform byte order
> (though each individual word is in native order), so
> 
> my ($hi,$lo) = unpack("LL", $buf);
> 
> should suffice. ("L" is always 32 bits regardless of platform, and it
> has the platform's endianness.)

Thanks for the review, Andrew.  And I completely forgot that this is at
the beginning of the page.

> Looking only at the last page seems questionable.

I have also switched the online check so that it looks at the full range of
blocks instead of only the last one.

> Something like this should work to return the largest LSN of any page
> in the specified list of files:
> 
> # find_largest_lsn(blocksize,filenames...)
> sub find_largest_lsn

Thanks!  I am stealing that stuff, and I have added an offline check by
comparing the value of minRecoveryPoint in pg_controldata.  Again, if
you revert c186ba13 and run the tests, both online and offline failures
are showing up.

What do you think?
--
Michael
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index b9cb51b9d3..ce59401cef 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -36,6 +36,7 @@ our @EXPORT = qw(
   system_or_bail
   system_log
   run_log
+  run_command
 
   command_ok
   command_fails
@@ -203,6 +204,16 @@ sub run_log
 	return IPC::Run::run(@_);
 }
 
+sub run_command
+{
+	my ($cmd) = @_;
+	my ($stdout, $stderr);
+	my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+	chomp($stdout);
+	chomp($stderr);
+	return ($stdout, $stderr);
+}
+
 # Generate a string made of the given range of ASCII characters
 sub generate_ascii_string
 {
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index daf79a0b1f..0364a927c3 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -9,7 +9,7 @@
 #
 #-
 
-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pageinspect
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/016_min_consistency.pl b/src/test/recovery/t/016_min_consistency.pl
new file mode 100644
index 00..8909226bf5
--- /dev/null
+++ b/src/test/recovery/t/016_min_consistency.pl
@@ -0,0 +1,174 @@
+# Test for checking consistency of on-disk pages for a cluster with
+# the minimum recovery LSN, ensuring that the updates happen across
+# all processes.  In this test, the updates from the startup process
+# and the checkpointer (which triggers non-startup code paths) are
+# both checked.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Find the largest LSN in the set of pages part of the given relation
+# file.  This is used for offline checks of page consistency.  The LSN
+# is historically stored as a set of two numbers of 4 byte-length
+# located at the beginning of each page.
+sub find_largest_lsn
+{
+	my $blocksize = int(shift);
+	my $filename = shift;
+	my ($max_hi,$max_lo) = (0,0);
+	open(my $fh, "<:raw", $filename)
+	  or die "failed to open $filename: $!";
+	my ($buf,$len);
+	while ($len = read($fh, $buf, $blocksize))
+	{
+		$len == $blocksize
+		  or die "read only $len of $blocksize bytes from $filename";
+		my ($hi,$lo) = unpack("LL", $buf);
+
+		if ($hi > $max_hi or ($hi == $max_hi and $lo > $max_lo))
+		{
+			($max_hi,$max_lo) = ($hi,$lo);
+		}
+	}
+	defined($len) or die "read error on $filename: $!";
+	close($fh);
+
+	return sprintf("%X/%X", $max_hi, $max_lo);
+}
+
+# Initialize primary node
+my $primary = get_new_node('primary');
+$primary->init(allows_streaming => 1);
+
+# Set shared_buffers to a very low value to enforce discard and flush
+# of PostgreSQL buffers on standby, enforcing other processes than the
+# startup process to update the minimum recovery LSN in the control
+# file.  Autovacuum is disabled so as there is no risk of having other
+# processes than the checkpointer doing page flushes.
+$primary->append_conf("postgresql.conf", backup('bkp');
+my $standby = get_new_node('standby');
+$standby->init_from_backup($primary, 'bkp', has_streaming => 1);
+$standby->start;
+
+# Object creations for the upcoming tests:
+# - Base table whose data consistency is checked.
+# - pageinspect to look at the page-level contents.
+# - Function wrapper on top of pageinspect to scan a range of pages and
+#   get the maximum LSN present.
+$primary->safe_psql('postgres', "
+create extension pageinspect;
+-- Function wrapper on top of pageinspect which fetches the largest LSN
+-- present in the given page range.
+create or replace function max_lsn_range(relname text,
+  start_blk int,
+  end_blk int)
+returns pg_lsn as \$\$
+declare
+  max_lsn pg_lsn = '0/0'::pg_lsn;
+  cur_lsn pg_lsn;
+begin
+  for i in start_blk..end_blk

Re: csv format for psql

2018-11-11 Thread David G. Johnston
On Friday, November 9, 2018, Michael Paquier  wrote:

> On Fri, Nov 09, 2018 at 05:28:07PM +0100, Daniel Verite wrote:
> > But again COPY is concerned with importing the data that preexists,
> > even if it's weird, whereas a psql output formats are not.
>
> Hm.  I checked the contents of the patch in details which provide output
> consistent with COPY, but after looking at the global picture I am
> getting cold feet on this patch for a couple of reasons:
> - This stuff adds new code paths in the frontend mimicking what the
> backend already does for years, both doing the same thing.


From the original post:

"copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these."

Formatting is usually a client-side concern so this feature fits well
there. The fact that we’ve provided a server interface for the same doesn’t
preclude its potential desirability in the client.

> - There are already three ways to fetch data in this format with COPY,
> \copy and file_fdw, with all three using the same code paths for option
> validations (I can see the arguments at the top of the thread for which
> COPY SELECT can actually do everything you want with?).


Not always conveniently.


> - The experience is confusing, as the psql format uses different options
> than the backend to do the same things:


Yes, those who use psql need to learn its features.  I'd posit that since
this syntax is being learned anyway, transferring said knowledge to a newly
added csv format will not be confusing.  No more so than having to do
something that is usually client-side (formatting) on the server in the
first place.  That we don't fully replicate the server functionality doesn't
bother me.  This is meant to be a simple and quick ability that handles 95%
of use cases and defers to the more powerful server version for the
outliers.

Feature-wise I’m on board.  Given it’s already written I’d say it should go
in unless there are code complexity and support concerns - which given the
prevalence of other formats I have to believe there are not.

David J.


Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-11 Thread Amit Langote
On 2018/11/12 12:59, Tom Lane wrote:
> Amit Langote  writes:
>> On 2018/11/10 7:33, Tom Lane wrote:
>>> I'd argue not, actually.  I think there is plausible precedent in
>>> updatable views, where what we use is the defaults associated with the
>>> view, not the underlying table.  Correspondingly, what ought to govern
>>> in a partitioned insert is the defaults associated with the table actually
>>> named in the insert command, never mind what its partitions might say.
>>> That is useful for many situations, and it avoids all the logical
>>> inconsistencies you get into if you find that the defaults associated
>>> with some partition would force re-routing elsewhere.
> 
>> ...
>> IOW, it might be a good idea to call the ability to set partition-level
>> defaults a deprecated feature?
> 
> Not necessarily.  They'd apply when you insert directly into a particular
> partition by name.

Yes.  Maybe we should document that partition-level defaults are not honored
when inserting through the parent.

Thanks,
Amit




RE: speeding up planning with partitions

2018-11-11 Thread Imai, Yoshikazu
Hi Amit,

On Thu, Nov 8, 2018 at 8:26 PM, Amit Langote wrote:
> On 2018/11/07 10:00, Imai, Yoshikazu wrote:
> > About inheritance_make_rel_from_joinlist(), I considered how it processes
> > joins for sub-partitioned-table.
> > 
> > sub-partitioned-table image:
> > part
> >   sub1
> > leaf1
> > leaf2
> > 
> > inheritance_make_rel_from_joinlist() translates join_list and join_info_list
> > for each leafs(leaf1, leaf2 in above image). To translate those two lists 
> > for
> > leaf1, inheritance_make_rel_from_joinlist() translates lists from part to 
> > sub1
> > and nextly from sub1 to leaf1. For leaf2, 
> > inheritance_make_rel_from_joinlist() 
> > translates lists from part to sub1 and from sub1 to leaf2. There are same
> > translation from part to sub1, and I think it is better if we can eliminate 
> > it.
> > I attached 0002-delta.patch.
> > 
> > In the patch, inheritance_make_rel_from_joinlist() translates lists not 
> > only for
> > leafs but for mid-nodes, in a depth-first manner, so it can use lists of
> > mid-nodes for translating lists from mid-node to leafs, which eliminates 
> > same
> > translation.
> 
> I don't think the translation happens twice for the same leaf partitions.
> 
> Applying adjust_appendrel_attrs_*multilevel* for only leaf nodes, as
> happens with the current code, is same as first translating using
> adjust_appendrel_attrs from part to sub1 and then from sub1 to leaf1 and
> leaf2 during recursion with sub1 as the parent.

Thanks for replying.
I interpreted your thoughts about translation as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: sub1(produced at above) -> leaf2

But I wonder whether the translation for leaf2 actually reuses the result
for sub1 that was produced during the leaf1 translation.  ISTM the
translations for leaf1 and leaf2 are executed as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2


> > I think it might be better if we can apply same logic to 
> > inheritance_planner(),
> > which once implemented the same logic as below. 
> > 
> > 
> > foreach(lc, root->append_rel_list)
> > {
> > AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
> > ...
> > /*
> >  * expand_inherited_rtentry() always processes a parent before any 
> > of
> >  * that parent's children, so the parent_root for this relation 
> > should
> >  * already be available.
> >  */
> > parent_root = parent_roots[appinfo->parent_relid];
> > Assert(parent_root != NULL);
> > parent_parse = parent_root->parse;
> > ...
> > subroot->parse = (Query *)
> > adjust_appendrel_attrs(parent_root,
> >(Node *) parent_parse,
> >1, &appinfo);
> 
> Actually, inheritance_planner is also using
> adjust_appendrel_attrs_multilevel.  I'm not sure if you're looking at the
> latest (10/26) patch.

Sorry for my poor explanation.  The code above is the old code, before the
patch is applied.


Since it is difficult to explain my thoughts in words, I will show the
performance degradation case.

Partition tables are below two sets.

Set1:
[create 800 partitions directly under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF rt FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o

Set2:
[create 800 partitions under a partitioned table which is under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE sub1 PARTITION OF rt FOR VALUES FROM (0) TO (100) PARTITION BY
RANGE (b);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF sub1 FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o


Create a generic plan for an update or deletion.

[create a delete generic plan]
set plan_cache_mode = 'force_generic_plan';
prepare delete_stmt(int) as delete from rt where b = $1;
execute delete_stmt(1);


In creating generic plan, paths/plans for all partitions are created because
we don't know which plan is used before "EXECUTE" command happens.
In creating paths in inheritance_planner(),
adjust_appendrel_attrs()/adjust_appendrel_attrs_multilevel() is executed many
times and it allocates a lot of memory in total if there are many partitions.

The amount of memory used in the above tests is as follows:

without v5 patches, Set1: 242MB
without v5 patches, Set2: 247MB
with v5 patches, Set1: 420MB
with v5 patches, Set2: 820MB
# Thanks for supplying v5 patches :)

Without sub-partition(Set1), memory allocated by adjust_appendrel_attrs() with
v5 patches is 1.7x larger than without v5 patches. With sub-partition(Set2),
memory allocated by adjust_appendrel_attrs() with v5 patches is 3.3x larger
than w

Re: Tid scan improvements

2018-11-11 Thread Edmund Horner
Hi, here's the new patch(s).

Mostly the same, but trying to address your comments from earlier as
well as clean up a few other things I noticed.

Cheers,
Edmund

On Fri, 9 Nov 2018 at 15:01, Edmund Horner  wrote:
>
> On Tue, 6 Nov 2018 at 16:40, David Rowley  
> wrote:
> > I've been looking over 0001 to 0003. I ran out of steam before 0004.
>
> Hi David, thanks for another big review with lots of improvements.
>
> > I like the design of the new patch. From what I threw so far at the
> > selectivity estimation code, it seems pretty good.  I also quite like
> > the design in nodeTidscan.c for range scans.
>
>
> > I didn't quite manage to wrap my head around the code that removes
> > redundant quals from the tidquals. For example, with:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
> > QUERY PLAN
> > --
> >  Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
> >TID Cond: (ctid <= '(0,10)'::tid)
> >Filter: (a = 0)
> > (3 rows)
> >
> > and:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
> > and ctid >= '(0,0)';
> >   QUERY PLAN
> > --
> >  Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
> >TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
> >Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= 
> > '(0,0)'::tid)))
> > (3 rows)
> >
> > I understand why the 2nd query didn't remove the ctid quals from the
> > filter, and I understand why the first query could. I just didn't
> > manage to convince myself that the code behaves correctly for all
> > cases.
>
> I agree it's not obvious.
>
> 1. We extract a set of tidquals that can be directly implemented by
> the Tid scan.  This set is of the form:  "(CTID op ? AND ...) OR
> (...)" (with some limitations).
> 2. If they happened to come verbatim from the original RestrictInfos,
> then they will be found in scan_clauses, and we can remove them.
> 3. If they're not verbatim, i.e. the original RestrictInfos have
> additional criteria that the Tid scan can't use, then tidquals won't
> match anything in scan_clauses, and hence scan_clauses will be
> unchanged.
> 4. We do a bit of extra work for the common and useful case of "(CTID
> op ? AND ...)".  Since the top-level operation of the input quals is
> an AND, it will typically be split into multiple RestrictInfo items.
> We remove each part from scan_clauses.
>
> > 1. I see a few instances of:
> >
> > #define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
> > #define ItemPointerGetDatum(X) PointerGetDatum(X)
> >
> > in both tid.c and ginfuncs.c, and I see you have:
> >
> > + itemptr = (ItemPointer) DatumGetPointer(constval);
> >
> > Do you think it would be worth moving the macros out of tid.c and
> > ginfuncs.c into postgres.h and use that macro instead?
> >
> > (I see the code in this file already did this, so it might not matter
> > about this)
>
> I'm not sure about this one - - I think it's better as a separate
> patch, since we'd also change ginfuncs.c.  I have left it alone for
> now.
>
> > 2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
> > can just return MakeTidRangeQuals(found_quals); or return NIL.
>
> Yup, gone.
>
> > 3. Can you explain why this only needs to take place when list_length() == 
> > 1?
> >
> > /*
> > * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
> > * the various parts will have come from different RestrictInfos.  So
> > * remove each part separately.
> > */
> > ...
>
> I've tried to improve the comment.
>
> > 4. Accidental change?
> >
> > - tidquals);
> > + tidquals
> > + );
> >
> > 5. Shouldn't this comment get changed?
> >
> > - * NumTids       number of tids in this scan
> > + * NumRanges     number of tids in this scan
> >
> > 6. There's no longer a field named NumTids
> >
> > - * TidList       evaluated item pointers (array of size NumTids)
> > + * TidRanges     evaluated item pointers (array of size NumTids)
> >
> > 7. The following field is not documented in TidScanState:
> >
> > + bool tss_inScan;
> >
> > 8. Can you name this exprtype instead?
> >
> > + TidExprType type; /* type of op */
> >
> > "type" is used by Node types to indicate their type.
>
> Yup, yup, yup, yup, yup.
>
> > 9. It would be neater this:
> >
> > if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
> > tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
> > else if (expr->opno == TIDGreaterOperator || expr->opno == 
> > TIDGreaterEqOperator)
> > tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
> > else
> > tidopexpr->type = TIDEXPR_EQ;
> >
> > tidopexpr->exprstate = exprstate;
> >
> > tidopexpr->inclusive = expr->opno == TIDLessEqOperator || expr->opno
> > == TIDGreaterEqOperator;
> >
> > as a switch: ...
>
> Yup, I think the switch is a bit 

Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-11 Thread Haribabu Kommi
On Fri, Nov 9, 2018 at 2:12 PM Amit Kapila  wrote:

> On Thu, Nov 8, 2018 at 10:56 PM Magnus Hagander 
> wrote:
> >
> > On Thu, Nov 8, 2018 at 3:53 PM Sergei Kornilov  wrote:
> >>
> >> Hi
> >>
> >> >>  Sure, but what are we going to achieve with that number? What
> >> >>  information user is going to get by that? If it can help us to
> ensure
> >> >>  that it has reset the expected number of statements, then I can see
> >> >>  the clear usage, but without that, the return value doesn't seem to
> >> >>  have any clear purpose. So, I don't see much value in breaking
> >> >>  compatibility.
> >> >>
> >> >>  Does anyone else have an opinion on this matter?
> >> >
> >> > This was proposed by Sergei Kornilov in
> >> > https://postgr.es/m/3368121530260...@web21g.yandex.ru saying that "it
> >> > would be nice" to return it. Maybe he has an use case in mind? I don't
> >> > see one myself.
> >> No, i have no specific usecase for this. Silently remove all matching
> rows and return void is ok for me. But i still think LOG ereport is not
> useful.
> >
> >
> > I would much prefer it to be a return code rather than a forced LOG
> message. Log message spam is very much a thing, and things that are logged
> as LOG will always be there.
> >
>
> Is any such LOG message present in the latest patch?  I agree that the
> return code might be better, but there doesn't exist any such (LOG)
> thing.  I see that it can be helpful for some users if we have any
> such return code, but don't know if it doesn't already exist, why that
> should be a requirement for this patch?  Do you have any strong
> opinion about introducing return code with this patch?
>

I thought that returning the number of affected statements, along with the
change adding new parameters to the reset function, would be helpful to find
out how many statements were affected.

I can revert it back to void if I am the only one interested in that
change.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: csv format for psql

2018-11-11 Thread Pavel Stehule
po 12. 11. 2018 v 5:19 odesílatel David G. Johnston <
david.g.johns...@gmail.com> napsal:

> On Friday, November 9, 2018, Michael Paquier  wrote:
>
>> On Fri, Nov 09, 2018 at 05:28:07PM +0100, Daniel Verite wrote:
>> > But again COPY is concerned with importing the data that preexists,
>> > even if it's weird, whereas a psql output formats are not.
>>
>> Hm.  I checked the contents of the patch in details which provide output
>> consistent with COPY, but after looking at the global picture I am
>> getting cold feet on this patch for a couple of reasons:
>> - This stuff adds new code paths in the frontend mimicking what the
>> backend already does for years, both doing the same thing.
>
>
> From the original post:
>
> "copy with csv can't help for the output of meta-commands
> such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
> does work with these."
>
> Formatting is usually a client-side concern so this feature fits well
> there. The fact that we’ve provided a server interface for the same doesn’t
> preclude its potential desirability in the client.
>
> - There are already three ways to fetch data in this format with COPY,
>> \copy and file_fdw, with all three using the same code paths for option
>> validations (I can see the arguments at the top of the thread for which
>> COPY SELECT can actually do everything you want with?).
>
>
> Not always conveniently.
>
>
>> - The experience is confusing, as the psql format uses different options
>> than the backend to do the same things:
>
>
> Yes, those who use psql need to learn its features.  I’d posit that since
> this syntax is being learned anyway that transferring said knowledge to a
> newly added csv format will not be confusing.  No more so that having to do
> something that is usually client-side (formatting) on the server in the
> first place.  That we don’t fully replicate the server functionality doesn't
> bother me.  This is meant to be a simple and quick ability that handles 95%
> of use cases and defers to the more powerful server version for the
> outliers.
>

This patch is not simple, not because of its own complexity, but because of
the current state of psql and its output format support. psql is aged
software, and implementing a well-known new format in it is not simple.

The COPY statement has a different purpose, and because it is implemented
server-side, it cannot cover the client-side space.

I am afraid that hardly anything better can be designed, and it is
unfortunate that psql does not have native CSV support yet.

Regards

Pavel



> Feature-wise I’m on board.  Given it’s already written I’d say it should
> go in unless there are code complexity and support concerns - which given
> the prevalence of other formats I have to believe there are not.
>
> David J.
>
>


Re: PostgreSQL vs SQL/XML Standards

2018-11-11 Thread Markus Winand

> On 2018-11-9, at 05:07 , Pavel Stehule  wrote:
> 
> 
> 
> čt 8. 11. 2018 v 15:18 odesílatel Markus Winand  > napsal:
> 
> > On 2018-11-6, at 15:23 , Pavel Stehule  > > wrote:
> > 
> > 
> > 
> > po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule  > > napsal:
> > 
> > 
> > po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule  > > napsal:
> > Hi
> > 
> > čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera  > > napsal:
> > On 2018-Oct-25, Pavel Stehule wrote:
> > 
> > > I am thinking so I can fix some issues related to XMLTABLE. Please, send 
> > > me
> > > more examples and test cases.
> > 
> > Please see Markus Winand's patch that I referenced upthread.
> > 
> > here is a fix of some XMLTABLE mentioned issues.
> > 
> > this update allows cast boolean to numeric types from XPath expressions
> > 
> > Attached patch solves some cast issues mentioned by Chap. It solves issue 
> > reported by Markus. I didn't use Markus's code, but it was inspiration for 
> > me. I found native solution from libxml2.
> > 
> > Regards
> > 
> > Pavel
> 
> Better than my patch.
> 
> But I think the chunk in xml_xmlnodetoxmltype of my patch is still needed — 
> in one way or the other (see below).
> 
> # select * from xmltable('*' PASSING 'pre arg?>&deeppost' COLUMNS x XML PATH 
> 'node()');
> x
> -
>  prec1arg&ent1&deeppost
> (1 row)
> 
> Output is not the original XML.
> 
> I dug a little further and found another case that doesn’t looks right even 
> with my change to xml_xmlnodetoxmltype applied:
> 
> # select * from xmltable('*' PASSING 'pre arg?>&deeppost' COLUMNS x XML PATH '/');
>  x
> ---
>  pre&ent1&deeppost
> (1 row)
> 
> Oracle gives in both cases XML.
> 
> To fix that I included XML_DOCUMENT_NODE in the list of nodes that use 
> xmlNodeDump. Now I wonder if that logic should be reversed to use the 
> xmlXPathCastNodeToString branch in a few selected cases but default to the 
> branch xmlNodeDump for all other cases?
> 
> I guess those few cases might be XML_ATTRIBUTE_NODE and XML_TEXT_NODE. 
> Regression tests are happy with that approach but I don’t think that proves a 
> lot.
> 
> -markus
> 
> diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
> index 37d85f7..7c1f884 100644
> --- a/src/backend/utils/adt/xml.c
> +++ b/src/backend/utils/adt/xml.c
> @@ -3682,7 +3682,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext 
> *xmlerrcxt)
>  {
> xmltype*result;
> 
> -   if (cur->type == XML_ELEMENT_NODE)
> +   if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
> {
> xmlBufferPtr buf;
> xmlNodePtr  cur_copy;
> 
> 
> I used your patch and append regress tests. I checked the result against 
> Oracle.
> 
> Regards
> 
> Pavel

Fine from my side.

-markus



Re: Perl 5.26 and windows build system

2018-11-11 Thread Noah Misch
On Wed, Oct 17, 2018 at 05:16:15PM -0400, Andrew Dunstan wrote:
> On 10/17/2018 04:38 AM, Victor Wagner wrote:
> >BTW, have anyone experienced some success using Strawberry perl instead
> >of Active perl both for building postgres and as PL/Perl engine?
> >
> >Active State seems to abandon support for 32-bit windows and strawberry
> >perl license allows redistribution.
> 
> I have not. OTOH I'm not sure I care that much about 32bit Windows either. I
> have heard of people having considerable difficulties with Strawberry Perl.

As of commit 7e0c574, Strawberry Perl worked for me.  I would choose it over
ActivePerl if I were starting a new Windows PostgreSQL distribution.



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-11 Thread Amit Kapila
On Mon, Nov 12, 2018 at 10:55 AM Haribabu Kommi
 wrote:
>
> On Fri, Nov 9, 2018 at 2:12 PM Amit Kapila  wrote:
>>
>> On Thu, Nov 8, 2018 at 10:56 PM Magnus Hagander  wrote:
>> >
>> > On Thu, Nov 8, 2018 at 3:53 PM Sergei Kornilov  wrote:
>> >>
>> >> Hi
>> >>
>> >> >>  Sure, but what are we going to achieve with that number? What
>> >> >>  information user is going to get by that? If it can help us to ensure
>> >> >>  that it has reset the expected number of statements, then I can see
>> >> >>  the clear usage, but without that, the return value doesn't seem to
>> >> >>  have any clear purpose. So, I don't see much value in breaking
>> >> >>  compatibility.
>> >> >>
>> >> >>  Does anyone else have an opinion on this matter?
>> >> >
>> >> > This was proposed by Sergei Kornilov in
>> >> > https://postgr.es/m/3368121530260...@web21g.yandex.ru saying that "it
>> >> > would be nice" to return it. Maybe he has an use case in mind? I don't
>> >> > see one myself.
>> >> No, i have no specific usecase for this. Silently remove all matching 
>> >> rows and return void is ok for me. But i still think LOG ereport is not 
>> >> useful.
>> >
>> >
>> > I would much prefer it to be a return code rather than a forced LOG 
>> > message. Log message spam is very much a thing, and things that are logged 
>> > as LOG will always be there.
>> >
>>
>> Is any such LOG message present in the latest patch?  I agree that the
>> return code might be better, but there doesn't exist any such (LOG)
>> thing.  I see that it can be helpful for some users if we have any
>> such return code, but don't know if it doesn't already exist, why that
>> should be a requirement for this patch?  Do you have any strong
>> opinion about introducing return code with this patch?
>
>
> I thought that returning the affected number of statements with the change
> of adding new parameters to the reset function will be helpful to find out
> how many statements are affected?
>

It is not clear how the user will make use of that information.

> I can revert it back to void,
>

+1, as we don't see any good reason to break backward compatibility.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: COPY FROM WHEN condition

2018-11-11 Thread myungkyu.lim
>> COPY table_name WHERE (some_condition)
>> 
>> Users should already be familiar with the idea that WHERE performs a filter.
>> 

> So, what about using FILTER here? We already use it for aggregates when
> filtering rows to process.

> That being said, I have no strong feelings either way. I'd be OK with
> both WHEN and WHERE.

I don't think it's an important point.

In gram.y:
where_clause:
        WHERE a_expr        { $$ = $2; }
        | /*EMPTY*/         { $$ = NULL; }
        ;
This is similar to the 'opt_when_clause' in this patch.

So, I think 'WHERE' is a better form.
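For illustration, with the WHERE spelling the command might read as follows (table, file, and column names are made up, and the syntax was still under discussion at this point):

```sql
-- Hypothetical example of the proposed syntax, not a final form.
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv)
    WHERE temperature > 0;
```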

BTW, the 3rd patch worked very well in my tests.
However, some incorrect code style still exists:

Node*whenClause= NULL;
cstate->whenClause=whenClause;

Best regards,
Myungkyu, Lim




doc fix for pg_stat_activity.backend_type

2018-11-11 Thread John Naylor
Hi all,

Commit fc70a4b0df3 added backend_type to pg_stat_activity, but the
documentation omitted "logical replication launcher". Patch attached.

-John Naylor
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index add71458e2..ddf607bb17 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -807,7 +807,8 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   background worker, background writer,
   client backend, checkpointer,
   startup, walreceiver,
-  walsender and walwriter.
+  walsender, walwriter
+  and logical replication launcher.