Re: More tests with USING INDEX replident and dropped indexes
On Thu, Aug 27, 2020 at 11:28:35AM +0900, Michael Paquier wrote: > Attached is a patch for 1) and 2) grouped together, to ease review for > now. I think that we had better fix 1) separately though, so I am > going to start a new thread about that with a separate patch as the > current thread is misleading. A fix for consistency problem with indisreplident and invalid indexes has been committed as of 9511fb37. Attached is a rebased patch, where I noticed two incorrect things: - The comment of REPLICA_IDENTITY_INDEX is originally incorrect. If the index is dropped, the replica index switches back to nothing. - relreplident was getting reset one transaction too late, when the old index is marked as dead. -- Michael From fa7585609a614bdba27bc2e3199379d04fc0eabd Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Sun, 30 Aug 2020 16:02:32 +0900 Subject: [PATCH v5] Reset pg_class.relreplident when associated replica index is dropped When an index associated to REPLICA IDENTITY USING INDEX gets dropped, relreplident gets set to 'i', which can be confusing as the catalogs have a state inconsistent with pg_index that lacks an associated entry marked with indisreplident. This changes the logic so as such an index dropped leads to 'n' to be set for the parent table, equivalent to REPLICA IDENTITY NOTHING. Author: Michael Paquier Reviewed-by: Masahiko Sawada, Rahila Syed, Euler Taveira Discussion: https://postgr.es/m/20200522035028.go2...@paquier.xyz --- src/include/catalog/pg_class.h| 6 +- src/include/commands/tablecmds.h | 2 + src/backend/catalog/index.c | 37 +++- src/backend/commands/tablecmds.c | 58 --- .../regress/expected/replica_identity.out | 30 ++ src/test/regress/sql/replica_identity.sql | 22 +++ doc/src/sgml/catalogs.sgml| 3 +- 7 files changed, 128 insertions(+), 30 deletions(-) diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h index 78b33b2a7f..4f9ce17651 100644 --- a/src/include/catalog/pg_class.h +++ b/src/include/catalog/pg_class.h @@ -175,11 +175,7 @@ typedef FormData_pg_class *Form_pg_class; #define REPLICA_IDENTITY_NOTHING 'n' /* all columns are logged as replica identity */ #define REPLICA_IDENTITY_FULL 'f' -/* - * an explicitly chosen candidate key's columns are used as replica identity. - * Note this will still be set if the index has been dropped; in that case it - * has the same meaning as 'd'. 
- */ +/* an explicitly-chosen candidate key's columns are used as replica identity */ #define REPLICA_IDENTITY_INDEX 'i' /* diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h index c1581ad178..b163a82c52 100644 --- a/src/include/commands/tablecmds.h +++ b/src/include/commands/tablecmds.h @@ -61,6 +61,8 @@ extern void ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_ extern void SetRelationHasSubclass(Oid relationId, bool relhassubclass); +extern void SetRelationReplIdent(Oid relationId, char ri_type); + extern ObjectAddress renameatt(RenameStmt *stmt); extern ObjectAddress RenameConstraint(RenameStmt *stmt); diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 62e487bb4c..61d05495b4 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2005,6 +2005,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode) Relation indexRelation; HeapTuple tuple; bool hasexprs; + bool resetreplident; LockRelId heaprelid, indexrelid; LOCKTAG heaplocktag; @@ -2048,6 +2049,15 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode) */ CheckTableNotInUse(userIndexRelation, "DROP INDEX"); + /* + * Check if the REPLICA IDENTITY of the parent relation needs to be + * reset. This should be done only if the index to drop here is + * marked as used in a replica identity. We cannot rely on the + * contents of pg_index for the index as a concurrent drop would have + * reset indisreplident already, so save the existing value first. + */ + resetreplident = userIndexRelation->rd_index->indisreplident; + /* * Drop Index Concurrently is more or less the reverse process of Create * Index Concurrently. @@ -2132,7 +2142,29 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode) */ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock); LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock); + } + /* + * If the index to be dropped was marked as indisreplident (note that it + * would have been reset when clearing indisvalid previously), reset + * pg_class.relreplident for the parent table. Note that this part + * needs to be done in the same transaction as the one marking the + * index as invalid, so as we never finish with the case where the + * parent relation uses REPLICA_IDENTITY_INDEX, without any index marked + * with indisrepl
Re: Improvements in Copy From
On Thu, Aug 27, 2020 at 11:02 AM Peter Smith wrote: > > Hello. > > FYI - that patch has conflicts when applied. > Thanks for letting me know. Attached new patch which is rebased on top of head. Regards, VIgnesh EnterpriseDB: http://www.enterprisedb.com From a343fe1f8fdf4293d2ef6841e243390b99f29e28 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Sun, 30 Aug 2020 12:31:12 +0530 Subject: [PATCH v2] Improvements in copy from. There are couple of improvements for copy from in this patch which is detailed below: a) copy from stdin copies lesser amount of data to buffer even though space is available in buffer because minread was passed as 1 to CopyGetData, fixed it by passing the actual space available in buffer, this reduces the frequent call to CopyGetData. b) Copy from reads header line and does nothing for the read line, we need not clear EOL & need not convert to server encoding for the header line. --- src/backend/commands/copy.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index db7d24a..c688baa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -801,14 +801,18 @@ CopyLoadRawBuf(CopyState cstate) { int nbytes = RAW_BUF_BYTES(cstate); int inbytes; + int minread = 1; /* Copy down the unprocessed data if any. */ if (nbytes > 0) memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index, nbytes); + if (cstate->copy_dest == COPY_NEW_FE) + minread = RAW_BUF_SIZE - nbytes; + inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes, - 1, RAW_BUF_SIZE - nbytes); + minread, RAW_BUF_SIZE - nbytes); nbytes += inbytes; cstate->raw_buf[nbytes] = '\0'; cstate->raw_buf_index = 0; @@ -3917,7 +3921,7 @@ CopyReadLine(CopyState cstate) } while (CopyLoadRawBuf(cstate)); } } - else + else if (!(cstate->cur_lineno == 0 && cstate->header_line)) { /* * If we didn't hit EOF, then we must have transferred the EOL marker @@ -3951,8 +3955,9 @@ CopyReadLine(CopyState cstate) } } - /* Done reading the line. Convert it to server encoding. */ - if (cstate->need_transcoding) + /* Done reading the line. Convert it to server encoding if not header. */ + if (cstate->need_transcoding && + !(cstate->cur_lineno == 0 && cstate->header_line)) { char *cvt; -- 1.8.3.1
Boundary value check in lazy_tid_reaped()
Hi all,

In the current lazy vacuum implementation, some index AMs such as btree indexes call lazy_tid_reaped() for each index tuple during ambulkdelete to check if the index tuple points to a (collected) garbage tuple. In that function we simply call bsearch(3), but we should be able to know the result without bsearch(3) if the index tuple points to a heap tuple that is out of the range of the collected garbage tuples. I've checked whether bsearch(3) does a boundary value check, but looking at the bsearch(3) implementation in FreeBSD libc[1], it does not. The same is true for glibc. So my proposal is to add a boundary value check in lazy_tid_reaped() before executing bsearch(3). This will help when index vacuum happens multiple times or when garbage tuples are concentrated in a narrow range.

I've done three performance tests. maintenance_work_mem is set to 64MB, with which we can collect about 11 million tuples at maximum, and the table size is 10GB. The table has one btree index. Here is the average index vacuum (ambulkdelete) execution time over three trials:

* Test case 1: I made all pages dirty with 15 million garbage tuples to invoke index vacuum twice. HEAD: 164.7 s Patched: 138.81 s

* Test case 2: I made all pages dirty with 10 million garbage tuples to invoke index vacuum only once, checking for performance degradation. HEAD: 127.8 s Patched: 128.25 s

* Test case 3: I made a certain range of the table dirty with 1 million garbage tuples. HEAD: 31.07 s Patched: 9.41 s

I thought that we could have a generic function wrapping bsearch(3) that does the boundary value check and then does bsearch(3), so that we can use it in other similar places as well. But the attached patch doesn't do that, as I'd like to hear opinions on the proposal first.

Feedback is very welcome.

Regards,

[1] https://svnweb.freebsd.org/base/head/lib/libc/stdlib/bsearch.c?revision=326025&view=markup

-- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

vacuum_boundary_check.patch Description: Binary data
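A minimal sketch of the kind of check being proposed (the struct and helper names follow vacuumlazy.c at the time of writing; the attached patch may differ in detail):

/*
 * Sketch only: reject TIDs outside the collected range before bsearch(3).
 */
static bool
lazy_tid_reaped(ItemPointer itemptr, void *state)
{
    LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
    ItemPointer res;

    /* Nothing collected at all? */
    if (dead_tuples->num_tuples == 0)
        return false;

    /* itemptrs[] is sorted in TID order, so check the boundaries first */
    if (ItemPointerCompare(itemptr, &dead_tuples->itemptrs[0]) < 0 ||
        ItemPointerCompare(itemptr,
                           &dead_tuples->itemptrs[dead_tuples->num_tuples - 1]) > 0)
        return false;

    res = (ItemPointer) bsearch((void *) itemptr,
                                (void *) dead_tuples->itemptrs,
                                dead_tuples->num_tuples,
                                sizeof(ItemPointerData),
                                vac_cmp_itemptr);
    return (res != NULL);
}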
Re: Yet another fast GiST build (typo)
> On 23 Aug 2020, at 14:39, Andrey M. Borodin wrote: > > Thanks for reviewing and benchmarking, Pavel! Pavel sent me a few typos offlist. PFA v12 fixing these typos. Thanks! Best regards, Andrey Borodin. v12-0001-Add-sort-support-for-point-gist_point_sortsuppor.patch Description: Binary data v12-0002-Implement-GiST-build-using-sort-support.patch Description: Binary data
Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On Thu, Aug 27, 2020 at 04:28:54PM -0400, Stephen Frost wrote: > >* Robert Haas (robertmh...@gmail.com) wrote: > >>On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost wrote: > >>> > Hm? At least earlier versions didn't do prefetching for records with an > >>> > fpw, and only for subsequent records affecting the same or if not in > >>> > s_b anymore. > >>> > >>> We don't actually read the page when we're replaying an FPW though..? > >>> If we don't read it, and we entirely write the page from the FPW, how is > >>> pre-fetching helping..? > >> > >>Suppose there is a checkpoint. Then we replay a record with an FPW, > >>pre-fetching nothing. Then the buffer gets evicted from > >>shared_buffers, and maybe the OS cache too. Then, before the next > >>checkpoint, we again replay a record for the same page. At this point, > >>pre-fetching should be helpful. > > > >Sure- but if we're talking about 25GB of WAL, on a server that's got > >32GB, then why would those pages end up getting evicted from memory > >entirely? Particularly, enough of them to end up with such a huge > >difference in replay time.. > > > >I do agree that if we've got more outstanding WAL between checkpoints > >than the system's got memory then that certainly changes things, but > >that wasn't what I understood the case to be here. > > I don't think it's very clear how much WAL there actually was in each > case - the message only said there was more than 25GB, but who knows how > many checkpoints that covers? In the cases with FPW=on this may easily > be much less than one checkpoint (because with scale 45GB an update to > every page will log 45GB of full-page images). It'd be interesting to > see some stats from pg_waldump etc. Also in the message was this: -- In order to avoid checkpoints during benchmark, max_wal_size(200GB) and checkpoint_timeout(200 mins) are set to a high value. -- Which lead me to suspect, at least, that this was much less than a checkpoint, as you suggest. Also, given that the comment was 'run is cancelled when there is a reasonable amount of WAL (>25GB), seems likely that it's at least *around* there. Ultimately though, there just isn't enough information provided to really be able to understand what's going on. I agree, pg_waldump stats would be useful. > >>Admittedly, I don't quite understand whether that is what is happening > >>in this test case, or why SDD vs. HDD should make any difference. But > >>there doesn't seem to be any reason why it doesn't make sense in > >>theory. > > > >I agree that this could be a reason, but it doesn't seem to quite fit in > >this particular case given the amount of memory and WAL. I'm suspecting > >that it's something else and I'd very much like to know if it's a > >general "this applies to all (most? a lot of?) SSDs because the > >hardware has a larger than 8KB page size and therefore the kernel has to > >read it", or if it's something odd about this particular system and > >doesn't apply generally. > > Not sure. I doubt it has anything to do with the hardware page size, > that's mostly transparent to the kernel anyway. But it might be that the > prefetching on a particular SSD has more overhead than what it saves. Right- I wouldn't have thought the hardware page size would matter either, but it's entirely possible that assumption is wrong and that it does matter for some reason- perhaps with just some SSDs, or maybe with a lot of them, or maybe there's something else entirely going on. 
About all I feel like I can say at the moment is that I'm very interested in ways to make WAL replay go faster and it'd be great to get more information about what's going on here to see if there's something we can do to generally improve WAL replay. Thanks, Stephen signature.asc Description: PGP signature
Re: [PATCH] Covering SPGiST index
> On 27 Aug 2020, at 21:03, Pavel Borisov wrote: > > see v8 For me the only concerning point is putting the nullmask and varatt bits into tuple->nextOffset. But, probably, we can go with this. But let's change the macro a bit. When I see SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber); I expect that leafTuple->nextOffset is a function argument passed by value and will not be changed. For example, see ItemPointerSetOffsetNumber() - it does not expose ip_posid. Also, instead of >*(leafChainDatums + i * natts) and leafChainIsnulls + i * natts I'd propose using something like >int some_index = i * natts; >leafChainDatums[some_index] and &leafChainIsnulls[some_index] But, probably, it's a matter of taste... Also, I'm not sure whether it would be helpful to use, instead of >isnull[0] and leafDatum[0] something more complex like >#define SpgKeyIndex 0 >isnull[SpgKeyIndex] and leafDatum[SpgKeyIndex] There are so many [0]s in the patch... Thanks! Best regards, Andrey Borodin.
[PATCH v1] explicit_bzero.c: using explicit_memset on NetBSD
Thanks. Regards. 0001-explicit_bzero.c-uses-explicit_memset-for-NetBSD.patch Description: Binary data
Re: Rare link canary failure in dblink test
Thomas Munro writes: > A couple of recent cases where an error "libpq is incorrectly linked > to backend functions" broke the dblink test: lorikeet seems just plain unstable these days :-(. Don't know why. > ... curiously the message also appeared a > couple of times on two Unixen. Perhaps we can write those off as lost > in the mists of time. No, those were expected, because they ran between ed0cdf0e0, which intentionally introduced this error, and e3d77ea6b and 4fa3741d1 which fixed it. regards, tom lane
Re: Disk-based hash aggregate's cost model
On Sun, Aug 30, 2020 at 02:26:20AM +0200, Tomas Vondra wrote: On Fri, Aug 28, 2020 at 06:32:38PM -0700, Jeff Davis wrote: On Thu, 2020-08-27 at 17:28 -0700, Peter Geoghegan wrote: We have a Postgres 13 open item for Disk-based hash aggregate, which is the only non-trivial open item. There is a general concern about how often we get disk-based hash aggregation when work_mem is particularly low, and recursion seems unavoidable. This is generally thought to be a problem in the costing. We discussed two approaches to tweaking the cost model: 1. Penalize HashAgg disk costs by a constant amount. It seems to be chosen a little too often, and we can reduce the number of plan changes. 2. Treat recursive HashAgg spilling skeptically, and heavily penalize recursive spilling. The problem with approach #2 is that we have a default hash mem of 4MB, and real systems have a lot more than that. In this scenario, recursive spilling can beat Sort by a lot.

I think the main issue is that we're mostly speculating what's wrong. I've shared some measurements and symptoms, and we've discussed what might be causing it, but I'm not really sure we know for sure. I really dislike (1) because it seems more like "We don't know what's wrong so we'll penalize hashagg," kind of solution. A much more principled solution would be to tweak the costing accordingly, not just by adding some constant. For (2) it really depends if recursive spilling is really the problem here. In the examples I shared, the number of partitions/batches was very different, but the query duration was mostly independent (almost constant).

I've decided to look at the costing a bit more closely today, and see why the costing is so much lower compared to sort/groupagg. I've used the same 32GB dataset and query as in [1]. I've repeated tests for all the work_mem values, and I see the number of batches are much lower, probably thanks to the HLL improvement:

    2MB    Planned:  64    Batches (old):  4160    Batches: 2977
    4MB    Planned: 128    Batches (old): 16512    Batches: 1569
    8MB    Planned: 256    Batches (old): 21488    Batches: 1289
   64MB    Planned:  32    Batches (old):  2720    Batches:  165
  256MB    Planned:   8    Batches (old):     8    Batches:   41

The impact on duration of the queries seems pretty negligible, though.
The plans with work_mem=64MB look like this: 1) hashagg QUERY PLAN Limit (cost=11267293.86..11267293.86 rows=1 width=36) (actual time=213127.515..213127.517 rows=0 loops=1) -> HashAggregate (cost=10229061.10..11267293.86 rows=6718533 width=36) (actual time=100862.623..212948.642 rows=640 loops=1) Group Key: lineitem.l_partkey Planned Partitions: 32 Batches: 165 Memory Usage: 67857kB Disk Usage: 6802432kB -> Seq Scan on lineitem (cost=0.00..5519288.36 rows=191990736 width=9) (actual time=0.418..20344.631 rows=192000551 loops=1) Planning Time: 0.064 ms Execution Time: 213441.986 ms (7 rows) 2) groupagg QUERY PLAN -- Limit (cost=36755617.81..36755617.81 rows=1 width=36) (actual time=180029.594..180029.595 rows=0 loops=1) -> GroupAggregate (cost=35214909.30..36755617.81 rows=6718533 width=36) (actual time=94017.788..179857.683 rows=640 loops=1) Group Key: lineitem.l_partkey -> Sort (cost=35214909.30..35694886.14 rows=191990736 width=9) (actual time=94017.750..151138.727 rows=192000551 loops=1) Sort Key: lineitem.l_partkey Sort Method: external merge Disk: 3742208kB -> Seq Scan on lineitem (cost=0.00..5519288.36 rows=191990736 width=9) (actual time=0.008..26831.264 rows=192000551 loops=1) Planning Time: 0.063 ms Execution Time: 180209.580 ms (9 rows) I still don't understand why the hashagg is costed so low compared to the sort (only about 1/3 of it), because both cases use exactly the same estimates internally. cost_tuplesort ends up with npages = 937455 nruns = 114.435396 input_bytes = 7679629440 log_runs = 1.0 while cost_agg uses pages_read = 937455 pages_written = 937455 relation_size = 7679629440; yet we end up with much lower estimate for hashagg. It however does seem to me this is mostly due to non-I/O costs, considered by cost_tuplesort and perhaps ignored by cost_agg? In particular, most of the sort cost comes from this *startup_cost = comparison_cost * tuples * LOG2(tuples); So I'm wondering if the hashagg is not ignoring similar non-I/O costs for the spilling case. In particular, the initial section computing startu
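A rough back-of-the-envelope check of that term, assuming the default cpu_operator_cost = 0.0025 (so comparison_cost = 2 * 0.0025 = 0.005):

    comparison_cost * tuples * LOG2(tuples)
      = 0.005 * 191990736 * log2(191990736)
      ≈ 0.005 * 191990736 * 27.5
      ≈ 26.4 million cost units

which is roughly the size of the gap between the ~35.2M sort estimate and the ~11.3M hashagg estimate above, so a missing per-tuple CPU term of this kind would be enough to explain the difference.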
Re: Row estimates for empty tables
Pavel Stehule writes: > I'll mark this patch as ready for commit Pushed, thanks for looking. regards, tom lane
Re: list of extended statistics on psql
On 2020-Aug-30, Tomas Vondra wrote: > On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote: > > On 2020-Aug-29, Tomas Vondra wrote: > > > Also, it might be useful to show the size of the statistics built, just > > > like we show for \d+ etc. > > > > \dX+ I suppose? > > Right. I've only used \d+ as an example of an existing command showing > sizes of the objects. Yeah, I understood it that way too. How can you measure the size of the stat objects in a query? Are you thinking in pg_column_size()? I wonder how to report that. Knowing that psql \-commands are not meant for anything other than human consumption, maybe we can use a format() string that says "built: %d bytes" when \dX+ is used (for each stat type), and just "built" when \dX is used. What do people think about this? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: list of extended statistics on psql
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: On 2020-Aug-30, Tomas Vondra wrote: On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote: > On 2020-Aug-29, Tomas Vondra wrote: > > Also, it might be useful to show the size of the statistics built, just > > like we show for \d+ etc. > > \dX+ I suppose? Right. I've only used \d+ as an example of an existing command showing sizes of the objects. Yeah, I understood it that way too. How can you measure the size of the stat objects in a query? Are you thinking in pg_column_size()? Either that or simply length() on the bytea value. I wonder how to report that. Knowing that psql \-commands are not meant for anything other than human consumption, maybe we can use a format() string that says "built: %d bytes" when \dX+ is used (for each stat type), and just "built" when \dX is used. What do people think about this? I'd use the same approach as \d+, i.e. a separate column with the size. Maybe that'd mean too many columns, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Deprecating postfix and factorial operators in PostgreSQL 13
> On Aug 28, 2020, at 8:56 AM, Tom Lane wrote: > > Robert Haas writes: >> So, in this version, there are six copies of the deprecation notice >> John wrote, rather than just one. Maybe we need more than one, but I >> doubt we need six. I don't think the CREATE OPERATOR documentation >> needs to mention this both when first introducing the concept and then >> again when defining right_type; the former seems sufficient. I don't >> think xoper.sgml needs these changes either; they interrupt the flow >> of the narrative and I don't think this is where anyone would look for >> a deprecation notice. I would also argue for dropping the mentions in >> the docs for ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS, although >> those seem less clear-cut. Really, CREATE OPERATOR where John had it >> originally seems like the right place. > > Yeah, I agree that there are way too many copies here. CREATE OPERATOR > seems sufficient. It also seems like we should just rewrite the typeconv > and drop_operator examples to use some other operator. We'll have > to do that eventually anyway, so why not now, instead of visiting those > places twice? John's deprecation language now only appears in the create operator documentation. The first typeconv example was based on the (int8 !) operator. I chose to replace it with and example based on the (jsonb ? text) operator, as the two operators have a relevant similarity. Both of them are singletons, with only one interpretation in the standard catalogs. In both cases, the scanner cannot know the types of the undecorated arguments and assigns default types (integer and unknown, respectively), which get resolved later to match the only types accepted by the operator ('int8' for !, and 'jsonb,text' for ?). The drop operator example has been left, though altered to include the adjective "deprecated". Robert mentions that the entire example could simply be dropped now, and I agree with that, but it also makes sense to me to drop the example in 14, when the operation it describes is also dropped. If the committer who picks this up wants to drop that example now, that's fine, too. v3-0001-Adding-deprecation-notices.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: list of extended statistics on psql
Tomas Vondra writes: > On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: >> I wonder how to report that. Knowing that psql \-commands are not meant >> for anything other than human consumption, maybe we can use a format() >> string that says "built: %d bytes" when \dX+ is used (for each stat type), >> and just "built" when \dX is used. What do people think about this? Seems a little too cute to me. > I'd use the same approach as \d+, i.e. a separate column with the size. > Maybe that'd mean too many columns, though. psql already has \d commands with so many columns that you pretty much have to use \x mode to make them legible; \df+ for instance. I don't mind if \dX+ is also in that territory. It'd be good though if plain \dX can fit in a normal terminal window. regards, tom lane
Re: Row estimates for empty tables
On Sun, Aug 30, 2020 at 6:23 PM Tom Lane wrote: > Pavel Stehule writes: > > I'll mark this patch as ready for commit > > Pushed, thanks for looking. > Thank you Pavel > > regards, tom lane >
Re: Deprecating postfix and factorial operators in PostgreSQL 13
Mark Dilger writes: > [ v3-0001-Adding-deprecation-notices.patch ] Pushed with some fiddling. We previously found that adding id tags to constructs in the function lists didn't work in PDF output [1]. Your patch did build a PDF without warnings for me, which is odd --- apparently we changed something since April? But the links didn't lead to quite the right place, so the conclusion that there's something broken with that still holds. I made it look like the existing usages for encode() and decode(). I did not like your choice of "jsonb ? text" for the typeconv example at all, primarily because it introduces a host of distractions for anyone who's not already quite familiar with both JSON and that operator. After some digging around I found the |/ (square root) operator, which likewise has just one instance, and it's something familiar enough that most readers probably won't be slowed down by trying to figure out what it does. Also, this more nearly covers the territory of the original example, namely auto-casting an integer constant to something else. There are later examples covering unknown-type literals, so we don't need to address that scenario in this one. I also found another example that depends on the ! operator, in the operator precedence discussion. I concluded after study that the particular case it's on about only arises for postfix operators, so I just got rid of that example in favor of a generalized mention that you should use parentheses to override operator precedence. I concluded that there's not much point in touching drop_operator.sgml at all yet. I don't think labeling ! as deprecated as adding anything to that example, and besides there's another example that will also need to be changed. regards, tom lane [1] https://www.postgresql.org/message-id/32159.1586738304%40sss.pgh.pa.us
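For reference, the substituted typeconv example boils down to the implicit cast of an integer constant to double precision, e.g.:

    SELECT |/ 25 AS square_root;
    -- |/ only takes double precision, so the integer constant 25 is
    -- implicitly cast, and the result is 5 (double precision)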
Re: Compatible defaults for LEAD/LAG
Pavel Stehule writes: > This is nice example of usage of anycompatible type (that is consistent > with other things in Postgres), but standard says something else. > It can be easily solved with https://commitfest.postgresql.org/28/2081/ - > but Tom doesn't like this patch. > I am more inclined to think so this feature should be implemented > differently - there is no strong reason to go against standard or against > the implementations of other databases (and increase the costs of porting). > Now the implementation is limited, but allowed behaviour is 100% ANSI. I don't particularly buy this argument. The case at hand is what to do if we have, say, select lag(integer_column, 1, 1.2) over ... The proposed patch would result in the output being of type numeric, and any rows using the default would show "1.2". The spec says that the right thing is to return integer, and we should round the default to "1" to make that work. But (1) I doubt that anybody actually writes such things; (2) For anyone who does write it, the spec's behavior fails to meet the principle of least surprise. It is not normally the case that any information-losing cast would be applied silently within an expression. So this deviation from spec doesn't bother me; we have much bigger ones. My concern with this patch is what I said upthread: I'm not sure that we should be adjusting this behavior in a piecemeal fashion. I looked through pg_proc to find all the functions that have more than one any* argument, and found these: oid | prorettype ---+ lag(anyelement,integer,anyelement)| anyelement lead(anyelement,integer,anyelement) | anyelement width_bucket(anyelement,anyarray) | integer btarraycmp(anyarray,anyarray) | integer array_eq(anyarray,anyarray) | boolean array_ne(anyarray,anyarray) | boolean array_lt(anyarray,anyarray) | boolean array_gt(anyarray,anyarray) | boolean array_le(anyarray,anyarray) | boolean array_ge(anyarray,anyarray) | boolean array_append(anyarray,anyelement) | anyarray array_prepend(anyelement,anyarray)| anyarray array_cat(anyarray,anyarray) | anyarray array_larger(anyarray,anyarray) | anyarray array_smaller(anyarray,anyarray) | anyarray array_position(anyarray,anyelement) | integer array_position(anyarray,anyelement,integer) | integer array_positions(anyarray,anyelement) | integer[] array_remove(anyarray,anyelement) | anyarray array_replace(anyarray,anyelement,anyelement) | anyarray arrayoverlap(anyarray,anyarray) | boolean arraycontains(anyarray,anyarray) | boolean arraycontained(anyarray,anyarray) | boolean elem_contained_by_range(anyelement,anyrange) | boolean range_contains_elem(anyrange,anyelement) | boolean range_eq(anyrange,anyrange) | boolean range_ne(anyrange,anyrange) | boolean range_overlaps(anyrange,anyrange) | boolean range_contains(anyrange,anyrange) | boolean range_contained_by(anyrange,anyrange) | boolean range_adjacent(anyrange,anyrange) | boolean range_before(anyrange,anyrange) | boolean range_after(anyrange,anyrange)| boolean range_overleft(anyrange,anyrange) | boolean range_overright(anyrange,anyrange)| boolean range_union(anyrange,anyrange)| anyrange range_merge(anyrange,anyrange)| anyrange range_intersect(anyrange,anyrange)| anyrange range_minus(anyrange,anyrange)| anyrange range_cmp(anyrange,anyrange) | integer range_lt(anyrange,anyrange) | boolean range_le(anyrange,anyrange) | boolean range_ge(anyrange,anyrange) | boolean range_gt(anyrange,anyrange) | boolean range_gist_same(anyrange,anyrange,internal) | internal (45 rows) Now, there's no point in changing range_eq and the later 
entries in this table (the ones taking two anyrange's); our rather lame definition of anycompatiblerange means that we'd get no benefit from doing so. But I think there's a strong case for changing everything before range_eq. It'd be nice if something like SELECT array[1] < array[1.1]; would work the same way that "SELECT 1 < 1.1" would. I checked the other concern that I had about whether the more flexible polymorphic definitions would create any new ambiguities, and I don't see any problems with this list. As functions, none of these names are overloaded, except with different numbers of arguments so there's no ambiguity. Most of the array functions are also operators, bu
Re: New default role- 'pg_read_all_data'
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Magnus Hagander (mag...@hagander.net) wrote: > > On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost wrote: > > > * Magnus Hagander (mag...@hagander.net) wrote: > > > > Without having actually looked at the code, definite +1 for this > > > > feature. > > > > It's much requested... > > > > > > Thanks. > > > > > > > But, should we also have a pg_write_all_data to go along with it? > > > > > > Perhaps, but could certainly be a different patch, and it'd need to be > > > better defined, it seems to me... read_all is pretty straight-forward > > > (the general goal being "make pg_dumpall/pg_dump work"), what would > > > write mean? INSERT? DELETE? TRUNCATE? ALTER TABLE? System catalogs? > > > > Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or > > system catalogs. > > > > I'd say insert/update/delete yes. > > > > TRUNCATE is always an outlier.Given it's generally classified as DDL, I > > wouldn't include it. > > Alright, that seems like it'd be pretty easy. We already have a check > in pg_class_aclmask to disallow modification of system catalogs w/o > being a superuser, so we should be alright to add a similar check for > insert/update/delete just below where I added the SELECT check. > > > > Doesn't seem like you could just declare it to be 'allow pg_restore' > > > either, as that might include creating untrusted functions, et al. > > > > No definitely not. That wouldn't be the usecase at all :) > > Good. :) > > > (and fwiw to me the main use case for read_all_data also isn't pg_dump, > > because most people using pg_dump are already db owner or higher in my > > experience. But it is nice that it helps with that too) > > Glad to have confirmation that there's other use-cases this helps with. > > I'll post an updated patch with that added in a day or two. Here's that updated patch, comments welcome. Thanks! Stephen diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml index 829decd883..1213125bfd 100644 --- a/doc/src/sgml/user-manag.sgml +++ b/doc/src/sgml/user-manag.sgml @@ -517,6 +517,20 @@ DROP ROLE doomed_role; + + pg_read_all_data + Read all data (tables, views, sequences), as if having SELECT + rights on those objects, and USAGE rights on all schemas, even without + having it explicitly. This role also has BYPASSRLS + set for it. + + + pg_write_all_data + Write all data (tables, views, sequences), as if having INSERT, + UPDATE, and DELETE rights on those objects, and USAGE rights on all + schemas, even without having it explicitly. This role also has + BYPASSRLS set for it. + pg_read_all_settings Read all configuration variables, even those normally visible only to diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index c626161408..3e6d060554 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -3849,6 +3849,26 @@ pg_class_aclmask(Oid table_oid, Oid roleid, ReleaseSysCache(tuple); + /* + * Check if ACL_SELECT is being checked and, if so, and not set already + * as part of the result, then check if the user is a member of the + * pg_read_all_data role, which allows read access to all relations. 
+ */ + if (mask & ACL_SELECT && !(result & ACL_SELECT) && + has_privs_of_role(roleid, DEFAULT_ROLE_READ_ALL_DATA)) + result |= ACL_SELECT; + + /* + * Check if ACL_INSERT, ACL_UPDATE, or ACL_DELETE is being checked + * and, if so, and not set already as part of the result, then check + * if the user is a member of the pg_write_all_data role, which + * allows INSERT/UPDATE/DELETE access to all relations. + */ + if (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE) && + !(result & (ACL_INSERT | ACL_UPDATE | ACL_DELETE)) && + has_privs_of_role(roleid, DEFAULT_ROLE_WRITE_ALL_DATA)) + result |= (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE)); + return result; } @@ -4175,6 +4195,16 @@ pg_namespace_aclmask(Oid nsp_oid, Oid roleid, ReleaseSysCache(tuple); + /* + * Check if ACL_USAGE is being checked and, if so, and not set already + * as part of the result, then check if the user is a member of the + * pg_read_all_data or pg_write_all_data roles, which allow usage + * access to all schemas. + */ + if (mask & ACL_USAGE && !(result & ACL_USAGE) && + (has_privs_of_role(roleid, DEFAULT_ROLE_READ_ALL_DATA) || + has_privs_of_role(roleid, DEFAULT_ROLE_WRITE_ALL_DATA))) + result |= ACL_USAGE; return result; } diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat index 7c08851550..3f72474146 100644 --- a/src/include/catalog/pg_authid.dat +++ b/src/include/catalog/pg_authid.dat @@ -25,6 +25,16 @@ rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f', rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1', rolpassword => '_null_', rolvaliduntil =
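As a usage sketch of what the patch enables (the role name "reporting" is just an example, not from the patch):

    CREATE ROLE reporting LOGIN;
    GRANT pg_read_all_data TO reporting;
    -- and, with the write-side role added in this version:
    GRANT pg_write_all_data TO reporting;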
Re: list of extended statistics on psql
Hi Alvaro, IMO the per-type columns should show both the type being enabled as well as it being built. Hmm. I'm not sure how to get the status (enabled or disabled) of extended stats. :( Could you explain it more? pg_statistic_ext_data.stxdndistinct is not null if the stats have been built. (I'm not sure whether there's an easier way to determine this.) Ah.. I see! Thank you. I suggest to do this Name| Schema | Definition | Ndistinct | Dependencies | MCV ---++--+---+--+- stts_1| public | (a, b) FROM t1 | f | t| f I suppose that the current column order is sufficient if there is no improvement of extended stats on PG14. Do you know any plan to improve extended stats such as to allow it to cross multiple tables on PG14? I suggest that changing it in the future is going to be an uphill battle, so better get it right from the get go, without requiring a future restructure. I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column. Currently, I use this query to get Extended stats info from pg_statistic_ext. Maybe something like this would do SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)), stxrelid::regclass) AS "Definition", CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct", CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies", CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv FROM pg_catalog.pg_statistic_ext es INNER JOIN pg_catalog.pg_class c ON stxrelid = c.oid LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid ORDER BY 1, 2, 3; Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D I got the below results by your query. create table t1 (a int, b int); create statistics stts_1 (dependencies) on a, b from t1; create statistics stts_2 (dependencies, ndistinct) on a, b from t1; create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1; create table t2 (a int, b int, c int); create statistics stts_4 on b, c from t2; create table hoge (col1 int, col2 int, col3 int); create statistics stts_hoge on col1, col2, col3 from hoge; insert into t1 select i,i from generate_series(1,100) i; analyze t1; Your query gave this result: Schema | Name| Definition | n-distinct | functional dependencies |mcv +---+++-+ public | stts_1| a, b FROM t1 || built | public | stts_2| a, b FROM t1 | built | built | public | stts_3| a, b FROM t1 | built | built | built public | stts_4| b, c FROM t2 | enabled, not built | enabled, not built | enabled, not built public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built | enabled, not built (5 rows) I guess "enabled, not built" is a little redundant. The status would better to have three patterns: "built", "not built" or nothing (NULL) like these: - "built": extended stats is defined and built (collected by analyze cmd) - "not built": extended stats is defined but have not built yet - nothing (NULL): extended stats is not defined What do you think about it? 
I will send a new patch including: - Replace the "Columns" and "Table" columns with a "Definition" column - Show the status (built/not built/null) of extended stats by using pg_statistic_ext_data Thanks, Tatsuro Yamada
Re: Get rid of runtime handling of AlternativeSubPlan?
On Sun, Aug 30, 2020 at 7:26 AM Tom Lane wrote: > I wrote: > > Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced > > AlternativeSubPlans, I wrote: > > There is a lot more that could be done based on this infrastructure: in > > particular it's interesting to consider switching to the hash plan if > we start > > out using the non-hashed plan but find a lot more upper rows going by > than we > > expected. I have therefore left some minor inefficiencies in place, > such as > > initializing both subplans even though we will currently only use one. > > > > That commit will be twelve years old come August, and nobody has either > > built anything else atop it or shown any interest in making the plan > choice > > switchable mid-run. So it seems like kind of a failed experiment. > > > > Therefore, I'm considering the idea of ripping out all executor support > > for AlternativeSubPlan and instead having the planner replace an > > AlternativeSubPlan with the desired specific SubPlan somewhere late in > > planning, possibly setrefs.c. > > Here's a proposed patchset for that. This runs with the idea I'd had > that setrefs.c could be smarter than the executor about which plan node > subexpressions will be executed how many times. I did not take it very > far, for fear of adding an undue number of planning cycles, but it's still > better than what we have now. > > For ease of review, 0001 adds the new planner logic, while 0002 removes > the now-dead executor support. > > There's one bit of dead code that I left in place for the moment, which is > ruleutils.c's support for printing AlternativeSubPlans. I'm not sure if > that's worth keeping or not --- it's dead code for normal use, but if > someone tried to use ruleutils.c to print partially-planned expression > trees, maybe there'd be a use for it? > > (It's also arguable that readfuncs.c's support is now dead code, but > I have little interest in stripping that out.) > > regards, tom lane > > Thank you for this code! I still have some confusion about when a SubPlan should be executed when a join is involved. I care about this because this has an impact on when we can get the num_exec for a subplan. The subplan in a target list, it is executed after the join in my case. The subplan can be execute after the scan of T1(see below example) and it can also be executed after the join. Which one is better depends on which methods make the num_exec smaller. Is it something we already considered? I drill-down to populate_joinrel_with_paths and not find this logic. # explain (costs off) select (select a from t2 where t2.b = t1.b) from t1, t3; QUERY PLAN -- Nested Loop -> Seq Scan on t1 -> Materialize -> Seq Scan on t3 SubPlan 1 -> Seq Scan on t2 Filter: (b = t1.b) (7 rows) When the subplan is in a Qual, it is supposed to be executed as soon as possible, The current implementation matches the below cases. So can we say we knows the num_execs of SubPlan just after we plan the dependent rels? (In Q1 below the dependent rel is t1 vs t3, in Q2 it is t1 only) If we can choose a subplan and recost the related path during(not after) creating the best path, will we get better results for some cases (due to the current cost method for AlternativeSubPlan[1])? 
-- the subplan depends on the result of t1 join t3 # explain (costs off) select t1.* from t1, t3 where t1.a > (select max(a) from t2 where t2.b = t1.b and t2.c = t3.c); QUERY PLAN - Nested Loop Join Filter: (t1.a > (SubPlan 1)) -> Seq Scan on t1 -> Materialize -> Seq Scan on t3 SubPlan 1 -> Aggregate -> Seq Scan on t2 Filter: ((b = t1.b) AND (c = t3.c)) (9 rows) -- the subplan only depends on t1. # explain (costs off) select t1.* from t1, t3 where t1.a > (select max(a) from t2 where t2.b = t1.b); QUERY PLAN Nested Loop -> Seq Scan on t3 -> Materialize -> Seq Scan on t1 Filter: (a > (SubPlan 1)) SubPlan 1 -> Aggregate -> Seq Scan on t2 Filter: (b = t1.b) (9 rows) At last, I want to use the commonly used table in src/test/regress/sql/create_table.sql when providing an example, but I always have issues running the create_table.sql which makes me uncomfortable to use that. Am I missing something? CREATE TABLE fail_part PARTITION OF range_parted3 FOR VALUES FROM (1, minvalue) TO (1, maxvalue); psql:src/test/regress/sql/create_table.sql:611: ERROR: partition "fail_part" would overlap partition "part10" CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS 2, REMAINDER 1); psql:src/test/regress/sql/create_table.sql:622: ERROR: partition "fail_part" woul
Re: Terminate the idle sessions
On Tue, Aug 18, 2020 at 2:13 PM Li Japin wrote: > On Aug 18, 2020, at 9:19 AM, Kyotaro Horiguchi > wrote: > The same already happens for idle_in_transaction_session_timeout and > we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's > a bit cumbersome, though. I don't think we should (at least > implicitly) disable those timeouts ad-hockerly for postgres_fdw. > > +1.

This seems like a reasonable feature to me. The delivery of the error message explaining what happened is probably not reliable, so to some clients and on some operating systems this will be indistinguishable from a dropped network connection or other error, but that's OK and we already have that problem with the existing timeout-based disconnection feature.

The main problem I have with it is the high frequency of setitimer() calls. If you enable both statement_timeout and idle_session_timeout, then we get up to a huge number of system calls, as the following strace -c output for a few seconds of one backend under a pgbench -S workload shows:

 % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  39.45    0.118685           0    250523           setitimer
  29.98    0.090200           0    125275           sendto
  24.30    0.073107           0    126235       973 recvfrom
   6.01    0.018068           0     20950           pread64
   0.26    0.000779           0       973           epoll_wait
------ ----------- ----------- --------- --------- ----------------
 100.00    0.300839                523956       973 total

There's a small but measurable performance drop from this, as also discussed in another thread about another kind of timeout[1]. Maybe we should try to fix that with something like the attached?

[1] https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru

0001-Optimize-setitimer-usage.patchx Description: Binary data
Re: list of extended statistics on psql
On 2020/08/31 1:59, Tom Lane wrote: Tomas Vondra writes: On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: I wonder how to report that. Knowing that psql \-commands are not meant for anything other than human consumption, maybe we can use a format() string that says "built: %d bytes" when \dX+ is used (for each stat type), and just "built" when \dX is used. What do people think about this? Seems a little too cute to me. I'd use the same approach as \d+, i.e. a separate column with the size. Maybe that'd mean too many columns, though. psql already has \d commands with so many columns that you pretty much have to use \x mode to make them legible; \df+ for instance. I don't mind if \dX+ is also in that territory. It'd be good though if plain \dX can fit in a normal terminal window. Hmm. How about these instead of "built: %d bytes"? I added three columns (N_size, D_size, M_size) to show size. See below: === postgres=# \dX List of extended statistics Schema | Name| Definition | N_distinct | Dependencies | Mcv +---+++--+--- public | stts_1| a, b FROM t1 || built| public | stts_2| a, b FROM t1 | built | built| public | stts_3| a, b FROM t1 | built | built| built public | stts_4| b, c FROM t2 | not built | not built| not built public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built| not built (5 rows) postgres=# \dX+ List of extended statistics Schema | Name| Definition | N_distinct | Dependencies | Mcv| N_size | D_size | M_size +---+++--+---+++ public | stts_1| a, b FROM t1 || built| || 40 | public | stts_2| a, b FROM t1 | built | built| | 13 | 40 | public | stts_3| a, b FROM t1 | built | built| built | 13 | 40 | 6126 public | stts_4| b, c FROM t2 | not built | not built| not built ||| public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built| not built ||| === I used this query to get results of "\dX+". === SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)), stxrelid::regclass) AS "Definition", CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'not built' END AS "N_distinct", CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'not built' END AS "Dependencies", CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'not built' END AS "Mcv", pg_catalog.length(stxdndistinct) AS "N_size", pg_catalog.length(stxddependencies) AS "D_size", pg_catalog.length(stxdmcv) AS "M_size" FROM pg_catalog.pg_statistic_ext es INNER JOIN pg_catalog.pg_class c ON stxrelid = c.oid LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid ORDER BY 1, 2; === Attached patch includes: - Replace "Columns" and "Table" column with "Definition" - Show the status (built/not built/null) of extended stats by using pg_statistic_ext_data - Add "\dX+" command to show size of extended stats Please find the attached file! :-D Thanks, Tatsuro Yamada diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index fc16e6c..5664c22 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1893,6 +1893,20 @@ testdb=> + + +\dX [ pattern ] + + +Lists extended statistics. +If pattern +is specified, only those extended statistics whose names match the pattern +are listed. 
+If + is appended to the command name, each extended statistics +is listed with its size. + + + \dy[+] [ pattern ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 9902a4a..077a585 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@
Re: Get rid of runtime handling of AlternativeSubPlan?
On Sun, 30 Aug 2020 at 11:26, Tom Lane wrote: > > I wrote: > > Therefore, I'm considering the idea of ripping out all executor support > > for AlternativeSubPlan and instead having the planner replace an > > AlternativeSubPlan with the desired specific SubPlan somewhere late in > > planning, possibly setrefs.c. > > Here's a proposed patchset for that. Do you feel that the choice to create_plan() on the subplan before planning the outer query is still a good one? ISTM that that was required when the AlternativeSubPlan decision was made during execution, since we, of course, need a plan to execute. If the decision is now being made in the planner then is it not better to delay the create_plan() until later in planning? From looking at the code it seems that Paths won't really do here as we're dealing with two separate PlannerInfos rather than two paths belonging to the same PlannerInfo, but maybe it's better to invent something else that's similar to a list of paths and just do create_plan() for the subquery once. David
Re: Terminate the idle sessions
On Aug 31, 2020, at 8:51 AM, Thomas Munro <thomas.mu...@gmail.com> wrote: The main problem I have with it is the high frequency of setitimer() calls. If you enable both statement_timeout and idle_session_timeout, then we get up to a huge number of system calls, as the following strace -c output for a few seconds of one backend under a pgbench -S workload shows:

 % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  39.45    0.118685           0    250523           setitimer
  29.98    0.090200           0    125275           sendto
  24.30    0.073107           0    126235       973 recvfrom
   6.01    0.018068           0     20950           pread64
   0.26    0.000779           0       973           epoll_wait
------ ----------- ----------- --------- --------- ----------------
 100.00    0.300839                523956       973 total

Hi Thomas, Could you give more details about the test instructions?
Re: [PATCH v1] explicit_bzero.c: using explicit_memset on NetBSD
On Sun, Aug 30, 2020 at 02:03:32PM +0100, David CARLIER wrote: > Thanks. During the addition of explicit_bzero(), there was an agreement to use memset_s(), as it is blessed by the standard: https://www.postgresql.org/message-id/20190717211931.GA906@alvherre.pgsql So what would be the advantage of explicit_memset() knowing that NetBSD has memset_s()? This also means that your patch is a no-op for NetBSD as HAVE_MEMSET_S would be set. -- Michael signature.asc Description: PGP signature
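For context, the proposed change presumably adds a branch along these lines to src/port/explicit_bzero.c (a sketch; the exact configure symbol is an assumption), which indeed would only be reached if memset_s() were not detected first:

    #elif defined(HAVE_EXPLICIT_MEMSET)    /* e.g. NetBSD */

    void
    explicit_bzero(void *buf, size_t len)
    {
        (void) explicit_memset(buf, 0, len);
    }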
Re: Terminate the idle sessions
On Mon, Aug 31, 2020 at 2:40 PM Li Japin wrote: > Could you give the more details about the test instructions? Hi Japin, Sure. Because I wasn't trying to get reliable TPS number or anything, I just used a simple short read-only test with one connection, like this: pgbench -i -s10 postgres pgbench -T60 -Mprepared -S postgres Then I looked for the active backend and ran strace -c -p XXX for a few seconds and hit ^C to get the counters. I doubt the times are very accurate, but the number of calls is informative. If you do that on a server running with -c statement_timeout=10s, you see one setitimer() per transaction. If you also use -c idle_session_timeout=10s at the same time, you see two.
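Spelled out as commands (scale and duration are arbitrary; XXX is the PID of the active backend):

    pgbench -i -s10 postgres
    pgbench -T60 -Mprepared -S postgres &
    strace -c -p XXX    # let it run a few seconds, then ^C for the counters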
Re: Terminate the idle sessions
At Mon, 31 Aug 2020 12:51:20 +1200, Thomas Munro wrote in > On Tue, Aug 18, 2020 at 2:13 PM Li Japin wrote: > > On Aug 18, 2020, at 9:19 AM, Kyotaro Horiguchi > > wrote: > > The same already happens for idle_in_transaction_session_timeout and > > we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's > > a bit cumbersome, though. I don't think we should (at least > > implicitly) disable those timeouts ad-hockerly for postgres_fdw. > > > > +1. > > This seems like a reasonable feature to me. > > The delivery of the error message explaining what happened is probably > not reliable, so to some clients and on some operating systems this > will be indistinguishable from a dropped network connection or other > error, but that's OK and we already have that problem with the > existing timeout-based disconnection feature. > > The main problem I have with it is the high frequency setitimer() > calls. If you enable both statement_timeout and idle_session_timeout, > then we get up to huge number of system calls, like the following > strace -c output for a few seconds of one backend under pgbench -S > workload shows: > > % time seconds usecs/call callserrors syscall > -- --- --- - - > 39.450.118685 0250523 setitimer > 29.980.090200 0125275 sendto > 24.300.073107 0126235 973 recvfrom > 6.010.018068 0 20950 pread64 > 0.260.000779 0 973 epoll_wait > -- --- --- - - > 100.000.300839523956 973 total > > There's a small but measurable performance drop from this, as also > discussed in another thread about another kind of timeout[1]. Maybe > we should try to fix that with something like the attached? > > [1] > https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru I think it's worth doing. Maybe we can get rid of doing anything other than removing an entry in the case where we disable a non-nearest timeout. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
v13: show extended stats target in \d
The stats target can be set since commit d06215d03, but wasn't shown by psql. ALTER STATISISTICS .. SET STATISTICS n. Normal (1-D) stats targets are shown in \d+ table. Stats objects are shown in \d (no plus). Arguably, this should be shown only in "verbose" mode (\d+). >From 60a4033a04fbaafbb01c2bb2d73bb2fbe4d1da75 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 30 Aug 2020 23:38:17 -0500 Subject: [PATCH 1/1] Show stats target of extended statistics This can be set since d06215d03, but wasn't shown by psql. Normal (1-D) stats targets are shown in \d+ table. --- src/bin/psql/describe.c | 8 +++- src/test/regress/expected/stats_ext.out | 18 ++ src/test/regress/sql/stats_ext.sql | 2 ++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index d81f1575bf..73befa36ee 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -2683,10 +2683,12 @@ describeOneTableDetails(const char *schemaname, "a.attnum = s.attnum AND NOT attisdropped)) AS columns,\n" " 'd' = any(stxkind) AS ndist_enabled,\n" " 'f' = any(stxkind) AS deps_enabled,\n" - " 'm' = any(stxkind) AS mcv_enabled\n" + " 'm' = any(stxkind) AS mcv_enabled,\n" + " %s" "FROM pg_catalog.pg_statistic_ext stat " "WHERE stxrelid = '%s'\n" "ORDER BY 1;", + (pset.sversion >= 13 ? "stxstattarget\n" : "-1\n"), oid); result = PSQLexec(buf.data); @@ -2732,6 +2734,10 @@ describeOneTableDetails(const char *schemaname, PQgetvalue(result, i, 4), PQgetvalue(result, i, 1)); + if (strcmp(PQgetvalue(result, i, 8), "-1") != 0) + appendPQExpBuffer(&buf, " STATISTICS %s", + PQgetvalue(result, i, 8)); + printTableAddFooter(&cont, buf.data); } } diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 0ae779a3b9..4dea48a2e8 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -102,6 +102,15 @@ WARNING: statistics object "public.ab1_a_b_stats" could not be computed for rel ALTER TABLE ab1 ALTER a SET STATISTICS -1; -- setting statistics target 0 skips the statistics, without printing any message, so check catalog ALTER STATISTICS ab1_a_b_stats SET STATISTICS 0; +\d+ ab1 +Table "public.ab1" + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ++-+---+--+-+-+--+- + a | integer | | | | plain | | + b | integer | | | | plain | | +Statistics objects: +"public"."ab1_a_b_stats" (ndistinct, dependencies, mcv) ON a, b FROM ab1 STATISTICS 0 + ANALYZE ab1; SELECT stxname, stxdndistinct, stxddependencies, stxdmcv FROM pg_statistic_ext s, pg_statistic_ext_data d @@ -113,6 +122,15 @@ SELECT stxname, stxdndistinct, stxddependencies, stxdmcv (1 row) ALTER STATISTICS ab1_a_b_stats SET STATISTICS -1; +\d+ ab1 +Table "public.ab1" + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ++-+---+--+-+-+--+- + a | integer | | | | plain | | + b | integer | | | | plain | | +Statistics objects: +"public"."ab1_a_b_stats" (ndistinct, dependencies, mcv) ON a, b FROM ab1 + -- partial analyze doesn't build stats either ANALYZE ab1 (a); WARNING: statistics object "public.ab1_a_b_stats" could not be computed for relation "public.ab1" diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql index 2834a902a7..d1d49948b4 100644 --- a/src/test/regress/sql/stats_ext.sql +++ b/src/test/regress/sql/stats_ext.sql @@ -72,12 +72,14 @@ ANALYZE ab1; ALTER TABLE ab1 ALTER a SET STATISTICS -1; -- setting statistics target 0 skips the statistics, 
without printing any message, so check catalog ALTER STATISTICS ab1_a_b_stats SET STATISTICS 0; +\d+ ab1 ANALYZE ab1; SELECT stxname, stxdndistinct, stxddependencies, stxdmcv FROM pg_statistic_ext s, pg_statistic_ext_data d WHERE s.stxname = 'ab1_a_b_stats' AND d.stxoid = s.oid; ALTER STATISTICS ab1_a_b_stats SET STATISTICS -1; +\d+ ab1 -- partial analyze doesn't build stats either ANALYZE ab1 (a); ANALYZE ab1; -- 2.17.0
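For anyone trying the patch out, a minimal session exercising the new footer could look like the sketch below. The table and statistics object names are invented for illustration; the trailing "STATISTICS n" format is taken from the regression output in the patch.

  CREATE TABLE orders (region int, product int);
  CREATE STATISTICS orders_stats (ndistinct, dependencies) ON region, product FROM orders;
  -- Per-object target, settable since d06215d03:
  ALTER STATISTICS orders_stats SET STATISTICS 500;
  -- With the patch, \d+ orders should now list the object with a trailing "STATISTICS 500";
  -- resetting the target to -1 makes the suffix disappear again:
  ALTER STATISTICS orders_stats SET STATISTICS -1;
  \d+ orders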
Re: Compatible defaults for LEAD/LAG
On Sun, Aug 30, 2020 at 23:59, Tom Lane wrote: > Pavel Stehule writes: > > This is nice example of usage of anycompatible type (that is consistent > > with other things in Postgres), but standard says something else. > > It can be easily solved with https://commitfest.postgresql.org/28/2081/ > - > > but Tom doesn't like this patch. > > I am more inclined to think so this feature should be implemented > > differently - there is no strong reason to go against standard or against > > the implementations of other databases (and increase the costs of > porting). > > Now the implementation is limited, but allowed behaviour is 100% ANSI. > > I don't particularly buy this argument. The case at hand is what to do > if we have, say, > > select lag(integer_column, 1, 1.2) over ... > > The proposed patch would result in the output being of type numeric, > and any rows using the default would show "1.2". The spec says that > the right thing is to return integer, and we should round the default > to "1" to make that work. But > > (1) I doubt that anybody actually writes such things; > > (2) For anyone who does write it, the spec's behavior fails to meet > the principle of least surprise. It is not normally the case that > any information-losing cast would be applied silently within an > expression. > postgres=# create table foo(a int); CREATE TABLE postgres=# insert into foo values(1.1); INSERT 0 1 postgres=# create table foo(a int default 1.1); CREATE TABLE postgres=# insert into foo values(default); INSERT 0 1 postgres=# select * from foo; ┌───┐ │ a │ ╞═══╡ │ 1 │ └───┘ (1 row) It is maybe strange, but it is not an unusual pattern in SQL. I think it is analogous to a DEFAULT clause: DECLARE a int DEFAULT 1.2; The default value doesn't change the type of the variable. This is maybe too simplistic an example; a slightly more realistic one is CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ... DECLARE x int DEFAULT a; BEGIN ... I am worried about performance - if a default value can change the type, then other things can stop working, like index usage. In *this* case we are not talking about an operation between two independent operands or function arguments. We are talking about a value and a *default* for that value. > So this deviation from spec doesn't bother me; we have much bigger ones. > OK, if it is acceptable to other people, I can accept it too - I understand that it is a corner case and that there is some consistency with other Postgres features. Maybe this difference should be mentioned in the documentation. > My concern with this patch is what I said upthread: I'm not sure that > we should be adjusting this behavior in a piecemeal fashion.
> I looked
> through pg_proc to find all the functions that have more than one any*
> argument, and found these:
>
>                       oid                       | prorettype
> ------------------------------------------------+------------
>  lag(anyelement,integer,anyelement)             | anyelement
>  lead(anyelement,integer,anyelement)            | anyelement
>  width_bucket(anyelement,anyarray)              | integer
>  btarraycmp(anyarray,anyarray)                  | integer
>  array_eq(anyarray,anyarray)                    | boolean
>  array_ne(anyarray,anyarray)                    | boolean
>  array_lt(anyarray,anyarray)                    | boolean
>  array_gt(anyarray,anyarray)                    | boolean
>  array_le(anyarray,anyarray)                    | boolean
>  array_ge(anyarray,anyarray)                    | boolean
>  array_append(anyarray,anyelement)              | anyarray
>  array_prepend(anyelement,anyarray)             | anyarray
>  array_cat(anyarray,anyarray)                   | anyarray
>  array_larger(anyarray,anyarray)                | anyarray
>  array_smaller(anyarray,anyarray)               | anyarray
>  array_position(anyarray,anyelement)            | integer
>  array_position(anyarray,anyelement,integer)    | integer
>  array_positions(anyarray,anyelement)           | integer[]
>  array_remove(anyarray,anyelement)              | anyarray
>  array_replace(anyarray,anyelement,anyelement)  | anyarray
>  arrayoverlap(anyarray,anyarray)                | boolean
>  arraycontains(anyarray,anyarray)               | boolean
>  arraycontained(anyarray,anyarray)              | boolean
>  elem_contained_by_range(anyelement,anyrange)   | boolean
>  range_contains_elem(anyrange,anyelement)       | boolean
>  range_eq(anyrange,anyrange)                    | boolean
>  range_ne(anyrange,anyrange)                    | boolean
>  range_overlaps(anyrange,anyrange)              | boolean
>  range_contains(anyrange,anyrange)              | boolean
>  range_contained_by(anyrange,anyrange)          | boolean
>  range_adjacent(anyrange,anyrange)              | boolean
>  range_before(anyrange,anyrange)                | boolean
>  range_after(anyrange,anyrange)                 | boolean
>  range_overleft(anyrange,anyrange)              | boolean
>  range_overr
Re: Re: [HACKERS] Custom compression methods
On Thu, 13 Aug 2020 at 17:18, Dilip Kumar wrote: > I have rebased the patch on the latest head and currently, broken into 3 > parts. > > v1-0001: As suggested by Robert, it provides the syntax support for > setting the compression method for a column while creating a table and > adding columns. However, we don't support changing the compression > method for the existing column. As part of this patch, there is only > one built-in compression method that can be set (pglz). In this, we > have one in-build am (pglz) and the compressed attributes will directly > store the oid of the AM. In this patch, I have removed the > pg_attr_compresion as we don't support changing the compression > for the existing column so we don't need to preserve the old > compressions. > v1-0002: Add another built-in compression method (zlib) > v1:0003: Remaining patch set (nothing is changed except rebase on the > current head, stabilizing check-world and 0001 and 0002 are pulled > out of this) > > Next, I will be working on separating out the remaining patches as per > the suggestion by Robert. Thanks for this new feature. Looks promising and very useful, with so many good compression libraries already available. I see that with the patch-set, I would be able to create an extension that defines a PostgreSQL C handler function which assigns all the required hook function implementations for compressing, decompressing and validating, etc. In short, I would be able to use a completely different compression algorithm to compress toast data if I write such an extension. Correct me if I am wrong with my interpretation. Just a quick superficial set of review comments A minor re-base is required due to a conflict in a regression test - In heap_toast_insert_or_update() and in other places, the comments for new parameter preserved_am_info are missing. - +toast_compress_datum(Datum value, Oid acoid) { struct varlena *tmp = NULL; int32 valsize; - CompressionAmOptions cmoptions; + CompressionAmOptions *cmoptions = NULL; I think tmp and cmoptions need not be initialized to NULL - - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize); - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ); /* successful compression */ + toast_set_compressed_datum_info(tmp, amoid, valsize); return PointerGetDatum(tmp); Any particular reason why is this code put in a new extern function ? Is there a plan to re-use it ? Otherwise, it's not necessary to do this. Also, not sure why "HTAB *amoptions_cache" and "MemoryContext amoptions_cache_mcxt" aren't static declarations. 
They are being used only in toast_internals.c --- The tab-completion doesn't show COMPRESSION : postgres=# create access method my_method TYPE INDEX TABLE postgres=# create access method my_method TYPE Also, the below syntax also would better be tab-completed so as to display all the installed compression methods, in line with how we show all the storage methods like plain,extended,etc: postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION I could see the differences in compression ratio, and the compression and decompression speed when I use lz versus zib : CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4')); create table lztab(t text); ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz; pgg:s2:pg$ time psql -c "\copy zlibtab from text.data" COPY 13050 real0m1.344s user0m0.031s sys 0m0.026s pgg:s2:pg$ time psql -c "\copy lztab from text.data" COPY 13050 real0m2.088s user0m0.008s sys 0m0.050s pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass), pg_table_size('lztab'::regclass)" pg_table_size | pg_table_size ---+--- 1261568 | 1687552 pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like ''" > /dev/null real0m0.127s user0m0.000s sys 0m0.002s pgg:s2:pg$ time psql -c "select NULL from lztab where t like ''" > /dev/null real0m0.050s user0m0.002s sys 0m0.000s -- Thanks, -Amit Khandekar Huawei Technologies
Re: list of extended statistics on psql
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote: > +1 for the general idea, and +1 for \dX being the syntax to use > > IMO the per-type columns should show both the type being enabled as > well as it being built. > > (How many more stat types do we expect -- Tomas? I wonder if having one > column per type is going to scale in the long run.) > > Also, the stat obj name column should be first, followed by a single > column listing both table and columns that it applies to. Keep in mind > that in the future we might want to add stats that cross multiple tables > -- that's why the CREATE syntax is the way it is. So we should give > room for that in psql's display too. There's also a plan for CREATE STATISTICS to support expression statistics, with the statistics functionality of an expression index, but without the cost of index-update on UPDATE/DELETE. That's Tomas' patch here: https://commitfest.postgresql.org/29/2421/ I think that would compute ndistinct and MCV, same as indexes, but not dependencies. To me, I think it's better if there's a single column showing the "kinds" of statistics to be generated (stxkind), rather than a column for each. I'm not sure why the length of the stats lists cast as text is useful to show? We don't have a slash-dee command to show the number of MCV or histogram in traditional, 1-D stats in pg_statistic, right? I think anybody wanting that would learn to SELECT FROM pg_statistic*. Also, the length of the text output isn't very meaningful? If this is json, maybe you'd do something like this: |SELECT a.stxdndistinct , COUNT(b) FROM pg_statistic_ext_data a , json_each(stxdndistinct::Json) AS b GROUP BY 1 I guess stxdmcv isn't json, but it seems especially meaningless to show length() of its ::text, since we don't even "deserialize" the object to begin with. BTW, I've just started a new thread about displaying in psql \d the stats target of extended stats. -- Justin
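Until a \dX exists, a rough equivalent of the single-kinds-column layout suggested here can already be had with a plain catalog query; the column labels below are mine and not part of any patch:

  SELECT stxnamespace::regnamespace AS "Schema",
         stxname                    AS "Name",
         stxrelid::regclass         AS "Table",
         stxkind                    AS "Kinds"
  FROM pg_catalog.pg_statistic_ext
  ORDER BY 1, 2;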
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Sun, Aug 30, 2020 at 2:43 PM Dilip Kumar wrote: > > On Sat, Aug 29, 2020 at 5:18 PM Amit Kapila wrote: > > > > > One more comment for which I haven't done anything yet. > > +static void > > +set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId > > xid) > > +{ > > + MemoryContext oldctx; > > + > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + > > + entry->streamed_txns = lappend_int(entry->streamed_txns, xid); > > > Is it a good idea to append xid with lappend_int? Won't we need > > something equivalent for uint32? If so, I think we have a couple of > > options (a) use lcons method and accordingly append the pointer to > > xid, I think we need to allocate memory for xid if we want to use this > > idea or (b) use an array instead. What do you think? > > BTW, OID is internally mapped to uint32, but using lappend_oid might > not look good. So maybe we can provide an option for lappend_uint32? > Using an array is also not a bad idea. Providing lappend_uint32 > option looks more appealing to me. > I thought about this again and I feel it might be okay to use it for our case as after storing it in T_IntList, we primarily fetch it for comparison with TrasnactionId (uint32), so this shouldn't create any problem. I feel we can just discuss this in a separate thread and check the opinion of others, what do you think? Another comment: +cleanup_rel_sync_cache(TransactionId xid, bool is_commit) +{ + HASH_SEQ_STATUS hash_seq; + RelationSyncEntry *entry; + + Assert(RelationSyncCache != NULL); + + hash_seq_init(&hash_seq, RelationSyncCache); + while ((entry = hash_seq_search(&hash_seq)) != NULL) + { + if (is_commit) + entry->schema_sent = true; How is it correct to set 'entry->schema_sent' for all the entries in RelationSyncCache? Consider a case where due to invalidation in an unrelated transaction we have set the flag schema_sent for a particular relation 'r1' as 'false' and that transaction is executed before the current streamed transaction for which we are performing commit and called this function. It will set the flag for unrelated entry in this case 'r1' which doesn't seem correct to me. Or, if this is correct, it would be a good idea to write some comments about it. -- With Regards, Amit Kapila.
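(For context: the functions discussed above live in the pgoutput plugin, and the streamed-transaction path they handle is only reached for subscriptions that enable the patch set's streaming option. A sketch of the SQL-level setup, with the option name taken from the patch set and the connection string purely illustrative:)

  -- On the publisher:
  CREATE PUBLICATION pub_orders FOR TABLE orders;
  -- On the subscriber:
  CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_orders
    WITH (streaming = on);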
Re: Implementing Incremental View Maintenance
Hi, I updated the wiki page. https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Fri, 21 Aug 2020 21:40:50 +0900 (JST) Tatsuo Ishii wrote: > From: Yugo NAGATA > Subject: Re: Implementing Incremental View Maintenance > Date: Fri, 21 Aug 2020 17:23:20 +0900 > Message-ID: <20200821172320.a2506577d5244b6066f69...@sraoss.co.jp> > > > On Wed, 19 Aug 2020 10:02:42 +0900 (JST) > > Tatsuo Ishii wrote: > > > >> I have looked into this. > > > > Thank you for your reviewing! > > > >> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: > >> This one needs a comment to describe what the function does etc. > >> > >> +void > >> +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > >> +{ > > > > I added a comment for this function and related places. > > > > +/* > > + * SetTransitionTablePreserved > > + * > > + * Prolong lifespan of transition tables corresponding specified relid and > > + * command type to the end of the outmost query instead of each nested > > query. > > + * This enables to use nested AFTER trigger's transition tables from outer > > + * query's triggers. Currently, only immediate incremental view > > maintenance > > + * uses this. > > + */ > > +void > > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > > > > Also, I removed releted unnecessary code which was left accidentally. > > > > > >> - 0007-Add-aggregates-support-in-IVM.patch > >> "Check if the given aggregate function is supporting" shouldn't be > >> "Check if the given aggregate function is supporting IVM"? > > > > Yes, you are right. I fixed this, too. > > > >> > >> + * check_aggregate_supports_ivm > >> + * > >> + * Check if the given aggregate function is supporting > > Thanks for the fixes. I have changed the commit fest status to "Ready > for Committer". > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp > > -- Yugo NAGATA
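For readers who have not followed the thread, a minimal example of the feature, assuming the CREATE INCREMENTAL MATERIALIZED VIEW syntax described on the wiki page above (table, view, and column names invented):

  CREATE TABLE sales (product_id int, amount numeric);
  CREATE INCREMENTAL MATERIALIZED VIEW sales_summary AS
    SELECT product_id, count(*) AS cnt, sum(amount) AS total
    FROM sales
    GROUP BY product_id;
  -- Subsequent INSERT/UPDATE/DELETE on sales is reflected in sales_summary
  -- immediately via AFTER triggers and their transition tables, without a
  -- manual REFRESH MATERIALIZED VIEW.
  INSERT INTO sales VALUES (1, 100), (1, 50), (2, 30);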
Re: Terminate the idle sessions
> On Aug 31, 2020, at 11:43 AM, Thomas Munro wrote: > > On Mon, Aug 31, 2020 at 2:40 PM Li Japin wrote: >> Could you give the more details about the test instructions? > > Hi Japin, > > Sure. Because I wasn't trying to get reliable TPS number or anything, > I just used a simple short read-only test with one connection, like > this: > > pgbench -i -s10 postgres > pgbench -T60 -Mprepared -S postgres > > Then I looked for the active backend and ran strace -c -p XXX for a > few seconds and hit ^C to get the counters. I doubt the times are > very accurate, but the number of calls is informative. > > If you do that on a server running with -c statement_timeout=10s, you > see one setitimer() per transaction. If you also use -c > idle_session_timeout=10s at the same time, you see two. Hi, Thomas, Thanks for your point out this problem, here is the comparison. Without Optimize settimer usage: % time seconds usecs/call callserrors syscall -- --- --- - - 41.221.444851 1 1317033 setitimer 28.410.995936 2658622 sendto 24.630.863316 1659116 599 recvfrom 5.710.200275 2111055 pread64 0.030.001152 2 599 epoll_wait 0.000.00 0 1 epoll_ctl -- --- --- - - 100.003.505530 2746426 599 total With Optimize settimer usage: % time seconds usecs/call callserrors syscall -- --- --- - - 49.891.464332 1 1091429 sendto 40.831.198389 1 1091539 219 recvfrom 9.260.271890 1183321 pread64 0.020.000482 2 214 epoll_wait 0.000.13 3 5 setitimer 0.000.10 2 5 rt_sigreturn 0.000.00 0 1 epoll_ctl -- --- --- - - 100.002.935116 2366514 219 total Here’s a modified version of Thomas’s patch. v3-0001-Allow-terminating-the-idle-sessions.patch Description: v3-0001-Allow-terminating-the-idle-sessions.patch v3-0002-Optimize-setitimer-usage.patch Description: v3-0002-Optimize-setitimer-usage.patch
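(For reference, the user-facing knob being benchmarked here is just a GUC; the name below is the one used in this thread, and the sketch assumes it is set like the existing timeout GUCs such as statement_timeout and idle_in_transaction_session_timeout:)

  -- Cluster-wide, via ALTER SYSTEM:
  ALTER SYSTEM SET idle_session_timeout = '10min';
  SELECT pg_reload_conf();
  -- Or per session:
  SET idle_session_timeout = '30s';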
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
On Sat, Aug 29, 2020 at 3:33 AM Robert Haas wrote: > I think David's points elsewhere on the thread about ProjectSet and > Materialize nodes are interesting. Indeed, I'm now finding it very difficult to look past the similarity with: postgres=# explain select count(*) from t t1 cross join t t2; QUERY PLAN Aggregate (cost=1975482.56..1975482.57 rows=1 width=8) -> Nested Loop (cost=0.00..1646293.50 rows=131675625 width=0) -> Seq Scan on t t1 (cost=0.00..159.75 rows=11475 width=0) -> Materialize (cost=0.00..217.12 rows=11475 width=0) -> Seq Scan on t t2 (cost=0.00..159.75 rows=11475 width=0) (5 rows) I wonder what it would take to overcome the overheads of the separate Result Cache node, with techniques to step out of the way or something like that. > [tricky philosophical questions about ancient and maybe in some cases > arbitrary choices] Ack.
Re: More tests with USING INDEX replident and dropped indexes
On Mon, Aug 31, 2020 at 10:36:13AM +0530, Rahila wrote: > Now, the relreplident is being set in the transaction previous to > the one marking index as invalid for concurrent drop. Won't > it cause issues with relreplident cleared but index not dropped, > if drop index concurrently fails in the small window after > commit transaction in above snippet and before the start transaction above? Argh. I missed your point that this stuff uses heap_inplace_update(), so the update of indisvalid flag is not transactional. The thing is that we won't be able to update the flag consistently with relreplident except if we switch index_set_state_flags() to use a transactional operation here. So, with the current state of affairs, it looks like we could just call SetRelationReplIdent() in the last transaction, after the transaction marking the index as dead has committed (see the top of index_set_state_flags() saying that this should be the last step of a transaction), but that leaves a window open as you say. On the other hand, it looks appealing to make index_set_state_flags() transactional. This would solve the consistency problem, and looking at the code scanning pg_index, I don't see a reason why we could not do that. What do you think? > To the existing partitioned table test, can we add a test to demonstrate > that relreplident is set to 'n' for even the individual partitions. Makes sense. It is still important to test the case where a partitioned table without leaves is correctly reset IMO. > Typo, s/updates are visible/updates visible Thanks. > - For DROP INDEX CONCURRENTLY, pg_class.relreplident of the parent > table is set in the last transaction doing the drop. It would be > tempting to reset the flag in the same transaction as the one marking > the index as invalid, but there could be a point in reindexing the > invalid index whose drop has failed, and this adds morecomplexity to > the patch. That's a point I studied, but it makes actually more sense to just reset the flag once the index is marked as invalid, similarly to indisclustered. Reindexing an invalid index can have value in some cases, but we would have much more problems with the relcache if we finish with two indexes marked as indreplident :( -- Michael signature.asc Description: PGP signature
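To make the user-visible effect concrete, here is a sketch of the scenario the patch changes (relation and index names invented):

  CREATE TABLE t1 (a int NOT NULL);
  CREATE UNIQUE INDEX t1_a_idx ON t1 (a);
  ALTER TABLE t1 REPLICA IDENTITY USING INDEX t1_a_idx;
  DROP INDEX CONCURRENTLY t1_a_idx;
  -- Unpatched, relreplident stays 'i' even though no index is marked
  -- indisreplident any more; with the patch it is reset to 'n':
  SELECT relreplident FROM pg_class WHERE relname = 't1';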
Re: please update ps display for recovery checkpoint
On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote: > That could be helpful. Wouldn't it be better to use "end-of-recovery > checkpoint" instead? That's the common wording in the code comments. > > I don't see the point of patch 0002. In the same paragraph, we > already know that this applies to any checkpoints. Thinking more about this.. Could it be better to just add some calls to set_ps_display() directly in CreateCheckPoint()? This way, both the checkpointer as well as the startup process at the end of recovery would benefit from the change. -- Michael signature.asc Description: PGP signature