Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Noah Misch
On Wed, Jan 17, 2018 at 12:36:31AM -0800, Noah Misch wrote:
> On Tue, Jan 16, 2018 at 08:50:24AM -0800, Andres Freund wrote:
> > On 2018-01-16 16:12:11 +0900, Michael Paquier wrote:
> > > On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> > > > Since this emits double syncs with older xlc, I recommend instead
> > > > replacing the whole thing with inline asm.  As I opined in the last
> > > > message of the thread you linked above, the intrinsics provide little
> > > > value as abstractions if one checks the generated code to deduce how
> > > > to use them.  Now that the generated code is xlc-version-dependent,
> > > > the port is better off with compiler-independent asm like we have for
> > > > ppc in s_lock.h.
> > > 
> > > Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
> > > past versions? I think that it would make the code more understandable
> > > than just listing directly the instructions.
> > 
> > Given the quality of the intrinsics on AIX, see past commits and the
> > comment in the code quoted above, I think we're much better off doing
> > this via inline asm.
> 
> For me, verifiability is the crucial benefit of inline asm.  Anyone with an
> architecture manual can thoroughly review an inline asm implementation.  Given
> intrinsics and __xlc_ver__ conditionals, the same level of review requires
> access to every xlc version.

> > > As Heikki is not around these days, Noah, could you provide a new
> > > version of the patch? This bug has been around for some time now, it
> > > would be nice to move on.. 
> 
> Not soon.

Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
addends and 16-bit edge cases.  Today's implementation handles those,
PostgreSQL doesn't use them, and I might easily have broken them.
fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add to
inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent code
before and after.  I plan to keep the third patch HEAD-only, back-patching the
other two.  I tested with xlc v12 and v13.
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 7f03b7e..b00b76f 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -687,7 +687,7 @@ test_atomic_uint32(void)
if (pg_atomic_read_u32(&var) != 3)
elog(ERROR, "atomic_read_u32() #2 wrong");
 
-   if (pg_atomic_fetch_add_u32(&var, 1) != 3)
+   if (pg_atomic_fetch_add_u32(&var, pg_atomic_read_u32(&var) - 2) != 3)
elog(ERROR, "atomic_fetch_add_u32() #1 wrong");
 
if (pg_atomic_fetch_sub_u32(&var, 1) != 4)
@@ -712,6 +712,20 @@ test_atomic_uint32(void)
if (pg_atomic_fetch_add_u32(&var, INT_MAX) != INT_MAX)
elog(ERROR, "pg_atomic_add_fetch_u32() #3 wrong");
 
+   pg_atomic_fetch_add_u32(&var, 2);   /* wrap to 0 */
+
+   if (pg_atomic_fetch_add_u32(&var, PG_INT16_MAX) != 0)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #3 wrong");
+
+   if (pg_atomic_fetch_add_u32(&var, PG_INT16_MAX + 1) != PG_INT16_MAX)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #4 wrong");
+
+   if (pg_atomic_fetch_add_u32(&var, PG_INT16_MIN) != 2 * PG_INT16_MAX + 1)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #5 wrong");
+
+   if (pg_atomic_fetch_add_u32(&var, PG_INT16_MIN - 1) != PG_INT16_MAX)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #6 wrong");
+
pg_atomic_fetch_add_u32(&var, 1);   /* top up to UINT_MAX */
 
if (pg_atomic_read_u32(&var) != UINT_MAX)
@@ -787,7 +801,7 @@ test_atomic_uint64(void)
if (pg_atomic_read_u64(&var) != 3)
elog(ERROR, "atomic_read_u64() #2 wrong");
 
-   if (pg_atomic_fetch_add_u64(&var, 1) != 3)
+   if (pg_atomic_fetch_add_u64(&var, pg_atomic_read_u64(&var) - 2) != 3)
elog(ERROR, "atomic_fetch_add_u64() #1 wrong");
 
if (pg_atomic_fetch_sub_u64(&var, 1) != 4)
diff --git a/src/include/port/atomics/generic-xlc.h b/src/include/port/atomics/generic-xlc.h
index 5ddccff..8b5c732 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -73,11 +73,27 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 static inline uint32
 pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 {
+   uint32 _t;
+   uint32 res;
+
	/*
-	 * __fetch_and_add() emits a leading "sync" and trailing "isync", thereby
-	 * providing sequential consistency.  This is undocumented.
+	 * xlc has a no-longer-documented __fetch_and_add() intrinsic.  In xlc
+	 * 12.01.., it emits a leading "sync" and trailing "isync".  In
+	 * xlc 13.01.0003.0004, it emits neither.  Hence, using the intrinsic
+	 * would add redundant syncs on xlc 12.
	 */
-	return __fetch_and_add((volatile int *)&ptr->value, add_);
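
For reference, a minimal sketch of the inline-asm form (an illustration
assuming the usual lwarx/stwcx. retry loop with sync/isync fencing; the
attached patch is the authoritative version and its operand constraints
may differ):

static inline uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
{
	uint32		_t;
	uint32		res;

	__asm__ __volatile__(
		"	sync			\n" /* full barrier before the atomic op */
		"	lwarx	%1,0,%4	\n" /* res = *ptr, taking a reservation */
		"	add		%0,%1,%3\n" /* _t = res + add_ */
		"	stwcx.	%0,0,%4	\n" /* store _t iff reservation still held */
		"	bne		$-12	\n" /* reservation lost, retry from lwarx */
		"	isync			\n" /* barrier after the atomic op */
:		"=&r"(_t), "=&r"(res), "+m"(ptr->value)
:		"r"(add_), "r"(&ptr->value)
:		"memory", "cc");

	return res;
}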

Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2019-08-31 Thread Oleksii Kliukin
Hello,

Petr Jelinek  wrote:

> On 08/10/17 15:21, Craig Ringer wrote:
>> On 8 October 2017 at 02:37, Daniele Varrazzo  wrote:
>>> Hello,
>>> 
>>> we have received reports, and I have experienced it a couple of times,
>>> of pg_repack breaking logical replication.
>>> 
>>> - https://github.com/reorg/pg_repack/issues/135
>>> - https://github.com/2ndQuadrant/pglogical/issues/113
>> 
>> Yeah, I was going to say I've seen reports of this with pglogical, but
>> I see you've linked to them.
>> 
>> I haven't had a chance to look into it though, and haven't had a
>> suitable reproducible test case.
>> 
>>> In the above issue #113, Petr Jelinek commented:
>>> 
 From a quick look at pg_repack, the way it does table rewrite is almost
 guaranteed to break logical decoding unless there are zero unconsumed
 changes for a given table, as it does not build the necessary mapping info
 for logical decoding that the standard heap rewrite in postgres does.
>>> 
>>> unfortunately he didn't follow up on requests for further details.
>> 
>> At a guess he's referring to src/backend/access/heap/rewriteheap.c .
>> 
>> I'd explain better if I understood what was going on myself, but I
>> haven't really understood the logical decoding parts of that code.
>> 
>>> - Is Petr's diagnosis right, and is the freezing of logical replication
>>> to be blamed on the missing mapping?
>>> - Can you suggest a test to reproduce the issue reliably?
>>> - What are mapped relations anyway?
>> 
>> I can't immediately give you the answers you seek, but start by
>> studying src/backend/access/heap/rewriteheap.c . Notably
>> logical_end_heap_rewrite, logical_rewrite_heap_tuple,
>> logical_begin_heap_rewrite.
> 
> Yes that's exactly it. When a table is rewritten, we need to create a
> mapping for every tuple that was created or removed (i.e., an insert,
> update, or delete operation happened on it) since the oldest replication
> slot's xmin, for logical decoding to continue to work on that table after
> the rewrite. And pg_repack doesn't create that mapping.

Excuse me for posting to an almost 2-year-old thread. I haven’t found any
more recent discussions. I use pg_repack quite extensively and, while I am
not running it together with logical replication yet, there is a pressing
need to do so. I came across this discussion, and after reading it I feel
the answers to the question of whether it is safe to use pg_repack and
logical decoding together are lacking the details necessary to build a
test case that reproduces the problem.

Looking at the source code in rewriteheap.c, the functions mentioned above
are no-ops if state->rs_logical_rewrite is set to false, and the flag is
only set to true for system catalogs (which pg_repack doesn’t rewrite
anyway) and user catalog tables (which are uncommon). Does that imply the
scope of the breakage is limited to only those two types of tables, or am I
missing some other part of the code Petr had in mind in his original
comment, which stated that pg_repack is almost guaranteed to break logical
decoding?

Cheers,
Oleksii



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-08-31 Thread Peter Geoghegan
On Thu, Aug 29, 2019 at 10:10 PM Peter Geoghegan  wrote:
> I see some Valgrind errors on v9, all of which look like the following
> two sample errors I go into below.

I've found a fix for these Valgrind issues. It's a matter of making
sure that _bt_truncate() sizes new pivot tuples properly, which is
quite subtle:

--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2155,8 +2155,11 @@ _bt_truncate(Relation rel, IndexTuple lastleft, IndexTuple firstright,
 {
 BTreeTupleClearBtIsPosting(pivot);
 BTreeTupleSetNAtts(pivot, keepnatts);
-pivot->t_info &= ~INDEX_SIZE_MASK;
-pivot->t_info |= BTreeTupleGetPostingOffset(firstright);
+if (keepnatts == natts)
+{
+pivot->t_info &= ~INDEX_SIZE_MASK;
+pivot->t_info |= MAXALIGN(BTreeTupleGetPostingOffset(firstright));
+}
 }

I'm varying how the new pivot tuple is sized here according to whether
or not index_truncate_tuple() just does a CopyIndexTuple(). This very
slightly changes the behavior of the nbtsplitloc.c stuff, but that's
not a concern for me.

I will post a patch with this and other tweaks next week.

--
Peter Geoghegan




Re: range bug in resolve_generic_type?

2019-08-31 Thread Paul A Jungwirth
On Tue, Aug 27, 2019 at 8:52 AM Paul A Jungwirth  wrote:
>
> On Tue, Aug 27, 2019 at 8:23 AM Tom Lane  wrote:
> > I seem to recall that we discussed this exact point during development
> > of the range feature, and concluded that this was the behavior we
> > wanted, ie, treat anyrange like a scalar for this purpose.  Otherwise
> > there isn't any way to use a polymorphic function to build an array
> > of ranges

One more thing about this: The docs only say that elemOf(anyrange) =
anyelement and elemOf(anyarray) = anyelement. I already added
anymultirange to that page, so here is what I have (from my
forthcoming patch). It also includes anyrangearray. I thought it might
be useful here to clarify how we could support polymorphic
arrays-of-ranges going forward, while avoiding any contradictions:


 Seven pseudo-types of special interest are anyelement,
 anyarray, anynonarray, anyenum,
 anyrange, anymultirange, and
 anyrangearray, which are collectively called
 polymorphic types.
 Any function declared using these types is said to be
 a polymorphic function.  A polymorphic function can
 operate on many different data types, with the specific data type(s)
 being determined by the data types actually passed to it in a particular
 call.



 Polymorphic arguments and results are tied to each other and are resolved
 to a specific data type when a query calling a polymorphic function is
 parsed.  Each position (either argument or return value) declared as
 anyelement is allowed to have any specific actual
 data type, but in any given call they must all be the
 same actual type. Each
 position declared as anyarray can have any array data type,
 but similarly they must all be the same type.  And similarly,
 positions declared as anyrange must all be the same range
 type.  Likewise for anymultirange and anyrangearray.



 Furthermore, if there are
 positions declared anyarray and others declared
 anyelement, the actual array type in the
 anyarray positions must be an array whose elements are
 the same type appearing in the anyelement positions.
 anynonarray is treated exactly the same as anyelement,
 but adds the additional constraint that the actual type must not be
 an array type.
 anyenum is treated exactly the same as anyelement,
 but adds the additional constraint that the actual type must
 be an enum type.



 Similarly, if there are positions declared anyrange
 and others declared anyelement, the actual range type in
 the anyrange positions must be a range whose subtype is
 the same type appearing in the anyelement positions.
 The types anymultirange and anyrangearray accept
 a multirange or array of any range type, respectively.
 If they appear along with an anyrange then they must be a
 multirange/array of that range type.
 If a function has both anymultirange and
 anyrangearray, they must contain ranges of the same type,
 even if there is no anyrange parameter.



 Thus, when more than one argument position is declared with a polymorphic
 type, the net effect is that only certain combinations of actual argument
 types are allowed.  For example, a function declared as
 equal(anyelement, anyelement) will take any two input values,
 so long as they are of the same data type.


I hope that helps!

Yours,
Paul




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Tom Lane
Noah Misch  writes:
> Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
> addends and 16-bit edge cases.  Today's implementation handles those,
> PostgreSQL doesn't use them, and I might easily have broken them.
> fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
> intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add to
> inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent code
> before and after.  I plan to keep the third patch HEAD-only, back-patching the
> other two.  I tested with xlc v12 and v13.

Hm, no objection to the first two patches, but I don't understand
why the third patch goes to so much effort just to use "addi" rather
than (one assumes) "li" then "add"?  It doesn't seem likely that
that's buying much.

regards, tom lane




Re: block-level incremental backup

2019-08-31 Thread Ibrar Ahmed
On Sat, Aug 31, 2019 at 7:59 AM Robert Haas  wrote:

> On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe
>  wrote:
> > Due to the inherent nature of pg_basebackup, the incremental backup also
> > allows taking backup in tar and compressed format. But, pg_combinebackup
> > does not understand how to restore this. I think we should either make
> > pg_combinebackup support restoration of tar incremental backup, or
> > restrict taking the incremental backup in tar format until
> > pg_combinebackup supports the restoration, by making options '--lsn' and
> > '-Ft' mutually exclusive.
> >
> > It is arguable that one can take the incremental backup in tar format,
> > extract that manually, and then give the resultant directory as input
> > to pg_combinebackup, but I think that kills the purpose of having the
> > pg_combinebackup utility.
>
> I don't agree. You're right that you would have to untar (and
> uncompress) the backup to run pg_combinebackup, but you would also
> have to do that to restore a non-incremental backup, so it doesn't
> seem much different.  It's true that for an incremental backup you
> might need to untar and uncompress multiple prior backups rather than
> just one, but that's just the nature of an incremental backup.  And,
> on a practical level, if you want compression, which is pretty likely
> if you're thinking about incremental backups, the way to get that is
> to use tar format with -z or -Z.
>
> It might be interesting to teach pg_combinebackup to be able to read
> tar-format backups, but I think that there are several variants of the
> tar format, and I suspect it would need to read them all.  If someone
> un-tars and re-tars a backup with a different tar tool, we don't want
> it to become unreadable.  So we'd either have to write our own
> de-tarring library or add an external dependency on one.


Are we using any tar library in pg_basebackup.c? We already have the
capability in pg_basebackup to do that.



> I don't
> think it's worth doing that at this point; I definitely don't think it
> needs to be part of the first patch.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

-- 
Ibrar Ahmed


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-08-31 Thread a . kondratov

Hi hackers,

On 2018-12-27 04:57, Michael Paquier wrote:
> On Wed, Dec 26, 2018 at 03:19:06PM -0300, Alvaro Herrera wrote:
>> As for REINDEX, I think it's valuable to move tablespace together with
>> the reindexing.  You can already do it with the CREATE INDEX
>> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
>> not going to provide that, and it seems worth doing.
>
> Even for plain REINDEX that seems useful.



I've rebased the patch and put it on the closest commitfest. It is
updated to allow the user to do REINDEX CONCURRENTLY + SET TABLESPACE
altogether, since plain REINDEX CONCURRENTLY became available this year.


On 2019-06-07 21:27, Alexander Korotkov wrote:
On Fri, Dec 28, 2018 at 11:32 AM Masahiko Sawada  wrote:

> On Thu, Dec 27, 2018 at 10:24 PM Alvaro Herrera  wrote:
>
> On 2018-Dec-27, Alexey Kondratov wrote:
>
> > To summarize:
> >
> > 1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
> > useful. This is done in the patch attached to my initial email. Adding
> > REINDEX to ALTER TABLE as new action seems quite questionable for me and not
> > completely semantically correct. ALTER already looks bulky.
>
> Agreed on these points.

As an alternative idea, I think we can have a new ALTER INDEX variant
that rebuilds the index while moving tablespace, something like ALTER
INDEX ... REBUILD SET TABLESPACE 


+1

It seems the easiest way to have feature-full commands. If we put
functionality of CLUSTER and VACUUM FULL to ALTER TABLE, and put
functionality of REINDEX to ALTER INDEX, then CLUSTER, VACUUM FULL and
REINDEX would be just syntax sugar.



I definitely bought into the idea of 'change a data type, cluster, and
change tablespace all in a single SQL command', but I got stuck on some
architectural questions when it came to the code.


Currently, the only kind of table rewrite is the one done by ALTER TABLE.
It is performed by simply reading tuples one by one via
table_scan_getnextslot and inserting them into the new table via the
tuple_insert table access method (AM). At the same time, the CLUSTER table
rewrite is implemented as a separate table AM, relation_copy_for_cluster,
which is actually a direct link to the heap AM
heapam_relation_copy_for_cluster. Basically speaking, the CLUSTER table
rewrite happens two abstraction layers lower than the ALTER TABLE one.
Furthermore, CLUSTER seems to be a heap-specific AM and may be meaningless
for some other storage types, which matters even more with pluggable
storage coming, doesn't it?
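
To make the layering concrete, here is the shape of the two rewrite paths
(an illustrative sketch with hypothetical helper names, not PostgreSQL's
actual API; the real loop lives in tablecmds.c and the real callback in
tableam.h):

typedef struct Tuple Tuple;
typedef struct Rel Rel;

extern Tuple *scan_next(Rel *oldrel);				/* hypothetical */
extern void insert_tuple(Rel *newrel, Tuple *tup);	/* hypothetical */
extern void am_copy_for_cluster(Rel *oldrel, Rel *newrel);	/* hypothetical */

/* ALTER TABLE style: a generic per-tuple loop above the table AM */
static void
alter_table_style_rewrite(Rel *oldrel, Rel *newrel)
{
	Tuple	   *tup;

	while ((tup = scan_next(oldrel)) != NULL)
	{
		/* per-tuple work: new defaults, coercions, constraints, ... */
		insert_tuple(newrel, tup);
	}
}

/* CLUSTER style: one opaque AM callback does the whole copy */
static void
cluster_style_rewrite(Rel *oldrel, Rel *newrel)
{
	am_copy_for_cluster(oldrel, newrel);
}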


Maybe I am overcomplicating the problem, but to perform a data type change
(or any other ALTER TABLE modification), cluster, and change tablespace in
one go, we would have to bring all this high-level stuff done by ALTER
TABLE down to heapam_relation_copy_for_cluster. But is that even possible
without leaking abstractions?


I'm working toward adding REINDEX to ALTER INDEX, so that it becomes
possible to do 'ALTER INDEX ... REINDEX CONCURRENTLY SET TABLESPACE ...',
but ALTER TABLE + CLUSTER/VACUUM FULL is quite questionable for me now.


Anyway, a new patch, which adds SET TABLESPACE to REINDEX, is attached.
This functionality seems really useful, so I would very much appreciate it
if someone could take a look at it.



Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From 105f4fb6178e522990c4280ecfe05d6b74c44a32 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 30 Aug 2019 20:49:21 +0300
Subject: [PATCH v1] Allow REINDEX and REINDEX CONCURRENTLY to SET TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml |  10 ++
 src/backend/catalog/index.c   | 117 ++
 src/backend/commands/cluster.c|   2 +-
 src/backend/commands/indexcmds.c  |  34 +--
 src/backend/commands/tablecmds.c  |  59 ++-
 src/backend/parser/gram.y |  21 ++--
 src/backend/tcop/utility.c|  16 ++-
 src/include/catalog/index.h   |   7 +-
 src/include/commands/defrem.h |   6 +-
 src/include/commands/tablecmds.h  |   2 +
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/input/tablespace.source  |  31 ++
 src/test/regress/output/tablespace.source |  41 
 13 files changed, 276 insertions(+), 71 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a..fdd1f6e628 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE new_tablespace ]
 
  
 
@@ -165,6 +166,15 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
 

 
+   <varlistentry>
+    <term><replaceable class="parameter">new_tablespace</replaceable></term>
+    <listitem>
+     <para>
+      The name of the specific tablespace to store rebuilt indexes.
+     </para>
+    </listitem>
+   </varlistentry>
+
   <varlistentry>
    <term><literal>VERBOSE</literal></term>
    <listitem>
dif

SIGQUIT on archiver child processes maybe not such a hot idea?

2019-08-31 Thread Tom Lane
My buildfarm animal dromedary ran out of disk space yesterday, which
I found rather surprising because the last time I'd looked it had
tens of GB to spare.  On investigation, the problem was lots and lots
of core images in /cores, which is where macOS drops them (by default
at least).  It looked like I was getting one new core image per
buildfarm run, even successful runs.  Even odder, they mostly seemed
to be images from /bin/cp, not Postgres.

After investigation, the mechanism that's causing that is that the
src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
down its replica server with a mode-immediate stop, which causes
that postmaster to shut down all its children with SIGQUIT, and
in particular that signal propagates to a "cp" command that the
archiver process is executing.  The "cp" is unsurprisingly running
with default SIGQUIT handling, which per the signal man page
includes dumping core.

This makes me wonder whether we shouldn't be using some other signal
to shut down archiver subprocesses.  It's not real cool if we're
spewing cores all over the place.  Admittedly, production servers
are likely running with "ulimit -c 0" on most modern platforms,
so this might not be a huge problem in the field; but accumulation
of core files could be a problem anywhere that's configured to allow
server core dumps.

I suspect the reason we've not noticed this in the buildfarm is that most
of those platforms are configured to dump core into the data directory,
where it'll be thrown away when we clean up after the run.  But aside
from macOS, the machines running recent systemd releases are likely
accumulating cores somewhere behind the scenes, since systemd has seen
fit to insert itself into core-handling along with everything else.

Ideally, perhaps, we'd be using SIGINT not SIGQUIT to shut down
non-Postgres child processes.  But redesigning the system's signal
handling to make that possible seems like a bit of a mess.
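
As a self-contained illustration of the mechanics (a hypothetical helper,
not a proposed patch): since a SIG_IGN disposition survives fork() and
exec(), ignoring SIGQUIT around the system() call would keep the shell
and "cp" from dumping core, at the price of them also ignoring a genuine
SIGQUIT:

#include <signal.h>
#include <stdlib.h>

static int
run_shell_command_sans_quit_cores(const char *cmd)
{
	struct sigaction ign,
				saved;
	int			rc;

	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	ign.sa_flags = 0;

	sigaction(SIGQUIT, &ign, &saved);	/* inherited by system()'s children */
	rc = system(cmd);
	sigaction(SIGQUIT, &saved, NULL);	/* restore the caller's disposition */

	return rc;
}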

Thoughts?

regards, tom lane




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Noah Misch
On Sat, Aug 31, 2019 at 02:27:55PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
> > addends and 16-bit edge cases.  Today's implementation handles those,
> > PostgreSQL doesn't use them, and I might easily have broken them.
> > fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
> > intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add
> > to inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent
> > code before and after.  I plan to keep the third patch HEAD-only,
> > back-patching the other two.  I tested with xlc v12 and v13.
> 
> Hm, no objection to the first two patches, but I don't understand
> why the third patch goes to so much effort just to use "addi" rather
> than (one assumes) "li" then "add"?  It doesn't seem likely that
> that's buying much.

Changing an addi to li+add may not show up on benchmarks, but I can't claim
it's immaterial.  I shouldn't unify the code if that makes the compiled code
materially worse than what the gcc intrinsics produce today, hence the
nontrivial (~50 line) bits to match the intrinsics' capabilities.
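
To illustrate the technique (a sketch only; the exact constraints are in
fetch-add-gcc-xlc-unify-v1.patch): the function can dispatch on
__builtin_constant_p(), so a small constant addend still compiles to a
single "addi" while run-time addends take the register form:

static inline uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
{
	uint32		_t;
	uint32		res;

	if (__builtin_constant_p(add_) &&
		add_ <= PG_INT16_MAX && add_ >= PG_INT16_MIN)
		__asm__ __volatile__(
			"	sync			\n"
			"	lwarx	%1,0,%4	\n"
			"	addi	%0,%1,%3\n" /* immediate form */
			"	stwcx.	%0,0,%4	\n"
			"	bne		$-12	\n"
			"	isync			\n"
:			"=&r"(_t), "=&b"(res), "+m"(ptr->value) /* "b": addi reads r0 as 0 */
:			"i"(add_), "r"(&ptr->value)
:			"memory", "cc");
	else
		__asm__ __volatile__(
			"	sync			\n"
			"	lwarx	%1,0,%4	\n"
			"	add		%0,%1,%3\n" /* register form */
			"	stwcx.	%0,0,%4	\n"
			"	bne		$-12	\n"
			"	isync			\n"
:			"=&r"(_t), "=&r"(res), "+m"(ptr->value)
:			"r"(add_), "r"(&ptr->value)
:			"memory", "cc");

	return res;
}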




Re: row filtering for logical replication

2019-08-31 Thread Euler Taveira
On Sun, Feb 3, 2019 at 07:14, Andres Freund  wrote:
>
> As far as I can tell, the patch has not been refreshed since. So I'm
> marking this as returned with feedback for now. Please resubmit once
> ready.
>
I fixed all of the bugs pointed out in this thread. I decided to disallow
UDFs in filters (it is safer for a first version). We can add this
functionality later. However, I'll check whether allowing "safe" functions
(aka builtin functions) is OK. I added more docs explaining that
expressions are executed with the role used for the replication connection
and also that columns used in expressions must be part of the PK or
REPLICA IDENTITY. I added regression tests.

Comments?



--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From 87945236590e9fd37b203d325b74dc5baccee64d Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 9 Mar 2018 18:39:22 +
Subject: [PATCH 1/8] Remove unused atttypmod column from initial table
 synchronization

 Since commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920, atttypmod was
 added but not used. The removal is safe because COPY from publisher
 does not need such information.
---
 src/backend/replication/logical/tablesync.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 7881079e96..0a565dd837 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -647,7 +647,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	StringInfoData cmd;
 	TupleTableSlot *slot;
 	Oid			tableRow[2] = {OIDOID, CHAROID};
-	Oid			attrRow[4] = {TEXTOID, OIDOID, INT4OID, BOOLOID};
+	Oid			attrRow[3] = {TEXTOID, OIDOID, BOOLOID};
 	bool		isnull;
 	int			natt;
 
@@ -691,7 +691,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	appendStringInfo(&cmd,
 	 "SELECT a.attname,"
 	 "   a.atttypid,"
-	 "   a.atttypmod,"
 	 "   a.attnum = ANY(i.indkey)"
 	 "  FROM pg_catalog.pg_attribute a"
 	 "  LEFT JOIN pg_catalog.pg_index i"
@@ -703,7 +702,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	 lrel->remoteid,
 	 (walrcv_server_version(wrconn) >= 12 ? "AND a.attgenerated = ''" : ""),
 	 lrel->remoteid);
-	res = walrcv_exec(wrconn, cmd.data, 4, attrRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, attrRow);
 
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
@@ -724,7 +723,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 		Assert(!isnull);
 		lrel->atttyps[natt] = DatumGetObjectId(slot_getattr(slot, 2, &isnull));
 		Assert(!isnull);
-		if (DatumGetBool(slot_getattr(slot, 4, &isnull)))
+		if (DatumGetBool(slot_getattr(slot, 3, &isnull)))
 			lrel->attkeys = bms_add_member(lrel->attkeys, natt);
 
 		/* Should never happen. */
-- 
2.11.0

From 3a5b4c541982357c2231b9882ac01f1f0d0a8e29 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 27 Feb 2018 02:21:03 +
Subject: [PATCH 3/8] Refactor function create_estate_for_relation

Relation localrel is the only LogicalRepRelMapEntry structure member
that is useful for create_estate_for_relation.
---
 src/backend/replication/logical/worker.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..31fc7c5048 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -173,7 +173,7 @@ ensure_transaction(void)
  * This is based on similar code in copy.c
  */
 static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(Relation rel)
 {
 	EState	   *estate;
 	ResultRelInfo *resultRelInfo;
@@ -183,13 +183,13 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	rte = makeNode(RangeTblEntry);
 	rte->rtekind = RTE_RELATION;
-	rte->relid = RelationGetRelid(rel->localrel);
-	rte->relkind = rel->localrel->rd_rel->relkind;
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel, 1, NULL, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s)
 	}
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel->localrel);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 		RelationGetDescr(rel->localrel),
 		&TTSOpsVirtual);
@@ -696,7 +696,7 @@ apply_handle_update(StringInfo s)
 	check_relation_updatable(rel);
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);

Re: row filtering for logical replication

2019-08-31 Thread Euler Taveira
On Tue, Aug 27, 2019 at 18:10,  wrote:
>
> Do you have any plans for continuing working on this patch and
> submitting it again on the closest September commitfest? There are only
> a few days left. Anyway, I will be glad to review the patch if you do
> submit it, though I didn't yet dig deeply into the code.
>
Sure. See my last email to this thread. I would appreciate it if you could review it.

> Although almost all new tests are passed, there is a problem with DELETE
> replication, so 1 out of 10 tests is failed. It isn't replicated if the
> record was created with is_cloud=TRUE on cloud, replicated to remote;
> then updated with is_cloud=FALSE on remote, replicated to cloud; then
> deleted on remote.
>
That's because you don't include is_cloud in the PK or REPLICA IDENTITY.
I added a small note in the docs.


--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [DOC] Document auto vacuum interruption

2019-08-31 Thread Amit Kapila
On Fri, Jul 26, 2019 at 1:45 AM James Coleman  wrote:
>
> We've discussed this internally many times, but today finally decided
> to write up a doc patch.
>

Thanks, I think something on the lines of what you have written can
help some users to understand the behavior in this area and there
doesn't seem to be any harm in giving such information to the user.

> Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, but other processes
> can cancel autovacuum if blocked by that lock unless the autovacuum is
> to prevent wraparound. This can result in very surprising behavior:
> imagine a system that needs to run ANALYZE manually before batch jobs
> to ensure reasonable query plans. That ANALYZE will interrupt attempts
> to run autovacuum, and pretty soon the table is far more bloated than
> expected, and query plans (ironically) degrade further.
>

+If a process attempts to acquire a SHARE UPDATE EXCLUSIVE
+lock (the lock type held by autovacuum), lock acquisition will interrupt
+the autovacuum.

I think it is not only for a process that tries to acquire a lock in
SHARE UPDATE EXCLUSIVE mode; rather, it applies when a process tries to
acquire any lock mode that conflicts with SHARE UPDATE EXCLUSIVE.  For
the conflicting lock modes, you can refer to the docs [1] (see Table
13.2, "Conflicting Lock Modes").

[1] - https://www.postgresql.org/docs/devel/explicit-locking.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: doc: update PL/pgSQL sample loop function

2019-08-31 Thread Amit Kapila
On Thu, Aug 29, 2019 at 10:07 AM Pavel Stehule  wrote:
>
> Hi
>
> On Thu, Aug 29, 2019 at 5:03, Ian Barwick  wrote:
>>
>> Hi
>>
>> Here:
>>
>>
>> https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING
>>
>> we have a sample PL/PgSQL function (dating from at least 7.2) demonstrating
>> query result loops, which refreshes some pseudo materialized views stored in
>> a user-defined table.
>>
>> As we've had proper materialized views since 9.3, I thought it might
>> be nice to update this with a self-contained sample which can be used
>> as-is; see attached patch.
>>
>> (As a side note the current sample function contains a couple of "%s"
>> placeholders which should be just "%"; a quick search of plpgsql.sgml
>> shows this is the only place they occur).
>>
>> Will submit to the next commitfest.
>
> +1
>

The current example shows the usage of looping in plpgsql, so there is
no correctness issue as such, but OTOH there is no harm in updating
the example as proposed by Ian Barwick.  Does anyone else see any
problem with this idea?  If we agree to proceed with this update, it
might be better to backpatch it for the sake of consistency, though I
am not sure about that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: refactoring - share str2*int64 functions

2019-08-31 Thread Michael Paquier
On Fri, Aug 30, 2019 at 04:50:21PM +0200, Fabien COELHO wrote:
> Given the overheads of the SQL interpreter, I'm unsure about what you would
> measure at the SQL level. You could just write a small standalone C program
> to test perf and accuracy. Maybe this is what you have in mind.

After a certain threshold, you can see the difference anyway by paying
the overhead of the function only once.  See for example the updated
module attached that I used for my tests.

I have been testing the various implementations, and doing 2B
iterations leads to roughly the following with a non-assert, -O2
build using mul_u32:
- __builtin_sub_overflow => 5s
- cast to uint64 => 5.9s
- division => 8s
You are right as well that having symmetry with the signed methods is
much better.  In order to see the difference, you can just do that
with the extension attached, after of course hijacking int.h with some
undefs and recompiling the backend and the module:
select pg_overflow_check(1, 1, 20, 'uint32', 'mul');

>> still it is possible to trick things with signed integer arguments.
> 
> Is it?

If you pass -1, then you fall back to the maximum of each 16,
32 or 64 bits for the unsigned (see the regression tests I added with
the module).

> Do you mean:
> 
>  sql> SELECT -32768::INT2;
>  ERROR:  smallint out of range

You are incorrect here: the cast binds more tightly than the minus sign,
so 32768 overflows int2 before the negation is applied.  This works
though:
=# SELECT (-32768)::INT2;
  int2
--------
 -32768
(1 row)

If you look at int2.sql, we do that:
-- largest and smallest values
INSERT INTO INT2_TBL(f1) VALUES ('32767');
INSERT INTO INT2_TBL(f1) VALUES ('-32767');
That's the part I think is wrong, as the minimum is actually -32768,
but the test fails to consider that.  I'll go fix that after
double-checking other similar tests for int4 and int8.

Attached is an updated patch to complete the work for common/int.h,
with the following changes:
- Changed the multiplication methods for uint16 and uint32 to not be
division-based.  uint64 can use that only if int128 exists.
- Added comments on top of each sub-sections for the types checked.

Attached is also an updated version of the module I used to validate
this stuff.  Fabien, any thoughts?
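
For reference, the cast-based (division-free) check looks roughly like
this (a sketch in the style of common/int.h, using stdint types so it is
self-contained; the patch is authoritative):

#include <stdbool.h>
#include <stdint.h>

static inline bool
sketch_mul_u32_overflow(uint32_t a, uint32_t b, uint32_t *result)
{
#if defined(HAVE__BUILTIN_OP_OVERFLOW)
	return __builtin_mul_overflow(a, b, result);
#else
	uint64_t	res = (uint64_t) a * (uint64_t) b;

	if (res > UINT32_MAX)
	{
		*result = 0x5EED;		/* result is implementation-defined on overflow */
		return true;
	}
	*result = (uint32_t) res;
	return false;
#endif
}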
--
Michael


overflow.tar.gz
Description: application/gzip
diff --git a/src/include/common/int.h b/src/include/common/int.h
index d754798543..4e1aec5018 100644
--- a/src/include/common/int.h
+++ b/src/include/common/int.h
@@ -20,10 +20,28 @@
 #ifndef COMMON_INT_H
 #define COMMON_INT_H
 
-/*
- * If a + b overflows, return true, otherwise store the result of a + b into
- * *result. The content of *result is implementation defined in case of
+
+/*-
+ * The following guidelines apply to all the routines:
+ * - If a + b overflows, return true, otherwise store the result of a + b
+ * into *result. The content of *result is implementation defined in case of
  * overflow.
+ * - If a - b overflows, return true, otherwise store the result of a - b
+ * into *result. The content of *result is implementation defined in case of
+ * overflow.
+ * - If a * b overflows, return true, otherwise store the result of a * b
+ * into *result. The content of *result is implementation defined in case of
+ * overflow.
+ *-
+ */
+
+/*
+ * Overflow routines for signed integers
+ *
+ */
+
+/*
+ * INT16
  */
 static inline bool
 pg_add_s16_overflow(int16 a, int16 b, int16 *result)
@@ -43,11 +61,6 @@ pg_add_s16_overflow(int16 a, int16 b, int16 *result)
 #endif
 }
 
-/*
- * If a - b overflows, return true, otherwise store the result of a - b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_sub_s16_overflow(int16 a, int16 b, int16 *result)
 {
@@ -66,11 +79,6 @@ pg_sub_s16_overflow(int16 a, int16 b, int16 *result)
 #endif
 }
 
-/*
- * If a * b overflows, return true, otherwise store the result of a * b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_mul_s16_overflow(int16 a, int16 b, int16 *result)
 {
@@ -90,9 +98,7 @@ pg_mul_s16_overflow(int16 a, int16 b, int16 *result)
 }
 
 /*
- * If a + b overflows, return true, otherwise store the result of a + b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
+ * INT32
  */
 static inline bool
 pg_add_s32_overflow(int32 a, int32 b, int32 *result)
@@ -112,11 +118,6 @@ pg_add_s32_overflow(int32 a, int32 b, int32 *result)
 #endif
 }
 
-/*
- * If a - b overflows, return true, otherwise store the result of a - b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_sub_s32_overflow(int32 a, int32 b, int32 *result)
 {
@@ -135,11 +136,6 @@ pg_sub_s32_overflow(int32 a, int32 b, int32 *result)
 #endif
 }
 
-/*
- * If a * b overflow

Commit fest 2019-09

2019-08-31 Thread Michael Paquier
Hi all,

Based on the current clock, it is going to be the 1st of September
anywhere on Earth in approximately 7 hours:
https://www.timeanddate.com/time/zones/aoe

Once this happens, I will switch the commit fest to in-progress, and it
will not be possible to add any new patches.

I am not sure if somebody would like to volunteer, but I propose
myself as commit fest manager for the next session.  Feel free to
reply to this thread if you feel that you could help out as manager
for this time.

Thanks, and happy hacking!
--
Michael


signature.asc
Description: PGP signature


Re: row filtering for logical replication

2019-08-31 Thread Alexey Zagarin
I think that I also have found one shortcoming when using the setup described
by Alexey Kondratov. The problem that I face is that if both (cloud and remote)
tables already have data the moment I add the subscription, then the whole
table is initially copied in both directions, which leads to duplicated data
and broken replication because COPY doesn't take the filtering condition into
account. In case there are filters in a publication, the COPY command that is
executed when adding a subscription (or altering one to refresh a publication)
should also filter the data based on the same condition, e.g. COPY (SELECT *
FROM ... WHERE ...) TO ...

The current workaround is to always use WITH copy_data = false when subscribing 
or refreshing, and then manually copy data with the above statement.

Alexey Zagarin
On 1 Sep 2019 12:11 +0700, Euler Taveira , wrote:
> On Tue, Aug 27, 2019 at 18:10,  wrote:
> >
> > Do you have any plans for continuing working on this patch and
> > submitting it again on the closest September commitfest? There are only
> > a few days left. Anyway, I will be glad to review the patch if you do
> > submit it, though I didn't yet dig deeply into the code.
> >
> Sure. See my last email to this thread. I appreciate if you can review it.
>
> > Although almost all new tests are passed, there is a problem with DELETE
> > replication, so 1 out of 10 tests is failed. It isn't replicated if the
> > record was created with is_cloud=TRUE on cloud, replicated to remote;
> > then updated with is_cloud=FALSE on remote, replicated to cloud; then
> > deleted on remote.
> >
> That's because you don't include is_cloud in PK or REPLICA IDENTITY. I
> add a small note in docs.
>
>
> --
> Euler Taveira
> Timbira - http://www.timbira.com.br/
> PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>
>
>
>


Re: Should we add xid_current() or a int8->xid cast?

2019-08-31 Thread Thomas Munro
On Fri, Aug 2, 2019 at 10:42 PM Thomas Munro  wrote:
> On Thu, Jul 25, 2019 at 1:11 PM Thomas Munro  wrote:
> > On Thu, Jul 25, 2019 at 12:42 PM Tom Lane  wrote:
> > > Andres Freund  writes:
> > > > On 2019-07-24 20:34:39 -0400, Tom Lane wrote:
> > > >> Yeah, I would absolutely NOT recommend that you open that can of worms
> > > >> right now.  We have looked at adding unsigned integer types in the past
> > > >> and it looked like a mess.
> > >
> > > > I assume Thomas was thinking more of another bespoke type like xid, just
> > > > wider.  There's some notational advantage in not being able to
> > > > immediately do math etc on xids.
> > >
> > > Well, we could invent an xid8 type if we want, just don't try to make
> > > it part of the numeric hierarchy (as indeed xid isn't).
> >
> > Yeah, I meant an xid64/xid8/fxid/pg_something/... type that isn't a
> > kind of number.

I thought about how to deal with the transition to xid8 for the
txid_XXX() family of functions.  The best idea I've come up with so
far is to create a parallel xid8_XXX() family of functions, declare the
bigint-based functions deprecated, and threaten to drop them from a
future release.  The C code for the two families can
be the same (it's a bit of a dirty trick, but only until the
txid_XXX() variants go away).  Here's a PoC patch demonstrating that.
Not tested much, yet, probably needs some more work, but I wanted to
see if others thought the idea was terrible first.
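
To make the trick concrete, a sketch (simplified: the real txid_current()
also refuses to run during recovery, and the function name here is
hypothetical): bigint and a pass-by-value 64-bit xid8 share the same
Datum representation, so one C function can back both SQL-level
functions through two pg_proc entries:

#include "postgres.h"

#include "access/transam.h"
#include "access/xact.h"
#include "fmgr.h"

/* hypothetical shared implementation behind txid_current() and xid8_current() */
Datum
xid8_current_impl(PG_FUNCTION_ARGS)
{
	FullTransactionId fxid = GetTopFullTransactionId();

	/* the bits are the same whether the SQL signature says bigint or xid8 */
	PG_RETURN_INT64((int64) U64FromFullTransactionId(fxid));
}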

I wonder if there is a better way to share hash functions than the
hack in check_hash_func_signature(), which I had to extend to cover
xid8.

Adding to CF.

-- 
Thomas Munro
https://enterprisedb.com


0001-Add-SQL-type-xid8-to-expose-FullTransactionId-to-use.patch
Description: Binary data


0002-Introduce-xid8-variants-of-the-txid_XXX-fmgr-functio.patch
Description: Binary data