Re: [HACKERS] Block level parallel vacuum

2019-10-12 Thread Mahendra Singh
Thanks, Amit, for the patch.

The crash is fixed by this patch.

Thanks and Regards
Mahendra Thalor


On Sat, Oct 12, 2019, 09:03 Amit Kapila  wrote:

> On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh  wrote:
> >
> >
> > I did some analysis and found that we are trying to free some
> > already-freed memory, or we are freeing palloc'd memory in
> > vac_update_relstats.
> > for (i = 0; i < nindexes; i++)
> > {
> > if (stats[i] == NULL || stats[i]->estimated_count)
> > continue;
> >
> > /* Update index statistics */
> > vac_update_relstats(Irel[i],
> > stats[i]->num_pages,
> > stats[i]->num_index_tuples,
> > 0,
> > false,
> > InvalidTransactionId,
> > InvalidMultiXactId,
> > false);
> > pfree(stats[i]);
> > }
> >
> > As my table has 2 indexes, we have to free both stats. When i = 0, it
> > frees properly, but when i = 1, vac_update_relstats is freeing the
> > memory.
> >>
> >> (gdb) p *stats[i]
> >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false,
> >>   num_index_tuples = 3, tuples_removed = 3, pages_deleted = 102,
> >>   pages_free = 0}
> >> (gdb) p *stats[i]
> >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false,
> >>   num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
> >> (gdb)
> >
> >
> > From the above data, it looks like somewhere inside vac_update_relstats
> > we are freeing all the palloc'd memory. I don't know why that is.
> >
>
> I don't think the problem is in vac_update_relstats as we are not even
> passing stats to it, so it won't be able to free it.  I think the real
> problem is in the way we copy the stats from shared memory to local
> memory in the function end_parallel_vacuum().  Basically, it allocates
> the memory for all the index stats together and then in function
> update_index_statistics,  it is trying to free memory of individual
> array elements, that won't work.  I have tried to fix the allocation
> in end_parallel_vacuum, see if this fixes the problem for you.   You
> need to apply the attached patch atop
> v28-0001-Add-parallel-option-to-VACUUM-command posted above by
> Sawada-San.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
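
For illustration, here is a minimal sketch (not the actual patch) of the
allocation issue discussed above: pfree() only accepts pointers returned by
palloc(), so handing update_index_statistics() pointers into one combined
allocation appears to work for the first index and corrupts memory for the
rest. The function and variable names below are invented for the example;
shared_stats stands in for the stats array kept in the DSM segment.

#include "postgres.h"
#include "access/genam.h"   /* IndexBulkDeleteResult */

static void
copy_index_stats(IndexBulkDeleteResult **stats,
                 IndexBulkDeleteResult *shared_stats,
                 int nindexes)
{
    int         i;

    for (i = 0; i < nindexes; i++)
    {
        /*
         * Give every index its own palloc'd chunk so each stats[i] can be
         * pfree'd individually.  Pointing stats[i] at &shared_stats[i] (or
         * at the elements of one big palloc'd array) would make the later
         * pfree(stats[i]) free an interior pointer: it seems to work for
         * i = 0 and fails for i > 0, matching the reported crash.
         */
        stats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
        memcpy(stats[i], &shared_stats[i], sizeof(IndexBulkDeleteResult));
    }
}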


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-12 Thread Masahiko Sawada
On Wed, Oct 9, 2019 at 3:57 PM Antonin Houska  wrote:
>
> Moon, Insung  wrote:
>
> > On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska  wrote:
> > >
> > > Moon, Insung  wrote:
> > >
> > > > I also tried to generate the IV from PID (32 bit) + tempCounter (64
> > > > bit) at first, but in the worst case the same PID and tempCounter
> > > > values get reused.
> > > > Therefore, I relied on the uniqueness of the encryption key rather
> > > > than on the uniqueness of the IV value.
> > >
> > > If you consider 64bit counter insufficient (here it seems that tempCounter
> > > counts the 1GB segments), then we can't even use LSN as the IV for 
> > > relation
> > > pages.
> >
> > The worst case here is not a lack of tempCounter bits, but a problem
> > that occurs when a PID is reused after a certain period.
> > Of course, it is very unlikely to be a problem because it is a
> > temporary file, but since the PID and tempFileCounter can be learned
> > from the file name, if you accumulate some data, the same key and the
> > same IV will be used to encrypt other data. So I thought there could
> > be a problem.
>
> ok
>
> > > > First, it generates a random hash value for the file, and uses that
> > > > hash value and the KEK (or MDEK) to derive the key with HMAC-SHA256.
> > > > In this case, there is no need to store the encryption key separately,
> > > > nor to keep the IV in a separate file or in memory.
> > > > (The IV is a 64-bit hash value plus a 32-bit counter.)
> > >
> > > You seem to miss the fact that a user of buffile.c can seek in the file
> > > and rewrite an arbitrary part. Thus you'd have to generate a new key for
> > > the part being changed.
> >
> > That's right. I wanted to ask about this too.
> > Is it possible to overwrite data already written through buffile.c?
> > Such a problem seems to arise when BufFileWrite is called, then
> > BufFileSeek is called, and then BufFileRead is called.
> > In other words, the file is not written strictly in 8kB units; instead
> > the position is changed and some data is read back at another position.
>
> v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
> that data is read and written in 8kB blocks if encryption is enabled. In order
> to record the IV per block, the computation of the buffer position within the
> file would have to be adjusted somehow. I can check it soon but not in the
> next few days.

As far as I read the patch, the nonce consists of the PID, a counter and
the block number, where the counter is incremented each time a BufFile is
created. Therefore the buffer data could be rewritten with the same nonce
and key, which is bad.

So I think we can keep a rewrite counter for the block in each 8kB block
header. The nonce would then consist of the block number within a segment
file (4 bytes), the temp file counter (8 bytes), the rewrite counter
(2 bytes) and the CTR-mode counter (2 bytes). If we also have a single-use
encryption key per backend process, I guess we can guarantee the
uniqueness of the combination of key and nonce.
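
To make the byte layout concrete, here is a rough sketch of packing such a
16-byte nonce. The struct and function names are invented for illustration
(not taken from the patch), and the fields are copied one by one so that
struct padding never ends up inside the nonce.

#include <stdint.h>
#include <string.h>

/* Proposed layout: 4 + 8 + 2 + 2 = 16 bytes */
typedef struct BufFileNonce
{
    uint32_t    block_num;          /* block number within the segment file */
    uint64_t    tempfile_counter;   /* per-backend temp file counter */
    uint16_t    rewrite_counter;    /* bumped on each rewrite of the 8kB block */
    uint16_t    ctr_counter;        /* CTR-mode counter within the block */
} BufFileNonce;

static void
buffile_nonce_to_bytes(const BufFileNonce *n, unsigned char out[16])
{
    memcpy(out, &n->block_num, 4);
    memcpy(out + 4, &n->tempfile_counter, 8);
    memcpy(out + 12, &n->rewrite_counter, 2);
    memcpy(out + 14, &n->ctr_counter, 2);
}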

Regards,

--
Masahiko Sawada




Re: [HACKERS] Block level parallel vacuum

2019-10-12 Thread Amit Kapila
On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada  wrote:
>
> On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila  wrote:
> >
> > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh  wrote:
> > >
>
> Thank you for reviewing and creating the patch!
>
> I think the patch fixes this issue correctly. Attached the updated
> version patch.
>

I see a much bigger problem with the way this patch collects the index
stats in shared memory.  IIUC, it allocates the shared memory (DSM)
for all the index stats uniformly, sizing each entry as an
IndexBulkDeleteResult.  The first time, it gets the stats from
local memory as returned by the ambulkdelete/amvacuumcleanup call and
then copies them into the shared memory space.  From there onwards, it
always updates the stats in shared memory by pointing each index's stats
to that memory.  In this scheme, you overlooked the point that an index
AM could choose to return a larger structure of which
IndexBulkDeleteResult is just the first field.  This generally
provides a way for ambulkdelete to communicate additional private data
to amvacuumcleanup.  We use this idea in the gist index; see how
gistbulkdelete and gistvacuumcleanup work. The current design won't
work for such cases.
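
For readers less familiar with that idiom, here is the shape involved (the
field names are invented, not the actual gist definitions):

/* What an index AM may actually return from ambulkdelete: */
typedef struct ExampleAmBulkDeleteResult
{
    IndexBulkDeleteResult std;      /* must be the first field */
    BlockNumber emptyPages;         /* hypothetical AM-private state */
} ExampleAmBulkDeleteResult;

Callers only ever see an IndexBulkDeleteResult *, and amvacuumcleanup casts
it back to the larger type to recover the private fields.  Copying just
sizeof(IndexBulkDeleteResult) into the DSM segment, and then pointing the
AM at that copy, silently drops the trailing private data.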

One idea is to change the design such that each index method provides
a function to estimate/allocate the shared memory required for the stats
of ambulkdelete/amvacuumcleanup, and then later we also use an index
method-specific function that copies the stats from local memory to
shared memory. I think this needs further investigation.
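
One possible shape of that idea, purely hypothetical (these callbacks do
not exist in IndexAmRoutine today and the names are invented):

/* How much DSM space this AM's bulk-delete/cleanup stats need */
typedef Size (*amestimatevacuumstats_function) (Relation indexRelation);

/* Copy the AM's locally built stats into its reserved DSM area */
typedef void (*amcopyvacuumstats_function) (Relation indexRelation,
                                            void *dest,
                                            IndexBulkDeleteResult *src);

The parallel vacuum leader would then sum the estimates when sizing the DSM
segment and call the copy callback instead of doing a fixed-size memcpy.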

I have also made a few other changes in the attached delta patch.  The
main point fixed by the attached patch is that even if we don't allow
a parallel vacuum on temporary tables, the analyze should still be able
to work if the user has asked for it.  I have also changed an error
message and made a few other cosmetic changes related to comments.
Kindly include these in the next version if you don't find any problem
with the changes.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_comments_amit_1.patch
Description: Binary data


Re: maintenance_work_mem used by Vacuum

2019-10-12 Thread Amit Kapila
On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada  wrote:
>
> On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila  wrote:
> >
> > That's right, but OTOH, if the user specifies gin_pending_list_limit
> > as an option during Create Index with a value greater than GUC
> > gin_pending_list_limit, then also we will face the same problem.  It
> > seems to me that we haven't thought enough on memory usage during Gin
> > pending list cleanup and I don't want to independently make a decision
> > to change it.  So unless the original author/committer or some other
> > people who have worked in this area share their opinion, we can leave
> > it as it is and make a parallel vacuum patch adapt to it.
>
> Yeah, I totally agree.
>
> Apart from the GIN problem, can we discuss whether we need to change the
> current memory usage policy of parallel utility commands described in
> the docs? We cannot control the memory usage in index AMs after all, but
> we need to consider generically how much memory is used during a
> parallel vacuum.
>

Do you mean to say we should change the docs for the parallel vacuum patch
in this regard?  If so, I think we might want to do something about
maintenance_work_mem for parallel vacuum as described in one of the
emails above [1] and then change the docs accordingly.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: use of the term "verifier" with SCRAM

2019-10-12 Thread Peter Eisentraut
On 2019-10-10 10:03, Michael Paquier wrote:
> On Thu, Oct 10, 2019 at 09:08:37AM +0200, Peter Eisentraut wrote:
>> Here is my proposed patch to adjust this.
> 
> Looks fine to me reading through.  I think that you are right to not
> change the descriptions in build_server_final_message(), as that's
> described similarly in RFC 5802.

committed

> By renaming scram_build_verifier()
> to scram_build_secret() you are going to break one of my in-house
> extensions.  I am using it to register for a user SCRAM veri^D^D^D^D
> secrets with custom iteration and salt length :)

OK, that should be easy to work around with an #ifdef or two.
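
Something along these lines should do, assuming the rename ships in the
next major release (adjust the version cutoff as appropriate):

#if PG_VERSION_NUM >= 130000
#define scram_build_verifier scram_build_secret
#endif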

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fairywren failures

2019-10-12 Thread Peter Eisentraut
On 2019-10-03 16:21, Andrew Dunstan wrote:
> My new msys2 animal fairywren

Could you please check how this animal is labeled?  AFAICT, this is not
an msys2 build but a mingw build (x86_64-w64-mingw32).

> has had 3 recent failures when checking
> pg_upgrade. The failures have been while running the regression tests,
> specifically the interval test, and they all look like this:

I've also seen this randomly, but only under 64-bit mingw, never 32-bit
mingw.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: adding partitioned tables to publications

2019-10-12 Thread Petr Jelinek

Hi,

On 07/10/2019 02:55, Amit Langote wrote:

One cannot currently add partitioned tables to a publication.

create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

create publication publish_p for table p;
ERROR:  "p" is a partitioned table
DETAIL:  Adding partitioned tables to publications is not supported.
HINT:  You can add the table partitions individually.

One can do this instead:

create publication publish_p1 for table p1;
create publication publish_p2 for table p2;
create publication publish_p3 for table p3;


Or just create publication publish_p for table p1, p2, p3;



but maybe that's too much code to maintain for users.

I propose that we make this command:

create publication publish_p for table p;



+1


automatically add all the partitions to the publication.  Also, any
future partitions should also be automatically added to the
publication.  So, publishing a partitioned table automatically
publishes all of its existing and future partitions.  Attached patch
implements that.

What doesn't change with this patch is that the partitions on the
subscription side still have to match one-to-one with the partitions
on the publication side, because the changes are still replicated as
being made to the individual partitions, not as the changes to the
root partitioned table.  It might be useful to implement that
functionality on the publication side, because it allows users to
define the replication target any way they need to, but this patch
doesn't implement that.



Yeah, for that to work the subscription side would also need to be able to
write to partitioned tables, so both sides need to add support for this. I
think if we do both what you did and the transparent handling of the root
only, we'll need a new keyword to differentiate the two. It might make
sense to think about whether your approach should need the extra keyword
or the transparent one should.


One issue that I see reading the patch is with the following set of commands:

CREATE TABLE foo ...;
CREATE PUBLICATION mypub FOR TABLE foo;

CREATE TABLE bar ...;
ALTER PUBLICATION mypub ADD TABLE bar;

ALTER TABLE foo ATTACH PARTITION bar ...;
ALTER TABLE foo DETACH PARTITION bar ...;

This will end up with bar not being in any publication even though it
was explicitly added. That might be an acceptable caveat, but it should at
least be clearly documented (IMHO with a warning).


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: stress test for parallel workers

2019-10-12 Thread Tom Lane
I've now also been able to reproduce the "infinite_recurse" segfault
on wobbegong's host (or, since I was using a gcc build, I guess I
should say vulpes' host).  The first-order result is that it's the
same problem with the kernel not giving us as much stack space as
we expect: there's only 1179648 bytes in the stack segment in the
core dump, though we should certainly have been allowed at least 8MB.

The next interesting thing is that looking closely at the identified
spot of the SIGSEGV, there's nothing there that should be touching
the stack at all:

(gdb) x/4i $pc
=> 0x10201df0 :ld  r9,0(r30)
   0x10201df4 :ld  r8,128(r30)
   0x10201df8 :ld  r10,152(r30)
   0x10201dfc :ld  r9,0(r9)

(r30 is not pointing at the stack, but at a valid heap location.)
This code is the start of the switch case at scan.l:1064, so the
most recent successfully-executed instructions were the switch jump,
and they don't involve the stack either.

The reported sp,

(gdb) i reg sp
sp 0x7fffe6940890   140737061849232

is a good 2192 bytes above the bottom of the allocated stack space,
which is 0x7fffe694 according to gdb.  So we really ought to
have plenty of margin here.  What's going on?

What I suspect, given the difficulty of reproducing this, is that
what really happened is that the kernel tried to deliver a SIGUSR1
signal to us just at this point.  The kernel source code that
Thomas pointed to comments that

 * The kernel signal delivery code writes up to about 1.5kB
 * below the stack pointer (r1) before decrementing it.

There's more than 1.5kB available below sp, but what if that comment
is a lie?  In particular, I'm wondering if that number dates to PPC32
and needs to be doubled, or nearly so, to describe PPC64 reality.
If that were the case, then the signal code would not have been
able to fit its requirement, and would probably have come here to
ask for more stack space, and the hard-wired 2048 test a little
further down would have decided that that was a wild stack access.

In short, my current belief is that Linux PPC64 fails when trying
to deliver a signal if there's right around 2KB of stack remaining,
even though it should be able to expand the stack and press on.

It may well be that the reason is just that this heuristic in
bad_stack_expansion() is out of date.  Or there might be a similarly
bogus value somewhere in the signal-delivery code.

regards, tom lane




Re: v12.0: ERROR: could not find pathkey item to sort

2019-10-12 Thread Justin Pryzby
On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:
> Could you provide a self-contained test case please?

SET enable_partitionwise_aggregate = 'on';
SET enable_partitionwise_join = 'on';
SET max_parallel_workers_per_gather=0;
-- maybe not important, but explain(settings) suggests I should include them for completeness:
SET effective_io_concurrency = '0';
SET work_mem = '512MB';
SET jit = 'off';

CREATE TABLE s(site_id int, site_location text, site_office text);
INSERT INTO s SELECT generate_series(1,99),'','';

CREATE TABLE t(start_time timestamp, site_id text, i int)PARTITION BY 
RANGE(start_time);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM ('2019-10-01')TO('2019-10-02');
INSERT INTO t1 SELECT a,b FROM generate_series( '2019-10-01'::timestamp, 
'2019-10-01 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, 
generate_series(1,99)c;
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM ('2019-10-02')TO('2019-10-03');
INSERT INTO t2 SELECT a,b FROM generate_series( '2019-10-02'::timestamp, 
'2019-10-02 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, 
generate_series(1,99)c;

ANALYZE s,t;

explain
SELECT s.* FROM
(SELECT start_time, site_id::int
FROM t t1 FULL JOIN t t2 USING(start_time,site_id)
WHERE (start_time>='2019-10-01' AND start_time<'2019-10-01 01:00')
GROUP BY 1,2) AS data
JOIN s ON (s.site_location='' OR s.site_office::int=data.site_id)

Justin




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-12 Thread Noah Misch
On Wed, Oct 09, 2019 at 01:15:29PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> >> This still fails on Apple's compilers. ...
> 
> > Thanks for testing.  That error boils down to "need to use some other
> > register".  The second operand of addi is one of the ppc instruction 
> > operands
> > that can hold a constant zero or a register number[1], so the proper
> > constraint is "b".  I've made it so and added a comment.
> 
> Ah-hah.  This version does compile and pass check-world for me.
> 
> > I should probably
> > update s_lock.h, too, in a later patch.  I don't know how it has
> > mostly-avoided this failure mode, but its choice of constraint could explain
> > https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com
> 
> Indeed.  It's a bit astonishing that more people haven't hit that.
> This should be back-patched.

I may as well do that first, so there's no time when s_lock.h disagrees with
arch-ppc.h about the constraint to use.  I'm attaching that patch, too.

> * I still think that the added configure test is a waste of build cycles.
> It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you
> are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous
> buildfarm go-round with this showed that all supported compilers
> interpret "i" this way.

xlc does not interpret "i" that way:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2019-09-14%2016%3A42%3A32&stg=config

> * I really dislike building the asm calls with macros as you've done
> here.  The macros violate project style, and are not remotely general-
> purpose, because they have hard-wired references to variables that are
> not in their argument lists.  While that could be fixed with more
> arguments, I don't think that the approach is readable or maintainable
> --- it's impossible for example to understand the register constraints
> without looking simultaneously at the calls and the macro definition.
> And, as we've seen in this "b" issue, the interactions between the chosen
> instruction types and the constraints are subtle enough to make me wonder
> whether you won't need even more arguments to allow some of the other
> constraints to be variable.  I think it'd be far better just to write out
> the asm in-line and accept the not-very-large amount of code duplication
> you'd get.

For a macro local to one C file, I think readability is the relevant metric.
In particular, it would be wrong to add arguments to make these more like
header file macros.  I think the macros make the code somewhat more readable,
and you think they make the code less readable.  I have removed the macros.

> * src/tools/pginclude/headerscheck needs the same adjustment as you
> made in cpluspluscheck.

Done.
diff --git a/configure b/configure
index 74627a7..02a905b 100755
--- a/configure
+++ b/configure
@@ -14518,6 +14518,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
 $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
 
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
+$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance... " >&6; }
+if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static inline int
+ addi(int ra, int si)
+ {
+ int res = 0;
+ if (__builtin_constant_p(si))
+ __asm__ __volatile__(
+ " addi %0,%1,%2\n" : "=r"(res) : "b"(ra), "i"(si));
+ return res;
+ }
+ int test_adds(int x) { return addi(3, x) + addi(x, 5); }
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have_i_constraint__builtin_constant_p=yes
+else
+  pgac_cv_have_i_constraint__builtin_constant_p=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have_i_constraint__builtin_constant_p" >&5
+$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; }
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+
+$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+fi
   ;;
 esac
 
diff --git a/configure.in b/configure.in
index 0d16c1a..88d3a59 100644
--- a/configure.in
+++ b/configure.in
@@ -1539,6 +1539,26 @@ case $host_cpu in
 if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
   AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler supports PPC's LWARX mutex hint bit.])
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
+   [pgac_cv_have_i_constraint__

Re: stress test for parallel workers

2019-10-12 Thread Tom Lane
I wrote:
> In short, my current belief is that Linux PPC64 fails when trying
> to deliver a signal if there's right around 2KB of stack remaining,
> even though it should be able to expand the stack and press on.

I figured I should try to remove some variables from the equation
by demonstrating this claim without involving Postgres.  The attached
test program eats some stack space and then waits to be sent SIGUSR1.
For some values of "some stack space", it dumps core:

[tgl@postgresql-fedora ~]$ gcc -g -Wall -O1 stacktest.c
[tgl@postgresql-fedora ~]$ ./a.out 124 &
[1] 11796
[tgl@postgresql-fedora ~]$ kill -USR1 11796
[tgl@postgresql-fedora ~]$ signal delivered, stack base 0x7fffdc16 top 
0x7fffdc031420 (1240032 used)

[1]+  Done./a.out 124
[tgl@postgresql-fedora ~]$ ./a.out 1242000 &
[1] 11797
[tgl@postgresql-fedora ~]$ kill -USR1 11797
[tgl@postgresql-fedora ~]$ 
[1]+  Segmentation fault  (core dumped) ./a.out 1242000
[tgl@postgresql-fedora ~]$ uname -a
Linux postgresql-fedora.novalocal 4.18.19-100.fc27.ppc64le #1 SMP Wed Nov 14 
21:53:32 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

I don't think any further proof is required that this is
a kernel bug.  Where would be a good place to file it?

regards, tom lane

#include <stddef.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static char *stack_base_ptr;
static char *stack_top_ptr;

static volatile sig_atomic_t sig_occurred = 0;

static void
sigusr1_handler(int signal_arg)
{
	sig_occurred = 1;
}

static void
consume_stack(long stackdepth)
{
	char		stack_cur;

	if ((stack_base_ptr - &stack_cur) < stackdepth)
		consume_stack(stackdepth);
	else
	{
		stack_top_ptr = &stack_cur;
		while (!sig_occurred)
			;
	}
}

int
main(int argc, char **argv)
{
	long		stackdepth = atol(argv[1]);
	char		stack_base;
	struct sigaction act;

	act.sa_handler = sigusr1_handler;
	sigemptyset(&act.sa_mask);
	act.sa_flags = 0;
	if (sigaction(SIGUSR1, &act, NULL) < 0)
		perror("sigaction");

	stack_base_ptr = (char *) (((size_t) &stack_base + 65535) & ~65535UL);

	consume_stack(stackdepth);

	if (sig_occurred)
		printf("signal delivered, stack base %p top %p (%zu used)\n",
			   stack_base_ptr, stack_top_ptr, stack_base_ptr - stack_top_ptr);
	else
		printf("no signal delivered\n");

	return 0;
}


Re: v12.0: ERROR: could not find pathkey item to sort

2019-10-12 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:
>> Could you provide a self-contained test case please?

> [ test case ]

Yup, fails for me too.  Will look shortly.

regards, tom lane




Re: stress test for parallel workers

2019-10-12 Thread Thomas Munro
On Sun, Oct 13, 2019 at 1:06 PM Tom Lane  wrote:
> I don't think any further proof is required that this is
> a kernel bug.  Where would be a good place to file it?

linuxppc-...@lists.ozlabs.org might be the right place.

https://lists.ozlabs.org/listinfo/linuxppc-dev




CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

2019-10-12 Thread Tomas Vondra

Hi,

over in pgsql-bugs [1] we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

   CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
   DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it's in fact present since 9.6.

On 9.5 the example does not work, because that version (a) does not
include the hunspell dictionaries used in the example, and (b) does not
support long flags. So even after copying the dictionaries and tweaking
them a bit, it still passes without a crash.

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

A complete backtrace looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f 
) at spell.c:1126
1126while (*flagcur)
(gdb) bt
#0  0x008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, 
s=0x7f7f7f7f7f7f7f7f ) at spell.c:1126
#1  0x008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at 
spell.c:1608
#2  0x008fe04e in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=3) at 
spell.c:1680
#3  0x008fe113 in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=2) at 
spell.c:1692
#4  0x008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=4, level=1) at 
spell.c:1652
#5  0x008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=9, level=0) at 
spell.c:1652
#6  0x008fe50b in NISortDictionary (Conf=0x20dd3b8) at spell.c:1785
#7  0x008f9e14 in dispell_init (fcinfo=0x7ffdda6abc90) at 
dict_ispell.c:89
#8  0x00a6210a in FunctionCall1Coll (flinfo=0x7ffdda6abcf0, 
collation=0, arg1=34478896) at fmgr.c:1140
#9  0x00a62c72 in OidFunctionCall1Coll (functionId=3731, collation=0, 
arg1=34478896) at fmgr.c:1418
#10 0x006c2dcb in verify_dictoptions (tmplId=3733, 
dictoptions=0x20e1b30) at tsearchcmds.c:402
#11 0x006c2f4c in DefineTSDictionary (names=0x20ba278, 
parameters=0x20ba458) at tsearchcmds.c:463
#12 0x008eb274 in ProcessUtilitySlow (pstate=0x20db518, pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, 
   completionTag=0x7ffdda6ac540 "") at utility.c:1272
#13 0x008ea7e5 in standard_ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, 
   completionTag=0x7ffdda6ac540 "") at utility.c:927

#14 0x008e991a in ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT 
SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, 
AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x20bac80, completionTag=0x7ffdda6ac540 "")
   at utility.c:360
#15 0x008e88e1 in PortalRunUtility (portal=0x2121368, pstmt=0x20bab88, 
isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, completionTag=0x7ffdda6ac540 
"") at pquery.c:1175
#16 0x008e8afe in PortalRunMulti (portal=0x2121368, isTopLevel=true, 
setHoldSnapshot=false, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 
"") at pquery.c:1321
#17 0x008e8032 in PortalRun (portal=0x2121368, count=9223372036854775807, 
isTopLevel=true, run_once=true, dest=0x20bac80, altdest=0x20bac80, 
completionTag=0x7ffdda6ac540 "") at pquery.c:796
#18 0x008e1f51 in exec_simple_query (query_string=0x20b97a8 "CREATE TEXT 
SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, 
AffFile=hunspell_sample_long);") at postgres.c:1215
#19 0x008e6243 in PostgresMain (argc=1, argv=0x20e54f8, dbname=0x20e5340 "test", 
username=0x20b53e8 "user") at postgres.c:4236
#20 0x0083c5e2 in BackendRun (port=0x20dd980) at postmaster.c:4437
#21 0x0083bdb3 in BackendStartup (port=0x20dd980) at postmaster.c:4128
#22 0x008381d7 in ServerLoop () at postmaster.c:1704
#23 0x00837a83 in PostmasterMain (argc=3, argv=0x20b3350) at 
postmaster.c:1377
#24 0x00759507 in main (argc=3, argv=0x20b3350) at main.c:228
(gdb) up
#1  0x008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at 
spell.c:1608
1608return (getCompoundAffixFlagValue(Conf, str) & 
FF_COMPOUNDFLAGMASK);
(gdb) p *Conf
$1 = {maffixes = 16, naffixes = 10, Affix = 0x2181fd0, Suffix = 0x0, Prefix = 0x0, Dictionary = 0x0, AffixData = 0x20e1fa8, lenAffixData = 12, nAffixData =

Re: pgsql: Implement jsonpath .datetime() method

2019-10-12 Thread Alexander Korotkov
On Thu, Oct 3, 2019 at 4:48 PM Robert Haas  wrote:
> On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
>  wrote:
> > So, basically standard requires us to suppress any error happening in
> > filter expression.
>
> Sounds like the standard is dumb, then. :-)
>
> > But as I wrote before suppression of errors in
> > datetime comparison may lead to surprising results.  That happens in
> > rare corner cases, but still.  This makes uneasy choice between
> > consistent behavior and standard behavior.
>
> Yeah.

The proposed patch eliminates this dilemma in this particular case.  It
provides correct cross-type comparison of datetime values even if one
of the values overflows during the cast.  In order to do this, I made the
cast functions report whether the lower or upper boundary overflowed.  We
know that an overflowed value is lower (or higher) than any valid value
except infinity.

This patch also changes the way the timestamp to timestamptz cast works.
Previously it did timestamp2tm() then tm2timestamp().  Instead, after
timestamp2tm() it calculates the timezone offset and applies it to the
original timestamp value.  I hope this is correct.  If so, besides
making overflow handling easier, this refactoring saves some CPU
cycles.
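
Roughly, the conversion now looks like the sketch below.  This is a
simplified illustration, not the patch itself: it assumes
DetermineTimeZoneOffset() returns seconds west of Greenwich, and it omits
the overflow handling that the patch is actually about.

#include "postgres.h"
#include "pgtime.h"
#include "utils/datetime.h"
#include "utils/timestamp.h"

static TimestampTz
timestamp2timestamptz_sketch(Timestamp ts)
{
    struct pg_tm tm;
    fsec_t      fsec;
    int         tz;

    if (timestamp2tm(ts, NULL, &tm, &fsec, NULL, NULL) != 0)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
                 errmsg("timestamp out of range")));

    tz = DetermineTimeZoneOffset(&tm, session_timezone);

    /* shift the original value instead of rebuilding it via tm2timestamp() */
    return (TimestampTz) (ts + tz * USECS_PER_SEC);
}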

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-Refactor-jsonpath-s-compareDatetime-2.patch
Description: Binary data


Re: pgsql: Implement jsonpath .datetime() method

2019-10-12 Thread Tom Lane
Alexander Korotkov  writes:
> This patch also changes the way timestamp to timestamptz cast works.
> Previously it did timestamp2tm() then tm2timestamp().  Instead, after
> timestamp2tm() it calculates timezone offset and applies it to
> original timestamp value.  I hope this is correct.

I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.

Please *don't* wrap this sort of thing into an unrelated feature patch.

regards, tom lane




v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-12 Thread Justin Pryzby
I ran into this while trying to trigger the previously-reported segfault. 

CREATE TABLE t(i) AS SELECT * FROM generate_series(1,9);
CREATE INDEX ON t(i);

[pryzbyj@database ~]$ for i in `seq 1 9`; do PGOPTIONS='-cstatement_timeout=9' 
psql postgres --host /tmp --port 5678 -c "REINDEX INDEX CONCURRENTLY t_i_idx" ; 
done
ERROR:  canceling statement due to statement timeout
ERROR:  lock ShareUpdateExclusiveLock on object 14185/47287/0 is already held
[...]

Variations on this seem to leave the locks table (?) or something else in a
Real Bad state, such that I cannot truncate the table or drop it; or at least
commands are unreasonably delayed for minutes, on this otherwise-empty test
cluster.

Justin




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-12 Thread Dilip Kumar
On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila  wrote:
>
> On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra  
> wrote:
>>
>> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
>> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra 
>> >wrote:
>> >
>> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
>> >> >
>> >> >On further testing, I found that the patch seems to have problems with
>> >> >toast.  Consider below scenario:
>> >> >Session-1
>> >> >Create table large_text(t1 text);
>> >> >INSERT INTO large_text
>> >> >SELECT (SELECT string_agg('x', ',')
>> >> >FROM generate_series(1, 100)) FROM generate_series(1, 1000);
>> >> >
>> >> >Session-2
>> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
>> >> >'test_decoding');
>> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
>> >> >*--kaboom*
>> >> >
>> >> >The second statement in Session-2 leads to a crash.
>> >> >
>> >>
>> >> OK, thanks for the report - will investigate.
>> >>
>> >
>> >It was an assertion failure in ReorderBufferCleanupTXN at below line:
>> >+ /* Check we're not mixing changes from different transactions. */
>> >+ Assert(change->txn == txn);
>> >
>>
>> Can you still reproduce this issue with the patch I sent on 28/9? I have
>> been unable to trigger the failure, and it seems pretty similar to the
>> failure you reported (and I fixed) on 28/9.
>
>
> Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids.  I
> think in session-2 you need to create the replication slot before creating
> the table in session-1 to see this problem.
>
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, 
> TransactionId xid,
> change->data.tuplecid.cmax = cmax;
> change->data.tuplecid.combocid = combocid;
> change->lsn = lsn;
> +   change->txn = txn;
> change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
> dlist_push_tail(&txn->tuplecids, &change->node);
>
> Few more comments:
> ---
> 1.
> +static bool
> +check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
> +{
> + /*
> + * -1 indicates fallback.
> + *
> + * If we haven't yet changed the boot_val default of -1, just let it be.
> + * logical decoding will look to maintenance_work_mem instead.
> + */
> + if (*newval == -1)
> + return true;
> +
> + /*
> + * We clamp manually-set values to at least 64kB. The maintenance_work_mem
> + * uses a higher minimum value (1MB), so this is OK.
> + */
> + if (*newval < 64)
> + *newval = 64;
>
> I think this needs to be changed as we no longer rely on
> maintenance_work_mem.  Another thing related to this is that the default
> value for logical_decoding_work_mem still seems to be -1.  We need to
> make it 64MB.  I noticed this while debugging the memory accounting
> changes; I think this is the reason why I was not seeing toast-related
> changes being serialized, because in that test I hadn't changed the
> default value of logical_decoding_work_mem.
>
> 2.
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
>
>
> /going modify/going to modify/
>
> 3.
> + *
> + * While updating the existing change with detoasted tuple data, we need to
> + * update the memory accounting info, because the change size will differ.
> + * Otherwise the accounting may get out of sync, triggering serialization
> + * at unexpected times.
> + *
> + * We simply subtract size of the change before rejiggering the tuple, and
> + * then adding the new size. This makes it look like the change was removed
> + * and then added back, except it only tweaks the accounting info.
> + *
> + * In particular it can't trigger serialization, which would be pointless
> + * anyway as it happens during commit processing right before handing
> + * the change to the output plugin.
>   */
>  static void
>  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, 
> ReorderBufferTXN *txn,
>   if (txn->toast_hash == NULL)
>   return;
>
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
> + ReorderBufferChangeMemoryUpdate(rb, change, false);
>
> It is not very clear why this change is required.  Basically, this is done
> at commit time, after which we shouldn't attempt to spill these changes.
> This is mentioned in the comments as well, but if that is the case, it is
> not clear how and when the accounting can create a problem.  If possible,
> can you explain it with an example?
>
IIUC,