Re: [HACKERS] Block level parallel vacuum
Thanks, Amit, for the patch. The crash is fixed by this patch.

Thanks and Regards
Mahendra Thalor

On Sat, Oct 12, 2019, 09:03 Amit Kapila wrote:
> On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh wrote:
> >
> > I did some analysis and found that we are trying to free some already
> > freed memory, or we are freeing palloc'd memory in vac_update_relstats.
> >
> >     for (i = 0; i < nindexes; i++)
> >     {
> >         if (stats[i] == NULL || stats[i]->estimated_count)
> >             continue;
> >
> >         /* Update index statistics */
> >         vac_update_relstats(Irel[i],
> >                             stats[i]->num_pages,
> >                             stats[i]->num_index_tuples,
> >                             0,
> >                             false,
> >                             InvalidTransactionId,
> >                             InvalidMultiXactId,
> >                             false);
> >         pfree(stats[i]);
> >     }
> >
> > As my table has 2 indexes, we have to free both stats. When i = 0, it is
> > freed properly, but when i = 1, vac_update_relstats is freeing the memory.
> >>
> >> (gdb) p *stats[i]
> >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 3, tuples_removed = 3, pages_deleted = 102, pages_free = 0}
> >> (gdb) p *stats[i]
> >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
> >> (gdb)
> >
> > From the above data, it looks like somewhere inside vac_update_relstats
> > we are freeing all the palloc'd memory. I don't know why that is.
>
> I don't think the problem is in vac_update_relstats as we are not even
> passing stats to it, so it won't be able to free it. I think the real
> problem is in the way we copy the stats from shared memory to local
> memory in the function end_parallel_vacuum(). Basically, it allocates
> the memory for all the index stats together and then in function
> update_index_statistics, it is trying to free memory of individual
> array elements, that won't work. I have tried to fix the allocation
> in end_parallel_vacuum, see if this fixes the problem for you. You
> need to apply the attached patch atop
> v28-0001-Add-parallel-option-to-VACUUM-command posted above by
> Sawada-San.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
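To illustrate the allocation issue being discussed, here is a minimal sketch, not Amit's actual patch: each index's stats are copied out of the DSM area into an individually palloc'd chunk, so that the later per-element pfree(stats[i]) in update_index_statistics() is valid. SharedIndStats and its "updated" flag are made-up names for illustration only.

    /*
     * Illustrative sketch only (not the actual patch): give every index its
     * own palloc'd IndexBulkDeleteResult instead of pointing into one big
     * shared allocation.  SharedIndStats and "updated" are assumptions.
     */
    static void
    copy_stats_from_dsm(SharedIndStats *shared_indstats,
                        IndexBulkDeleteResult **stats, int nindexes)
    {
        for (int i = 0; i < nindexes; i++)
        {
            if (shared_indstats[i].updated)
            {
                /* separately palloc'd, so pfree(stats[i]) works later */
                stats[i] = (IndexBulkDeleteResult *)
                    palloc0(sizeof(IndexBulkDeleteResult));
                memcpy(stats[i], &shared_indstats[i].stats,
                       sizeof(IndexBulkDeleteResult));
            }
            else
                stats[i] = NULL;
        }
    }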
Re: Transparent Data Encryption (TDE) and encrypted files
On Wed, Oct 9, 2019 at 3:57 PM Antonin Houska wrote:
>
> Moon, Insung wrote:
>
> > On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska wrote:
> > >
> > > Moon, Insung wrote:
> > >
> > > > I also tried to generate IV using PID (32bit) + tempCounter (64bit) at
> > > > first, but in the worst-case PID and tempCounter are used in the same
> > > > values.
> > > > Therefore, the uniqueness of the encryption key was considered without
> > > > considering the uniqueness of the IV value.
> > >
> > > If you consider 64bit counter insufficient (here it seems that tempCounter
> > > counts the 1GB segments), then we can't even use LSN as the IV for relation
> > > pages.
> >
> > The worst-case here is not a lack of tempCounter, but a problem that
> > occurs when PID is reused after a certain period.
> > Of course, it is very unlikely to be a problem because it is a
> > temporary file, but since the file name can know the PID and
> > tempFileCounter, if you accumulate some data, the same key and the
> > same IV will be used to encrypt other data. So I thought there could
> > be a problem.
>
> ok
>
> > > > First, it generates a hash value randomly for the file, and uses the
> > > > hash value and KEK (or MDEK) to derive and use the key with
> > > > HMAC-SHA256.
> > > > In this case, there is no need to store the encryption key separately
> > > > if it is not necessary to keep it in a separate IV file or memory.
> > > > (IV is a hash value of 64 bits and a counter of 32 bits.)
> > >
> > > You seem to miss the fact that user of buffile.c can seek in the file and
> > > rewrite arbitrary part. Thus you'd have to generate a new key for the part
> > > being changed.
> >
> > That's right. I wanted to ask this too.
> > Is it possible to overwrite the data already written in the actual
> > buffile.c?
> > Such a problem seems to become a problem when BufFileWRite function is
> > called, and BufFileSeek function is called, and BufFileRead is called.
> > In other words, the file is not written in units of 8kb, but the file
> > is changed in the pos, and some data is read in another pos.
>
> v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
> that data is read and written in 8kB blocks if encryption is enabled. In order
> to record the IV per block, the computation of the buffer position within the
> file would have to be adjusted somehow. I can check it soon but not in the
> next few days.

As far as I read the patch the nonce consists of pid, counter and block
number where the counter is the number incremented each time of creating
a BufFile. Therefore it could happen to rewrite the buffer data with the
same nonce and key, which is bad.

So I think we can have the rewrite counter of the block in the each 8k
block header. And then the nonce consists of block number within a
segment file (4 bytes), temp file counter (8 bytes), rewrite counter (2
bytes) and CTR mode counter (2 bytes). And then if we have a single-use
encryption key per backend processes I guess we can guarantee the
uniqueness of the combination of key and nonce.

Regards,

--
Masahiko Sawada
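To make the proposed nonce layout concrete, here is an illustrative struct; the struct and its field names are assumptions, not from any posted patch. The fields add up to 8 + 4 + 2 + 2 = 16 bytes, one AES block, and are ordered here only so the in-memory layout has no padding.

    /*
     * Illustrative layout of the proposed 16-byte nonce (one AES block).
     * Field names are made up; ordered to avoid struct padding.
     */
    typedef struct BufFileNonce
    {
        uint64      tempFileCounter;   /* per-backend temp file counter */
        uint32      blockNum;          /* block number within the segment file */
        uint16      rewriteCounter;    /* bumped each time the 8kB block is rewritten */
        uint16      ctrCounter;        /* CTR-mode counter within the block */
    } BufFileNonce;

In practice the fields would presumably be serialized explicitly into the IV buffer rather than relying on struct layout, but the point of the sketch is just that the four components fit a single 16-byte IV.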
Re: [HACKERS] Block level parallel vacuum
On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada wrote:
>
> On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila wrote:
> >
> > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh wrote:
> > >
>
> Thank you for reviewing and creating the patch!
> I think the patch fixes this issue correctly. Attached the updated
> version patch.
>

I see a much bigger problem with the way this patch collects the index
stats in shared memory. IIUC, it allocates the shared memory (DSM) for
all the index stats in the same way, considering its size as
IndexBulkDeleteResult. For the first time, it gets the stats from local
memory as returned by the ambulkdelete/amvacuumcleanup call and then
copies them into the shared memory space. From there onwards, it always
updates the stats in shared memory by pointing each index's stats to
that memory.

In this scheme, you overlooked the point that an index AM could choose
to return a larger structure of which IndexBulkDeleteResult is just the
first field. This generally provides a way for ambulkdelete to
communicate additional private data to amvacuumcleanup. We use this
idea in the gist index; see how gistbulkdelete and gistvacuumcleanup
work. The current design won't work for such cases.

One idea is to change the design such that each index method provides a
method to estimate/allocate the shared memory required for the stats of
ambulkdelete/amvacuumscan, and then later we also need an
index-method-specific function which copies the stats from local memory
to shared memory. I think this needs further investigation.

I have also made a few other changes in the attached delta patch. The
main point fixed by the attached patch is that even if we don't allow a
parallel vacuum on temporary tables, the analyze should still be able
to work if the user has asked for it. I have changed an error message
and made a few other cosmetic changes related to comments. Kindly
include this in the next version if you don't find any problem with the
changes.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_comments_amit_1.patch
Description: Binary data
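As an illustration of the pattern Amit describes, an index AM may return a struct whose first member is IndexBulkDeleteResult, carrying private state from ambulkdelete to amvacuumcleanup (the gist AM uses this kind of trick; the field names below are invented for the sketch, not gist's actual fields).

    /*
     * Sketch only: an AM-private bulk-delete result.  Copying just
     * sizeof(IndexBulkDeleteResult) into shared memory would silently
     * drop the extra fields.
     */
    typedef struct MyAmBulkDeleteResult
    {
        IndexBulkDeleteResult std;      /* must be the first field */
        BlockNumber lastUsedPage;       /* AM-private bookkeeping */
        bool        needsCleanupScan;   /* AM-private flag */
    } MyAmBulkDeleteResult;

This is why a fixed-size per-index slot in the DSM segment is not enough; the AM itself would have to say how much space its stats need and how to copy them.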
Re: maintenance_work_mem used by Vacuum
On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada wrote:
>
> On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila wrote:
> >
> > That's right, but OTOH, if the user specifies gin_pending_list_limit
> > as an option during Create Index with a value greater than GUC
> > gin_pending_list_limit, then also we will face the same problem. It
> > seems to me that we haven't thought enough on memory usage during Gin
> > pending list cleanup and I don't want to independently make a decision
> > to change it. So unless the original author/committer or some other
> > people who have worked in this area share their opinion, we can leave
> > it as it is and make a parallel vacuum patch adapt to it.
>
> Yeah I totally agreed.
>
> Apart from the GIN problem can we discuss whether need to change the
> current memory usage policy of parallel utility command described in
> the doc? We cannot control the memory usage in index AM after all but
> we need to generically consider how much memory is used during
> parallel vacuum.
>

Do you mean to say change the docs for a parallel vacuum patch in this
regard? If so, I think we might want to do something for
maintenance_work_mem for parallel vacuum as described in one of the
emails above [1] and then change the docs accordingly.

[1] - https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: use of the term "verifier" with SCRAM
On 2019-10-10 10:03, Michael Paquier wrote:
> On Thu, Oct 10, 2019 at 09:08:37AM +0200, Peter Eisentraut wrote:
>> Here is my proposed patch to adjust this.
>
> Looks fine to me reading through. I think that you are right to not
> change the descriptions in build_server_final_message(), as that's
> described similarly in RFC 5802.

committed

> By renaming scram_build_verifier() to scram_build_secret() you are
> going to break one of my in-house extensions. I am using it to
> register for a user SCRAM veri^D^D^D^D secrets with custom iteration
> and salt length :)

OK, that should be easy to work around with an #ifdef or two.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
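For illustration, one way an out-of-tree extension might absorb the rename is an object-like macro guarded by a version check; this is a sketch only, and the version cutoff is an assumption, not something stated in this thread.

    /*
     * Illustrative compatibility shim: keep calling the old name in the
     * extension and map it to the new one on servers that have the rename.
     * The PG_VERSION_NUM cutoff below is assumed, not taken from the thread.
     */
    #include "postgres.h"

    #if PG_VERSION_NUM >= 130000
    #define scram_build_verifier scram_build_secret
    #endif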
Re: fairywren failures
On 2019-10-03 16:21, Andrew Dunstan wrote:
> My new msys2 animal fairywren

Could you please check how this animal is labeled? AFAICT, this is not
an msys2 build but a mingw build (x86_64-w64-mingw32).

> has had 3 recent failures when checking pg_upgrade. The failures have
> been while running the regression tests, specifically the interval
> test, and they all look like this:

I've also seen this randomly, but only under 64-bit mingw, never 32-bit
mingw.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: adding partitioned tables to publications
Hi,

On 07/10/2019 02:55, Amit Langote wrote:

One cannot currently add partitioned tables to a publication.

create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

create publication publish_p for table p;
ERROR: "p" is a partitioned table
DETAIL: Adding partitioned tables to publications is not supported.
HINT: You can add the table partitions individually.

One can do this instead:

create publication publish_p1 for table p1;
create publication publish_p2 for table p2;
create publication publish_p3 for table p3;

Or just

create publication publish_p for table p1, p2, p3;

but maybe that's too much code to maintain for users.

I propose that we make this command:

create publication publish_p for table p;

+1

automatically add all the partitions to the publication. Also, any
future partitions should also be automatically added to the
publication. So, publishing a partitioned table automatically publishes
all of its existing and future partitions. Attached patch implements
that.

What doesn't change with this patch is that the partitions on the
subscription side still have to match one-to-one with the partitions on
the publication side, because the changes are still replicated as being
made to the individual partitions, not as the changes to the root
partitioned table. It might be useful to implement that functionality
on the publication side, because it allows users to define the
replication target any way they need to, but this patch doesn't
implement that.

Yeah for that to work subscription would need to also need to be able
to write to partitioned tables, so it needs both sides to add support
for this.

I think if we do both what you did and the transparent handling of root
only, we'll need new keyword to differentiate the two. It might make
sense to think about if we want your way to need an extra keyword or
the transparent one will need it.

One issue that I see reading the patch is following set of commands:

CREATE TABLE foo ...;
CREATE PUBLICATION mypub FOR TABLE foo;
CREATE TABLE bar ...;
ALTER PUBLICATION mypub ADD TABLE bar;
ALTER TABLE foo ATTACH PARTITION bar ...;
ALTER TABLE foo DETACH PARTITION bar ...;

This will end up with bar not being in any publication even though it
was explicitly added. That might be acceptable caveat but it at least
should be clearly documented (IMHO with warning).

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
Re: stress test for parallel workers
I've now also been able to reproduce the "infinite_recurse" segfault on
wobbegong's host (or, since I was using a gcc build, I guess I should
say vulpes' host).

The first-order result is that it's the same problem with the kernel
not giving us as much stack space as we expect: there's only 1179648
bytes in the stack segment in the core dump, though we should certainly
have been allowed at least 8MB.

The next interesting thing is that looking closely at the identified
spot of the SIGSEGV, there's nothing there that should be touching the
stack at all:

(gdb) x/4i $pc
=> 0x10201df0 :ld r9,0(r30)
   0x10201df4 :ld r8,128(r30)
   0x10201df8 :ld r10,152(r30)
   0x10201dfc :ld r9,0(r9)

(r30 is not pointing at the stack, but at a valid heap location.) This
code is the start of the switch case at scan.l:1064, so the most recent
successfully-executed instructions were the switch jump, and they don't
involve the stack either.

The reported sp,

(gdb) i reg sp
sp 0x7fffe6940890 140737061849232

is a good 2192 bytes above the bottom of the allocated stack space,
which is 0x7fffe694 according to gdb. So we really ought to have plenty
of margin here. What's going on?

What I suspect, given the difficulty of reproducing this, is that what
really happened is that the kernel tried to deliver a SIGUSR1 signal to
us just at this point. The kernel source code that Thomas pointed to
comments that

 * The kernel signal delivery code writes up to about 1.5kB
 * below the stack pointer (r1) before decrementing it.

There's more than 1.5kB available below sp, but what if that comment is
a lie? In particular, I'm wondering if that number dates to PPC32 and
needs to be doubled, or nearly so, to describe PPC64 reality. If that
were the case, then the signal code would not have been able to fit its
requirement, and would probably have come here to ask for more stack
space, and the hard-wired 2048 test a little further down would have
decided that that was a wild stack access.

In short, my current belief is that Linux PPC64 fails when trying to
deliver a signal if there's right around 2KB of stack remaining, even
though it should be able to expand the stack and press on. It may well
be that the reason is just that this heuristic in bad_stack_expansion()
is out of date. Or there might be a similarly bogus value somewhere in
the signal-delivery code.

regards, tom lane
Re: v12.0: ERROR: could not find pathkey item to sort
On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:
> Could you provide a self-contained test case please?

SET enable_partitionwise_aggregate = 'on';
SET enable_partitionwise_join = 'on';
SET max_parallel_workers_per_gather = 0;

-- maybe not important but explain(settings) suggests I should include them for completeness:
SET effective_io_concurrency = '0';
SET work_mem = '512MB';
SET jit = 'off';

CREATE TABLE s(site_id int, site_location text, site_office text);
INSERT INTO s SELECT generate_series(1,99), '', '';

CREATE TABLE t(start_time timestamp, site_id text, i int) PARTITION BY RANGE(start_time);

CREATE TABLE t1 PARTITION OF t FOR VALUES FROM ('2019-10-01') TO ('2019-10-02');
INSERT INTO t1 SELECT a,b FROM generate_series('2019-10-01'::timestamp, '2019-10-01 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;

CREATE TABLE t2 PARTITION OF t FOR VALUES FROM ('2019-10-02') TO ('2019-10-03');
INSERT INTO t2 SELECT a,b FROM generate_series('2019-10-02'::timestamp, '2019-10-02 23:45'::timestamp, '15 minutes')a, generate_series(1,99)b, generate_series(1,99)c;

ANALYZE s,t;

explain SELECT s.* FROM
  (SELECT start_time, site_id::int
   FROM t t1 FULL JOIN t t2 USING (start_time, site_id)
   WHERE (start_time >= '2019-10-01' AND start_time < '2019-10-01 01:00')
   GROUP BY 1,2) AS data
  JOIN s ON (s.site_location = '' OR s.site_office::int = data.site_id)

Justin
Re: [HACKERS] Deadlock in XLogInsert at AIX
On Wed, Oct 09, 2019 at 01:15:29PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote: > >> This still fails on Apple's compilers. ... > > > Thanks for testing. That error boils down to "need to use some other > > register". The second operand of addi is one of the ppc instruction > > operands > > that can hold a constant zero or a register number[1], so the proper > > constraint is "b". I've made it so and added a comment. > > Ah-hah. This version does compile and pass check-world for me. > > > I should probably > > update s_lock.h, too, in a later patch. I don't know how it has > > mostly-avoided this failure mode, but its choice of constraint could explain > > https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com > > Indeed. It's a bit astonishing that more people haven't hit that. > This should be back-patched. I may as well do that first, so there's no time when s_lock.h disagrees with arch-ppc.h about the constraint to use. I'm attaching that patch, too. > * I still think that the added configure test is a waste of build cycles. > It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you > are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous > buildfarm go-round with this showed that all supported compilers > interpret "i" this way. xlc does not interpret "i" that way: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2019-09-14%2016%3A42%3A32&stg=config > * I really dislike building the asm calls with macros as you've done > here. The macros violate project style, and are not remotely general- > purpose, because they have hard-wired references to variables that are > not in their argument lists. While that could be fixed with more > arguments, I don't think that the approach is readable or maintainable > --- it's impossible for example to understand the register constraints > without looking simultaneously at the calls and the macro definition. > And, as we've seen in this "b" issue, the interactions between the chosen > instruction types and the constraints are subtle enough to make me wonder > whether you won't need even more arguments to allow some of the other > constraints to be variable. I think it'd be far better just to write out > the asm in-line and accept the not-very-large amount of code duplication > you'd get. For a macro local to one C file, I think readability is the relevant metric. In particular, it would be wrong to add arguments to make these more like header file macros. I think the macros make the code somewhat more readable, and you think they make the code less readable. I have removed the macros. > * src/tools/pginclude/headerscheck needs the same adjustment as you > made in cpluspluscheck. Done. diff --git a/configure b/configure index 74627a7..02a905b 100755 --- a/configure +++ b/configure @@ -14518,6 +14518,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; } $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h fi +# Check if compiler accepts "i"(x) when __builtin_constant_p(x). +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance" >&5 +$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance... " >&6; } +if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +static inline int + addi(int ra, int si) + { + int res = 0; + if (__builtin_constant_p(si)) + __asm__ __volatile__( + " addi %0,%1,%2\n" : "=r"(res) : "b"(ra), "i"(si)); + return res; + } + int test_adds(int x) { return addi(3, x) + addi(x, 5); } +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv_have_i_constraint__builtin_constant_p=yes +else + pgac_cv_have_i_constraint__builtin_constant_p=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have_i_constraint__builtin_constant_p" >&5 +$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; } +if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then + +$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h + +fi ;; esac diff --git a/configure.in b/configure.in index 0d16c1a..88d3a59 100644 --- a/configure.in +++ b/configure.in @@ -1539,6 +1539,26 @@ case $host_cpu in if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler supports PPC's LWARX mutex hint bit.]) fi +# Check if compiler accepts "i"(x) when __builtin_constant_p(x). +AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance], + [pgac_cv_have_i_constraint__
Re: stress test for parallel workers
I wrote:
> In short, my current belief is that Linux PPC64 fails when trying
> to deliver a signal if there's right around 2KB of stack remaining,
> even though it should be able to expand the stack and press on.

I figured I should try to remove some variables from the equation by
demonstrating this claim without involving Postgres. The attached test
program eats some stack space and then waits to be sent SIGUSR1. For
some values of "some stack space", it dumps core:

[tgl@postgresql-fedora ~]$ gcc -g -Wall -O1 stacktest.c
[tgl@postgresql-fedora ~]$ ./a.out 124 &
[1] 11796
[tgl@postgresql-fedora ~]$ kill -USR1 11796
[tgl@postgresql-fedora ~]$ signal delivered, stack base 0x7fffdc16 top 0x7fffdc031420 (1240032 used)
[1]+  Done        ./a.out 124
[tgl@postgresql-fedora ~]$ ./a.out 1242000 &
[1] 11797
[tgl@postgresql-fedora ~]$ kill -USR1 11797
[tgl@postgresql-fedora ~]$
[1]+  Segmentation fault (core dumped)        ./a.out 1242000
[tgl@postgresql-fedora ~]$ uname -a
Linux postgresql-fedora.novalocal 4.18.19-100.fc27.ppc64le #1 SMP Wed Nov 14 21:53:32 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

I don't think any further proof is required that this is a kernel bug.
Where would be a good place to file it?

regards, tom lane

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <signal.h>

static char *stack_base_ptr;
static char *stack_top_ptr;

static volatile sig_atomic_t sig_occurred = 0;

static void
sigusr1_handler(int signal_arg)
{
    sig_occurred = 1;
}

static void
consume_stack(long stackdepth)
{
    char        stack_cur;

    if ((stack_base_ptr - &stack_cur) < stackdepth)
        consume_stack(stackdepth);
    else
    {
        stack_top_ptr = &stack_cur;
        while (!sig_occurred)
            ;
    }
}

int
main(int argc, char **argv)
{
    long        stackdepth = atol(argv[1]);
    char        stack_base;
    struct sigaction act;

    act.sa_handler = sigusr1_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        perror("sigaction");

    stack_base_ptr = (char *) (((size_t) &stack_base + 65535) & ~65535UL);

    consume_stack(stackdepth);

    if (sig_occurred)
        printf("signal delivered, stack base %p top %p (%zu used)\n",
               stack_base_ptr, stack_top_ptr,
               stack_base_ptr - stack_top_ptr);
    else
        printf("no signal delivered\n");

    return 0;
}
Re: v12.0: ERROR: could not find pathkey item to sort
Justin Pryzby writes:
> On Fri, Oct 11, 2019 at 10:48:37AM -0400, Tom Lane wrote:
>> Could you provide a self-contained test case please?

> [ test case ]

Yup, fails for me too. Will look shortly.

regards, tom lane
Re: stress test for parallel workers
On Sun, Oct 13, 2019 at 1:06 PM Tom Lane wrote:
> I don't think any further proof is required that this is
> a kernel bug. Where would be a good place to file it?

linuxppc-...@lists.ozlabs.org might be the right place.

https://lists.ozlabs.org/listinfo/linuxppc-dev
CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+
Hi,

over in pgsql-bugs [1] we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it's in fact present since 9.6. On 9.5 the example does not work,
because that version does not (a) include the hunspell dictionaries
used in the example, and (b) it does not support long flags. So even
after copying the dictionaries and tweaking them a bit it still passes
without a crash.

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

A complete backtrace looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f ) at spell.c:1126
1126    while (*flagcur)
(gdb) bt
#0  0x008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f ) at spell.c:1126
#1  0x008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
#2  0x008fe04e in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=3) at spell.c:1680
#3  0x008fe113 in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=2) at spell.c:1692
#4  0x008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=4, level=1) at spell.c:1652
#5  0x008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=9, level=0) at spell.c:1652
#6  0x008fe50b in NISortDictionary (Conf=0x20dd3b8) at spell.c:1785
#7  0x008f9e14 in dispell_init (fcinfo=0x7ffdda6abc90) at dict_ispell.c:89
#8  0x00a6210a in FunctionCall1Coll (flinfo=0x7ffdda6abcf0, collation=0, arg1=34478896) at fmgr.c:1140
#9  0x00a62c72 in OidFunctionCall1Coll (functionId=3731, collation=0, arg1=34478896) at fmgr.c:1418
#10 0x006c2dcb in verify_dictoptions (tmplId=3733, dictoptions=0x20e1b30) at tsearchcmds.c:402
#11 0x006c2f4c in DefineTSDictionary (names=0x20ba278, parameters=0x20ba458) at tsearchcmds.c:463
#12 0x008eb274 in ProcessUtilitySlow (pstate=0x20db518, pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:1272
#13 0x008ea7e5 in standard_ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:927
#14 0x008e991a in ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:360
#15 0x008e88e1 in PortalRunUtility (portal=0x2121368, pstmt=0x20bab88, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1175
#16 0x008e8afe in PortalRunMulti (portal=0x2121368, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1321
#17 0x008e8032 in PortalRun (portal=0x2121368, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:796
#18 0x008e1f51 in exec_simple_query (query_string=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);") at postgres.c:1215
#19 0x008e6243 in PostgresMain (argc=1, argv=0x20e54f8, dbname=0x20e5340 "test", username=0x20b53e8 "user") at postgres.c:4236
#20 0x0083c5e2 in BackendRun (port=0x20dd980) at postmaster.c:4437
#21 0x0083bdb3 in BackendStartup (port=0x20dd980) at postmaster.c:4128
#22 0x008381d7 in ServerLoop () at postmaster.c:1704
#23 0x00837a83 in PostmasterMain (argc=3, argv=0x20b3350) at postmaster.c:1377
#24 0x00759507 in main (argc=3, argv=0x20b3350) at main.c:228
(gdb) up
#1  0x008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
1608    return (getCompoundAffixFlagValue(Conf, str) & FF_COMPOUNDFLAGMASK);
(gdb) p *Conf
$1 = {maffixes = 16, naffixes = 10, Affix = 0x2181fd0, Suffix = 0x0, Prefix = 0x0, Dictionary = 0x0, AffixData = 0x20e1fa8, lenAffixData = 12, nAffixData =
Re: pgsql: Implement jsonpath .datetime() method
On Thu, Oct 3, 2019 at 4:48 PM Robert Haas wrote:
> On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov wrote:
> > So, basically standard requires us to suppress any error happening in
> > filter expression.
>
> Sounds like the standard is dumb, then. :-)
>
> > But as I wrote before suppression of errors in
> > datetime comparison may lead to surprising results. That happens in
> > rare corner cases, but still. This makes uneasy choice between
> > consistent behavior and standard behavior.
>
> Yeah.

Proposed patch eliminates this dilemma in particular case. It provides
correct cross-type comparison of datetime values even if one of values
overflows during cast. In order to do this, I made cast functions to
report whether lower or upper boundary is overflowed. We know that
overflowed value is lower (or upper) than any valid value except
infinity.

This patch also changes the way timestamp to timestamptz cast works.
Previously it did timestamp2tm() then tm2timestamp(). Instead, after
timestamp2tm() it calculates timezone offset and applies it to
original timestamp value. I hope this is correct. If so, besides
making overflow handling easier, this refactoring saves some CPU cycles.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0002-Refactor-jsonpath-s-compareDatetime-2.patch
Description: Binary data
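To sketch the comparison idea described above (names are illustrative, not from the patch, and the infinity special case mentioned in the mail is left out): the cast reports which boundary it crossed, and the comparison then treats an overflowed-low value as sorting before, and an overflowed-high value as sorting after, any in-range value.

    /* Illustrative only: how an overflow report can keep ordering correct. */
    typedef enum CastOverflow
    {
        CAST_OK,                /* value fit into the target type */
        CAST_OVERFLOW_LOW,      /* result fell below the target type's range */
        CAST_OVERFLOW_HIGH      /* result rose above the target type's range */
    } CastOverflow;

    /*
     * Compare a possibly-overflowed cast result (side 1) against an
     * in-range value (side 2); "cmp" is the ordinary comparison result
     * used when no overflow happened.
     */
    static int
    cmp_with_overflow(CastOverflow overflow1, int cmp)
    {
        if (overflow1 == CAST_OVERFLOW_LOW)
            return -1;          /* lower than any in-range value */
        if (overflow1 == CAST_OVERFLOW_HIGH)
            return 1;           /* higher than any in-range value */
        return cmp;
    }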
Re: pgsql: Implement jsonpath .datetime() method
Alexander Korotkov writes:
> This patch also changes the way timestamp to timestamptz cast works.
> Previously it did timestamp2tm() then tm2timestamp(). Instead, after
> timestamp2tm() it calculates timezone offset and applies it to
> original timestamp value. I hope this is correct.

I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.

Please *don't* wrap this sort of thing into an unrelated feature patch.

regards, tom lane
v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
I ran into this while trying to trigger the previously-reported segfault.

CREATE TABLE t(i) AS SELECT * FROM generate_series(1,9);
CREATE INDEX ON t(i);

[pryzbyj@database ~]$ for i in `seq 1 9`; do PGOPTIONS='-cstatement_timeout=9' psql postgres --host /tmp --port 5678 -c "REINDEX INDEX CONCURRENTLY t_i_idx"; done
ERROR: canceling statement due to statement timeout
ERROR: lock ShareUpdateExclusiveLock on object 14185/47287/0 is already held
[...]

Variations on this seem to leave the locks table (?) or something else
in a Real Bad state, such that I cannot truncate the table or drop it;
or at least commands are unreasonably delayed for minutes, on this
otherwise-empty test cluster.

Justin
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila wrote: > > On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra > wrote: >> >> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote: >> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra >> >wrote: >> > >> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote: >> >> > >> >> >On further testing, I found that the patch seems to have problems with >> >> >toast. Consider below scenario: >> >> >Session-1 >> >> >Create table large_text(t1 text); >> >> >INSERT INTO large_text >> >> >SELECT (SELECT string_agg('x', ',') >> >> >FROM generate_series(1, 100)) FROM generate_series(1, 1000); >> >> > >> >> >Session-2 >> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot', >> >> >'test_decoding'); >> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL); >> >> >*--kaboom* >> >> > >> >> >The second statement in Session-2 leads to a crash. >> >> > >> >> >> >> OK, thanks for the report - will investigate. >> >> >> > >> >It was an assertion failure in ReorderBufferCleanupTXN at below line: >> >+ /* Check we're not mixing changes from different transactions. */ >> >+ Assert(change->txn == txn); >> > >> >> Can you still reproduce this issue with the patch I sent on 28/9? I have >> been unable to trigger the failure, and it seems pretty similar to the >> failure you reported (and I fixed) on 28/9. > > > Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids. I > think in session-2 you need to create replication slot before creating table > in session-1 to see this problem. > > --- a/src/backend/replication/logical/reorderbuffer.c > +++ b/src/backend/replication/logical/reorderbuffer.c > @@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, > TransactionId xid, > change->data.tuplecid.cmax = cmax; > change->data.tuplecid.combocid = combocid; > change->lsn = lsn; > + change->txn = txn; > change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID; > dlist_push_tail(&txn->tuplecids, &change->node); > > Few more comments: > --- > 1. > +static bool > +check_logical_decoding_work_mem(int *newval, void **extra, GucSource source) > +{ > + /* > + * -1 indicates fallback. > + * > + * If we haven't yet changed the boot_val default of -1, just let it be. > + * logical decoding will look to maintenance_work_mem instead. > + */ > + if (*newval == -1) > + return true; > + > + /* > + * We clamp manually-set values to at least 64kB. The maintenance_work_mem > + * uses a higher minimum value (1MB), so this is OK. > + */ > + if (*newval < 64) > + *newval = 64; > > I think this needs to be changed as now we don't rely on > maintenance_work_mem. Another thing related to this is that I think the > default value for logical_decoding_work_mem still seems to be -1. We need to > make it to 64MB. I have seen this while debugging memory accounting changes. > I think this is the reason why I was not seeing toast related changes being > serialized because, in that test, I haven't changed the default value of > logical_decoding_work_mem. > > 2. > + /* > + * We're going modify the size of the change, so to make sure the > + * accounting is correct we'll make it look like we're removing the > + * change now (with the old size), and then re-add it at the end. > + */ > > > /going modify/going to modify/ > > 3. > + * > + * While updating the existing change with detoasted tuple data, we need to > + * update the memory accounting info, because the change size will differ. 
> + * Otherwise the accounting may get out of sync, triggering serialization > + * at unexpected times. > + * > + * We simply subtract size of the change before rejiggering the tuple, and > + * then adding the new size. This makes it look like the change was removed > + * and then added back, except it only tweaks the accounting info. > + * > + * In particular it can't trigger serialization, which would be pointless > + * anyway as it happens during commit processing right before handing > + * the change to the output plugin. > */ > static void > ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn, > @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, > ReorderBufferTXN *txn, > if (txn->toast_hash == NULL) > return; > > + /* > + * We're going modify the size of the change, so to make sure the > + * accounting is correct we'll make it look like we're removing the > + * change now (with the old size), and then re-add it at the end. > + */ > + ReorderBufferChangeMemoryUpdate(rb, change, false); > > It is not very clear why this change is required. Basically, this is done at > commit time after which actually we shouldn't attempt to spill these changes. > This is mentioned in comments as well, but it is not clear if that is the > case, then how and when accounting can create a problem. If possible, can > you explain it with an example? > IIUC,