Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Thomas Munro
On Fri, Feb 15, 2019 at 2:31 AM Sergei Kornilov wrote: > I can not reproduce bug after 30min test long. (without patch bug was after > minute-two) Thank you Justin and Sergei for all your help reproducing and testing this. Fix pushed to all supported releases. It's lightly refactored from the

Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Thomas Munro
On Fri, Feb 15, 2019 at 5:36 AM Sergei Kornilov wrote: > > Do you think that plausibly explains and resolves symptoms of bug#15585, > > too? > > I think yes. Bug#15585 raised only after "dsa_area could not attach to > segment" in different parallel worker. Leader stuck because waiting all > par

Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Sergei Kornilov
Hi > Do you think that plausibly explains and resolves symptoms of bug#15585, too? I think yes. Bug#15585 raised only after "dsa_area could not attach to segment" in different parallel worker. Leader stuck because waiting all parallel workers, but one worker has unexpected recursion in dsm_back

Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Justin Pryzby
On Fri, Feb 15, 2019 at 01:12:35AM +1300, Thomas Munro wrote: > The problem is that a DSM handle (ie a random number) can be reused > for a new segment immediately after the shared memory object has been > destroyed but before the DSM slot has been released. Now two DSM > slots have the same handl
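
The handle-reuse race Thomas describes above can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL's dsm.c: the `slot` struct, the three slot states, and the functions `attach_naive`, `attach_skip_dying` and `make_race` are all hypothetical names invented for illustration. It models a slot table in which one slot's shared memory object is already destroyed but the slot is not yet released ("dying"), while a second slot has coincidentally been given the same handle. A lookup that lets the first matching handle decide fails to attach; a lookup that ignores slots being torn down finds the live segment. Whether the pushed fix works this way is not stated in the preview text here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a slot table; NOT PostgreSQL's dsm.c. */
enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_DYING };

struct slot {
    uint32_t handle;        /* random identifier chosen at creation */
    enum slot_state state;  /* DYING: object destroyed, slot not yet released */
};

#define NSLOTS 4

/* Naive lookup: the first slot with a matching handle decides the outcome.
 * Returns the slot index on success, or -1 if attaching fails. */
int attach_naive(const struct slot *slots, uint32_t handle)
{
    assert(slots != NULL);
    for (int i = 0; i < NSLOTS; i++) {
        if (slots[i].state == SLOT_FREE)
            continue;
        if (slots[i].handle != handle)
            continue;
        /* A dying slot with the same handle hides any later live slot. */
        return slots[i].state == SLOT_LIVE ? i : -1;
    }
    return -1;
}

/* Alternative lookup: keep searching past slots that are being torn down,
 * because another slot may have reused the same handle value. */
int attach_skip_dying(const struct slot *slots, uint32_t handle)
{
    assert(slots != NULL);
    for (int i = 0; i < NSLOTS; i++) {
        if (slots[i].state == SLOT_LIVE && slots[i].handle == handle)
            return i;
    }
    return -1;
}

/* Build the problematic interleaving: slot 0 is dying and slot 1 is a new,
 * live segment that happened to receive the same handle. */
void make_race(struct slot *slots, uint32_t handle)
{
    assert(slots != NULL);
    for (int i = 0; i < NSLOTS; i++)
        slots[i] = (struct slot){ 0, SLOT_FREE };
    slots[0] = (struct slot){ handle, SLOT_DYING };
    slots[1] = (struct slot){ handle, SLOT_LIVE };
}
```

To try it, fill an array of `NSLOTS` slots with `make_race` and call both lookups with the same handle: the naive one reports failure even though a live segment with that handle exists, while the second returns the live slot's index.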

Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Sergei Kornilov
Hi! Great work, thank you! I can no longer reproduce the bug after a 30-minute test run. (Without the patch, the bug appeared after a minute or two.) regards, Sergei

Re: pg11.1: dsa_area could not attach to segment

2019-02-14 Thread Thomas Munro
On Tue, Feb 12, 2019 at 10:15 PM Sergei Kornilov wrote: > I still have error with parallel_leader_participation = off. Justin very kindly set up a virtual machine similar to the one where he'd seen the problem so I could experiment with it. Eventually I also managed to reproduce it locally, and

Re: pg11.1: dsa_area could not attach to segment

2019-02-12 Thread Sergei Kornilov
Hi > I think this is tentatively confirmed..I ran 20 loops for over 90 minutes with > no crash when parallel_leader_participation=off. > > On enabling parallel_leader_participation, crash within 10min. > > Sergei, could you confirm ? I still have error with parallel_leader_participation = off. On

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Thomas Munro
On Tue, Feb 12, 2019 at 4:27 PM Thomas Munro wrote: > On Tue, Feb 12, 2019 at 4:01 PM Justin Pryzby wrote: > > On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote: > > > I have a suspicion that this doesn't happen if > > > parallel_leader_participation=off. > > > > I think this is tenta

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Thomas Munro
On Tue, Feb 12, 2019 at 4:01 PM Justin Pryzby wrote: > On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote: > > I have a suspicion that this doesn't happen if > > parallel_leader_participation=off. > > I think this is tentatively confirmed..I ran 20 loops for over 90 minutes with > no cr

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Justin Pryzby
On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote: > I have a suspicion that this doesn't happen if > parallel_leader_participation=off. I think this is tentatively confirmed..I ran 20 loops for over 90 minutes with no crash when parallel_leader_participation=off. On enabling parallel

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Thomas Munro
On Tue, Feb 12, 2019 at 1:14 PM Justin Pryzby wrote: > On Tue, Feb 12, 2019 at 10:57:51AM +1100, Thomas Munro wrote: > > > On current REL_11_STABLE branch with PANIC level i see this backtrace for > > > failed parallel process: > > > > > > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Justin Pryzby
On Mon, Feb 11, 2019 at 08:14:28PM -0600, Justin Pryzby wrote: > > Can we please see the stderr output of dsa_dump(area), added just > > before the PANIC? Can we see the value of "handle" when the error is > > raised, and the directory listing for /dev/shm (assuming Linux) after > > the crash (may

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Justin Pryzby
On Tue, Feb 12, 2019 at 10:57:51AM +1100, Thomas Munro wrote: > > On current REL_11_STABLE branch with PANIC level i see this backtrace for > > failed parallel process: > > > > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > > #1 0x7f3b36983535 in __GI_abort () at

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Thomas Munro
On Tue, Feb 12, 2019 at 10:57 AM Thomas Munro wrote: > bogus shm_open() EEXIST from the OS Strike that particular idea... it'd be the non-DSM_OP_CREATE case, and if the file was somehow bogusly not visible to us we'd get ENOENT and that'd raise an error, and we aren't seeing that. -- Thomas Mun

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Thomas Munro
On Tue, Feb 12, 2019 at 1:51 AM Sergei Kornilov wrote: > > Here's confirmed steps to reproduce > > Wow, i confirm this testcase is reproducible for me. On my 4-core desktop i > see "dsa_area could not attach to segment" error after minute or two. Well that's something -- thanks for this report.

Re: pg11.1: dsa_area could not attach to segment

2019-02-11 Thread Sergei Kornilov
Hi > Here's confirmed steps to reproduce Wow, i confirm this testcase is reproducible for me. On my 4-core desktop i see "dsa_area could not attach to segment" error after minute or two. On current REL_11_STABLE branch with PANIC level i see this backtrace for failed parallel process: #0 __GI

Re: pg11.1: dsa_area could not attach to segment

2019-02-10 Thread Justin Pryzby
Hi, On Mon, Feb 11, 2019 at 11:11:32AM +1100, Thomas Munro wrote: > I haven't ever managed to reproduce that one yet. It's great you have > a reliable repro... Let's discuss it on the #15585 thread. I realized that I gave bad information (at least to Thomas). On the server where I've been repr

Re: pg11.1: dsa_area could not attach to segment

2019-02-07 Thread Justin Pryzby
On Wed, Feb 06, 2019 at 07:47:19PM -0600, Justin Pryzby wrote: > FYI, I wasn't yet able to make this work yet. > (gdb) print *segment_map->header > Cannot access memory at address 0x7f347e554000 I'm still not able to make this work. Actually this doesn't work even: (gdb) print *segment_map Canno

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Thomas Munro
On Thu, Feb 7, 2019 at 12:47 PM Justin Pryzby wrote: > However I *did* reproduce the error in an isolated, non-production postgres > instance. It's a total empty, untuned v11.1 initdb just for this, running > ONLY > a few simultaneous loops around just one query It looks like the simultaneous >

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Justin Pryzby
FYI, I wasn't able to make this work yet. (gdb) print *segment_map->header Cannot access memory at address 0x7f347e554000 However I *did* reproduce the error in an isolated, non-production postgres instance. It's a totally empty, untuned v11.1 initdb just for this, running ONLY a few simultaneo

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Jakub Glapa
> > It might be interesting to have CPU info, too. model name: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (virtualized vmware) and model name: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (bare metal) -- regards, pozdrawiam, Jakub Glapa On Wed, Feb 6, 2019 at 7:52 PM Justin Pryzby wrote:

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Justin Pryzby
On Wed, Feb 06, 2019 at 06:37:16PM +0100, Jakub Glapa wrote: > I'm seeing dsa_allocate on two different servers. > One is virtualized with VMWare the other is bare metal. Thanks. So it's not limited to vmware or VM at all. FYI here we've seen DSA errors on (and only on) two vmware VMs. It might

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Sergei Kornilov
Hi > Could you let us know which dsa_* error you were seeing, whether or not you > were running postgres under virtual environment, and (if so) which VM > hypervisor? The system from my report is an Amazon virtual server. lscpu says: Hypervisor vendor: Xen; Virtualization type: full. regards, Sergei

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Jakub Glapa
Hi Justin I'm seeing dsa_allocate on two different servers. One is virtualized with VMWare the other is bare metal. ubuntu@db1:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main.log 2019-02-03 17:03:03 CET:192.168.10.83(48336):foo@bar:[27979]: FATAL: dsa_allocate could not find 7 free pag

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Justin Pryzby
On Wed, Feb 06, 2019 at 04:22:12PM +1100, Thomas Munro wrote: > Can anyone else who has hit this comment on any virtualisation they > might be using? I don't think most of these people are on -hackers (one of the original reports was on -performance) so I'm copying them now. Could you let us know

Re: pg11.1: dsa_area could not attach to segment

2019-02-06 Thread Thomas Munro
On Wed, Feb 6, 2019 at 4:22 PM Thomas Munro wrote: > On Wed, Feb 6, 2019 at 1:10 PM Justin Pryzby wrote: > > This is a contrived query which I made up to try to exercise/stress bitmap > > scans based on Thomas's working hypothesis for this error/bug. This seems > > to > > be easier to hit than

Re: pg11.1: dsa_area could not attach to segment

2019-02-05 Thread Thomas Munro
On Wed, Feb 6, 2019 at 1:10 PM Justin Pryzby wrote: > This is a contrived query which I made up to try to exercise/stress bitmap > scans based on Thomas's working hypothesis for this error/bug. This seems to > be easier to hit than the other error ("could not attach to segment") - a loop > around

Re: pg11.1: dsa_area could not attach to segment

2019-02-05 Thread Justin Pryzby
I should have included query plan for the query which caused the "could not find free pages" error. This is a contrived query which I made up to try to exercise/stress bitmap scans based on Thomas's working hypothesis for this error/bug. This seems to be easier to hit than the other error ("could

Re: pg11.1: dsa_area could not attach to segment

2019-02-05 Thread Justin Pryzby
And here's the "dsa_allocate could not find %zu free pages" error with core. @@ -726,5 +728,5 @@ dsa_allocate_extended(dsa_area *area, size_t size, int flags) */ - if (!FreePageManagerGet(segment_map->fpm, npages, &first_page)) - elog(FATAL, -

Re: pg11.1: dsa_area could not attach to segment

2019-02-05 Thread Justin Pryzby
I finally reproduced this with core.. For some reason I needed to write assert() rather than elog(PANIC), otherwise it failed with ERROR and no core.. @@ -1741,4 +1743,5 @@ get_segment_by_index(dsa_area *area, dsa_segment_index index) segment = dsm_attach(handle); +

Re: pg11.1: dsa_area could not attach to segment

2019-01-02 Thread Thomas Munro
Hi Justin, On Tue, Jan 1, 2019 at 11:17 AM Justin Pryzby wrote: > dsa_area could not attach to segment /* * If we are reached by dsa_free or dsa_get_address, there must be at * least one object allocated in the referenced segment. Otherwise,

pg11.1: dsa_area could not attach to segment

2018-12-31 Thread Justin Pryzby
In our query logs I saw: postgres=# SELECT log_time, session_id, session_line, left(message,99), left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT LIKE 'cancel%'; -[ RECORD 1 ]+--