On Fri, Feb 15, 2019 at 2:31 AM Sergei Kornilov wrote:
> I cannot reproduce the bug after a 30-minute test run. (Without the patch, the
> bug appeared after a minute or two.)
Thank you Justin and Sergei for all your help reproducing and testing this.
Fix pushed to all supported releases. It's lightly refactored from the …
On Fri, Feb 15, 2019 at 5:36 AM Sergei Kornilov wrote:
> > Do you think that plausibly explains and resolves the symptoms of bug #15585,
> > too?
>
> I think so. Bug #15585 was raised only after "dsa_area could not attach to
> segment" in a different parallel worker. The leader was stuck because it was
> waiting for all parallel workers …
Hi
> Do you think that plausibly explains and resolves the symptoms of bug #15585, too?
I think so. Bug #15585 was raised only after "dsa_area could not attach to segment"
in a different parallel worker. The leader was stuck because it was waiting for all
parallel workers, but one worker had unexpected recursion in dsm_back…
On Fri, Feb 15, 2019 at 01:12:35AM +1300, Thomas Munro wrote:
> The problem is that a DSM handle (i.e., a random number) can be reused
> for a new segment immediately after the shared memory object has been
> destroyed but before the DSM slot has been released. Now two DSM
> slots have the same handle …
Hi!
Great work, thank you!
I cannot reproduce the bug after a 30-minute test run. (Without the patch, the
bug appeared after a minute or two.)
regards Sergei
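To make the handle-reuse race quoted above concrete, here is a minimal toy sketch in plain C (not PostgreSQL's dsm.c; slot_t, lookup_by_handle and all other names are invented for illustration): a control slot still holds its handle after the underlying shared memory object has been destroyed, a new segment draws the same random handle, and an attach-by-handle lookup can then land on the stale slot.

/*
 * Toy illustration of the DSM handle-reuse race (not PostgreSQL source;
 * all names invented). Slot 0 keeps its handle after its shared memory
 * object is destroyed; slot 1 reuses the same "random" handle before
 * slot 0 is released, so a lookup by handle can pick the stale slot.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSLOTS 4

typedef struct { uint32_t handle; bool in_use; } slot_t;
static slot_t slots[NSLOTS];

static int lookup_by_handle(uint32_t handle)
{
    for (int i = 0; i < NSLOTS; i++)
        if (slots[i].in_use && slots[i].handle == handle)
            return i;    /* first match wins -- may be the stale slot */
    return -1;
}

int main(void)
{
    uint32_t h = 42;    /* pretend this is the random handle */

    slots[0] = (slot_t) { .handle = h, .in_use = true };
    /* ...the shared memory object behind slot 0 is destroyed here,
     * but slot 0 itself has not been released yet... */

    slots[1] = (slot_t) { .handle = h, .in_use = true };    /* same handle reused */

    printf("attach(handle=%u) -> slot %d (expected 1)\n",
           (unsigned) h, lookup_by_handle(h));
    return 0;
}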
On Tue, Feb 12, 2019 at 10:15 PM Sergei Kornilov wrote:
> I still get the error with parallel_leader_participation = off.
Justin very kindly set up a virtual machine similar to the one where
he'd seen the problem so I could experiment with it. Eventually I
also managed to reproduce it locally, and
Hi
> I think this is tentatively confirmed... I ran 20 loops for over 90 minutes with
> no crash when parallel_leader_participation=off.
>
> On enabling parallel_leader_participation, it crashed within 10 minutes.
>
> Sergei, could you confirm?
I still get the error with parallel_leader_participation = off. On …
On Tue, Feb 12, 2019 at 4:27 PM Thomas Munro wrote:
> On Tue, Feb 12, 2019 at 4:01 PM Justin Pryzby wrote:
> > On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote:
> > > I have a suspicion that this doesn't happen if
> > > parallel_leader_participation=off.
> >
> > I think this is tentatively confirmed …
On Tue, Feb 12, 2019 at 4:01 PM Justin Pryzby wrote:
> On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote:
> > I have a suspicion that this doesn't happen if
> > parallel_leader_participation=off.
>
> I think this is tentatively confirmed... I ran 20 loops for over 90 minutes with
> no crash when parallel_leader_participation=off. …
On Mon, Feb 11, 2019 at 08:43:14PM -0600, Justin Pryzby wrote:
> I have a suspicion that this doesn't happen if
> parallel_leader_participation=off.
I think this is tentatively confirmed... I ran 20 loops for over 90 minutes with
no crash when parallel_leader_participation=off.
On enabling parallel_leader_participation, it crashed within 10 minutes. …
On Tue, Feb 12, 2019 at 1:14 PM Justin Pryzby wrote:
> On Tue, Feb 12, 2019 at 10:57:51AM +1100, Thomas Munro wrote:
> > > On the current REL_11_STABLE branch with PANIC level I see this backtrace for
> > > the failed parallel process:
> > >
> > > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 …
On Mon, Feb 11, 2019 at 08:14:28PM -0600, Justin Pryzby wrote:
> > Can we please see the stderr output of dsa_dump(area), added just
> > before the PANIC? Can we see the value of "handle" when the error is
> > raised, and the directory listing for /dev/shm (assuming Linux) after
> > the crash (may…
On Tue, Feb 12, 2019 at 10:57:51AM +1100, Thomas Munro wrote:
> > On the current REL_11_STABLE branch with PANIC level I see this backtrace for
> > the failed parallel process:
> >
> > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1 0x7f3b36983535 in __GI_abort () at
On Tue, Feb 12, 2019 at 10:57 AM Thomas Munro wrote:
> bogus shm_open() EEXIST from the OS
Strike that particular idea... it'd be the non-DSM_OP_CREATE case, and
if the file was somehow bogusly not visible to us we'd get ENOENT and
that'd raise an error, and we aren't seeing that.
--
Thomas Munro
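For reference, a minimal standalone sketch of the distinction being drawn here, using plain POSIX shm_open() rather than PostgreSQL's actual DSM code (the segment names are made up): only the create path, which passes O_CREAT | O_EXCL, can fail with EEXIST, while the attach path opens without O_CREAT and reports a missing object as ENOENT.

/*
 * Sketch only: POSIX shm_open() semantics; segment names invented.
 * May need -lrt on older glibc.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Create the object once, then try to create it again: only the
     * O_CREAT | O_EXCL ("create") path can see EEXIST. */
    int fd = shm_open("/demo_segment", O_RDWR | O_CREAT | O_EXCL, 0600);

    if (shm_open("/demo_segment", O_RDWR | O_CREAT | O_EXCL, 0600) < 0 &&
        errno == EEXIST)
        printf("create path: EEXIST\n");

    /* The attach path opens without O_CREAT; a missing object gives
     * ENOENT (and hence an error), never EEXIST. */
    if (shm_open("/no_such_segment", O_RDWR, 0) < 0 && errno == ENOENT)
        printf("attach path: ENOENT\n");

    if (fd >= 0)
        close(fd);
    shm_unlink("/demo_segment");
    return 0;
}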
On Tue, Feb 12, 2019 at 1:51 AM Sergei Kornilov wrote:
> > Here are confirmed steps to reproduce.
>
> Wow, I confirm this test case is reproducible for me. On my 4-core desktop I
> see the "dsa_area could not attach to segment" error after a minute or two.
Well that's something -- thanks for this report.
Hi
> Here are confirmed steps to reproduce.
Wow, I confirm this test case is reproducible for me. On my 4-core desktop I see
the "dsa_area could not attach to segment" error after a minute or two.
On the current REL_11_STABLE branch with PANIC level I see this backtrace for the
failed parallel process:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 …
Hi,
On Mon, Feb 11, 2019 at 11:11:32AM +1100, Thomas Munro wrote:
> I haven't ever managed to reproduce that one yet. It's great you have
> a reliable repro... Let's discuss it on the #15585 thread.
I realized that I gave bad information (at least to Thomas). On the server
where I've been reproducing …
On Wed, Feb 06, 2019 at 07:47:19PM -0600, Justin Pryzby wrote:
> FYI, I wasn't able to make this work yet.
> (gdb) print *segment_map->header
> Cannot access memory at address 0x7f347e554000
I'm still not able to make this work. Actually, even this doesn't work:
(gdb) print *segment_map
Cannot access memory …
On Thu, Feb 7, 2019 at 12:47 PM Justin Pryzby wrote:
> However I *did* reproduce the error in an isolated, non-production postgres
> instance. It's a totally empty, untuned v11.1 initdb just for this, running ONLY
> a few simultaneous loops around just one query. It looks like the simultaneous …
FYI, I wasn't able to make this work yet.
(gdb) print *segment_map->header
Cannot access memory at address 0x7f347e554000
However I *did* reproduce the error in an isolated, non-production postgres
instance. It's a totally empty, untuned v11.1 initdb just for this, running ONLY
a few simultaneous loops around just one query. …
>
> It might be interesting to have CPU info, too.
model name: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (virtualized, VMware)
and
model name: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (bare metal)
--
regards,
pozdrawiam,
Jakub Glapa
On Wed, Feb 6, 2019 at 7:52 PM Justin Pryzby wrote:
On Wed, Feb 06, 2019 at 06:37:16PM +0100, Jakub Glapa wrote:
> I'm seeing dsa_allocate errors on two different servers.
> One is virtualized with VMware, the other is bare metal.
Thanks. So it's not limited to VMware, or to VMs at all.
FYI here we've seen DSA errors on (and only on) two VMware VMs.
It might be interesting to have CPU info, too.
Hi
> Could you let us know which dsa_* error you were seeing, whether or not you
> were running postgres under a virtual environment, and (if so) which VM
> hypervisor?
The system from my report is an Amazon virtual server. lscpu says:
Hypervisor vendor: Xen
Virtualization type: full
regards, Sergei
Hi Justin
I'm seeing dsa_allocate errors on two different servers.
One is virtualized with VMware, the other is bare metal.
ubuntu@db1:~$ grep dsa_allocate /var/log/postgresql/postgresql-11-main.log
2019-02-03 17:03:03 CET:192.168.10.83(48336):foo@bar:[27979]: FATAL: dsa_allocate could not find 7 free pages …
On Wed, Feb 06, 2019 at 04:22:12PM +1100, Thomas Munro wrote:
> Can anyone else who has hit this comment on any virtualisation they
> might be using?
I don't think most of these people are on -hackers (one of the original reports
was on -performance) so I'm copying them now.
Could you let us know which dsa_* error you were seeing, whether or not you were
running postgres under a virtual environment, and (if so) which VM hypervisor?
On Wed, Feb 6, 2019 at 4:22 PM Thomas Munro wrote:
> On Wed, Feb 6, 2019 at 1:10 PM Justin Pryzby wrote:
> > This is a contrived query which I made up to try to exercise/stress bitmap
> > scans based on Thomas's working hypothesis for this error/bug. This seems to
> > be easier to hit than the other error ("could not attach to segment") …
On Wed, Feb 6, 2019 at 1:10 PM Justin Pryzby wrote:
> This is a contrived query which I made up to try to exercise/stress bitmap
> scans based on Thomas's working hypothesis for this error/bug. This seems to
> be easier to hit than the other error ("could not attach to segment") - a loop
> around
I should have included the query plan for the query which caused the "could not
find free pages" error.
This is a contrived query which I made up to try to exercise/stress bitmap
scans based on Thomas's working hypothesis for this error/bug. This seems to
be easier to hit than the other error ("could not attach to segment") …
And here's the "dsa_allocate could not find %zu free pages" error with core.
@@ -726,5 +728,5 @@ dsa_allocate_extended(dsa_area *area, size_t size, int flags)
 	 */
-	if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
-		elog(FATAL,
-
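As background to that elog: the surrounding code only asks for a run of pages from a segment it believes has enough contiguous free space, so a failing lookup suggests the free-page bookkeeping no longer matches reality. Below is a simplified, self-contained model of that contract, using a flat bitmap rather than PostgreSQL's FreePageManager; all names are invented for illustration.

/*
 * Toy model of "get me npages contiguous free pages" over one segment.
 * Not PostgreSQL code; a failing get here plays the role of the
 * "dsa_allocate could not find N free pages" condition.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SEG_PAGES 16
static bool page_free[SEG_PAGES];    /* true = page is free */

/* Find a run of npages contiguous free pages; return its first page or -1. */
static int free_pages_get(size_t npages)
{
    size_t run = 0;
    for (size_t i = 0; i < SEG_PAGES; i++)
    {
        run = page_free[i] ? run + 1 : 0;
        if (run == npages)
        {
            size_t first = i + 1 - npages;
            for (size_t j = first; j <= i; j++)
                page_free[j] = false;          /* mark the run allocated */
            return (int) first;
        }
    }
    return -1;    /* the "could not find N free pages" case */
}

int main(void)
{
    for (size_t i = 0; i < SEG_PAGES; i++)
        page_free[i] = true;

    printf("got run of 7 at page %d\n", free_pages_get(7));
    printf("run of 12 now: %d (only 9 pages left)\n", free_pages_get(12));
    return 0;
}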
I finally reproduced this with a core dump.
For some reason I needed to write assert() rather than elog(PANIC); otherwise
it failed with ERROR and no core.
@@ -1741,4 +1743,5 @@ get_segment_by_index(dsa_area *area, dsa_segment_index index)
 	segment = dsm_attach(handle);
+
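A stripped-down illustration of the trick described above, in plain C rather than the actual patch (get_free_pages() is just a stand-in for the failing call): a failed assert() calls abort(), which reliably produces a core file when "ulimit -c" allows it, independent of how the server's error machinery would otherwise have reported the problem.

/* Sketch only: force a core dump at the failure point via assert()/abort(). */
#include <assert.h>
#include <stdbool.h>

static bool get_free_pages(void)
{
    return false;    /* stand-in for the call that unexpectedly fails */
}

int main(void)
{
    bool ok = get_free_pages();

    /* assert() aborts the process on failure, so the kernel writes a core
     * file (given "ulimit -c unlimited"), whereas an error report routed
     * through logging can end up downgraded with no core. */
    assert(ok);
    return 0;
}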
Hi Justin,
On Tue, Jan 1, 2019 at 11:17 AM Justin Pryzby wrote:
> dsa_area could not attach to segment
/*
 * If we are reached by dsa_free or dsa_get_address, there must be at
 * least one object allocated in the referenced segment. Otherwise, …
In our query logs I saw:
postgres=# SELECT log_time, session_id, session_line, left(message,99),
left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT
LIKE 'cancel%';
-[ RECORD 1 ]+-- …