On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <dan...@turtleware.eu> wrote:
> I don't think it's related to shared vs static - rather two GCs running
> concurrently. Try commenting out the GC_init call in ECL and see what happens.

I don't understand how two GCs can run concurrently on a memory region
controlled by ECL, which is statically linked to GC... In fact, I am pretty
sure no other instances of GC are running anywhere within our process tree.

By the way, I don't know whether it's obvious from the backtrace that
cl_boot() has been completed, or not. If it actually was completed, could it
be a bug that invalidates the bit indicating that cl_boot() has been done?
We have seen similar troubles with clang recently, related to FPE.
There an FPE bit was flipped by assignment of a double to an integer type (sic!).
It took us a lot of head banging on various hard surfaces to debug this:
https://trac.sagemath.org/ticket/22799
It turned out we had hit a known bug:
https://bugs.llvm.org//show_bug.cgi?id=17686

>
> Do you need sigchld for anything? Run-program was rewritten and sigchld
> handling wasn't a viable option anymore for it.
>
We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we can now simply skip
it altogether.

Thanks,
Dima

> I'm on my phone, will be available after the weekend.
>
> Regards, D.
>
>
> On 1 September 2017 14:47:57 CEST, Dima Pasechnik <dimpase+...@gmail.com>
> wrote:
>>
>> Hi Daniel,
>> Thanks for the message. The scenario you talk about only happens if GC
>> is a shared library, right?
>>
>> I've rebuilt GC disabling shared libs, and ECL doing static linking to GC.
>> And I still get very similar segfaults:
>>
>> ;;; ECL C Backtrace
>> ;;; 0 ecl_internal_error (0x87d79b375)
>> ;;; 1 init_unixint (0x87d7c17e0)
>> ;;; 2 init_unixint (0x87d7c1582)
>> ;;; 3 pthread_sigmask (0x80103779d)
>> ;;; 4 pthread_getspecific (0x801036d6f)
>> ;;; 5 unknown (0x7ffffffff193)
>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)
>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
>> ;;; 8 GC_push_roots (0x87d7ef9c2)
>> ;;; 9 GC_mark_some (0x87d7ec97c)
>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)
>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
>> ;;; 12 GC_init (0x87d7f08ea)
>> ;;; 13 init_alloc (0x87d7d5669)
>> ;;; 14 cl_boot (0x87d69f66b)
>> ...
>>
>> And a very similar picture on the develop branch of ECL - although
>> I had to change our code, as in particular
>> ECL_OPT_TRAP_SIGCHLD is gone...
>>
>> So, what can it be? Some signals issue?
>>
>> Thanks,
>> Dima
>>
>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <dan...@turtleware.eu>
>> wrote:
>>>
>>> Hey Dima,
>>>
>>> this looks like the issue with having GC initialized before ECL kicks
>>> in.
>>> See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>> discussion about this problem. Basically some other component already
>>> called
>>> GC_init and ECL calls it once more. It's arguably not a bug.
>>>
>>> Best regards,
>>>
>>> Daniel
>>>
>>>
>>> On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>
>>>>
>>>> Dear all,
>>>>
>>>> I'm struggling to understand strange segfaults coming from
>>>> ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>> follows:
>>>>
>>>> Got signal before environment was installed on our thread
>>>> [2: No such file or directory]
>>>>
>>>> ;;; ECL C Backtrace
>>>> ;;; 0 ecl_internal_error (0x87d790765)
>>>> ;;; 1 init_unixint (0x87d7b6bd0)
>>>> ;;; 2 init_unixint (0x87d7b6972)
>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>> ;;; 5 unknown (0x7ffffffff193)
>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)
>>>> ;;; 7 GC_mark_some (0x87db12eec)
>>>> ;;; 8 GC_stopped_mark (0x87db09baa)
>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)
>>>> ;;; 10 GC_init (0x87db16f4f)
>>>> ;;; 11 init_alloc (0x87d7caa59)
>>>> ;;; 12 cl_boot (0x87d694a5b)
>>>> ;;; 13 initecl (0x87d218340)
>>>> ;;; 14 initecl (0x87d20a43f)
>>>> ;;; 15 initecl (0x87d207e28)
>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)
>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)
>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)
>>>> ;;; 21 PyObject_Call (0x800a7d3e3)
>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)
>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)
>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)
>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)
>>>> ;;; 28 PyImport_AppendInittab (0x800b3d1a8)
>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)
>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
>>>> Segmentation fault (core dumped)
>>>>
>>>> It looks as if ECL (version 16.1.2) is being called before an
>>>> initialisation is complete, but is it possible to say more without a
>>>> debugger?
>>>>
>>>> More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>> with libatomic_ops version 7.4.6.
>>>> And only reproducible on FreeBSD.
>>>>
>>>> ECL is built with --disable-threads; GC is built with or without
>>>> threads---result is still the same.
>>>> (so it's unclear to me where pthread_* calls in the trace
>>>> come from).
>>>>
>>>> Thanks,
>>>> Dima
>>>>
>>>> PS. the segfault is at the bottom of
>>>> https://trac.sagemath.org/ticket/22679#comment:87
>>>
>>>
>>>
>
> -- Sent with K-9 Mail.
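
The message "Got signal before environment was installed on our thread" in the traces above suggests that a signal handler ran before the runtime had finished setting up its environment. Below is a minimal sketch of how that situation can arise, assuming a handler that is installed early and checks an environment pointer on entry. All names here (`struct env`, `install_handler`, `install_env`, the counters) are hypothetical and are not ECL's actual internals; SIGUSR1 is used only as a harmless stand-in for whatever signal was really delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a runtime's environment; not ECL's real type. */
struct env { int booted; };

static struct env env_storage;
static struct env *volatile current_env = NULL;    /* NULL until boot finishes */
static volatile sig_atomic_t early_signals = 0;    /* delivered with no env yet */
static volatile sig_atomic_t handled_signals = 0;  /* delivered after env exists */

/* The handler can only tell the two cases apart by looking at the pointer. */
static void on_signal(int signo)
{
    (void)signo;
    if (current_env == NULL)
        early_signals++;     /* the "before environment was installed" case */
    else
        handled_signals++;
}

/* Assumption: the handler is installed before the environment is ready. */
int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Marks the end of the (hypothetical) boot sequence; must run only once. */
void install_env(void)
{
    assert(current_env == NULL);
    env_storage.booted = 1;
    current_env = &env_storage;
}

int early_count(void)   { return (int)early_signals; }
int handled_count(void) { return (int)handled_signals; }
```

Calling `install_handler()` and then `raise(SIGUSR1)` increments the early counter; after `install_env()`, a second `raise(SIGUSR1)` is counted as handled. In the backtraces above the signal appears to arrive while GC_init is still running underneath init_alloc, which would correspond to the first case.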