> On Sep 21, 2017, at 8:31 AM, Dima Pasechnik <dimpase+...@gmail.com> wrote:
>
> On Tue, Sep 12, 2017 at 1:18 AM, Fabrizio Fabbri <strabix...@yahoo.com> wrote:
> >
> >> On Sep 11, 2017, at 7:13 PM, Dima Pasechnik <dimpase+...@gmail.com> wrote:
> >>
> >>> On Mon, Sep 4, 2017 at 11:15 AM, Daniel Kochmański <dan...@turtleware.eu> wrote:
> >>> From the backtrace it is certain that the failure is caused inside the call to
> >>> GC_init. Such errors are known to have happened when another GC was
> >>> already initialized on the system (I've linked the issue). It might be
> >>> caused by something else in bdwgc, I don't know. Either way, I'd focus on
> >>> the GC_init part.
> >>
> >> Our project (sagemath) only uses libgc within the embedded ECL. Thus I
> >> am really puzzled how another libgc instance might kick in and spoil
> >> the game for ECL.
> >>
> >> One possibility is that clang is using libgc, and thus, in principle,
> >> libgc might be sitting somewhere in the runtime?!
> >>
> >>> To make sure that I'm right with my assertion, you may put a printf before
> >>> and after the call to GC_init. I'm not familiar enough with bdwgc internals
> >>> to say what is wrong, though. Maybe updating the bundled sources of GC will
> >>> help? Or linking with libgc on the system? It might be a bug in bdwgc
> >>> that has already been fixed.
> >>
> >> We are not using the bdwgc shipped with ECL; we use a separate libgc
> >> 7.6.0, which is the latest stable.
> >> (Is there a reason to ship bdwgc sources with ECL - do you patch it?)
> >
> > I'm using ECL with the non-embedded bdwgc as well, and I don't have any issues.
> >
> > Ensure that bdwgc is not also built statically into ECL. I would expect
> > linking problems in that case, but it's worth double-checking.
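Daniel's suggestion of a printf before and after GC_init can be sketched as follows. This is a minimal illustration, not ECL code. TRACE_CALL, fake_gc_init and traced_init are hypothetical names, and fake_gc_init only stands in for bdwgc's GC_init() so the snippet compiles without the library. Writing to stderr matters here. The C standard requires stderr not to be fully buffered, whereas stdout output (for example when redirected to a file) can sit in a buffer and be lost if the process segfaults inside the call.

```c
#include <stdio.h>

/* Print a marker to stderr before and after a call, including the call's
 * source text and location. If the call crashes the process, the "before"
 * line has already been written and the "after" line never appears.
 * stderr is not fully buffered; the fflush calls are an extra guarantee. */
#define TRACE_CALL(call)                                    \
    do {                                                    \
        fprintf(stderr, "[trace] before %s (%s:%d)\n",      \
                #call, __FILE__, __LINE__);                 \
        fflush(stderr);                                     \
        call;                                               \
        fprintf(stderr, "[trace] after %s\n", #call);       \
        fflush(stderr);                                     \
    } while (0)

/* Hypothetical stand-in for bdwgc's GC_init(), so this compiles on its own. */
static int init_calls = 0;
static void fake_gc_init(void) { init_calls++; }

/* Hypothetical caller: returns how many times the init function has run. */
static int traced_init(void) {
    TRACE_CALL(fake_gc_init());
    return init_calls;
}
```

According to the backtraces in this thread, the corresponding real call sits inside ECL's init_alloc(). If you rebuild ECL from source, you would wrap it there as TRACE_CALL(GC_init()).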
>
> Here is part of a stack trace from the debugger, in a scenario where
> a call to embedded ECL from Python leads to an ECL stack overflow on
> an already initialised ECL; it seems to be related to the particular thread
> this call comes from (another, usual, calling sequence
> does not lead to crashes). There is no mention of GC in the stack trace.
>
If the current thread is created outside the Lisp environment, you need to import it before calling any ECL function. That is done with

ecl_import_current_thread
ecl_release_current_thread

You can see an example here:
https://gitlab.com/embeddable-common-lisp/ecl/tree/develop/examples/threads/import

Maybe you already do that, but it's worth mentioning.

Best
F.

> This looks to me like a lack of thread safety on the ECL side, although I might be
> wrong.
> ...
> frame #16: 0x000000088444b9d6 libecl.so.16.1`si_serror(narg=6, cformat=0x0000000000d27ba0, eformat=0x00000008847d12a0) at error.d:549
> frame #17: 0x000000088448bd42 libecl.so.16.1`ecl_cs_overflow at stacks.d:76
> frame #18: 0x00000008844168af libecl.so.16.1`ecl_interpret(frame=0x00007fffdeff2658, env=0x0000000000000001, bytecodes=0x0000000000db33c0) at interpreter.d:286
> frame #19: 0x0000000884414afc libecl.so.16.1`ecl_apply_from_stack_frame(frame=0x00007fffdeff2658, x=0x0000000000db33c0) at eval.d:79
> frame #20: 0x000000088441545b libecl.so.16.1`cl_apply(narg=0, fun=0x0000000000db33c0, lastarg=0x0000000000000001) at eval.d:164
> frame #21: 0x0000000883e0e1b4 ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_safe_funcall(__pyx_v_func=0x0000000000769600, __pyx_v_arg=0x0000000000e6dfa0) at ecl.c:5831
> frame #22: 0x0000000883e0d519 ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_safe_read_string(__pyx_v_s="(setf *load-verbose* NIL)") at ecl.c:6084
> frame #23: 0x0000000883e0d02b ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_eval(__pyx_v_s=0x0000000882add970, __pyx_skip_dispatch=0) at ecl.c:10682
> frame #24: 0x0000000883e0cd4c ecl.so`__pyx_pf_4sage_4libs_3ecl_10ecl_eval(__pyx_self=0x0000000000000000, __pyx_v_s=0x0000000882add970) at ecl.c:10762
> frame #25: 0x0000000883e0cab7 ecl.so`__pyx_pw_4sage_4libs_3ecl_11ecl_eval(__pyx_self=0x0000000000000000, __pyx_v_s=0x0000000882add970) at ecl.c:10745
> frame #26: 0x0000000800d8a68f libpython2.7.so.1`call_function(pp_stack=0x00007fffdeff2c00, oparg=1) at ceval.c:4340
> frame #27: 0x0000000800d854d2 libpython2.7.so.1`PyEval_EvalFrameEx(f=0x00000008829939b0, throwflag=0) at ceval.c:2989
> ...
> frame #91: 0x0000000800d88361 libpython2.7.so.1`PyEval_CallObjectWithKeywords(func=0x000000087cdf99e0, arg=0x000000080064e060, kw=0x0000000000000000) at ceval.c:4221
> frame #92: 0x0000000800de60d1 libpython2.7.so.1`t_bootstrap(boot_raw=0x0000000807015598) at threadmodule.c:620
> frame #93: 0x00000008012d3b55 libthr.so.3`___lldb_unnamed_symbol1$$libthr.so.3 + 325
>
> >> Thanks,
> >> Dima
> >>
> >>> Regards,
> >>>
> >>> Daniel
> >>>
> >>>> On 04.09.2017 12:04, Dima Pasechnik wrote:
> >>>>
> >>>> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <dan...@turtleware.eu> wrote:
> >>>>>
> >>>>> I don't think it's related to shared vs. static, but rather to two GCs
> >>>>> running concurrently. Try commenting out the GC_init call in ECL and see
> >>>>> what happens.
> >>>>
> >>>> I don't understand how two GCs can run concurrently on a memory region
> >>>> controlled by ECL, which is statically linked to GC...
> >>>> In fact, I am pretty sure no other instances of GC are running anywhere
> >>>> within our process tree.
> >>>>
> >>>> By the way, I don't know whether it's obvious from the backtrace that
> >>>> cl_boot() has completed or not.
> >>>>
> >>>> If it actually has completed, could it be a bug that invalidates the
> >>>> bit indicating that cl_boot() has been done?
> >>>>
> >>>> We have seen similar troubles with clang recently, related to FPE.
> >>>> There an FPE bit was flipped by assignment of a double to an
> >>>> integer type (sic!).
> >>>> It took us a lot of head banging on various hard surfaces to debug this:
> >>>> https://trac.sagemath.org/ticket/22799
> >>>> It turned out we had hit a known bug:
> >>>> https://bugs.llvm.org//show_bug.cgi?id=17686
> >>>>
> >>>>> Do you need SIGCHLD for anything? run-program was rewritten, and SIGCHLD
> >>>>> handling wasn't a viable option for it anymore.
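Fabrizio's point about threads created outside ECL (such as the Python thread started via t_bootstrap at the bottom of the trace above) can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not ECL's implementation. struct env, import_current_thread, release_current_thread and runtime_call are hypothetical stand-ins for ECL's per-thread Lisp environment, ecl_import_current_thread, ecl_release_current_thread and an arbitrary ECL call. The environment lives in thread-local storage, so a thread that was never imported finds none, and the model refuses the call.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread environment, standing in for ECL's. */
struct env {
    int calls; /* number of runtime calls made on this thread */
};

/* One pointer per thread; NULL means "this thread was never imported". */
static _Thread_local struct env *current_env = NULL;

/* Hypothetical analogue of ecl_import_current_thread(). Returns 0 on success. */
static int import_current_thread(void) {
    if (current_env != NULL)
        return 0; /* already imported */
    current_env = calloc(1, sizeof *current_env);
    return current_env != NULL ? 0 : -1;
}

/* Hypothetical analogue of ecl_release_current_thread(). */
static void release_current_thread(void) {
    free(current_env);
    current_env = NULL;
}

/* Hypothetical runtime entry point: refuses to run without an environment. */
static int runtime_call(int x) {
    if (current_env == NULL)
        return -1; /* no environment installed on this thread */
    current_env->calls++;
    return 2 * x;
}

struct job {
    int input;
    int before_import; /* result of calling before importing the thread */
    int after_import;  /* result of calling after importing the thread */
};

/* Entry point of a thread created outside the runtime (e.g. by Python). */
static void *foreign_thread(void *arg) {
    struct job *job = arg;
    job->before_import = runtime_call(job->input);
    if (import_current_thread() == 0) {
        job->after_import = runtime_call(job->input);
        release_current_thread(); /* pair every import with a release */
    }
    return NULL;
}

/* Start one foreign thread and wait for it; returns 0 on success. */
static int run_foreign_thread(struct job *job) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, foreign_thread, job) != 0)
        return -1;
    return pthread_join(tid, NULL);
}
```

The model is only meant to show the import/call/release pairing on each foreign thread. For the exact arguments of the real ECL functions, follow the examples/threads/import code linked above.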
> >>>>>
> >>>> We do set ECL_OPT_TRAP_SIGCHLD to 0, so I presume we
> >>>> can now simply skip it altogether.
> >>>>
> >>>> Thanks,
> >>>> Dima
> >>>>
> >>>>> I'm on my phone; I will be available after the weekend.
> >>>>>
> >>>>> Regards, D.
> >>>>>
> >>>>> On 1 September 2017 14:47:57 CEST, Dima Pasechnik <dimpase+...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Daniel,
> >>>>>> Thanks for the message. The scenario you talk about only happens if GC
> >>>>>> is a shared library, right?
> >>>>>>
> >>>>>> I've rebuilt GC with shared libs disabled, and ECL linking statically to
> >>>>>> GC.
> >>>>>> And I still get very similar segfaults:
> >>>>>>
> >>>>>> ;;; ECL C Backtrace
> >>>>>> ;;; 0 ecl_internal_error (0x87d79b375)
> >>>>>> ;;; 1 init_unixint (0x87d7c17e0)
> >>>>>> ;;; 2 init_unixint (0x87d7c1582)
> >>>>>> ;;; 3 pthread_sigmask (0x80103779d)
> >>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
> >>>>>> ;;; 5 unknown (0x7ffffffff193)
> >>>>>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)
> >>>>>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
> >>>>>> ;;; 8 GC_push_roots (0x87d7ef9c2)
> >>>>>> ;;; 9 GC_mark_some (0x87d7ec97c)
> >>>>>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)
> >>>>>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
> >>>>>> ;;; 12 GC_init (0x87d7f08ea)
> >>>>>> ;;; 13 init_alloc (0x87d7d5669)
> >>>>>> ;;; 14 cl_boot (0x87d69f66b)
> >>>>>> ...
> >>>>>>
> >>>>>> And a very similar picture on the develop branch of ECL, although
> >>>>>> I had to change our code, as in particular
> >>>>>> ECL_OPT_TRAP_SIGCHLD is gone...
> >>>>>>
> >>>>>> So, what can it be? Some signals issue?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Dima
> >>>>>>
> >>>>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <dan...@turtleware.eu> wrote:
> >>>>>>>
> >>>>>>> Hey Dima,
> >>>>>>>
> >>>>>>> this looks like the issue with having GC initialized before ECL kicks
> >>>>>>> in.
> >>>>>>> See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
> >>>>>>> discussion about this problem. Basically, some other component already
> >>>>>>> called GC_init, and ECL calls it once more. It's arguably not a bug.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Daniel
> >>>>>>>
> >>>>>>>> On 31.08.2017 15:29, Dima Pasechnik wrote:
> >>>>>>>>
> >>>>>>>> Dear all,
> >>>>>>>>
> >>>>>>>> I'm struggling to understand strange segfaults coming from
> >>>>>>>> ECL (+Maxima) embedded into Python on FreeBSD; they typically look as
> >>>>>>>> follows:
> >>>>>>>>
> >>>>>>>> Got signal before environment was installed on our thread
> >>>>>>>> [2: No such file or directory]
> >>>>>>>>
> >>>>>>>> ;;; ECL C Backtrace
> >>>>>>>> ;;; 0 ecl_internal_error (0x87d790765)
> >>>>>>>> ;;; 1 init_unixint (0x87d7b6bd0)
> >>>>>>>> ;;; 2 init_unixint (0x87d7b6972)
> >>>>>>>> ;;; 3 pthread_sigmask (0x80103779d)
> >>>>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
> >>>>>>>> ;;; 5 unknown (0x7ffffffff193)
> >>>>>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)
> >>>>>>>> ;;; 7 GC_mark_some (0x87db12eec)
> >>>>>>>> ;;; 8 GC_stopped_mark (0x87db09baa)
> >>>>>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)
> >>>>>>>> ;;; 10 GC_init (0x87db16f4f)
> >>>>>>>> ;;; 11 init_alloc (0x87d7caa59)
> >>>>>>>> ;;; 12 cl_boot (0x87d694a5b)
> >>>>>>>> ;;; 13 initecl (0x87d218340)
> >>>>>>>> ;;; 14 initecl (0x87d20a43f)
> >>>>>>>> ;;; 15 initecl (0x87d207e28)
> >>>>>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
> >>>>>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)
> >>>>>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)
> >>>>>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
> >>>>>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)
> >>>>>>>> ;;; 21 PyObject_Call (0x800a7d3e3)
> >>>>>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)
> >>>>>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
> >>>>>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)
> >>>>>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
> >>>>>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)
> >>>>>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)
> >>>>>>>> ;;; 28 PyImport_AppendInittab (0x800b3d1a8)
> >>>>>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
> >>>>>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)
> >>>>>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
> >>>>>>>> Segmentation fault (core dumped)
> >>>>>>>>
> >>>>>>>> It looks as if ECL (version 16.1.2) is being called before its
> >>>>>>>> initialisation is complete, but is it possible to say more without a
> >>>>>>>> debugger?
> >>>>>>>>
> >>>>>>>> More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
> >>>>>>>> with libatomic_ops version 7.4.6.
> >>>>>>>> And it is only reproducible on FreeBSD.
> >>>>>>>>
> >>>>>>>> ECL is built with --disable-threads; GC is built with or without
> >>>>>>>> threads, and the result is still the same
> >>>>>>>> (so it's unclear to me where the pthread_* calls in the trace
> >>>>>>>> come from).
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Dima
> >>>>>>>>
> >>>>>>>> PS. The segfault is at the bottom of
> >>>>>>>> https://trac.sagemath.org/ticket/22679#comment:87
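Two of the theories in this thread come down to one-time initialization: GC_init being called a second time, and ECL being entered before its boot has completed. A common way for an embedding layer to guard its own boot is POSIX pthread_once. The sketch below assumes a hypothetical boot_runtime() standing in for the real cl_boot()/GC_init() sequence, and a hypothetical ensure_booted() wrapper that every entry point would call. It does not stop some other library in the same process from calling GC_init independently.

```c
#include <pthread.h>
#include <stdbool.h>

static int boot_count = 0;  /* how many times the boot routine actually ran */
static bool booted = false; /* the "boot has completed" bit */

/* Hypothetical stand-in for the one-time runtime boot
 * (cl_boot() in ECL, which reaches GC_init() via init_alloc()). */
static void boot_runtime(void) {
    boot_count++;
    booted = true;
}

static pthread_once_t boot_once = PTHREAD_ONCE_INIT;

/* Call from every entry point into the runtime. pthread_once runs
 * boot_runtime at most once, and concurrent callers wait until it has
 * returned, so no caller proceeds while the boot is still in progress. */
static int ensure_booted(void) {
    if (pthread_once(&boot_once, boot_runtime) != 0)
        return -1;
    return booted ? 0 : -1;
}
```

Repeated calls to ensure_booted() are cheap after the first one, and the single boot_count increment makes a double initialization easy to detect while debugging.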