Hello Daniel,

thanks for taking the time to look at this. After a deep dive into debugging, I was finally able to reproduce the problem and quickly realized that I had caused all this havoc myself by mixing C++ and C semantics, all to save myself from writing one more line of code...
Long story short, I was somehow piecing together cl_objects from other cl_objects which had already been "destroyed" by a transient std::vector I was using somewhere. Any code that worked with these objects worked most of the time (calling free(...) does not mean the things you are freeing are no longer in memory...), but crashed every once in a while since it was working with freed memory.

Since I fixed my main problem I haven't had the time to investigate the issue with the single-threaded builds I have. Maybe it'll come up some other day; it seems to be working just fine outside of my program.

Anyways, thanks for your help!

Dennis

Daniel Kochmański <dan...@turtleware.eu> writes:

>> [...] there are multiple more calls to Lxxstore_object() methods below this
>>
>> I am having problems debugging this because I highly doubt that the generic
>> function dispatch mechanism is broken (otherwise *nothing ever* would work,
>> right?) So I think something else is causing this confusion in
>> fill_spec_vector.
>
> It is hard to tell anything without a reproducible test case I could use.
> Please replace the if/else in fill_spec_vector with:
>
> <<<EOF
>
> if (ECL_LISTP(spec_type) &&
>     !Null(eql_spec = ecl_memql(args[spec_position], spec_type))) {
>   argtype[spec_no++] = eql_spec;
> } else {
>   printf("XXX: args: %p, spec-pos: %d, args[sp]: %p\n", args,
>          spec_position, args[spec_position]);
>   printf("XXX: printing argument\n");
>   ecl_print(args[spec_position], ECL_T);
>   ecl_terpri(ECL_T);
>   printf("XXX: printing argument type\n");
>   ecl_print(ecl_type_to_symbol(ecl_t_of(args[spec_position])), ECL_T);
>   ecl_terpri(ECL_T);
>   printf("XXX: debug information done\n");
>   argtype[spec_no++] = cl_class_of(args[spec_position]);
> }
>
> EOF
>
> It could be that the dispatch mechanism misses one particular type, or that
> you have a dangling pointer; I wouldn't be so sure that everything works
> correctly.
> Please compile ECL with this debug information and when you reproduce the
> issue send the console output before the error. (Note that this may crash
> before reaching argtype[spec_no++] because we dereference some pointers in
> the meantime.) If it is too verbose, comment out the 'printing argument' part,
> it may be a big array or something.
>
>> I've compiled it with only the --disable-threads flag now and I still get
>> the same crash in the call to GC_init() in cl_boot(). However, starting the
>> ECL interpreter works fine and embedding ECL into a single-threaded, small
>> example program also works.
>
> Regarding working with threads enabled: the ECL environment must be imported
> on each "C++ world" thread (see examples for how to do that). That is not
> necessary with a single-threaded ECL build.
>
> Regarding GC_init – are you certain you do not call it twice for some reason?
> Or that cl_boot is not called twice? I vaguely remember someone had a similar
> problem and it was due to calling GC_init separately before cl_boot (or
> immediately after).
>
>> Could it be that I am missing something when trying to embed ECL in a large
>> C++ codebase? Do I have to worry about the Boehm GC not functioning when
>> most of the program is not designed to use GC_MALLOC? I am also statically
>> linking my lisp code, would that make a difference here?
>
> No, bdwgc should work fine with code which is not libgc aware. You may want
> to try using libgc shipped with your system. I don't know what your OS is,
> but OpenBSD has some heavy restrictions for what you can do with memory.
>
> Regards,
> Daniel
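
P.S. For the archives, here is a minimal sketch of the kind of lifetime bug described at the top of this mail. It is not the actual code and does not use ECL at all: c_object, c_object_new, c_object_free, Owner and live_blocks are hypothetical names made up for illustration, and plain malloc/free stands in for whatever really released the objects. The addresses are tracked as integers so the sketch can show that a handle is stale without ever touching freed memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <set>
#include <vector>

// Hypothetical stand-in for a C-style handle; this is NOT ECL's cl_object.
struct c_object { int value; };

// Addresses of blocks that are currently allocated, kept as integers so that
// a freed pointer is never dereferenced or compared.
static std::set<std::uintptr_t> live_blocks;

c_object *c_object_new(int value) {
    c_object *p = static_cast<c_object *>(std::malloc(sizeof(c_object)));
    if (p == nullptr) std::abort();
    p->value = value;
    live_blocks.insert(reinterpret_cast<std::uintptr_t>(p));
    return p;
}

void c_object_free(c_object *p) {
    live_blocks.erase(reinterpret_cast<std::uintptr_t>(p));
    std::free(p);
}

bool is_live(std::uintptr_t addr) { return live_blocks.count(addr) != 0; }

// C++ side: an owner that frees the C object in its destructor.
class Owner {
public:
    explicit Owner(int value) : ptr_(c_object_new(value)) {}
    ~Owner() { if (ptr_ != nullptr) c_object_free(ptr_); }
    Owner(Owner &&other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    Owner(const Owner &) = delete;
    Owner &operator=(const Owner &) = delete;
    c_object *get() const { return ptr_; }
private:
    c_object *ptr_;
};

// The bug pattern: a raw handle is taken from an element of a transient
// vector, and the vector (and with it the object) dies at the closing brace.
std::uintptr_t address_from_transient_vector() {
    std::vector<Owner> tmp;
    tmp.emplace_back(42);
    return reinterpret_cast<std::uintptr_t>(tmp.back().get());
}   // tmp is destroyed here, so ~Owner frees the block

void demo() {
    // The handle that escaped the transient vector is already stale.
    assert(!is_live(address_from_transient_vector()));

    // One possible fix: keep the owner alive for as long as the handle is used.
    std::vector<Owner> owners;
    owners.emplace_back(7);
    assert(is_live(reinterpret_cast<std::uintptr_t>(owners.back().get())));
    assert(owners.back().get()->value == 7);
}
```

Calling demo() from main should pass both checks. In a real program the stale handle would simply be dereferenced, which is undefined behaviour and tends to "work" until the allocator reuses the block.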