Hmm, that's strange! What operating system is this?
If it happens to be Linux, could you try compiling everything with -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of c++/src/kj/mutex.h), and see if that changes anything? This change will make KJ use a completely different mutex implementation. (That said, the futex-based implementation has seen very heavy use with no problems in the past, so it would be surprising if it were broken somehow.)

-Kenton

On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote:

> As an update, I've tried to place the following messages
> to c++/src/capnp/arena.c++:
>
>   SegmentMap* segments = nullptr;
>   KJ_IF_MAYBE(s, *lock) {
>     KJ_IF_MAYBE(segment, s->find(id.value)) {
>       return *segment;
>     }
>     segments = s;
> +  } else {
> +    size_t this_id =
>          std::hash<std::thread::id>{}(std::this_thread::get_id());
> +    KJ_DBG("map doesn't exist", this_id, this);
>   }
>
> It looks like (just before the crash) multiple threads print "map doesn't
> exist" for the same 'this' value. It's as if lock did not work for some
> reason. I could not reproduce the issue in a pure capnp test yet.
>
> For context, we have the same type of message with 2 segments printed in a
> high frequency. We have a stack of them being read by multiple readers.
> Apart from the mentioned exception being thrown, we often have segfaults in
> the insert() function.
>
> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote:
>>
>> Thanks for the quick response Kenton.
>>
>> I was also suspecting a race condition. Thanks for checking the mutex. It
>> is very likely that the issue is due to our usage. I'll share what I find
>> as I debug this further.
>>
>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>
>>> Hi Oguz,
>>>
>>> You can get better stack traces by compiling in debug mode (both Cap'n
>>> Proto itself, and your project). You should then see a symbolic trace
>>> instead of a bunch of addresses.
>>>
>>> This is a strange error, though.
>>> Looking at the code for
>>> ReaderArena::tryGetSegment(), the insert() call only happens after a find()
>>> call looking for the same key has failed. How could the inserted row
>>> already exist, then?
>>>
>>> Moreover, the whole sequence is performed under a mutex lock, seemingly
>>> ruling out any race conditions.
>>>
>>> I'm not sure what to say here. If you can come up with a self-contained
>>> test case that reproduces the issue, I'd be happy to debug.
>>>
>>> -Kenton
>>>
>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>
>>>> Thanks for this amazing software.
>>>>
>>>> We are using v0.7.0. I would like to ask help debugging the following
>>>> exception which we rarely but consistently get:
>>>>
>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>   what(): kj/table.c++:44: failed: inserted row already exists in table
>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823
>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>
>>>> Our backtrace shows that we were trying to read from a proto, then the
>>>> following two functions in capnp were called:
>>>>
>>>>    - kj::Table<kj::HashMap<unsigned int,
>>>>    kj::Own<capnp::_::SegmentReader> >::Entry,
>>>>    kj::HashIndex<kj::HashMap<unsigned int, kj::Own<capnp::_::SegmentReader>
>>>>    >::Callbacks> >::insert(kj::HashMap<unsigned int,
>>>>    kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>    - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int,
>>>>    capnp::_::Segment>)
>>>>
>>>> Why would reading from proto trigger an insert call?
>>>>
>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug
>>>> this further?
>>>>
>>>> If the way we read proto is OK, can this perhaps be caused by how we
>>>> populate the proto contents? Perhaps some missing or corrupt members?
>>>>
>>>> Thank you very much for help in advance,
>>>> Oguz
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Cap'n Proto" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>> .
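
For anyone following the thread, here is a rough standalone sketch of the find-then-insert-under-a-lock pattern discussed above. It is not the Cap'n Proto code: it uses std::mutex and std::unordered_map in place of KJ's mutex and table, and the names Segment, LazySegmentCache, getOrCreate and hammerCache are all hypothetical. The point is only to illustrate why, as long as the lock actually excludes other threads, an insert that follows a failed find for the same key should never hit an existing row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a segment reader; not a Cap'n Proto type.
struct Segment {
  uint32_t id;
};

// Hypothetical cache that fills its map lazily, always under one mutex.
class LazySegmentCache {
public:
  // Returns the cached segment for `id`, creating it on first use.
  Segment* getOrCreate(uint32_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = segments_.find(id);
    if (it != segments_.end()) {
      return it->second.get();
    }
    // find() just failed while we hold the lock, so the key cannot be present.
    auto result = segments_.emplace(id, std::make_unique<Segment>(Segment{id}));
    if (!result.second) {
      // Only reachable if mutual exclusion is broken in some way.
      throw std::logic_error("inserted row already exists in table");
    }
    ++insertCount_;
    return result.first->second.get();
  }

  // Number of successful inserts so far.
  std::size_t insertCount() {
    std::lock_guard<std::mutex> guard(mutex_);
    return insertCount_;
  }

private:
  std::mutex mutex_;
  std::unordered_map<uint32_t, std::unique_ptr<Segment>> segments_;
  std::size_t insertCount_ = 0;
};

// Hammers one cache from several threads and returns the number of inserts.
std::size_t hammerCache(unsigned threadCount, uint32_t segmentCount, unsigned rounds) {
  assert(threadCount > 0);
  LazySegmentCache cache;
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < threadCount; ++t) {
    threads.emplace_back([&cache, segmentCount, rounds] {
      for (unsigned r = 0; r < rounds; ++r) {
        for (uint32_t id = 0; id < segmentCount; ++id) {
          cache.getOrCreate(id);
        }
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return cache.insertCount();
}
```

Calling something like hammerCache(4, 2, 1000) from a small main() should report exactly one insert per distinct segment id, no matter how the threads interleave. If a similar standalone test built against KJ's own mutex ever showed duplicate inserts, that would point at the lock rather than at how the messages are used.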
