Could this be use-after-free? Maybe the MessageReader (and hence the mutex) is already deleted?
-Kenton On Mon, Oct 7, 2019 at 11:58 AM <[email protected]> wrote: > Thanks for helping me debug this. > > 1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a > deadlock. I'll debug this further now. > 2. I don't have any crash if I put a static std::mutex > in c++/src/capnp/arena.c++ and lock it using std::unique_lock right > after moreSegments.lockExclusive(); > 3. I still observe the issue if (instead of #2) I make > std::unique_ptr<std::mutex> a member variable of ReaderArena and lock it > before or after moreSegments.lockExclusive(); > - I have printed KJ_DBG(my_unique_lock.owns_lock(), my_thread_id, > this); and interestingly, two threads claim they own the lock: > - > - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = > 12814521112356468066; this = 7fd7f800a478 > - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = > 2054456097081703779; this = 7fd7f800a478 > - This is really weird :) > > I'm on Ubuntu 18.04.3 LTS. > > On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote: >> >> Hmm, that's strange! >> >> What operating system is this? >> >> If it happens to be Linux, could you try compiling everything with >> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of >> c++/src/kj/mutex.h), and see if that changes anything? This change will >> make KJ use a completely different mutex implementation. (That said, the >> futex-based implementation has seen very heavy use with no problems in the >> past, so it would be surprising if it were broken somehow.) >> >> -Kenton >> >> On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote: >> >>> As an update, I've tried to place the following messages >>> to c++/src/capnp/arena.c++: >>> >>> SegmentMap* segments = nullptr; >>> KJ_IF_MAYBE(s, *lock) { >>> KJ_IF_MAYBE(segment, s->find(id.value)) { >>> return *segment; >>> } >>> segments = s; >>> + } else { >>> + size_t this_id = >>> std::hash<std::thread::id>{}(std::this_thread::get_id()); >>> + KJ_DBG("map doesn't exist", this_id, this); >>> } >>> >>> It looks like (just before the crash) multiple threads print "map >>> doesn't exist" for the same 'this' value. It's as if lock did not work for >>> some reason. I could not reproduce the issue in a pure capnp test yet. >>> >>> For context, we have the same type of message with 2 segments printed in >>> a high frequency. We have a stack of them being read by multiple readers. >>> Apart from the mentioned exception being thrown, we often have segfaults in >>> the insert() function. >>> >>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote: >>>> >>>> Thanks for the quick response Kenton. >>>> >>>> I was also suspecting a race condition. Thanks for checking the mutex. >>>> It is very likely that the issue is due to our usage. I'll share what I >>>> find as I debug this further. >>>> >>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote: >>>>> >>>>> Hi Oguz, >>>>> >>>>> You can get better stack traces by compiling in debug mode (both Cap'n >>>>> Proto itself, and your project). You should then see a symbolic trace >>>>> instead of a bunch of addresses. >>>>> >>>>> This is a strange error, though. Looking at the code for >>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a >>>>> find() >>>>> call looking for the same key has failed. How could the inserted row >>>>> already exist, then? >>>>> >>>>> Moreover, the whole sequence is performed under a mutex lock, >>>>> seemingly ruling out any race conditions. >>>>> >>>>> I'm not sure what to say here. If you can come up with a >>>>> self-contained test case that reproduces the issue, I'd be happy to debug. >>>>> >>>>> -Kenton >>>>> >>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote: >>>>> >>>>>> Thanks for this amazing software. >>>>>> >>>>>> We are using v0.7.0. I would like to ask help debugging the following >>>>>> exception which we rarely but consistently get: >>>>>> >>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl' >>>>>> what(): kj/table.c++:44: failed: inserted row already exists in >>>>>> table >>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823 >>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087 >>>>>> >>>>>> Our backtrace shows that we were trying to read from a proto, then >>>>>> the following two functions in capnp were called: >>>>>> >>>>>> - kj::Table<kj::HashMap<unsigned int, >>>>>> kj::Own<capnp::_::SegmentReader> >::Entry, >>>>>> kj::HashIndex<kj::HashMap<unsigned int, >>>>>> kj::Own<capnp::_::SegmentReader> >>>>>> >::Callbacks> >::insert(kj::HashMap<unsigned int, >>>>>> kj::Own<capnp::_::SegmentReader> >::Entry&&) >>>>>> - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int, >>>>>> capnp::_::Segment>) >>>>>> >>>>>> Why would reading from proto trigger an insert call? >>>>>> >>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug >>>>>> this further? >>>>>> >>>>>> If the way we read proto is OK, can this perhaps be caused by how we >>>>>> populate the proto contents? Perhaps some missing or corrupt members? >>>>>> >>>>>> Thank you very much for help in advance, >>>>>> Oguz >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Cap'n Proto" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com >>>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> >>>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Cap'n Proto" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com >>> <https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- > You received this message because you are subscribed to the Google Groups > "Cap'n Proto" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com > <https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- You received this message because you are subscribed to the Google Groups "Cap'n Proto" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/CAJouXQmxf6iEAQHa9nYAV%2BUrs7ERfbdkdGFOfncq6Bn%3DY-sWdw%40mail.gmail.com.
