Makes sense. Thank you very much for helping me debug Kenton :) On Tuesday, October 8, 2019 at 9:05:19 AM UTC-7, Kenton Varda wrote: > > Ah, hmm. So you are accessing the MesasgeReader itself (not just the > objects in it) from multiple threads? It has been a long time since I wrote > this code, but I suspect that wasn't intended to be supported. Normally > thread-safe methods are marked `const`, and none of MessageReader's methods > appear to be `const`. In contrast, the getter/setter methods on individual > Reader objects are marked `const`, hence all the mutex-locking to make sure > that changes they might trigger are thread-safe. But to get a Reader object > initially, you have to call getRoot(), which initializes the arena and sets > `allocatedArena = true`. > > I admit this is not very intuitive. MessageReader *feels* like a read-only > type and hence like it ought to be thread-safe. I think probably I should > refactor MessageReader so that it loads the whole segment table upfront and > doesn't do any silly mutex locking, though I think subclasses may need to > be changed to support that, bleh. > > As a simple work-around to fix your problem, I think you can do: > > messageReader.getRoot<capnp::AnyPointer>(); > > immediately after allocating the MessageReader, before it has been made > visible to other threads. > > A more-correct solution is to call getRoot() with the actual root type and > then only share the returned struct Reader with other threads; don't share > the MessageReader. > > -Kenton > > On Mon, Oct 7, 2019 at 5:38 PM <[email protected] <javascript:>> wrote: > >> great suggestion. I found that just before the crash, ReaderArena gets >> constructed twice at the same memory location. That would explain why locks >> don't work as expected. Looking at the capnp code, we were wondering >> whether allocatedArena variable of MessageReader needs to be protected for >> multi-threading. >> >> On Monday, October 7, 2019 at 12:16:42 PM UTC-7, Kenton Varda wrote: >>> >>> Could this be use-after-free? Maybe the MessageReader (and hence the >>> mutex) is already deleted? >>> >>> -Kenton >>> >>> On Mon, Oct 7, 2019 at 11:58 AM <[email protected]> wrote: >>> >>>> Thanks for helping me debug this. >>>> >>>> 1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a >>>> deadlock. I'll debug this further now. >>>> 2. I don't have any crash if I put a static std::mutex >>>> in c++/src/capnp/arena.c++ and lock it using std::unique_lock right >>>> after moreSegments.lockExclusive(); >>>> 3. I still observe the issue if (instead of #2) I make >>>> std::unique_ptr<std::mutex> a member variable of ReaderArena and lock >>>> it >>>> before or after moreSegments.lockExclusive(); >>>> - I have printed KJ_DBG(my_unique_lock.owns_lock(), >>>> my_thread_id, this); and interestingly, two threads claim they own >>>> the lock: >>>> - >>>> - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = >>>> 12814521112356468066; this = 7fd7f800a478 >>>> - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = >>>> 2054456097081703779; this = 7fd7f800a478 >>>> - This is really weird :) >>>> >>>> I'm on Ubuntu 18.04.3 LTS. >>>> >>>> On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote: >>>>> >>>>> Hmm, that's strange! >>>>> >>>>> What operating system is this? >>>>> >>>>> If it happens to be Linux, could you try compiling everything with >>>>> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of >>>>> c++/src/kj/mutex.h), and see if that changes anything? This change will >>>>> make KJ use a completely different mutex implementation. (That said, the >>>>> futex-based implementation has seen very heavy use with no problems in >>>>> the >>>>> past, so it would be surprising if it were broken somehow.) >>>>> >>>>> -Kenton >>>>> >>>>> On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote: >>>>> >>>>>> As an update, I've tried to place the following messages >>>>>> to c++/src/capnp/arena.c++: >>>>>> >>>>>> SegmentMap* segments = nullptr; >>>>>> KJ_IF_MAYBE(s, *lock) { >>>>>> KJ_IF_MAYBE(segment, s->find(id.value)) { >>>>>> return *segment; >>>>>> } >>>>>> segments = s; >>>>>> + } else { >>>>>> + size_t this_id = >>>>>> std::hash<std::thread::id>{}(std::this_thread::get_id()); >>>>>> + KJ_DBG("map doesn't exist", this_id, this); >>>>>> } >>>>>> >>>>>> It looks like (just before the crash) multiple threads print "map >>>>>> doesn't exist" for the same 'this' value. It's as if lock did not work >>>>>> for >>>>>> some reason. I could not reproduce the issue in a pure capnp test yet. >>>>>> >>>>>> For context, we have the same type of message with 2 segments printed >>>>>> in a high frequency. We have a stack of them being read by multiple >>>>>> readers. Apart from the mentioned exception being thrown, we often have >>>>>> segfaults in the insert() function. >>>>>> >>>>>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam >>>>>> wrote: >>>>>>> >>>>>>> Thanks for the quick response Kenton. >>>>>>> >>>>>>> I was also suspecting a race condition. Thanks for checking the >>>>>>> mutex. It is very likely that the issue is due to our usage. I'll share >>>>>>> what I find as I debug this further. >>>>>>> >>>>>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote: >>>>>>>> >>>>>>>> Hi Oguz, >>>>>>>> >>>>>>>> You can get better stack traces by compiling in debug mode (both >>>>>>>> Cap'n Proto itself, and your project). You should then see a symbolic >>>>>>>> trace >>>>>>>> instead of a bunch of addresses. >>>>>>>> >>>>>>>> This is a strange error, though. Looking at the code for >>>>>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a >>>>>>>> find() >>>>>>>> call looking for the same key has failed. How could the inserted row >>>>>>>> already exist, then? >>>>>>>> >>>>>>>> Moreover, the whole sequence is performed under a mutex lock, >>>>>>>> seemingly ruling out any race conditions. >>>>>>>> >>>>>>>> I'm not sure what to say here. If you can come up with a >>>>>>>> self-contained test case that reproduces the issue, I'd be happy to >>>>>>>> debug. >>>>>>>> >>>>>>>> -Kenton >>>>>>>> >>>>>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote: >>>>>>>> >>>>>>>>> Thanks for this amazing software. >>>>>>>>> >>>>>>>>> We are using v0.7.0. I would like to ask help debugging the >>>>>>>>> following exception which we rarely but consistently get: >>>>>>>>> >>>>>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl' >>>>>>>>> what(): kj/table.c++:44: failed: inserted row already exists in >>>>>>>>> table >>>>>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823 >>>>>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087 >>>>>>>>> >>>>>>>>> Our backtrace shows that we were trying to read from a proto, then >>>>>>>>> the following two functions in capnp were called: >>>>>>>>> >>>>>>>>> - kj::Table<kj::HashMap<unsigned int, >>>>>>>>> kj::Own<capnp::_::SegmentReader> >::Entry, >>>>>>>>> kj::HashIndex<kj::HashMap<unsigned int, >>>>>>>>> kj::Own<capnp::_::SegmentReader> >>>>>>>>> >::Callbacks> >::insert(kj::HashMap<unsigned int, >>>>>>>>> kj::Own<capnp::_::SegmentReader> >::Entry&&) >>>>>>>>> - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int, >>>>>>>>> capnp::_::Segment>) >>>>>>>>> >>>>>>>>> Why would reading from proto trigger an insert call? >>>>>>>>> >>>>>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to >>>>>>>>> debug this further? >>>>>>>>> >>>>>>>>> If the way we read proto is OK, can this perhaps be caused by how >>>>>>>>> we populate the proto contents? Perhaps some missing or corrupt >>>>>>>>> members? >>>>>>>>> >>>>>>>>> Thank you very much for help in advance, >>>>>>>>> Oguz >>>>>>>>> >>>>>>>>> -- >>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>> Groups "Cap'n Proto" group. >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>> send an email to [email protected]. >>>>>>>>> To view this discussion on the web visit >>>>>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com >>>>>>>>> >>>>>>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>> . >>>>>>>>> >>>>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Cap'n Proto" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com >>>>>> >>>>>> <https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> >>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Cap'n Proto" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com >>>> >>>> <https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "Cap'n Proto" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/capnproto/245fd14b-0061-4346-af50-3804baef15f5%40googlegroups.com >> >> <https://groups.google.com/d/msgid/capnproto/245fd14b-0061-4346-af50-3804baef15f5%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> >
-- You received this message because you are subscribed to the Google Groups "Cap'n Proto" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/398ecae8-ee0d-49e7-bd94-cf384a465a98%40googlegroups.com.
