Could this be use-after-free? Maybe the MessageReader (and hence the mutex)
is already deleted?

-Kenton

On Mon, Oct 7, 2019 at 11:58 AM <[email protected]> wrote:

> Thanks for helping me debug this.
>
>    1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a
>    deadlock. I'll debug this further now.
>    2. I don't have any crash if I put a static std::mutex
>    in c++/src/capnp/arena.c++ and lock it using std::unique_lock right
>    after moreSegments.lockExclusive();
>    3. I still observe the issue if (instead of #2) I make
>    std::unique_ptr<std::mutex> a member variable of ReaderArena and lock it
>    before or after moreSegments.lockExclusive();
>       - I have printed KJ_DBG(my_unique_lock.owns_lock(), my_thread_id,
>       this); and interestingly, two threads claim they own the lock:
>          -
>          - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id =
>          12814521112356468066; this = 7fd7f800a478
>          - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id =
>          2054456097081703779; this = 7fd7f800a478
>       - This is really weird :)
>
> I'm on Ubuntu 18.04.3 LTS.
>
> On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote:
>>
>> Hmm, that's strange!
>>
>> What operating system is this?
>>
>> If it happens to be Linux, could you try compiling everything with
>> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of
>> c++/src/kj/mutex.h), and see if that changes anything? This change will
>> make KJ use a completely different mutex implementation. (That said, the
>> futex-based implementation has seen very heavy use with no problems in the
>> past, so it would be surprising if it were broken somehow.)
>>
>> -Kenton
>>
>> On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote:
>>
>>> As an update, I've tried to place the following messages
>>> to c++/src/capnp/arena.c++:
>>>
>>>    SegmentMap* segments = nullptr;
>>>    KJ_IF_MAYBE(s, *lock) {
>>>      KJ_IF_MAYBE(segment, s->find(id.value)) {
>>>        return *segment;
>>>      }
>>>      segments = s;
>>> +  } else {
>>> +    size_t this_id =
>>> std::hash<std::thread::id>{}(std::this_thread::get_id());
>>> +    KJ_DBG("map doesn't exist", this_id, this);
>>>    }
>>>
>>> It looks like (just before the crash) multiple threads print "map
>>> doesn't exist" for the same 'this' value. It's as if lock did not work for
>>> some reason. I could not reproduce the issue in a pure capnp test yet.
>>>
>>> For context, we have the same type of message with 2 segments printed in
>>> a high frequency. We have a stack of them being read by multiple readers.
>>> Apart from the mentioned exception being thrown, we often have segfaults in
>>> the insert() function.
>>>
>>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote:
>>>>
>>>> Thanks for the quick response Kenton.
>>>>
>>>> I was also suspecting a race condition. Thanks for checking the mutex.
>>>> It is very likely that the issue is due to our usage. I'll share what I
>>>> find as I debug this further.
>>>>
>>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>>>
>>>>> Hi Oguz,
>>>>>
>>>>> You can get better stack traces by compiling in debug mode (both Cap'n
>>>>> Proto itself, and your project). You should then see a symbolic trace
>>>>> instead of a bunch of addresses.
>>>>>
>>>>> This is a strange error, though. Looking at the code for
>>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a 
>>>>> find()
>>>>> call looking for the same key has failed. How could the inserted row
>>>>> already exist, then?
>>>>>
>>>>> Moreover, the whole sequence is performed under a mutex lock,
>>>>> seemingly ruling out any race conditions.
>>>>>
>>>>> I'm not sure what to say here. If you can come up with a
>>>>> self-contained test case that reproduces the issue, I'd be happy to debug.
>>>>>
>>>>> -Kenton
>>>>>
>>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>>>
>>>>>> Thanks for this amazing software.
>>>>>>
>>>>>> We are using v0.7.0. I would like to ask help debugging the following
>>>>>> exception which we rarely but consistently get:
>>>>>>
>>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>>>   what():  kj/table.c++:44: failed: inserted row already exists in
>>>>>> table
>>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823
>>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>>>
>>>>>> Our backtrace shows that we were trying to read from a proto, then
>>>>>> the following two functions in capnp were called:
>>>>>>
>>>>>>    - kj::Table<kj::HashMap<unsigned int,
>>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry,
>>>>>>    kj::HashIndex<kj::HashMap<unsigned int, 
>>>>>> kj::Own<capnp::_::SegmentReader>
>>>>>>    >::Callbacks> >::insert(kj::HashMap<unsigned int,
>>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>>>    - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int,
>>>>>>    capnp::_::Segment>)
>>>>>>
>>>>>> Why would reading from proto trigger an insert call?
>>>>>>
>>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug
>>>>>> this further?
>>>>>>
>>>>>> If the way we read proto is OK, can this perhaps be caused by how we
>>>>>> populate the proto contents? Perhaps some missing or corrupt members?
>>>>>>
>>>>>> Thank you very much for help in advance,
>>>>>> Oguz
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Cap'n Proto" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com
>>> <https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com
> <https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/CAJouXQmxf6iEAQHa9nYAV%2BUrs7ERfbdkdGFOfncq6Bn%3DY-sWdw%40mail.gmail.com.

Reply via email to