Thanks for helping me debug this.
1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a
deadlock. I'll debug this further now.
2. I don't have any crash if I put a static std::mutex
in c++/src/capnp/arena.c++ and lock it using std::unique_lock right
after moreSegments.lockExclusive();
3. I still observe the issue if (instead of #2) I make
std::unique_ptr<std::mutex> a member variable of ReaderArena and lock it
before or after moreSegments.lockExclusive();
- I have printed KJ_DBG(my_unique_lock.owns_lock(), my_thread_id, this);
and interestingly, two threads claim they own the lock:
- capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 12814521112356468066; this = 7fd7f800a478
- capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 2054456097081703779; this = 7fd7f800a478
- This is really weird :)
I'm on Ubuntu 18.04.3 LTS.
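One caveat on reading the output above: owns_lock() reports the state of the calling thread's own std::unique_lock object, so two threads each printing "true" at different moments is expected and does not by itself prove that both were inside the critical section at the same time. Below is a minimal sketch of a stricter check. OverlapDetector and maxConcurrentHolders are hypothetical names made up for illustration, not part of KJ or Cap'n Proto, and a plain std::mutex stands in for the lock under test.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of KJ or Cap'n Proto): counts how many
// threads are inside a guarded region at once and remembers the maximum.
class OverlapDetector {
public:
  void enter() {
    int now = inside_.fetch_add(1) + 1;
    int prev = maxSeen_.load();
    // Raise the recorded maximum if this thread observed a higher count.
    while (now > prev && !maxSeen_.compare_exchange_weak(prev, now)) {}
  }
  void leave() { inside_.fetch_sub(1); }
  int maxSeen() const { return maxSeen_.load(); }

private:
  std::atomic<int> inside_{0};
  std::atomic<int> maxSeen_{0};
};

// Runs `threads` workers that each lock `mutex` `iterations` times and
// returns the largest number of threads ever observed inside at once.
int maxConcurrentHolders(std::mutex& mutex, int threads, int iterations) {
  OverlapDetector detector;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::unique_lock<std::mutex> lock(mutex);
        detector.enter();  // with a working mutex the count never exceeds 1
        detector.leave();
      }
    });
  }
  for (auto& w : workers) w.join();
  return detector.maxSeen();
}
```

With a working mutex, maxConcurrentHolders(m, 4, 1000) should return 1. Calling enter() and leave() around the real critical section and seeing a maximum above 1 would be much stronger evidence of a broken lock than the owns_lock() output.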
On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote:
>
> Hmm, that's strange!
>
> What operating system is this?
>
> If it happens to be Linux, could you try compiling everything with
> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of
> c++/src/kj/mutex.h), and see if that changes anything? This change will
> make KJ use a completely different mutex implementation. (That said, the
> futex-based implementation has seen very heavy use with no problems in the
> past, so it would be surprising if it were broken somehow.)
>
> -Kenton
>
> On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote:
>
>> As an update, I've added the following debug message
>> to c++/src/capnp/arena.c++:
>>
>>   SegmentMap* segments = nullptr;
>>   KJ_IF_MAYBE(s, *lock) {
>>     KJ_IF_MAYBE(segment, s->find(id.value)) {
>>       return *segment;
>>     }
>>     segments = s;
>> + } else {
>> +   size_t this_id = std::hash<std::thread::id>{}(std::this_thread::get_id());
>> +   KJ_DBG("map doesn't exist", this_id, this);
>>   }
>>
>> It looks like (just before the crash) multiple threads print "map doesn't
>> exist" for the same 'this' value. It's as if the lock did not work for some
>> reason. I have not been able to reproduce the issue in a pure capnp test yet.
>>
>> For context, we have the same type of message, with 2 segments, printed at
>> a high frequency. We have a stack of them being read by multiple readers.
>> Apart from the exception mentioned earlier, we often see segfaults in
>> the insert() function.
>>
>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote:
>>>
>>> Thanks for the quick response Kenton.
>>>
>>> I was also suspecting a race condition. Thanks for checking the mutex.
>>> It is very likely that the issue is due to our usage. I'll share what I
>>> find as I debug this further.
>>>
>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>>
>>>> Hi Oguz,
>>>>
>>>> You can get better stack traces by compiling in debug mode (both Cap'n
>>>> Proto itself, and your project). You should then see a symbolic trace
>>>> instead of a bunch of addresses.
>>>>
>>>> This is a strange error, though. Looking at the code for
>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a
>>>> find() call looking for the same key has failed. How could the inserted
>>>> row already exist, then?
>>>>
>>>> Moreover, the whole sequence is performed under a mutex lock, seemingly
>>>> ruling out any race conditions.
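To make that reasoning concrete, here is a minimal sketch of the find-then-insert pattern under a single lock. It is not the actual ReaderArena code: std::mutex and std::unordered_map are stand-ins chosen for illustration, and SegmentCache, findOrInsert and countInserts are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the lookup pattern described above.
class SegmentCache {
public:
  // Looks up `id` and inserts it if missing, all under one lock.
  // Returns true when a new row was inserted.
  bool findOrInsert(unsigned id) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!map_) {
      // The map is created lazily by the first caller.
      map_.reset(new std::unordered_map<unsigned, int>());
    }
    if (map_->find(id) != map_->end()) return false;  // already cached
    bool inserted = map_->emplace(id, 0).second;
    // Unreachable while the lock works: find() just failed for this key.
    if (!inserted) throw std::logic_error("inserted row already exists in table");
    return true;
  }

private:
  std::mutex mutex_;
  std::unique_ptr<std::unordered_map<unsigned, int>> map_;
};

// Has `threads` workers look up ids 0..ids-1 on one shared cache and
// returns how many insertions happened in total.
int countInserts(int threads, unsigned ids) {
  SegmentCache cache;
  std::atomic<int> inserts{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (unsigned id = 0; id < ids; ++id) {
        if (cache.findOrInsert(id)) ++inserts;
      }
    });
  }
  for (auto& w : workers) w.join();
  return inserts.load();
}
```

With a working lock, each id is inserted exactly once no matter how many threads race, so countInserts(8, 2) returns 2 and the logic_error is never thrown. Seeing a duplicate insert in practice therefore suggests that the lock is not actually excluding other threads, or that the object holding it is not the one the threads believe they share.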
>>>>
>>>> I'm not sure what to say here. If you can come up with a self-contained
>>>> test case that reproduces the issue, I'd be happy to debug.
>>>>
>>>> -Kenton
>>>>
>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>>
>>>>> Thanks for this amazing software.
>>>>>
>>>>> We are using v0.7.0. I would like to ask for help debugging the
>>>>> following exception, which we get rarely but consistently:
>>>>>
>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>> what(): kj/table.c++:44: failed: inserted row already exists in
>>>>> table
>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823
>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>>
>>>>> Our backtrace shows that we were trying to read from a proto, then the
>>>>> following two functions in capnp were called:
>>>>>
>>>>> - kj::Table<kj::HashMap<unsigned int,
>>>>> kj::Own<capnp::_::SegmentReader> >::Entry,
>>>>> kj::HashIndex<kj::HashMap<unsigned int,
>>>>> kj::Own<capnp::_::SegmentReader>
>>>>> >::Callbacks> >::insert(kj::HashMap<unsigned int,
>>>>> kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>> - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int,
>>>>> capnp::_::Segment>)
>>>>>
>>>>> Why would reading from a proto trigger an insert call?
>>>>>
>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug
>>>>> this further?
>>>>>
>>>>> If the way we read the proto is OK, can this perhaps be caused by how
>>>>> we populate the proto contents? Perhaps some missing or corrupt members?
>>>>>
>>>>> Thank you very much in advance for your help,
>>>>> Oguz
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Cap'n Proto" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>>>
>