Hmm, that's strange!

What operating system is this?

If it happens to be Linux, could you try compiling everything with
-DKJ_USE_FUTEX=0 (or removing the `#define KJ_USE_FUTEX 1` from the top of
c++/src/kj/mutex.h), and see if that changes anything? This change will
make KJ use a completely different mutex implementation. (That said, the
futex-based implementation has seen very heavy use with no problems in the
past, so it would be surprising if it were broken somehow.)
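
To make the expected behavior concrete, here is a minimal standard-library
sketch of the find-then-insert-under-a-lock pattern. It is not the actual
ReaderArena code: `LazyArena`, `Segment`, `StressResult` and `runStress` are
hypothetical names, and std::mutex stands in for KJ's mutex. The assumption it
illustrates is that, with a working lock, the map is created exactly once and
an insert never collides with an existing key.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a segment reader; not a Cap'n Proto type.
struct Segment {
  uint32_t id;
};

// Hypothetical arena that creates its map lazily and fills it on lookup misses.
class LazyArena {
public:
  // Returns the segment for `id`, creating it on first use. The find and the
  // insert both run while mutex_ is held, so no other thread can insert the
  // same key between them.
  const Segment& getSegment(uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!segments_) {
      segments_.emplace();  // the "map doesn't exist" case
      ++mapCreations_;
    }
    auto it = segments_->find(id);
    if (it != segments_->end()) return *it->second;
    auto result = segments_->emplace(id, std::make_unique<Segment>(Segment{id}));
    if (!result.second) ++duplicateInserts_;  // should be unreachable
    return *result.first->second;
  }

  // The accessors below are meant to be called after all threads have joined.
  int mapCreations() const { return mapCreations_; }
  int duplicateInserts() const { return duplicateInserts_; }
  std::size_t segmentCount() const { return segments_ ? segments_->size() : 0; }

private:
  std::mutex mutex_;
  std::optional<std::unordered_map<uint32_t, std::unique_ptr<Segment>>> segments_;
  int mapCreations_ = 0;
  int duplicateInserts_ = 0;
};

struct StressResult {
  int mapCreations;
  int duplicateInserts;
  std::size_t segmentCount;
};

// Hammers one arena from several threads, alternating between two segment ids.
StressResult runStress(int threadCount, int iterations) {
  LazyArena arena;
  std::vector<std::thread> threads;
  for (int t = 0; t < threadCount; ++t) {
    threads.emplace_back([&arena, iterations] {
      for (int i = 0; i < iterations; ++i) {
        arena.getSegment(static_cast<uint32_t>(i % 2));
      }
    });
  }
  for (auto& th : threads) th.join();
  return {arena.mapCreations(), arena.duplicateInserts(), arena.segmentCount()};
}
```

Calling `runStress(8, 10000)` from a small main() and checking that
mapCreations is 1 and duplicateInserts is 0 gives a baseline. A similar loop
built against the real library, once with each mutex implementation, might
help narrow down where the behavior diverges.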

-Kenton

On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote:

> As an update, I've added the following debug messages
> to c++/src/capnp/arena.c++:
>
>    SegmentMap* segments = nullptr;
>    KJ_IF_MAYBE(s, *lock) {
>      KJ_IF_MAYBE(segment, s->find(id.value)) {
>        return *segment;
>      }
>      segments = s;
> +  } else {
> +    size_t this_id =
> +        std::hash<std::thread::id>{}(std::this_thread::get_id());
> +    KJ_DBG("map doesn't exist", this_id, this);
>    }
>
> It looks like (just before the crash) multiple threads print "map doesn't
> exist" for the same 'this' value. It's as if the lock did not work for some
> reason. I have not been able to reproduce the issue in a pure capnp test yet.
>
> For context, we have the same type of message with 2 segments printed at a
> high frequency. We have a stack of them being read by multiple readers.
> Apart from the mentioned exception being thrown, we often have segfaults in
> the insert() function.
>
> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote:
>>
>> Thanks for the quick response Kenton.
>>
>> I was also suspecting a race condition. Thanks for checking the mutex. It
>> is very likely that the issue is due to our usage. I'll share what I find
>> as I debug this further.
>>
>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>
>>> Hi Oguz,
>>>
>>> You can get better stack traces by compiling in debug mode (both Cap'n
>>> Proto itself, and your project). You should then see a symbolic trace
>>> instead of a bunch of addresses.
>>>
>>> This is a strange error, though. Looking at the code for
>>> ReaderArena::tryGetSegment(), the insert() call only happens after a find()
>>> call looking for the same key has failed. How could the inserted row
>>> already exist, then?
>>>
>>> Moreover, the whole sequence is performed under a mutex lock, seemingly
>>> ruling out any race conditions.
>>>
>>> I'm not sure what to say here. If you can come up with a self-contained
>>> test case that reproduces the issue, I'd be happy to debug.
>>>
>>> -Kenton
>>>
>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>
>>>> Thanks for this amazing software.
>>>>
>>>> We are using v0.7.0. I would like to ask for help debugging the following
>>>> exception, which we get rarely but consistently:
>>>>
>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>   what():  kj/table.c++:44: failed: inserted row already exists in table
>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823
>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>
>>>> Our backtrace shows that we were trying to read from a proto, then the
>>>> following two functions in capnp were called:
>>>>
>>>>    - kj::Table<kj::HashMap<unsigned int,
>>>>    kj::Own<capnp::_::SegmentReader> >::Entry,
>>>>    kj::HashIndex<kj::HashMap<unsigned int, kj::Own<capnp::_::SegmentReader>
>>>>    >::Callbacks> >::insert(kj::HashMap<unsigned int,
>>>>    kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>    - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int,
>>>>    capnp::_::Segment>)
>>>>
>>>> Why would reading from a proto trigger an insert call?
>>>>
>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug
>>>> this further?
>>>>
>>>> If the way we read the proto is OK, can this perhaps be caused by how we
>>>> populate the proto contents? Perhaps some missing or corrupt members?
>>>>
>>>> Thank you very much in advance for your help,
>>>> Oguz
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Cap'n Proto" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
