Thanks for helping me debug this.

   1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a 
   deadlock. I'll debug this further now.
   2. I don't have any crash if I put a static std::mutex 
   in c++/src/capnp/arena.c++ and lock it using std::unique_lock right 
   after moreSegments.lockExclusive();
   3. I still observe the issue if (instead of #2) I make 
   std::unique_ptr<std::mutex> a member variable of ReaderArena and lock it 
   before or after moreSegments.lockExclusive();
      - I have printed KJ_DBG(my_unique_lock.owns_lock(), my_thread_id, 
      this); and interestingly, two threads claim they own the lock:
         - 
         - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 
         12814521112356468066; this = 7fd7f800a478
         - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 
         2054456097081703779; this = 7fd7f800a478
      - This is really weird :)
   
I'm on Ubuntu 18.04.3 LTS.

On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote:
>
> Hmm, that's strange!
>
> What operating system is this?
>
> If it happens to be Linux, could you try compiling everything with 
> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of 
> c++/src/kj/mutex.h), and see if that changes anything? This change will 
> make KJ use a completely different mutex implementation. (That said, the 
> futex-based implementation has seen very heavy use with no problems in the 
> past, so it would be surprising if it were broken somehow.)
>
> -Kenton
>
> On Sat, Oct 5, 2019 at 2:35 PM <[email protected] <javascript:>> wrote:
>
>> As an update, I've tried to place the following messages 
>> to c++/src/capnp/arena.c++:
>>
>>    SegmentMap* segments = nullptr;
>>    KJ_IF_MAYBE(s, *lock) {
>>      KJ_IF_MAYBE(segment, s->find(id.value)) {
>>        return *segment;
>>      }
>>      segments = s;
>> +  } else {
>> +    size_t this_id = 
>> std::hash<std::thread::id>{}(std::this_thread::get_id());
>> +    KJ_DBG("map doesn't exist", this_id, this);
>>    }
>>
>> It looks like (just before the crash) multiple threads print "map doesn't 
>> exist" for the same 'this' value. It's as if lock did not work for some 
>> reason. I could not reproduce the issue in a pure capnp test yet.
>>
>> For context, we have the same type of message with 2 segments printed in 
>> a high frequency. We have a stack of them being read by multiple readers. 
>> Apart from the mentioned exception being thrown, we often have segfaults in 
>> the insert() function.
>>
>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam wrote:
>>>
>>> Thanks for the quick response Kenton.
>>>
>>> I was also suspecting a race condition. Thanks for checking the mutex. 
>>> It is very likely that the issue is due to our usage. I'll share what I 
>>> find as I debug this further.
>>>
>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>>
>>>> Hi Oguz,
>>>>
>>>> You can get better stack traces by compiling in debug mode (both Cap'n 
>>>> Proto itself, and your project). You should then see a symbolic trace 
>>>> instead of a bunch of addresses.
>>>>
>>>> This is a strange error, though. Looking at the code for 
>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a 
>>>> find() 
>>>> call looking for the same key has failed. How could the inserted row 
>>>> already exist, then?
>>>>
>>>> Moreover, the whole sequence is performed under a mutex lock, seemingly 
>>>> ruling out any race conditions.
>>>>
>>>> I'm not sure what to say here. If you can come up with a self-contained 
>>>> test case that reproduces the issue, I'd be happy to debug.
>>>>
>>>> -Kenton
>>>>
>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>>
>>>>> Thanks for this amazing software.
>>>>>
>>>>> We are using v0.7.0. I would like to ask help debugging the following 
>>>>> exception which we rarely but consistently get:
>>>>>
>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>>   what():  kj/table.c++:44: failed: inserted row already exists in 
>>>>> table
>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823 
>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>>
>>>>> Our backtrace shows that we were trying to read from a proto, then the 
>>>>> following two functions in capnp were called:
>>>>>
>>>>>    - kj::Table<kj::HashMap<unsigned int, 
>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry, 
>>>>>    kj::HashIndex<kj::HashMap<unsigned int, 
>>>>> kj::Own<capnp::_::SegmentReader> 
>>>>>    >::Callbacks> >::insert(kj::HashMap<unsigned int, 
>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>>    - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int, 
>>>>>    capnp::_::Segment>)
>>>>>
>>>>> Why would reading from proto trigger an insert call?
>>>>>
>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug 
>>>>> this further?
>>>>>
>>>>> If the way we read proto is OK, can this perhaps be caused by how we 
>>>>> populate the proto contents? Perhaps some missing or corrupt members?
>>>>>
>>>>> Thank you very much for help in advance,
>>>>> Oguz
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Cap'n Proto" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com.

Reply via email to