Makes sense. Thank you very much for helping me debug Kenton :)

On Tuesday, October 8, 2019 at 9:05:19 AM UTC-7, Kenton Varda wrote:
>
> Ah, hmm. So you are accessing the MesasgeReader itself (not just the 
> objects in it) from multiple threads? It has been a long time since I wrote 
> this code, but I suspect that wasn't intended to be supported. Normally 
> thread-safe methods are marked `const`, and none of MessageReader's methods 
> appear to be `const`. In contrast, the getter/setter methods on individual 
> Reader objects are marked `const`, hence all the mutex-locking to make sure 
> that changes they might trigger are thread-safe. But to get a Reader object 
> initially, you have to call getRoot(), which initializes the arena and sets 
> `allocatedArena = true`.
>
> I admit this is not very intuitive. MessageReader *feels* like a read-only 
> type and hence like it ought to be thread-safe. I think probably I should 
> refactor MessageReader so that it loads the whole segment table upfront and 
> doesn't do any silly mutex locking, though I think subclasses may need to 
> be changed to support that, bleh.
>
> As a simple work-around to fix your problem, I think you can do:
>
>     messageReader.getRoot<capnp::AnyPointer>();
>
> immediately after allocating the MessageReader, before it has been made 
> visible to other threads.
>
> A more-correct solution is to call getRoot() with the actual root type and 
> then only share the returned struct Reader with other threads; don't share 
> the MessageReader.
>
> -Kenton
>
> On Mon, Oct 7, 2019 at 5:38 PM <[email protected] <javascript:>> wrote:
>
>> great suggestion. I found that just before the crash, ReaderArena gets 
>> constructed twice at the same memory location. That would explain why locks 
>> don't work as expected. Looking at the capnp code, we were wondering 
>> whether allocatedArena variable of MessageReader needs to be protected for 
>> multi-threading.
>>
>> On Monday, October 7, 2019 at 12:16:42 PM UTC-7, Kenton Varda wrote:
>>>
>>> Could this be use-after-free? Maybe the MessageReader (and hence the 
>>> mutex) is already deleted?
>>>
>>> -Kenton
>>>
>>> On Mon, Oct 7, 2019 at 11:58 AM <[email protected]> wrote:
>>>
>>>> Thanks for helping me debug this.
>>>>
>>>>    1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a 
>>>>    deadlock. I'll debug this further now.
>>>>    2. I don't have any crash if I put a static std::mutex 
>>>>    in c++/src/capnp/arena.c++ and lock it using std::unique_lock right 
>>>>    after moreSegments.lockExclusive();
>>>>    3. I still observe the issue if (instead of #2) I make 
>>>>    std::unique_ptr<std::mutex> a member variable of ReaderArena and lock 
>>>> it 
>>>>    before or after moreSegments.lockExclusive();
>>>>       - I have printed KJ_DBG(my_unique_lock.owns_lock(), 
>>>>       my_thread_id, this); and interestingly, two threads claim they own 
>>>> the lock:
>>>>          - 
>>>>          - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 
>>>>          12814521112356468066; this = 7fd7f800a478
>>>>          - capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 
>>>>          2054456097081703779; this = 7fd7f800a478
>>>>       - This is really weird :)
>>>>    
>>>> I'm on Ubuntu 18.04.3 LTS.
>>>>
>>>> On Saturday, October 5, 2019 at 4:16:32 PM UTC-7, Kenton Varda wrote:
>>>>>
>>>>> Hmm, that's strange!
>>>>>
>>>>> What operating system is this?
>>>>>
>>>>> If it happens to be Linux, could you try compiling everything with 
>>>>> -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of 
>>>>> c++/src/kj/mutex.h), and see if that changes anything? This change will 
>>>>> make KJ use a completely different mutex implementation. (That said, the 
>>>>> futex-based implementation has seen very heavy use with no problems in 
>>>>> the 
>>>>> past, so it would be surprising if it were broken somehow.)
>>>>>
>>>>> -Kenton
>>>>>
>>>>> On Sat, Oct 5, 2019 at 2:35 PM <[email protected]> wrote:
>>>>>
>>>>>> As an update, I've tried to place the following messages 
>>>>>> to c++/src/capnp/arena.c++:
>>>>>>
>>>>>>    SegmentMap* segments = nullptr;
>>>>>>    KJ_IF_MAYBE(s, *lock) {
>>>>>>      KJ_IF_MAYBE(segment, s->find(id.value)) {
>>>>>>        return *segment;
>>>>>>      }
>>>>>>      segments = s;
>>>>>> +  } else {
>>>>>> +    size_t this_id = 
>>>>>> std::hash<std::thread::id>{}(std::this_thread::get_id());
>>>>>> +    KJ_DBG("map doesn't exist", this_id, this);
>>>>>>    }
>>>>>>
>>>>>> It looks like (just before the crash) multiple threads print "map 
>>>>>> doesn't exist" for the same 'this' value. It's as if lock did not work 
>>>>>> for 
>>>>>> some reason. I could not reproduce the issue in a pure capnp test yet.
>>>>>>
>>>>>> For context, we have the same type of message with 2 segments printed 
>>>>>> in a high frequency. We have a stack of them being read by multiple 
>>>>>> readers. Apart from the mentioned exception being thrown, we often have 
>>>>>> segfaults in the insert() function.
>>>>>>
>>>>>> On Tuesday, October 1, 2019 at 10:39:16 AM UTC-7, Cenk Oguz Saglam 
>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks for the quick response Kenton.
>>>>>>>
>>>>>>> I was also suspecting a race condition. Thanks for checking the 
>>>>>>> mutex. It is very likely that the issue is due to our usage. I'll share 
>>>>>>> what I find as I debug this further.
>>>>>>>
>>>>>>> On Tuesday, October 1, 2019 at 9:30:42 AM UTC-7, Kenton Varda wrote:
>>>>>>>>
>>>>>>>> Hi Oguz,
>>>>>>>>
>>>>>>>> You can get better stack traces by compiling in debug mode (both 
>>>>>>>> Cap'n Proto itself, and your project). You should then see a symbolic 
>>>>>>>> trace 
>>>>>>>> instead of a bunch of addresses.
>>>>>>>>
>>>>>>>> This is a strange error, though. Looking at the code for 
>>>>>>>> ReaderArena::tryGetSegment(), the insert() call only happens after a 
>>>>>>>> find() 
>>>>>>>> call looking for the same key has failed. How could the inserted row 
>>>>>>>> already exist, then?
>>>>>>>>
>>>>>>>> Moreover, the whole sequence is performed under a mutex lock, 
>>>>>>>> seemingly ruling out any race conditions.
>>>>>>>>
>>>>>>>> I'm not sure what to say here. If you can come up with a 
>>>>>>>> self-contained test case that reproduces the issue, I'd be happy to 
>>>>>>>> debug.
>>>>>>>>
>>>>>>>> -Kenton
>>>>>>>>
>>>>>>>> On Tue, Oct 1, 2019 at 9:10 AM <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for this amazing software.
>>>>>>>>>
>>>>>>>>> We are using v0.7.0. I would like to ask help debugging the 
>>>>>>>>> following exception which we rarely but consistently get:
>>>>>>>>>
>>>>>>>>> terminate called after throwing an instance of 'kj::ExceptionImpl'
>>>>>>>>>   what():  kj/table.c++:44: failed: inserted row already exists in 
>>>>>>>>> table
>>>>>>>>> stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823 
>>>>>>>>> 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087
>>>>>>>>>
>>>>>>>>> Our backtrace shows that we were trying to read from a proto, then 
>>>>>>>>> the following two functions in capnp were called:
>>>>>>>>>
>>>>>>>>>    - kj::Table<kj::HashMap<unsigned int, 
>>>>>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry, 
>>>>>>>>>    kj::HashIndex<kj::HashMap<unsigned int, 
>>>>>>>>> kj::Own<capnp::_::SegmentReader> 
>>>>>>>>>    >::Callbacks> >::insert(kj::HashMap<unsigned int, 
>>>>>>>>>    kj::Own<capnp::_::SegmentReader> >::Entry&&)
>>>>>>>>>    - capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int, 
>>>>>>>>>    capnp::_::Segment>)
>>>>>>>>>
>>>>>>>>> Why would reading from proto trigger an insert call?
>>>>>>>>>
>>>>>>>>> How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to 
>>>>>>>>> debug this further?
>>>>>>>>>
>>>>>>>>> If the way we read proto is OK, can this perhaps be caused by how 
>>>>>>>>> we populate the proto contents? Perhaps some missing or corrupt 
>>>>>>>>> members?
>>>>>>>>>
>>>>>>>>> Thank you very much for help in advance,
>>>>>>>>> Oguz
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "Cap'n Proto" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Cap'n Proto" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Cap'n Proto" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/capnproto/245fd14b-0061-4346-af50-3804baef15f5%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/capnproto/245fd14b-0061-4346-af50-3804baef15f5%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/398ecae8-ee0d-49e7-bd94-cf384a465a98%40googlegroups.com.

Reply via email to