And, yes, out-of-order messages are totally fine; we just have to be
careful with the design.
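
As a rough sketch of what tolerating out-of-order completions could look
like: the receiver records the send completion and the peer's reply in
whichever order they arrive, and only treats the exchange as finished once
both have been seen. This is Python for brevity, not QEMU's C, and
ControlExchange and run_exchange are hypothetical names, not existing QEMU
functions. The completion kinds simply reuse the SEND_CONTROL and
RECV_CONTROL labels from the thread quoted below.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ControlExchange:
    """Tracks one control round trip (hypothetical helper, not QEMU code)."""
    send_completed: bool = False      # local completion for our posted send
    reply: Optional[bytes] = None     # payload of the peer's control message

    def on_completion(self, kind: str, payload: Optional[bytes] = None) -> None:
        # Record whichever completion shows up; no ordering is assumed.
        if kind == "SEND_CONTROL":
            self.send_completed = True
        elif kind == "RECV_CONTROL":
            self.reply = payload
        else:
            raise ValueError(f"unexpected completion: {kind}")

    def done(self) -> bool:
        # Finished only once both the send completion and the reply are in.
        return self.send_completed and self.reply is not None


def run_exchange(completions: Iterable[Tuple[str, Optional[bytes]]]) -> bytes:
    """Drain (kind, payload) completions until the exchange is finished."""
    ex = ControlExchange()
    for kind, payload in completions:
        ex.on_completion(kind, payload)
        if ex.done():
            return ex.reply
    raise RuntimeError("completion stream ended before the exchange finished")
```

With this shape, a reply that overtakes the send completion is just stored
until the completion turns up, instead of being treated as a protocol error.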

- Michael

On Sun, Dec 20, 2015 at 3:08 PM, Michael R. Hines <mhi...@digitalocean.com>
wrote:

> Adding such a control message would defeat the benefits of RDMA, as there
> shouldn't be any signalling in the actual DMA path, or RDMA latency would
> be too high. If you're sending control messages for individual writes, then
> you need to change up your design. It's OK to design ACKs for groups of
> writes, depending on the requirements.
>
> So, the out-of-order issue you're seeing is only with your new message,
> not the original messages?
>
> Can you describe/document it in more detail so I can help advise?
>
> - Michael
>
> On Mon, Dec 14, 2015 at 6:53 PM, Dr. David Alan Gilbert <
> dgilb...@redhat.com> wrote:
>
>> * Michael R. Hines (mhi...@digitalocean.com) wrote:
>> > David,
>> >
>> > Thanks for including my email directly. It helps a lot.
>> >
>> > Below, I'm going to assume that only "dest" is calling
>> > qemu_rdma_exchange_recv()
>> > and only src is calling qemu_rdma_exchange_send(), since you didn't
>> > specify who is sending and who is receiving.
>> >
>> > If that assumption is wrong, please respond again.
>>
>> That's correct.
>>
>> > Comments inline.....
>> >
>> > On Sat, Dec 12, 2015 at 1:48 AM, Dr. David Alan Gilbert
>> > <dgilb...@redhat.com> wrote:
>> >
>> > > Hi Michael,
>> > >    I think I've got an RDMA race condition, but I'm being a little
>> > > cautious at the moment and wondered if you agree with the following
>> > > diagnosis.
>> > >
>> > > It's showing up in a world of mine that's sending more control
>> > > messages from the destination->source and I'm seeing the following.
>> > >
>> > > We normally expect:
>> > >
>> > >    src                        dest
>> > >      ----------->control ready->
>> > >
>> >
>> > If src is sending, this is not correct. Dest should send the ready
>> > message if it is receiving, not src, which breaks the above assumption.
>> > So, I'll reverse my previous assumption, continue with your
>> > observation, and assume that src is receiving instead of dest, which
>> > should instead look like:
>>
>> Gah! Yes, I got the label the wrong way around; it's dest sending control
>> ready.
>>
>> > src  (receiving)                      dest (sending)
>> >      ----------->control ready->
>> >
>> >
>> >
>> > >    Sees SEND_CONTROL signal to ack that it has been sent
>> > >
>> >
>> > I'll assume here that you meant that dest sees the ready message and
>> > then later sends something.
>> >
>> >
>> > >          <-----control message--
>> > >    Sees RECV_CONTROL message from dest
>> > >
>> > >
>> > Similar assumption for the receiver (src).
>> >
>> >
>> > > but what I'm seeing is:
>> > >    src                        dest
>> > >      ----------->control ready->
>> > >          <-----control message--
>> > >    Sees RECV_CONTROL message from dest
>> > >
>> >
>> > hmmmmm....
>> >
>> >
>> > >    Sees SEND_CONTROL signal to ack that it has been sent
>> > >
>> > >
>> > There's not enough information here....... do you have a multi-threaded
>> > send or receive or something?
>>
>> No, I've been trying to wire RDMA into the COLO fault-tolerant setup;
>> so the change which got me to trigger this bug was that I'd
>> added a new control message 'notify write' which explicitly
>> told the destination it had a page written to; at the RDMA level
>> that was the only change.
>>
>> > Do the work request IDs match up?
>>
>> Yes I think so; I also added a sequence number to the 'ready' messages
>> to check I wasn't losing one.
>> I had a chat to one of our RDMA guys (Doug Ledford) and he said
>> it's perfectly legal for RDMA to take longer to return the signal
>> from the send than for the round trip of the destination responding;
>> the 'signal' doesn't happen until an ack has been received from the
>> destination card anyway, so the ack can get delayed or retried.
>> So I think we do need to fix this; the question then is how do we fix
>> it for all control messages without breaking anything else.   Are there
>> any cases that rely on having received the signal from the send before
>> continuing, or could I just do what I'm doing for all control messages?
>>
>> Dave
>>
>> > - Michael
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>
>
>
>
> --
> /*
>  * Michael R. Hines
>  * https://michael.hinespot.com
>  */
>



-- 
/*
 * Michael R. Hines
 * https://michael.hinespot.com
 */
