And, yes, out-of-order messages are totally fine; we just have to be careful with the design.
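
As a rough illustration of what "careful" could mean here, below is a minimal sketch (in Python rather than QEMU's C, only to keep it short) of a control exchange that waits for both completions but does not care which one arrives first. Only the SEND_CONTROL and RECV_CONTROL labels come from this thread; the ControlExchange class and the run_exchange helper are hypothetical names for illustration, not QEMU code, and the list passed in merely stands in for a completion queue.

```python
from dataclasses import dataclass, field

# Labels taken from the thread; everything else below is hypothetical.
SEND_CONTROL = "SEND_CONTROL"  # local signal that our send has been acked
RECV_CONTROL = "RECV_CONTROL"  # a control message from the peer has arrived


@dataclass
class ControlExchange:
    """Tracks one send-then-reply exchange without assuming completion order."""
    pending: set = field(default_factory=lambda: {SEND_CONTROL, RECV_CONTROL})
    order_seen: list = field(default_factory=list)

    def on_completion(self, kind):
        # Reject anything we are not waiting for, including duplicates.
        if kind not in self.pending:
            raise ValueError(f"unexpected or duplicate completion: {kind}")
        self.pending.remove(kind)
        self.order_seen.append(kind)

    @property
    def done(self):
        # The exchange is finished only once both completions were seen.
        return not self.pending


def run_exchange(completions):
    """Drain a simulated completion queue until the exchange is finished."""
    exchange = ControlExchange()
    for kind in completions:
        exchange.on_completion(kind)
        if exchange.done:
            break
    return exchange


# The ordering we normally expect, and the reordered one seen in the bug report.
expected = run_exchange([SEND_CONTROL, RECV_CONTROL])
reordered = run_exchange([RECV_CONTROL, SEND_CONTROL])
print(expected.done, reordered.done)  # prints: True True
```

The point of the sketch is only that the waiter keys on "which completions are still outstanding" rather than on "which completion comes next", so a late send signal is harmless.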

- Michael

On Sun, Dec 20, 2015 at 3:08 PM, Michael R. Hines <mhi...@digitalocean.com> wrote:

> Adding such a control message would defeat the benefits of RDMA, as there
> shouldn't be any signalling in the actual DMA path, or RDMA latency would
> be too high. If you're sending control messages for individual writes, then
> you need to change up your design. It's OK to design ACKs for groups of
> writes, depending on the requirements.
>
> So, the out-of-order issue you're seeing is only with your new message,
> not the original messages?
>
> Can you describe/document it in more detail so I can help advise?
>
> - Michael
>
> On Mon, Dec 14, 2015 at 6:53 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>
>> * Michael R. Hines (mhi...@digitalocean.com) wrote:
>> > David,
>> >
>> > Thanks for including my email directly. It helps a lot.
>> >
>> > Below, I'm going to assume that only "dest" is calling
>> > qemu_rdma_exchange_recv()
>> > and only src is calling qemu_rdma_exchange_send(), since you didn't
>> > specify who is sending and who is receiving.
>> >
>> > If that assumption is wrong, please respond again.
>>
>> That's correct.
>>
>> > Comments inline.....
>> >
>> > On Sat, Dec 12, 2015 at 1:48 AM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>> >
>> > > Hi Michael,
>> > >   I think I've got an RDMA race condition, but I'm being a little
>> > > cautious at the moment and wondered if you agree with the following
>> > > diagnosis.
>> > >
>> > > It's showing up in a world of mine that's sending more control messages
>> > > from the destination->source and I'm seeing the following.
>> > >
>> > > We normally expect:
>> > >
>> > >    src                          dest
>> > >      ----------->control ready->
>> > >
>> >
>> > If src is sending, this is not correct. Dest should send the ready message
>> > if it is receiving, not src, which breaks the above assumption. So, I'll
>> > reverse the assumption previously and continue with your observation and
>> > assume that src is receiving instead of dest, which should instead look
>> > like:
>>
>> Gah! Yes, I got the label the wrong way around; it's dest sending control
>> ready.
>>
>> >    src (receiving)              dest (sending)
>> >      ----------->control ready->
>> >
>> >
>> > >      Sees SEND_CONTROL signal to ack that it has been sent
>> > >
>> >
>> > I'll assume here that you meant that dest sees the ready message and
>> > then later sends something.
>> >
>> >
>> > >                    <-----control message--
>> > >      Sees RECV_CONTROL message from dest
>> > >
>> > >
>> > Similar assumption for the receiver (src).
>> >
>> >
>> > > but what I'm seeing is:
>> > >    src                          dest
>> > >      ----------->control ready->
>> > >                    <-----control message--
>> > >      Sees RECV_CONTROL message from dest
>> > >
>> >
>> > hmmmmm....
>> >
>> >
>> > >      Sees SEND_CONTROL signal to ack that it has been sent
>> > >
>> > >
>> > There's not enough information here....... do you have a multi-threaded
>> > send or receive or something?
>>
>> No, I've been trying to wire RDMA into the COLO fault-tolerant setup;
>> so the change which got me to trigger this bug was that I'd
>> added a new control message 'notify write' which explicitly
>> told the destination it had a page written to; at the RDMA level
>> that was the only change.
>>
>> > Do the work request IDs match up?
>>
>> Yes I think so; I also added a sequence number to the 'ready' messages
>> to check I wasn't losing one.
>> I had a chat to one of our RDMA guys (Doug Ledford) and he said
>> it's perfectly legal for RDMA to take longer to return the signal
>> from the send than for the round trip of the destination responding;
>> the 'signal' doesn't happen until an ack has been received from the
>> destination card anyway, so the ack can get delayed or retried.
>> So I think we do need to fix this; the question then is how do we fix
>> it for all control messages without breaking anything else. Are there
>> any cases that rely on having received the signal from the send before
>> continuing, or could I just do what I'm doing for all control messages?
>>
>> Dave
>>
>> > - Michael
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>
>
> --
> /*
>  * Michael R. Hines
>  * https://michael.hinespot.com
>  */
>

--
/*
 * Michael R. Hines
 * https://michael.hinespot.com
 */