Elena Ufimtseva <elena.ufimts...@oracle.com> writes: > Sometimes multifd sends just sync packet with no pages > (normal_num is 0). In this case the old value is being > preserved and being accounted for while only packet_len > is being transferred. > Reset it to 0 after sending and accounting for. >
Usually you'd finish your commit message here with your Signed off tag and the comments below would be included after the following --- sign. That way we can merge this patch without bring the unrelated discussion along. > TODO: Fix the same packet ids in the stream. > with this patch, there is still an issue with the duplicated > packets ids being sent (with different number of pages/flags). > See in below multifd_send trace (before this change): > multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 > next_packet_size=0x57000 > multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 > next_packet_size=0x57000 > > With this commit there are still duplicated packets, but since no pages > are being sent with sync flag set, next_packet_size is 0: > multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 > next_packet_size=0x7b000 > multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 > flags=0x0 next_packet_size=0x0 > If there is a suggestion how to fix this properly, I will be > glad to use it. What is going on here? Is it that we're incrementing the multifd_send_state->packet_num under different locks? Because p->mutex is per-channel? You could try turning that into an atomic operation instead.