On 16.10.2020 17:42, Honnappa Nagarahalli wrote:
> <snip>
>
>> On 16.10.2020 14:43, Lukasz Wojciechowski wrote:
>>> Hi Honnappa,
>>>
>>> Thank you for your answer.
>>> In the current v7 version I followed your advice and used the RELAXED
>>> memory model. And it works without any issues. I guess after fixing
>>> other issues found since v4 the distributor works more stably.
>>> I didn't have time to rearrange all tests in the way I proposed, but I
>>> guess if they work like this it's not a top priority.
> Agree, not a top priority.
>
>>> Can you give an ack on the series? I believe David Marchand is waiting
>>> for your opinion to process it.
>> I'm sorry, I didn't see your other comments. I'll try to fix them today.
> No problem, I can review the next series quickly.
So it's already there, the v8, but you have one more patch to look at :)
I was ready to submit the new version, but I ran some tests ... and one of
the executions hung, so I attached gdb and found one more issue.
>
>>> Best regards
>>> Lukasz
>>>
>>> On 16.10.2020 07:43, Honnappa Nagarahalli wrote:
>>>> <snip>
>>>>
>>>>> Hi Honnappa,
>>>>>
>>>>> Many thanks for the review!
>>>>>
>>>>> I'll write my answers here, not inline, as I think it will be easier
>>>>> to read them in one place.
>>>>> So first of all I agree with you on 2 things:
>>>>> 1) all uses of the statistics must be atomic, and the lack of that
>>>>> caused most of the problems
>>>>> 2) it would be better to replace the barrier and memset in
>>>>> clear_packet_count() with atomic stores, as you suggested
>>>>>
>>>>> So I will apply both of the above.
>>>>>
>>>>> However, I wasn't fully convinced on changing acquire/release to
>>>>> relaxed. It would be perfectly OK if it looked like in this Herb
>>>>> Sutter example:
>>>>> https://youtu.be/KeLBd2EJLOU?t=4170
>>>>> But in his case the counters are cleared before the worker threads
>>>>> start and are printed out after they have completed.
>>>>>
>>>>> In the case of the DPDK distributor tests, both the worker and main
>>>>> cores are running at the same time. In the sanity_test, the statistics
>>>>> are cleared and verified a few times for different hashes of packets.
>>>>> The worker cores are not stopped at this time and they continue their
>>>>> loops in the handle procedure.
>>>>> The verification made on the main core is an exchange of data, as the
>>>>> current statistics indicate how the test will result.
>>>> Agree. The key point we have to note is that the data that is exchanged
>>>> between the two threads is already atomic (handled_packets is atomic).
>>>>> As I wasn't convinced, I ran some tests with both the relaxed and
>>>>> acquire/release modes and they both fail :( The ratio of failures
>>>>> caused by statistics errors to the number of tests, for 200000 tests,
>>>>> was:
>>>>> for relaxed: 0.000790562
>>>>> for acq/rel: 0.000091321
>>>>>
>>>>>
>>>>> That's why I'm going to modify the tests in such a way that they:
>>>>> 1) clear statistics
>>>>> 2) launch worker threads
>>>>> 3) run the test
>>>>> 4) wait for the worker procedures to complete
>>>>> 5) check stats, verify results and print them out
>>>>>
>>>>> This way the main core will use (clear or verify) the stats only when
>>>>> there are no worker threads running. This would make things simpler
>>>>> and allow focusing on testing the distributor, not the tests. And of
>>>>> course relaxed mode would be enough!
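As a rough illustration of the flow proposed above (a minimal standalone
sketch, not the DPDK test itself: it uses plain pthreads and GCC __atomic
builtins instead of the EAL lcore API, and NUM_WORKERS/worker_fn/
handled_packets here are made-up names for this example):

#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_WORKERS 4

static uint64_t handled_packets[NUM_WORKERS];

static void *
worker_fn(void *arg)
{
	unsigned int id = *(unsigned int *)arg;
	unsigned int i;

	/* 3) run test: each worker bumps only its own counter, relaxed */
	for (i = 0; i < 1000; i++)
		__atomic_fetch_add(&handled_packets[id], 1, __ATOMIC_RELAXED);
	return NULL;
}

int
main(void)
{
	pthread_t threads[NUM_WORKERS];
	unsigned int ids[NUM_WORKERS];
	uint64_t total = 0;
	unsigned int i;

	/* 1) clear statistics while no worker is running */
	for (i = 0; i < NUM_WORKERS; i++)
		__atomic_store_n(&handled_packets[i], 0, __ATOMIC_RELAXED);

	/* 2) launch worker threads */
	for (i = 0; i < NUM_WORKERS; i++) {
		ids[i] = i;
		pthread_create(&threads[i], NULL, worker_fn, &ids[i]);
	}

	/* 4) wait for the worker procedures to complete; the join (like
	 * rte_eal_wait_lcore() in the real test) provides the ordering */
	for (i = 0; i < NUM_WORKERS; i++)
		pthread_join(threads[i], NULL);

	/* 5) check stats: relaxed loads are enough, no worker is running */
	for (i = 0; i < NUM_WORKERS; i++)
		total += __atomic_load_n(&handled_packets[i], __ATOMIC_RELAXED);

	printf("total handled packets: %" PRIu64 "\n", total);
	return 0;
}

With this structure the main thread never races with the workers on the
counters, so the memory order of the counter accesses no longer matters for
the correctness of the check.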
>>>> Agree, this would be the only way to ensure that the main thread sees
>>>> the correct statistics (just like in the video)
>>>>
>>>>> Best regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>> On 29.09.2020 07:49, Honnappa Nagarahalli wrote:
>>>>>> <snip>
>>>>>>
>>>>>>> Statistics of handled packets are cleared and read on main lcore,
>>>>>>> while they are increased in workers handlers on different lcores.
>>>>>>>
>>>>>>> Without synchronization occasionally showed invalid values.
>>>>>>> This patch uses atomic acquire/release mechanisms to synchronize.
>>>>>> In general, load-acquire and store-release memory orderings are
>>>>>> required while synchronizing data (that cannot be updated atomically)
>>>>>> between threads. In this situation, making the counters atomic is
>>>>>> enough.
>>>>>>> Fixes: c3eabff124e6 ("distributor: add unit tests")
>>>>>>> Cc: bruce.richard...@intel.com
>>>>>>> Cc: sta...@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Lukasz Wojciechowski <l.wojciec...@partner.samsung.com>
>>>>>>> Acked-by: David Hunt <david.h...@intel.com>
>>>>>>> ---
>>>>>>>  app/test/test_distributor.c | 39 ++++++++++++++++++++++++++-------------
>>>>>>>  1 file changed, 26 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
>>>>>>> index 35b25463a..0e49e3714 100644
>>>>>>> --- a/app/test/test_distributor.c
>>>>>>> +++ b/app/test/test_distributor.c
>>>>>>> @@ -43,7 +43,8 @@ total_packet_count(void)
>>>>>>>  {
>>>>>>>  	unsigned i, count = 0;
>>>>>>>  	for (i = 0; i < worker_idx; i++)
>>>>>>> -		count += worker_stats[i].handled_packets;
>>>>>>> +		count += __atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +				__ATOMIC_ACQUIRE);
>>>>>> RELAXED memory order is sufficient. For ex: the worker threads are not
>>>>>> 'releasing' any data that is not atomically updated to the main thread.
>>>>>>>  	return count;
>>>>>>>  }
>>>>>>>
>>>>>>> @@ -52,6 +53,7 @@ static inline void
>>>>>>>  clear_packet_count(void)
>>>>>>>  {
>>>>>>>  	memset(&worker_stats, 0, sizeof(worker_stats));
>>>>>>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>> Ideally, the counters should be set to 0 atomically rather than using
>>>>>> a memset.
>>>>>>>  }
>>>>>>>
>>>>>>>  /* this is the basic worker function for sanity test
>>>>>>> @@ -72,13 +74,13 @@ handle_work(void *arg)
>>>>>>>  	num = rte_distributor_get_pkt(db, id, buf, buf, num);
>>>>>>>  	while (!quit) {
>>>>>>>  		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> -				__ATOMIC_RELAXED);
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>> Using the __ATOMIC_ACQ_REL order does not mean anything to the main
>>>>>> thread. The main thread might still see the updates from different
>>>>>> threads in different order.
>>>>>>>  		count += num;
>>>>>>>  		num = rte_distributor_get_pkt(db, id,
>>>>>>>  				buf, buf, num);
>>>>>>>  	}
>>>>>>>  	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> -			__ATOMIC_RELAXED);
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>> Same here, do not see why this change is required.
>>>>>>
>>>>>>>  	count += num;
>>>>>>>  	rte_distributor_return_pkt(db, id, buf, num);
>>>>>>>  	return 0;
>>>>>>> @@ -134,7 +136,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>  	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>  		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +						__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough.
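Going back to the clear_packet_count() comment above, a sketch of what the
suggested change could look like (not the committed code; it assumes the
worker_stats[] array and the RTE_MAX_LCORE bound used by the test file):

/* Clear each counter with an atomic store instead of memset() + fence. */
static inline void
clear_packet_count(void)
{
	unsigned int i;

	for (i = 0; i < RTE_MAX_LCORE; i++)
		__atomic_store_n(&worker_stats[i].handled_packets, 0,
				__ATOMIC_RELAXED);
}

This removes the need for the separate rte_atomic_thread_fence() call, since
every counter is then written atomically on its own.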
>>>>>>
>>>>>>>  	printf("Sanity test with all zero hashes done.\n");
>>>>>>>
>>>>>>>  	/* pick two flows and check they go correctly */
>>>>>>> @@ -159,7 +162,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>  	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>  		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(
>>>>>>> +					&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>  	printf("Sanity test with two hash values done\n");
>>>>>>>  }
>>>>>>>
>>>>>>> @@ -185,7 +190,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>  	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>  		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +						__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>  	printf("Sanity test with non-zero hashes done\n");
>>>>>>>
>>>>>>>  	rte_mempool_put_bulk(p, (void *)bufs, BURST);
>>>>>>> @@ -280,15 +286,17 @@ handle_work_with_free_mbufs(void *arg)
>>>>>>>  		buf[i] = NULL;
>>>>>>>  	num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>>>>  	while (!quit) {
>>>>>>> -		worker_stats[id].handled_packets += num;
>>>>>>>  		count += num;
>>>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>> IMO, the problem would be the non-atomic update of the statistics.
>>>>>> So, __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>  		for (i = 0; i < num; i++)
>>>>>>>  			rte_pktmbuf_free(buf[i]);
>>>>>>>  		num = rte_distributor_get_pkt(d,
>>>>>>>  				id, buf, buf, num);
>>>>>>>  	}
>>>>>>> -	worker_stats[id].handled_packets += num;
>>>>>>>  	count += num;
>>>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>> Same here, the problem is non-atomic update of the statistics,
>>>>>> __ATOMIC_RELAXED is enough.
>>>>>> Similarly, for changes below, __ATOMIC_RELAXED is enough.
>>>>>>
>>>>>>>  	rte_distributor_return_pkt(d, id, buf, num);
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>> @@ -363,8 +371,9 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>  	/* wait for quit single globally, or for worker zero, wait
>>>>>>>  	 * for zero_quit */
>>>>>>>  	while (!quit && !(id == zero_id && zero_quit)) {
>>>>>>> -		worker_stats[id].handled_packets += num;
>>>>>>>  		count += num;
>>>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>>>  		for (i = 0; i < num; i++)
>>>>>>>  			rte_pktmbuf_free(buf[i]);
>>>>>>>  		num = rte_distributor_get_pkt(d,
>>>>>>> @@ -379,10 +388,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>
>>>>>>>  		total += num;
>>>>>>>  	}
>>>>>>> -	worker_stats[id].handled_packets += num;
>>>>>>>  	count += num;
>>>>>>>  	returned = rte_distributor_return_pkt(d, id, buf, num);
>>>>>>>
>>>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>>>  	if (id == zero_id) {
>>>>>>>  		/* for worker zero, allow it to restart to pick up last packet
>>>>>>>  		 * when all workers are shutting down.
>>>>>>> @@ -394,10 +404,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>  				id, buf, buf, num);
>>>>>>>
>>>>>>>  		while (!quit) {
>>>>>>> -			worker_stats[id].handled_packets += num;
>>>>>>>  			count += num;
>>>>>>>  			rte_pktmbuf_free(pkt);
>>>>>>>  			num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>>>> +			__atomic_fetch_add(&worker_stats[id].handled_packets,
>>>>>>> +					num, __ATOMIC_ACQ_REL);
>>>>>>>  		}
>>>>>>>  		returned = rte_distributor_return_pkt(d,
>>>>>>>  				id, buf, num);
>>>>>>> @@ -461,7 +472,8 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
>>>>>>>
>>>>>>>  	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>  		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +						__ATOMIC_ACQUIRE));
>>>>>>>
>>>>>>>  	if (total_packet_count() != BURST * 2) {
>>>>>>>  		printf("Line %d: Error, not all packets flushed. "
>>>>>>> @@ -514,7 +526,8 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
>>>>>>>  	zero_quit = 0;
>>>>>>>  	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>  		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +						__ATOMIC_ACQUIRE));
>>>>>>>
>>>>>>>  	if (total_packet_count() != BURST) {
>>>>>>>  		printf("Line %d: Error, not all packets flushed. "
>>>>>>> --
>>>>>>> 2.17.1
>>>>> --
>>>>> Lukasz Wojciechowski
>>>>> Principal Software Engineer
>>>>>
>>>>> Samsung R&D Institute Poland
>>>>> Samsung Electronics
>>>>> Office +48 22 377 88 25
>>>>> l.wojciec...@partner.samsung.com
>> --
>> Lukasz Wojciechowski
>> Principal Software Engineer
>>
>> Samsung R&D Institute Poland
>> Samsung Electronics
>> Office +48 22 377 88 25
>> l.wojciec...@partner.samsung.com
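For reference, this is roughly what the "__ATOMIC_RELAXED is enough" remarks
amount to on the read side (a sketch mirroring the quoted hunk, not the final
patch):

static inline unsigned
total_packet_count(void)
{
	unsigned i, count = 0;

	for (i = 0; i < worker_idx; i++)
		count += __atomic_load_n(&worker_stats[i].handled_packets,
				__ATOMIC_RELAXED);
	return count;
}

The loads stay atomic, so torn or invalid values cannot be read, but no
acquire ordering is requested because the counters do not publish any other
non-atomic data to the main lcore.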
--
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciec...@partner.samsung.com