Hi All,

When we use the ping command with multiple worker threads, the reported ping 
delay is sometimes nearly 1000 ms, even though the two sites are directly 
connected to each other.

We found that the function signal_ip46_icmp_reply_event uses 
vlib_process_signal_event to deliver the ping response message.

However, the thread that receives the packet is not the same as the CLI 
thread, which may cause the notification to fail.

As a result, run_ping_ip46_address has to wait for its timeout before it gets 
the icmp_reply, so the reported delay is mainly determined by the ping 
interval (1000 ms).
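
To illustrate why a missed wake-up makes the reported delay follow the ping interval, here is a minimal sketch in plain C. It is a model only and does not use any VPP API: `measured_rtt` and its parameters are hypothetical names, and it assumes the CLI side computes the delay at the moment it wakes up, either because the notification reached it or because its wait timed out.

```c
#include <assert.h>

/* Hypothetical model, not VPP code: returns the delay (in seconds) that the
 * CLI side would report for one echo request.
 *   true_rtt        - actual network round-trip time
 *   wait_timeout    - how long the CLI side waits (the ping interval)
 *   event_delivered - nonzero if the reply notification woke the CLI side */
static double
measured_rtt (double true_rtt, double wait_timeout, int event_delivered)
{
  assert (true_rtt >= 0.0 && wait_timeout >= 0.0);

  /* Notification arrived in time: the waiter wakes right after the reply. */
  if (event_delivered && true_rtt <= wait_timeout)
    return true_rtt;

  /* Notification missed: the waiter only wakes at the timeout, so the value
   * it computes is the timeout, not the network delay. */
  return wait_timeout;
}
```

With a 0.5 ms link and a 1 s interval, `measured_rtt (0.0005, 1.0, 1)` yields the real round-trip time, while `measured_rtt (0.0005, 1.0, 0)` yields the full interval, which matches the symptom described above.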

Hi yanzhang, you can try this patch. I fixed this for the case where there is 
one worker thread.

https://gerrit.fd.io/r/#/c/6688/

ping command does not work when there is a worker thread (VPP-844)

王辉 wanghui
IT Development Engineer
NIV Nanjing Dept. IV/Wireless Product R&D Institute/Wireless Product Operation Division
Nanjing
E: wang.hu...@zte.com.cn 
www.zte.com.cn

Original message

From: <leiyanzh...@raydonetworks.com>
To: <vpp-dev@lists.fd.io>
Date: 2017-06-21 16:58
Subject: [vpp-dev] 【vpp-dev】delay is error in ping with multi worker thread

Hi All,

We changed the code to deliver the icmp_reply message with 
vl_api_rpc_call_main_thread, which uses an RPC callback to deliver the message 
in the main thread.
After the change, however, we found that the delay is sometimes nearly 10 ms.
We found that the main thread is always blocked in linux_epoll_input, where 
epoll_pwait is used, so the RPC callback function is sometimes called with a 
10 ms delay.
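
The effect of a blocking wait on work queued for the same thread can be sketched with a small Linux-only C helper. This is not VPP code: `blocked_ms` is a hypothetical name, and the 10 ms timeout used in the usage note is an assumption taken from the delay observed above, not from the VPP sources. The helper blocks in epoll_pwait on an empty epoll set and returns how long the call took.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Blocks in epoll_pwait() on an empty epoll set for timeout_ms and returns
 * the elapsed time in milliseconds, or -1.0 on error. */
static double
blocked_ms (int timeout_ms)
{
  struct epoll_event ev;
  struct timespec t0, t1;
  int epfd, n;

  assert (timeout_ms >= 0);
  epfd = epoll_create1 (0);
  if (epfd < 0)
    return -1.0;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  /* No file descriptor is registered, so nothing ends the wait early
   * (barring signals). Work queued for this thread in the meantime is
   * only seen after the timeout expires. */
  n = epoll_pwait (epfd, &ev, 1, timeout_ms, NULL);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  close (epfd);
  if (n < 0)
    return -1.0;

  return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling `blocked_ms (10)` should return roughly 10 or slightly more, which is the upper bound on how late a callback queued just after the wait started would run in this model.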

For the ping function, we can record the packet receive time to avoid the 
problem. But vl_api_rpc_call_main_thread is also used in other places, for 
example bfd_rpc_update_session. 
Its callback function bfd_rpc_update_session_cb uses clib_cpu_time_now, so if 
the callback cannot be called immediately, this introduces a 10 ms delay. In 
some situations this could cause BFD detection errors.
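
The idea of recording the packet receive time, mentioned above for ping, can be sketched as follows. This is a simplified model in plain C, not VPP code: `reply_event_t`, `rtt_at_callback` and `rtt_from_rx_time` are hypothetical names, and times are plain doubles in seconds. It contrasts a delay computed from the clock at callback time with one computed from a timestamp taken by the receiving thread.

```c
#include <assert.h>

/* Hypothetical event record passed from the receiving thread to the
 * deferred callback. */
typedef struct
{
  double time_sent; /* when the echo request was sent */
  double rx_time;   /* stamped by the receiving thread on packet arrival */
} reply_event_t;

/* Delay computed from the clock at callback time: it includes however
 * long the callback had to wait before it ran. */
static double
rtt_at_callback (const reply_event_t * e, double now)
{
  assert (e != 0);
  return now - e->time_sent;
}

/* Delay computed from the receive timestamp: it does not depend on when
 * the callback eventually runs. */
static double
rtt_from_rx_time (const reply_event_t * e)
{
  assert (e != 0);
  return e->rx_time - e->time_sent;
}
```

For an event sent at 100.0 and received at 100.5, `rtt_from_rx_time` gives 0.5 regardless of callback latency, whereas `rtt_at_callback` with `now = 100.75` gives 0.75. A callback that reads the clock itself has no such timestamp to fall back on unless the caller passes one in.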

Is our analysis right, or did we miss something?

Our ping change is below:
typedef struct 
{
    u8 event_type;
    uword ping_run_index;
    f64   work_time;
    icmp4_echo_request_header_t icmp4_header;
    icmp6_echo_request_header_t icmp6_header;
} ping_reply_event_arg_t;


static void
set_ping_reply_rpc_callback (ping_reply_event_arg_t * a)
{
    ping_main_t *pm = &ping_main;
    vlib_main_t *vm = vlib_get_main ();
    /* The RPC callback is expected to run on the main thread. */
    ASSERT (os_get_cpu_number () == 0);
    u8 event_type = a->event_type;
    /* Stays 0 if the buffer allocation fails, so only the event is signalled. */
    u32 bi0_copy = 0;
    ping_run_t *pr = vec_elt_at_index (pm->ping_runs, a->ping_run_index);
    if (vlib_buffer_alloc (vm, &bi0_copy, 1) == 1)
    {
        void *dst = vlib_buffer_get_current (vlib_get_buffer (vm, bi0_copy));
        if (PING_RESPONSE_IP4 == event_type)
        {
            clib_memcpy (dst, &(a->icmp4_header),
                         sizeof (icmp4_echo_request_header_t));
        }
        else
        {
            clib_memcpy (dst, &(a->icmp6_header),
                         sizeof (icmp6_echo_request_header_t));
        }
    }
    /* Note: rtt is taken from the IPv4 header only and is not used yet. */
    f64 rtt = vlib_time_now (vm) - a->icmp4_header.icmp_echo.time_sent;
    vlib_process_signal_event (vm, pr->cli_process_id, event_type, bi0_copy);
}


/*
 * If we can find the ping run by an ICMP ID, then we send the signal
 * to the CLI process referenced by that ping run, alongside with
 * a freshly made copy of the packet.
 * I opted for a packet copy to keep the main packet processing path
 * the same as for all the other nodes.
 *
 */
static int
signal_ip46_icmp_reply_event (vlib_main_t * vm,
                              u8 event_type, vlib_buffer_t * b0)
{
    ping_main_t *pm = &ping_main;
    u16 net_icmp_id = 0;
    ping_reply_event_arg_t args;
    args.event_type = event_type;
    switch (event_type)
    {
        case PING_RESPONSE_IP4:
        {
            icmp4_echo_request_header_t *h0 = vlib_buffer_get_current (b0);
            net_icmp_id = h0->icmp_echo.id;
            clib_memcpy (&(args.icmp4_header), h0,
                         sizeof (icmp4_echo_request_header_t));
        }
        break;
        case PING_RESPONSE_IP6:
        {
            icmp6_echo_request_header_t *h0 = vlib_buffer_get_current (b0);
            net_icmp_id = h0->icmp_echo.id;
            clib_memcpy (&(args.icmp6_header), h0,
                         sizeof (icmp6_echo_request_header_t));
        }
    }
    uword *p = hash_get (pm->ping_run_by_icmp_id,
                         clib_net_to_host_u16 (net_icmp_id));
    if (!p)
    {
        return 0;
    }
    args.work_time = vlib_time_now (vm);
    args.ping_run_index = p[0];
    vl_api_rpc_call_main_thread (set_ping_reply_rpc_callback,
                                 (u8 *) & args, sizeof (args));
    return 1;
}
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
