> -----Original Message-----
> From: vuon...@viettel.com.vn [mailto:vuon...@viettel.com.vn]
> Sent: Tuesday, July 18, 2017 2:37 AM
> To: Dumitrescu, Cristian <cristian.dumitre...@intel.com>
> Cc: us...@dpdk.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] Rx can't receive any more packets after receiving
> 1.5 billion packets.
>
> On 07/17/2017 05:31 PM, cristian.dumitre...@intel.com wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of
> >> vuon...@viettel.com.vn
> >> Sent: Monday, July 17, 2017 3:04 AM
> >> Cc: us...@dpdk.org; dev@dpdk.org
> >> Subject: [dpdk-dev] Rx can't receive any more packets after receiving
> >> 1.5 billion packets.
> >>
> >> Hi DPDK team,
> >> Sorry for sending this email to both the users and dev lists, but I have
> >> a big problem: the Rx core of my application can no longer receive
> >> packets after a stress test (in ~1 day the Rx core received ~1.5 billion
> >> packets). The Rx core is still alive, but it receives no packets and
> >> generates no log output. Below is my system configuration:
> >> - OS: CentOS 7
> >> - Kernel: 3.10.0-514.16.1.el7.x86_64
> >> - Huge pages: 32 GB (16384 pages of 2 MB)
> >> - NIC: Intel 82599
> >> - DPDK version: 16.11
> >> - Architecture: Rx (lcore 1) receives packets and enqueues them to a
> >> ring ----- Worker (lcore 2) dequeues the packets from the ring and
> >> frees them (using the rte_pktmbuf_free() function).
> >> - Mempool creation:
> >>     rte_pktmbuf_pool_create(
> >>         "rx_pool",                 /* name */
> >>         8192,                      /* number of elements in the mbuf pool */
> >>         256,                       /* size of the per-core object cache */
> >>         0,                         /* size of the application private area
> >>                                       between the rte_mbuf struct and the
> >>                                       data buffer */
> >>         RTE_MBUF_DEFAULT_BUF_SIZE, /* size of the data buffer in each
> >>                                       mbuf (2048 + 128) */
> >>         0                          /* socket id */
> >>     );
> >> If I change the "number of elements in the mbuf pool" from 8192 to 512,
> >> Rx shows the same problem after a shorter time (~30 s).
> >>
> >> Please tell me if you need more information. I am looking forward to
> >> hearing from you.
> >>
> >> Many thanks,
> >> Vuong Le
> > Hi Vuong,
> >
> > This is likely to be a buffer leak. You may have a path in your code
> > where you are not freeing a buffer, so the buffer gets "lost": the
> > application can no longer use it because it is never returned to the
> > pool, and the pool of free buffers shrinks over time until it
> > eventually becomes empty, at which point no more packets can be
> > received.
> >
> > You might want to periodically monitor the number of free buffers in
> > your pool. If this is the root cause, you should see this number
> > steadily decreasing until it hits zero; otherwise you should see it
> > oscillating around an equilibrium point.
> >
> > Since it takes a relatively large number of packets to trigger this
> > issue, the code path with the problem is probably not executed very
> > frequently: it might be a control-plane packet that is not freed, an
> > ARP request/reply packet, etc.
> >
> > Regards,
> > Cristian
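A minimal sketch of the periodic check Cristian describes, assuming the
application keeps the rte_mempool pointer returned by
rte_pktmbuf_pool_create(); the function name log_pool_usage() is
illustrative, not from the thread. rte_mempool_avail_count() and
rte_mempool_in_use_count() are available in DPDK 16.11:

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Call periodically (e.g. once per second) from a control thread.
     * If "avail" trends monotonically toward zero, buffers are leaking. */
    static void
    log_pool_usage(const struct rte_mempool *mp)
    {
        unsigned int avail  = rte_mempool_avail_count(mp);  /* free mbufs */
        unsigned int in_use = rte_mempool_in_use_count(mp); /* allocated mbufs */

        printf("pool %s: avail=%u in_use=%u\n", mp->name, avail, in_use);
    }

With the 8192-element pool above, a healthy run should show "avail"
oscillating below 8192 (minus in-flight packets and per-core caches); a
leak shows up as a steady downward drift.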
> Hi Cristian,
> Thanks for your response; I am trying your idea. But let me show you
> another case I tested before. I changed the architecture of my
> application as follows:
> - Architecture: Rx (lcore 1) receives packets and enqueues them to the
> ring ----- after that: Rx (lcore 1) dequeues the packets from the ring
> and frees them immediately.
> (The old architecture is as above.)
> With the new architecture, Rx was still receiving packets after 2 days
> and everything looked good. Unfortunately, my application must run with
> the old architecture.
>
> Any ideas for me?
>
> Many thanks,
> Vuong Le
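Since freeing on the Rx core itself makes the problem disappear, one classic
suspect in the Rx -> ring -> worker split is the enqueue error path. The
following is a sketch of a plausible cause, not necessarily what Vuong's
code does; the names rx_ring, port, queue, and BURST_SIZE are illustrative,
and the signatures match DPDK 16.11:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST_SIZE 32

    /* Rx loop body: receive a burst and hand it to the worker via a ring.
     * If the ring is full, rte_ring_enqueue_burst() accepts only part of
     * the burst; the leftover mbufs must be freed here, otherwise every
     * full-ring event leaks a few mbufs until the pool is empty. */
    static void
    rx_burst_to_ring(uint8_t port, uint16_t queue, struct rte_ring *rx_ring)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb_rx = rte_eth_rx_burst(port, queue, bufs, BURST_SIZE);
        unsigned int nb_enq =
            rte_ring_enqueue_burst(rx_ring, (void **)bufs, nb_rx);

        for (unsigned int i = nb_enq; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);  /* free what the ring rejected */
    }

A slow drain on a path like this would also be consistent with the reported
timing: the smaller pool (512 vs. 8192 mbufs) fails much sooner because
there are far fewer free buffers to lose.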
I am not sure I understand the old architecture and the new architecture you
are referring to; can you please clarify them?

Regards,
Cristian