> -----Original Message-----
> From: Zhou, YidingX <yidingx.z...@intel.com>
> Sent: Wednesday, September 21, 2022 3:15 PM
> To: Stephen Hemminger <step...@networkplumber.org>; Zhang, Qi Z
> <qi.z.zh...@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.bura...@intel.com>; He,
> Xingguang <xingguang...@intel.com>; sta...@dpdk.org
> Subject: RE: [PATCH v2] net/pcap: fix timeout of stopping device
>
>
>
> > -----Original Message-----
> > From: Stephen Hemminger <mailto:step...@networkplumber.org>
> > Sent: Tuesday, September 6, 2022 10:58 PM
> > To: Zhou, YidingX <mailto:yidingx.z...@intel.com>
> > Cc: mailto:dev@dpdk.org; Zhang, Qi Z <mailto:qi.z.zh...@intel.com>;
> > Burakov, Anatoly
> > <mailto:anatoly.bura...@intel.com>; He, Xingguang
> > <mailto:xingguang...@intel.com>;
> > mailto:sta...@dpdk.org
> > Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
> >
> > On Tue, 6 Sep 2022 16:05:11 +0800
> > Yiding Zhou <mailto:yidingx.z...@intel.com> wrote:
> >
> > > The pcap file will be synchronized to the disk when stopping the device.
> > > It takes a long time if the file is large that would cause the
> > > 'detach sync request' timeout when the device is closed under
> > > multi-process scenario.
> > >
> > > This commit fixes the issue by using alarm handler to release dumper.
> > >
> > > Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
> > > Cc: mailto:sta...@dpdk.org
> > >
> > > Signed-off-by: Yiding Zhou <mailto:yidingx.z...@intel.com>
> >
> >
> > I think you need to redesign the handshake if this the case.
> > Forcing 30 second delay at the end of all uses of pcap is not acceptable.
>
> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
Hi Ferruh,
Sorry for the late reply.
I did not receive your email of Oct 6; I got your comments from patchwork.
"Can you please provide more details on multi-process communication and
call trace, to help us think about a solution to address this issue in a
more generic way (not just for pcap but for any case device close takes
more than multi-process timeout)?"
I will try to explain this issue with a sequence diagram; I hope it is
displayed correctly in the mail.
 thread                intr thread                     intr thread               thread
of secondary          of secondary                     of primary              of primary
    |                       |                               |                       |
    |                       |                               |                       |
rte_eal_hotplug_remove
rte_dev_remove
eal_dev_hotplug_request_to_primary
rte_mp_request_sync --------------------------------------------------------------->|
                                                                                    |
                                                                      handle_secondary_request
                                                            |<----------------------|
                                                            |
                                            __handle_secondary_request
                                            eal_dev_hotplug_request_to_secondary
    |<-------------------------------------------- rte_mp_request_sync
    |
handle_primary_request ---->|
                            |
              __handle_primary_request
              local_dev_remove (this will take a long time)
              rte_mp_reply -------------------------------->|
                                                            |
                                                    local_dev_remove
    |<-------------------------------------------- rte_mp_reply
The marked 'local_dev_remove()' in the secondary process performs a pcap
file synchronization operation.
When the pcap file is very large, this takes a long time (in my test, a
100G file takes 20+ seconds).
As a result, the reply is not sent in time and the processing of the
hotplug message times out.
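
To make the timing argument concrete, here is a minimal sketch in C that
models the two nested synchronous requests from the diagram. It is not DPDK
code: the struct, the function names and the 5 second timeout used in the
example below are assumptions for illustration only. Only the 20+ second sync
time comes from my test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timing model of the detach handshake (illustration only). */
struct detach_timing {
	double secondary_remove_s; /* local_dev_remove() in secondary (pcap sync) */
	double primary_remove_s;   /* local_dev_remove() in primary */
	double mp_timeout_s;       /* timeout of each sync request (assumed) */
};

/*
 * Inner request (primary -> secondary): the secondary replies only after
 * its local_dev_remove() has finished, so the primary waits that long.
 */
static bool
primary_request_times_out(const struct detach_timing *t)
{
	assert(t->mp_timeout_s > 0.0);
	return t->secondary_remove_s > t->mp_timeout_s;
}

/*
 * Outer request (secondary -> primary): the primary replies only after the
 * inner request and its own local_dev_remove() have both completed.
 */
static bool
secondary_request_times_out(const struct detach_timing *t)
{
	assert(t->mp_timeout_s > 0.0);
	return t->secondary_remove_s + t->primary_remove_s > t->mp_timeout_s;
}
```

With secondary_remove_s = 20.0 and an assumed mp_timeout_s = 5.0, both
functions return true. With a small capture file (for example 0.5 seconds)
neither request times out, which matches what we see in practice.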