> >>> On Tue, 6 Sep 2022 16:05:11 +0800
> >>> Yiding Zhou <yidingx.z...@intel.com> wrote:
> >>>
> >>>> The pcap file will be synchronized to the disk when stopping the device.
> >>>> It takes a long time if the file is large that would cause the
> >>>> 'detach sync request' timeout when the device is closed under
> >>>> multi-process scenario.
> >>>>
> >>>> This commit fixes the issue by using alarm handler to release dumper.
> >>>>
> >>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
> >>>> Cc: sta...@dpdk.org
> >>>>
> >>>> Signed-off-by: Yiding Zhou <yidingx.z...@intel.com>
> >>>
> >>>
> >>> I think you need to redesign the handshake if this the case.
> >>> Forcing 30 second delay at the end of all uses of pcap is not acceptable.
> >>
> >> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
> >
> > Hi, Ferruh
> > Sorry for the late reply.
> > I did not receive your email on Oct 6, I got your comments from patchwork.
> >
> > "Can you please provide more details on multi-process communication
> > and call trace, to help us think about a solution to address this
> > issue in a more generic way (not just for pcap but for any case device
> > close takes more than multi-process timeout)?"
> >
> > I try to explain this issue with a sequence diagram, hope it can be
> > displayed correctly in the mail.
> >
> >   thread            intr thread          intr thread          thread
> > of secondary        of secondary         of primary         of primary
> >      |                   |                    |                  |
> >      |                   |                    |                  |
> > rte_eal_hotplug_remove
> > rte_dev_remove
> > eal_dev_hotplug_request_to_primary
> > rte_mp_request_sync -------------------------------------------->|
> >                                                                  |
> >                                                   handle_secondary_request
> >                                               |<-----------------|
> >                                               |
> >                                  __handle_secondary_request
> >                                  eal_dev_hotplug_request_to_secondary
> >      |<--------------------------- rte_mp_request_sync
> >      |
> > handle_primary_request
> >      |------------------>|
> >                          |
> >               __handle_primary_request
> >               local_dev_remove (this will take long time)
> >               rte_mp_reply ------------------>|
> >                                               |
> >                                       local_dev_remove
> >      |<--------------------------------- rte_mp_reply
> >
> > The marked 'local_dev_remove()' in the secondary process will perform a
> > pcap file synchronization operation.
> > When the pcap file is too large, it will take a lot of time (according to
> > my test 100G takes 20+ seconds).
> > This caused the processing of hot_plug message to time out.
>
> Hi Yiding,
>
> Thanks for the information,
>
> Right now all MP operations timeout is hardcoded in the code and it is 5
> seconds.
> Do you think does it work to have an API to set custom timeout, something like
> `rte_mp_timeout_set()`, and call this from pdump?
>
> This gives a generic solution for similar cases, not just for pcap.
> But my concern is if this is too much multi-process related internal detail to
> update, @Anatoly may comment on this.

Hi Ferruh,

For the pdump case alone, I think the time needed depends on the size of the
pcap file and on other system factors, such as the file system type and the
amount of system memory. It may be difficult to predict a specific timeout
value to set.
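
To make the suggestion concrete, here is a rough, standalone sketch of what
such a setter might look like. It is only an illustration: mp_timeout_set(),
mp_timeout_get() and MP_TIMEOUT_DEFAULT_S are hypothetical names, not existing
DPDK symbols, and the only value taken from this thread is the 5 second
default.

```c
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Default from this thread: MP requests currently time out after 5 s. */
#define MP_TIMEOUT_DEFAULT_S 5

/* Hypothetical process-wide setting; not an existing DPDK symbol. */
static struct timespec mp_timeout = {
	.tv_sec = MP_TIMEOUT_DEFAULT_S,
	.tv_nsec = 0,
};

/*
 * Hypothetical setter along the lines of the suggested rte_mp_timeout_set().
 * Returns 0 on success, or -EINVAL if ts is NULL, has a negative or
 * out-of-range field, or is zero.
 */
int
mp_timeout_set(const struct timespec *ts)
{
	if (ts == NULL || ts->tv_sec < 0 ||
	    ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L ||
	    (ts->tv_sec == 0 && ts->tv_nsec == 0))
		return -EINVAL;

	mp_timeout = *ts;
	return 0;
}

/* Returns the timeout that a synchronous MP request would use. */
struct timespec
mp_timeout_get(void)
{
	return mp_timeout;
}
```

The setter itself is small. The open question is still which value the caller
(pdump here) should pass, since the sync time scales with the file size and
the system it runs on.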