On 12/2/2022 10:13 AM, Zhou, YidingX wrote:
>
>>>>> On Tue, 6 Sep 2022 16:05:11 +0800
>>>>> Yiding Zhou <yidingx.z...@intel.com> wrote:
>>>>>
>>>>>> The pcap file will be synchronized to the disk when stopping the device.
>>>>>> It takes a long time if the file is large, which would cause the
>>>>>> 'detach sync request' to time out when the device is closed in a
>>>>>> multi-process scenario.
>>>>>>
>>>>>> This commit fixes the issue by using an alarm handler to release the dumper.
>>>>>>
>>>>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
>>>>>> Cc: sta...@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Yiding Zhou <yidingx.z...@intel.com>
>>>>>
>>>>>
>>>>> I think you need to redesign the handshake if this is the case.
>>>>> Forcing a 30 second delay at the end of all uses of pcap is not acceptable.
>>>>
>>>> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
>>>
>>> Hi, Ferruh
>>> Sorry for the late reply.
>>> I did not receive your email on Oct 6; I got your comments from patchwork.
>>>
>>> "Can you please provide more details on multi-process communication
>>> and call trace, to help us think about a solution to address this
>>> issue in a more generic way (not just for pcap but for any case device
>>> close takes more than multi-process timeout)?"
>>>
>>> I will try to explain this issue with a sequence diagram; I hope it
>>> displays correctly in the mail.
>>>
>>> thread of        intr thread       intr thread        thread of
>>> secondary       of secondary       of primary          primary
>>>     |                 |                 |                 |
>>> rte_eal_hotplug_remove                  |                 |
>>>   rte_dev_remove                        |                 |
>>>     eal_dev_hotplug_request_to_primary  |                 |
>>>       rte_mp_request_sync               |                 |
>>>     |---------------------------------------------------->|
>>>     |                 |                 |      handle_secondary_request
>>>     |                 |                 |<----------------|
>>>     |                 |      __handle_secondary_request   |
>>>     |                 |        eal_dev_hotplug_request_to_secondary
>>>     |                 |          rte_mp_request_sync      |
>>>     |<----------------------------------|                 |
>>> handle_primary_request                  |                 |
>>>     |---------------->|                 |                 |
>>>     |       __handle_primary_request    |                 |
>>>     |         local_dev_remove (this takes a long time)    |
>>>     |         rte_mp_reply              |                 |
>>>     |                 |---------------->|                 |
>>>     |                 |          local_dev_remove         |
>>>     |                 |          rte_mp_reply             |
>>>     |<----------------------------------|                 |
>>>
>>> The marked local_dev_remove() in the secondary process performs a
>>> pcap file synchronization operation.
>>> When the pcap file is very large, this takes a lot of time (in my
>>> test, a 100G file takes 20+ seconds).
>>> This causes the processing of the hot_plug message to time out.
>>
>> Hi Yiding,
>>
>> Thanks for the information.
>>
>> Right now the timeout for all MP operations is hardcoded in the code,
>> and it is 5 seconds.
>> Do you think it would work to have an API to set a custom timeout,
>> something like `rte_mp_timeout_set()`, and call it from pdump?
>>
>> This gives a generic solution for similar cases, not just for pcap.
>> But my concern is whether this is too much multi-process internal
>> detail to update; @Anatoly may comment on this.
>
> Hi, Ferruh
> For the pdump case alone, I think the timeout is affected by the pcap
> file size and other system components, such as the file system type
> and the system memory size.
> It may be difficult to predict a specific time value to set.

It doesn't have to be specific. The point here is to have a multi-process API to set the timeout, instead of putting a hardcoded timeout in the pcap PMD.
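To illustrate the idea, here is a rough standalone sketch in plain C (not actual EAL code). The names mp_timeout_set(), mp_timeout_get(), mp_timeout_ms() and MP_DEFAULT_TIMEOUT_S are hypothetical, and the 5 second default only mirrors the hardcoded value mentioned earlier in the thread; the real EAL request paths would need their own changes to consume it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* HYPOTHETICAL sketch, not DPDK code: a process-wide MP timeout that
 * defaults to the 5 s value discussed in this thread and can be changed
 * through an rte_mp_timeout_set()-style call. */
#define MP_DEFAULT_TIMEOUT_S 5
#define NSEC_PER_SEC 1000000000L

static struct timespec mp_timeout = { .tv_sec = MP_DEFAULT_TIMEOUT_S, .tv_nsec = 0 };

/* Hypothetical setter: reject NULL, negative, non-normalized or zero
 * timeouts; otherwise store the value for all later synchronous requests. */
int mp_timeout_set(const struct timespec *ts)
{
	if (ts == NULL || ts->tv_sec < 0 ||
	    ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;
	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
		return -EINVAL;
	mp_timeout = *ts;
	return 0;
}

/* Hypothetical getter: internal callers (e.g. the hotplug handshake) would
 * pass this to rte_mp_request_sync(), which already takes a
 * const struct timespec *, instead of a hardcoded constant. */
const struct timespec *mp_timeout_get(void)
{
	return &mp_timeout;
}

/* Illustration helper: the currently configured timeout in milliseconds. */
long mp_timeout_ms(void)
{
	const struct timespec *ts = mp_timeout_get();

	/* The setter only ever stores a normalized tv_nsec. */
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
	return (long)ts->tv_sec * 1000L + ts->tv_nsec / 1000000L;
}
```

With something along these lines, pdump (or any application that expects a slow device close) could raise the limit once, for example to 60 seconds, before triggering hotplug removal, and the pcap PMD would not need to hardcode any delay of its own.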