Hi Guo,

Some questions.
From: Guo Jia
> As we know, hot plug is an important feature, whether it is used for
> datacenter device fail-safe and consumption management, or for dynamic
> deployment and SR-IOV live migration in SDN/NFV. It brings higher
> flexibility and continuity to networking services in multiple industry
> use cases.
>
> DPDK is an important networking framework that combines packet control
> path and fast path libraries with a diverse set of PMD drivers. So what
> can it do to help an application that wants to implement its own hot plug
> solution while processing packets with DPDK?
>
> We already have a general device event mechanism, the failsafe driver,
> the bonding driver and the hot plug/unplug API in the framework, and
> applications can use these APIs to build the functionality. But hot plug
> failure handling is missing: removing a device at run time makes the
> application trigger an MMIO error and crash, and there is no mechanism
> to handle that failure when a device is hot unplugged. At present, the
> kernel only guarantees that hot plug is handled safely on the kernel
> side; on the user-mode side, no third-party tools such as udev/driverctl
> specifically cover this part of the mechanism. Considering the
> feasibility of the implementation, runtime performance, and generality
> across almost all user-mode PMD drivers, a general hot plug failure
> handling mechanism in the DPDK framework is proposed here.
>
> The hot plug failure handling mechanism works as follows:
> 1. Add a new bus op "handle_hot_unplug" to handle bus read/write errors.
>    It is bus-specific, so each kind of bus can implement its own logic.
> 2. Implement the PCI bus-specific op "pci_handle_hot_unplug". Based on
>    the failure address, it remaps the memory that belongs to the
>    unplugged device.
> 3. Implement a new SIGBUS handler and register it when device event
>    monitoring starts. Once an MMIO SIGBUS error occurs, it triggers the
>    hot plug failure handling mechanism above, so that the application
>    processing packets is not broken and does not crash, and can go on
>    with cleanup, fail-safe or other work.

Can you explain in more detail what happens with all the threads: the
master thread, the host thread and the data-path threads? Can the signal
occur only in a data-path thread, or also in a control thread?

What about resource leaks (mainly relevant for control threads)? If you
jump from the signal address to the restart address, how can you clean up
the process that was started and got the signal?

Matan.

> 4. We will also provide an example using testpmd to show the whole
>    procedure, like this:
>    device unplug -> failure handling -> stop forwarding -> stop port ->
>    close port -> detach port.
>
> Best regards,
>
> Jeff Guo
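Steps 1-3 of the proposal describe a remap-on-SIGBUS technique. Below is a minimal, stand-alone Linux sketch of that idea; it is not DPDK code, and the names `demo_bus`, `demo_pci_handle_hot_unplug` and `simulate_hot_unplug_read` are hypothetical. A PCI BAR is simulated by a shared mapping of a temporary file, which is truncated to zero length to mimic the device disappearing. The handler replaces the faulting region with anonymous zero pages via `mmap(MAP_FIXED)` and returns, so the interrupted load is simply retried. It does not jump to a restart address, and it does not address the per-thread and resource-cleanup questions raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bus descriptor with a per-bus hot-unplug hook (step 1). */
struct demo_bus {
    const char *name;
    int (*handle_hot_unplug)(void *fault_addr);
};

/* The single simulated "device" region (stands in for a mapped PCI BAR). */
static void *dev_base;
static size_t dev_len;

/* Hypothetical PCI-style handler (step 2): if the fault lies inside the
 * device region, replace the whole region with anonymous zero pages so
 * later accesses no longer fault. Note: mmap() is not on the POSIX
 * async-signal-safe list; this relies on it being a plain syscall on Linux. */
static int demo_pci_handle_hot_unplug(void *fault_addr)
{
    uintptr_t addr = (uintptr_t)fault_addr;
    uintptr_t base = (uintptr_t)dev_base;

    if (addr < base || addr >= base + dev_len)
        return -1; /* fault does not belong to this device */
    void *p = mmap(dev_base, dev_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

static struct demo_bus demo_pci_bus = { "pci", demo_pci_handle_hot_unplug };

/* SIGBUS handler (step 3): delegate to the bus hook and, on success, return
 * so the faulting instruction is re-executed against the new mapping. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    if (demo_pci_bus.handle_hot_unplug(info->si_addr) == 0)
        return;
    /* Unknown fault: restore the default action; the retried access then
     * terminates the process as it would have without the handler. */
    signal(sig, SIG_DFL);
}

/* Maps a one-page "device", unplugs it, then reads from it.
 * Returns the value seen by the read after recovery. */
uint32_t simulate_hot_unplug_read(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL || ftruncate(fileno(f), (off_t)page) != 0) {
        perror("setup");
        exit(EXIT_FAILURE);
    }
    dev_len = page;
    dev_base = mmap(NULL, dev_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fileno(f), 0);
    if (dev_base == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    *(volatile uint32_t *)dev_base = 0xabcdu; /* access while "present" */
    if (ftruncate(fileno(f), 0) != 0) {       /* simulate surprise removal */
        perror("ftruncate");
        exit(EXIT_FAILURE);
    }
    uint32_t v = *(volatile uint32_t *)dev_base; /* SIGBUS, remap, retry */

    munmap(dev_base, dev_len);
    fclose(f);
    return v;
}
```

On Linux, calling `simulate_hot_unplug_read()` should return 0: the value written before the "unplug" is gone, and the read is served from the zero-filled replacement page instead of killing the process. A real implementation would have to find the owning device by fault address across all buses rather than checking one hard-coded region.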