At 2015-03-20 00:48:01, "Zoltan Kiss" <zoltan.k...@linaro.org> wrote:
>
>On 19/03/15 03:40, openlui wrote:
>> Hi, all:
>>
>> I am trying to use an HVM with a PCI pass-through NIC as a network driver
>> domain. However, when I send packets larger than 128 bytes from DomU using
>> the pkt-gen tool, after several seconds the network between the driver
>> domain and the destination host becomes blocked.
>>
>> The networking structure used for testing is shown below:
>> Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <--> VIF (in Driver Domain)
>> <--> OVS (in Driver Domain) <--> pNIC (passthrough NIC in Driver Domain)
>> <--> Another Host
>>
>> The summarized results are as follows:
>> 1. When we just ping from DomU to another host, the network seems OK.
>> 2. When sending 64- or 128-byte UDP packets from DomU, the network is not
>>    blocked.
>> 3. When sending 256-, 1024- or 1400-byte UDP packets from DomU, and the
>>    scatter-gather feature of the passthrough NIC in the driver domain is
>>    on, the network becomes blocked.
>> 4. When sending 256-, 1024- or 1400-byte UDP packets from DomU, the network
>>    is not blocked only if the scatter-gather feature of the passthrough NIC
>>    in the driver domain is off.
>>
>> As shown in the detailed syslog below, when the network is blocked it seems
>> that the passthrough NIC's driver enters an exception state and the TX
>> queue is hung.
>> As far as I know, when sending 64- or 128-byte packets, the skb generated
>> by netback only has linearized data, and the data is stored in a page
>> allocated from the driver domain's memory. But for packets larger than
>> 128 bytes, the skb also has a frag page which is grant mapped from DomU's
>> memory. And if we disable the scatter-gather feature of the NIC, the skb
>> sent from netback is linearized first, so its data ends up stored in a page
>> allocated from the driver domain rather than from DomU's memory.
>Yes, you are correct: the first slot (at most 128 bytes from it) is
>grant copied to a locally allocated skb, whilst the rest is grant mapped
>from the guest's memory in this case.
>>
>> I am wondering whether this problem is caused by PCI passthrough and DMA
>> operations, or whether there is some wrong configuration in our
>> environment. How can I continue to debug this problem? I am looking forward
>> to your reply and advice. Thanks.
>>
>> The environment we used is as follows:
>> a. Dom0: SUSE 12 (kernel: 3.12.28)
>> b. Xen: 4.4.1_0602.2 (provided by SUSE 12)
>> c. DomU: kernel 3.17.4
>> d. Driver Domain: kernel 3.17.8
>I would try out an upstream kernel, there were some grant mapping
>changes recently, maybe that solves your issue.
>Also, have you set the kernel's loglevel to DEBUG?
>ixgbe also has a module parameter to enable further logging.
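Just to restate the skb layout you confirmed above before I describe what I
found: below is a toy pseudo-C model of that split. The names and layout are
mine and only illustrate the idea; this is not the actual drivers/net/xen-netback
code.

<snip>
/*
 * Toy model of the split described above -- NOT the real xen-netback code.
 * It only illustrates the policy: up to the first 128 bytes of a packet are
 * copied into a buffer owned by the driver domain (the skb linear area),
 * while anything beyond that stays in "guest memory" and would be grant
 * mapped as a frag page. All names here are made up for illustration.
 */
#include <stdio.h>
#include <string.h>

#define LINEAR_COPY_MAX 128   /* bytes "grant copied" into the local skb */

struct toy_skb {
    unsigned char linear[LINEAR_COPY_MAX]; /* driver-domain-owned memory */
    size_t linear_len;
    const unsigned char *mapped_frag;      /* would be grant-mapped guest page */
    size_t frag_len;
};

/* Build a toy skb from a guest packet of 'len' bytes. */
static void build_skb(struct toy_skb *skb, const unsigned char *guest_pkt,
                      size_t len)
{
    skb->linear_len = len < LINEAR_COPY_MAX ? len : LINEAR_COPY_MAX;
    memcpy(skb->linear, guest_pkt, skb->linear_len);       /* "grant copy" */

    /* The remainder is not copied: it stays backed by guest memory. */
    skb->mapped_frag = len > skb->linear_len ? guest_pkt + skb->linear_len : NULL;
    skb->frag_len = len - skb->linear_len;
}

int main(void)
{
    unsigned char pkt[1400] = { 0 };
    struct toy_skb skb;

    build_skb(&skb, pkt, sizeof(pkt));
    printf("linear (local) bytes: %zu, frag (guest-backed) bytes: %zu\n",
           skb.linear_len, skb.frag_len);
    /* For a 64- or 128-byte packet frag_len would be 0, i.e. no guest page
     * is involved in the DMA -- matching the cases that do not hang. */
    return 0;
}
</snip>

So in the cases that hang, the NIC presumably ends up doing DMA from the
guest-backed frag page, which is where the IOMMU mapping question below comes
in.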
Thanks for your advice. I have tried Xen 4.4.2 and Xen 4.5, and found that under
Xen 4.5 the problem is solved. After bisecting, I found it is commit
203746bc36b41443d0eec78819f153fb59bc68d1 ([1]) which solves this problem. After
studying the patch, I see that this commit maps into the IOMMU not only pages
whose p2m type is p2m_ram_rw but also pages of other types, when the hap_ept
page table is not shared. (A rough sketch of how I read the commit's effect is
at the end of this mail.) I have some further questions about this commit:

1. About hap_ept page table sharing: does it mean that the page table used by
EPT and the IOMMU is shared? And does it need hardware support for the "Large
Intel VT-d Pages" feature? ([2])

2. What are the meanings of, and differences among, the p2m types (p2m_ram_**,
p2m_grant_map_**, p2m_ram_logdirty and p2m_map_foreign) listed in the commit
above? Are there any documents about them?

Our hosts under test have no support for "Intel VT-d Shared EPT tables", which
can be confirmed from the "xl dmesg" output in Dom0. I will try to find hosts
which support this feature in the coming days, do the testing, and see whether
a similar problem occurs with hap_ept_pt_share enabled. But it seems that, at
least for the 4.4.2 version, this problem does exist; maybe the commit above
needs to be backported to 4.4.2?

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=203746bc36b41443d0eec78819f153fb59bc68d1
[2] http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/virtualization-enabling-intel-virtualization-technology-features-and-benefits-paper.pdf

>> e. OVS: 2.1.2
>> f. Host: Huawei RH2288, CPU Intel Xeon E5645@2.40GHz, Hyper-Threading
>>    disabled, VT-d enabled
>> g. pNIC: we tried an Intel 82599 10GE NIC (ixgbe v3.23.2), an Intel 82576
>>    1GE NIC (igb) and a Broadcom NetXtreme II BCM5709 1GE NIC (bnx2 v2.2.5)
>> h. para-virtualization driver: netfront/netback
>> i. MTU: 1500
>>
>> The detailed logs in the driver domain after the network is blocked are as
>> follows:
>> 1. When using the 82599 10GE NIC, syslog and dmesg include the info below.
>> The log shows that a Tx Unit Hang is detected and the driver tries to reset
>> the adapter repeatedly; however, the network is still blocked.
>>
>> <snip>
>> ixgbe: 0000:00:04.0 eth10: Detected Tx Unit Hang
>> Tx Queue <0>
>> TDH, TDT <1fd>, <5a>
>> next_to_use <5a>
>> next_to_clean <1fc>
>> ixgbe: 0000:00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
>> ixgbe: 0000:00:04.0 eth10: Reset adapter
>> ixgbe: 0000:00:04.0 eth10: PCIe transaction pending bit also did not clear
>> ixgbe: 0000:00:04.0 master disable timed out
>> ixgbe: 0000:00:04.0 eth10: detected SFP+: 3
>> ixgbe: 0000:00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>> ...
>> </snip>
>>
>> I have tried removing the "reset adapter" call in the ixgbe driver's
>> ndo_tx_timeout function, and the logs are shown below. They show that when
>> the network is blocked, the "TDH" of the NIC cannot be incremented any more.
>>
>> <snip>
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>> Tx Queue <0>
>> TDH, TDT <1fd>, <5a>
>> next_to_use <5a>
>> next_to_clean <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>> time_stamp <1075b74ca>
>> jiffies <1075b791c>
>> ixgbe 0000:00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>> Tx Queue <0>
>> TDH, TDT <1fd>, <5a>
>> next_to_use <5a>
>> next_to_clean <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>> time_stamp <1075b74ca>
>> jiffies <1075b7b11>
>> ...
>> </snip>
>>
>> I have also compared the NIC's corresponding PCI status before and after
>> the network hangs, and found that the "DevSta" field changed from
>> "TransPend-" to "TransPend+" after the network is blocked:
>>
>> <snip>
>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
>> </snip>
>>
>> The network can only be recovered after we reload the ixgbe module in the
>> driver domain.
>>
>> 2. When using the BCM5709 NIC, the result is similar. After the network is
>> blocked, the syslog has the info below:
>>
>> <snip>
>> bnx2 0000:00:04.0 eth14: <--- start FTQ dump --->
>> bnx2 0000:00:04.0 eth14: RV2P_PFTQ_CTL 00010000
>> bnx2 0000:00:04.0 eth14: RV2P_TFTQ_CTL 00020000
>> ...
>> bnx2 0000:00:04.0 eth14: CP_CPQ_FTQ_CTL 00004000
>> bnx2 0000:00:04.0 eth14: CPU states:
>> bnx2 0000:00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 8001280 pc 8001288 instr 8e030000
>> ...
>> bnx2 0000:00:04.0 eth14: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000ca8 pc 8000920 instr 8ca50020
>> bnx2 0000:00:04.0 eth14: <--- end FTQ dump --->
>> bnx2 0000:00:04.0 eth14: <--- start TBDC dump --->
>> ...
>> </snip>
>>
>> The difference in lspci output before and after the network hangs shows
>> that the Status field changed from "MAbort-" to "MAbort+":
>>
>> <snip>
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
>> <MAbort+ >SERR- <PERR- INTx-
>> </snip>
>>
>> The network cannot be recovered even after we reload the bnx2 module in the
>> driver domain.
>>
>> ----------
>> openlui
>> Best Regards
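As mentioned above, here is a rough sketch of how I read the effect of the
commit. It is only my understanding expressed as a self-contained toy program;
the enum values, flag names and the old/new split are illustrative, not the
actual Xen code from 203746bc36b41443d0eec78819f153fb59bc68d1.

<snip>
/*
 * Toy sketch of my reading of the commit -- NOT the actual Xen code.
 * Before the commit (as I understand it), only p2m_ram_rw pages got an IOMMU
 * mapping when the EPT and IOMMU tables are not shared; after it, other types
 * such as grant-mapped and foreign pages are mapped as well, which is what a
 * passthrough NIC needs in order to DMA from a netback skb frag that is grant
 * mapped from DomU.
 */
#include <stdio.h>

enum toy_p2m_type {
    TOY_p2m_ram_rw,
    TOY_p2m_ram_ro,
    TOY_p2m_ram_logdirty,
    TOY_p2m_grant_map_rw,
    TOY_p2m_grant_map_ro,
    TOY_p2m_map_foreign,
    TOY_p2m_mmio_dm,
};

#define TOY_IOMMU_READ  0x1u
#define TOY_IOMMU_WRITE 0x2u

/* Old behaviour (as I understand it): only normal RAM is IOMMU-mapped. */
static unsigned int old_iommu_flags(enum toy_p2m_type t)
{
    return t == TOY_p2m_ram_rw ? (TOY_IOMMU_READ | TOY_IOMMU_WRITE) : 0;
}

/* New behaviour: grant-mapped, logdirty and foreign pages get mappings too. */
static unsigned int new_iommu_flags(enum toy_p2m_type t)
{
    switch (t) {
    case TOY_p2m_ram_rw:
    case TOY_p2m_ram_logdirty:
    case TOY_p2m_grant_map_rw:
    case TOY_p2m_map_foreign:
        return TOY_IOMMU_READ | TOY_IOMMU_WRITE;
    case TOY_p2m_ram_ro:
    case TOY_p2m_grant_map_ro:
        return TOY_IOMMU_READ;
    default:
        return 0;   /* e.g. emulated MMIO: no IOMMU mapping */
    }
}

int main(void)
{
    enum toy_p2m_type t = TOY_p2m_grant_map_rw; /* a netback-mapped DomU page */

    printf("grant_map_rw: old flags=%#x, new flags=%#x\n",
           old_iommu_flags(t), new_iommu_flags(t));
    return 0;
}
</snip>

If that reading is right, it would explain why only packets large enough to
carry a grant-mapped frag page hang the NIC on 4.4.x without this commit.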