At 2015-03-20 00:48:01, "Zoltan Kiss" <zoltan.k...@linaro.org> wrote:
>
>
>On 19/03/15 03:40, openlui wrote:
>> Hi, all:
>>
>> I am trying to use an HVM guest with a PCI pass-through NIC as a network 
>> driver domain. However, when I send packets larger than 128 bytes from 
>> DomU using the pkt-gen tool, after several seconds the network between 
>> the driver domain and the destination host becomes blocked.
>>
>> The networking structure when testing is shown below:
>> Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <---> VIF (in Driver Domain) 
>> <--> OVS (in Driver Domain) <--> pNIC (passthrough nic in Driver Domain) 
>> <---> Another Host
>>      
>> The summarized results are as follows (a sketch of the pkt-gen 
>> invocation we used follows the list):
>> 1. When we just ping from DomU to another host, the network seems fine.
>> 2. When sending 64- or 128-byte UDP packets from DomU, the network does 
>> not get blocked.
>> 3. When sending 256-, 1024- or 1400-byte UDP packets from DomU with the 
>> scatter-gather feature of the passthrough NIC in the driver domain 
>> enabled, the network gets blocked.
>> 4. When sending 256-, 1024- or 1400-byte UDP packets from DomU, the 
>> network stays unblocked only if the scatter-gather feature of the 
>> passthrough NIC in the driver domain is off.
>>
>> As shown in the detailed syslog below, when the network is blocked, it 
>> seems that the passthrough NIC's driver enters an error state and the 
>> TX queue hangs.
>> As far as I know, when sending 64- or 128-byte packets, the skb generated 
>> by netback only has linearized data, stored in a page allocated from the 
>> driver domain's memory. But for packets larger than 128 bytes, the skb 
>> will also have a frag page which is grant-mapped from DomU's memory. And 
>> if we disable the scatter-gather feature of the NIC, the skb sent by 
>> netback will be linearized first, which ensures that the skb's data ends 
>> up in pages allocated from the driver domain rather than from DomU's 
>> memory.
>Yes, you are correct: the first slot (at most 128 bytes of it) is 
>grant-copied to a locally allocated skb, whilst the rest is grant-mapped 
>from the guest's memory in this case.
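>
>To illustrate, the per-packet split is roughly the following sketch 
>(plain C for illustration, not the verbatim netback code; the kernel 
>uses its own constant for the 128-byte threshold):
>
>#include <stddef.h>
>
>#define TX_COPY_LEN 128 /* illustrative threshold */
>
>struct tx_split {
>    size_t copied; /* grant-copied into a driver-domain-local skb */
>    size_t mapped; /* left as frags grant-mapped from DomU pages */
>};
>
>static struct tx_split split_packet(size_t pkt_len)
>{
>    struct tx_split s;
>
>    s.copied = pkt_len < TX_COPY_LEN ? pkt_len : TX_COPY_LEN;
>    s.mapped = pkt_len - s.copied;
>    return s;
>}
>
>So for a 64- or 128-byte packet s.mapped is 0 and the NIC only ever DMAs 
>driver-domain memory, while a 256-byte packet leaves 128 bytes in 
>grant-mapped DomU pages that the NIC has to DMA directly.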
>>
>> I am wondering whether this problem is caused by PCI passthrough and 
>> DMA operations, or whether there is some misconfiguration in our 
>> environment. How can I continue to debug this problem? I am looking 
>> forward to your reply and advice. Thanks.
>>
>> The environment we used are as follows:
>> a. Dom0: SUSE 12 (kernel: 3.12.28)
>> b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
>> c. DomU: kernel 3.17.4
>> d. Driver Domain: kernel 3.17.8
>I would try an upstream kernel; there were some grant-mapping changes 
>recently which might solve your issue.
>Also, have you set the kernel's loglevel to DEBUG?
>ixgbe also has a module parameter to enable further logging.

Thanks for your advice. 
I have tried Xen 4.4.2 and Xen 4.5, and found that under Xen 4.5 the 
problem is solved. After bisecting, I found that commit 
203746bc36b41443d0eec78819f153fb59bc68d1 ([1]) is the one that fixes it. 
Studying the patch, I see that when the HAP (EPT) page table is not shared 
with the IOMMU, this commit maps into the IOMMU not only pages whose p2m 
type is p2m_ram_rw but also pages of several other p2m types.
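
From my reading of the patch, its core is a helper that derives the IOMMU 
permissions from the p2m type, roughly as reconstructed below (a sketch 
based on the commit, so names and details may differ from the exact 
source):

static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
{
    unsigned int flags;

    switch ( p2mt )
    {
    case p2m_ram_rw:
    case p2m_grant_map_rw:
    case p2m_ram_logdirty:
    case p2m_map_foreign:
        flags = IOMMUF_readable | IOMMUF_writable;
        break;
    case p2m_ram_ro:
    case p2m_grant_map_ro:
        flags = IOMMUF_readable;
        break;
    default:
        flags = 0;
        break;
    }

    return flags;
}

If that reading is right, it would explain our symptom: before this fix, 
frag pages grant-mapped from DomU (p2m_grant_map_rw in the driver domain) 
were never entered into the IOMMU page tables, so the passthrough NIC's 
DMA to those buffers faulted and the TX queue hung. Packets of at most 
128 bytes, or linearized skbs, live entirely in p2m_ram_rw pages and are 
therefore unaffected.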


I have some further questions about this commit:
1. Regarding hap_ept page table sharing: does it mean that the page table 
used by EPT and by the IOMMU is shared? And does it require hardware 
support for the "Large Intel VT-d Pages" feature? ([2])
2. What is the meaning of, and the difference among, the p2m types 
(p2m_ram_**, p2m_grant_map_**, p2m_ram_logdirty and p2m_map_foreign) 
listed in the commit above? Is there any documentation about them?


Our test hosts do not support "Intel VT-d Shared EPT tables", which can be 
confirmed from the "xl dmesg" output in Dom0 (the check is sketched 
below). I will try to find hosts that support this feature in the coming 
days, rerun the tests, and see whether a similar problem occurs with 
hap_ept_pt_share enabled.
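
For reference, the check in Dom0 was simply the following (the exact 
wording of the message may vary between Xen versions):

xl dmesg | grep -i "shared ept"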


But it seems that, at least for the 4.4.2 version, this problem does 
exist; perhaps the commit above should be backported to 4.4.2?


[1] 
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=203746bc36b41443d0eec78819f153fb59bc68d1
[2] 
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/virtualization-enabling-intel-virtualization-technology-features-and-benefits-paper.pdf
>> e. OVS: 2.1.2
>> f. Host: Huawei RH2288, CPU Intel Xeon E5645@2.40GHz, Hyper-Threading 
>> disabled, VT-d enabled
>> g. pNIC: we tried Intel 82599 10GE NIC (ixgbe v3.23.2), Intel 82576 1GE NIC 
>> (igb) and Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
>> h. para-virtualization driver: netfront/netback
>> i. MTU: 1500
>>
>> The detailed logs in the driver domain after the network is blocked are 
>> as follows:
>> 1. When using the 82599 10GE NIC, syslog and dmesg include the messages 
>> below. They show that a Tx Unit Hang is detected and the driver tries 
>> to reset the adapter repeatedly; however, the network stays blocked.
>>
>> <snip>
>> ixgbe: 0000:00:04.0 eth10: Detected Tx Unit Hang
>> Tx Queue             <0>
>> TDH, TDT             <1fd>, <5a>
>> next_to_use          <5a>
>> next_to_clean        <1fc>
>> ixgbe: 0000:00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
>> ixgbe: 0000:00:04.0 eth10: Reset adapter
>> ixgbe: 0000:00:04.0 eth10: PCIe transaction pending bit also did not clear
>> ixgbe: 0000:00:04.0 master disable timed out
>> ixgbe: 0000:00:04.0 eth10: detected SFP+: 3
>> ixgbe: 0000:00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>> ...
>> </snip>
>>
>> I have tried removing the "reset adapter" call in the ixgbe driver's 
>> ndo_tx_timeout handler; the logs are shown below, followed by a sketch 
>> of the change. They show that while the network is blocked, the NIC's 
>> TDH register is no longer incremented.
>>
>> <snip>
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>> Tx Queue             <0>
>> TDH, TDT             <1fd>, <5a>
>> next_to_use          <5a>
>> next_to_clean        <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>> time_stamp           <1075b74ca>
>> jiffies              <1075b791c>
>> ixgbe 0000:00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>> Tx Queue             <0>
>> TDH, TDT             <1fd>, <5a>
>> next_to_use          <5a>
>> next_to_clean        <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>> time_stamp           <1075b74ca>
>> jiffies              <1075b7b11>
>> ...
>> </snip>
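>>
>> (For reference, the change for this experiment was simply to stub out 
>> the reset in the driver's timeout handler, roughly as sketched below 
>> against the ixgbe version we used, so names may differ:)
>>
>> static void ixgbe_tx_timeout(struct net_device *netdev)
>> {
>>     struct ixgbe_adapter *adapter = netdev_priv(netdev);
>>
>>     /* keep the hang observable instead of scheduling a reset */
>>     /* ixgbe_tx_timeout_reset(adapter); */
>>     (void)adapter;
>> }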
>>
>> I have also compared the NIC's PCI status before and after the network 
>> hang, and found that the "DevSta" field changed from "TransPend-" to 
>> "TransPend+" after the network is blocked:
>>
>> <snip>
>> DevSta:      CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
>> </snip>
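>>
>> (The PCI status above was captured with a plain "lspci -vvv -s 00:04.0" 
>> in the driver domain, before and after the hang.)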
>>
>> The network can only be recovered after we reload the ixgbe module in 
>> the driver domain.
>>
>> 2. When using the BCM5709 NIC, the result is similar. After the network 
>> is blocked, the syslog contains the messages below:
>>
>> <snip>
>> bnx2 0000:00:04.0 eth14: <--- start FTQ dump --->
>> bnx2 0000:00:04.0 eth14: RV2P_PFTQ_CTL 00010000
>> bnx2 0000:00:04.0 eth14: RV2P_TFTQ_CTL 00020000
>> ...
>> bnx2 0000:00:04.0 eth14: CP_CPQ_FTQ_CTL 00004000
>> bnx2 0000:00:04.0 eth14: CPU states:
>> bnx2 0000:00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 
>> 8001280 pc 8001288 instr 8e030000
>> ...
>> bnx2 0000:00:04.0 eth14: 185000 mode b8cc state 80000000 evt_mask 500 pc 
>> 8000ca8 pc 8000920 instr 8ca50020
>> bnx2 0000:00:04.0 eth14: <--- end FTQ dump --->
>> bnx2 0000:00:04.0 eth14: <--- start TBDC dump --->
>> ...
>> </snip>
>>
>> The difference in the lspci output before and after the network hang 
>> shows that the Status field changed from "MAbort-" to "MAbort+":
>>
>> <snip>
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
>> <MAbort+ >SERR- <PERR- INTx-
>> </snip>
>>
>> The network cannot be recovered even after we reload the bnx2 module in 
>> the driver domain.
>>
>> ----------
>> openlui
>> Best Regards
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
