On Mon, Jun 05, 2017 at 11:20:13AM +0800, Peter Xu wrote:
> On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote:
> > > With the patch applied:
> > >
> > >   [PATCH v3] exec: fix address_space_get_iotlb_entry page mask
> > >   (already in Paolo's pull request but not yet merged)
> > >
> > > Now we can have valid address masks. However it is still not ideal,
> > > considering that the mask may not be aligned to guest page sizes. One
> > > example would be when huge page is used in guest (please see commit
> > > message in patch 1 for details). It applies to normal pages too. So we
> > > not only need a valid address mask, we should make sure it is page
> > > mask (for x86, it should be either 4K/2M/1G pages).
> >
> > Why should we? To get better performance, right?
>
> IMHO one point is for performance, the other point is on how we should
> define the IOTLB interface. My opinion is that it is better valid
> masks.
>
> > >
> > > Patch 1+2 fixes the problem. Tested with both kernel net driver or
> > > testpmd, on either 4K/2M pages, to make sure the page mask is correct.
> > >
> > > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll
> > > definitely want patch 3 now. Here's the simplest TCP streaming test
> > > using vhost dmar and iommu=pt in guest:
> > >
> > >   without patch 3: 12.0Gbps
> >
> > And what happens without patches 1-2?
>
> Without 1-2, performance is good. But I think it is hacky to have such
> a good result (I explained why the performance is good in the VT-d PT
> support thread with some logs)...
>
> > >
> > >   with patch 3:    33.5Gbps
> >
> > This is the part I don't get. Patches 1-2 will return a bigger region to
> > callers. The result should be better performance - instead it seems to
> > slow down vhost for some reason and we need tricks to get
> > performance back. What's going on?
>
> Yes. The problem is that if without patch 1/2 I think the codes lacks
> correctness. With correctness, we lost performance, then I picked
> patch 3 as well.
>
> Again, I think the first thing we need to settle is what should be the
> best definition for IOTLB (addr_mask or arbitrary length).
>
> Thanks,

If arbitrary length means we don't require prefaulting hacks,
I'm for using arbitrary length.

> --
> Peter Xu
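
To make the addr_mask vs. arbitrary length distinction concrete, here is a
minimal sketch in C. It is not QEMU code: x86_page_mask() is a hypothetical
helper, and the region bounds are made-up values. Given a translated region
[start, start + len) and an address inside it, it picks the largest x86 page
size (4K/2M/1G) whose naturally aligned page around the address still fits in
the region, and returns that page's mask.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not a QEMU function).
 *
 * Returns the mask (page size - 1) of the largest x86 page size whose
 * naturally aligned page containing 'addr' lies entirely inside
 * [start, start + len). Returns 0 if not even a 4K page fits.
 *
 * Assumes len > 0, start <= addr < start + len, and that
 * start + len - 1 does not overflow.
 */
static uint64_t x86_page_mask(uint64_t addr, uint64_t start, uint64_t len)
{
    static const unsigned shifts[] = { 12, 21, 30 }; /* 4K, 2M, 1G */
    uint64_t end = start + len - 1;                  /* last valid byte */
    uint64_t mask = 0;

    for (size_t i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
        uint64_t m = (1ULL << shifts[i]) - 1;
        uint64_t base = addr & ~m;                   /* aligned page start */

        /* A larger page contains the smaller one, so stop at first miss. */
        if (base < start || m > end - base) {
            break;
        }
        mask = m;
    }
    return mask;
}
```

With a 2M-aligned 2M region the helper returns a 2M mask, but a 0x5000-byte
region starting at 0x1000 only yields a 4K mask: an arbitrary length like that
cannot be described by a single page mask, which is the interface question
raised above.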