Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA

Ganapatrao Kulkarni Thu, 12 Jul 2018 00:46:40 -0700

Hi Robin,


On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni <gklkm...@gmail.com> wrote:
> ping??
>
> On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni <gklkm...@gmail.com> 
> wrote:
>> On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni <gklkm...@gmail.com> 
>> wrote:
>>> Hi Robin,
>>>
>>> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni
>>> <gklkm...@gmail.com> wrote:
>>>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy <robin.mur...@arm.com> 
>>>> wrote:
>>>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>> The performance drop is observed with long hours iperf testing using 40G
>>>>>> cards. This is mainly due to long iterations in finding the free iova
>>>>>> range in 32bit address space.
>>>>>>
>>>>>> In current implementation for 64bit PCI devices, there is always first
>>>>>> attempt to allocate iova from 32bit(SAC preferred over DAC) address
>>>>>> range. Once we run out 32bit range, there is allocation from higher 
>>>>>> range,
>>>>>> however due to cached32_node optimization it does not suppose to be
>>>>>> painful. cached32_node always points to recently allocated 32-bit node.
>>>>>> When address range is full, it will be pointing to last allocated node
>>>>>> (leaf node), so walking rbtree to find the available range is not
>>>>>> expensive affair. However this optimization does not behave well when
>>>>>> one of the middle node is freed. In that case cached32_node is updated
>>>>>> to point to next iova range. The next iova allocation will consume free
>>>>>> range and again update cached32_node to itself. From now on, walking
>>>>>> over 32-bit range is more expensive.
>>>>>>
>>>>>> This patch adds fix to update cached node to leaf node when there are no
>>>>>> iova free range left, which avoids unnecessary long iterations.
>>>>>
>>>>>
>>>>> The only trouble with this is that "allocation failed" doesn't uniquely 
>>>>> mean
>>>>> "space full". Say that after some time the 32-bit space ends up empty 
>>>>> except
>>>>> for one page at 0x1000 and one at 0x80000000, then somebody tries to
>>>>> allocate 2GB. If we move the cached node down to the leftmost entry when
>>>>> that fails, all subsequent allocation attempts are now going to fail 
>>>>> despite
>>>>> the space being 99.9999% free!
>>>>>
>>>>> I can see a couple of ways to solve that general problem of free space 
>>>>> above
>>>>> the cached node getting lost, but neither of them helps with the case 
>>>>> where
>>>>> there is genuinely insufficient space (and if anything would make it even
>>>>> slower). In terms of the optimisation you want here, i.e. fail fast when 
>>>>> an
>>>>> allocation cannot possibly succeed, the only reliable idea which comes to
>>>>> mind is free-PFN accounting. I might give that a go myself to see how ugly
>>>>> it looks.

did you get any chance to look in to this issue?
i am waiting for your suggestion/patch for this issue!

>>>
>>> For this testing, dual port intel 40G card(XL710) used and both ports
>>> were connected in loop-back. Ran iperf server and clients on both
>>> ports(used NAT to route packets out on intended ports).There were 10
>>> iperf clients invoked every 60 seconds in loop for hours for each
>>> port. Initially the performance on both ports is seen close to line
>>> rate, however after test ran about 4 to 6 hours, the performance
>>> started dropping  to very low (to few hundred Mbps) on both
>>> connections.
>>>
>>> IMO,  this is common bug and should happen on any other platforms too
>>> and needs to be fixed at the earliest.
>>> Please let me know if you have better way to fix this,  i am happy to
>>> test your patch!
>>
>> any update on this issue?
>>>
>>>>
>>>> i see 2 problems in current implementation,
>>>> 1. We don't replenish the 32 bits range, until first attempt of second
>>>> allocation(64 bit) fails.
>>>> 2. Having  per cpu cache might not yield good hit on platforms with
>>>> more number of CPUs.
>>>>
>>>> however irrespective of current issues, It makes sense to update
>>>> cached node as done in this patch , when there is failure to get iova
>>>> range using current cached pointer which is forcing for the
>>>> unnecessary time consuming do-while iterations until any replenish
>>>> happens!
>>>>
>>>> thanks
>>>> Ganapat
>>>>
>>>>>
>>>>> Robin.
>>>>>
>>>>>
>>>>>> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com>
>>>>>> ---
>>>>>>   drivers/iommu/iova.c | 6 ++++++
>>>>>>   1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>>>>> index 83fe262..e6ee2ea 100644
>>>>>> --- a/drivers/iommu/iova.c
>>>>>> +++ b/drivers/iommu/iova.c
>>>>>> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct
>>>>>> iova_domain *iovad,
>>>>>>         } while (curr && new_pfn <= curr_iova->pfn_hi);
>>>>>>         if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>>>>> +               /* No more cached node points to free hole, update to 
>>>>>> leaf
>>>>>> node.
>>>>>> +                */
>>>>>> +               struct iova *prev_iova;
>>>>>> +
>>>>>> +               prev_iova = rb_entry(prev, struct iova, node);
>>>>>> +               __cached_rbnode_insert_update(iovad, prev_iova);
>>>>>>                 spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>>>>>>                 return -ENOMEM;
>>>>>>         }
>>>>>>
>>>>>
>>>
>>> thanks
>>> Ganapat

thanks
Ganapat
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA

Reply via email to