inline...
-- 
Damjan

> On 31 May 2018, at 21:10, Saxena, Nitin <nitin.sax...@cavium.com> wrote:
> 
> Hi Damjan,
> 
> Answers inline.
> 
> Thanks,
> Nitin
> 
>> On 01-Jun-2018, at 12:15 AM, Damjan Marion <dmarion.li...@gmail.com> wrote:
>> 
>> 
>> Dear Nitin,
>> 
>> See inline….
>> 
>> 
>>> On 31 May 2018, at 19:59, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I am working on optimising the dpdk-input node (based on VPP v1804) for our 
>>> target. I am able to get performance improvements on our target, but the 
>>> problems I am now facing are:
>>> 
>>> 1) The dpdk-input code on the master branch has changed completely since v1804.
>> 
>> Why is this a problem? It was done for a reason and for tangible benefit.
> This is a problem for me as I cannot apply my v1804 changes directly to the 
> master branch. I have to rework them on the master branch, and that’s why I 
> am not able to move to the master branch or v1807 in future. 

It was hard to know that you have a subset of patches hidden somewhere. Typically 
it makes sense to discuss such changes with the person who "maintains" the 
code before starting to write the code.

>> 
>>> Not to mention the dpdk-input master branch code does not give better 
>>> numbers on our target compared to v1804.
>> 
>> Sad to hear that; the good thing is that it gives better numbers on x86.
> As I understand it, one dpdk_device_input function cannot be the same for all 
> architectures, because if the underlying micro-architecture is different, the 
> hot spots change.

Maybe, but it sounds to me like we are still in the guessing phase.
Maybe we even need a different function for each ARM CPU core, as they may have 
different memory subsystems and pipelines....

Is there an agreement between ARM vendors on which core you want to have the 
code tuned for, or are you simply tuning for whatever core Cavium uses?


> I have seen the dpdk-input master branch changes and, on a positive note, 
> those changes make sense; however, some code is tuned for x86, especially 
> Skylake. I was looking for some way to have a multiarch select function for 
> the Rx path, like the way it’s done for the Tx path.

Not sure why you need that, unless you are going to have code optimised for 
different CPU variants (e.g. Cortex-A53 and Cortex-A72) in the same binary.
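
For illustration only, here is a minimal sketch of what a per-core select for 
an Rx handler could look like in plain C. All names (rx_fn_t, rx_select, the 
variant table) and the toy handlers are hypothetical; this is not the VPP 
multiarch mechanism nor the real dpdk-input prototype. It only shows that a 
selector pays off when more than one tuned variant lives in the same binary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Rx handler signature: sums packet lengths of a burst. */
typedef unsigned (*rx_fn_t) (const unsigned *pkt_len, unsigned n_pkts);

/* Generic fallback, used when no tuned variant matches. */
static unsigned
rx_generic (const unsigned *pkt_len, unsigned n_pkts)
{
  unsigned total = 0;
  for (unsigned i = 0; i < n_pkts; i++)
    total += pkt_len[i];
  return total;
}

/* Stand-in for a core-specific variant; here it is simply unrolled by 4. */
static unsigned
rx_unrolled4 (const unsigned *pkt_len, unsigned n_pkts)
{
  unsigned total = 0, i = 0;
  for (; i + 4 <= n_pkts; i += 4)
    total += pkt_len[i] + pkt_len[i + 1] + pkt_len[i + 2] + pkt_len[i + 3];
  for (; i < n_pkts; i++)	/* leftover packets */
    total += pkt_len[i];
  return total;
}

typedef struct
{
  const char *cpu_name;		/* assumed to come from runtime CPU detection */
  rx_fn_t fn;
} rx_variant_t;

/* Table of tuned variants built into this binary (assumption: one entry). */
static const rx_variant_t rx_variants[] = {
  {"cortex-a72", rx_unrolled4},
};

/* Return the variant registered for cpu_name, else the generic function. */
rx_fn_t
rx_select (const char *cpu_name)
{
  assert (cpu_name != NULL);
  for (size_t i = 0; i < sizeof (rx_variants) / sizeof (rx_variants[0]); i++)
    if (strcmp (rx_variants[i].cpu_name, cpu_name) == 0)
      return rx_variants[i].fn;
  return rx_generic;
}
```

The selection would run once at init time and the returned pointer would be 
cached, so the per-packet path pays no lookup cost.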

>> 
>>> 2) I don’t know what modular approach I should follow to merge my changes, 
>>> as I have completely changed the quad loop handling and the prefetch order 
>>> in dpdk-input.
>> 
>> I carefully tuned that code. It was a multi-day exercise, and losing a 
>> single clock/packet on x86 to additional modifications is not acceptable. 
>> Still, I’m open to discussing how to address this problem.
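
To make the terms concrete, a generic sketch of a quad loop with prefetch 
follows. It is not the dpdk-input code: pkt_t and sum_lengths_quad are 
hypothetical, and __builtin_prefetch assumes GCC or Clang. The shape is: 
handle four packets per iteration, prefetch the next four while doing so, 
then finish the leftovers one at a time. Where the prefetch sits and how far 
ahead it reaches is exactly the kind of thing that gets tuned per core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor, standing in for a real buffer header. */
typedef struct
{
  uint32_t len;
  uint32_t flags;
} pkt_t;

/* Sum packet lengths, four at a time, prefetching the following quad. */
uint64_t
sum_lengths_quad (pkt_t ** pkts, uint32_t n)
{
  uint64_t total = 0;
  uint32_t i = 0;

  assert (pkts != NULL || n == 0);

  while (n - i >= 4)
    {
      /* Prefetch the next quad only if it exists, to stay in bounds. */
      if (n - i >= 8)
	{
	  __builtin_prefetch (pkts[i + 4]);
	  __builtin_prefetch (pkts[i + 5]);
	  __builtin_prefetch (pkts[i + 6]);
	  __builtin_prefetch (pkts[i + 7]);
	}

      total += pkts[i]->len;
      total += pkts[i + 1]->len;
      total += pkts[i + 2]->len;
      total += pkts[i + 3]->len;
      i += 4;
    }

  /* Single loop for the remaining 0..3 packets. */
  while (i < n)
    total += pkts[i++]->len;

  return total;
}
```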
>> 
>>> 
>>> Note: I am far from upstreaming the code at present, as my optimisation 
>>> is still in progress. It would be better if I knew the proper way of doing 
>>> it.
>> 
>> I suggest that you don’t even start working on upstreaming before we have a 
>> deep understanding of what needs to be done and why, and we are all in 
>> agreement.
>> 
>>> 
>>> Thanks,
>>> Nitin
>>> 
