Dear Nitin,

That doesn't work that way.
Regards,

Damjan

> On 1 Jun 2018, at 19:41, Saxena, Nitin <nitin.sax...@cavium.com> wrote:
>
> Hi Damjan,
>
> Now that you are aware that Cavium is working on optimisations for ARM, can
> I request that you check with us on implications for ARM (at least Cavium)
> before bringing changes in dpdk-input?
>
> Regards,
> Nitin
>
> On 01-Jun-2018, at 21:39, Damjan Marion <dmar...@me.com> wrote:
>
>>
>> Dear Nitin,
>>
>> I really don't have anything else to add. It's your call how you want to
>> proceed....
>>
>> Regards,
>>
>> Damjan
>>
>>> On 1 Jun 2018, at 18:02, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>>
>>> Hi Damjan,
>>>
>>> Answers inline.
>>>
>>> Thanks,
>>> Nitin
>>>
>>> On Friday 01 June 2018 08:49 PM, Damjan Marion wrote:
>>>> Hi Nitin,
>>>> inline...
>>>>> On 1 Jun 2018, at 15:23, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>>>>
>>>>> Hi Damjan,
>>>>>
>>>>>> It was hard to know that you have subset of patches hidden somewhere.
>>>>> I wouldn't say patches are hidden. We are trying to fine-tune dpdk-input
>>>>> from our end first, and later we will seek your expertise while
>>>>> upstreaming.
>>>> For me they were hidden.
>>>>>> Typically it makes sense to discuss such kind of changes with person
>>>>>> who "maintains" the code before starting writing the code.
>>>>> Agreed. However, we prefer to do internal analysis/POC first before
>>>>> reaching out to MAINTAINERS. That way we can better understand code
>>>>> review comments.
>>>> Perfectly fine, but then don't put blame on us for not knowing that you
>>>> are doing something internally...
>>> The intention was not to blame anybody but to understand the modular
>>> approach in vpp to accommodate multi-arch(s).
>>>>>
>>>>>> Maybe, but sounds to me like we are still in guessing phase.
>>>>> I wouldn't do any guesswork with MAINTAINERS.
>>>>>
>>>>>> Maybe we even need different function for each ARM CPU core as they
>>>>>> may have different memory subsystem and pipeline....
>>>>> This is what I am looking for. Is it ok to detect our hardware natively
>>>>> from autoconf and append a target-specific macro to CFLAGS? And then
>>>>> have a separate function for our target in dpdk/device/node.c? Sorry, my
>>>>> multi-arch select example was incorrect and that's not what I am looking
>>>>> at.
>>>> Here I will be able to help when I get a reasonable understanding of what
>>>> the "big" plan is.
>>> The "big" plan is to optimize each vpp node for Aarch64. For now the focus
>>> is dpdk-input.
>>>> I don't want us to end up in 6 months with cavium patches, nxp patches,
>>>> marvell patches, and so on.
>>> Is it a problem? If yes, then I am not able to visualize it, as the same
>>> problem would exist for any architecture and not just for Aarch64.
>>>>>
>>>>>> Is there an agreement between ARM vendors what is the targeted core
>>>>>> you want to have code tuned for or you are simply tuning to whatever
>>>>>> core Cavium uses?
>>>>> I am trying to optimize for Cavium's SOC. This question is in this regard
>>>>> only. However, efforts are also under way in the ARM community to
>>>>> optimize for Cortex cores.
>>>> What about agreeing on a plan for optimising on all ARM cores, and then
>>>> starting the optimisation?
>>> This is a cross-company question, so hard to answer, but Cavium has the
>>> "big" plan described above.
>>>>>
>>>>> Thanks,
>>>>> Nitin
>>>>>
>>>>> On Friday 01 June 2018 01:55 AM, Damjan Marion wrote:
>>>>>> inline...
>>>>>> --
>>>>>> Damjan
>>>>>>> On 31 May 2018, at 21:10, Saxena, Nitin <nitin.sax...@cavium.com> wrote:
>>>>>>>
>>>>>>> Hi Damjan,
>>>>>>>
>>>>>>> Answers inline.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nitin
>>>>>>>
>>>>>>>> On 01-Jun-2018, at 12:15 AM, Damjan Marion <dmarion.li...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Nitin,
>>>>>>>>
>>>>>>>> See inline….
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 31 May 2018, at 19:59, Nitin Saxena <nitin.sax...@cavium.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am working on optimising the dpdk-input node (based on vpp v1804) for
>>>>>>>>> our target. I am able to get performance improvements on our target,
>>>>>>>>> but the problems I am finding now are:
>>>>>>>>>
>>>>>>>>> 1) The dpdk-input code is completely changed on master branch from
>>>>>>>>> v1804.
>>>>>>>>
>>>>>>>> Why is this a problem? It was done with reason and for tangible
>>>>>>>> benefit.
>>>>>>> This is a problem for me as I cannot apply my v1804 changes directly
>>>>>>> to the master branch. I have to rework them on the master branch, and
>>>>>>> that’s why I am not able to move to master branch or v1807 in future.
>>>>>> It was hard to know that you have subset of patches hidden somewhere.
>>>>>> Typically it makes sense to discuss such kind of changes with person who
>>>>>> "maintains" the code before starting writing the code.
>>>>>>>>
>>>>>>>>> Not to mention the dpdk-input master branch code does not give better
>>>>>>>>> numbers on our target as compared to v1804
>>>>>>>>
>>>>>>>> Sad to hear that, good thing is, it gives better numbers on x86.
>>>>>>> As I understand it, one dpdk_device_input function cannot be the same
>>>>>>> for all architectures, because if the underlying micro-architecture is
>>>>>>> different, the hot spots change.
>>>>>> Maybe, but sounds to me like we are still in guessing phase.
>>>>>> Maybe we even need different function for each ARM CPU core as they
>>>>>> may have different memory subsystem and pipeline....
>>>>>> Is there an agreement between ARM vendors what is the targeted core you
>>>>>> want to have code tuned for or you are simply tuning to whatever core
>>>>>> Cavium uses?
>>>>>>> I have seen the dpdk-input master branch changes, and on a positive note
>>>>>>> those changes make sense; however, some code is tuned for x86,
>>>>>>> especially Skylake. I was looking for some way to have a multiarch
>>>>>>> select function for the Rx path, like the way it’s done for the Tx path.
>>>>>> Not sure why you need that, unless you are going to have code
>>>>>> optimised for different CPU variants (i.e. Cortex-A53 and Cortex-A72) in
>>>>>> the same binary.
>>>>>>>>
>>>>>>>>> 2) I don’t know the modular approach I should follow to merge my
>>>>>>>>> changes, as I have completely changed the quad loop handling and the
>>>>>>>>> prefetch order in dpdk-input.
>>>>>>>>
>>>>>>>> I carefully tuned that code. It was a multi-day exercise, and losing a
>>>>>>>> single clock/packet on x86 with additional modifications is not
>>>>>>>> acceptable. Still, I’m open to discussion on how to address this problem.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Note: I am far away from upstreaming the code currently as my
>>>>>>>>> optimisation is still in progress. It will be better if I know the
>>>>>>>>> proper way of doing it.
>>>>>>>>
>>>>>>>> I suggest that you don’t even start working on upstreaming before
>>>>>>>> we have a deep understanding of what needs to be done and why, and we
>>>>>>>> are all in agreement.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Nitin
>>>>>
>>>>>
>>>
>>>
>>