Reduced the CC list (changing the topic slightly)

> >
> > My understanding is that the generated code for both your patch and my
> > changes above is the same. Above suggested changes will conform to
> > ACLE recommendation.
> 
> Though instructions are different. Effective cycles are same even though First
> dup updates the four positions.
Can you elaborate on how the instructions are different?
I wrote the following code using both methods:

#include <arm_neon.h>

uint32x4_t u32x4_gather_gcc (uint32_t *p0, uint32_t *p1, uint32_t *p2, uint32_t *p3)
{
     uint32x4_t r = {*p0, *p1, *p2, *p3};

     return r;
}

uint32x4_t u32x4_gather_acle (uint32_t *p0, uint32_t *p1, uint32_t *p2, uint32_t *p3)
{
     uint32x4_t r;

     r = vdupq_n_u32 (* p0);
     r = vsetq_lane_u32 (*p1, r, 1);
     r = vsetq_lane_u32 (*p2, r, 2);
     r = vsetq_lane_u32 (*p3, r, 3);

     return r;
}

The generated code has the same instructions for both (irrelevant parts
omitted):

u32x4_gather_gcc:
        ld1r    {v0.4s}, [x0]
        ld1     {v0.s}[1], [x1]
        ld1     {v0.s}[2], [x2]
        ld1     {v0.s}[3], [x3]
        ret

u32x4_gather_acle:
        ld1r    {v0.4s}, [x0]
        ld1     {v0.s}[1], [x1]
        ld1     {v0.s}[2], [x2]
        ld1     {v0.s}[3], [x3]
        ret

The first 'ld1r' updates all the lanes in both cases.
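
For anyone who wants to confirm that the two forms also agree functionally,
here is a minimal sketch of a check. The names gather_init, gather_lanes,
gathers_match and check_gathers are mine and purely illustrative, and the
non-Arm fallback typedef (a GCC/Clang vector extension) is an assumption added
only so the check can build on a host without NEON; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) || defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint32x4_t vec_u32x4;
#define HAVE_NEON 1
#else
/* Assumption: stand-in for uint32x4_t on non-Arm hosts (GCC/Clang extension). */
typedef uint32_t vec_u32x4 __attribute__((vector_size(16)));
#define HAVE_NEON 0
#endif

/* Initializer-list form, mirroring u32x4_gather_gcc. */
vec_u32x4 gather_init(const uint32_t *p0, const uint32_t *p1,
                      const uint32_t *p2, const uint32_t *p3)
{
    vec_u32x4 r = {*p0, *p1, *p2, *p3};
    return r;
}

/* Broadcast lane 0 to all lanes, then overwrite lanes 1..3,
 * mirroring u32x4_gather_acle. */
vec_u32x4 gather_lanes(const uint32_t *p0, const uint32_t *p1,
                       const uint32_t *p2, const uint32_t *p3)
{
#if HAVE_NEON
    vec_u32x4 r = vdupq_n_u32(*p0);
    r = vsetq_lane_u32(*p1, r, 1);
    r = vsetq_lane_u32(*p2, r, 2);
    r = vsetq_lane_u32(*p3, r, 3);
#else
    /* Emulates the intrinsics with plain vector element writes. */
    vec_u32x4 r = {*p0, *p0, *p0, *p0};
    r[1] = *p1;
    r[2] = *p2;
    r[3] = *p3;
#endif
    return r;
}

/* Returns 1 when both forms yield the lanes {a, b, c, d}, else 0. */
int gathers_match(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    const uint32_t expect[4] = {a, b, c, d};
    vec_u32x4 x = gather_init(&a, &b, &c, &d);
    vec_u32x4 y = gather_lanes(&a, &b, &c, &d);

    return memcmp(&x, expect, sizeof(expect)) == 0 &&
           memcmp(&y, expect, sizeof(expect)) == 0;
}

/* Example call site: aborts if the two forms ever disagree. */
void check_gathers(void)
{
    assert(gathers_match(1, 2, 3, 4));
}
```

This only checks the resulting lane values. It says nothing about the
instructions the compiler picks, which still has to be confirmed from the
disassembly as above.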

> To make forward progress send the v2 based on the updated logic  just to
> make ACLE  Spec happy, I don’t see any real reason to do it though 😊
Thanks for the patch; it was important to make forward progress.
However, I think we should continue the discussion, as I plan to change other
parts of DPDK along similar lines. I want to understand why you think there is
no real reason to do it. The ACLE recommendation explains the reasoning.

> 
> http://patches.dpdk.org/patch/54656/
> 
