On Fri, Sep 13, 2013 at 10:47:01PM +0100, Andrew Pinski wrote:
> On Fri, Sep 13, 2013 at 11:57 AM, James Greenhalgh
> <[email protected]> wrote:
> > Should return '1' whatever your endianness. Throwing together a quick
> > test case, that is the case for current trunk. Do you have a testcase
> > where this goes wrong?
>
> I was not thinking of that but rather the definition of lanes in ARM64
> is different than from element due to memory ordering of endian.
> That is lane 0 is element 3 in big-endian. Or is this only for
> aarch32 where the issue is located?
>
> Thanks,
> Andrew Pinski
Well, AArch64 has the AArch32-style memory ordering for vectors,
which I think is different from what other big-endian architectures
use, but gives consistent behaviour between vector and array indexing.
So, take the easy case of a byte array:

  uint8_t foo [8] = {0, 1, 2, 3, 4, 5, 6, 7};

We would expect both the big- and little-endian toolchains to lay
this out in memory as:

  0x0                             0x8
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

And element 0 would give us '0'. If we take the same array and load it
as a vector with ld1.b, both big- and little-endian toolchains would
load it as:

  bit 127 .. bit 64  bit 63                    bit 0
  | lane 15 .. 8    | lane 7                 lane 0 |
  | .....           | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |

So lane 0 is '0'; we're OK so far!
For a short array:

  uint16_t foo [4] = {0x0a0b, 0x1a1b, 0x2a2b, 0x3a3b};

The little-endian compiler would lay memory out as:

  0x0                                     0x8
  | 0b | 0a | 1b | 1a | 2b | 2a | 3b | 3a |

And the big-endian compiler would lay memory out as:

  0x0                                     0x8
  | 0a | 0b | 1a | 1b | 2a | 2b | 3a | 3b |

In both cases, element 0 is '0x0a0b'. If we load this array as a
vector with ld1.h, both big- and little-endian compilers will load
the vector as:

  bit 127 .. bit 64  bit 63                    bit 0
  | lane 7 .. 4     | lane 3                 lane 0 |
  | .....           | 3a 3b | 2a 2b | 1a 1b | 0a 0b |

And lane 0 is '0x0a0b', so we are OK again!
Lanes and elements should match under our model. I don't think that
is true of other architectures, where I think the whole vector object
is arranged big-endian, so that we would need to lay our byte array
out in memory as:

  0x0                             0x8
  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |

for it to be loaded with '0' in lane 0, at which point element and
lane numbering disagree: element 0 of the array is now '7'.
But, as I say, that applies to other architectures; AArch64 should be consistent.
Thanks,
James