On Mon, 10 Oct 2016, Richard Biener wrote:

> On Mon, 10 Oct 2016, Kyrill Tkachov wrote:
> 
> > 
> > On 10/10/16 11:22, Richard Biener wrote:
> > > On Mon, 10 Oct 2016, Kyrill Tkachov wrote:
> > > 
> > > > Hi Richard,
> > > > 
> > > > As I mentioned, here is the patch applying to the main store merging 
> > > > patch
> > > > to
> > > > re-implement encode_tree_to_bitpos
> > > > to operate on the bytes directly.
> > > > 
> > > > This works fine on little-endian but breaks on big-endian, even for
> > > > merging
> > > > bitfields within a single byte.
> > > > Consider the code snippet from gcc.dg/store_merging_6.c:
> > > > 
> > > > struct bar {
> > > >    int a : 3;
> > > >    unsigned char b : 4;
> > > >    unsigned char c : 1;
> > > >    char d;
> > > >    char e;
> > > >    char f;
> > > >    char g;
> > > > };
> > > > 
> > > > void
> > > > foo1 (struct bar *p)
> > > > {
> > > >    p->b = 3;
> > > >    p->a = 2;
> > > >    p->c = 1;
> > > >    p->d = 4;
> > > >    p->e = 5;
> > > > }
> > > > 
> > > > The correct GIMPLE for these merged stores on big-endian is:
> > > >    MEM[(voidD.49 *)p_2(D)] = 18180;
> > > >    MEM[(charD.8 *)p_2(D) + 2B] = 5;
> > > > 
> > > > whereas with this patch we emit:
> > > >    MEM[(voidD.49 *)p_2(D)] = 39428;
> > > >    MEM[(charD.8 *)p_2(D) + 2B] = 5;
> > > > 
> > > > The dump for merging the individual stores without this patch (using the
> > > > correct but costly wide_int approach in the base patch) is:
> > > > After writing 3 of size 4 at position 3 the merged region contains:
> > > > 6 0 0 0 0 0
> > > > After writing 2 of size 3 at position 0 the merged region contains:
> > > > 46 0 0 0 0 0
> > > > After writing 1 of size 1 at position 7 the merged region contains:
> > > > 47 0 0 0 0 0
> > > > After writing 4 of size 8 at position 8 the merged region contains:
> > > > 47 4 0 0 0 0
> > > > After writing 5 of size 8 at position 16 the merged region contains:
> > > > 47 4 5 0 0 0
> > > > 
> > > > 
> > > > And with this patch it is:
> > > > After writing 3 of size 4 at position 3 the merged region contains:
> > > > 18 0 0 0 0 0
> > > > After writing 2 of size 3 at position 0 the merged region contains:
> > > > 1a 0 0 0 0 0
> > > > After writing 1 of size 1 at position 7 the merged region contains:
> > > > 9a 0 0 0 0 0
> > > > After writing 4 of size 8 at position 8 the merged region contains:
> > > > 9a 4 0 0 0 0
> > > > After writing 5 of size 8 at position 16 the merged region contains:
> > > > 9a 4 5 0 0 0
> > > > 
> > > > (Note the dump just dumps the byte array from index 0 to <len> so the
> > > > first
> > > > thing printed is the lowest numbered byte.
> > > > Also, each byte is dumped in hex.)
> > > > 
> > > > The code as included here doesn't do any byte swapping for big-endian 
> > > > but
> > > > as
> > > > seen from the dump even writing a sub-byte
> > > > bitfield goes wrong so it would be nice to resolve that before going
> > > > forward.
> > > > Any help with debugging this is hugely appreciated. I've included an 
> > > > ASCII
> > > > diagram of the steps in the algorithm
> > > > in the patch itself.
> > > Ah, I think you need to account for BITS_BIG_ENDIAN in
> > > shift_bytes_in_array.  You have to shift towards MSB which means changing
> > > left to right shifts for BITS_BIG_ENDIAN.
> > 
> > Thanks, I'll try it out. But this is on aarch64 where
> > BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1
> > so there's something else bad here.
> 
> Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN
> (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for
> pdp11 as well).

Or maybe I'm confusing how get_inner_reference numbers "bits" when
it returns bitpos... (and how a multi-byte value in target memory
representation has to be "shifted" by bitpos).

I really thought BITS_BIG_ENDIAN is the only thing that matters...

Btw, I reproduced on ppc64-linux (which has BITS_BIG_ENDIAN).

Richard.

> Richard.
> 
> > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0.
> > > Independently of BYTES_BIG_ENDIAN it would be
> > > 
> > >    ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt;
> > > ...
> > 
> > doh, yes. I'll fix that.
> > 
> > > (so best use a single load / store and operate on a temporary).
> > 
> > Thanks,
> > Kyrill
> > 
> > > Richard.
> > > 
> > > > Thanks,
> > > > Kyrill
> > > > 
> > 
> > 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to