On Mon, 10 Oct 2016, Richard Biener wrote: > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > On 10/10/16 11:22, Richard Biener wrote: > > > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > > > Hi Richard, > > > > > > > > As I mentioned, here is the patch applying to the main store merging > > > > patch > > > > to > > > > re-implement encode_tree_to_bitpos > > > > to operate on the bytes directly. > > > > > > > > This works fine on little-endian but breaks on big-endian, even for > > > > merging > > > > bitfields within a single byte. > > > > Consider the code snippet from gcc.dg/store_merging_6.c: > > > > > > > > struct bar { > > > > int a : 3; > > > > unsigned char b : 4; > > > > unsigned char c : 1; > > > > char d; > > > > char e; > > > > char f; > > > > char g; > > > > }; > > > > > > > > void > > > > foo1 (struct bar *p) > > > > { > > > > p->b = 3; > > > > p->a = 2; > > > > p->c = 1; > > > > p->d = 4; > > > > p->e = 5; > > > > } > > > > > > > > The correct GIMPLE for these merged stores on big-endian is: > > > > MEM[(voidD.49 *)p_2(D)] = 18180; > > > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > > > whereas with this patch we emit: > > > > MEM[(voidD.49 *)p_2(D)] = 39428; > > > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > > > The dump for merging the individual stores without this patch (using the > > > > correct but costly wide_int approach in the base patch) is: > > > > After writing 3 of size 4 at position 3 the merged region contains: > > > > 6 0 0 0 0 0 > > > > After writing 2 of size 3 at position 0 the merged region contains: > > > > 46 0 0 0 0 0 > > > > After writing 1 of size 1 at position 7 the merged region contains: > > > > 47 0 0 0 0 0 > > > > After writing 4 of size 8 at position 8 the merged region contains: > > > > 47 4 0 0 0 0 > > > > After writing 5 of size 8 at position 16 the merged region contains: > > > > 47 4 5 0 0 0 > > > > > > > > > > > > And with this patch it is: > > > > After writing 3 of size 4 at position 3 the merged region contains: > > > > 18 0 0 0 0 0 > > > > After writing 2 of size 3 at position 0 the merged region contains: > > > > 1a 0 0 0 0 0 > > > > After writing 1 of size 1 at position 7 the merged region contains: > > > > 9a 0 0 0 0 0 > > > > After writing 4 of size 8 at position 8 the merged region contains: > > > > 9a 4 0 0 0 0 > > > > After writing 5 of size 8 at position 16 the merged region contains: > > > > 9a 4 5 0 0 0 > > > > > > > > (Note the dump just dumps the byte array from index 0 to <len> so the > > > > first > > > > thing printed is the lowest numbered byte. > > > > Also, each byte is dumped in hex.) > > > > > > > > The code as included here doesn't do any byte swapping for big-endian > > > > but > > > > as > > > > seen from the dump even writing a sub-byte > > > > bitfield goes wrong so it would be nice to resolve that before going > > > > forward. > > > > Any help with debugging this is hugely appreciated. I've included an > > > > ASCII > > > > diagram of the steps in the algorithm > > > > in the patch itself. > > > Ah, I think you need to account for BITS_BIG_ENDIAN in > > > shift_bytes_in_array. You have to shift towards MSB which means changing > > > left to right shifts for BITS_BIG_ENDIAN. > > > > Thanks, I'll try it out. But this is on aarch64 where > > BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 > > so there's something else bad here. > > Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN > (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for > pdp11 as well).
Or maybe I'm confusing how get_inner_reference numbers "bits" when it returns bitpos... (and how a multi-byte value in target memory representation has to be "shifted" by bitpos). I really thought BITS_BIG_ENDIAN is the only thing that matters... Btw, I reproduced on ppc64-linux (which has BITS_BIG_ENDIAN). Richard. > Richard. > > > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0. > > > Independently of BYTES_BIG_ENDIAN it would be > > > > > > ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; > > > ... > > > > doh, yes. I'll fix that. > > > > > (so best use a single load / store and operate on a temporary). > > > > Thanks, > > Kyrill > > > > > Richard. > > > > > > > Thanks, > > > > Kyrill > > > > > > > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)