On Fri, May 16, 2014 at 9:58 AM, <pins...@gmail.com> wrote: > > >> On May 16, 2014, at 4:13 AM, Richard Biener <richard.guent...@gmail.com> >> wrote: >> >> On Fri, May 16, 2014 at 1:03 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Fri, May 16, 2014 at 12:56 PM, <pins...@gmail.com> wrote: >>>> >>>> >>>>> On May 16, 2014, at 3:48 AM, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>> >>>>> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme >>>>> <thomas.preudho...@arm.com> wrote: >>>>>> Ping? >>>>> >>>>> Sorry ... >>>>> >>>>>> Best regards, >>>>>> >>>>>> Thomas Preud'homme >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>>>>> Sent: Friday, May 09, 2014 6:26 PM >>>>>>> To: GCC Patches >>>>>>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store >>>>>>> >>>>>>> Sorry, took longer than expected as I got distracted by some other >>>>>>> patch. >>>>>>> I merged the whole patchset in a single patch as I was told the current >>>>>>> setup >>>>>>> is actually more difficult to read. >>>>>>> >>>>>>> Here are the updated ChangeLogs: >>>>>>> >>>>>>> *** gcc/ChangeLog *** >>>>>>> >>>>>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>>>>> >>>>>>> PR tree-optimization/54733 >>>>>>> * expr.c (get_inner_reference): Add a parameter to control whether >>>>>>> a >>>>>>> MEM_REF should be split into base + offset. >>>>>>> * tree.h (get_inner_reference): Default new parameter to false. >>>>>>> * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure. >>>>>>> (CMPNOP): Define. >>>>>>> (find_bswap_or_nop_load): New. >>>>>>> (find_bswap_1): Renamed to ... >>>>>>> (find_bswap_or_nop_1): This. Also add support for memory source. >>>>>>> (find_bswap): Renamed to ... >>>>>>> (find_bswap_or_nop): This. Also add support for memory source and >>>>>>> detection of bitwise operations equivalent to load in host >>>>>>> endianness. >>>>>>> (execute_optimize_bswap): Likewise. Also move its leading >>>>>>> comment back >>>>>>> in place and split statement transformation into ... >>>>>>> (bswap_replace): This. Add assert when updating bswap_stats. >>>>>>> >>>>>>> *** gcc/testsuite/ChangeLog *** >>>>>>> >>>>>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>>>>> >>>>>>> PR tree-optimization/54733 >>>>>>> * gcc.dg/optimize-bswapdi-3.c: New test to check extension of >>>>>>> bswap >>>>>>> optimization to support memory sources and bitwise operations >>>>>>> equivalent to load in host endianness. >>>>>>> * gcc.dg/optimize-bswaphi-1.c: Likewise. >>>>>>> * gcc.dg/optimize-bswapsi-2.c: Likewise. >>>>>>> * gcc.c-torture/execute/bswap-2.c: Likewise. >>>>>>> >>>>>>> Ok for trunk? >>>>> >>>>> Ok, I now decided otherwise and dislike the new parameter to >>>>> get_inner_reference. Can you please revert that part and just >>>>> deal with a MEM_REF result in your only caller? >>>>> >>>>> And (of course) I found another possible issue. The way you >>>>> compute load_type and use it here: >>>>> >>>>> + /* Perform the load. */ >>>>> + load_offset_ptr = build_int_cst (n->alias_set, 0); >>>>> + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, >>>>> + load_offset_ptr); >>>>> >>>>> makes the load always appear aligned according to the mode of >>>>> load_type. On strict-alignment targets this may cause faults. >>>>> >>>>> So what you have to do is either (simpler) >>>>> >>>>> unsigned int align = get_pointer_alignment (addr_tmp); >>>>> tree al_load_type = load_type; >>>>> if (align < TYPE_ALIGN (load_type)) >>>>> al_load_type = build_aligned_type (load_type, align); >>>>> ... >>>>> val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, >>>>> load_offset_ptr); >>>>> >>>>> or keep track of the "first" actual load and use >>>>> >>>>> unsigned int align = get_object_alignment (that_first_load); >>>>> >>>>> "first" in the one that corresponds to addr_tmp. From that there >>>>> is a much better chance to derive good alignment values. >>>>> >>>>> Of course on STRICT_ALIGNMENT targets a not aligned load >>>>> will be decomposed again, so eventually doing the transformation >>>>> may no longer be profitable(?). >>>> >>>> Not always decomposed. On MIPS, it should using the load/store left/right >>>> instructions for unaligned load/stores which is normally better than >>>> decomposed load/stores. So having a cost model would be nice. >>> >>> Agreed, but I am happy with doing that as a followup. Btw, >>> a very simple one would be to reject unaligned >>> SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). >>> [of course that may be true on MIPS even for the cases where >>> a "reasonable" fast unalgined variant exists - nearly no target >>> defines that macro in a too fancy way] >> >> Oh, and what happens for >> >> unsigned foo (unsigned char *x) >> { >> return x[0] << 24 | x[2] << 8 | x[3]; >> } >> >> ? We could do an unsigned int load from x and zero byte 3 >> with an AND. Enhancement for a followup, similar to also >> considering vector types for the load (also I'm not sure >> that uint64_type_node always has non-BLKmode for all >> targets). > > No we cannot if x[4] is on a different page which is not mapped in, we get a > fault. Not something we want.
I was reading that code wrong. I trying to say if we don't load from x[3] then we can't do it. But with the example above we can. Thanks, Andrew Pinski > > Thanks, > Andrew > > >> >> Richard. >> >>> Richard. >>> >>>> Thanks, >>>> Andrew >>>> >>>>> >>>>> Thanks and sorry again for the delay. >>>>> >>>>> Otherwise the patch looks good to me. >>>>> >>>>> Richard. >>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>>>>>> Sent: Monday, May 05, 2014 7:30 PM >>>>>>>> To: GCC Patches >>>>>>>> Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent >>>>>>>> load/store >>>>>>>> >>>>>>>> I found a way to improve the function find_bswap/find_bswap_or_nop >>>>>>>> and reduce its size. Please hold for the review, I will post an updated >>>>>>>> version as soon as I finish testing. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> Thomas Preud'homme >>>>>> >>>>>>