Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst
> On Jul 10, 2018, at 2:10 AM, Richard Biener > wrote: > > On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt wrote: >> >> Hi, >> Re-posting. Richard provided feedback on a previous version of this >> patch, I wanted to make sure he was/is OK with the latest. :-) >> >> Add support for Gimple folding for unaligned vector loads and stores. >> >> Regtest completed across variety of systems, P6,P7,P8,P9. >> >> [v2] Added the type for the MEM_REF, per feedback. >> Testcases for gimple-folding of the same are currently in-tree >> as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c. >> Re-tested, still looks good. :-) >> >> [v3] Updated the alignment for the MEM_REF to be 4bytes. >> Updated/added/removed comments in the code for clarity. >> >> OK for trunk? >> >> Thanks >> -Will >> >> [gcc] >> >> 2018-07-09 Will Schmidt >> >>* config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add >>vec_xst variants to the list. >>(rs6000_gimple_fold_builtin): Add support for folding unaligned >>vector loads and stores. >> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 8bc4109..774c60a 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum >> rs6000_builtins fn_code) >> case ALTIVEC_BUILTIN_STVX_V8HI: >> case ALTIVEC_BUILTIN_STVX_V4SI: >> case ALTIVEC_BUILTIN_STVX_V4SF: >> case ALTIVEC_BUILTIN_STVX_V2DI: >> case ALTIVEC_BUILTIN_STVX_V2DF: >> +case VSX_BUILTIN_STXVW4X_V16QI: >> +case VSX_BUILTIN_STXVW4X_V8HI: >> +case VSX_BUILTIN_STXVW4X_V4SF: >> +case VSX_BUILTIN_STXVW4X_V4SI: >> +case VSX_BUILTIN_STXVD2X_V2DF: >> +case VSX_BUILTIN_STXVD2X_V2DI: >> return true; >> default: >> return false; >> } >> } >> @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator >> *gsi) >>gimple_set_location (g, loc); >>gsi_replace (gsi, g, true); >>return true; >> } >> >> +/* unaligned Vector loads. */ >> +case VSX_BUILTIN_LXVW4X_V16QI: >> +case VSX_BUILTIN_LXVW4X_V8HI: >> +case VSX_BUILTIN_LXVW4X_V4SF: >> +case VSX_BUILTIN_LXVW4X_V4SI: >> +case VSX_BUILTIN_LXVD2X_V2DF: >> +case VSX_BUILTIN_LXVD2X_V2DI: >> + { >> +arg0 = gimple_call_arg (stmt, 0); // offset >> +arg1 = gimple_call_arg (stmt, 1); // address >> +lhs = gimple_call_lhs (stmt); >> +location_t loc = gimple_location (stmt); >> +/* Since arg1 may be cast to a different type, just use >> ptr_type_node >> + here instead of trying to enforce TBAA on pointer types. */ >> +tree arg1_type = ptr_type_node; >> +tree lhs_type = TREE_TYPE (lhs); >> +/* in GIMPLE the type of the MEM_REF specifies the alignment. The >> + required alignment (power) is 4 bytes regardless of data type. */ >> +tree align_ltype = build_aligned_type (lhs_type, 4); >> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. >> Create >> + the tree using the value from arg0. The resulting type will >> match >> + the type of arg1. */ >> +gimple_seq stmts = NULL; >> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0); >> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR, >> + arg1_type, arg1, temp_offset); >> +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); >> +/* Use the build2 helper to set up the mem_ref. The MEM_REF could >> also >> + take an offset, but since we've already incorporated the offset >> + above, here we just pass in a zero. 
*/ >> +gimple *g; >> +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, >> temp_addr, >> + build_int_cst (arg1_type, >> 0))); >> +gimple_set_location (g, loc); >> +gsi_replace (gsi, g, true); >> +return true; >> + } >> + >> +/* unaligned Vector stores. */ >> +case VSX_BUILTIN_STXVW4X_V16QI: >> +case VSX_BUILTIN_STXVW4X_V8HI: >> +case VSX_BUILTIN_STXVW4X_V4SF: >> +case VSX_BUILTIN_STXVW4X_V4SI: >> +case VSX_BUILTIN_STXVD2X_V2DF: >> +case VSX_BUILTIN_STXVD2X_V2DI: >> + { >> +arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */ >> +arg1 = gimple_call_arg (stmt, 1); /* Offset. */ >> +tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */ >> +location_t loc = gimple_location (stmt); >> +tree arg0_type = TREE_TYPE (arg0); >> +/* Use ptr_type_node (no TBAA) for the arg2_type. */ >> +tree arg2_type = ptr_type_node; >> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. >> Create >> + the tree using the value from arg0. The resulting type will >> match >> +
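For context, here is a minimal sketch of what this folding achieves at the source level, assuming a configuration where vec_xl maps to the LXVW4X/LXVD2X built-ins handled above; the comment paraphrases the folded GIMPLE form built by the patch rather than quoting exact dump output.

  #include <altivec.h>

  vector int
  load_vec (long off, int *p)
  {
    /* Before the patch this built-in call survives to RTL expansion
       as an opaque call; after it, the call is folded in GIMPLE.  */
    return vec_xl (off, p);
  }

  /* The folded form is roughly:
       _1 = (sizetype) off;
       _2 = p + _1;
       _3 = MEM_REF <vector(4) int, align 4> [_2];
     i.e. an ordinary 4-byte-aligned vector load that the GIMPLE
     optimizers (CSE, hoisting, vectorization) can now see through.  */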
Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)
> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) > wrote: > > On 10/07/18 00:13, Jeff Law wrote: >> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>> >>> The patches I posted earlier this year for mitigating against >>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>> which it became obvious that a rethink was needed. This mail, and the >>> following patches attempt to address that feedback and present a new >>> approach to mitigating against this form of attack surface. >>> >>> There were two major issues with the original approach: >>> >>> - The speculation bounds were too tightly constrained - essentially >>> they had to represent an upper and lower bound on a pointer, or a >>> pointer offset. >>> - The speculation constraints could only cover the immediately preceding >>> branch, which often did not fit well with the structure of the existing >>> code. >>> >>> An additional criticism was that the shape of the intrinsic did not >>> fit particularly well with systems that used a single speculation >>> barrier that essentially had to wait until all preceding speculation >>> had been resolved. >> Right. I suggest the Intel and IBM reps chime in on the updated semantics. >> > > Yes, logically, this is a boolean tracker value. In practice we use ~0 > for true and 0 for false, so that we can simply use it as a mask > operation later. > > I hope this intrinsic will be even more acceptable than the one that > Bill Schmidt acked previously, it's even simpler than the version we had > last time. Yes, I think this looks quite good. Thanks! Thanks also for digging into the speculation tracking algorithm. This has good potential as a conservative opt-in approach. The obvious concern is whether performance will be acceptable even for apps that really want the protection. We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 numbers on a Skylake box. We saw geomean degradations of about 42% (int) and 33% (fp). (This was just one test, so caveat emptor.) This isn't terrible given the number of potential false positives and the early state of the algorithm, but it's still a lot from a customer perspective. I'll be interested in whether your interprocedural improvements are able to reduce the conservatism a bit. Thanks, Bill > >>> >>> To address all of the above, these patches adopt a new approach, based >>> in part on a posting by Chandler Carruth to the LLVM developers list >>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>> but which we have extended to deal with inter-function speculation. >>> The patches divide the problem into two halves. >> We're essentially turning the control dependency into a value that we >> can then use to munge the pointer or the resultant data. >> >>> >>> The first half is some target-specific code to track the speculation >>> condition through the generated code to provide an internal variable >>> which can tell us whether or not the CPU's control flow speculation >>> matches the data flow calculations. The idea is that the internal >>> variable starts with the value TRUE and if the CPU's control flow >>> speculation ever causes a jump to the wrong block of code the variable >>> becomes false until such time as the incorrect control flow >>> speculation gets unwound. >> Right. >> >> So one of the things that comes immediately to mind is you have to run >> this early enough that you can still get to all the control flow and >> build your predicates.
Otherwise you have to undo stuff like >> conditional move generation. > > No, the opposite, in fact. We want to run this very late, at least on > Arm systems (AArch64 or AArch32). Conditional move instructions are > fine - they're data-flow operations, not control flow (in fact, that's > exactly what the control flow tracker instructions are). By running it > late we avoid disrupting any of the earlier optimization passes as well. > >> >> On the flip side, the earlier you do this mitigation, the more you have >> to worry about what the optimizers are going to do to the code later in >> the pipeline. It's almost guaranteed a naive implementation is going to >> muck this up since we can propagate the state of the condition into the >> arms which will make the predicate state a compile time constant. >> >> In fact this seems to be running into the area of pointer provenance and
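For context, the user-facing shape of the intrinsic under discussion is a value filter. A sketch, following the usage pattern described in this series; the array and bound names are illustrative:

  #define MAX_ARRAY_ELEMS 1024
  static int array[MAX_ARRAY_ELEMS];

  int
  load_array (unsigned untrusted_index)
  {
    if (untrusted_index < MAX_ARRAY_ELEMS)
      /* On the architecturally-taken path this is a plain load.  Under
         incorrect speculation the tracker value is 0, so the index is
         masked and no attacker-chosen address is dereferenced.  */
      return array[__builtin_speculation_safe_value (untrusted_index)];
    return 0;
  }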
Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst
> On Jul 10, 2018, at 8:48 AM, Richard Biener > wrote: > > On Tue, Jul 10, 2018 at 3:33 PM Bill Schmidt wrote: >> >> >>> On Jul 10, 2018, at 2:10 AM, Richard Biener >>> wrote: >>> >>> On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt >>> wrote: >>>> >>>> Hi, >>>> Re-posting. Richard provided feedback on a previous version of this >>>> patch, I wanted to make sure he was/is OK with the latest. :-) >>>> >>>> Add support for Gimple folding for unaligned vector loads and stores. >>>> >>>> Regtest completed across variety of systems, P6,P7,P8,P9. >>>> >>>> [v2] Added the type for the MEM_REF, per feedback. >>>> Testcases for gimple-folding of the same are currently in-tree >>>> as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c. >>>> Re-tested, still looks good. :-) >>>> >>>> [v3] Updated the alignment for the MEM_REF to be 4bytes. >>>> Updated/added/removed comments in the code for clarity. >>>> >>>> OK for trunk? >>>> >>>> Thanks >>>> -Will >>>> >>>> [gcc] >>>> >>>> 2018-07-09 Will Schmidt >>>> >>>> * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add >>>> vec_xst variants to the list. >>>> (rs6000_gimple_fold_builtin): Add support for folding unaligned >>>> vector loads and stores. >>>> >>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >>>> index 8bc4109..774c60a 100644 >>>> --- a/gcc/config/rs6000/rs6000.c >>>> +++ b/gcc/config/rs6000/rs6000.c >>>> @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum >>>> rs6000_builtins fn_code) >>>>case ALTIVEC_BUILTIN_STVX_V8HI: >>>>case ALTIVEC_BUILTIN_STVX_V4SI: >>>>case ALTIVEC_BUILTIN_STVX_V4SF: >>>>case ALTIVEC_BUILTIN_STVX_V2DI: >>>>case ALTIVEC_BUILTIN_STVX_V2DF: >>>> +case VSX_BUILTIN_STXVW4X_V16QI: >>>> +case VSX_BUILTIN_STXVW4X_V8HI: >>>> +case VSX_BUILTIN_STXVW4X_V4SF: >>>> +case VSX_BUILTIN_STXVW4X_V4SI: >>>> +case VSX_BUILTIN_STXVD2X_V2DF: >>>> +case VSX_BUILTIN_STXVD2X_V2DI: >>>> return true; >>>>default: >>>> return false; >>>>} >>>> } >>>> @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator >>>> *gsi) >>>> gimple_set_location (g, loc); >>>> gsi_replace (gsi, g, true); >>>> return true; >>>> } >>>> >>>> +/* unaligned Vector loads. */ >>>> +case VSX_BUILTIN_LXVW4X_V16QI: >>>> +case VSX_BUILTIN_LXVW4X_V8HI: >>>> +case VSX_BUILTIN_LXVW4X_V4SF: >>>> +case VSX_BUILTIN_LXVW4X_V4SI: >>>> +case VSX_BUILTIN_LXVD2X_V2DF: >>>> +case VSX_BUILTIN_LXVD2X_V2DI: >>>> + { >>>> +arg0 = gimple_call_arg (stmt, 0); // offset >>>> +arg1 = gimple_call_arg (stmt, 1); // address >>>> +lhs = gimple_call_lhs (stmt); >>>> +location_t loc = gimple_location (stmt); >>>> +/* Since arg1 may be cast to a different type, just use >>>> ptr_type_node >>>> + here instead of trying to enforce TBAA on pointer types. */ >>>> +tree arg1_type = ptr_type_node; >>>> +tree lhs_type = TREE_TYPE (lhs); >>>> +/* in GIMPLE the type of the MEM_REF specifies the alignment. The >>>> + required alignment (power) is 4 bytes regardless of data type. >>>> */ >>>> +tree align_ltype = build_aligned_type (lhs_type, 4); >>>> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. >>>> Create >>>> + the tree using the value from arg0. The resulting type will >>>> match >>>> + the type of arg1. */ >>>> +gimple_seq stmts = NULL; >>>> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0); >>>> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR, >>>> + arg1_type, arg1, temp_offset); >>>>
[PATCH, rs6000] Add missing logical-op interfaces to emmintrin.h
Hi, It was recently brought to our attention that the existing emmintrin.h header, which was believed to be feature-complete for SSE2 support, is actually missing four logical-op interfaces: _mm_and_si128 _mm_andnot_si128 _mm_or_si128 _mm_xor_si128 This patch provides those with the obvious implementations, along with test cases. I've bootstrapped it on powerpc64le-linux-gnu (P8, P9) and powerpc64-linux-gnu (P7, P8) and tested it with no regressions. Is this okay for trunk? Although this isn't a regression, it is an oversight that leaves the SSE2 support incomplete. Thus I'd like to ask permission to also backport this to gcc-8-branch after a short waiting period. It's passed regstrap on P8 and P9 LE, and P7/P8 BE testing is underway. Is that backport okay if testing succeeds? [BTW, I'm shepherding this patch on behalf of Steve Munroe.] Thanks! Bill [gcc] 2018-07-10 Bill Schmidt Steve Munroe * config/rs6000/emmintrin.h (_mm_and_si128): New function. (_mm_andnot_si128): Likewise. (_mm_or_si128): Likewise. (_mm_xor_si128): Likewise. [gcc/testsuite] 2018-07-10 Bill Schmidt Steve Munroe * gcc.target/powerpc/sse2-pand-1.c: New file. * gcc.target/powerpc/sse2-pandn-1.c: Likewise. * gcc.target/powerpc/sse2-por-1.c: Likewise. * gcc.target/powerpc/sse2-pxor-1.c: Likewise. Index: gcc/config/rs6000/emmintrin.h === --- gcc/config/rs6000/emmintrin.h (revision 262235) +++ gcc/config/rs6000/emmintrin.h (working copy) @@ -1884,6 +1884,30 @@ } extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_and_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)vec_and ((__v2di) __A, (__v2di) __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_andnot_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)vec_andc ((__v2di) __B, (__v2di) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_or_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)vec_or ((__v2di) __A, (__v2di) __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_xor_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)vec_xor ((__v2di) __A, (__v2di) __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpeq_epi8 (__m128i __A, __m128i __B) { return (__m128i) vec_cmpeq ((__v16qi) __A, (__v16qi)__B); @@ -2333,3 +2357,4 @@ } #endif /* EMMINTRIN_H_ */ + Index: gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c === --- gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse2-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse2_test_pand_1 +#endif + +#include <emmintrin.h> + +static __m128i +__attribute__((noinline, unused)) +test (__m128i s1, __m128i s2) +{ + return _mm_and_si128 (s1, s2); +} + +static void +TEST (void) +{ + union128i_b u, s1, s2; + char e[16]; + int i; + + s1.x = _mm_set_epi8 (1,2,3,4,10,20,30,90,-80,-40,-100,-15,98, 25, 98,7); + s2.x = _mm_set_epi8 (88, 44, 33, 22, 11, 98, 76, -100, -34, -78, -39, 6, 3, 4, 5, 119); + u.x = test (s1.x, s2.x); + + for (i = 0; i < 16; i++) + e[i] = s1.a[i] & s2.a[i]; + + if (check_union128i_b (u, e)) +abort (); +} Index:
gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c === --- gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse2-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse2_test_pandn_1 +#endif + +#include <emmintrin.h> + +static __m128i +__attribute__((noinline, unused)) +test (__m128i s1, __m128i s2) +{ + return _mm_andnot_si128 (s1, s2); +} + +static void +TEST (void) +{ + union128i_b u, s1, s2; + char e[16]; + int i; + + s1.x = _mm_set_epi8 (1,2,3,4,10,20,30,90,-80,-40,-100,-15,98, 25, 98,7); + s2.x = _mm_set_epi8 (88, 44, 33, 22, 11, 98, 76, -100, -34, -78, -39, 6, 3, 4, 5, 119); + u.x = test (s1.x, s2.x); + + for (i = 0; i < 16; i++) + e[i] = (~s1.a[i]) & s2.a[i]; +
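As a usage sketch, the four new interfaces compose in the usual SSE2 way; note that _mm_andnot_si128 complements its first operand, which is why the implementation above maps it to vec_andc (__B, __A):

  #include <emmintrin.h>

  /* Bitwise select: (a & mask) | (b & ~mask).  */
  __m128i
  select_bits (__m128i a, __m128i b, __m128i mask)
  {
    return _mm_or_si128 (_mm_and_si128 (a, mask),
                         _mm_andnot_si128 (mask, b));
  }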
[PATCH, cvs] Clarify that powerpc64le-linux-gnu is a primary platform
Hi, I occasionally get questions about powerpc64le-linux-gnu being a primary platform for GCC, since the release criteria don't specifically call it out (see https://gcc.gnu.org/gcc-8/criteria.html). Currently powerpc64-linux-gnu (for big-endian) is listed instead, which is misleading. I wonder if we could make it clearer that both endianness flavors are considered primary platforms. One possibility is below, but I'd be happy with any other way of getting this across. Thanks for considering! Bill Index: criteria.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/criteria.html,v retrieving revision 1.2 diff -r1.2 criteria.html 110c110 < powerpc64-unknown-linux-gnu --- > powerpc64{,le}-unknown-linux-gnu
Re: [PATCH 11/11] rs6000 - add speculation_barrier pattern
Hi Richard, I can't ack the patch, but I am happy with it. Thank you for this work! -- Bill Bill Schmidt, Ph.D. STSM, GCC Architect for Linux on Power IBM Linux Technology Center wschm...@linux.vnet.ibm.com > On Jul 27, 2018, at 4:37 AM, Richard Earnshaw > wrote: > > > This patch reworks the existing rs6000_speculation_barrier pattern to > work with the new __builtin_speculation_safe_value() intrinsic. The > change is trivial as it simply requires renaming the existing speculation > barrier pattern. > > So the total patch is to delete 14 characters! > > * config/rs6000/rs6000.md (speculation_barrier): Renamed from > rs6000_speculation_barrier. > * config/rs6000/rs6000.c (rs6000_expand_builtin): Adjust for > new barrier pattern name. > --- > gcc/config/rs6000/rs6000.c | 2 +- > gcc/config/rs6000/rs6000.md | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > <0011-rs6000-add-speculation_barrier-pattern.patch>
New repository location
Question: Is the new gcc git repository at gcc.gnu.org/git/gcc.git using the same location as the earlier git mirror did? I'm curious whether our repository on pike is still syncing with the new master, or whether we need to make some adjustments before we next rebase pu against master.
Re: New repository location
I apologize, I sent this to the wrong mailing list, this had meant to be internal. But thank you very much for the information! It appears we have some adjustments to make. Thanks! Bill On 1/19/20 8:46 AM, H.J. Lu wrote: On Sun, Jan 19, 2020 at 6:33 AM Bill Schmidt wrote: Question: Is the new gcc git repository at gcc.gnu.org/git/gcc.git using the same location as the earlier git mirror did? I'm curious whether our repository on pike is still syncing with the new master, or whether we need to make some adjustments before we next rebase pu against master. 2 repos are different. I renamed my old mirror and created a new one: https://gitlab.com/x86-gcc
Re: [rfc PATCH] rs6000: Updated constraint documentation
On 1/30/20 6:17 PM, Segher Boessenkool wrote: This is my current work-in-progress version. There still are rough edges, and not much is done for the output modifiers yet, but it should be in much better shape wrt the user manual now. The internals manual also is a bit better I think. md.texi is not automatically kept in synch with constraints.md (let alone generated from it), so the two diverged. I tried to correct that, too. Please let me know if you have any ideas how to improve it further, or if I did something terribly wrong, or anything else. Thanks, Segher --- gcc/config/rs6000/constraints.md | 159 +++-- gcc/doc/md.texi | 188 +++ 2 files changed, 182 insertions(+), 165 deletions(-) diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index 398c894..bafc22a 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -21,192 +21,214 @@ ;; Register constraints -(define_register_constraint "f" "rs6000_constraints[RS6000_CONSTRAINT_f]" - "@internal") - -(define_register_constraint "d" "rs6000_constraints[RS6000_CONSTRAINT_d]" - "@internal") +; Actually defined in common.md: +; (define_register_constraint "r" "GENERAL_REGS" +; "A general purpose register (GPR), @code{r0}@dots{}@code{r31}.") (define_register_constraint "b" "BASE_REGS" - "@internal") + "A base register. Like @code{r}, but @code{r0} is not allowed, so + @code{r1}@dots{}@code{r31}.") -(define_register_constraint "h" "SPECIAL_REGS" - "@internal") +(define_register_constraint "f" "rs6000_constraints[RS6000_CONSTRAINT_f]" + "A floating point register (FPR), @code{f0}@dots{}@code{f31}.") -(define_register_constraint "c" "CTR_REGS" - "@internal") - -(define_register_constraint "l" "LINK_REGS" - "@internal") +(define_register_constraint "d" "rs6000_constraints[RS6000_CONSTRAINT_d]" + "A floating point register. This is the same as @code{f} nowadays; + historically @code{f} was for single-precision and @code{d} was for + double-precision floating point.") (define_register_constraint "v" "ALTIVEC_REGS" - "@internal") + "An Altivec vector register (VR), @code{v0}@dots{}@code{v31}.") + +(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]" + "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}. Either a @code{d} + or a @code{v} register.") Not quite true, as the "d" register is only half of a VSX register. It may or may not be worth including a picture of register overlaps... + +(define_register_constraint "h" "SPECIAL_REGS" + "@internal @code{vrsave}, @code{ctr}, or @code{lr}.") + +(define_register_constraint "c" "CTR_REGS" + "The count register, @code{ctr}.") + +(define_register_constraint "l" "LINK_REGS" + "The link register, @code{lr}.") (define_register_constraint "x" "CR0_REGS" - "@internal") + "Condition register field 0, @code{cr0}.") (define_register_constraint "y" "CR_REGS" - "@internal") + "Any condition register field, @code{cr0}@dots{}@code{cr7}.") (define_register_constraint "z" "CA_REGS" - "@internal") - -;; Use w as a prefix to add VSX modes -;; any VSX register -(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]" - "Any VSX register if the -mvsx option was used or NO_REGS.") + "@internal The carry bit, @code{XER[CA]}.") ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits. ;; It is currently used for that purpose in LLVM. 
(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]" - "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.") + "@internal VSX register if the -mpower9-vector -m64 options were used + or NO_REGS.") Suggest changing "used or" to "used, else". ;; NO_REGs register constraint, used to merge mov{sd,sf}, since movsd can use ;; direct move directly, and movsf can't to move between the register sets. ;; There is a mode_attr that resolves to wa for SDmode and wn for SFmode -(define_register_constraint "wn" "NO_REGS" "No register (NO_REGS).") +(define_register_constraint "wn" "NO_REGS" + "@internal No register (NO_REGS).") (define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]" - "General purpose register if 64-bit instructions are enabled or NO_REGS.") + "@internal General purpose register if 64-bit instructions are enabled + or NO_REGS.") Similar here. (define_register_constraint "wx" "rs6000_constraints[RS6000_CONSTRAINT_wx]" - "Floating point register if the STFIWX instruction is enabled or NO_REGS.") + "@internal Floating point register if the STFIWX instruction is enabled + or NO_REGS.") And here. (define_register_constraint "wA" "rs6000_constraints[RS6000_CONSTRAINT_wA]" - "BASE_REGS if 64-bit instructions are enabled or NO_REGS.") + "@internal BASE_REGS if 64-bit instructions are enabled or NO_REGS.") Etc. ;; wB ne
Re: [rfc PATCH] rs6000: Updated constraint documentation
On 1/31/20 9:42 AM, Segher Boessenkool wrote: Hi Bill, Thanks a lot for looking at this! :-) On Fri, Jan 31, 2020 at 08:49:21AM -0600, Bill Schmidt wrote: +(define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]" + "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}. Either a @code{d} + or a @code{v} register.") Not quite true, as the "d" register is only half of a VSX register. It may or may not be worth including a picture of register overlaps... No, the "d" registers are the actual full registers, all 128 bits of it. You often use them in a mode that uses only 64 bits, sure. Perhaps that would be worth a few words when describing the "d" constraint, then. This is not at all obvious to the casual user. Thanks! I was planning to update this to (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]" "A VSX register (VSR), @code{vs0}@dots{}@code{vs63}. This is either an FPR (@code{d}) or a VR (@code{v}).") Does that improve it? Yes, sure. The numbering thing is also mentioned in the %x output modifier stuff. There must be a better way to present this, but I don't see it yet. Hrm. I honestly thought that was pretty good as is. Thanks again! Bill (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]" - "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.") + "@internal VSX register if the -mpower9-vector -m64 options were used + or NO_REGS.") Suggest changing "used or" to "used, else". Or just "used."; this is internals documentation only, and all similar constraints will ideally go away at some point (it just didn't fit in easily with the "enabled" attribute yet; it probably should be just "p9" for "isa" and test the TARGET_64BIT in the insn condition, something like that. Or maybe there shouldn't be separate handling for 64-bit at all here). (define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]" - "General purpose register if 64-bit instructions are enabled or NO_REGS.") + "@internal General purpose register if 64-bit instructions are enabled + or NO_REGS.") Similar here. Yup. I didn't change this, fwiw, just synched up md.texi and constraints.md where they diverged. (define_memory_constraint "es" - "A ``stable'' memory operand; that is, one which does not include any -automodification of the base register. Unlike @samp{m}, this constraint -can be used in @code{asm} statements that might access the operand -several times, or that might not access it at all." + "@internal + A ``stable'' memory operand; that is, one which does not include any + automodification of the base register. This used to be useful when + @code{m} allowed automodification of the base register, but as those Trailing whitespace here. Yeah, I don't know how I missed that, git tends to shout about it. Fixed. @item wa -Any VSX register if the @option{-mvsx} option was used or NO_REGS. +A VSX register (VSR), @code{vs0}@dots{}@code{vs63}. Either a @code{d} or a @code{v} +register. Same concern as above. It is literally the same text now (unless I messed up the c'n'p). +@ifset INTERNALS +@item h +@code{vrsave}, @code{ctr}, or @code{lr}. +@end ifset I don't see vrsave elsewhere in either document (should have noted this in constraints.md also). There is no other constraint for vrsave. constraints.md says (define_register_constraint "h" "SPECIAL_REGS" "@internal @code{vrsave}, @code{ctr}, or @code{lr}.") (Same text, as should be). It ends up only in gccint.*, not in gcc.* . 
@item we -VSX register if the @option{-mcpu=power9} and @option{-m64} options -were used or NO_REGS. +VSX register if the -mpower9-vector -m64 options were used or NO_REGS. As above. I won't call out the rest of these. Since this is not new text, and it now only ends up in the internals documentation, and a lot of it should go away in the short term anyway, and importantly I don't know a good simple way to write what it does anyway (because it *isn't* simple), I hoped I could just keep this for now. Hrm, I lost markup there, will fix. +@item wZ +Indexed or indirect memory operand, ignoring the bottom 4 bits. +@end ifset For consistency, "An indexed..." ? Yes, thanks! +@item Z +A memory operand that is an indexed or indirect from a register. "indexed or indirect access"? And s/from a register// yeah. Great improvements! Thanks :-) Somewhere it should say (in the gcc.* doc) that there are other constraints and output modifiers as well, and some are even supported for backwards compatibility, but here only the ones you should use are mentioned. Not sure where to do that. Segher
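An illustrative pairing of the "wa" constraint with the %x output modifier discussed above (sketch only): %x prints the full 0-63 VSX register number, which matters when the allocator picks one of the Altivec registers that map to vs32..vs63.

  vector double
  vsx_add (vector double a, vector double b)
  {
    vector double r;
    /* xvadddp accepts any VSR, so "wa" is the right constraint; without
       the %x modifiers an operand assigned to v0..v31 would print with
       the wrong register number.  */
    __asm__ ("xvadddp %x0,%x1,%x2" : "=wa" (r) : "wa" (a), "wa" (b));
    return r;
  }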
[PATCH 03/14] Add file support and functions for diagnostic support.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (bif_file): New filescope variable. (ovld_file): Likewise. (header_file): Likewise. (init_file): Likewise. (defines_file): Likewise. (pgm_path): Likewise. (bif_path): Likewise. (ovld_path): Likewise. (header_path): Likewise. (init_path): Likewise. (defines_path): Likewise. (LINELEN): New defined constant. (linebuf): New filescope variable. (line): Likewise. (pos): Likewise. (diag): Likewise. (bif_diag): New function. (ovld_diag): New function. --- gcc/config/rs6000/rs6000-genbif.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index a53209ed040..3fb13cb11d6 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -122,3 +122,50 @@ along with GCC; see the file COPYING3. If not see #include #include #include + +/* Input and output file descriptors and pathnames. */ +static FILE *bif_file; +static FILE *ovld_file; +static FILE *header_file; +static FILE *init_file; +static FILE *defines_file; + +static const char *pgm_path; +static const char *bif_path; +static const char *ovld_path; +static const char *header_path; +static const char *init_path; +static const char *defines_path; + +/* Position information. Note that "pos" is zero-indexed, but users + expect one-indexed column information, so representations of "pos" + as columns in diagnostic messages must be adjusted. */ +#define LINELEN 1024 +static char linebuf[LINELEN]; +static int line; +static int pos; + +/* Pointer to a diagnostic function. */ +void (*diag) (const char *, ...) __attribute__ ((format (printf, 1, 2))) + = NULL; + +/* Custom diagnostics. */ +static void __attribute__ ((format (printf, 1, 2))) +bif_diag (const char * fmt, ...) +{ + va_list args; + fprintf (stderr, "%s:%d: ", bif_path, line); + va_start (args, fmt); + vfprintf (stderr, fmt, args); + va_end (args); +} + +static void __attribute__ ((format (printf, 1, 2))) +ovld_diag (const char * fmt, ...) +{ + va_list args; + fprintf (stderr, "%s:%d: ", ovld_path, line); + va_start (args, fmt); + vfprintf (stderr, fmt, args); + va_end (args); +} -- 2.17.1
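A sketch of how the diag pointer is meant to be wired up by the parsing code added later in the series; the assignment point shown here is an assumption based on the two reporters above, and the actual hookup lands with parse_bif/parse_ovld:

  /* Each parsing phase points DIAG at its reporter, so shared helpers
     can report errors without knowing which input file is being read.  */
  diag = &bif_diag;   /* while reading rs6000-bif.def */
  ...
  if (linebuf[pos] != '(')
    (*diag) ("missing '(' at column %d.\n", pos + 1);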
[PATCH 01/14] Initial create of rs6000-genbif.c.
Includes header documentation and initial set of include directives. 2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c: New file. --- gcc/config/rs6000/rs6000-genbif.c | 124 ++ 1 file changed, 124 insertions(+) create mode 100644 gcc/config/rs6000/rs6000-genbif.c diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c new file mode 100644 index 000..a53209ed040 --- /dev/null +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -0,0 +1,124 @@ +/* Generate built-in function initialization and recognition for Power. + Copyright (C) 2020 Free Software Foundation, Inc. + Contributed by Bill Schmidt, IBM + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +/* This program generates built-in function initialization and + recognition code for Power targets, based on text files that + describe the built-in functions and vector overloads: + + rs6000-bif.def Table of built-in functions + rs6000-overload.def Table of overload functions + + Both files group similar functions together in "stanzas," as + described below. + + Each stanza in the built-in function file starts with a line + identifying the target mask for which the group of functions is + permitted, with the mask in square brackets. This is the only + information allowed on the stanza header line, other than + whitespace. Following the stanza header are two lines for each + function: the prototype line and the attributes line. The + prototype line has this format, where the square brackets + indicate optional information and angle brackets indicate + required information: + + [<kind>] <return-type> <bif-name> (<argument-list>); + + Here [<kind>] can be one of "const", "pure", or "math"; + <return-type> is a legal type for a built-in function result; + <bif-name> is the name by which the function can be called; + and <argument-list> is a comma-separated list of legal types + for built-in function arguments. The argument list may be + empty, but the parentheses and semicolon are required. + + The attributes line looks like this: + + <bif-id> <bif-pattern> {<attribute-list>} + + Here <bif-id> is a unique internal identifier for the built-in + function that will be used as part of an enumeration of all + built-in functions; <bif-pattern> is the define_expand or + define_insn that will be invoked when the call is expanded; + and <attribute-list> is a comma-separated list of special + conditions that apply to the built-in function. The attribute + list may be empty, but the braces are required.
+ + Attributes are strings, such as these: + + init Process as a vec_init function + set Process as a vec_set function + ext Process as a vec_extract function + nosoft Not valid with -msoft-float + ldv Needs special handling for vec_ld semantics + stv Needs special handling for vec_st semantics + reve Needs special handling for element reversal + abs Needs special handling for absolute value + pred Needs special handling for comparison predicates + htm Needs special handling for transactional memory + + An example stanza might look like this: + +[TARGET_ALTIVEC] + const vector signed char __builtin_altivec_abs_v16qi (vector signed char); +ABS_V16QI absv16qi2 {abs} + const vector signed short __builtin_altivec_abs_v8hi (vector signed short); +ABS_V8HI absv8hi2 {abs} + + Note the use of indentation, which is recommended but not required. + + The overload file has more complex stanza headers. Here the stanza + represents all functions with the same overloaded function name: + + [<overload-id>, <abi-name>, <builtin-name>] + + Here the square brackets are part of the syntax, <overload-id> is a + unique internal identifier for the overload that will be used as part + of an enumeration of all overloaded functions; <abi-name> is the + name that will appear as a #define in altivec.h; and <builtin-name> + is the name that is overloaded in the back end. + + Each function entry again has two lines. The first line is again a + prototype line (this time without [<kind>]): + + <return-type> <internal-name> (<argument-list>); + + The second line contains only one token: the <bif-id> that this + particular instance of the overloaded function maps to. It must + match a token that appears in the bif file. + + An example stanza might look like this: + +[VEC_ABS, vec_abs, __builtin_vec_abs] + vector signed char __builtin_vec_abs (vector signe
[PATCH 00/14] rs6000: Begin replacing built-in support
The current built-in support in the rs6000 back end requires at least a master's degree in spelunking to comprehend. It's full of cruft, redundancy, and unused bits of code, and long overdue for a replacement. This is the first part of my project to do that. My intent is to make adding new built-in functions as simple as adding a few lines to a couple of files, and automatically generating as much of the initialization, overload resolution, and expansion logic as possible. This patch series establishes the format of the input files and creates a new program (rs6000-genbif) to: * Parse the input files into an internal representation; * Generate a file of #defines (rs6000-vecdefines.h) for eventual inclusion into altivec.h; and * Generate an initialization file to create and initialize tables of built-in functions and overloads. Note that none of the code in this patch set affects GCC's operation at all, with the exception of patch #14. Patch 14 causes the program rs6000-genbif to be built and executed, producing the output files, and linking rs6000-bif.o into the executable. However, none of the code in rs6000-bif.o is called, so the only effect is to make the gcc executable larger. I'd like to consider at least patches 1-13 as stage 4 material for the current release. I'd prefer to also include patch 14 for convenience, but I understand if that's not deemed acceptable. I've attempted to break this up into logical pieces for easy consumption, but some of the patches may still be a bit large. Please let me know if you'd like me to break any of them up. Thanks in advance for the review! Bill Schmidt (14): Initial create of rs6000-genbif.c. Add stubs for input files. These will grow much larger. Add file support and functions for diagnostic support. Support functions to parse whitespace, lines, identifiers, integers. Add support functions for matching types. Red-black tree implementation for balanced tree search. Add main function with stub functions for parsing and output. Add support for parsing rs6000-bif.def. Add parsing support for rs6000-overload.def. Build function type identifiers and store them. Write #defines to rs6000-vecdefines.h. Write code to rs6000-bif.h. Write code to rs6000-bif.c. Incorporate new code into the build machinery. gcc/config.gcc|3 +- gcc/config/rs6000/rbtree.c| 233 +++ gcc/config/rs6000/rbtree.h| 51 + gcc/config/rs6000/rs6000-bif.def | 187 ++ gcc/config/rs6000/rs6000-call.c | 35 + gcc/config/rs6000/rs6000-genbif.c | 2295 + gcc/config/rs6000/rs6000-overload.def |5 + gcc/config/rs6000/t-rs6000| 22 + 8 files changed, 2830 insertions(+), 1 deletion(-) create mode 100644 gcc/config/rs6000/rbtree.c create mode 100644 gcc/config/rs6000/rbtree.h create mode 100644 gcc/config/rs6000/rs6000-bif.def create mode 100644 gcc/config/rs6000/rs6000-genbif.c create mode 100644 gcc/config/rs6000/rs6000-overload.def -- 2.17.1
[PATCH 04/14] Support functions to parse whitespace, lines, identifiers, integers.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (MININT): New defined constant. (exit_codes): New enum. (consume_whitespace): New function. (advance_line): New function. (safe_inc_pos): New function. (match_identifier): New function. (match_integer): New function. --- gcc/config/rs6000/rs6000-genbif.c | 99 +++ 1 file changed, 99 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 3fb13cb11d6..197059cc2d2 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -123,6 +123,10 @@ along with GCC; see the file COPYING3. If not see #include #include +/* Used as a sentinel for range constraints on integer fields. No field can + be 32 bits wide, so this is a safe sentinel value. */ +#define MININT INT32_MIN + /* Input and output file descriptors and pathnames. */ static FILE *bif_file; static FILE *ovld_file; @@ -145,6 +149,11 @@ static char linebuf[LINELEN]; static int line; static int pos; +/* Exit codes for the shell. */ +enum exit_codes { + EC_INTERR +}; + /* Pointer to a diagnostic function. */ void (*diag) (const char *, ...) __attribute__ ((format (printf, 1, 2))) = NULL; @@ -169,3 +178,93 @@ ovld_diag (const char * fmt, ...) vfprintf (stderr, fmt, args); va_end (args); } + +/* Pass over unprintable characters and whitespace (other than a newline, + which terminates the scan). */ +static void +consume_whitespace () +{ + while (pos < LINELEN && isspace(linebuf[pos]) && linebuf[pos] != '\n') +pos++; + return; +} + +/* Get the next nonblank line, returning 0 on EOF, 1 otherwise. */ +static int +advance_line (FILE *file) +{ + while (1) +{ + /* Read ahead one line and check for EOF. */ + if (!fgets (linebuf, sizeof(linebuf), file)) + return 0; + line++; + pos = 0; + consume_whitespace (); + if (linebuf[pos] != '\n') + return 1; +} +} + +static inline void +safe_inc_pos () +{ + if (pos++ >= LINELEN) +{ + (*diag) ("line length overrun.\n"); + exit (EC_INTERR); +} +} + +/* Match an identifier, returning NULL on failure, else a pointer to a + buffer containing the identifier. */ +static char * +match_identifier () +{ + int lastpos = pos - 1; + while (isalnum (linebuf[lastpos + 1]) || linebuf[lastpos + 1] == '_') +if (++lastpos >= LINELEN - 1) + { + (*diag) ("line length overrun.\n"); + exit (EC_INTERR); + } + + if (lastpos < pos) +return 0; + + char *buf = (char *) malloc (lastpos - pos + 2); + memcpy (buf, &linebuf[pos], lastpos - pos + 1); + buf[lastpos - pos + 1] = '\0'; + + pos = lastpos + 1; + return buf; +} + +/* Match an integer and return its value, or MININT on failure. */ +static int +match_integer () +{ + int startpos = pos; + if (linebuf[pos] == '-') +safe_inc_pos (); + + int lastpos = pos - 1; + while (isdigit (linebuf[lastpos + 1])) +if (++lastpos >= LINELEN - 1) + { + (*diag) ("line length overrun in match_integer.\n"); + exit (EC_INTERR); + } + + if (lastpos < pos) +return MININT; + + pos = lastpos + 1; + char *buf = (char *) malloc (lastpos - startpos + 2); + memcpy (buf, &linebuf[startpos], lastpos - startpos + 1); + buf[lastpos - startpos + 1] = '\0'; + + int x; + sscanf (buf, "%d", &x); + return x; +} -- 2.17.1
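A sketch of how these primitives compose into a scanner loop (hypothetical driver, not part of this patch; the error handling follows the helpers' stated contracts):

  while (advance_line (bif_file))
    {
      char *id = match_identifier ();
      if (!id)
        {
          (*diag) ("expected an identifier at column %d.\n", pos + 1);
          exit (EC_INTERR);
        }
      consume_whitespace ();
      int n = match_integer ();
      if (n == MININT)
        {
          (*diag) ("expected an integer at column %d.\n", pos + 1);
          exit (EC_INTERR);
        }
      /* ... act on ID and N ... */
    }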
[PATCH 02/14] Add stubs for input files. These will grow much larger.
This patch adds a subset of the builtin and overload descriptions. I've also started annotating the old-style descriptions in rs6000-c.c where I'm deliberately not planning to support new versions of them. We may have to have some discussion around these at some point, but this helps me track this as I move through the transition. 2020-02-03 Bill Schmidt * config/rs6000/rs6000-bif.def: New file. * config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Annotate some deprecated and bogus entries. * config/rs6000/rs6000-overload.def: New file. --- gcc/config/rs6000/rs6000-bif.def | 187 ++ gcc/config/rs6000/rs6000-call.c | 35 + gcc/config/rs6000/rs6000-overload.def | 5 + 3 files changed, 227 insertions(+) create mode 100644 gcc/config/rs6000/rs6000-bif.def create mode 100644 gcc/config/rs6000/rs6000-overload.def diff --git a/gcc/config/rs6000/rs6000-bif.def b/gcc/config/rs6000/rs6000-bif.def new file mode 100644 index 000..85196400993 --- /dev/null +++ b/gcc/config/rs6000/rs6000-bif.def @@ -0,0 +1,187 @@ +[TARGET_ALTIVEC] + math vf __builtin_altivec_vmaddfp (vf, vf, vf); +VMADDFP fmav4sf4 {} + vss __builtin_altivec_vmhaddshs (vss, vss, vss); +VMHADDSHS altivec_vmhaddshs {} + vss __builtin_altivec_vmhraddshs (vss, vss, vss); +VMHRADDSHS altivec_vmhraddshs {} + const vss __builtin_altivec_vmladduhm (vss, vss, vss); +VMLADDUHM fmav8hi4 {} + const vui __builtin_altivec_vmsumubm (vuc, vuc, vui); +VMSUMUBM altivec_vmsumubm {} + const vsi __builtin_altivec_vmsummbm (vsc, vuc, vsi); +VMSUMMBM altivec_vmsummbm {} + const vui __builtin_altivec_vmsumuhm (vus, vus, vui); +VMSUMUHM altivec_vmsumuhm {} + const vsi __builtin_altivec_vmsumshm (vss, vss, vsi); +VMSUMSHM altivec_vmsumshm {} + vui __builtin_altivec_vmsumuhs (vus, vus, vui); +VMSUMUHS altivec_vmsumuhs {} + vsi __builtin_altivec_vmsumshs (vss, vss, vsi); +VMSUMSHS altivec_vmsumshs {} + math vf __builtin_altivec_vnmsubfp (vf, vf, vf); +VNMSUBFP nfmsv4sf4 {} + const vsq __builtin_altivec_vperm_1ti (vsq, vsq, vuc); +VPERM_1TI altivec_vperm_v1ti {} + const vf __builtin_altivec_vperm_4sf (vf, vf, vuc); +VPERM_4SF altivec_vperm_v4sf {} + const vsi __builtin_altivec_vperm_4si (vsi, vsi, vuc); +VPERM_4SI altivec_vperm_v4si {} + const vss __builtin_altivec_vperm_8hi (vss, vss, vuc); +VPERM_8HI altivec_vperm_v8hi {} + const vsc __builtin_altivec_vperm_16qi (vsc, vsc, vuc); +VPERM_16QI altivec_vperm_v16qi {} + const vuq __builtin_altivec_vperm_1ti_uns (vuq, vuq, vuc); +VPERM_1TI_UNS altivec_vperm_v1ti_uns {} + const vui __builtin_altivec_vperm_4si_uns (vui, vui, vuc); +VPERM_4SI_UNS altivec_vperm_v4si_uns {} + const vus __builtin_altivec_vperm_8hi_uns (vus, vus, vuc); +VPERM_8HI_UNS altivec_vperm_v8hi_uns {} + const vuc __builtin_altivec_vperm_16qi_uns (vuc, vuc, vuc); +VPERM_16QI_UNS altivec_vperm_v16qi_uns {} + const vf __builtin_altivec_vsel_4sf (vf, vf, vbi); +VSEL_4SF_B vector_select_v4sf {} + const vf __builtin_altivec_vsel_4sf (vf, vf, vui); +VSEL_4SF_U vector_select_v4sf {} + const vsi __builtin_altivec_vsel_4si (vsi, vsi, vbi); +VSEL_4SI_B vector_select_v4si {} + const vsi __builtin_altivec_vsel_4si (vsi, vsi, vui); +VSEL_4SI_U vector_select_v4si {} + const vui __builtin_altivec_vsel_4si (vui, vui, vbi); +VSEL_4SI_UB vector_select_v4si {} + const vui __builtin_altivec_vsel_4si (vui, vui, vui); +VSEL_4SI_UU vector_select_v4si {} + const vbi __builtin_altivec_vsel_4si (vbi, vbi, vbi); +VSEL_4SI_BB vector_select_v4si {} + const vbi __builtin_altivec_vsel_4si (vbi, vbi, vui); +VSEL_4SI_BU vector_select_v4si {} + const vss __builtin_altivec_vsel_8hi 
(vss, vss, vbs); +VSEL_8HI_B vector_select_v8hi {} + const vss __builtin_altivec_vsel_8hi (vss, vss, vus); +VSEL_8HI_U vector_select_v8hi {} + const vus __builtin_altivec_vsel_8hi (vus, vus, vbs); +VSEL_8HI_UB vector_select_v8hi {} + const vus __builtin_altivec_vsel_8hi (vus, vus, vus); +VSEL_8HI_UU vector_select_v8hi {} + const vbs __builtin_altivec_vsel_8hi (vbs, vbs, vbs); +VSEL_8HI_BB vector_select_v8hi {} + const vbs __builtin_altivec_vsel_8hi (vbs, vbs, vus); +VSEL_8HI_BU vector_select_v8hi {} + const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vbc); +VSEL_16QI_B vector_select_v16qi {} + const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc); +VSEL_16QI_U vector_select_v16qi {} + const vuc __builtin_altivec_vsel_16qi (vuc, vuc, vbc); +VSEL_16QI_UB vector_select_v16qi {} + const vuc __builtin_altivec_vsel_16qi (vuc, vuc, vuc); +VSEL_16QI_UU vector_select_v16qi {} + const vbc __builtin_altivec_vsel_16qi (vbc, vbc, vbc); +VSEL_16QI_BB vector_select_v16qi {} + const vbc __builtin_altivec_vsel_16qi (vbc, vbc, vuc); +VSEL_16QI_BU vector_select_v16qi {} + const vsq __builtin_altivec_vsel_1ti (vsq, vsq, vuq); +
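To decode the compact notation, take the first entry of the file:

  math vf __builtin_altivec_vmaddfp (vf, vf, vf);
      VMADDFP fmav4sf4 {}

This reads as a "math"-class built-in returning vector float with three vector float arguments, internal identifier VMADDFP, expanded through the fmav4sf4 insn pattern, with an empty attribute list. The type abbreviations are inferred from the prototypes rather than spelled out in the patch, but follow a regular scheme: vf = vector float, vsc/vuc/vbc = vector signed/unsigned/bool char, vss/vus/vbs = vector signed/unsigned/bool short, vsi/vui/vbi = vector signed/unsigned/bool int, vsq/vuq = vector signed/unsigned __int128.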
[PATCH 07/14] Add main function with stub functions for parsing and output.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (rbtree.h): New include. (num_bif_stanzas): New filescope variable. (num_bifs): Likewise. (num_ovld_stanzas): Likewise. (num_ovlds): Likewise. (exit_codes): Add more enum values. (bif_rbt): New filescope variable. (ovld_rbt): Likewise. (fntype_rbt): Likewise. (parse_bif): New function stub. (parse_ovld): Likewise. (write_header_file): Likewise. (write_init_file): Likewise. (write_defines_file): Likewise. (main): New function. --- gcc/config/rs6000/rs6000-genbif.c | 185 ++ 1 file changed, 185 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 7c1082fbe8f..38401224dce 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3. If not see #include #include #include +#include "rbtree.h" /* Used as a sentinel for range constraints on integer fields. No field can be 32 bits wide, so this is a safe sentinel value. */ @@ -155,6 +156,8 @@ enum void_status { VOID_OK }; +static int num_bif_stanzas; + /* Legal base types for an argument or return type. */ enum basetype { BT_CHAR, @@ -196,11 +199,33 @@ struct typeinfo { int val2; }; +static int num_bifs; +static int num_ovld_stanzas; +static int num_ovlds; + /* Exit codes for the shell. */ enum exit_codes { + EC_OK, + EC_BADARGS, + EC_NOBIF, + EC_NOOVLD, + EC_NOHDR, + EC_NOINIT, + EC_NODEFINES, + EC_PARSEBIF, + EC_PARSEOVLD, + EC_WRITEHDR, + EC_WRITEINIT, + EC_WRITEDEFINES, EC_INTERR }; +/* The red-black trees for built-in function identifiers, built-in + overload identifiers, and function type descriptors. */ +static rbt_strings bif_rbt; +static rbt_strings ovld_rbt; +static rbt_strings fntype_rbt; + /* Pointer to a diagnostic function. */ void (*diag) (const char *, ...) __attribute__ ((format (printf, 1, 2))) = NULL; @@ -721,3 +746,163 @@ match_type (typeinfo *typedata, int voidok) consume_whitespace (); return match_basetype (typedata); } + +/* Parse the built-in file. Return 1 for success, 5 for a parsing failure. */ +static int +parse_bif () +{ + return 1; +} + +/* Parse the overload file. Return 1 for success, 6 for a parsing error. */ +static int +parse_ovld () +{ + return 1; +} + +/* Write everything to the header file (rs6000-bif.h). */ +static int +write_header_file () +{ + return 1; +} + +/* Write everything to the initialization file (rs6000-bif.c). */ +static int +write_init_file () +{ + return 1; +} + +/* Write everything to the include file (rs6000-vecdefines.h). */ +static int +write_defines_file () +{ + return 1; +} + +/* Main program to convert flat files into built-in initialization code. 
*/ +int +main (int argc, const char **argv) +{ + if (argc != 6) +{ + fprintf (stderr, + "Five arguments required: two input files and three output" + " files.\n"); + exit (EC_BADARGS); +} + + pgm_path = argv[0]; + bif_path = argv[1]; + ovld_path = argv[2]; + header_path = argv[3]; + init_path = argv[4]; + defines_path = argv[5]; + + bif_file = fopen (bif_path, "r"); + if (!bif_file) +{ + fprintf (stderr, "Cannot find input built-in file '%s'.\n", bif_path); + exit (EC_NOBIF); +} + ovld_file = fopen (ovld_path, "r"); + if (!ovld_file) +{ + fprintf (stderr, "Cannot find input overload file '%s'.\n", ovld_path); + exit (EC_NOOVLD); +} + header_file = fopen (header_path, "w"); + if (!header_file) +{ + fprintf (stderr, "Cannot open header file '%s' for output.\n", + header_path); + exit (EC_NOHDR); +} + init_file = fopen (init_path, "w"); + if (!init_file) +{ + fprintf (stderr, "Cannot open init file '%s' for output.\n", init_path); + exit (EC_NOINIT); +} + defines_file = fopen (defines_path, "w"); + if (!defines_file) +{ + fprintf (stderr, "Cannot open defines file '%s' for output.\n", + defines_path); + exit (EC_NODEFINES); +} + + /* Initialize the balanced trees containing built-in function ids, + overload function ids, and function type declaration ids. */ + bif_rbt.rbt_nil = (rbt_string_node *) malloc (sizeof (rbt_string_node)); + bif_rbt.rbt_nil->color = RBT_BLACK; + bif_rbt.rbt_root = bif_rbt.rbt_nil; + + ovld_rbt.rbt_nil = (rbt_string_node *) malloc (sizeof (rbt_string_node)); + ovld_rbt.rbt_nil->color = RBT_BLACK; + ovld_rbt.rbt_root = ovld_rbt.rbt_nil; + + fntype_rbt.rbt_nil = (rbt_string_node *) malloc (sizeof (rbt_string_node)); + fntype_rbt.rbt_nil->
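Putting the argument handling together, the expected invocation looks like this; the build-machinery hookup is patch 14, and the output file names follow the cover letter, so treat the exact spellings as an assumption:

  ./rs6000-genbif rs6000-bif.def rs6000-overload.def \
      rs6000-bif.h rs6000-bif.c rs6000-vecdefines.h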
[PATCH 08/14] Add support for parsing rs6000-bif.def.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (MAXBIFSTANZAS): New defined constant. (bif_stanzas): New filescope variable. (curr_bif_stanza): Likewise. (fnkinds): New enum. (typelist): New struct. (attrinfo): New struct. (prototype): New struct. (MAXBIFS): New defined constant. (bifdata): New struct. (bifs): New filescope variable. (curr_bif): Likewise. (parse_bif_args): New function. (parse_bif_attrs): New function. (parse_prototype): New function. (parse_bif_entry): New function. (parse_bif_stanza): New function. (parse_bif): Implement. --- gcc/config/rs6000/rs6000-genbif.c | 473 +- 1 file changed, 472 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 38401224dce..e7ce777afbb 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -156,7 +156,23 @@ enum void_status { VOID_OK }; +/* Stanzas are groupings of built-in functions and overloads by some + common feature/attribute. These definitions are for built-in function + stanzas. */ +#define MAXBIFSTANZAS 256 +static char *bif_stanzas[MAXBIFSTANZAS]; static int num_bif_stanzas; +static int curr_bif_stanza; + +/* Function modifiers provide special handling for const, pure, and math + functions. These are mutually exclusive, and therefore kept separate + from other bif attributes. */ +enum fnkinds { + FNK_NONE, + FNK_CONST, + FNK_PURE, + FNK_MATH +}; /* Legal base types for an argument or return type. */ enum basetype { @@ -199,7 +215,54 @@ struct typeinfo { int val2; }; +/* A list of argument types. */ +struct typelist { + typeinfo info; + typelist *next; +}; + +/* Attributes of a builtin function. */ +struct attrinfo { + char isinit; + char isset; + char isext; + char isnosoft; + char isldv; + char isstv; + char isreve; + char isabs; + char ispred; + char ishtm; +}; + +/* Fields associated with a function prototype (bif or overload). */ +struct prototype { + typeinfo rettype; + char *bifname; + int nargs; + typelist *args; + int restr_opnd; + restriction restr; + int restr_val1; + int restr_val2; +}; + +/* Data associated with a builtin function, and a table of such data. */ +#define MAXBIFS 16384 +struct bifdata { + int stanza; + fnkinds kind; + prototype proto; + char *idname; + char *patname; + attrinfo attrs; + char *fndecl; +}; + +static bifdata bifs[MAXBIFS]; static int num_bifs; +static int curr_bif; + static int num_ovld_stanzas; static int num_ovlds; @@ -747,11 +810,419 @@ match_type (typeinfo *typedata, int voidok) return match_basetype (typedata); } +/* Parse the argument list, returning 1 if success or 0 if any + malformation is found. */ +static int +parse_bif_args (prototype *protoptr) +{ + typelist **argptr = &protoptr->args; + int *nargs = &protoptr->nargs; + int *restr_opnd = &protoptr->restr_opnd; + restriction *restr = &protoptr->restr; + int *val1 = &protoptr->restr_val1; + int *val2 = &protoptr->restr_val2; + + int success; + *nargs = 0; + + /* Start the argument list. 
*/ + consume_whitespace (); + if (linebuf[pos] != '(') +{ + (*diag) ("missing '(' at column %d.\n", pos + 1); + return 0; +} + safe_inc_pos (); + + do { +consume_whitespace (); +int oldpos = pos; +typelist *argentry = (typelist *) malloc (sizeof (typelist)); +memset (argentry, 0, sizeof (*argentry)); +typeinfo *argtype = &argentry->info; +success = match_type (argtype, VOID_NOTOK); +if (success) + { + if (argtype->restr) + { + if (*restr_opnd) + { + (*diag) ("More than one restricted operand\n"); + return 0; + } + *restr_opnd = *nargs; + *restr = argtype->restr; + *val1 = argtype->val1; + *val2 = argtype->val2; + } + (*nargs)++; + *argptr = argentry; + argptr = &argentry->next; + consume_whitespace (); + if (linebuf[pos] == ',') + safe_inc_pos (); + else if (linebuf[pos] != ')') + { + (*diag) ("arg not followed by ',' or ')' at column %d.\n", +pos + 1); + return 0; + } + +#ifdef DEBUG + (*diag) ("argument type: isvoid = %d, isconst = %d, isvector = %d, \ +issigned = %d, isunsigned = %d, isbool = %d, ispixel = %d, ispointer = %d, \ +base = %d, restr = %d, val1 = %d, val2 = %d, pos = %d.\n", +argtype->isvoid, argtype->isconst, argtype->isvector, +argtype->issigned, argtype->isunsigned, arg
[PATCH 05/14] Add support functions for matching types.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (void_status): New enum. (basetype): Likewise. (restriction): Likewise. (typeinfo): New struct. (match_basetype): New function. (match_const_restriction): New function. (match_type): New function. --- gcc/config/rs6000/rs6000-genbif.c | 453 ++ 1 file changed, 453 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 197059cc2d2..7c1082fbe8f 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -149,6 +149,53 @@ static char linebuf[LINELEN]; static int line; static int pos; +/* Used to determine whether a type can be void (only return types). */ +enum void_status { + VOID_NOTOK, + VOID_OK +}; + +/* Legal base types for an argument or return type. */ +enum basetype { + BT_CHAR, + BT_SHORT, + BT_INT, + BT_LONGLONG, + BT_FLOAT, + BT_DOUBLE, + BT_INT128, + BT_FLOAT128 +}; + +/* Ways in which a const int value can be restricted. RES_BITS indicates + that the integer is restricted to val1 bits, interpreted as signed or + unsigned depending on whether the type is signed or unsigned. RES_RANGE + indicates that the integer is restricted to values between val1 and val2, + inclusive. RES_VALUES indicates that the integer must have one of the + values val1 or val2. */ +enum restriction { + RES_NONE, + RES_BITS, + RES_RANGE, + RES_VALUES +}; + +/* Type modifiers for an argument or return type. */ +struct typeinfo { + char isvoid; + char isconst; + char isvector; + char issigned; + char isunsigned; + char isbool; + char ispixel; + char ispointer; + basetype base; + restriction restr; + int val1; + int val2; +}; + /* Exit codes for the shell. */ enum exit_codes { EC_INTERR @@ -268,3 +315,409 @@ match_integer () sscanf (buf, "%d", &x); return x; } + +/* Match one of the allowable base types. Consumes one token unless the + token is "long", which must be paired with a second "long". Return 1 + for success, 0 for failure. */ +static int +match_basetype (typeinfo *typedata) +{ + consume_whitespace (); + int oldpos = pos; + char *token = match_identifier (); + if (!token) +{ + (*diag) ("missing base type in return type at column %d\n", pos + 1); + return 0; +} + + if (!strcmp (token, "char")) +typedata->base = BT_CHAR; + else if (!strcmp (token, "short")) +typedata->base = BT_SHORT; + else if (!strcmp (token, "int")) +typedata->base = BT_INT; + else if (!strcmp (token, "long")) +{ + consume_whitespace (); + char *mustbelong = match_identifier (); + if (!mustbelong || strcmp (mustbelong, "long")) + { + (*diag) ("incomplete 'long long' at column %d\n", oldpos + 1); + return 0; + } + typedata->base = BT_LONGLONG; +} + else if (!strcmp (token, "float")) +typedata->base = BT_FLOAT; + else if (!strcmp (token, "double")) +typedata->base = BT_DOUBLE; + else if (!strcmp (token, "__int128")) +typedata->base = BT_INT128; + else if (!strcmp (token, "_Float128")) +typedata->base = BT_FLOAT128; + else +{ + (*diag) ("unrecognized base type at column %d\n", oldpos + 1); + return 0; +} + + return 1; +} + +/* A const int argument may be restricted to certain values. This is + indicated by one of the following occurring after the "int" token: + + <x> restricts the constant to x bits, interpreted as signed or + unsigned according to the argument type + <x,y> restricts the constant to the inclusive range [x,y] + {x,y} restricts the constant to one of two values, x or y. + + Here x and y are integer tokens. Return 1 for success, else 0.
*/ +static int +match_const_restriction (typeinfo *typedata) +{ + int oldpos = pos; + if (linebuf[pos] == '<') +{ + safe_inc_pos (); + oldpos = pos; + int x = match_integer (); + if (x == MININT) + { + (*diag) ("malformed integer at column %d.\n", oldpos + 1); + return 0; + } + consume_whitespace (); + if (linebuf[pos] == '>') + { + typedata->restr = RES_BITS; + typedata->val1 = x; + safe_inc_pos (); + return 1; + } + else if (linebuf[pos] != ',') + { + (*diag) ("malformed restriction at column %d.\n", pos + 1); + return 0; + } + safe_inc_pos (); + oldpos = pos; + int y = match_integer (); + if (y == MININT) + { + (*diag) ("malformed integer at column %d.\n", oldpos + 1); + return 0; + } + typeda
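For illustration, the three restriction forms would be attached to a const int argument in the input file like this (the built-in names below are made up; only the <x>, <x,y>, and {x,y} shapes are prescribed by the parser above):

  const vss __builtin_foo (vss, const int<5>);
  const vsi __builtin_bar (vsi, const int<0,3>);
  const vsi __builtin_baz (vsi, const int{0,1});

Each such prototype line would be followed by its ID/pattern line as described in patch 01.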
[PATCH 06/14] Red-black tree implementation for balanced tree search.
2020-02-03 Bill Schmidt * config/rs6000/rbtree.c: New file. * config/rs6000/rbtree.h: New file. --- gcc/config/rs6000/rbtree.c | 233 + gcc/config/rs6000/rbtree.h | 51 2 files changed, 284 insertions(+) create mode 100644 gcc/config/rs6000/rbtree.c create mode 100644 gcc/config/rs6000/rbtree.h diff --git a/gcc/config/rs6000/rbtree.c b/gcc/config/rs6000/rbtree.c new file mode 100644 index 000..f6a8cdefaae --- /dev/null +++ b/gcc/config/rs6000/rbtree.c @@ -0,0 +1,233 @@ +/* Partial red-black tree implementation for rs6000-genbif.c. + Copyright (C) 2020 Free Software Foundation, Inc. + Contributed by Bill Schmidt, IBM + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include +#include +#include +#include +#include "rbtree.h" + +/* Create a new node to be inserted into the red-black tree. An inserted + node starts out red. */ +static struct rbt_string_node * +rbt_create_node (struct rbt_strings *t, char *str) +{ + struct rbt_string_node *nodeptr += (struct rbt_string_node *) malloc (sizeof (struct rbt_string_node)); + nodeptr->str = str; + nodeptr->left = t->rbt_nil; + nodeptr->right = t->rbt_nil; + nodeptr->par = NULL; + nodeptr->color = RBT_RED; + return nodeptr; +} + +/* Perform a left-rotate operation on NODE in the red-black tree. */ +static void +rbt_left_rotate (struct rbt_strings *t, struct rbt_string_node *node) +{ + struct rbt_string_node *right = node->right; + assert (right); + + /* Turn RIGHT's left subtree into NODE's right subtree. */ + node->right = right->left; + if (right->left != t->rbt_nil) +right->left->par = node; + + /* Link NODE's parent to RIGHT. */ + right->par = node->par; + + if (node->par == t->rbt_nil) +t->rbt_root = right; + else if (node == node->par->left) +node->par->left = right; + else +node->par->right = right; + + /* Put NODE on RIGHT's left. */ + right->left = node; + node->par = right; +} + +/* Perform a right-rotate operation on NODE in the red-black tree. */ +static void +rbt_right_rotate (struct rbt_strings *t, struct rbt_string_node *node) +{ + struct rbt_string_node *left = node->left; + assert (left); + + /* Turn LEFT's right subtree into NODE's left subtree. */ + node->left = left->right; + if (left->right != t->rbt_nil) +left->right->par = node; + + /* Link NODE's parent to LEFT. */ + left->par = node->par; + + if (node->par == t->rbt_nil) +t->rbt_root = left; + else if (node == node->par->right) +node->par->right = left; + else +node->par->left = left; + + /* Put NODE on LEFT's right. */ + left->right = node; + node->par = left; +} + +/* Insert STR into the tree, returning 1 for success and 0 if STR already + appears in the tree. 
*/ +int +rbt_insert (struct rbt_strings *t, char *str) +{ + struct rbt_string_node *curr = t->rbt_root; + struct rbt_string_node *trail = t->rbt_nil; + + while (curr != t->rbt_nil) +{ + trail = curr; + int cmp = strcmp (str, curr->str); + if (cmp < 0) + curr = curr->left; + else if (cmp > 0) + curr = curr->right; + else + return 0; +} + + struct rbt_string_node *fresh = rbt_create_node (t, str); + fresh->par = trail; + + if (trail == t->rbt_nil) +t->rbt_root = fresh; + else if (strcmp (fresh->str, trail->str) < 0) +trail->left = fresh; + else +trail->right = fresh; + + fresh->left = t->rbt_nil; + fresh->right = t->rbt_nil; + + /* FRESH has now been inserted as a red leaf. If we have invalidated + one of the following preconditions, we must fix things up: + (a) If a node is red, both of its children are black. + (b) The root must be black. + Note that only (a) or (b) applies at any given time during the + process. This algorithm works up the tree from NEW looking + for a red child with a red parent, and cleaning that up. If the + root ends up red, it gets turned black at the end. */ + curr = fresh; + while (curr->par->color == RBT_RED) +if (curr->par == curr->par->par->left) +
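To make the interface concrete, here is a minimal usage sketch (the includes and the error message are mine, and it assumes the tree's rbt_nil sentinel has already been set up, which this excerpt does not show):

  #include <stdio.h>
  #include "rbtree.h"

  static struct rbt_strings bif_rbt;

  static void
  record_name (char *name)
  {
    /* rbt_insert returns 0 when NAME is already in the tree.  */
    if (!rbt_insert (&bif_rbt, name))
      fprintf (stderr, "duplicate ID '%s'\n", name);
  }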
[PATCH 09/14] Add parsing support for rs6000-overload.def.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (ovld_stanza): New struct. (MAXOVLDSTANZAS): New defined constant. (ovld_stanzas): New filescope variable. (curr_ovld_stanza): Likewise. (MAXOVLDS): New defined constant. (ovlddata): New struct. (ovlds): New filescope variable. (curr_ovld): Likewise. (parse_ovld_entry): New function. (parse_ovld_stanza): New function. (parse_ovld): Implement. --- gcc/config/rs6000/rs6000-genbif.c | 207 +- 1 file changed, 206 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index e7ce777afbb..22b5b1df3b9 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -263,8 +263,30 @@ static bifdata bifs[MAXBIFS]; static int num_bifs; static int curr_bif; +/* Stanzas are groupings of built-in functions and overloads by some + common feature/attribute. These definitions are for overload stanzas. */ +struct ovld_stanza { + char *stanza_id; + char *extern_name; + char *intern_name; +}; + +#define MAXOVLDSTANZAS 256 +static ovld_stanza ovld_stanzas[MAXOVLDSTANZAS]; static int num_ovld_stanzas; +static int curr_ovld_stanza; + +#define MAXOVLDS 16384 +struct ovlddata { + int stanza; + prototype proto; + char *idname; + char *fndecl; +}; + +static ovlddata ovlds[MAXOVLDS]; static int num_ovlds; +static int curr_ovld; /* Exit codes for the shell. */ enum exit_codes { @@ -1225,11 +1247,194 @@ parse_bif () return result; } +/* Parse one two-line entry in the overload file. Return 0 for EOF, 1 for + success, 2 for end-of-stanza, and 6 for a parsing failure. */ +static int +parse_ovld_entry () +{ + /* Check for end of stanza. */ + pos = 0; + consume_whitespace (); + if (linebuf[pos] == '[') +return 2; + + /* Allocate an entry in the overload table. */ + if (num_ovlds >= MAXOVLDS - 1) +{ + (*diag) ("too many overloads.\n"); + return 6; +} + + curr_ovld = num_ovlds++; + ovlds[curr_ovld].stanza = curr_ovld_stanza; + + if (!parse_prototype (&ovlds[curr_ovld].proto)) +return 6; + + /* Now process line 2, which just contains the builtin id. */ + if (!advance_line (ovld_file)) +{ + (*diag) ("unexpected EOF.\n"); + return 0; +} + + pos = 0; + consume_whitespace (); + int oldpos = pos; + char *id = match_identifier (); + ovlds[curr_ovld].idname = id; + if (!id) +{ + (*diag) ("missing overload id at column %d.\n", pos + 1); + return 6; +} + +#ifdef DEBUG + (*diag) ("ID name is '%s'.\n", id); +#endif + + /* The builtin id has to match one from the bif file. */ + if (!rbt_find (&bif_rbt, id)) +{ + (*diag) ("builtin ID '%s' not found in bif file.\n", id); + return 6; +} + + /* Save the ID in a lookup structure. */ + if (!rbt_insert (&ovld_rbt, id)) +{ + (*diag) ("duplicate function ID '%s' at column %d.\n", id, oldpos + 1); + return 6; +} + + consume_whitespace (); + if (linebuf[pos] != '\n') +{ + (*diag) ("garbage at end of line at column %d.\n", pos + 1); + return 6; +} + return 1; +} + +/* Parse one stanza of the input overload file. linebuf already contains the + first line to parse. Return 1 for success, 0 for EOF, 6 for failure. */ +static int +parse_ovld_stanza () +{ + /* Parse the stanza header. 
*/ + pos = 0; + consume_whitespace (); + + if (linebuf[pos] != '[') +{ + (*diag) ("ill-formed stanza header at column %d.\n", pos + 1); + return 6; +} + safe_inc_pos (); + + char *stanza_name = match_identifier (); + if (!stanza_name) +{ + (*diag) ("no identifier found in stanza header.\n"); + return 6; +} + + /* Add the identifier to a table and set the number to be recorded + with subsequent overload entries. */ + if (num_ovld_stanzas >= MAXOVLDSTANZAS) +{ + (*diag) ("too many stanza headers.\n"); + return 6; +} + + curr_ovld_stanza = num_ovld_stanzas++; + ovld_stanza *stanza = &ovld_stanzas[curr_ovld_stanza]; + stanza->stanza_id = stanza_name; + + consume_whitespace (); + if (linebuf[pos] != ',') +{ + (*diag) ("missing comma at column %d.\n", pos + 1); + return 6; +} + safe_inc_pos (); + + consume_whitespace (); + stanza->extern_name = match_identifier (); + if (!stanza->extern_name) +{ + (*diag) ("missing external name at column %d.\n", pos + 1); + return 6; +} + + consume_whitespace (); + if (linebuf[pos] != ',') +{ + (*diag) ("missing comma at column %d.\n", pos + 1); + return 6; +} + safe_inc_pos (); + + consume_whitespace
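Putting the pieces together, a stanza in rs6000-overload.def would look something like this (the names are illustrative; each two-line entry is a prototype followed by a built-in ID that must exist in the bif file):

  [VEC_ABS, vec_abs, __builtin_vec_abs]
    vsc __builtin_vec_abs (vsc);
      ABS_V16QI
    vss __builtin_vec_abs (vss);
      ABS_V8HI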
[PATCH 12/14] Write code to rs6000-bif.h.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (write_autogenerated_header): New function. (write_bif_enum): New callback function. (write_ovld_enum): New callback function. (write_decls): New function. (write_extern_fntype): New callback function. (write_header_file): Implement. --- gcc/config/rs6000/rs6000-genbif.c | 160 ++ 1 file changed, 160 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 0bcd035060d..c84df1aa30f 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -1617,10 +1617,170 @@ parse_ovld () return result; } +/* Write a comment at the top of FILE about how the code was generated. */ +static void +write_autogenerated_header (FILE *file) +{ + fprintf (file, "/* Automatically generated by the program '%s'\n", + pgm_path); + fprintf (file, " from the files '%s' and '%s'. */\n\n", + bif_path, ovld_path); +} + +/* Callback functions used in creating enumerations. */ +void write_bif_enum (char *str) +{ + fprintf (header_file, " RS6000_BIF_%s,\n", str); +} + +void write_ovld_enum (char *str) +{ + fprintf (header_file, " RS6000_OVLD_%s,\n", str); +} + +/* Write declarations into the header file. */ +static void +write_decls () +{ + fprintf (header_file, "enum rs6000_gen_builtins\n{\n RS6000_BIF_NONE,\n"); + rbt_inorder_callback (&bif_rbt, bif_rbt.rbt_root, write_bif_enum); + fprintf (header_file, " RS6000_BIF_MAX\n};\n\n"); + + fprintf (header_file, "enum restriction {\n"); + fprintf (header_file, " RES_NONE,\n"); + fprintf (header_file, " RES_BITS,\n"); + fprintf (header_file, " RES_RANGE,\n"); + fprintf (header_file, " RES_VALUES\n"); + fprintf (header_file, "};\n\n"); + + fprintf (header_file, "struct bifdata\n"); + fprintf (header_file, "{\n"); + fprintf (header_file, " const char *bifname;\n"); + fprintf (header_file, " tree fntype;\n"); + fprintf (header_file, " insn_code icode;\n"); + fprintf (header_file, " int bifattrs;\n"); + fprintf (header_file, " int restr_opnd;\n"); + fprintf (header_file, " restriction restr;\n"); + fprintf (header_file, " int restr_val1;\n"); + fprintf (header_file, " int restr_val2;\n"); + fprintf (header_file, "};\n\n"); + + fprintf (header_file, "#define bif_const_bit\t(0x001)\n"); + fprintf (header_file, "#define bif_pure_bit\t(0x002)\n"); + fprintf (header_file, "#define bif_round_bit\t(0x004)\n"); + fprintf (header_file, "#define bif_init_bit\t(0x008)\n"); + fprintf (header_file, "#define bif_set_bit\t(0x010)\n"); + fprintf (header_file, "#define bif_ext_bit\t(0x020)\n"); + fprintf (header_file, "#define bif_nosoft_bit\t(0x040)\n"); + fprintf (header_file, "#define bif_ldv_bit\t(0x080)\n"); + fprintf (header_file, "#define bif_stv_bit\t(0x100)\n"); + fprintf (header_file, "#define bif_reve_bit\t(0x200)\n"); + fprintf (header_file, "#define bif_abs_bit\t(0x400)\n"); + fprintf (header_file, "#define bif_pred_bit\t(0x800)\n"); + fprintf (header_file, "#define bif_htm_bit\t(0x0001000)\n"); + fprintf (header_file, "\n"); + fprintf (header_file, + "#define bif_is_const(x)\t\t((x).bifattrs & bif_const_bit)\n"); + fprintf (header_file, + "#define bif_is_pure(x)\t\t((x).bifattrs & bif_pure_bit)\n"); + fprintf (header_file, + "#define bif_has_rounding(x)\t((x).bifattrs & bif_round_bit)\n"); + fprintf (header_file, + "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n"); + fprintf (header_file, + "#define bif_is_extract(x)\t((x).bifattrs & bif_ext_bit)\n"); + fprintf (header_file, + "#define bif_is_nosoft(x)\t((x).bifattrs & bif_nosoft_bit)\n"); + 
fprintf (header_file, + "#define bif_is_ldv(x)\t\t((x).bifattrs & bif_ldv_bit)\n"); + fprintf (header_file, + "#define bif_is_stv(x)\t\t((x).bifattrs & bif_stv_bit)\n"); + fprintf (header_file, + "#define bif_is_reve(x)\t\t((x).bifattrs & bif_reve_bit)\n"); + fprintf (header_file, + "#define bif_is_abs(x)\t\t((x).bifattrs & bif_abs_bit)\n"); + fprintf (header_file, + "#define bif_is_predicate(x)\t((x).bifattrs & bif_pred_bit)\n"); + fprintf (header_file, + "#define bif_is_htm(x)\t\t((x).bifattrs & bif_htm_bit)\n"); + fpr
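Reconstructed from the fprintf calls above, the generated rs6000-bif.h starts out along these lines (the ABS_V16QI enumerator is illustrative):

  enum rs6000_gen_builtins
  {
    RS6000_BIF_NONE,
    RS6000_BIF_ABS_V16QI,
    /* ...one enumerator per built-in function...  */
    RS6000_BIF_MAX
  };

  #define bif_const_bit    (0x001)
  #define bif_is_const(x)  ((x).bifattrs & bif_const_bit)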
[PATCH 11/14] Write #defines to rs6000-vecdefines.h.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (write_defines_file): Implement. --- gcc/config/rs6000/rs6000-genbif.c | 4 1 file changed, 4 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 7bb7d2b24a4..0bcd035060d 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -1635,6 +1635,10 @@ write_init_file () static int write_defines_file () { + for (int i = 0; i < num_ovld_stanzas; i++) +fprintf (defines_file, "#define %s %s\n", +ovld_stanzas[i].extern_name, +ovld_stanzas[i].intern_name); return 1; } -- 2.17.1
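So for a hypothetical stanza header [VEC_ABS, vec_abs, __builtin_vec_abs] in the overload file, rs6000-vecdefines.h gains the line

  #define vec_abs __builtin_vec_abs

mapping the user-visible overloaded name onto the internal overload resolver name.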
[PATCH 10/14] Build function type identifiers and store them.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (complete_vector_type): New function. (complete_base_type): New function. (construct_fntype_id): New function. (parse_bif_entry): Call construct_fntype_id. (parse_ovld_entry): Likewise. --- gcc/config/rs6000/rs6000-genbif.c | 180 ++ 1 file changed, 180 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index 22b5b1df3b9..7bb7d2b24a4 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -999,6 +999,178 @@ htm = %d.\n", return 1; } +/* Convert a vector type into a mode string. */ +static void +complete_vector_type (typeinfo *typeptr, char *buf, int *bufi) +{ + buf[(*bufi)++] = 'v'; + if (typeptr->ispixel) +{ + memcpy (&buf[*bufi], "p8hi", 4); + *bufi += 4; +} + else +{ + if (typeptr->isbool) + buf[(*bufi)++] = 'b'; + switch (typeptr->base) + { + case BT_CHAR: + memcpy (&buf[*bufi], "16qi", 4); + *bufi += 4; + break; + case BT_SHORT: + memcpy (&buf[*bufi], "8hi", 3); + *bufi += 3; + break; + case BT_INT: + memcpy (&buf[*bufi], "4si", 3); + *bufi += 3; + break; + case BT_LONGLONG: + memcpy (&buf[*bufi], "2di", 3); + *bufi += 3; + break; + case BT_FLOAT: + memcpy (&buf[*bufi], "4sf", 3); + *bufi += 3; + break; + case BT_DOUBLE: + memcpy (&buf[*bufi], "2df", 3); + *bufi += 3; + break; + case BT_INT128: + memcpy (&buf[*bufi], "1ti", 3); + *bufi += 3; + break; + case BT_FLOAT128: + memcpy (&buf[*bufi], "1tf", 3); + *bufi += 3; + break; + default: + (*diag) ("unhandled basetype %d.\n", typeptr->base); + exit (EC_INTERR); + } +} +} + +/* Convert a base type into a mode string. */ +static void +complete_base_type (typeinfo *typeptr, char *buf, int *bufi) +{ + switch (typeptr->base) +{ +case BT_CHAR: + memcpy (&buf[*bufi], "qi", 2); + break; +case BT_SHORT: + memcpy (&buf[*bufi], "hi", 2); + break; +case BT_INT: + memcpy (&buf[*bufi], "si", 2); + break; +case BT_LONGLONG: + memcpy (&buf[*bufi], "di", 2); + break; +case BT_FLOAT: + memcpy (&buf[*bufi], "sf", 2); + break; +case BT_DOUBLE: + memcpy (&buf[*bufi], "df", 2); + break; +case BT_INT128: + memcpy (&buf[*bufi], "ti", 2); + break; +case BT_FLOAT128: + memcpy (&buf[*bufi], "tf", 2); + break; +default: + (*diag) ("unhandled basetype %d.\n", typeptr->base); + exit (EC_INTERR); +} + + *bufi += 2; +} + +/* Build a function type descriptor identifier from the return type + and argument types, and store it if it does not already exist. + Return the identifier. */ +static char * +construct_fntype_id (prototype *protoptr) +{ + /* Determine the maximum space for a function type descriptor id. + Each type requires at most 8 characters (6 for the mode*, 1 for + the optional 'u' preceding the mode, and 1 for an underscore + following the mode). We also need 5 characters for the string + "ftype" that separates the return mode from the argument modes. + The last argument doesn't need a trailing underscore, but we + count that as the one trailing "ftype" instead. For the special + case of zero arguments, we need 8 for the return type and 7 + for "ftype_v". Finally, we need one character for the + terminating null. Thus for a function with N arguments, we + need at most 8N+14 characters for N>0, otherwise 16. + + *Worst case is vb16qi for "vector bool char". */ + int len = protoptr->nargs ? 
(protoptr->nargs + 1) * 8 + 6 : 16; + char *buf = (char *) malloc (len); + int bufi = 0; + + if (protoptr->rettype.ispointer) +{ + assert (protoptr->rettype.isvoid); + buf[bufi++] = 'p'; +} + if (protoptr->rettype.isvoid) +buf[bufi++] = 'v'; + else +{ + if (protoptr->rettype.isunsigned) + buf[bufi++] = 'u'; + if (protoptr->rettype.isvector) + complete_vector_type (&protoptr->rettype, buf, &bufi); + else + complete_base_type (&protoptr->rettype, buf, &bufi); +} + + memcpy (&buf[bufi], "_ftype", 6); + bufi += 6; + + if (!protoptr->nargs) +{ +
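Some illustrative mappings produced by this encoding (the function names are placeholders; the mode strings follow complete_vector_type and complete_base_type above):

  vsc  f (vsc)   =>  v16qi_ftype_v16qi
  vuc  f (vsc)   =>  uv16qi_ftype_v16qi
  void f (void)  =>  v_ftype_v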
[PATCH 13/14] Write code to rs6000-bif.c.
2020-02-03 Bill Schmidt * config/rs6000/rs6000-genbif.c (typemap): New struct. (TYPE_MAP_SIZE): New defined constant. (type_map): New filescope variable. (write_fntype): New callback function. (map_token_to_type_node): New function. (write_type_node): New function. (write_fntype_init): New function. (write_init_bif_table): New function. (write_init_ovld_table): New function. (write_init_file): Implement. --- gcc/config/rs6000/rs6000-genbif.c | 367 ++ 1 file changed, 367 insertions(+) diff --git a/gcc/config/rs6000/rs6000-genbif.c b/gcc/config/rs6000/rs6000-genbif.c index c84df1aa30f..ac640e14def 100644 --- a/gcc/config/rs6000/rs6000-genbif.c +++ b/gcc/config/rs6000/rs6000-genbif.c @@ -311,6 +311,52 @@ static rbt_strings bif_rbt; static rbt_strings ovld_rbt; static rbt_strings fntype_rbt; +/* Mapping from type tokens to type node names. */ +struct typemap +{ + const char *key; + const char *value; +}; + +/* This table must be kept in alphabetical order, as we use binary + search for table lookups in map_token_to_type_node. */ +#define TYPE_MAP_SIZE 32 +static typemap type_map[TYPE_MAP_SIZE] = + { +{ "df","double" }, +{ "di","intDI" }, +{ "hi","intHI" }, +{ "pv","ptr" }, +{ "qi","intQI" }, +{ "sf","float" }, +{ "si","intSI" }, +{ "tf","long_double" }, +{ "ti","intTI" }, +{ "udi", "unsigned_intDI" }, +{ "uhi", "unsigned_intHI" }, +{ "uqi", "unsigned_intQI" }, +{ "usi", "unsigned_intSI" }, +{ "uti", "unsigned_intTI" }, +{ "uv16qi","unsigned_V16QI" }, +{ "uv1ti", "unsigned_V1TI" }, +{ "uv2di", "unsigned_V2DI" }, +{ "uv4si", "unsigned_V4SI" }, +{ "uv8hi", "unsigned_V8HI" }, +{ "v", "void" }, +{ "v16qi", "V16QI" }, +{ "v1ti", "V1TI" }, +{ "v2df", "V2DF" }, +{ "v2di", "V2DI" }, +{ "v4sf", "V4SF" }, +{ "v4si", "V4SI" }, +{ "v8hi", "V8HI" }, +{ "vb16qi","bool_V16QI" }, +{ "vb2di", "bool_V2DI" }, +{ "vb4si", "bool_V4SI" }, +{ "vb8hi", "bool_V8HI" }, +{ "vp8hi", "pixel_V8HI" }, + }; + /* Pointer to a diagnostic function. */ void (*diag) (const char *, ...) __attribute__ ((format (printf, 1, 2))) = NULL; @@ -1761,6 +1807,80 @@ write_extern_fntype (char *str) fprintf (header_file, "extern tree %s;\n", str); } +void +write_fntype (char *str) +{ + fprintf (init_file, "tree %s;\n", str); +} + +/* Look up TOK in the type map and return the corresponding string used + to build the type node. */ +static const char * +map_token_to_type_node (char *tok) +{ + int low = 0; + int high = TYPE_MAP_SIZE - 1; + int mid = (low + high) >> 1; + int cmp; + + while ((cmp = strcmp (type_map[mid].key, tok)) && low < high) +{ + if (cmp < 0) + low = (low == mid ? mid + 1 : mid); + else + high = (high == mid ? mid - 1: mid); + mid = (low + high) >> 1; +} + + if (low > high) +{ + (*diag) ("token '%s' doesn't appear in the type map!\n", tok); + exit (EC_INTERR); +} + + return type_map[mid].value; +} + +/* Write the type node corresponding to TOK. */ +static void +write_type_node (char *tok) +{ + const char *str = map_token_to_type_node (tok); + fprintf (init_file, "%s_type_node", str); +} + +/* Write an initializer for a function type identified by STR. */ +void +write_fntype_init (char *str) +{ + char *tok; + + /* Avoid side effects of strtok on the original string by using a copy. 
*/ + char *buf = (char *) malloc (strlen (str) + 1); + strcpy (buf, str); + + fprintf (init_file, " %s\n= build_function_type_list (", buf); + tok = strtok (buf, "_"); + write_type_node (tok); + tok = strtok (0, "_"); + assert (tok); + assert (!strcmp (tok, "ftype")); + + tok = strtok (0, "_"); + if (tok) +fprintf (init_file, ",\n\t\t\t\t"); + + /* Note: A function with no arguments ends with '_ftype_v'. */ + while (tok && strcmp (tok, "v")) +{ + write_type_node (tok); + tok = strtok (0, "_"); + fprintf (
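For the id v16qi_ftype_v16qi, the output of write_fntype and write_fntype_init would be approximately the following (reconstructed from the fprintf calls; the trailing NULL_TREE falls in the truncated portion and is an assumption):

  tree v16qi_ftype_v16qi;
  ...
    v16qi_ftype_v16qi
      = build_function_type_list (V16QI_type_node,
                                  V16QI_type_node, NULL_TREE);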
[PATCH 14/14] Incorporate new code into the build machinery.
2020-02-03 Bill Schmidt * config.gcc (powerpc-*-*-*): Add rs6000-bif.o to extra_objs. * config/rs6000/t-rs6000 (rs6000-genbif.o): New target. (rbtree.o): Likewise. (rs6000-genbif): Likewise. (rs6000-bif.c): Likewise. (rs6000-bif.o): Likewise. --- gcc/config.gcc | 3 ++- gcc/config/rs6000/t-rs6000 | 22 ++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index ae5a845fcce..72448e43017 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -504,7 +504,8 @@ or1k*-*-*) ;; powerpc*-*-*) cpu_type=rs6000 - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o" + extra_objs="${extra_objs} rs6000-call.o rs6000-bif.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 index 170a69591dd..a3a214b2bfb 100644 --- a/gcc/config/rs6000/t-rs6000 +++ b/gcc/config/rs6000/t-rs6000 @@ -47,6 +47,28 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c $(COMPILE) $< $(POSTCOMPILE) +rs6000-genbif.o: $(srcdir)/config/rs6000/rs6000-genbif.c + $(COMPILE) $< + $(POSTCOMPILE) + +rbtree.o: $(srcdir)/config/rs6000/rbtree.c + $(COMPILE) $< + $(POSTCOMPILE) + +rs6000-genbif: rs6000-genbif.o rbtree.o + +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \ + $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS) + +rs6000-bif.c: rs6000-genbif $(srcdir)/config/rs6000/rs6000-bif.def \ + $(srcdir)/config/rs6000/rs6000-overload.def + ./rs6000-genbif $(srcdir)/config/rs6000/rs6000-bif.def \ + $(srcdir)/config/rs6000/rs6000-overload.def rs6000-bif.h \ + rs6000-bif.c rs6000-vecdefines.h + +rs6000-bif.o: rs6000-bif.c + $(COMPILE) $< + $(POSTCOMPILE) + $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \ $(srcdir)/config/rs6000/rs6000-cpus.def $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ -- 2.17.1
Re: [PATCH 01/14] Initial create of rs6000-genbif.c.
On 2/4/20 12:27 PM, Segher Boessenkool wrote: Hi! On Mon, Feb 03, 2020 at 08:26:02PM -0600, Bill Schmidt wrote: Includes header documentation and initial set of include directives. Please use full sentences in commit messages. OK. +/* This program generates built-in function initialization and + recognition code for Power targets, based on text files that + describe the built-in functions and vector overloads: + + rs6000-bif.def Table of built-in functions + rs6000-overload.def Table of overload functions I really don't think using the new acronym "bif" helps; built-in functions already are often called "builtins" (or "intrinsics", which is problematic itself). Until we manage to replace the old methods, we already have rs6000-builtin.def, so I am a bit constrained in my choices. Given that restriction, what name would you prefer? I can use rs6000-builtins.def (the plural) if you like. I didn't think I was inventing "bif" as shorthand, but maybe that was an LLVM thing... + ext Process as a vec_extract function Please spell out "extract"? There are too many other words starting with "ext", some of which you could expect here ("extend", "extension", maybe even "extra"); OK. + ldv Needs special handling for vec_ld semantics + stv Needs special handling for vec_st semantics Call those "vec_ld" and "vec_st", then? Or should I get used to it, the names aren't obvious, but cut-and-paste always is ;-) Hm. Well, vec_ld is a specific built-in, but this applies to a few more than just that one. But sure, if you want. +[TARGET_ALTIVEC] Can this be a C expression? Most gen* programs just copy similar things to the generated C code, which can be interesting to debug, but works perfectly well otherwise. I rather prefer the way it is. I do generate C code from this in the subsequent patches. But I like table-driven code to use things that look like tables for input. :-) + const vector signed char __builtin_altivec_abs_v16qi (vector signed char); +ABS_V16QI absv16qi2 {abs} + const vector signed short __builtin_altivec_abs_v8hi (vector signed short); +ABS_V8HI absv8hi2 {abs} + + Note the use of indentation, which is recommended but not required. It does require a single newline at the end of each such line, right? Does that work aout almost always, or do you get very long lines? Yes, for now I am requiring the newline at the end of each line. I found that it does indeed get very long (unreadably long) lines for vector signatures. I forgot to update this documentation when I changed my format. I am now using abbreviations for vector types that match those we use often in our test cases ("vuc" for "vector unsigned char", "vsll" for "vector signed long long", etc.). This makes for very nicely readable inputs (see patch #2). The above now becomes const vsc __builtin_altivec_abs_v16qi (vsc); ABS_V16QI absv16qi2 {abs} const vss __builtin_altivec_abs_v8hi (vss); ABS_V8HI absv8hi2 {abs} I will fix the documentation! + [, , ] Hrm, "internal" suggests "name within the GCC code", but that is not what it means. Maybe something like abi-name and builtin-name? OK, that's reasonable. + Blank lines may be used as desired in these files. Between stanzas and stuff only? There are places where newlines are significant and not just whitespace, right? I don't believe so, although there may be places where I forgot to allow a line to be advanced -- that would be a bug, though, so let me know if you see any. Blank lines don't have any inherent meaning in the input files. Great docs, thanks! Thanks for the review! Bill Segher
Re: [PATCH 01/14] Initial create of rs6000-genbif.c.
On 2/4/20 4:36 PM, Segher Boessenkool wrote: On Tue, Feb 04, 2020 at 03:10:32PM -0600, Bill Schmidt wrote: I really don't think using the new acronym "bif" helps; built-in functions already are often called "builtins" (or "intrinsics", which is problematic itself). Until we manage to replace the old methods, we already have rs6000-builtin.def, so I am a bit constrained in my choices. Given that restriction, what name would you prefer? I can use rs6000-builtins.def (the plural) if you like. As we discussed (offline), maybe rs6000-builtin-new.def is best (and at the end of this conversion, just move it). +1 + ldv Needs special handling for vec_ld semantics + stv Needs special handling for vec_st semantics Call those "vec_ld" and "vec_st", then? Or should I get used to it, the names aren't obvious, but cut-and-paste always is ;-) Hm. Well, vec_ld is a specific built-in, but this applies to a few more than just that one. But sure, if you want. "ldv" certainly is shorter and nicer in principle, but it is a bit cryptic. As I said, it's probably not too hard to get used to it; and maybe a better name will present itself? Maybe ldvec and stvec would serve without introducing specific builtin confusion. +[TARGET_ALTIVEC] Can this be a C expression? Most gen* programs just copy similar things to the generated C code, which can be interesting to debug, but works perfectly well otherwise. I rather prefer the way it is. I do generate C code from this in the subsequent patches. But I like table-driven code to use things that look like tables for input. :-) That's not what I meant... Can you say [TARGET_ALTIVEC && TARGET_64BIT] here? Or even just [!TARGET_ALTIVEC] or [1] for always, or [0] for never ("commented out"). Ah! Sorry for misunderstanding. Right now just an identifier is allowed, but we could certainly grab the whole string between the [] and drop it in with no concerns. Hopefully we both remember when we get to the patch that reads the stanzas... + Blank lines may be used as desired in these files. Between stanzas and stuff only? There are places where newlines are significant and not just whitespace, right? I don't believe so, although there may be places where I forgot to allow a line to be advanced -- that would be a bug, though, so let me know if you see any. Blank lines don't have any inherent meaning in the input files. Not blank lines, I'm asking about newlines :-) But those are not allowed to be inserted just anywhere, a line has to be one line, iiuc? Yes. Additional newlines can follow a newline, but the individual lines must contain everything that's expected in them. Bill Segher
Re: [PATCH 00/14] rs6000: Begin replacing built-in support
On 2/5/20 6:30 AM, Segher Boessenkool wrote: Hi! On Wed, Feb 05, 2020 at 08:57:16AM +0100, Richard Biener wrote: On Tue, Feb 4, 2020 at 6:40 PM Segher Boessenkool wrote: On Mon, Feb 03, 2020 at 08:26:01PM -0600, Bill Schmidt wrote: My intent is to make adding new built-in functions as simple as adding a few lines to a couple of files, and automatically generating as much of the initialization, overload resolution, and expansion logic as possible. This patch series establishes the format of the input files and creates a new program (rs6000-genbif) to: Let's call it rs6000-gen-builtins or similar. Not as cryptic. I believe we talked about this a few years ago. Any reason this is powerpc specific? If sufficiently generic most targets would benefit and maybe even frontends and the middle-end could make use of this. The generator program, that is. (disclaimer: I didn't look into the patches at all) One thing that's powerpc-unique (I believe) is our peculiar overloading infrastructure for the original AltiVec interface (extended to cover quite a bit more territory since). But that's largely an extra level of abstraction that could eventually be optional. There's also some specificity to our vector types (things like vector bool and vector pixel) that would need to be abstracted away. Finally, there's a set of flags for special handling that are definitely Power-specific and would have to be abstracted away also. Nothing that couldn't be dealt with given enough attention, so far as I can see. But honestly I have not looked a great deal into other targets' built-in handling to see what other landmines might be present. Absolutely, but we first want to solve the urgent problem for Power (because that is what it is); it's a huge job with that reduction of scope, already. After *that* is done, it will be clearer how to do things for what is wanted generically, will be clearer what is wanted in the first place :-) Yes, this is a necessary first step to even be able to see what's going on... I always wondered if we can make our C frontend spit out things from C declarations (with maybe extra #pragmas for some of the more obscure details) and how to fit that into the bootstrap. I think there will be too many problem cases, a direct description of the builtins will work better (but is more verbose of course). In any case, Bill's patches keep the exact same approach in rs6000 as we had before, just with some more pre-processing and macros etc.; which results in a much shorter description, many cases folded into one, which as a bonus also fixes bugs (directly, when two things you fold should be the same but are not, at least one of them is wrong; and maybe more importantly indirectly: a reader of the tables will spot errors much more easily if they fit on one screen, if you have similar entries on the screen at the same time so you *can* compare; and there will be more readers as well even, people are actually scared of having to look at it currently). So, yes, this same approach might be a good fit generically, but we'll do it for rs6000 only, in the interest of ever getting it done ;-) The generator programs etc. can move to generic code later, if that helps and there is interest in it, there isn't much (if anything) in here that is specific to our arch. I'll keep this possibility in mind as we move forward. It's probably a matter of months to get everything converted over just for Power. But this set of patches is the most generic; the remaining patches will all be quite Power-specific. 
Thanks, Bill Segher
rs6000: Correct documentation for __builtin_mtfsf
Hi, PR93570 reports that the documentation shows __builtin_mtfsf as returning a double, but that is incorrect. The return type should be void. Corrected herein. Built on powerpc64le-unknown-linux-gnu and verified correct PDF output. Committed as obvious. Thanks! Bill 2020-02-06 Bill Schmidt PR target/93570 * doc/extend.texi (Basic PowerPC Built-in Functions): Correct prototype for __builtin_mtfsf. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ec99c38a607..5739063b330 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -17166,7 +17166,7 @@ unsigned long __builtin_ppc_mftb (); double __builtin_unpack_ibm128 (__ibm128, int); __ibm128 __builtin_pack_ibm128 (double, double); double __builtin_mffs (void); -double __builtin_mtfsf (const int, double); +void __builtin_mtfsf (const int, double); void __builtin_mtfsb0 (const int); void __builtin_mtfsb1 (const int); void __builtin_set_fpscr_rn (int);
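For reference, the corrected prototype matches the usual FPSCR save/restore idiom (this example is mine, not part of the patch):

  /* Save the FPSCR image, run code that may disturb it, restore it.  */
  double fpscr = __builtin_mffs ();
  /* ...code that may change rounding mode or exception bits...  */
  __builtin_mtfsf (0xff, fpscr);  /* mask 0xff writes all eight 4-bit fields */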
Re: [PATCH], Rename and document PowerPC -mprefixed-addr to -mprefixed
On 2/10/20 9:24 PM, Segher Boessenkool wrote: Hi! On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: This patch renames the PowerPC internal switch -mprefixed-addr to be -mprefixed. If you use -mpcrel, you must be using the 64-bit ELF v2 ABI, and the code model must be medium. Currently, anyway. If you use -mpcrel, the compiler will generate PC-relative loads and stores to access items, rather than the current TOC based loads and stores. Where that is the best thing to do. Is that always now? :-) Yes. :-) Bill If you use -mpcrel, it implies -mprefixed. If you use -mno-prefixed, you cannot use -mpcrel. -mno-prefixed should imply -mno-pcrel; does it? * doc/invoke.texi (RS/6000 and PowerPC Options): Docment the (typo) --- /tmp/1ySv8k_invoke.texi 2020-02-07 17:56:52.700489015 -0500 +++ gcc/doc/invoke.texi 2020-02-07 17:34:02.925611138 -0500 @@ -22327,7 +22328,6 @@ faster on processors with 32-bit busses aligns structures containing the above types differently than most published application binary interface specifications for the m68k. -@item -mpcrel @opindex mpcrel Use the pc-relative addressing mode of the 68000 directly, instead of using a global offset table. At present, this option implies @option{-fpic}, This isn't a correct change. Okay for trunk modulo the m68k change. Thanks! Segher
Re: [PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR
Hi! I can't approve this, but for what it's worth it looks fine to me. Bill On 12/11/19 6:31 AM, Kewen.Lin wrote: Hi, We found that the vectorization cost modeling on scalar COND_EXPR is a bit off on rs6000. One typical case is 548.exchange2_r, where -Ofast -mcpu=power9 -mrecip -fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the default is -fvect-cost-model=dynamic) by 1.94%. Scalar COND_EXPR is normally expanded into compare + branch or compare + isel, either of which should be priced higher than a simple FXU operation. This patch adds additional vectorization cost onto scalar COND_EXPR on top of builtin_vectorization_cost. The reasons for choosing 2 as the additional cost value: 1) trying the candidate values 1 through 5, 2 measured best on Power9; 2) from a latency view, compare takes 3 cycles and isel takes 2 on Power9, which is 2.5 times a simple FXU instruction (cost 1 in the current modeling), so 2 is close; 3) it gives good SPEC2017 ratios on Power8 as well. The SPEC2017 performance evaluation on Power9 with explicit unrolling shows a 548.exchange2_r +2.35% gain but a 526.blender_r -1.99% degradation; the others are trivial. On further investigation of 526.blender_r, the assembly of the 10 hottest functions is unchanged, so the impact should be due to side effects. SPECINT geomean +0.16%, SPECFP geomean -0.16% (mainly due to blender_r). Without explicit unrolling, 548.exchange2_r gains +1.78% and the others are trivial. SPECINT geomean +0.19%, SPECFP geomean +0.06%. The SPEC2017 performance evaluation on Power8 shows a 500.perlbench_r +1.32% gain and a 511.povray_r +2.03% gain; the others are trivial. SPECINT geomean +0.08%, SPECFP geomean +0.18%. Bootstrapped and regression tested on powerpc64le-linux-gnu. Is this OK for trunk? BR, Kewen --- gcc/ChangeLog 2019-12-11 Kewen Lin * config/rs6000/rs6000.c (adjust_vectorization_cost): New function. (rs6000_add_stmt_cost): Call adjust_vectorization_cost and update stmt_cost.
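The patch body is not quoted above, but from the ChangeLog and the description the new hook has roughly this shape (a sketch under those assumptions; the exact checks in the committed patch may differ):

  static int
  adjust_vectorization_cost (enum vect_cost_for_stmt kind,
                             stmt_vec_info stmt_info)
  {
    /* Scalar COND_EXPR expands to compare + branch or compare + isel,
       so charge 2 on top of the generic builtin_vectorization_cost.  */
    if (kind == scalar_stmt
        && stmt_info
        && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN
        && gimple_assign_rhs_code (stmt_info->stmt) == COND_EXPR)
      return 2;
    return 0;
  }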
Re: [PATCH]. Fix HAVE_SYS_SDT_H for cross-compilation
Hi Christian and Jakub, I'm curious whether there was ever any resolution for: http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01124.html. We've encountered what appears to be the same issue internally when building a cross for powerpc64le-linux-gnu: /scratch/tmp/anton/toolchain/build/src/gcc/libgcc/unwind-dw2.c:41:21: fatal error: sys/sdt.h: No such file or directory #include <sys/sdt.h> The gcc configure is looking at the build machine header files, instead of the installed headers for the host we're building. We can work around this with a build-sysroot, but it seems that shouldn't be necessary. Thoughts?
Re: [PATCH]. Fix HAVE_SYS_SDT_H for cross-compilation
On Thu, 2013-08-22 at 19:47 +0200, Jakub Jelinek wrote: > On Thu, Aug 22, 2013 at 09:39:48AM -0500, Bill Schmidt wrote: > > Hi Christian and Jakub, > > > > I'm curious whether there was ever any resolution for: > > http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01124.html. > > The last mail I remember didn't make any sense: > #include "tconfig.h" > > that includes it: > > #ifndef GCC_TCONFIG_H > #define GCC_TCONFIG_H > #ifndef USED_FOR_TARGET > # define USED_FOR_TARGET > #endif > #include "auto-host.h" > > in which there is : > > #ifndef USED_FOR_TARGET > #define HAVE_SYS_SDT_H 1 > #endif > > That means USED_FOR_TARGET is defined and thus HAVE_SYS_SDT_H is never > defined, which is not desirable. > > Jakub > Yes, that doesn't seem right at all. OK, thanks. I'll stick this on a list as a low-priority item to fix one of these days. Thanks, Bill
Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote: > On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng wrote: > > Hi, > > > > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find > > different MEM_REFs sharing common part in addressing expression. If such > > MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient > > addressing expression during the RTL passes. > > The pass analyzes addressing expression in each MEM_REF to see if it can be > > formalized as follows: > > base:MEM_REF (T1, C1) > > offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) > > bitpos: C4 * BITS_PER_UNIT > > Then restructures it into below form: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), > > C1 + (C2 * C3) + C4) > > At last, rewrite the MEM_REFs if there are two or more sharing common > > (non-constant) part. > > The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in > > form of "T2' + C5", the MEM_REF should be restructure into: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), > > C1 + (C2 * C3) + C4 + (C5 * C3)) > > > > The patch also includes a test case to illustrate the problem. > > > > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? > > This looks ok to me if Bill is ok with it. Sorry, I've been on vacation and haven't been checking in until now. I'll have a look at this tomorrow -- sounds good on the surface! Thanks, Bill > > Thanks, > Richard. > > > Thanks. > > bin > > > > 2013-09-02 Bin Cheng > > > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. > > (restructure_reference): Call backtrace_base_for_ref. > > > > gcc/testsuite/ChangeLog > > 2013-09-02 Bin Cheng > > > > * gcc.dg/tree-ssa/slsr-39.c: New test. >
Re: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote: > On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng wrote: > > Hi, > > > > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find > > different MEM_REFs sharing common part in addressing expression. If such > > MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient > > addressing expression during the RTL passes. > > The pass analyzes addressing expression in each MEM_REF to see if it can be > > formalized as follows: > > base:MEM_REF (T1, C1) > > offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) > > bitpos: C4 * BITS_PER_UNIT > > Then restructures it into below form: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), > > C1 + (C2 * C3) + C4) > > At last, rewrite the MEM_REFs if there are two or more sharing common > > (non-constant) part. > > The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in > > form of "T2' + C5", the MEM_REF should be restructure into: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), > > C1 + (C2 * C3) + C4 + (C5 * C3)) > > > > The patch also includes a test case to illustrate the problem. > > > > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? > > This looks ok to me if Bill is ok with it. This is a good generalization and I'm fine with it. There are a few minor nits that should be corrected, outlined below. > > Thanks, > Richard. > > > Thanks. > > bin > > > > 2013-09-02 Bin Cheng > > > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. > > (restructure_reference): Call backtrace_base_for_ref. > > > > gcc/testsuite/ChangeLog > > 2013-09-02 Bin Cheng > > > > * gcc.dg/tree-ssa/slsr-39.c: New test. > >>Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>=== >>--- >>gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>(revision 0) >>+++ >>gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>(revision 0) >>@@ -0,0 +1,26 @@ >>+/* Verify straight-line strength reduction for back-tracing >>+ CADN_ADD for T2 in: CAND_ADD >>+ >>+*PBASE:T1 >>+*POFFSET: MULT_EXPR (T2, C3) >>+*PINDEX: C1 + (C2 * C3) + C4 */ >>+ >>+/* { dg-do compile } */ >>+/* { dg-options "-O2 -fdump-tree-slsr" } */ >>+ >>+typedef int arr_2[50][50]; >>+ >>+void foo (arr_2 a2, int v1) >>+{ >>+ int i, j; >>+ >>+ i = v1 + 5; >>+ j = i; >>+ a2 [i] [j++] = i; >>+ a2 [i] [j++] = i; >>+ a2 [i] [i-1] += 1; >>+ return; >>+} >>+ >>+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */ >>+/* { dg-final { cleanup-tree-dump "slsr" } } */ >>Index: gcc/gimple-ssa-strength-reduction.c >>=== >>--- >>gcc/gimple-ssa-strength-reduction.c >>(revision 202067) >>+++ >>gcc/gimple-ssa-strength-reduction.c >>(working copy) >>@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed) >> add_cand_for_stmt (phi, c); >> } >> >>+/* Given PBASE which is a pointer to tree, loop up the defining look up >>+ statement for it and check whether the candidate is in the >>+ form of: >>+ >>+ X = B + (1 * S), S is integer constant >>+ X = B + (i * S), S is integer one >>+ >>+ If so, set PBASE to the candiate's base_expr and return double candidate's >>+ int (i * S). >>+ Otherwise, just return double int zero. */ This is sufficient, since you are properly checking the next_interp chain. Another possible form would be X = (B + i) * 1, but if this is present, then one of the forms you're checking for should also be present, so there's no need to check the MULT_CANDs. 
>>+ >>+static double_int >>+backtrace_base_for_ref (tree *pbase) >>+{ >>+ tree base_in = *pbase; >>+ slsr_cand_t base_cand; >>+ >>+ STRIP_NOPS (base_in); >>+ if (TREE_CODE (base_in) != SSA_NAME) >>+return tree_to_double_int (integer_zero_node); >>+ >>+ base_cand = base_cand_from_table (base_in); >>+ >>+ while (base_cand && base_cand->kind != CAND_PHI) >>+{ >>+ if (base_cand->kind == CAND_ADD >>+ && base_cand->index.is_one () >>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) >>+ { >>+ /* X = B + (1 * S), S is integer constant. */ >>+ *pbase = base_cand->base_expr; >>+ return tree_to_double_int (base_cand->stride); >>+ } >>+ else if (base_cand->kind == CAND_ADD >>+&& TREE_CODE (base_cand->stride) == INTEGER_CST >>+&& integer_onep (base_cand->stride)) >>+{ >>+ /* X = B + (i * S), S is integer one. */ >>+ *pbase = base_cand->base_expr; >>+ return base_cand->index; >>+ } >>+ >>+ if (base_cand->next_interp) >>+ base_cand = lookup_cand (base_cand->next_interp); >>+ else >>+ base_cand = NULL; >>+} >>+ >>+ return tree_to_double_int (integer_zero_node); >>+} >>+ >> /* Look for the following pattern: >> >> *PBASE:MEM_REF (T1, C1) >>@@ -767,8 +818,
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote: > Thanks for reviewing, I will correct all stupid spelling problem in the next > version of patch. > > On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt > wrote: > > > >>>+ int (i * S). > >>>+ Otherwise, just return double int zero. */ > > > > This is sufficient, since you are properly checking the next_interp > > chain. Another possible form would be > > > > X = (B + i) * 1, > > > > but if this is present, then one of the forms you're checking for should > > also be present, so there's no need to check the MULT_CANDs. > I'm not very sure here since I didn't check MULT_CAND in the patch. Could > you please explain more about this? Sorry, perhaps I shouldn't have mentioned it. I was simply stating that, although a candidate representing B + i could be represented with a CAND_MULT as shown, there is no need for you to check it (as you don't) since there will also be a corresponding CAND_ADD in one of the other forms. Since you are walking the next_interp chain, this works. In other words, the code is fine as is. I was just thinking out loud about other candidate types. > > > > >>>+ > >>>+static double_int > >>>+backtrace_base_for_ref (tree *pbase) > >>>+{ > >>>+ tree base_in = *pbase; > >>>+ slsr_cand_t base_cand; > >>>+ > >>>+ STRIP_NOPS (base_in); > >>>+ if (TREE_CODE (base_in) != SSA_NAME) > >>>+return tree_to_double_int (integer_zero_node); > >>>+ > >>>+ base_cand = base_cand_from_table (base_in); > >>>+ > >>>+ while (base_cand && base_cand->kind != CAND_PHI) > >>>+{ > >>>+ if (base_cand->kind == CAND_ADD > >>>+ && base_cand->index.is_one () > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) > >>>+ { > >>>+ /* X = B + (1 * S), S is integer constant. */ > >>>+ *pbase = base_cand->base_expr; > >>>+ return tree_to_double_int (base_cand->stride); > >>>+ } > >>>+ else if (base_cand->kind == CAND_ADD > >>>+&& TREE_CODE (base_cand->stride) == INTEGER_CST > >>>+&& integer_onep (base_cand->stride)) > >>>+{ > >>>+ /* X = B + (i * S), S is integer one. 
*/ > >>>+ *pbase = base_cand->base_expr; > >>>+ return base_cand->index; > >>>+ } > >>>+ > >>>+ if (base_cand->next_interp) > >>>+ base_cand = lookup_cand (base_cand->next_interp); > >>>+ else > >>>+ base_cand = NULL; > >>>+} > >>>+ > >>>+ return tree_to_double_int (integer_zero_node); > >>>+} > >>>+ > >>> /* Look for the following pattern: > >>> > >>> *PBASE:MEM_REF (T1, C1) > >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) > >>> > >>> *PBASE:T1 > >>> *POFFSET: MULT_EXPR (T2, C3) > >>>-*PINDEX: C1 + (C2 * C3) + C4 */ > >>>+*PINDEX: C1 + (C2 * C3) + C4 > >>> > >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > > ^ ^ > > a it > > > >>>+ will be further restructured to: > >>>+ > >>>+*PBASE:T1 > >>>+*POFFSET: MULT_EXPR (T2', C3) > >>>+*PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ > >>>+ > >>> static bool > >>> restructure_reference (tree *pbase, tree *poffset, double_int > > *pindex, > >>> tree *ptype) > >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, > >>> double_int index = *pindex; > >>> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); > >>> tree mult_op0, mult_op1, t1, t2, type; > >>>- double_int c1, c2, c3, c4; > >>>+ double_int c1, c2, c3, c4, c5; > >>> > >>> if (!base > >>> || !offset > >>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree > > *poffset, > >>> } > >>> > >>> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); > >>>+ c5 = backtrace_base_for_ref (&t2); > >>> > >>> *pbase = t1; > >>>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, > >>>- double_int_to_tree (sizetype, c3)); > >>>- *pindex = c1 + c2 * c3 + c4; > >>>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), > >>>+ double_int_to_tree (sizetype, c3)); > > > > I am not sure why you changed this call. fold_build2 is a more > > efficient call than size_binop. size_binop makes several checks that > > will fail in this case, and then calls fold_build2_loc, right? Not a > > big deal but seems like changing it back would be better. Perhaps I'm > > missing something (as usual ;). > I rely on size_binop to convert T2 into sizetype, because T2' may be in other > kind of type. Otherwise there will be ssa_verify error later. OK, I see now. I had thought this was handled by fold_build2, but apparently not. I guess all T2's formerly handled were already sizetype as expected. Thanks for the explanation! Bill > > Thanks. > bin > > > >
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, 2013-09-09 at 10:20 -0500, Bill Schmidt wrote: > On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote: > > Thanks for reviewing, I will correct all stupid spelling problem in the > > next version of patch. > > > > On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt > > wrote: > > > > > >>>+ int (i * S). > > >>>+ Otherwise, just return double int zero. */ > > > > > > This is sufficient, since you are properly checking the next_interp > > > chain. Another possible form would be > > > > > > X = (B + i) * 1, > > > > > > but if this is present, then one of the forms you're checking for should > > > also be present, so there's no need to check the MULT_CANDs. > > I'm not very sure here since I didn't check MULT_CAND in the patch. Could > > you please explain more about this? > > Sorry, perhaps I shouldn't have mentioned it. I was simply stating > that, although a candidate representing B + i could be represented with > a CAND_MULT as shown, there is no need for you to check it (as you > don't) since there will also be a corresponding CAND_ADD in one of the > other forms. Since you are walking the next_interp chain, this works. > > In other words, the code is fine as is. I was just thinking out loud > about other candidate types. > > > > > > > > >>>+ > > >>>+static double_int > > >>>+backtrace_base_for_ref (tree *pbase) > > >>>+{ > > >>>+ tree base_in = *pbase; > > >>>+ slsr_cand_t base_cand; > > >>>+ > > >>>+ STRIP_NOPS (base_in); > > >>>+ if (TREE_CODE (base_in) != SSA_NAME) > > >>>+return tree_to_double_int (integer_zero_node); > > >>>+ > > >>>+ base_cand = base_cand_from_table (base_in); > > >>>+ > > >>>+ while (base_cand && base_cand->kind != CAND_PHI) > > >>>+{ > > >>>+ if (base_cand->kind == CAND_ADD > > >>>+ && base_cand->index.is_one () > > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) > > >>>+ { > > >>>+ /* X = B + (1 * S), S is integer constant. */ > > >>>+ *pbase = base_cand->base_expr; > > >>>+ return tree_to_double_int (base_cand->stride); > > >>>+ } > > >>>+ else if (base_cand->kind == CAND_ADD > > >>>+&& TREE_CODE (base_cand->stride) == INTEGER_CST > > >>>+&& integer_onep (base_cand->stride)) > > >>>+{ > > >>>+ /* X = B + (i * S), S is integer one. */ > > >>>+ *pbase = base_cand->base_expr; > > >>>+ return base_cand->index; > > >>>+ } > > >>>+ > > >>>+ if (base_cand->next_interp) > > >>>+ base_cand = lookup_cand (base_cand->next_interp); > > >>>+ else > > >>>+ base_cand = NULL; > > >>>+} > > >>>+ > > >>>+ return tree_to_double_int (integer_zero_node); > > >>>+} > > >>>+ > > >>> /* Look for the following pattern: > > >>> > > >>> *PBASE:MEM_REF (T1, C1) > > >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) > > >>> > > >>> *PBASE:T1 > > >>> *POFFSET: MULT_EXPR (T2, C3) > > >>>-*PINDEX: C1 + (C2 * C3) + C4 */ > > >>>+*PINDEX: C1 + (C2 * C3) + C4 > > >>> > > >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > > > ^ ^ > > > a it > > > > > >>>+ will be further restructured to: > > >>>+ > > >>>+*PBASE:T1 > > >>>+*POFFSET: MULT_EXPR (T2', C3) > > >>>+*PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ > > >>>+ > > >>> static bool > > >>> restructure_reference (tree *pbase, tree *poffset, double_int > > > *pindex, > > >>> tree *ptype) > > >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, > > >>> d
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Tue, 2013-09-10 at 15:41 +0800, bin.cheng wrote: > On Mon, Sep 9, 2013 at 11:35 PM, Bill Schmidt > wrote: > > > >> > I rely on size_binop to convert T2 into sizetype, because T2' may be in > >> > other kind of type. Otherwise there will be ssa_verify error later. > >> > >> OK, I see now. I had thought this was handled by fold_build2, but > >> apparently not. I guess all T2's formerly handled were already sizetype > >> as expected. Thanks for the explanation! > > > > So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in > > the argument list to fold_build2? It's picking nits, but that would be > > slightly more efficient. > > Hi Bill, > > This is the 2nd version of patch with your comments incorporated. > Bootstrap and re-test on x86. Re-test on ARM ongoing. Is it ok if tests > pass? Looks good to me! Thanks, Bin. Bill > > Thanks. > bin
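For the record, the form agreed on in this exchange is (shape implied by the discussion, not quoted from the committed v2):

  *poffset = fold_build2 (MULT_EXPR, sizetype,
                          fold_convert (sizetype, t2),
                          double_int_to_tree (sizetype, c3));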
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: > On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang wrote: > > Hi, > > > > Following Bin's patch in > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch tweaks > > backtrace_base_for_ref () to strip of any widening conversion after the > > first TREE_CODE check fails. Without this patch, the test > > (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as > > backtrace_base_for_ref () will stop if not seeing an ssa_name since the tree > > code can be nop_expr instead. > > > > Regtested on arm and aarch64; still bootstrapping x86_64. > > > > OK for the trunk if the x86_64 bootstrap succeeds? > > Please add a testcase. Also, the comment "Strip of" should read "Strip off". Otherwise I have no comments. Thanks, Bill > > Richard. > > > Thanks, > > Yufeng > > > > gcc/ > > > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): Call > > get_unwidened and check 'base_in' again. >
Re: [PATCH, PowerPC] Fix PR57949 (ABI alignment issue)
On Wed, 2013-09-11 at 21:08 +0930, Alan Modra wrote: > On Wed, Aug 14, 2013 at 10:32:01AM -0500, Bill Schmidt wrote: > > This fixes a long-standing problem with GCC's implementation of the > > PPC64 ELF ABI. If a structure contains a member requiring 128-bit > > alignment, and that structure is passed as a parameter, the parameter > > currently receives only 64-bit alignment. This is an error, and is > > incompatible with correct code generated by the IBM XL compilers. > > This caused multiple failures in the libffi testsuite: > libffi.call/cls_align_longdouble.c > libffi.call/cls_align_longdouble_split.c > libffi.call/cls_align_longdouble_split2.c > libffi.call/nested_struct5.c > > Fixed by making the same alignment adjustment in libffi to structures > passed by value. Bill, I think your patch needs to go on all active > gcc branches as otherwise we'll need different versions of libffi for > the next gcc releases. Hm, the libffi case is unfortunate. :( The alternative is to leave libffi alone, and require code that calls these interfaces with "bad" structs passed by value to be built using -mcompat-align-parm, which was provided for such compatibility issues. Hopefully there is a small number of cases where this can happen, and this could be documented with libffi and gcc. What do you think? Thanks, Bill > > The following was bootstrapped and regression checked powerpc64-linux. > OK for mainline, and the 4.7 and 4.8 branches when/if Bill's patch > goes in there? > > * src/powerpc/ffi.c (ffi_prep_args64): Align FFI_TYPE_STRUCT. > (ffi_closure_helper_LINUX64): Likewise. > > Index: libffi/src/powerpc/ffi.c > === > --- libffi/src/powerpc/ffi.c (revision 202428) > +++ libffi/src/powerpc/ffi.c (working copy) > @@ -462,6 +462,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long > double **d; >} p_argv; >unsigned long gprvalue; > + unsigned long align; > >stacktop.c = (char *) stack + bytes; >gpr_base.ul = stacktop.ul - ASM_NEEDS_REGISTERS64 - > NUM_GPR_ARG_REGISTERS64; > @@ -532,6 +533,10 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long > #endif > > case FFI_TYPE_STRUCT: > + align = (*ptr)->alignment; > + if (align > 16) > + align = 16; > + next_arg.ul = ALIGN (next_arg.ul, align); > words = ((*ptr)->size + 7) / 8; > if (next_arg.ul >= gpr_base.ul && next_arg.ul + words > gpr_end.ul) > { > @@ -1349,6 +1354,7 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure, >long i, avn; >ffi_cif *cif; >ffi_dblfl *end_pfr = pfr + NUM_FPR_ARG_REGISTERS64; > + unsigned long align; > >cif = closure->cif; >avalue = alloca (cif->nargs * sizeof (void *)); > @@ -1399,6 +1405,10 @@ ffi_closure_helper_LINUX64 (ffi_closure *closure, > break; > > case FFI_TYPE_STRUCT: > + align = arg_types[i]->alignment; > + if (align > 16) > + align = 16; > + pst = ALIGN (pst, align); > #ifndef __LITTLE_ENDIAN__ > /* Structures with size less than eight bytes are passed >left-padded. */ > >
[PATCH, PowerPC] Change code generation for VSX loads and stores for little endian
This patch implements support for VSX vector loads and stores in little endian mode. VSX loads and stores permute the register image with respect to storage in a unique manner that is not truly little endian. This can cause problems (for example, when a vector appears in a union with a different type). This patch adds an explicit permute to each VSX load and store instruction so that the register image is true little endian. It is desirable to remove redundant pairs of permutes where legal to do so; that work is deferred to a later patch.

This patch currently has no effect on generated code, because -mvsx is disabled in little endian mode pending fixes of additional problems with little endian code generation. I tested this by enabling -mvsx in little endian mode and running the regression bucket. Using a GCC code base from August 5, I observed that this patch corrected 187 failures and exposed one new regression; investigation showed that the regression is not directly related to the patch. Unfortunately the results are not as good on current trunk. It appears we have introduced some more problems for little endian code generation since August 5th, which hides the effectiveness of the patch; most of the VSX vector tests still fail with the patch applied to current trunk. There are a handful of additional regressions, which again are not directly related to the patch. I feel that the patch is well-tested by the August 5 results, and would like to commit it before continuing to investigate the recently introduced problems.

I also bootstrapped and tested the patch on a big-endian machine (powerpc64-unknown-linux-gnu) to verify that I introduced no regressions in that environment.

Ok for trunk?

Thanks, Bill

gcc:

2013-09-30  Bill Schmidt

    * config/rs6000/vector.md (mov<mode>): Emit permuted move
    sequences for LE VSX loads and stores at expand time.
    * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
    prototype.
    * config/rs6000/rs6000.c (rs6000_const_vec): New.
    (rs6000_gen_le_vsx_permute): New.
    (rs6000_gen_le_vsx_load): New.
    (rs6000_gen_le_vsx_store): New.
    (rs6000_gen_le_vsx_move): New.
    * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
    (*vsx_le_perm_load_v4si): New.
    (*vsx_le_perm_load_v8hi): New.
    (*vsx_le_perm_load_v16qi): New.
    (*vsx_le_perm_store_v2di): New.
    (*vsx_le_perm_store_v4si): New.
    (*vsx_le_perm_store_v8hi): New.
    (*vsx_le_perm_store_v16qi): New.
    (*vsx_xxpermdi2_le_<mode>): New.
    (*vsx_xxpermdi4_le_<mode>): New.
    (*vsx_xxpermdi8_le_V8HI): New.
    (*vsx_xxpermdi16_le_V16QI): New.
    (*vsx_lxvd2x2_le_<mode>): New.
    (*vsx_lxvd2x4_le_<mode>): New.
    (*vsx_lxvd2x8_le_V8HI): New.
    (*vsx_lxvd2x16_le_V16QI): New.
    (*vsx_stxvd2x2_le_<mode>): New.
    (*vsx_stxvd2x4_le_<mode>): New.
    (*vsx_stxvd2x8_le_V8HI): New.
    (*vsx_stxvd2x16_le_V16QI): New.

gcc/testsuite:

2013-09-30  Bill Schmidt

    * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
    * gcc.target/powerpc/fusion.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/pr43154.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr43154.c	(revision 203018)
+++ gcc/testsuite/gcc.target/powerpc/pr43154.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mcpu=power7" } */
 
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 203018)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203018)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -88,7 +88,8 @@
      (smax "smax")])
 
-;; Vector move instructions.
+;; Vector move instructions.  Little-endian VSX loads and stores require
+;; special handling to circumvent "element endianness."
 (define_expand "mov<mode>"
   [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
	(match_operand:VEC_M 1 "any_operand" ""))]
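As a concrete illustration of the union problem mentioned in the cover letter, consider this sketch of mine (requires -maltivec or -mvsx): if the register image were left permuted rather than made true little endian, the vector store through the union and the scalar reload could disagree about which element is element 0.

  #include <altivec.h>

  union view
  {
    vector signed int v;
    signed int w[4];
  };

  signed int
  first_element (vector signed int x)
  {
    union view u;
    u.v = x;           /* vector store */
    return u.w[0];     /* scalar reload of element 0 */
  }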
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote: > On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang wrote: > > Hello, > > > > Please find the updated version of the patch in the attachment. It has > > addressed the previous comments and also included some changes in order to > > pass the bootstrapping on x86_64. > > > > It's also passed the regtest on arm-none-eabi and aarch64-none-elf. > > > > It will also fix the test failure as reported here: > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html > > > > OK for the trunk? > > + where n is a 32-bit unsigned int and pointer are 64-bit long. In this > + case, the gimple for (n - 1) is: > + > + _2 = n_1(D) + 4294967295; // 0x > + > + and it is wrong to multiply the large constant by 4 in the 64-bit space. > */ > + > +static bool > +safe_to_multiply_p (tree type, double_int cst) > +{ > + if (TYPE_UNSIGNED (type) > + && ! double_int_fits_to_tree_p (signed_type_for (type), cst)) > +return false; > + > + return true; > +} > > This looks wrong. The only relevant check is as whether the > multiplication overflows the original type as you miss the implicit > truncation that happens. Which is something you don't know > unless you know the value. It definitely isn't a property of a type > and a constant but the property of two constants and a type. > Or the predicate has a wrong name. > > The use of get_unwidened in this core routine looks like this is > all happening in the wrong place and we should have picked up > another candidate for this instead? I'm sure Bill will know more here. I'm not happy with how this patch is progressing. Without having looked too deeply, this might be better handled earlier when determining which casts are safe to use in building candidates. What you have here seems more like closing the barn door after the horse got out. Maybe that's the only solution, but it doesn't seem likely. Another problem is that your test case isn't testing anything except that the compiler doesn't crash. That isn't sufficient as a regression test. I'll spend some time looking at this to see if I can find a better approach. It might be a day or two before I can get to it. In addition to the included test case, are there any other cases you've found that I should be concerned with? Thanks, Bill > > Richard. > > > > > Thanks, > > Yufeng > > > > > > gcc/ > > > > * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New > > function. > > (backtrace_base_for_ref): Call get_unwidened, check 'base_in' > > again and set unwidend_p with true; call safe_to_multiply_p to avoid > > unsafe unwidened cases. > > > > gcc/testsuite/ > > > > * gcc.dg/tree-ssa/slsr-40.c: New test. > > > > > > > > > > On 09/11/13 13:39, Bill Schmidt wrote: > >> > >> On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: > >>> > >>> On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang > >>> wrote: > >>>> > >>>> Hi, > >>>> > >>>> Following Bin's patch in > >>>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch > >>>> tweaks > >>>> backtrace_base_for_ref () to strip of any widening conversion after the > >>>> first TREE_CODE check fails. Without this patch, the test > >>>> (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as > >>>> backtrace_base_for_ref () will stop if not seeing an ssa_name since the > >>>> tree > >>>> code can be nop_expr instead. > >>>> > >>>> Regtested on arm and aarch64; still bootstrapping x86_64. > >>>> > >>>> OK for the trunk if the x86_64 bootstrap succeeds? > >>> > >>> > >>> Please add a testcase. 
> >> > >> > >> Also, the comment "Strip of" should read "Strip off". Otherwise I have > >> no comments. > >> > >> Thanks, > >> Bill > >> > >>> > >>> Richard. >
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote: > On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote: > > On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang wrote: > > > Hello, > > > > > > Please find the updated version of the patch in the attachment. It has > > > addressed the previous comments and also included some changes in order to > > > pass the bootstrapping on x86_64. > > > > > > It's also passed the regtest on arm-none-eabi and aarch64-none-elf. > > > > > > It will also fix the test failure as reported here: > > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html > > > > > > OK for the trunk? > > > > + where n is a 32-bit unsigned int and pointer are 64-bit long. In this > > + case, the gimple for (n - 1) is: > > + > > + _2 = n_1(D) + 4294967295; // 0x > > + > > + and it is wrong to multiply the large constant by 4 in the 64-bit > > space. */ > > + > > +static bool > > +safe_to_multiply_p (tree type, double_int cst) > > +{ > > + if (TYPE_UNSIGNED (type) > > + && ! double_int_fits_to_tree_p (signed_type_for (type), cst)) > > +return false; > > + > > + return true; > > +} > > > > This looks wrong. The only relevant check is as whether the > > multiplication overflows the original type as you miss the implicit > > truncation that happens. Which is something you don't know > > unless you know the value. It definitely isn't a property of a type > > and a constant but the property of two constants and a type. > > Or the predicate has a wrong name. > > > > The use of get_unwidened in this core routine looks like this is > > all happening in the wrong place and we should have picked up > > another candidate for this instead? I'm sure Bill will know more here. > > I'm not happy with how this patch is progressing. Without having looked > too deeply, this might be better handled earlier when determining which > casts are safe to use in building candidates. What you have here seems > more like closing the barn door after the horse got out. Maybe that's > the only solution, but it doesn't seem likely. > > Another problem is that your test case isn't testing anything except > that the compiler doesn't crash. That isn't sufficient as a regression > test. Sorry, that was a pre-coffee comment. I would like also to see a test that verifies the expected gimple, though, not just that the test runs. > > I'll spend some time looking at this to see if I can find a better > approach. It might be a day or two before I can get to it. In addition > to the included test case, are there any other cases you've found that I > should be concerned with? > > Thanks, > Bill > > > > > Richard. > > > > > > > > > Thanks, > > > Yufeng > > > > > > > > > gcc/ > > > > > > * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New > > > function. > > > (backtrace_base_for_ref): Call get_unwidened, check 'base_in' > > > again and set unwidend_p with true; call safe_to_multiply_p to > > > avoid > > > unsafe unwidened cases. > > > > > > gcc/testsuite/ > > > > > > * gcc.dg/tree-ssa/slsr-40.c: New test. > > > > > > > > > > > > > > > On 09/11/13 13:39, Bill Schmidt wrote: > > >> > > >> On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: > > >>> > > >>> On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang > > >>> wrote: > > >>>> > > >>>> Hi, > > >>>> > > >>>> Following Bin's patch in > > >>>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch > > >>>> tweaks > > >>>> backtrace_base_for_ref () to strip of any widening conversion after the > > >>>> first TREE_CODE check fails. 
Without this patch, the test > > >>>> (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as > > >>>> backtrace_base_for_ref () will stop if not seeing an ssa_name since the > > >>>> tree > > >>>> code can be nop_expr instead. > > >>>> > > >>>> Regtested on arm and aarch64; still bootstrapping x86_64. > > >>>> > > >>>> OK for the trunk if the x86_64 bootstrap succeeds? > > >>> > > >>> > > >>> Please add a testcase. > > >> > > >> > > >> Also, the comment "Strip of" should read "Strip off". Otherwise I have > > >> no comments. > > >> > > >> Thanks, > > >> Bill > > >> > > >>> > > >>> Richard. > >
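For what it's worth, a test that verifies the expected gimple might look something like the following sketch (mine; the scanned pattern is illustrative, not the actual slsr-40.c):

  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-slsr" } */

  int
  foo (int *p, unsigned int n)
  {
    return p[n] + p[n + 1];
  }

  /* Expect exactly one multiply by the element size, i.e. the second
     access is strength-reduced from the first.  The pattern is
     illustrative only.  */
  /* { dg-final { scan-tree-dump-times " \\* 4" 1 "slsr" } } */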
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote: > On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote: > > On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang wrote: > > > Hello, > > > > > > Please find the updated version of the patch in the attachment. It has > > > addressed the previous comments and also included some changes in order to > > > pass the bootstrapping on x86_64. > > > > > > It's also passed the regtest on arm-none-eabi and aarch64-none-elf. > > > > > > It will also fix the test failure as reported here: > > > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html > > > > > > OK for the trunk? > > > > + where n is a 32-bit unsigned int and pointer are 64-bit long. In this > > + case, the gimple for (n - 1) is: > > + > > + _2 = n_1(D) + 4294967295; // 0x > > + > > + and it is wrong to multiply the large constant by 4 in the 64-bit > > space. */ > > + > > +static bool > > +safe_to_multiply_p (tree type, double_int cst) > > +{ > > + if (TYPE_UNSIGNED (type) > > + && ! double_int_fits_to_tree_p (signed_type_for (type), cst)) > > +return false; > > + > > + return true; > > +} > > > > This looks wrong. The only relevant check is as whether the > > multiplication overflows the original type as you miss the implicit > > truncation that happens. Which is something you don't know > > unless you know the value. It definitely isn't a property of a type > > and a constant but the property of two constants and a type. > > Or the predicate has a wrong name. > > > > The use of get_unwidened in this core routine looks like this is > > all happening in the wrong place and we should have picked up > > another candidate for this instead? I'm sure Bill will know more here. > > I'm not happy with how this patch is progressing. Without having looked > too deeply, this might be better handled earlier when determining which > casts are safe to use in building candidates. What you have here seems > more like closing the barn door after the horse got out. Maybe that's > the only solution, but it doesn't seem likely. > > Another problem is that your test case isn't testing anything except > that the compiler doesn't crash. That isn't sufficient as a regression > test. > > I'll spend some time looking at this to see if I can find a better > approach. It might be a day or two before I can get to it. In addition > to the included test case, are there any other cases you've found that I > should be concerned with? To help me investigate this without having to build a cross compiler, could you please compile your test case (without the patch applied) using -fdump-tree-reassoc2 -fdump-tree-slsr-details and send me the generated dump files? Thanks, Bill > > Thanks, > Bill > > > > > Richard. > > > > > > > > > Thanks, > > > Yufeng > > > > > > > > > gcc/ > > > > > > * gimple-ssa-strength-reduction.c (safe_to_multiply_p): New > > > function. > > > (backtrace_base_for_ref): Call get_unwidened, check 'base_in' > > > again and set unwidend_p with true; call safe_to_multiply_p to > > > avoid > > > unsafe unwidened cases. > > > > > > gcc/testsuite/ > > > > > > * gcc.dg/tree-ssa/slsr-40.c: New test. 
> > > > > > > > > > > > > > > On 09/11/13 13:39, Bill Schmidt wrote: > > >> > > >> On Wed, 2013-09-11 at 10:32 +0200, Richard Biener wrote: > > >>> > > >>> On Tue, Sep 10, 2013 at 5:53 PM, Yufeng Zhang > > >>> wrote: > > >>>> > > >>>> Hi, > > >>>> > > >>>> Following Bin's patch in > > >>>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00695.html, this patch > > >>>> tweaks > > >>>> backtrace_base_for_ref () to strip of any widening conversion after the > > >>>> first TREE_CODE check fails. Without this patch, the test > > >>>> (gcc.dg/tree-ssa/slsr-39.c) in Bin's patch will fail on AArch64, as > > >>>> backtrace_base_for_ref () will stop if not seeing an ssa_name since the > > >>>> tree > > >>>> code can be nop_expr instead. > > >>>> > > >>>> Regtested on arm and aarch64; still bootstrapping x86_64. > > >>>> > > >>>> OK for the trunk if the x86_64 bootstrap succeeds? > > >>> > > >>> > > >>> Please add a testcase. > > >> > > >> > > >> Also, the comment "Strip of" should read "Strip off". Otherwise I have > > >> no comments. > > >> > > >> Thanks, > > >> Bill > > >> > > >>> > > >>> Richard. > >
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
OK, thanks. The problem that you've encountered is that you are attempting to do something illegal. ;) (Bin's original patch is actually to blame for that, as well as me for not catching it then.) As your new test shows, it is unsafe to do the transformation in backtrace_base_for_ref when widening from an unsigned type, because the unsigned type has wrap semantics by default. (The actual test must be done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or removed by compile option -- see the comments with legal_cast_p and legal_cast_p_1 later in the module.) You cannot in general prove that the transformation is allowable for a specific constant, because you don't know that what you're adding it to won't cause an overflow that's handled incorrectly. I believe the correct fix for the unsigned-overflow case is to fail backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns false, where in_type is the type of the new *PBASE, and out_type is the widening type that you're looking through. So you can't just STRIP_NOPS, you have to check the cast for legitimacy for this transformation. This does not explain why backtrace_base_for_ref does not find all the opportunities on slsr-39.c. I don't immediately see what's preventing that. Note that the transformation is legal in that case because you are widening from a signed int to an unsigned int, which won't cause problems. You guys need to dig deeper into why those opportunities are missed when sizetype is larger than int. Let me know if you need help figuring it out. Thanks, Bill On Tue, 2013-10-01 at 16:06 +0100, Yufeng Zhang wrote: > Hi Bill, > > Thank you for the review and the offer to help. > > On 10/01/13 15:36, Bill Schmidt wrote: > > On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote: > >> On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote: > >>> On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang > >>> wrote: > >>>> Hello, > >>>> > >>>> Please find the updated version of the patch in the attachment. It has > >>>> addressed the previous comments and also included some changes in order > >>>> to > >>>> pass the bootstrapping on x86_64. > >>>> > >>>> It's also passed the regtest on arm-none-eabi and aarch64-none-elf. > >>>> > >>>> It will also fix the test failure as reported here: > >>>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html > >>>> > >>>> OK for the trunk? > >>> > >>> + where n is a 32-bit unsigned int and pointer are 64-bit long. In this > >>> + case, the gimple for (n - 1) is: > >>> + > >>> + _2 = n_1(D) + 4294967295; // 0x > >>> + > >>> + and it is wrong to multiply the large constant by 4 in the 64-bit > >>> space. */ > >>> + > >>> +static bool > >>> +safe_to_multiply_p (tree type, double_int cst) > >>> +{ > >>> + if (TYPE_UNSIGNED (type) > >>> +&& ! double_int_fits_to_tree_p (signed_type_for (type), cst)) > >>> +return false; > >>> + > >>> + return true; > >>> +} > >>> > >>> This looks wrong. The only relevant check is as whether the > >>> multiplication overflows the original type as you miss the implicit > >>> truncation that happens. Which is something you don't know > >>> unless you know the value. It definitely isn't a property of a type > >>> and a constant but the property of two constants and a type. > >>> Or the predicate has a wrong name. > >>> > >>> The use of get_unwidened in this core routine looks like this is > >>> all happening in the wrong place and we should have picked up > >>> another candidate for this instead? I'm sure Bill will know more here. 
> >> > >> I'm not happy with how this patch is progressing. Without having looked > >> too deeply, this might be better handled earlier when determining which > >> casts are safe to use in building candidates. What you have here seems > >> more like closing the barn door after the horse got out. Maybe that's > >> the only solution, but it doesn't seem likely. > >> > >> Another problem is that your test case isn't testing anything except > >> that the compiler doesn't crash. That isn't sufficient as a regression > >> test. > >> > >> I'll spend some time looking at this to see if I c
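The hazard described above can be made concrete with a small sketch (mine; it shows the shape of the problem rather than the actual test case):

  /* n is a 32-bit unsigned int; pointers are 64 bits.  */
  int
  get (int *p, unsigned int n)   /* e.g. called with n == 1 */
  {
    return p[n - 1];             /* gimple: _2 = n_1(D) + 4294967295 */
  }

  /* The 32-bit addition wraps, so for n == 1 the index is 0 and the
     access is p[0].  If SLSR looks through the widening cast and
     distributes the multiply -- p + zext(n)*4 + 4294967295*4 -- the
     wrap is lost: for n == 1 that address is p + 0x400000000 bytes,
     which is wrong.  */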
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote: > OK, thanks. The problem that you've encountered is that you are > attempting to do something illegal. ;) (Bin's original patch is > actually to blame for that, as well as me for not catching it then.) > > As your new test shows, it is unsafe to do the transformation in > backtrace_base_for_ref when widening from an unsigned type, because the > unsigned type has wrap semantics by default. (The actual test must be > done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or > removed by compile option -- see the comments with legal_cast_p and > legal_cast_p_1 later in the module.) > > You cannot in general prove that the transformation is allowable for a > specific constant, because you don't know that what you're adding it to > won't cause an overflow that's handled incorrectly. > > I believe the correct fix for the unsigned-overflow case is to fail > backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns > false, where in_type is the type of the new *PBASE, and out_type is the > widening type that you're looking through. So you can't just > STRIP_NOPS, you have to check the cast for legitimacy for this > transformation. > > This does not explain why backtrace_base_for_ref does not find all the > opportunities on slsr-39.c. I don't immediately see what's preventing > that. Note that the transformation is legal in that case because you > are widening from a signed int to an unsigned int, which won't cause > problems. You guys need to dig deeper into why those opportunities are > missed when sizetype is larger than int. Let me know if you need help > figuring it out. Sorry, I had to leave before and wanted to get this response back to you in case I didn't get back soon. I've looked at this some more, and your general approach should work ok once you get the legal_cast_p check in place where you do the get_unwidened call now. Once you know you have a legal widening, you don't have to worry about the safe_to_multiply_p stuff. I.e., you don't need the last two chunks in the patch to backtrace_base_for_ref, and you don't need the unwidened_p variable. It should all fall out properly by just restricting your unwidening to legal casts. Thanks, Bill > > Thanks, > Bill > > On Tue, 2013-10-01 at 16:06 +0100, Yufeng Zhang wrote: > > Hi Bill, > > > > Thank you for the review and the offer to help. > > > > On 10/01/13 15:36, Bill Schmidt wrote: > > > On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote: > > >> On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote: > > >>> On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang > > >>> wrote: > > >>>> Hello, > > >>>> > > >>>> Please find the updated version of the patch in the attachment. It has > > >>>> addressed the previous comments and also included some changes in > > >>>> order to > > >>>> pass the bootstrapping on x86_64. > > >>>> > > >>>> It's also passed the regtest on arm-none-eabi and aarch64-none-elf. > > >>>> > > >>>> It will also fix the test failure as reported here: > > >>>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html > > >>>> > > >>>> OK for the trunk? > > >>> > > >>> + where n is a 32-bit unsigned int and pointer are 64-bit long. In > > >>> this > > >>> + case, the gimple for (n - 1) is: > > >>> + > > >>> + _2 = n_1(D) + 4294967295; // 0x > > >>> + > > >>> + and it is wrong to multiply the large constant by 4 in the 64-bit > > >>> space. 
*/ > > >>> + > > >>> +static bool > > >>> +safe_to_multiply_p (tree type, double_int cst) > > >>> +{ > > >>> + if (TYPE_UNSIGNED (type) > > >>> +&& ! double_int_fits_to_tree_p (signed_type_for (type), cst)) > > >>> +return false; > > >>> + > > >>> + return true; > > >>> +} > > >>> > > >>> This looks wrong. The only relevant check is as whether the > > >>> multiplication overflows the original type as you miss the implicit > > >>> truncation that happens. Which is something you don't know > > >>> unless you know the value. It definitely isn't a property of a type > > >>> and a constant but the pro
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 23:57 +0100, Yufeng Zhang wrote: > On 10/01/13 20:55, Bill Schmidt wrote: > > > > > > On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote: > >> OK, thanks. The problem that you've encountered is that you are > >> attempting to do something illegal. ;) (Bin's original patch is > >> actually to blame for that, as well as me for not catching it then.) > >> > >> As your new test shows, it is unsafe to do the transformation in > >> backtrace_base_for_ref when widening from an unsigned type, because the > >> unsigned type has wrap semantics by default. (The actual test must be > >> done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or > >> removed by compile option -- see the comments with legal_cast_p and > >> legal_cast_p_1 later in the module.) > >> > >> You cannot in general prove that the transformation is allowable for a > >> specific constant, because you don't know that what you're adding it to > >> won't cause an overflow that's handled incorrectly. > >> > >> I believe the correct fix for the unsigned-overflow case is to fail > >> backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns > >> false, where in_type is the type of the new *PBASE, and out_type is the > >> widening type that you're looking through. So you can't just > >> STRIP_NOPS, you have to check the cast for legitimacy for this > >> transformation. > >> > >> This does not explain why backtrace_base_for_ref does not find all the > >> opportunities on slsr-39.c. I don't immediately see what's preventing > >> that. Note that the transformation is legal in that case because you > >> are widening from a signed int to an unsigned int, which won't cause > >> problems. You guys need to dig deeper into why those opportunities are > >> missed when sizetype is larger than int. Let me know if you need help > >> figuring it out. > > > > Sorry, I had to leave before and wanted to get this response back to you > > in case I didn't get back soon. I've looked at this some more, and your > > general approach should work ok once you get the legal_cast_p check in > > place where you do the get_unwidened call now. Once you know you have a > > legal widening, you don't have to worry about the safe_to_multiply_p > > stuff. I.e., you don't need the last two chunks in the patch to > > backtrace_base_for_ref, and you don't need the unwidened_p variable. It > > should all fall out properly by just restricting your unwidening to > > legal casts. > > Many thanks for looking into the issue so promptly. I've updated the > patch; I have to use legal_cast_p_1 instead as the gimple node is no > longer available by then. > > Does the new patch look sane? Yes, much better. I'm happy with this approach. However, please restore the correct whitespace before the { at -786,7 +795,7. Thanks for fixing this up! Bill > > The regtest on aarch64 and bootstrapping on x86-64 are still running. > > Thanks, > Yufeng > > > gcc/ > > * gimple-ssa-strength-reduction.c (legal_cast_p_1): Forward > declaration. > (backtrace_base_for_ref): Call get_unwidened with 'base_in' if > 'base_in' represent a conversion and legal_cast_p_1 holds; set > 'base_in' with the returned value from get_unwidened. > > gcc/testsuite/ > > * gcc.dg/tree-ssa/slsr-40.c: New test.
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Tue, 2013-10-01 at 20:21 -0500, Bill Schmidt wrote: > On Tue, 2013-10-01 at 23:57 +0100, Yufeng Zhang wrote: > > On 10/01/13 20:55, Bill Schmidt wrote: > > > > > > > > > On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote: > > >> OK, thanks. The problem that you've encountered is that you are > > >> attempting to do something illegal. ;) (Bin's original patch is > > >> actually to blame for that, as well as me for not catching it then.) > > >> > > >> As your new test shows, it is unsafe to do the transformation in > > >> backtrace_base_for_ref when widening from an unsigned type, because the > > >> unsigned type has wrap semantics by default. (The actual test must be > > >> done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or > > >> removed by compile option -- see the comments with legal_cast_p and > > >> legal_cast_p_1 later in the module.) > > >> > > >> You cannot in general prove that the transformation is allowable for a > > >> specific constant, because you don't know that what you're adding it to > > >> won't cause an overflow that's handled incorrectly. > > >> > > >> I believe the correct fix for the unsigned-overflow case is to fail > > >> backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns > > >> false, where in_type is the type of the new *PBASE, and out_type is the > > >> widening type that you're looking through. So you can't just > > >> STRIP_NOPS, you have to check the cast for legitimacy for this > > >> transformation. > > >> > > >> This does not explain why backtrace_base_for_ref does not find all the > > >> opportunities on slsr-39.c. I don't immediately see what's preventing > > >> that. Note that the transformation is legal in that case because you > > >> are widening from a signed int to an unsigned int, which won't cause > > >> problems. You guys need to dig deeper into why those opportunities are > > >> missed when sizetype is larger than int. Let me know if you need help > > >> figuring it out. > > > > > > Sorry, I had to leave before and wanted to get this response back to you > > > in case I didn't get back soon. I've looked at this some more, and your > > > general approach should work ok once you get the legal_cast_p check in > > > place where you do the get_unwidened call now. Once you know you have a > > > legal widening, you don't have to worry about the safe_to_multiply_p > > > stuff. I.e., you don't need the last two chunks in the patch to > > > backtrace_base_for_ref, and you don't need the unwidened_p variable. It > > > should all fall out properly by just restricting your unwidening to > > > legal casts. > > > > Many thanks for looking into the issue so promptly. I've updated the > > patch; I have to use legal_cast_p_1 instead as the gimple node is no > > longer available by then. > > > > Does the new patch look sane? > > Yes, much better. I'm happy with this approach. However, please > restore the correct whitespace before the { at -786,7 +795,7. > > Thanks for fixing this up! > > Bill (Just a reminder that I can't approve your patch; you need a maintainer for that. But it looks good to me.) Sometime when I get a moment I'm probably going to change this to handle the casting when the candidates are added to the table. I think we should look through the casts and distribute the multiply at that time. But for now what you have here is good. Thanks, Bill > > > > > The regtest on aarch64 and bootstrapping on x86-64 are still running. 
> > > > Thanks, > > Yufeng > > > > > > gcc/ > > > > * gimple-ssa-strength-reduction.c (legal_cast_p_1): Forward > > declaration. > > (backtrace_base_for_ref): Call get_unwidened with 'base_in' if > > 'base_in' represent a conversion and legal_cast_p_1 holds; set > > 'base_in' with the returned value from get_unwidened. > > > > gcc/testsuite/ > > > > * gcc.dg/tree-ssa/slsr-40.c: New test. >
[PATCH, committed] Fix PR55008
In straight-line strength reduction, a candidate expression of the form "(type1)x + (type2)x", where type1 and type2 are compatible, results in two interpretations of the candidate with different result types. Because the types are compatible, the first interpretation can appear to be a legal basis for the second, resulting in an invalid replacement. The obvious solution is to keep a statement from serving as its own basis.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions, committed as obvious.

Thanks, Bill

-- 
Bill Schmidt, Ph.D.
IBM Advance Toolchain for PowerLinux
IBM Linux Technology Center
wschm...@linux.vnet.ibm.com
wschm...@us.ibm.com

gcc:

2012-10-22  Bill Schmidt

    PR tree-optimization/55008
    * gimple-ssa-strength-reduction.c (find_basis_for_candidate): Don't
    allow a candidate to be a basis for itself under another
    interpretation.

gcc/testsuite:

2012-10-22  Bill Schmidt

    PR tree-optimization/55008
    * gcc.dg/tree-ssa/pr55008.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/pr55008.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr55008.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr55008.c	(revision 0)
@@ -0,0 +1,17 @@
+/* This used to fail to compile; see PR55008.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+typedef unsigned long long T;
+
+void f(void)
+{
+  int a, *p;
+
+  T b = 6309343725;
+
+  if(*p ? (b = 1) : 0)
+    if(b - (a = b /= 0) ? : (a + b))
+      while(1);
+}
+
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 192691)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -366,6 +366,7 @@ find_basis_for_candidate (slsr_cand_t c)
       slsr_cand_t one_basis = chain->cand;
 
       if (one_basis->kind != c->kind
+	  || one_basis->cand_stmt == c->cand_stmt
	  || !operand_equal_p (one_basis->stride, c->stride, 0)
	  || !types_compatible_p (one_basis->cand_type, c->cand_type)
	  || !dominated_by_p (CDI_DOMINATORS,
Re: [PATCH GCC] Tweak gimple-ssa-strength-reduction.c:backtrace_base_for_ref () to cover different cases as seen on AArch64
On Wed, 2013-10-02 at 07:40 -0500, Bill Schmidt wrote: > On Tue, 2013-10-01 at 20:21 -0500, Bill Schmidt wrote: > > On Tue, 2013-10-01 at 23:57 +0100, Yufeng Zhang wrote: > > > On 10/01/13 20:55, Bill Schmidt wrote: > > > > > > > > > > > > On Tue, 2013-10-01 at 11:56 -0500, Bill Schmidt wrote: > > > >> OK, thanks. The problem that you've encountered is that you are > > > >> attempting to do something illegal. ;) (Bin's original patch is > > > >> actually to blame for that, as well as me for not catching it then.) > > > >> > > > >> As your new test shows, it is unsafe to do the transformation in > > > >> backtrace_base_for_ref when widening from an unsigned type, because the > > > >> unsigned type has wrap semantics by default. (The actual test must be > > > >> done on TYPE_OVERFLOW_WRAPS since this wrap semantics can be added or > > > >> removed by compile option -- see the comments with legal_cast_p and > > > >> legal_cast_p_1 later in the module.) > > > >> > > > >> You cannot in general prove that the transformation is allowable for a > > > >> specific constant, because you don't know that what you're adding it to > > > >> won't cause an overflow that's handled incorrectly. > > > >> > > > >> I believe the correct fix for the unsigned-overflow case is to fail > > > >> backtrace_base_for_ref if legal_cast_p (in_type, out_type) returns > > > >> false, where in_type is the type of the new *PBASE, and out_type is the > > > >> widening type that you're looking through. So you can't just > > > >> STRIP_NOPS, you have to check the cast for legitimacy for this > > > >> transformation. > > > >> > > > >> This does not explain why backtrace_base_for_ref does not find all the > > > >> opportunities on slsr-39.c. I don't immediately see what's preventing > > > >> that. Note that the transformation is legal in that case because you > > > >> are widening from a signed int to an unsigned int, which won't cause > > > >> problems. You guys need to dig deeper into why those opportunities are > > > >> missed when sizetype is larger than int. Let me know if you need help > > > >> figuring it out. > > > > > > > > Sorry, I had to leave before and wanted to get this response back to you > > > > in case I didn't get back soon. I've looked at this some more, and your > > > > general approach should work ok once you get the legal_cast_p check in > > > > place where you do the get_unwidened call now. Once you know you have a > > > > legal widening, you don't have to worry about the safe_to_multiply_p > > > > stuff. I.e., you don't need the last two chunks in the patch to > > > > backtrace_base_for_ref, and you don't need the unwidened_p variable. It > > > > should all fall out properly by just restricting your unwidening to > > > > legal casts. > > > > > > Many thanks for looking into the issue so promptly. I've updated the > > > patch; I have to use legal_cast_p_1 instead as the gimple node is no > > > longer available by then. > > > > > > Does the new patch look sane? > > > > Yes, much better. I'm happy with this approach. However, please > > restore the correct whitespace before the { at -786,7 +795,7. > > > > Thanks for fixing this up! > > > > Bill > > (Just a reminder that I can't approve your patch; you need a maintainer > for that. But it looks good to me.) > > Sometime when I get a moment I'm probably going to change this to handle > the casting when the candidates are added to the table. I think we > should look through the casts and distribute the multiply at that time. 
> But for now what you have here is good. FYI, I looked at this a little more this afternoon, and convinced myself that your approach is the right one. This is already representing everything pertinent in the candidate table. Thanks again for adding these extensions. Bill > > Thanks, > Bill > > > > > > > > > The regtest on aarch64 and bootstrapping on x86-64 are still running. > > > > > > Thanks, > > > Yufeng > > > > > > > > > gcc/ > > > > > > * gimple-ssa-strength-reduction.c (legal_cast_p_1): Forward > > > declaration. > > > (backtrace_base_for_ref): Call get_unwidened with 'base_in' if > > > 'base_in' represent a conversion and legal_cast_p_1 holds; set > > > 'base_in' with the returned value from get_unwidened. > > > > > > gcc/testsuite/ > > > > > > * gcc.dg/tree-ssa/slsr-40.c: New test. > >
[PATCH, rs6000] Correct vector permute for little endian
This patch corrects the expansion of vec_perm_constv16qi for powerpc64le. The explanation of the problem with a detailed example appears in the commentary, as this corrects for what I found to be surprising behavior in the implementation of the vperm instruction, and I don't want any of us to spend time figuring that out again. (We may want to add a programming note in the next version of the ISA.)

This corrects 18 failing tests in the test suite for the powerpc64le target, without affecting the big-endian targets. Bootstrapped and tested with no new regressions on powerpc64le-unknown-linux-gnu and powerpc64-unknown-linux-gnu. Ok for trunk?

Thanks, Bill

2013-10-06  Bill Schmidt

    * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
    (altivec_expand_vec_perm_const): Call it.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203018)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28426,6 +28526,88 @@ rs6000_emit_parity (rtx dst, rtx src)
     }
 }
 
+/* Expand an Altivec constant permutation for little endian mode.
+   There are two issues: First, the two input operands must be
+   swapped so that together they form a double-wide array in LE
+   order.  Second, the vperm instruction has surprising behavior
+   in LE mode: it interprets the elements of the source vectors
+   in BE mode ("left to right") and interprets the elements of
+   the destination vector in LE mode ("right to left").  To
+   correct for this, we must subtract each element of the permute
+   control vector from 31.
+
+   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+   serve as the permute control vector.  Then, in BE mode,
+
+     vperm 9,10,11,12
+
+   places the desired result in vr9.  However, in LE mode the
+   vector contents will be
+
+     vr10 = 00000003 00000002 00000001 00000000
+     vr11 = 00000007 00000006 00000005 00000004
+
+   The result of the vperm using the same permute control vector is
+
+     vr9  = 05000000 07000000 01000000 03000000
+
+   That is, the leftmost 4 bytes of vr10 are interpreted as the
+   source for the rightmost 4 bytes of vr9, and so on.
+
+   If we change the permute control vector to
+
+     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+   and issue
+
+     vperm 9,11,10,12
+
+   we get the desired
+
+     vr9  = 00000006 00000004 00000002 00000000.  */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+  unsigned int i;
+  rtx perm[16];
+  rtx constv, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+
+  /* Unpack and adjust the constant selector.  */
+  for (i = 0; i < 16; ++i)
+    {
+      rtx e = XVECEXP (sel, 0, i);
+      unsigned int elt = 31 - (INTVAL (e) & 31);
+      perm[i] = GEN_INT (elt);
+    }
+
+  /* Expand to a permute, swapping the inputs and using the
+     adjusted selector.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+  constv = force_reg (V16QImode, constv);
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+			   UNSPEC_VPERM);
+  if (!REG_P (target))
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */
@@ -28606,6 +28788,12 @@ altivec_expand_vec_perm_const (rtx operands[4])
	}
     }
 
+  if (!BYTES_BIG_ENDIAN)
+    {
+      altivec_expand_vec_perm_const_le (operands);
+      return true;
+    }
+
   return false;
 }
[PATCH, rs6000] Fix variable permute control vectors for little endian
Hi,

This is a follow-up to the recent patch that fixed constant permute control vectors for little endian. When the control vector is constant, we can adjust the constant and use a vperm without increasing code size. When the control vector is unknown, however, we have to generate two additional instructions to subtract each element of the control vector from 31 (equivalently, from -1, since only 5 bits are pertinent). This patch adds the additional code generation.

There are two main paths to the affected permutes: via the known pattern vec_perm, and via an altivec builtin. The builtin path causes a little difficulty because there's no way to dispatch a builtin to two different insns for BE and LE. I solved this by adding two new unspecs for the builtins (UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X). The insns for the builtins are changed from a define_insn to a define_insn_and_split. We create the _X forms at expand time and later split them into the correct sequences for BE and LE, using the "real" UNSPEC_VPERM and UNSPEC_VPERM_UNS to generate the vperm instruction.

For the path via the known pattern, I added a new routine in rs6000.c in similar fashion to the solution for the constant control vector case. When the permute control vector is a rotate vector loaded by lvsl or lvsr, we can generate the desired control vector more cheaply by simply changing to use the opposite instruction. We are already doing that when expanding an unaligned load. The changes in vector.md avoid undoing that effort by circumventing the subtract-from-splat (going straight to the UNSPEC_VPERM).

I bootstrapped and tested this for big endian on powerpc64-unknown-linux-gnu with no new regressions. I did the same for little endian on powerpc64le-unknown-linux-gnu. Here the results were slightly mixed: the changes fix 32 test failures, but expose an unrelated bug in 9 others when -mvsx is permitted on LE (not currently allowed). The bug is a missing permute for a vector load in the unaligned vector load logic; that will be fixed in a subsequent patch.

Is this okay for trunk?

Thanks, Bill

2013-10-09  Bill Schmidt

    * config/rs6000/vector.md (vec_realign_load_<mode>): Generate vperm
    directly to circumvent subtract from splat{31} workaround.
    * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
    prototype.
    * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
    * config/rs6000/altivec.md (define_c_enum "unspec"): Add
    UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
    (altivec_vperm_<mode>): Convert to define_insn_and_split to
    separate big and little endian logic.
    (*altivec_vperm_<mode>_internal): New define_insn.
    (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
    separate big and little endian logic.
    (*altivec_vperm_<mode>_uns_internal): New define_insn.
    (vec_permv16qi): Add little endian logic.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203246)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -950,8 +950,15 @@
     emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
					 operands[2], operands[3]));
   else
-    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[2],
-					 operands[1], operands[3]));
+    {
+      /* Avoid the "subtract from splat31" workaround for vperm since
+	 we have changed lvsr to lvsl instead.  */
+      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+				   gen_rtvec (3, operands[2],
+					      operands[1], operands[3]),
+				   UNSPEC_VPERM);
+      emit_move_insn (operands[0], unspec);
+    }
   DONE;
 })

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 203246)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -56,6 +56,7 @@ extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203247)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28608,6 +28608,54 @@ altivec_expand_vec_perm_const_le (rtx operands[4])
   emit_move_insn (target, unspec);
 }
 
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+   permute control vector.  But h
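A quick way to convince oneself of the parenthetical "equivalently, from -1" in the cover letter: vperm examines only the low 5 bits of each selector byte, and 31 - x and -1 - x differ by exactly 32, so they agree mod 32. A throwaway check of mine:

  #include <stdio.h>

  int
  main (void)
  {
    int x;
    for (x = 0; x < 256; x++)
      if (((31 - x) & 31) != ((-1 - x) & 31))
        {
          printf ("mismatch at %d\n", x);
          return 1;
        }
    printf ("31 - x and -1 - x agree in the low 5 bits for all bytes\n");
    return 0;
  }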
[PATCH, rs6000] Handle missing permute splits for V2DF/V4SF in little endian
Hi,

In my previous patch to split LE VSX loads and stores to introduce permutes, I managed to miss the vector float modes. This patch corrects the oversight, fixing up a few more test failures. Bootstrapped and tested on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk?

Thanks, Bill

2013-10-11  Bill Schmidt

    * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
    handle vector float as well.
    (*vsx_le_perm_load_v4si): Likewise.
    (*vsx_le_perm_store_v2di): Likewise.
    (*vsx_le_perm_store_v4si): Likewise.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 203246)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -219,18 +219,18 @@
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 
-(define_insn_and_split "*vsx_le_perm_load_v2di"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
-        (match_operand:V2DI 1 "memory_operand" "Z"))]
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (match_operand:VSX_D 1 "memory_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   [(set (match_dup 2)
-        (vec_select:V2DI
+        (vec_select:<MODE>
          (match_dup 1)
          (parallel [(const_int 1) (const_int 0)])))
    (set (match_dup 0)
-        (vec_select:V2DI
+        (vec_select:<MODE>
          (match_dup 2)
          (parallel [(const_int 1) (const_int 0)])))]
   "
@@ -242,19 +242,19 @@
   [(set_attr "type" "vecload")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*vsx_le_perm_load_v4si"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
-        (match_operand:V4SI 1 "memory_operand" "Z"))]
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (match_operand:VSX_W 1 "memory_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   [(set (match_dup 2)
-        (vec_select:V4SI
+        (vec_select:<MODE>
          (match_dup 1)
          (parallel [(const_int 2) (const_int 3)
                     (const_int 0) (const_int 1)])))
    (set (match_dup 0)
-        (vec_select:V4SI
+        (vec_select:<MODE>
          (match_dup 2)
          (parallel [(const_int 2) (const_int 3)
                     (const_int 0) (const_int 1)])))]
@@ -333,18 +333,18 @@
   [(set_attr "type" "vecload")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*vsx_le_perm_store_v2di"
-  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
-        (match_operand:V2DI 1 "vsx_register_operand" "+wa"))]
+(define_insn_and_split "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+        (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   [(set (match_dup 2)
-        (vec_select:V2DI
+        (vec_select:<MODE>
          (match_dup 1)
          (parallel [(const_int 1) (const_int 0)])))
    (set (match_dup 0)
-        (vec_select:V2DI
+        (vec_select:<MODE>
          (match_dup 2)
          (parallel [(const_int 1) (const_int 0)])))]
   "
@@ -356,19 +356,19 @@
   [(set_attr "type" "vecstore")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*vsx_le_perm_store_v4si"
-  [(set (match_operand:V4SI 0 "memory_operand" "=Z")
-        (match_operand:V4SI 1 "vsx_register_operand" "+wa"))]
+(define_insn_and_split "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX"
   [(set (match_dup 2)
-        (vec_select:V4SI
+        (vec_select:<MODE>
          (match_dup 1)
          (parallel [(const_int 2) (const_int 3)
                     (const_int 0) (const_int 1)])))
    (set (match_dup 0)
-        (vec_select:V4SI
+        (vec_select:<MODE>
          (match_dup 2)
          (parallel [(const_int 2) (const_int 3)
                     (const_int 0) (const_int 1)])))]
[PATCH, rs6000] Fix vsx_concat_ insns for little endian
Simple patch to reverse the order of the input operands when concatenating for little endian code generation. Bootstrapped and tested on powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu with no regressions. Fixes two tests in the testsuite for the latter. Ok for trunk?

Thanks, Bill

2013-10-15  Bill Schmidt

    * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
    (vsx_concat_v2sf): Likewise.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 203508)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1194,7 +1194,12 @@
	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+    return "xxpermdi %x0,%x1,%x2,0";
+  else
+    return "xxpermdi %x0,%x2,%x1,0";
+}
   [(set_attr "type" "vecperm")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
@@ -1207,7 +1212,12 @@
	 (match_operand:SF 2 "vsx_register_operand" "f,f")]
	UNSPEC_VSX_CONCAT))]
   "VECTOR_MEM_VSX_P (V2DFmode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+    return "xxpermdi %x0,%x1,%x2,0";
+  else
+    return "xxpermdi %x0,%x2,%x1,0";
+}
   [(set_attr "type" "vecperm")])
 
 ;; xxpermdi for little endian loads and stores.  We need several of
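A small C example of mine that I believe is expanded through vsx_concat_v2df (the routing is an assumption on my part): for xxpermdi with a shift count of 0, the first source operand supplies doubleword 0 in BE numbering, which is element 1 in LE element order, hence the swapped operands in the LE output template.

  #include <altivec.h>

  /* Element 0 must be 'a' in LE element order, so LE code emits
     "xxpermdi %x0,%x2,%x1,0" with the inputs swapped.  */
  vector double
  make_pair (double a, double b)
  {
    return (vector double) { a, b };
  }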
[PATCH, rs6000] Fix vsx_unpack expansions for endianness
For vector unpack operations, the meaning of "high" and "low" is reversed for little endian. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. This fixes one test case for little endian (gcc.dg/vect/vect-122.c). Ok for trunk?

Thanks, Bill

2013-10-16  Bill Schmidt

    * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
    endianness.
    (vec_unpacks_lo_v4sf): Likewise.
    (vec_unpacks_float_hi_v4si): Likewise.
    (vec_unpacks_float_lo_v4si): Likewise.
    (vec_unpacku_float_hi_v4si): Likewise.
    (vec_unpacku_float_lo_v4si): Likewise.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203508)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -872,7 +872,7 @@
 {
   rtx reg = gen_reg_rtx (V4SFmode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
   DONE;
 })
@@ -884,7 +884,7 @@
 {
   rtx reg = gen_reg_rtx (V4SFmode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
   DONE;
 })
@@ -896,7 +896,7 @@
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
   DONE;
 })
@@ -908,7 +908,7 @@
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
   DONE;
 })
@@ -920,7 +920,7 @@
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
   DONE;
 })
@@ -932,7 +932,7 @@
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
   DONE;
 })
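These expanders are reached through auto-vectorization rather than intrinsics; a minimal loop of mine that should exercise them when built with something like -O3 -mvsx:

  /* Float-to-double widening; the vectorizer expands this via
     vec_unpacks_hi_v4sf and vec_unpacks_lo_v4sf.  */
  void
  widen (double *restrict d, const float *restrict f, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      d[i] = (double) f[i];
  }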
[PATCH, rs6000] Don't convert a vector constant load into a splat illegally
Hi,

In little endian mode, we managed to convert a load of the V4SI vector {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the number 7. It turns out we only looked at the first N-1 elements of an N-element vector in little endian mode, and verified the zeroth element twice. Adjusting the loop boundaries fixes the problem.

Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu. Ok to commit to trunk if no regressions?

Thanks, Bill

2013-10-18  Bill Schmidt

    * config/rs6000/rs6000.c (vspltis_constant): Make sure we check
    all elements for both endian flavors.

Index: gcc/config/rs6000/rs6000.c
===================================================================
@@ -4932,6 +4932,8 @@ vspltis_constant (rtx op, unsigned step, unsigned
   unsigned nunits;
   unsigned bitsize;
   unsigned mask;
+  unsigned start;
+  unsigned end;
 
   HOST_WIDE_INT val;
   HOST_WIDE_INT splat_val;
@@ -4981,7 +4983,10 @@ vspltis_constant (rtx op, unsigned step, unsigned
 
   /* Check if VAL is present in every STEP-th element, and the
      other elements are filled with its most significant bit.  */
-  for (i = 0; i < nunits - 1; ++i)
+  start = BYTES_BIG_ENDIAN ? 0 : 1;
+  end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;
+
+  for (i = start; i < end; ++i)
     {
       HOST_WIDE_INT desired_val;
       if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
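For concreteness, a small function (my reconstruction of the scenario; compile with -maltivec) that exhibited the miscompile in LE mode before the fix:

  #include <altivec.h>

  vector signed int
  get_vec (void)
  {
    /* Before the fix, LE mode checked only elements 0..2, so the 7
       went unnoticed and this load became a vspltisw of 3.  */
    return (vector signed int) { 3, 3, 3, 7 };
  }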
Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally
On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote:
> On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt wrote:
> > Hi,
> >
> > In little endian mode, we managed to convert a load of the V4SI vector
> > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the
> > number 7.  It turns out we only looked at the first N-1 elements of an
> > N-element vector in little endian mode, and verified the zeroth element
> > twice.  Adjusting the loop boundaries fixes the problem.
> >
> > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu.  Ok to
> > commit to trunk if no regressions?
>
> This patch does not make sense.  It biases the loop bounds based on
> BYTES_BIG_ENDIAN, but the body of the loop biases the index variable
> based on BYTES_BIG_ENDIAN in one of the two uses.  The changes seem to
> compute the same value for the first use of index "i" for both
> BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way.  It looks
> like something should be able to be simplified.
>
> Thanks, David

It seems like it, but I haven't been able to figure out anything nicer. The thing that makes this weird is that "i" refers to different elements of the array depending on big or little endian numbering.

Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with step = 2. Then for BE, the 15's are in odd-numbered elements (starting with zero from the left), but for LE they are in even-numbered elements (starting with zero from the right). For both cases, we found the candidate value in the rightmost element (nunits - 1 for BE, 0 for LE).

The first two iterations for BE process the two leftmost elements. For i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val = 0. For i = 1, we find (i+1)&1 to be zero, so the desired value is val = 15, and so on.

The first two iterations for LE process the second and third elements from the right. For i = 1, we find i&1 to be nonzero, so the desired value is 0. For i = 2, we find i&1 to be zero, so the desired value is 15.

So even though the first iteration calculates the same value for both endian modes, "i" means something different in each case.

I guess we could make a "simplification" as follows:

  /* Check if VAL is present in every STEP-th element, and the
     other elements are filled with its most significant bit.  */
  start = BYTES_BIG_ENDIAN ? 0 : 1;
  end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;

  for (i = start; i < end; ++i)
    {
      HOST_WIDE_INT desired_val;
      if (((i + (1 - start)) & (step - 1)) == 0)
        desired_val = val;
      else
        desired_val = msb_val;

but that may be as hard to understand as what's there now...

An alternative is to just change the loop bounds to

  for (i = 0; i < nunits; ++i)

which will reprocess the candidate element for both endian modes, and leave everything else alone. That would probably actually be faster than all the convoluted nonsense to avoid the reprocessing. Want me to go in that direction instead?

Thanks,
Bill
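One can verify mechanically that the BE and LE index tests trace out the same sequence of desired values even though they walk the vector from opposite ends; a throwaway program of mine:

  #include <stdio.h>

  int
  main (void)
  {
    unsigned i, step = 2, nunits = 16;

    /* BE loop: desired value is VAL when ((i + 1) & (step - 1)) == 0.  */
    for (i = 0; i < nunits - 1; i++)
      putchar ((((i + 1) & (step - 1)) == 0) ? 'V' : 'm');
    putchar ('\n');

    /* LE loop: desired value is VAL when (i & (step - 1)) == 0.  */
    for (i = 1; i < nunits; i++)
      putchar (((i & (step - 1)) == 0) ? 'V' : 'm');
    putchar ('\n');
    return 0;
  }

Both loops print the same string (mVmVmVmVmVmVmVm), illustrating that the computed values agree iteration by iteration while the element being inspected differs.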
Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally
Just a quick note that overnight testing of the posted patch was clean. Recap: There are three options on the table: - the posted patch - that patch with the (1 - start) change - replace nunits - 1 with nunits as the loop upper bound I'm happy to implement any of them, as you prefer. I lean towards the last as the most easily understood. Thanks, Bill On Fri, 2013-10-18 at 00:09 -0500, Bill Schmidt wrote: > > On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote: > > On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt > > wrote: > > > Hi, > > > > > > In little endian mode, we managed to convert a load of the V4SI vector > > > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the > > > number 7. It turns out we only looked at the first N-1 elements of an > > > N-element vector in little endian mode, and verified the zeroth element > > > twice. Adjusting the loop boundaries fixes the problem. > > > > > > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu. Ok to > > > commit to trunk if no regressions? > > > > This patch does not make sense. It biases the loop bounds based on > > BYTES_BIG_ENDIAN, but the body of the loop biases the index variable > > based on BYTES_BIG_ENDIAN in one of the two uses. The changes seem to > > compute the same value for the first use of index "i" for both > > BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way. It looks > > like something should be able to be simplified. > > > > Thanks, David > > > > It seems like it, but I haven't been able to figure out anything nicer. > The thing that makes this weird is that "i" is referring to different > elements of the array depending on Big or Little Endian numbering. > > Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with > step = 2. Then for BE, the 15's are in odd-numbered elements (starting > with zero from the left), but for LE they are in even-numbered elements > (starting with zero from the right). For both cases, we found the > candidate value in the rightmost element (nunits - 1 for BE, 0 for LE). > > The first two iterations for BE process the two leftmost elements. For > i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val = > 0. For i = 1, we find (i+1)&1 to be zero, so the desired value is val = > 15, and so on. > > The first two iterations for LE process the second and third elements > from the right. For i = 1, we find i&1 to be nonzero, so the desired > value is 0. For i = 2, we find i&1 to be zero, so the desired value is > 15. > > So even though the first iteration calculates the same value for both > endian modes, "i" means something different in each case. > > I guess we could make a "simplification" as follows: > > /* Check if VAL is present in every STEP-th element, and the > > other elements are filled with its most significant bit. */ > start = BYTES_BIG_ENDIAN ? 0 : 1; > end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits; > > for (i = start; i < end; ++i) > { > HOST_WIDE_INT desired_val; > if (((i + (1 - start)) & (step - 1)) == 0) > desired_val = val; > else > desired_val = msb_val; > > but that may be as hard to understand as what's there now... > > An alternative is to just change the loop bounds to > > for (i = 0; i < nunits; ++i) > > which will reprocess the candidate element for both endian modes, and > leave everything else alone. That would probably actually be faster > than all the convoluted nonsense to avoid the reprocessing. Want me to > go in that direction instead? > > Thanks, > Bill >
[PATCH, rs6000] Adjust vec_unpacku patterns for little endian
Hi, For little endian, the permute control vector for unpacking high and low halves of a vector register must be reversed from the one used for big endian. Fixing this corrects 27 failing tests for powerpc64le-unknown-linux-gnu. Bootstrapped and tested for powerpc64{,le}-unknown-linux-gnu with no new regressions. Is this ok for trunk? Thanks, Bill 2013-10-19 Bill Schmidt * altivec.md (vec_unpacku_hi_v16qi): Adjust for little endian. (vec_unpacku_hi_v8hi): Likewise. (vec_unpacku_lo_v16qi): Likewise. (vec_unpacku_lo_v8hi): Likewise. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 203792) +++ gcc/config/rs6000/altivec.md(working copy) @@ -2035,25 +2035,26 @@ rtx vzero = gen_reg_rtx (V8HImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 
7 : 16); emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); @@ -2070,25 +2071,26 @@ rtx vzero = gen_reg_rtx (V4SImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT
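An aside for readers: the reversal above is easier to see with a scalar model of vperm (my own sketch, not compiler code; vperm is reduced to byte indexing into the 32-byte concatenation of its inputs, which the hardware numbers in BE order on both endiannesses, and model_vperm is an invented name):

#include <stdio.h>

/* vperm picks result byte i from the 32-byte concatenation of its two
   inputs, numbering bytes 0..31 left to right (BE order) regardless of
   target endianness.  Bytes 16..31 come from a zero vector here, as in
   the unpack expanders above.  */
static void
model_vperm (const unsigned char sel[16], const unsigned char src[16],
             unsigned char out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = sel[i] < 16 ? src[sel[i]] : 0;
}

int
main (void)
{
  unsigned char src[16], out[16];
  for (int i = 0; i < 16; i++)
    src[i] = i + 1;                       /* distinct payload bytes */

  const unsigned char be[16] = { 16,0, 16,1, 16,2, 16,3,
                                 16,4, 16,5, 16,6, 16,7 };
  const unsigned char le[16] = { 7,16, 6,16, 5,16, 4,16,
                                 3,16, 2,16, 1,16, 0,16 };

  model_vperm (be, src, out);
  for (int i = 0; i < 16; i++)
    printf ("%d ", out[i]);               /* 0 1 0 2 0 3 ... 0 8 */
  printf ("\n");

  model_vperm (le, src, out);
  for (int i = 0; i < 16; i++)
    printf ("%d ", out[i]);               /* 8 0 7 0 6 0 ... 1 0 */
  printf ("\n");
  return 0;
}

Read right to left, which is how a little-endian target numbers its bytes, the second result is the same zero-extension that the first one gives when read left to right.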
[PATCH, rs6000] Be careful with special permute masks for little endian
Hi, In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8HI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. These changes fix 32 failures in the test suite for little endian mode. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill 2013-10-21 Bill Schmidt * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,22 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the +underlying vector type is not V4SI and V8HI, respectively. +For example, using vpkuwum with a V8HI picks up the even +halfwords (BE numbering) when the even halfwords (LE +numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuwum + && GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V4SImode) + continue; + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuhum + && GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V8HImode) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */
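A hand-worked view of the byte pairs that a vpkuwum-style mask selects shows how the same pair gets a different element number under the two numberings (a standalone sketch of the index arithmetic only, not compiler code):

#include <stdio.h>

/* The vpkuwum mask keeps bytes {2,3, 6,7, 10,11, ...} of the 32-byte
   concatenation, i.e. the low-order halfword of each word.  The same
   byte pair is halfword k counting from the left (BE numbering) but
   halfword 15-k counting from the right (LE numbering).  */
int
main (void)
{
  for (int pair = 0; pair < 8; pair++)
    {
      int first_byte = 4 * pair + 2;      /* first byte of the pair */
      int be_hw = first_byte / 2;         /* 1, 3, 5, ... (odd) */
      int le_hw = 15 - be_hw;             /* 14, 12, 10, ... (even) */
      printf ("bytes %2d,%2d -> BE halfword %2d, LE halfword %2d\n",
              first_byte, first_byte + 1, be_hw, le_hw);
    }
  return 0;
}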
Re: [PATCH, rs6000] Be careful with special permute masks for little endian
Whoops, looks like I missed some simpler cases (REG with the wrong mode instead of SUBREG with the wrong mode). Is this revised version ok, assuming it passes testing? It should fix a few more test cases. The changed code from the previous version is in the last hunk. Thanks, Bill Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,26 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the +underlying vector type is not V4SI and V8HI, respectively. +For example, using vpkuwum with a V8HI picks up the even +halfwords (BE numbering) when the even halfwords (LE +numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuwum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V4SImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V4SImode))) + continue; + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuhum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V8HImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V8HImode))) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */ On Mon, 2013-10-21 at 10:02 -0400, David Edelsohn wrote: > On Mon, Oct 21, 2013 at 8:49 AM, Bill Schmidt > wrote: > > Hi, > > > > In altivec_expand_vec_perm_const, we look for special masks that match > > the behavior of specific instructions, so we can use those instructions > > rather than load a constant control vector and perform a permute. Some > > of the masks must be treated differently for little endian mode. 
> > > > The masks that represent merge-high and merge-low operations have > > reversed meanings in little-endian, because of the reversed ordering of > > the vector elements. > > > > The masks that represent vector-pack operations remain correct when the > > mode of the input operands matches the natural mode of the instruction, > > but not otherwise. This is because the pack instructions always select > > the rightmost, low-order bits of the vector element. There are cases > > where we use this, for example, with a V8HI vector matching a vpkuwum > > mask in order to select the odd-numbered elements of the vector. In > > little endian mode, this instruction will get us the even-numbered > > elements instead. There is no alternative instruction with the desired > > behavior, so I've just disabled use of those masks for little endian > > when the mode isn't natural.
Re: [PATCH, rs6000] Be careful with special permute masks for little endian
Please hold off reviewing this. I see at least one testcase that will have to be modified (expected code generation pattern will be different for LE vs. BE). I'll resubmit the whole thing later today. Thanks, Bill On Mon, 2013-10-21 at 11:39 -0500, Bill Schmidt wrote: > Whoops, looks like I missed some simpler cases (REG with the wrong mode > instead of SUBREG with the wrong mode). Is this revised version ok, > assuming it passes testing? It should fix a few more test cases. > > The changed code from the previous version is in the last hunk. > > Thanks, > Bill > > > Index: gcc/config/rs6000/rs6000.c > === > --- gcc/config/rs6000/rs6000.c(revision 203792) > +++ gcc/config/rs6000/rs6000.c(working copy) > @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) >{ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, > { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, >{ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, >{ 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, >{ 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, >{ 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, >{ 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, >{ 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, > -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, > +{ OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, >{ 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, > { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, >{ 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, > @@ -28980,6 +28987,26 @@ altivec_expand_vec_perm_const (rtx operands[4]) > enum machine_mode omode = insn_data[icode].operand[0].mode; > enum machine_mode imode = insn_data[icode].operand[1].mode; > > + /* For little-endian, don't use vpkuwum and vpkuhum if the > + underlying vector type is not V4SI and V8HI, respectively. > + For example, using vpkuwum with a V8HI picks up the even > + halfwords (BE numbering) when the even halfwords (LE > + numbering) are what we need. */ > + if (!BYTES_BIG_ENDIAN > + && icode == CODE_FOR_altivec_vpkuwum > + && ((GET_CODE (op0) == REG > +&& GET_MODE (op0) != V4SImode) > + || (GET_CODE (op0) == SUBREG > + && GET_MODE (XEXP (op0, 0)) != V4SImode))) > + continue; > + if (!BYTES_BIG_ENDIAN > + && icode == CODE_FOR_altivec_vpkuhum > + && ((GET_CODE (op0) == REG > +&& GET_MODE (op0) != V8HImode) > + || (GET_CODE (op0) == SUBREG > + && GET_MODE (XEXP (op0, 0)) != V8HImode))) > + continue; > + > /* For little-endian, the two input operands must be swapped > (or swapped back) to ensure proper right-to-left numbering > from 0 to 2N-1. 
*/ > > > On Mon, 2013-10-21 at 10:02 -0400, David Edelsohn wrote: > > On Mon, Oct 21, 2013 at 8:49 AM, Bill Schmidt > > wrote: > > > Hi, > > > > > > In altivec_expand_vec_perm_const, we look for special masks that match > > > the behavior of specific instructions, so we can use those instructions > > > rather than load a constant control vector and perform a permute. Some > > > of the masks must be treated differently for little endian mode. > > > > > > The masks that represent merge-high and merge-low operations have > > > reversed meanings in little-endian, because of th
[PATCH, rs6000] Be careful with special permute masks for little endian, take 2
Hi, This is a revision of my earlier patch on the subject, expanded to catch a few more cases and with some attendant test-case adjustments: In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8HI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. This requires adjusting the altivec-perm-1.c test case. The vector pack tests are moved to a new altivec-perm-3.c test, which is restricted to big-endian targets. These changes fix 49 failures in the test suite for little endian mode (9 vector failures left to go!). Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill gcc: 2013-10-21 Bill Schmidt * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. gcc/testsuite: 2013-10-21 Bill Schmidt * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack tests into... * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is restricted to big-endian targets. 
Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c === --- gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ +/* { dg-options "-O -maltivec -mno-vsx" } */ + +typedef unsigned char V __attribute__((vector_size(16))); + +V p2(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); + +} + +V p4(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); +} + +/* { dg-final { scan-assembler-not "vperm" } } */ +/* { dg-final { scan-assembler "vpkuhum" } } */ +/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c === --- gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (revision 203792) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (working copy) @@ -19,19 +19,6 @@ V b4(V x) return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, }); } -V p2(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); - -} - -V p4(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); -} - V h1(V x, V y) { return __builtin_shuffle(x, y, @@ -72,5 +59,3 @@ V l4(V x, V y) /* { dg-final { scan-assembler "vspltb" } } */ /* { dg-final { scan-assembler "vsplth" } } */ /* { dg-final { scan-assembler "vspltw" } } */ -/* { dg-final { scan-assembler "vpkuhum" } } */ -/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +
[PATCH, rs6000] Fix mulv8hi3 pattern for little endian
Hi, The RTL generation for mulv8hi3 is slightly different for big and little endian modes. In the latter case, the operands of the vector-pack instruction must be reversed to get the proper interleaving. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new regressions. This fixes 3 test case failures for the little endian target. Is this ok for trunk? Thanks, Bill 2013-10-22 Bill Schmidt * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 203923) +++ gcc/config/rs6000/altivec.md(working copy) @@ -681,7 +681,10 @@ emit_insn (gen_altivec_vmrghw (high, even, odd)); emit_insn (gen_altivec_vmrglw (low, even, odd)); - emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + else + emit_insn (gen_altivec_vpkuwum (operands[0], low, high)); DONE; }")
[PATCH, testsuite/rs6000] Adjust two VMX tests for little endian
Hi, These two test cases require source changes when compiled on a little endian target. Verified on powerpc64{,le}-unknown-linux-gnu. Ok to commit? Thanks, Bill 2013-10-28 Bill Schmidt * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant. * gcc.dg/vmx/eg-5.c: Likewise. Index: gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c === --- gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c(revision 203979) +++ gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c(working copy) @@ -13,12 +13,27 @@ #define DO_INLINE __attribute__ ((always_inline)) #define DONT_INLINE __attribute__ ((noinline)) +#ifdef __LITTLE_ENDIAN__ +static inline DO_INLINE int inline_me(vector signed short data) +{ + union {vector signed short v; signed short s[8];} u; + signed short x; + unsigned char x1, x2; + + u.v = data; + x = u.s[7]; + x1 = (x >> 8) & 0xff; + x2 = x & 0xff; + return ((x2 << 8) | x1); +} +#else static inline DO_INLINE int inline_me(vector signed short data) { union {vector signed short v; signed short s[8];} u; u.v = data; return u.s[7]; } +#endif static DONT_INLINE int foo(vector signed short data) { Index: gcc/testsuite/gcc.dg/vmx/eg-5.c === --- gcc/testsuite/gcc.dg/vmx/eg-5.c (revision 203979) +++ gcc/testsuite/gcc.dg/vmx/eg-5.c (working copy) @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector float c1, vect /* Set result to a vector of f32 0's */ vector float result = ((vector float){0.,0.,0.,0.}); +#ifdef __LITTLE_ENDIAN__ + result = vec_madd (c0, vec_splat (v, 3), result); + result = vec_madd (c1, vec_splat (v, 2), result); + result = vec_madd (c2, vec_splat (v, 1), result); + result = vec_madd (c3, vec_splat (v, 0), result); +#else result = vec_madd (c0, vec_splat (v, 0), result); result = vec_madd (c1, vec_splat (v, 1), result); result = vec_madd (c2, vec_splat (v, 2), result); result = vec_madd (c3, vec_splat (v, 3), result); +#endif return result; }
[PATCH, rs6000] Correct handling of multiply high-part for little endian
Hi, When working around the peculiar little-endian semantics of the vperm instruction, our usual fix is to complement the permute control vector and swap the order of the two vector input operands, so that we get a double-wide vector in the proper order. We don't want to swap the operands when we are expanding a mult_highpart operation, however, as the two input operands are not to be interpreted as a double-wide vector. Instead they represent odd and even elements, and swapping the operands gets the odd and even elements reversed in the final result. The permute for this case is generated by target-neutral code in optabs.c: expand_mult_highpart (). We obviously can't change that code directly. However, we can redirect the logic from the "case 2" method to target-specific code by implementing expansions for the umul<mode>3_highpart and smul<mode>3_highpart operations. I've done this, with the expansions acting exactly as expand_mult_highpart does today, with the exception that it swaps the input operands to the call to expand_vec_perm when we are generating little-endian code. We will later swap them back to their original position in the code in rs6000.c: altivec_expand_vec_perm_const_le (). The change has no intended effect when generating big-endian code. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new regressions. This fixes the gcc.dg/vect/pr51581-4.c test failure for little endian. Ok for trunk? Thanks, Bill 2013-10-30 Bill Schmidt * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New prototype. * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New. * config/rs6000/altivec.md (umul<mode>3_highpart): New. (smul<mode>3_highpart): New. Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 204192) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -58,6 +58,7 @@ extern void rs6000_expand_vector_extract (rtx, rtx extern bool altivec_expand_vec_perm_const (rtx op[4]); extern void altivec_expand_vec_perm_le (rtx op[4]); extern bool rs6000_expand_vec_perm_const (rtx op[4]); +extern bool altivec_expand_mul_highpart (rtx op[3], bool); extern void rs6000_expand_extract_even (rtx, rtx, rtx); extern void rs6000_expand_interleave (rtx, rtx, rtx, bool); extern void build_mask64_2_operands (rtx, rtx *); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 204192) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -29249,6 +29249,58 @@ rs6000_do_expand_vec_perm (rtx target, rtx op0, rt emit_move_insn (target, x); } +/* Expand an Altivec multiply high-part. The logic matches the + general logic in optabs.c:expand_mult_highpart, but swaps the + inputs for little endian. Note that we will swap them again + during the permute; this is the one case where we don't want + the operands swapped, as if we do we get the even and odd values + reversed. */ + +bool +altivec_expand_mul_highpart (rtx operands[3], bool uns_p) +{ + struct expand_operand eops[3]; + rtx m1, m2, perm, tmp; + int i; + + optab tab1 = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab; + optab tab2 = uns_p ? 
vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; + enum machine_mode mode = GET_MODE (operands[0]); + enum insn_code icode = optab_handler (tab1, mode); + int nunits = GET_MODE_NUNITS (mode); + enum machine_mode wmode = insn_data[icode].operand[0].mode; + rtvec v = rtvec_alloc (nunits); + + create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode); + create_input_operand (&eops[1], operands[1], mode); + create_input_operand (&eops[2], operands[2], mode); + expand_insn (icode, 3, eops); + m1 = gen_lowpart (mode, eops[0].value); + + create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode); + create_input_operand (&eops[1], operands[1], mode); + create_input_operand (&eops[2], operands[2], mode); + expand_insn (optab_handler (tab2, mode), 3, eops); + m2 = gen_lowpart (mode, eops[0].value); + + for (i = 0; i < nunits; ++i) +RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1) + + ((i & 1) ? nunits : 0)); + + perm = gen_rtx_CONST_VECTOR (mode, v); + + if (!BYTES_BIG_ENDIAN) { +tmp = m1; +m1 = m2; +m2 = tmp; + } + + perm = expand_vec_perm (mode, m1, m2, perm, NULL_RTX); + emit_move_insn (operands[0], perm); + + return true; +} + /* Expand an extract even operation. */ void Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 204192) +++ gcc/config/rs6000/altivec.md(working copy) @@ -1401,6 +1401,30 @@ FAIL; }) +(define_expand "umul<mode>3_highpart" + [(
[PATCH, rs6000] Fix rs6000_expand_vector_set for little endian
Hi, Brooks Moses reported a bug with code that sets a single element of a vector to a given value and the rest of the vector to zero. This is implemented in rs6000_expand_vector_set, which uses a vperm instruction to place the nonzero value. As usual, we need to adjust the permute control vector and swap the order of the input operands. I added a test case based on the bug report. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. The new test now passes for both endiannesses. Is this ok for trunk? Thanks, Bill gcc: 2013-10-31 Bill Schmidt * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for little endian. gcc/testsuite: 2013-10-31 Bill Schmidt * gcc.dg/vmx/vec-set.c: New. Index: gcc/testsuite/gcc.dg/vmx/vec-set.c === --- gcc/testsuite/gcc.dg/vmx/vec-set.c (revision 0) +++ gcc/testsuite/gcc.dg/vmx/vec-set.c (revision 0) @@ -0,0 +1,14 @@ +#include "harness.h" + +vector short +vec_set (short m) +{ + return (vector short){m, 0, 0, 0, 0, 0, 0, 0}; +} + +static void test() +{ + check (vec_all_eq (vec_set (7), +((vector short){7, 0, 0, 0, 0, 0, 0, 0})), +"vec_set"); +} Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 204192) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5529,10 +5529,27 @@ rs6000_expand_vector_set (rtx target, rtx val, int XVECEXP (mask, 0, elt*width + i) = GEN_INT (i + 0x10); x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0)); - x = gen_rtx_UNSPEC (mode, - gen_rtvec (3, target, reg, -force_reg (V16QImode, x)), - UNSPEC_VPERM); + + if (!BYTES_BIG_ENDIAN) +{ + /* Invert selector. */ + rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode, +gen_rtx_CONST_INT (QImode, -1)); + rtx tmp = gen_reg_rtx (V16QImode); + emit_move_insn (tmp, splat); + x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x)); + emit_move_insn (tmp, x); + + /* Permute with operands reversed and adjusted selector. */ + x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp), + UNSPEC_VPERM); +} + else +x = gen_rtx_UNSPEC (mode, + gen_rtvec (3, target, reg, + force_reg (V16QImode, x)), + UNSPEC_VPERM); + emit_insn (gen_rtx_SET (VOIDmode, target, x)); }
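The subtract-from-splat trick relies on vperm honouring only the low five bits of each selector byte, so subtracting from a splat of -1 (0xFF bytes) behaves like subtracting from 31; a quick standalone check of that equivalence (my own sketch, not compiler code):

#include <stdio.h>

/* Each selector byte s becomes 255 - s after the vector subtract from
   a splat of -1.  vperm masks the selector to five bits, and
   255 - s = 224 + (31 - s) with 224 a multiple of 32, so the hardware
   sees 31 - s: index s into the 32-byte concatenation is mirrored.
   Swapping the two input vectors then puts each index back into the
   vector it was meant to address.  */
int
main (void)
{
  int ok = 1;
  for (int s = 0; s < 32; s++)
    if (((255 - s) & 31) != 31 - s)
      ok = 0;
  printf ("255 - s == 31 - s (mod 32) for s in 0..31: %s\n",
          ok ? "yes" : "no");
  return 0;
}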
[Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]
Hi maintainers, I agree with David that duplicating this code is a bad approach. What he and I would both prefer is to add a target hook to account for an anomaly in the PowerPC architecture. Background: For historical reasons, our vperm instruction (which is produced for gen_vec_perm) has some peculiar semantics in little endian mode. The permute control vector is interpreted to contain elements indexed in big-endian order no matter which endian mode the processor is set to. We can work around this in little endian mode by inverting the permute control vector and swapping the order of the other two input vectors, thus obtaining the same semantics as we would get for big endian. This behavior works when the two vectors are being treated as a single double-wide vector; the swapping is needed to make the long array appear as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5]. In the specific case of expand_mult_highpart (), the two vectors are not a single double-wide vector, but instead contain the odd and even results of widening multiplies. In this case, we still need to invert the permute control vector, but we don't want to swap the operands, because that causes us to mix up the odd and even results. This is the only such case we've run into where the swap is not what we need to obtain the right semantics. What we would like to do is swap the operands an extra time in expand_mult_highpart (), so that our common code will then swap the operands back to their original position. But since this is in target-independent code, we would need a target hook to do this. Something like: if (targetm.swap_vperm_inputs ()) { rtx tmp = m1; m1 = m2; m2 = tmp; } For PowerPC, the target hook would return !BYTES_BIG_ENDIAN. The default implementation for all other targets would return false. Would you find such an approach tolerable? Thanks, Bill --- Begin Message --- On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt wrote: > Hi, > > When working around the peculiar little-endian semantics of the vperm > instruction, our usual fix is to complement the permute control vector > and swap the order of the two vector input operands, so that we get a > double-wide vector in the proper order. We don't want to swap the > operands when we are expanding a mult_highpart operation, however, as > the two input operands are not to be interpreted as a double-wide > vector. Instead they represent odd and even elements, and swapping the > operands gets the odd and even elements reversed in the final result. > > The permute for this case is generated by target-neutral code in > optabs.c: expand_mult_highpart (). We obviously can't change that code > directly. However, we can redirect the logic from the "case 2" method > to target-specific code by implementing expansions for the > umul3_highpart and smul3_highpart operations. I've done > this, with the expansions acting exactly as expand_mult_highpart does > today, with the exception that it swaps the input operands to the call > to expand_vec_perm when we are generating little-endian code. We will > later swap them back to their original position in the code in rs6000.c: > altivec_expand_vec_perm_const_le (). > > The change has no intended effect when generating big-endian code. > > Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new > regressions. This fixes the gcc.dg/vect/pr51581-4.c test failure for > little endian. Ok for trunk? 
> > Thanks, > Bill > > 2013-10-30 Bill Schmidt > > * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New > prototype. > * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New. > * config/rs6000/altivec.md (umul<mode>3_highpart): New. > (smul<mode>3_highpart): New. I really do not like duplicating this code. I think that you need to explore with the community the possibility of including a hook in the general code to handle the strangeness of PPC LE vector semantics. This is asking for problems if the generic code is updated / modified / fixed. - David --- End Message ---
Re: [Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]
After discussing this with Richard S at some length today, I want to withdraw this for now and re-examine the issue. I don't feel I understand this as well as I thought I did... ;) Thanks, Bill On Thu, 2013-10-31 at 21:06 -0500, Bill Schmidt wrote: > Hi maintainers, > > I agree with David that duplicating this code is a bad approach. What > he and I would both prefer is to add a target hook to account for an > anomaly in the PowerPC architecture. > > Background: For historical reasons, our vperm instruction (which is > produced for gen_vec_perm) has some peculiar semantics in little endian > mode. The permute control vector is interpreted to contain elements > indexed in big-endian order no matter which endian mode the processor is > set to. We can work around this in little endian mode by inverting the > permute control vector and swapping the order of the other two input > vectors, thus obtaining the same semantics as we would get for big > endian. > > This behavior works when the two vectors are being treated as a single > double-wide vector; the swapping is needed to make the long array appear > as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5]. > > In the specific case of expand_mult_highpart (), the two vectors are not > a single double-wide vector, but instead contain the odd and even > results of widening multiplies. In this case, we still need to invert > the permute control vector, but we don't want to swap the operands, > because that causes us to mix up the odd and even results. This is the > only such case we've run into where the swap is not what we need to > obtain the right semantics. > > What we would like to do is swap the operands an extra time in > expand_mult_highpart (), so that our common code will then swap the > operands back to their original position. But since this is in > target-independent code, we would need a target hook to do this. > Something like: > > if (targetm.swap_vperm_inputs ()) > > { > > rtx tmp = m1; > > m1 = m2; > > m2 = tmp; > > } > > > For PowerPC, the target hook would return !BYTES_BIG_ENDIAN. The > default implementation for all other targets would return false. > > Would you find such an approach tolerable? > > Thanks, > Bill > email message attachment (Re: [PATCH, rs6000] Correct handling of > multiply high-part for little endian), "Forwarded message - Re: > [PATCH, rs6000] Correct handling of multiply high-part for little > endian" > > Forwarded Message > > From: David Edelsohn > > To: Bill Schmidt > > Cc: GCC Patches > > Subject: Re: [PATCH, rs6000] Correct handling of multiply high-part > > for little endian > > Date: Wed, 30 Oct 2013 20:06:37 -0400 > > > > On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt > > wrote: > > > Hi, > > > > > > When working around the peculiar little-endian semantics of the vperm > > > instruction, our usual fix is to complement the permute control vector > > > and swap the order of the two vector input operands, so that we get a > > > double-wide vector in the proper order. We don't want to swap the > > > operands when we are expanding a mult_highpart operation, however, as > > > the two input operands are not to be interpreted as a double-wide > > > vector. Instead they represent odd and even elements, and swapping the > > > operands gets the odd and even elements reversed in the final result. > > > > > > The permute for this case is generated by target-neutral code in > > > optabs.c: expand_mult_highpart (). We obviously can't change that code > > > directly. 
However, we can redirect the logic from the "case 2" method > > > to target-specific code by implementing expansions for the > > > umul<mode>3_highpart and smul<mode>3_highpart operations. I've done > > > this, with the expansions acting exactly as expand_mult_highpart does > > > today, with the exception that it swaps the input operands to the call > > > to expand_vec_perm when we are generating little-endian code. We will > > > later swap them back to their original position in the code in rs6000.c: > > > altivec_expand_vec_perm_const_le ().
[PATCH, rs6000] Re-permute source register for postreload splits of VSX LE stores
Hi, When I created the patch to split VSX loads and stores to add permutes so the register image was correctly little endian, I forgot to implement a known requirement. When a VSX store is split after reload, we must reuse the source register for the permute, which leaves it in a swapped state after the store. If the register remains live, this is invalid. After the store, we need to permute the register back to its original value. For each of the little endian VSX store patterns, I've replaced the define_insn_and_split with a define_insn and two define_splits, one for prior to reload and one for after reload. The post-reload split has the extra behavior. I don't know of a way to set the insn's length attribute conditionally, so I'm optimistically setting this to 8, though it will be 12 in the post-reload case. Is there any concern with that? Is there a way to make it fully accurate? Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. This fixes two failing test cases. Is this ok for trunk? Thanks! Bill 2013-11-02 Bill Schmidt * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D): Replace the define_insn_and_split with a define_insn and two define_splits, with the split after reload re-permuting the source register to its original value. (*vsx_le_perm_store_<mode> for VSX_W): Likewise. (*vsx_le_perm_store_v8hi): Likewise. (*vsx_le_perm_store_v16qi): Likewise. Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 204192) +++ gcc/config/rs6000/vsx.md(working copy) @@ -333,12 +333,18 @@ [(set_attr "type" "vecload") (set_attr "length" "8")]) -(define_insn_and_split "*vsx_le_perm_store_<mode>" +(define_insn "*vsx_le_perm_store_<mode>" [(set (match_operand:VSX_D 0 "memory_operand" "=Z") (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))] "!BYTES_BIG_ENDIAN && TARGET_VSX" "#" - "!BYTES_BIG_ENDIAN && TARGET_VSX" + [(set_attr "type" "vecstore") + (set_attr "length" "8")]) + +(define_split + [(set (match_operand:VSX_D 0 "memory_operand" "") +(match_operand:VSX_D 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" [(set (match_dup 2) (vec_select:<MODE> (match_dup 1) @@ -347,21 +353,43 @@ (vec_select:<MODE> (match_dup 2) (parallel [(const_int 1) (const_int 0)])))] - " { operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) : operands[1]; -} - " - [(set_attr "type" "vecstore") - (set_attr "length" "8")]) +}) -(define_insn_and_split "*vsx_le_perm_store_<mode>" +;; The post-reload split requires that we re-permute the source +;; register in case it is still live. 
+(define_split + [(set (match_operand:VSX_D 0 "memory_operand" "") +(match_operand:VSX_D 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" + [(set (match_dup 1) +(vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 0) +(vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 1) +(vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)])))] + "") + +(define_insn "*vsx_le_perm_store_<mode>" + [(set (match_operand:VSX_W 0 "memory_operand" "=Z") + (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + [(set_attr "type" "vecstore") + (set_attr "length" "8")]) + +(define_split + [(set (match_operand:VSX_W 0 "memory_operand" "") +(match_operand:VSX_W 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" [(set (match_dup 2) (vec_select:<MODE> (match_dup 1) @@ -372,21 +400,46 @@ (match_dup 2) (parallel [(const_int 2) (const_int 3) (const_int 0) (const_int 1)])))] - " { operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) : operands[1]; -} - " - [(set_attr "type" "vecstore") - (set_attr "length" "8")])
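The post-reload idea is easy to model in scalar code (a sketch of the shape only, with invented names such as swap_dwords; the real splits emit doubleword vec_select permutes rather than memcpy):

#include <stdio.h>
#include <string.h>

/* Model of the post-reload store split: swap the two doublewords of
   the source register in place, store it, then swap back.  Swapping
   twice is the identity, so the register keeps its value in case it
   is still live after the store.  */
static void
swap_dwords (unsigned long long v[2])
{
  unsigned long long t = v[0];
  v[0] = v[1];
  v[1] = t;
}

int
main (void)
{
  unsigned long long reg[2] = { 0x1111222233334444ULL,
                                0x5555666677778888ULL };
  unsigned long long saved[2], mem[2];
  memcpy (saved, reg, sizeof reg);

  swap_dwords (reg);                 /* first permute of the source */
  memcpy (mem, reg, sizeof reg);     /* the store itself */
  swap_dwords (reg);                 /* permute back: reg stays live */

  printf ("register restored: %s\n",
          memcmp (reg, saved, sizeof reg) ? "no" : "yes");
  return 0;
}

Three operations instead of two, which is where the 12-byte post-reload length comes from.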
[PATCH, rs6000] (0/3) Patch set to fix multiply even/odd problem and associated fallout for little endian
Hi, This set of patches addresses the problem with vector multiply even/odd instructions in little endian mode that I incorrectly attempted to address as part of expand_mult_highpart. (Thanks to Richard Sandiford for setting me on the right path.) The first patch fixes the root problem wherein the "even" multiply instructions actually process odd elements in little endian mode, and vice versa. However, fixing this problem exposed several other issues, necessitating the other two patches. The second patch addresses the vector widening multiply high/low operations by swapping the input operands of the merge high and merge low instructions. Those operands are multiply-even and multiply-odd instructions, which now have reversed meanings, so swapping the operands gets us back to correct behavior. The third patch addresses two other multiplication expansions: mulv4si3 and mulv8hi3. The first needs an exception to the rule of swapping the even and odd multiply instructions, and the second again needs to swap inputs to the merge high and merge low instructions. The net effect of these three patches is to fix one failing test case, without any regressions for either endianness. Thanks, Bill
[PATCH, rs6000] (1/3) Reverse meanings of multiply even/odd for little endian
Hi, This patch reverses the meanings of multiply even/odd instructions for little endian. Since these instructions use a big-endian idea of evenness/oddness, the nominal meanings of the instructions are wrong for little endian. Bootstrapped and tested with the rest of the patch set on powerpc64{,le}-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill 2013-11-03 Bill Schmidt * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Swap meanings of even and odd multiplies for little endian. (vec_widen_smult_even_v16qi): Likewise. (vec_widen_umult_even_v8hi): Likewise. (vec_widen_smult_even_v8hi): Likewise. (vec_widen_umult_odd_v16qi): Likewise. (vec_widen_smult_odd_v16qi): Likewise. (vec_widen_umult_odd_v8hi): Likewise. (vec_widen_smult_odd_v8hi): Likewise. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 204192) +++ gcc/config/rs6000/altivec.md(working copy) @@ -978,7 +988,12 @@ (match_operand:V16QI 2 "register_operand" "v")] UNSPEC_VMULEUB))] "TARGET_ALTIVEC" - "vmuleub %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmuleub %0,%1,%2"; + else +return "vmuloub %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_smult_even_v16qi" @@ -987,7 +1002,12 @@ (match_operand:V16QI 2 "register_operand" "v")] UNSPEC_VMULESB))] "TARGET_ALTIVEC" - "vmulesb %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmulesb %0,%1,%2"; + else +return "vmulosb %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_umult_even_v8hi" @@ -996,7 +1016,12 @@ (match_operand:V8HI 2 "register_operand" "v")] UNSPEC_VMULEUH))] "TARGET_ALTIVEC" - "vmuleuh %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmuleuh %0,%1,%2"; + else +return "vmulouh %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_smult_even_v8hi" @@ -1005,7 +1030,12 @@ (match_operand:V8HI 2 "register_operand" "v")] UNSPEC_VMULESH))] "TARGET_ALTIVEC" - "vmulesh %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmulesh %0,%1,%2"; + else +return "vmulosh %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_umult_odd_v16qi" @@ -1014,7 +1044,12 @@ (match_operand:V16QI 2 "register_operand" "v")] UNSPEC_VMULOUB))] "TARGET_ALTIVEC" - "vmuloub %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmuloub %0,%1,%2"; + else +return "vmuleub %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_smult_odd_v16qi" @@ -1023,7 +1058,12 @@ (match_operand:V16QI 2 "register_operand" "v")] UNSPEC_VMULOSB))] "TARGET_ALTIVEC" - "vmulosb %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmulosb %0,%1,%2"; + else +return "vmulesb %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_umult_odd_v8hi" @@ -1032,7 +1072,12 @@ (match_operand:V8HI 2 "register_operand" "v")] UNSPEC_VMULOUH))] "TARGET_ALTIVEC" - "vmulouh %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmulouh %0,%1,%2"; + else +return "vmuleuh %0,%1,%2"; +} [(set_attr "type" "veccomplex")]) (define_insn "vec_widen_smult_odd_v8hi" @@ -1041,7 +1086,12 @@ (match_operand:V8HI 2 "register_operand" "v")] UNSPEC_VMULOSH))] "TARGET_ALTIVEC" - "vmulosh %0,%1,%2" +{ + if (BYTES_BIG_ENDIAN) +return "vmulosh %0,%1,%2"; + else +return "vmulesh %0,%1,%2"; +} [(set_attr "type" "veccomplex")])
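The renaming is pure lane arithmetic; a tiny standalone sketch (mine, not GCC code) of why BE-even lanes are exactly LE-odd lanes when the element count is even:

#include <stdio.h>

/* Lane i counted from the left (BE numbering) is lane nunits-1-i
   counted from the right (LE numbering).  With an even nunits the
   parity always flips, so the "even" multiply instructions touch the
   odd elements in LE terms, and vice versa.  */
int
main (void)
{
  const int nunits = 8;                   /* V8HI */
  for (int i = 0; i < nunits; i++)
    printf ("BE lane %d (%s) is LE lane %d (%s)\n",
            i, i % 2 ? "odd" : "even",
            nunits - 1 - i, (nunits - 1 - i) % 2 ? "odd" : "even");
  return 0;
}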
[PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian
Hi, This patch fixes the widening multiply high/low operations to work correctly in the presence of the first patch of this series, which reverses the meanings of multiply even/odd instructions. Here we reorder the input operands to the vector merge low/high instructions. The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on; that is, we need to reverse the usage of merge high and merge low, and also swap their inputs, to obtain the same semantics. In this case we are only swapping the inputs, because the reversed usage of high and low has already been done for us in the generic handling code for VEC_WIDEN_MULT_LO_EXPR. Bootstrapped and tested with the rest of the patch set on powerpc64{,le}-unknown-linux-gnu, with no regressions. Is this ok for trunk? Thanks, Bill 2013-11-03 Bill Schmidt * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap arguments to merge instruction for little endian. (vec_widen_umult_lo_v16qi): Likewise. (vec_widen_smult_hi_v16qi): Likewise. (vec_widen_smult_lo_v16qi): Likewise. (vec_widen_umult_hi_v8hi): Likewise. (vec_widen_umult_lo_v8hi): Likewise. (vec_widen_smult_hi_v8hi): Likewise. (vec_widen_smult_lo_v8hi): Likewise. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 204192) +++ gcc/config/rs6000/altivec.md(working copy) @@ -2185,7 +2235,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }") @@ -2202,7 +2255,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }") @@ -2219,7 +2275,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }") @@ -2236,7 +2295,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }") @@ -2253,7 +2315,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); DONE; }") @@ -2270,7 +2335,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else +emit_insn 
(gen_altivec_vmrglw (operands[0], vo, ve)); DONE; }") @@ -2287,7 +2355,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); DONE; }") @@ -2304,7 +2375,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) +emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else +emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); DONE; }")
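The stated rule can be checked with a scalar model (a standalone sketch with invented helper names; reversing an array stands in for how a little-endian register images a logical vector):

#include <stdio.h>
#include <string.h>

/* Hardware-view merges over BE-numbered halfword lanes: vmrghh
   interleaves lanes 0..3 of its inputs, vmrglh lanes 4..7.  */
static void
vmrghh (const int a[8], const int b[8], int r[8])
{
  for (int k = 0; k < 4; k++)
    {
      r[2 * k] = a[k];
      r[2 * k + 1] = b[k];
    }
}

static void
vmrglh (const int a[8], const int b[8], int r[8])
{
  for (int k = 0; k < 4; k++)
    {
      r[2 * k] = a[4 + k];
      r[2 * k + 1] = b[4 + k];
    }
}

static void
reverse (const int a[8], int r[8])       /* LE register image */
{
  for (int i = 0; i < 8; i++)
    r[i] = a[7 - i];
}

int
main (void)
{
  int x[8], y[8], want[8], xr[8], yr[8], reg[8], got[8];
  for (int i = 0; i < 8; i++)
    {
      x[i] = i;
      y[i] = 10 + i;
    }

  vmrghh (x, y, want);          /* the logical merge-high result */

  reverse (x, xr);              /* register images on an LE target */
  reverse (y, yr);
  vmrglh (yr, xr, reg);         /* vmrglh with swapped inputs ... */
  reverse (reg, got);           /* ... read back in logical order */

  printf ("vmrghh(x,y) [BE] == vmrglh(y,x) [LE]: %s\n",
          memcmp (want, got, sizeof want) ? "no" : "yes");
  return 0;
}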
[PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian
Hi, This patch contains two more fixes related to the multiply even/odd problem. First, it changes the mulv4si3 pattern so that it always uses the vmulouh (vector multiply odd halfword) instruction regardless of endianness. The reason for this is that we are not really multiplying halfwords, but are multiplying words that have been truncated to halfword length; therefore they always sit in the right half of each word, which is the odd-numbered halfword in big-endian numbering. The fix for mulv8hi3 is another case where reversing the meanings of the multiply even/odd instructions requires us to reverse the order of inputs on the merge high/low instructions to compensate. Bootstrapped and tested with the rest of the patch set on powerpc64{,le}-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill 2013-11-03 Bill Schmidt * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh for both big and little endian. (mulv8hi3): Swap input operands for merge high and merge low instructions. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 204192) +++ gcc/config/rs6000/altivec.md(working copy) @@ -651,7 +651,12 @@ convert_move (small_swap, swap, 0); low_product = gen_reg_rtx (V4SImode); - emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two)); + /* We need this to be vmulouh for both big and little endian, + but for little endian we would swap this, so avoid that. */ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two)); + else + emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two)); high_product = gen_reg_rtx (V4SImode); emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero)); @@ -678,13 +683,18 @@ emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (high, even, odd)); - emit_insn (gen_altivec_vmrglw (low, even, odd)); - if (BYTES_BIG_ENDIAN) - emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + { + emit_insn (gen_altivec_vmrghw (high, even, odd)); + emit_insn (gen_altivec_vmrglw (low, even, odd)); + emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + } else - emit_insn (gen_altivec_vpkuwum (operands[0], low, high)); + { + emit_insn (gen_altivec_vmrghw (high, odd, even)); + emit_insn (gen_altivec_vmrglw (low, odd, even)); + emit_insn (gen_altivec_vpkuwum (operands[0], low, high)); + } DONE; }")
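The truncated-word observation is just bit significance; a small standalone illustration (not compiler code, and the sample value is my own):

#include <stdio.h>

/* A 32-bit element truncated to 16 bits keeps only the low-order half
   of each word.  Numbering the two halfwords of a word from the left
   (BE), the low-order half is the second one, so it is always an
   odd-numbered halfword in BE terms, whichever endianness the target
   runs in.  */
int
main (void)
{
  unsigned int word = 0x00012345;          /* truncated value 0x2345 */
  unsigned short halves_be[2] = { word >> 16, word & 0xffff };
  printf ("BE halfword 0: %#06x, BE halfword 1: %#06x\n",
          halves_be[0], halves_be[1]);     /* data is in the odd one */
  return 0;
}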
[PATCH, rs6000] Fix vec_pack_trunc_v2df pattern for little endian
Hi,

This cleans up another case where a vector-pack operation needs to reverse its operand order for little endian. This fixes the last remaining vector test failure in the test suite when building for little endian.

Next I'll be spending some time looking for additional little endian issues for which we don't yet have test coverage. As an example, there are several patterns similar to vec_pack_trunc_v2df that probably need the same treatment as this patch, but we don't have any test cases that expose a problem. I need to verify those are broken and add test cases when fixing them.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

2013-11-04  Bill Schmidt

        * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for
        little endian.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 204320)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -830,7 +830,12 @@
 
   emit_insn (gen_vsx_xvcvdpsp (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpsp (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+    rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+    rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
 })
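A standalone model (mine, not GCC code) of why the operand swap works: emulate xvcvdpsp putting each converted double into an even hardware (BE-numbered) lane, apply the BE extract-even with the operands reversed, and read the result back with LE lane numbering.

#include <assert.h>

int
main (void)
{
  double op1[2] = {1.0, 2.0}, op2[2] = {3.0, 4.0};
  float r1[4], r2[4], out[4];

  /* xvcvdpsp: DP lane i -> SP lane 2*i (hardware lanes).  Under LE
     numbering, logical DP element j sits in hardware lane 1-j.  */
  for (int i = 0; i < 2; i++)
    {
      r1[2 * i] = (float) op1[1 - i];
      r2[2 * i] = (float) op2[1 - i];
      r1[2 * i + 1] = r2[2 * i + 1] = 0.0f;  /* don't-care slots */
    }

  /* BE extract-even with the operands swapped, as in the patch.  */
  out[0] = r2[0]; out[1] = r2[2]; out[2] = r1[0]; out[3] = r1[2];

  /* The LE program reads logical element k from hardware lane 3-k
     and must see {op1[0], op1[1], op2[0], op2[1]}.  */
  float expect[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  for (int k = 0; k < 4; k++)
    assert (out[3 - k] == expect[k]);
  return 0;
}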
Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian
On Mon, 2013-11-04 at 15:48 +, Richard Sandiford wrote:
> Bill Schmidt writes:
> > +  /* We need this to be vmulouh for both big and little endian,
> > +     but for little endian we would swap this, so avoid that.  */
> > +  if (BYTES_BIG_ENDIAN)
> > +    emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> > +  else
> > +    emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));
> 
> FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
> define_expands and have define_insns for the underlying instructions.
> E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
> depending on endianness.
> 
> Then the unspec name would always match the instruction, and you could
> also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.
> 
> It probably works out as more code overall, but maybe it means jumping
> through fewer mental hoops...

Good idea. I'll have a look and produce a new patch set shortly.

Thanks,
Bill

> 
> Thanks,
> Richard
Re: [PATCH, rs6000] (2/3) Fix widening multiply high/low operations for little endian
Per Richard S's suggestion, I'm reworking parts 1 and 3 of the patch set, but this one will remain unchanged and is ready for review.

Thanks,
Bill

On Sun, 2013-11-03 at 23:34 -0600, Bill Schmidt wrote:
> Hi,
> 
> This patch fixes the widening multiply high/low operations to work
> correctly in the presence of the first patch of this series, which
> reverses the meanings of multiply even/odd instructions.  Here we
> reorder the input operands to the vector merge low/high instructions.
> 
> The general rule is that vmrghh(x,y) [BE] = vmrglh(y,x) [LE], and so on;
> that is, we need to reverse the usage of merge high and merge low, and
> also swap their inputs, to obtain the same semantics.  In this case we
> are only swapping the inputs, because the reversed usage of high and low
> has already been done for us in the generic handling code for
> VEC_WIDEN_MULT_LO_EXPR.
> 
> Bootstrapped and tested with the rest of the patch set on
> powerpc64{,le}-unknown-linux-gnu, with no regressions.  Is this ok for
> trunk?
> 
> Thanks,
> Bill
> 
> 
> 2013-11-03  Bill Schmidt
> 
>         * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
>         arguments to merge instruction for little endian.
>         (vec_widen_umult_lo_v16qi): Likewise.
>         (vec_widen_smult_hi_v16qi): Likewise.
>         (vec_widen_smult_lo_v16qi): Likewise.
>         (vec_widen_umult_hi_v8hi): Likewise.
>         (vec_widen_umult_lo_v8hi): Likewise.
>         (vec_widen_smult_hi_v8hi): Likewise.
>         (vec_widen_smult_lo_v8hi): Likewise.
> 
> 
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md	(revision 204192)
> +++ gcc/config/rs6000/altivec.md	(working copy)
> @@ -2185,7 +2235,10 @@
> 
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2202,7 +2255,10 @@
> 
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2219,7 +2275,10 @@
> 
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2236,7 +2295,10 @@
> 
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2253,7 +2315,10 @@
> 
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2270,7 +2335,10 @@
> 
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>    DONE;
> }")
> 
> @@ -2287,7 +2355,10 @@
> 
>    emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> 
Re: [PATCH, rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian
Hi,

Here's a revised version of this patch according to Richard's suggestions. It differs from the previous version only in the method used to ensure vmulouh is generated; we now call the new gen_altivec_vmulouh to accomplish this.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

2013-11-04  Bill Schmidt

        * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
        for both big and little endian.
        (mulv8hi3): Swap input operands for merge high and merge low
        instructions for little endian.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204350)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -651,7 +651,7 @@
 
   convert_move (small_swap, swap, 0);
   low_product = gen_reg_rtx (V4SImode);
-  emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
+  emit_insn (gen_altivec_vmulouh (low_product, one, two));
 
   high_product = gen_reg_rtx (V4SImode);
   emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
@@ -678,13 +678,18 @@
 
   emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
   emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrghw (high, even, odd));
-  emit_insn (gen_altivec_vmrglw (low, even, odd));
-
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+    {
+      emit_insn (gen_altivec_vmrghw (high, even, odd));
+      emit_insn (gen_altivec_vmrglw (low, even, odd));
+      emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+    }
   else
-    emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+    {
+      emit_insn (gen_altivec_vmrghw (high, odd, even));
+      emit_insn (gen_altivec_vmrglw (low, odd, even));
+      emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+    }
   DONE;
 }")
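For reference, the whole mulv8hi3 expansion can be modeled in scalar C. This sketch (mine) widen-multiplies even and odd halfword pairs, interleaves the 32-bit products back into element order (the merge step), and packs each word to its low halfword (the vpkuwum step), which is exactly elementwise multiplication modulo 2^16:

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  int16_t a[8] = {1, -2, 300, -400, 50, 60, -7000, 8000};
  int16_t b[8] = {9, 8, -7, 6, -5, 4, 3, -2};
  int32_t even[4], odd[4], merged[8];

  for (int i = 0; i < 4; i++)
    {
      even[i] = (int32_t) a[2 * i] * b[2 * i];          /* vmulesh */
      odd[i] = (int32_t) a[2 * i + 1] * b[2 * i + 1];   /* vmulosh */
    }

  /* vmrghw/vmrglw: interleave the products back into element order.  */
  for (int i = 0; i < 4; i++)
    {
      merged[2 * i] = even[i];
      merged[2 * i + 1] = odd[i];
    }

  /* vpkuwum: keep the low halfword of each 32-bit product.  */
  for (int i = 0; i < 8; i++)
    assert ((int16_t) merged[i] == (int16_t) (a[i] * b[i]));
  return 0;
}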
Re: [PATCH, rs6000] (1/3) Reverse meanings of multiply even/odd for little endian
Hi,

Here's a new version of this patch, revised according to Richard Sandiford's suggestions. Unfortunately the diffing is a little bit ugly for this version.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

2013-11-04  Bill Schmidt

        * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
        define_insn to define_expand that uses even patterns for big
        endian and odd patterns for little endian.
        (vec_widen_smult_even_v16qi): Likewise.
        (vec_widen_umult_even_v8hi): Likewise.
        (vec_widen_smult_even_v8hi): Likewise.
        (vec_widen_umult_odd_v16qi): Likewise.
        (vec_widen_smult_odd_v16qi): Likewise.
        (vec_widen_umult_odd_v8hi): Likewise.
        (vec_widen_smult_odd_v8hi): Likewise.
        (altivec_vmuleub): New define_insn.
        (altivec_vmuloub): Likewise.
        (altivec_vmulesb): Likewise.
        (altivec_vmulosb): Likewise.
        (altivec_vmuleuh): Likewise.
        (altivec_vmulouh): Likewise.
        (altivec_vmulesh): Likewise.
        (altivec_vmulosh): Likewise.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204350)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -972,7 +977,111 @@
   "vmrgow %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "vec_widen_umult_even_v16qi"
+(define_expand "vec_widen_umult_even_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_even_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_even_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_even_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_odd_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (ge
Re: [PATCH, rs6000] Fix vec_pack_trunc_v2df pattern for little endian
Hi,

This fixes the two companion patterns vec_pack_[su]fix_trunc_v2df in the same manner as the recent fix for vec_pack_trunc_v2df. The same fix obviously applies here as well.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

2013-11-04  Bill Schmidt

        * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
        little endian.
        (vec_pack_ufix_trunc_v2df): Likewise.

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 204349)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -850,7 +850,12 @@
 
   emit_insn (gen_vsx_xvcvdpsxws (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpsxws (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+    rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+    rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
 })
 
@@ -865,7 +870,12 @@
 
   emit_insn (gen_vsx_xvcvdpuxws (r1, operands[1]));
   emit_insn (gen_vsx_xvcvdpuxws (r2, operands[2]));
-  rs6000_expand_extract_even (operands[0], r1, r2);
+
+  if (BYTES_BIG_ENDIAN)
+    rs6000_expand_extract_even (operands[0], r1, r2);
+  else
+    rs6000_expand_extract_even (operands[0], r2, r1);
+
   DONE;
 })
Re: libsanitizer merge from upstream r191666
Hi Peter,

The buildbot shows the latest LLVM ppc64 build is working ok:

http://lab.llvm.org:8011/builders/llvm-ppc64-linux2/builds/8086

This build completed about two hours ago. Hope this helps,

Bill

On Mon, 2013-11-04 at 20:02 -0600, Peter Bergner wrote:
> On Mon, 2013-11-04 at 17:48 -0800, Konstantin Serebryany wrote:
> > Hi Peter.
> > Does this also mean that asan in llvm trunk is broken for Power?
> > We'll need to fix it there too (or, in fact, first).
> 
> I'm not sure.  Bill, can you fire off a quick LLVM trunk build on
> powerpc64-linux and see if you see the same build error I am seeing
> on gcc trunk?  Namely:
> 
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00312.html
> 
> Thanks.  ...and I'll have to have you teach me how to fire off
> an LLVM build one of these days so I can do this myself.  :)
> 
> Peter
[PATCH, rs6000] Fix PR63354
Hi,

Anton Blanchard proposed a fix to his own bug report in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63354, but never submitted the patch upstream. I've added a formal test case and am submitting on his behalf.

The patch simply ensures that we don't stack a frame for leaf procedures when called with -pg -mprofile-kernel. The automatically generated calls to _mcount occur prior to the prolog and do not require us to stack a frame.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

[gcc]

2016-01-21  Anton Blanchard
            Bill Schmidt

        PR target/63354
        * config/rs6000/linux64.h (TARGET_KEEP_LEAF_WHEN_PROFILED): New
        #define.
        * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): New
        function.

[gcc/testsuite]

2016-01-21  Anton Blanchard
            Bill Schmidt

        PR target/63354
        * gcc.target/powerpc/pr63354.c: New test.

Index: gcc/config/rs6000/linux64.h
===================================================================
--- gcc/config/rs6000/linux64.h	(revision 232677)
+++ gcc/config/rs6000/linux64.h	(working copy)
@@ -59,6 +59,9 @@ extern int dot_symbols;
 
 #define TARGET_PROFILE_KERNEL profile_kernel
 
+#undef TARGET_KEEP_LEAF_WHEN_PROFILED
+#define TARGET_KEEP_LEAF_WHEN_PROFILED rs6000_keep_leaf_when_profiled
+
 #define TARGET_USES_LINUX64_OPT 1
 #ifdef HAVE_LD_LARGE_TOC
 #undef TARGET_CMODEL
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 232677)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -26237,6 +26237,14 @@ rs6000_output_function_prologue (FILE *file,
   rs6000_pic_labelno++;
 }
 
+/* -mprofile-kernel code calls mcount before the function prolog,
+   so a profiled leaf function should stay a leaf function.  */
+static bool
+rs6000_keep_leaf_when_profiled ()
+{
+  return TARGET_PROFILE_KERNEL;
+}
+
 /* Non-zero if vmx regs are restored before the frame pop, zero if
    we restore after the pop when possible.  */
 #define ALWAYS_RESTORE_ALTIVEC_BEFORE_POP 0
Index: gcc/testsuite/gcc.target/powerpc/pr63354.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr63354.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr63354.c	(working copy)
@@ -0,0 +1,11 @@
+/* Verify that we don't stack a frame for leaf functions when using
+   -pg -mprofile-kernel.  */
+
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-options "-O2 -pg -mprofile-kernel" } */
+/* { dg-final { scan-assembler-not "mtlr" } } */
+
+int foo(void)
+{
+  return 1;
+}
[PATCH, rs6000] Fix PR67489
Hi,

The test case gcc.target/powerpc/p8vector-builtin-8.c needs to be restricted to targets that support the __int128 keyword. This was wrongly being attempted with { dg-do compile { target int128 } } when what's really wanted is { dg-require-effective-target int128 }. With this patch, the test no longer runs on 32-bit targets.

Tested on powerpc64-unknown-linux-gnu using -m32. Is this ok for trunk?

Thanks,
Bill

2016-01-21  Bill Schmidt

        PR testsuite/67489
        * gcc.target/powerpc/p8vector-builtin-8.c: Remove { target int128 }
        from dg-do compile directive, and instead add
        { dg-require-effective-target int128 }.

Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c	(revision 232683)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c	(working copy)
@@ -1,5 +1,6 @@
-/* { dg-do compile { target int128 } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target int128 } */
 /* { dg-options "-mpower8-vector -O2" } */
 
 #include
Re: [PATCH, rs6000] Fix PR63354
The testcase will need a slight adjustment, as currently it fails on powerpc64 with -m32 testing. Working on a fix.

Bill

On Thu, 2016-01-21 at 12:28 -0500, David Edelsohn wrote:
> On Thu, Jan 21, 2016 at 11:48 AM, Bill Schmidt wrote:
> > Hi,
> >
> > Anton Blanchard proposed a fix to his own bug report in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63354, but never submitted
> > the patch upstream.  I've added a formal test case and am submitting on
> > his behalf.
> >
> > The patch simply ensures that we don't stack a frame for leaf procedures
> > when called with -pg -mprofile-kernel.  The automatically generated
> > calls to _mcount occur prior to the prolog and do not require us to
> > stack a frame.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions.  Is this ok for trunk?
> >
> > Thanks,
> > Bill
> >
> > [gcc]
> >
> > 2016-01-21  Anton Blanchard
> >             Bill Schmidt
> >
> >         PR target/63354
> >         * config/rs6000/linux64.h (TARGET_KEEP_LEAF_WHEN_PROFILED): New
> >         #define.
> >         * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): New
> >         function.
> >
> > [gcc/testsuite]
> >
> > 2016-01-21  Anton Blanchard
> >             Bill Schmidt
> >
> >         PR target/63354
> >         * gcc.target/powerpc/pr63354.c: New test.
> 
> Okay.
> 
> Thanks, David
Re: [PATCH, rs6000] Fix PR63354
Hi,

On Thu, 2016-01-21 at 21:21 -0600, Bill Schmidt wrote:
> The testcase will need a slight adjustment, as currently it fails on
> powerpc64 with -m32 testing.  Working on a fix.
> 
> Bill

This patch adjusts the gcc.target/powerpc/pr63354 test to require 64-bit code generation, and also restricts the test to Linux targets, as this is necessary for using -mprofile-kernel.

Tested on powerpc64-unknown-linux-gnu configured with --with-cpu=power7 and testing with -m32; the test is now correctly skipped there. Is this okay for trunk?

Thanks,
Bill

2016-01-22  Bill Schmidt

        * gcc.target/powerpc/pr63354.c: Restrict to Linux targets with
        64-bit support.

Index: gcc/testsuite/gcc.target/powerpc/pr63354.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr63354.c	(revision 232716)
+++ gcc/testsuite/gcc.target/powerpc/pr63354.c	(working copy)
@@ -1,8 +1,9 @@
 /* Verify that we don't stack a frame for leaf functions when using
    -pg -mprofile-kernel.  */
 
-/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-do compile { target { powerpc64*-linux-* } } } */
 /* { dg-options "-O2 -pg -mprofile-kernel" } */
+/* { dg-require-effective-target powerpc64 } */
 /* { dg-final { scan-assembler-not "mtlr" } } */
 
 int foo(void)
Re: [PATCH, rs6000] Fix PR63354
OK, thanks, Joseph! I'll make that adjustment later today.

Bill

On Fri, 2016-01-22 at 15:51 +, Joseph Myers wrote:
> On Thu, 21 Jan 2016, Bill Schmidt wrote:
> 
> > +/* { dg-do compile { target { powerpc64*-linux-* } } } */
> 
> That's suboptimal; you should allow powerpc*-*-linux* targets so that the
> test is also run for --enable-targets=all powerpc-linux builds when
> testing a -m64 multilib.
Re: [PATCH, rs6000] Fix PR63354
On Sun, 2016-01-24 at 02:18 +0100, Jan-Benedict Glaw wrote:
> On Thu, 2016-01-21 23:42:40 -0600, Bill Schmidt wrote:
> > On Thu, 2016-01-21 at 21:21 -0600, Bill Schmidt wrote:
> > > The testcase will need a slight adjustment, as currently it fails on
> > > powerpc64 with -m32 testing.  Working on a fix.
> >
> > This patch adjusts the gcc.target/powerpc/pr63354 test to require 64-bit
> > code generation, and also restricts the test to Linux targets, as this
> > is necessary for using -mprofile-kernel.  Tested on
> > powerpc64-unknown-linux-gnu configured with --with-cpu=power7 and
> > testing with -m32; the test is now correctly skipped there.  Is this
> > okay for trunk?
> 
> Building for --target=powerpc-xilinx-eabi, I see this on my build
> robot (see at the bottom of the page, the make.out artifact of build
> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=483851):
> 
> g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE
> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
> -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I.
> -I../../../gcc/gcc -I../../../gcc/gcc/. -I../../../gcc/gcc/../include
> -I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include
> -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd
> -I../libdecnumber -I../../../gcc/gcc/../libbacktrace -o rs6000.o -MT
> rs6000.o -MMD -MP -MF ./.deps/rs6000.TPo
> ../../../gcc/gcc/config/rs6000/rs6000.c
> ../../../gcc/gcc/config/rs6000/rs6000.c:26243:1: error: ‘bool
> rs6000_keep_leaf_when_profiled()’ defined but not used
> [-Werror=unused-function]
>  rs6000_keep_leaf_when_profiled ()
>  ^~
> 
> cc1plus: all warnings being treated as errors
> Makefile:2121: recipe for target 'rs6000.o' failed
> make[2]: *** [rs6000.o] Error 1
> make[2]: Leaving directory
> '/home/jbglaw/build-configlist_mk/powerpc-xilinx-eabi/build-gcc/mk/powerpc-xilinx-eabi/gcc'
> Makefile:4123: recipe for target 'all-gcc' failed
> make[1]: *** [all-gcc] Error 2
> make[1]: Leaving directory
> '/home/jbglaw/build-configlist_mk/powerpc-xilinx-eabi/build-gcc/mk/powerpc-xilinx-eabi'
> 
> MfG, JBG

Hi Jan, thanks for the report! Patch below that should fix the problem. Bootstrapped and tested on powerpc64le-unknown-linux-gnu, no regressions. David, is this ok for trunk?

Thanks,
Bill

2016-01-24  Bill Schmidt

        * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Add
        decl with __attribute__ ((unused)) annotation.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 232783)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1311,6 +1311,7 @@ static bool rs6000_secondary_reload_move (enum rs6
                                           secondary_reload_info *,
                                           bool);
 rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
+static bool rs6000_keep_leaf_when_profiled () __attribute__ ((unused));
 
 /* Hash table stuff for keeping track of TOC entries.  */
Re: [PATCH, rs6000] Fix PR63354
Thanks, committed as r232793.

Bill

On Mon, 2016-01-25 at 08:54 -0500, David Edelsohn wrote:
> On Sun, Jan 24, 2016 at 9:17 PM, Bill Schmidt wrote:
> 
> > Hi Jan, thanks for the report!  Patch below that should fix the problem.
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu, no
> > regressions.  David, is this ok for trunk?
> >
> > Thanks,
> > Bill
> >
> > 2016-01-24  Bill Schmidt
> >
> >         * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Add
> >         decl with __attribute__ ((unused)) annotation.
> 
> Okay.
> 
> Thanks, David
[PATCH, 4.9, rs6000, testsuite] Fix PR69479
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69479 notes that gcc.dg/and-1.c fails a scan-assembler-not test for nand, but the test does pass in subsequent releases. The test author indicates in comment #1 that we can just remove this test for powerpc*-*-*, which this patch does.

Verified for 4.9 on powerpc64le-unknown-linux-gnu. Ok to commit to that branch?

Thanks,
Bill

2016-01-26  Bill Schmidt

        * gcc.dg/and-1.c: Remove nand test for powerpc*-*-*.

Index: gcc/testsuite/gcc.dg/and-1.c
===================================================================
--- gcc/testsuite/gcc.dg/and-1.c	(revision 232844)
+++ gcc/testsuite/gcc.dg/and-1.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 /* { dg-final { scan-assembler "and" { target powerpc*-*-* spu-*-* } } } */
-/* There should be no nand for this testcase (for either PPC or SPU).  */
-/* { dg-final { scan-assembler-not "nand" { target powerpc*-*-* spu-*-* } } } */
+/* There should be no nand for this testcase for SPU.  */
+/* { dg-final { scan-assembler-not "nand" { target spu-*-* } } } */
 
 int f(int y) {
[PATCH, rs6000] Partial fix for PR65546 (GCC 6)
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65546 discusses the failure of gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c. The test fails differently on GCC 4.9 and 5 than it does on GCC 6. For GCC 6, the test case is faulty, as we only expect to see the "vectorization not profitable" statement when misaligned loads/stores are not efficient on the target hardware. This patch fixes the test for GCC 6.

Something else is going on in the earlier releases, which I plan to look at separately.

Verified on powerpc64le-unknown-linux-gnu. Is this okay for trunk?

Thanks,
Bill

2016-01-28  Bill Schmidt

        PR target/65546
        * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Disable check
        for "vectorization not profitable" when the target supports
        misaligned loads and stores.

Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(revision 232890)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(working copy)
@@ -46,5 +46,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_hw_misalign } } } } */
Re: [PATCH, rs6000] Partial fix for PR65546 (GCC 6)
Actually, please hold off on this. The test in general is just faulty. I'll get something more complete later on.

Sorry for the noise,
Bill

On Thu, 2016-01-28 at 11:45 -0600, Bill Schmidt wrote:
> Hi,
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65546 discusses the failure
> of gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c.  The test fails
> differently on GCC 4.9 and 5 than it does on GCC 6.  For GCC 6, the test
> case is faulty, as we only expect to see the "vectorization not
> profitable" statement when misaligned loads/stores are not efficient on
> the target hardware.  This patch fixes the test for GCC 6.
> 
> Something else is going on in the earlier releases, which I plan to look
> at separately.
> 
> Verified on powerpc64le-unknown-linux-gnu.  Is this okay for trunk?
> 
> Thanks,
> Bill
> 
> 
> 2016-01-28  Bill Schmidt
> 
>         PR target/65546
>         * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Disable check
>         for "vectorization not profitable" when the target supports
>         misaligned loads and stores.
> 
> 
> Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(revision 232890)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(working copy)
> @@ -46,5 +46,5 @@ int main (void)
>    return main1 ();
> }
> 
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { ! vect_hw_misalign } } } } */
> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_hw_misalign } } } } */
[PATCH, rs6000] Fix PR65546
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65546 identifies a failure in gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c. The test case hasn't kept up with changes in the vectorizer, so it's looking for the wrong error message. Also, the error message should be conditioned by a check for support of unaligned memory accesses. This patch corrects these problems.

For 4.9 and 5, the error message needs to be similarly changed. However, for these earlier releases, the check for misalignment support doesn't apply.

Verified on powerpc64le-unknown-linux-gnu for both -mcpu=power7 and -mcpu=power8, which differ in their support for misalignment. Is this ok for trunk? Provided verification succeeds on 4.9 and 5, is the revised test ok for those releases?

Thanks,
Bill

2016-01-28  Bill Schmidt

        PR target/65546
        * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Correct
        condition being checked, and disable it when the target
        supports misaligned loads and stores.

Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(revision 232890)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c	(working copy)
@@ -46,5 +46,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned store" 1 "vect" { target { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_hw_misalign } } } } */
[wwwdocs] Add more PowerPC information to gcc-6/changes.html
Hi,

The following was applied to the website to record additional GCC 6 changes for PowerPC. The changes passed XHTML verification.

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.54
diff -r1.54 changes.html
361a362,434
> PowerPC64 now supports IEEE 128-bit floating-point using the
> __float128 data type.  In GCC 6, this is NOT enabled by default,
> but you can enable it with -mfloat128.  The IEEE 128-bit
> floating-point support requires the use of the VSX instruction
> set.  IEEE 128-bit floating-point values are passed and returned
> as a single vector value.  The software emulator for IEEE 128-bit
> floating-point support is only built on PowerPC Linux systems
> where the default cpu is at least power7.  On future ISA 3.0
> systems (power9 and later), you will be able to use the
> -mfloat128-hardware option to use the ISA 3.0 instructions
> that support IEEE 128-bit floating-point.  An additional type
> (__ibm128) has been added to refer to the IBM extended double
> type that normally implements long double.  This will allow
> for a future transition to implementing long double with IEEE
> 128-bit floating-point.
> Basic support has been added for POWER9 hardware that will use the
> recently published OpenPOWER ISA 3.0 instructions.  The following
> new switches are available:
>   -mcpu=power9: Implement all of the ISA 3.0
>   instructions supported by the compiler.
>   -mtune=power9: In the future, apply tuning for
>   POWER9 systems.  Currently, POWER8 tunings are used.
>   -mmodulo: Generate code using the ISA 3.0
>   integer instructions (modulus, count trailing zeros, array
>   index support, integer multiply/add).
>   -mpower9-fusion: Generate code to suitably fuse
>   instruction sequences for a POWER9 system.
>   -mpower9-dform: Generate code to use the new D-form
>   (register + offset) memory instructions for the vector
>   registers.
>   -mpower9-vector: Generate code using the new ISA
>   3.0 vector (VSX or Altivec) instructions.
>   -mpower9-minmax: Reserved for future development.
>   -mtoc-fusion: Keep TOC entries together to provide
>   more fusion opportunities.
> New constraints have been added to support IEEE 128-bit
> floating-point and ISA 3.0 instructions:
>   wb: Altivec register if -mpower9-dform is
>   enabled.
>   we: VSX register if -mpower9-vector is enabled
>   for 64-bit code generation.
>   wo: VSX register if -mpower9-vector is
>   enabled.
>   wp: Reserved for future use if long double
>   is implemented with IEEE 128-bit floating-point instead
>   of IBM extended double.
>   wq: VSX register if -mfloat128 is enabled.
>   wF: Memory operand suitable for POWER9 fusion
>   load/store.
>   wG: Memory operand suitable for TOC fusion memory
>   references.
>   wL: Integer constant identifying the element
>   number mfvsrld accesses within a vector.
> Support has been added for __builtin_cpu_is () and
> __builtin_cpu_supports (), allowing for very fast access to
> AT_PLATFORM, AT_HWCAP, and AT_HWCAP2 values.  This requires
> use of glibc 2.23 or later.
> All hardware transactional memory builtins now correctly
> behave as memory barriers.  Programmers can use #ifdef __TM_FENCE__
> to determine whether their "old" compiler treats the builtins
> as barriers.
> Split-stack support has been added for gccgo on PowerPC64
> for both big- and little-endian (but NOT for 32-bit).  The gold
> linker from at least binutils 2.25.1 must be available in the PATH
> when configuring and building gccgo to enable split stack.  (The
> requirement for binutils 2.25.1 applies to PowerPC64 only.)  The
> split-stack feature allows a small initial stack size to be
> allocated for each goroutine, which increases as needed.
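As a usage sketch (mine, not from the announcement above), the new built-ins can select code paths at run time; the "power8" and "vsx" strings follow the AT_PLATFORM and AT_HWCAP feature names, and glibc 2.23 or later is required as noted:

#include <stdio.h>

int
main (void)
{
  if (__builtin_cpu_is ("power8"))        /* AT_PLATFORM check */
    puts ("running on a POWER8");
  if (__builtin_cpu_supports ("vsx"))     /* AT_HWCAP check */
    puts ("VSX instructions available");
  return 0;
}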
Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st
Wow, that's pretty bad; obviously a pasto. Thanks for pointing it out! I'm really surprised this has survived this long, but that may be a comment on how much lvxl is used. I'll get this fixed asap.

Thanks,
Bill

On Tue, 2016-02-09 at 18:25 +0100, Ulrich Weigand wrote:
> Hi Bill,
> 
> > 2014-02-20  Bill Schmidt
> >
> >         * config/rs6000/altivec.md (altivec_lvxl): Rename as
> >         *altivec_lvxl_<mode>_internal and use VM2 iterator instead of
> >         V4SI.
> >         (altivec_lvxl_<mode>): New define_expand incorporating
> >         -maltivec=be semantics where needed.
> 
> I just noticed that this:
> 
> > -(define_insn "altivec_lvxl"
> > +(define_expand "altivec_lvxl_<mode>"
> >    [(parallel
> > -    [(set (match_operand:V4SI 0 "register_operand" "=v")
> > -          (match_operand:V4SI 1 "memory_operand" "Z"))
> > +    [(set (match_operand:VM2 0 "register_operand" "=v")
> > +          (match_operand:VM2 1 "memory_operand" "Z"))
> >       (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> >    "TARGET_ALTIVEC"
> > -  "lvxl %0,%y1"
> > +{
> > +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> > +    {
> > +      altivec_expand_lvx_be (operands[0], operands[1], <MODE>mode, UNSPEC_SET_VSCR);
> > +      DONE;
> > +    }
> > +})
> > +
> > +(define_insn "*altivec_lvxl_<mode>_internal"
> > +  [(parallel
> > +    [(set (match_operand:VM2 0 "register_operand" "=v")
> > +          (match_operand:VM2 1 "memory_operand" "Z"))
> > +     (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> > +  "TARGET_ALTIVEC"
> > +  "lvx %0,%y1"
> >    [(set_attr "type" "vecload")])
> 
> now causes vec_ldl to emit the lvx instead of the lvxl instruction.
> I assume this was not actually intended?
> 
> Bye,
> Ulrich
[PATCH, rs6000] Fix pasto resulting in wrong instruction from builtins for lvxl
Hi,

During the little-endian vector modification work in 2014, I accidentally introduced an error that Uli Weigand noticed this week. This results in wrong code being generated for the __builtin_altivec_lvxl and vec_lvxl interfaces; an "lvx" instruction is generated instead of an "lvxl" instruction. Now, this is only a performance issue, since the error just means a cache hint is not being generated. However, it needs to be corrected, as below.

This brings up a point that, though we have many test cases for the altivec built-ins, they only test that well-formed built-ins are accepted. None of them test the actual code generation. I don't intend to fix that here, but at some point we should do better with this. We should definitely do better with future built-ins that we add.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. I hand-checked some of the existing test cases that invoke __builtin_altivec_lvxl and vec_lvxl to verify correct code gen. Is this okay for trunk? I would also like to backport this to GCC 5 and 4.9 if that's acceptable.

Thanks,
Bill

2016-02-16  Bill Schmidt

        * config/rs6000/altivec.md (*altivec_lvxl_<mode>_internal): Output
        correct instruction.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 233466)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -2511,7 +2511,7 @@
           (match_operand:VM2 1 "memory_operand" "Z"))
      (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
   "TARGET_ALTIVEC"
-  "lvx %0,%y1"
+  "lvxl %0,%y1"
   [(set_attr "type" "vecload")])
 
 (define_expand "altivec_lvx_<mode>"
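A tiny reproducer (mine, not taken from the mail): with the pasto, the vec_lvxl call below assembled to lvx; with the fix it emits lvxl, the variant that hints the line should be marked least recently used.

#include <altivec.h>

vector int
load_lru (int a, vector int *b)
{
  return vec_lvxl (a, b);   /* should assemble to lvxl, not lvx */
}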
Re: [PATCH, rs6000] Fix pasto resulting in wrong instruction from builtins for lvxl
On Tue, 2016-02-16 at 11:40 -0800, David Edelsohn wrote:
> This is okay, but how about starting with a testcase for this?

That's fine. I'll make it generic enough that we can add to it later, then.

Bill

> Thanks, David
> 
> On Feb 16, 2016 11:37 AM, "Bill Schmidt" wrote:
> > Hi,
> >
> > During the little-endian vector modification work in 2014, I
> > accidentally introduced an error that Uli Weigand noticed this week.
> > This results in wrong code being generated for the
> > __builtin_altivec_lvxl and vec_lvxl interfaces; an "lvx" instruction is
> > generated instead of an "lvxl" instruction.  Now, this is only a
> > performance issue, since the error just means a cache hint is not being
> > generated.  However, it needs to be corrected, as below.
> >
> > This brings up a point that, though we have many test cases for the
> > altivec built-ins, they only test that well-formed built-ins are
> > accepted.  None of them test the actual code generation.  I don't intend
> > to fix that here, but at some point we should do better with this.  We
> > should definitely do better with future built-ins that we add.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions.  I hand-checked some of the existing test cases that invoke
> > __builtin_altivec_lvxl and vec_lvxl to verify correct code gen.  Is this
> > okay for trunk?  I would also like to backport this to GCC 5 and 4.9 if
> > that's acceptable.
> >
> > Thanks,
> > Bill
> >
> > 2016-02-16  Bill Schmidt
> >
> >         * config/rs6000/altivec.md (*altivec_lvxl_<mode>_internal): Output
> >         correct instruction.
> >
> > Index: gcc/config/rs6000/altivec.md
> > ===================================================================
> > --- gcc/config/rs6000/altivec.md	(revision 233466)
> > +++ gcc/config/rs6000/altivec.md	(working copy)
> > @@ -2511,7 +2511,7 @@
> >           (match_operand:VM2 1 "memory_operand" "Z"))
> >      (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> >    "TARGET_ALTIVEC"
> > -  "lvx %0,%y1"
> > +  "lvxl %0,%y1"
> >    [(set_attr "type" "vecload")])
> >
> > (define_expand "altivec_lvx_<mode>"
Re: [PATCH, rs6000] Fix pasto resulting in wrong instruction from builtins for lvxl
On Tue, 2016-02-16 at 11:40 -0800, David Edelsohn wrote:
> This is okay, but how about starting with a testcase for this?

Fair enough. Here's the revised patch with a test, which I've verified on powerpc64-unknown-linux-gnu. Ok to proceed?

Thanks!
Bill

[gcc]

2016-02-16  Bill Schmidt

        * config/rs6000/altivec.md (*altivec_lvxl_<mode>_internal): Output
        correct instruction.

[gcc/testsuite]

2016-02-16  Bill Schmidt

        * gcc.target/powerpc/vec-cg.c: New test.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 233466)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -2511,7 +2511,7 @@
           (match_operand:VM2 1 "memory_operand" "Z"))
      (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
   "TARGET_ALTIVEC"
-  "lvx %0,%y1"
+  "lvxl %0,%y1"
   [(set_attr "type" "vecload")])
 
 (define_expand "altivec_lvx_<mode>"
Index: gcc/testsuite/gcc.target/powerpc/vec-cg.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-cg.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-cg.c	(working copy)
@@ -0,0 +1,22 @@
+/* Test code generation of vector built-ins.  We don't have this for
+   most of ours today.  As new built-ins are added, please add to this
+   test case.  Update as necessary to add VSX, P8-vector, P9-vector,
+   etc.  */
+
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O0" } */
+
+#include <altivec.h>
+
+static vector signed int i, *pi;
+static int int1;
+
+void
+b()
+{
+  i = __builtin_altivec_lvxl (int1, pi);
+  i = vec_lvxl (int1, pi);
+}
+
+/* { dg-final { scan-assembler-times "lvxl" 2 } } */
Re: [PATCH, rs6000] Fix PR63354
Hi Andreas,

Sorry I haven't responded sooner; I was on vacation and have been unpiling things since then.

The test case had already been updated since the patch you cited, adding

/* { dg-require-effective-target powerpc64 } */

Is this the version you're testing with?

Thanks,
Bill

On Sat, 2016-02-06 at 21:35 +0100, Andreas Schwab wrote:
> Bill Schmidt writes:
> 
> > Index: gcc/testsuite/gcc.target/powerpc/pr63354.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/pr63354.c	(revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr63354.c	(working copy)
> > @@ -0,0 +1,11 @@
> > +/* Verify that we don't stack a frame for leaf functions when using
> > +   -pg -mprofile-kernel.  */
> > +
> > +/* { dg-do compile { target { powerpc64*-*-* } } } */
> > +/* { dg-options "-O2 -pg -mprofile-kernel" } */
> > +/* { dg-final { scan-assembler-not "mtlr" } } */
> > +
> > +int foo(void)
> > +{
> > +  return 1;
> > +}
> 
> With -m32:
> 
> FAIL: gcc.target/powerpc/pr63354.c (test for excess errors)
> Excess errors:
> /daten/gcc/gcc-20160205/gcc/testsuite/gcc.target/powerpc/pr63354.c:1:0: error: -mprofile-kernel not supported in this configuration
> 
> Andreas.
[PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a year ago, but had a proposed patch by Mike Meissner that was never vetted and committed. I've reviewed the patch and tested it on GCC 5 and GCC 6, and with the patch applied we see the test pass for both 32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit on a Power8 little-endian platform.

As I understand it, the test case got out of sync with the implementation in GCC 5, and this rewrite of the test case restores order. I've verified that the original compilation options from Andreas Schwab for this test case result in correct generation of lxsdx, which was not the case with the original report.

The test case is extremely different in GCC 4.9. As Mike has noted in the PR, the -mupper-regs support does not exist in GCC 4.9, so the rewritten test case does not apply there.

As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and powerpc64le-unknown-linux-gnu (-m64). Is this ok for trunk and GCC 5?

Thanks,
Bill

2016-02-26  Michael Meissner
            Bill Schmidt

        * gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired
        functionality for both 32-bit and 64-bit.

--- gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c	(revision 220948)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
@@ -51,13 +51,14 @@ load_store_sf (unsigned long num,
   float value37	= 0.0f;
   float value38	= 0.0f;
   float value39	= 0.0f;
-  unsigned long in_mask;
-  unsigned long out_mask;
+  unsigned long in_mask, in_mask2;
+  unsigned long out_mask, out_mask2;
   unsigned long i;
 
   for (i = 0; i < num; i++)
     {
      in_mask = *in_mask_ptr++;
+      in_mask2 = *in_mask_ptr++;
 
       if ((in_mask & (1L << 0)) != 0L)
	value00 = *from_ptr++;
@@ -118,67 +119,68 @@ load_store_sf (unsigned long num,
       if ((in_mask & (1L << 19)) != 0L)
	value19 = *from_ptr++;
 
-      if ((in_mask & (1L << 20)) != 0L)
+      if ((in_mask2 & (1L << 0)) != 0L)
	value20 = *from_ptr++;
 
-      if ((in_mask & (1L << 21)) != 0L)
+      if ((in_mask2 & (1L << 1)) != 0L)
	value21 = *from_ptr++;
 
-      if ((in_mask & (1L << 22)) != 0L)
+      if ((in_mask2 & (1L << 2)) != 0L)
	value22 = *from_ptr++;
 
-      if ((in_mask & (1L << 23)) != 0L)
+      if ((in_mask2 & (1L << 3)) != 0L)
	value23 = *from_ptr++;
 
-      if ((in_mask & (1L << 24)) != 0L)
+      if ((in_mask2 & (1L << 4)) != 0L)
	value24 = *from_ptr++;
 
-      if ((in_mask & (1L << 25)) != 0L)
+      if ((in_mask2 & (1L << 5)) != 0L)
	value25 = *from_ptr++;
 
-      if ((in_mask & (1L << 26)) != 0L)
+      if ((in_mask2 & (1L << 6)) != 0L)
	value26 = *from_ptr++;
 
-      if ((in_mask & (1L << 27)) != 0L)
+      if ((in_mask2 & (1L << 7)) != 0L)
	value27 = *from_ptr++;
 
-      if ((in_mask & (1L << 28)) != 0L)
+      if ((in_mask2 & (1L << 8)) != 0L)
	value28 = *from_ptr++;
 
-      if ((in_mask & (1L << 29)) != 0L)
+      if ((in_mask2 & (1L << 9)) != 0L)
	value29 = *from_ptr++;
 
-      if ((in_mask & (1L << 30)) != 0L)
+      if ((in_mask2 & (1L << 10)) != 0L)
	value30 = *from_ptr++;
 
-      if ((in_mask & (1L << 31)) != 0L)
+      if ((in_mask2 & (1L << 11)) != 0L)
	value31 = *from_ptr++;
 
-      if ((in_mask & (1L << 32)) != 0L)
+      if ((in_mask2 & (1L << 12)) != 0L)
	value32 = *from_ptr++;
 
-      if ((in_mask & (1L << 33)) != 0L)
+      if ((in_mask2 & (1L << 13)) != 0L)
	value33 = *from_ptr++;
 
-      if ((in_mask & (1L << 34)) != 0L)
+      if ((in_mask2 & (1L << 14)) != 0L)
	value34 = *from_ptr++;
 
-      if ((in_mask & (1L << 35)) != 0L)
+      if ((in_mask2 & (1L << 15)) != 0L)
	value35 = *from_ptr++;
 
-      if ((in_mask & (1L << 36)) != 0L)
+      if ((in_mask2 & (1L << 16)) != 0L)
	value36 = *from_ptr++;
 
-      if ((in_mask & (1L << 37)) != 0L)
+      if ((in_mask2 & (1L << 17)) != 0L)
	value37 = *from_ptr++;
 
-      if ((in_mask & (1L << 38)) != 0L)
+      if ((in_mask2 & (1L << 18)) != 0L)
	value38 = *from
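One reason the single mask no longer suffices (my reading of the rewrite, not stated in the mail): on a 32-bit target, long is 32 bits wide, so the original shifts such as 1L << 32 invoke undefined behavior. Splitting the mask into two words of 20 bits each keeps every shift in range for both -m32 and -m64, as this sketch shows:

#include <assert.h>

int
main (void)
{
  unsigned long in_mask = 0, in_mask2 = 0;
  int bit = 38;                     /* conceptual bit 38 of a 40-bit mask */

  if (bit < 20)
    in_mask |= 1L << bit;           /* word 1 holds bits 0-19 */
  else
    in_mask2 |= 1L << (bit - 20);   /* word 2 holds bits 20-39 */

  assert ((in_mask2 & (1L << 18)) != 0L);  /* same test the diff uses */
  return 0;
}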
Re: [PATCH] Avoid 1x vectors in tree-vect-generic (PR rtl-optimization/69896)
Also tested with powerpc64le-unknown-linux-gnu native bootstrap with no regressions, where it fixes this bug as well as the (reopened) PR69613.

Bill

On Sat, 2016-02-27 at 00:04 +0100, Jakub Jelinek wrote:
> Hi!
> 
> On ppc64, the widest (and only) supported vector mode for __int128
> element type is V1TImode, and there is a V1TImode or opcode
> and a couple of others, but IMNSHO it is highly undesirable to lower
> BLKmode (say 2xTI, 4xTI etc.) generic vectors to V1TI instead of
> TI, there are no advantages in doing that and apparently lots of various
> bugs (the PR contains a WIP partial patch to fix some of them, but that is
> just a tip of an iceberg; apparently lots of the folding etc. code
> is returning sometimes a 1x vector type when it should be returning the
> element type or vice versa, if CTOR contains 1x VECTOR_CSTs, there are
> issues too etc.)
> 
> So while I think it is desirable to fix all those V1?? handling issues
> eventually, IMHO it is right to also just use element type instead
> of 1x vectors during the generic vector lowering.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested on the
> testcase using powerpc64le-linux cross, ok for trunk?
> 
> 2016-02-26  Jakub Jelinek
> 
>         PR rtl-optimization/69896
>         * tree-vect-generic.c (get_compute_type): Avoid single element
>         vector types.
> 
> --- gcc/tree-vect-generic.c.jj	2016-01-04 14:55:52.000000000 +0100
> +++ gcc/tree-vect-generic.c	2016-02-26 21:11:36.694482256 +0100
> @@ -1405,6 +1405,7 @@ get_compute_type (enum tree_code code, o
>    if (vector_compute_type != NULL_TREE
>        && (TYPE_VECTOR_SUBPARTS (vector_compute_type)
>            < TYPE_VECTOR_SUBPARTS (compute_type))
> +      && TYPE_VECTOR_SUBPARTS (vector_compute_type) > 1
>        && (optab_handler (op, TYPE_MODE (vector_compute_type))
>            != CODE_FOR_nothing))
>      compute_type = vector_compute_type;
> 
> 	Jakub
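The kind of generic vector affected is easy to write down; this sketch (mine, not the PR testcase) declares a 2 x __int128 vector, which has no machine mode on ppc64 and is therefore lowered piecewise by tree-vect-generic; with the patch, the pieces use TImode rather than V1TImode.

/* Requires a 64-bit target with __int128 support.  */
typedef __int128 v2ti __attribute__ ((vector_size (32)));

v2ti
vor (v2ti a, v2ti b)
{
  return a | b;   /* expanded element by element during lowering */
}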