On Thu, Jun 11, 2015 at 03:08:59PM +0100, Richard Biener wrote:
> On Thu, 11 Jun 2015, Uros Bizjak wrote:
> 
> > > So this turned up other issues thus the following is what I have
> > > committed after bootstrapping and testing on x86_64-unknown-linux-gnu.
> > >
> > > Richard.
> > >
> > > 2015-06-08  Richard Biener  <rguent...@suse.de>
> > >
> > > * tree-vect-stmts.c (vectorizable_load): Compute the pointer
> > > adjustment for gaps at the end of a SLP load group properly.
> > > * tree-vect-slp.c (vect_supported_load_permutation_p): Allow
> > > all permutations we can generate.
> > > (vect_transform_slp_perm_load): Use the correct group-size.
> > >
> > > * gcc.dg/vect/slp-perm-10.c: New testcase.
> > > * gcc.dg/vect/slp-23.c: Adjust.
> > > * gcc.dg/torture/pr53366-2.c: Also verify cross-iteration vector pointer 
> > > update.
> > 
> > This patch caused:
> > 
> > FAIL: gcc.target/i386/pr61403.c scan-assembler blend
> 
> Yeah, I noticed.  We now want to vectorize this differently but
> fail due to the cost model.  I'm working on enhancing the vectorizer
> here.

It also caused an ICE in the ARM port (arm-none-eabi,
arm-none-linux-gnueabihf):

    FAIL: gcc.target/arm/pr53636.c (internal compiler error)

The full ICE text is below and a reduced testcase is attached; compile it with:

    arm-none-eabi-gcc -O -ftree-vectorize -mfpu=neon -mcpu=cortex-a9 bug.c

I tried to take a look to see what was happening, but I couldn't find
the root of the problem. The ICE comes from this access to dr_chain in
vect_create_mask_and_perm:

    second_vec = dr_chain[second_vec_indx];

It fails because dr_chain has length 1 while second_vec_indx is 2.

I think that the mask that the code is trying to produce is { 1, 2, 3, 4 }.

    bug.c:4:3: note: add new stmt: vect__8.6_108 = VEC_PERM_EXPR <vect__8.4_104, vect__8.5_106, { 1, 2, 3, 4 }>;

But that's about as far as I got.
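
For reference, here is a minimal sketch of how I read that mask. It
assumes the usual two-input permute semantics, where each mask element
indexes the concatenation of the two input vectors, and it assumes
4-element vectors because the mask has four entries. model_vec_perm is
a hypothetical helper written for illustration, not a GCC function, and
it does not model how dr_chain itself is indexed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: models a two-input permute.
   mask[i] indexes the concatenation of a and b, so values 0..n-1
   select from a and values n..2n-1 select from b.  */
static void
model_vec_perm (const int *a, const int *b, const unsigned *mask,
                int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned sel = mask[i];
      /* A selector outside the concatenated inputs would be invalid.  */
      assert (sel < 2 * n);
      out[i] = sel < n ? a[sel] : b[sel - n];
    }
}
```

Under those assumptions, calling it with a = { 10, 11, 12, 13 },
b = { 20, 21, 22, 23 } and mask = { 1, 2, 3, 4 } gives
{ 11, 12, 13, 20 }: the last three elements of the first input followed
by the first element of the second, so the permute does need two input
vectors.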

Thanks,
James

---
bug.c: In function 'test':
bug.c:1:6: internal compiler error: in operator[], at vec.h:738
 void test(unsigned char *dst) {
      ^
0xd759fe vec<tree_node*, va_heap, vl_embed>::operator[](unsigned int)
        .../src/gcc/gcc/vec.h:738
0xd759fe vec<tree_node*, va_heap, vl_ptr>::operator[](unsigned int)
        .../src/gcc/gcc/vec.h:1204
0xd759fe vect_create_mask_and_perm
        .../src/gcc/gcc/tree-vect-slp.c:3072
0xd759fe vect_transform_slp_perm_load(_slp_tree*, vec<tree_node*, va_heap, vl_ptr>, gimple_stmt_iterator*, int, _slp_instance*, bool)
        .../src/gcc/gcc/tree-vect-slp.c:3350
0xd51613 vectorizable_load
        .../src/gcc/gcc/tree-vect-stmts.c:6847
0xd57ad2 vect_transform_stmt(gimple_statement_base*, gimple_stmt_iterator*, bool*, _slp_tree*, _slp_instance*)
        .../src/gcc/gcc/tree-vect-stmts.c:7490
0xd7aac1 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3500
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7abce vect_schedule_slp(_loop_vec_info*, _bb_vec_info*)
        .../src/gcc/gcc/tree-vect-slp.c:3570
0xd5e564 vect_transform_loop(_loop_vec_info*)
        .../src/gcc/gcc/tree-vect-loop.c:6223
0xd7eca8 vectorize_loops()
        .../src/gcc/gcc/tree-vectorizer.c:499
0xc88c54 execute
        .../src/gcc/gcc/tree-ssa-loop.c:292
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

void test(unsigned char *dst) {
  short tmp[11 * 8], *tptr;
  int i;
  for (i = 0; i < 8; i++)
  {
    dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
    dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
    dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
    dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
    dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
    dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
    dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
    dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
    dst += 8;
    tptr += 11;
  }
}
