On Thu, Jun 11, 2015 at 03:08:59PM +0100, Richard Biener wrote:
> On Thu, 11 Jun 2015, Uros Bizjak wrote:
>
> > > So this turned up other issues thus the following is what I have
> > > committed after bootstrapping and testing on x86_64-unknown-linux-gnu.
> > >
> > > Richard.
> > >
> > > 2015-06-08 Richard Biener <rguent...@suse.de>
> > >
> > >     * tree-vect-stmts.c (vectorizable_load): Compute the pointer
> > >     adjustment for gaps at the end of a SLP load group properly.
> > >     * tree-vect-slp.c (vect_supported_load_permutation_p): Allow
> > >     all permutations we can generate.
> > >     (vect_transform_slp_perm_load): Use the correct group-size.
> > >
> > >     * gcc.dg/vect/slp-perm-10.c: New testcase.
> > >     * gcc.dg/vect/slp-23.c: Adjust.
> > >     * gcc.dg/torture/pr53366-2.c: Also verify cross-iteration vector pointer
> > >     update.
> >
> > This patch caused:
> >
> > FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>
> Yeah, I noticed. We now want to vectorize this differently but
> fail due to the cost model. I'm working on enhancing the vectorizer
> here.

It also caused an ICE in the ARM port (arm-none-eabi,
arm-none-linux-gnueabihf):

  FAIL: gcc.target/arm/pr53636.c (internal compiler error)

Full ICE text below, and reduced testcase attached, compile with:

  arm-none-eabi-gcc -O -ftree-vectorize -mfpu=neon -mcpu=cortex-a9 bug.c

I tried to take a look to see what was happening, but I couldn't see the
root of the problem. The access to dr_chain in vect_create_mask_and_perm:

  second_vec = dr_chain[second_vec_indx];

Fails as dr_chain has length 1, and second_vec_indx is 2. I think that the
mask that the code is trying to produce is { 1, 2, 3, 4 }.

  bug.c:4:3: note: add new stmt: vect__8.6_108 = VEC_PERM_EXPR <vect__8.4_104, vect__8.5_106, { 1, 2, 3, 4 }>;

But that's about as far as I got.

Thanks,
James

---

bug.c: In function 'test':
bug.c:1:6: internal compiler error: in operator[], at vec.h:738
 void test(unsigned char *dst) {
      ^
0xd759fe vec<tree_node*, va_heap, vl_embed>::operator[](unsigned int)
        .../src/gcc/gcc/vec.h:738
0xd759fe vec<tree_node*, va_heap, vl_ptr>::operator[](unsigned int)
        .../src/gcc/gcc/vec.h:1204
0xd759fe vect_create_mask_and_perm
        .../src/gcc/gcc/tree-vect-slp.c:3072
0xd759fe vect_transform_slp_perm_load(_slp_tree*, vec<tree_node*, va_heap, vl_ptr>, gimple_stmt_iterator*, int, _slp_instance*, bool)
        .../src/gcc/gcc/tree-vect-slp.c:3350
0xd51613 vectorizable_load
        .../src/gcc/gcc/tree-vect-stmts.c:6847
0xd57ad2 vect_transform_stmt(gimple_statement_base*, gimple_stmt_iterator*, bool*, _slp_tree*, _slp_instance*)
        .../src/gcc/gcc/tree-vect-stmts.c:7490
0xd7aac1 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3500
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7a117 vect_schedule_slp_instance
        .../src/gcc/gcc/tree-vect-slp.c:3381
0xd7abce vect_schedule_slp(_loop_vec_info*, _bb_vec_info*)
        .../src/gcc/gcc/tree-vect-slp.c:3570
0xd5e564 vect_transform_loop(_loop_vec_info*)
        .../src/gcc/gcc/tree-vect-loop.c:6223
0xd7eca8 vectorize_loops()
        .../src/gcc/gcc/tree-vectorizer.c:499
0xc88c54 execute
        .../src/gcc/gcc/tree-ssa-loop.c:292
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

void test(unsigned char *dst) {
  short tmp[11 * 8], *tptr;
  int i;
  for (i = 0; i < 8; i++) {
    dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
    dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
    dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
    dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
    dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
    dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
    dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
    dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
    dst += 8;
    tptr += 11;
  }
}
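
For anyone reading the { 1, 2, 3, 4 } mask from the dump: a VEC_PERM_EXPR
selects lanes from the concatenation of its two input vectors. The sketch
below models that selection in plain C. It is not GCC code: perm2 and NELTS
are hypothetical names, and the 4-lane width is my assumption (four shorts
in a 64-bit NEON vector), not something I verified in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lane count: 4 x short in a 64-bit vector. */
enum { NELTS = 4 };

/* Hypothetical helper, not part of GCC. Models a two-input permute:
   mask value m picks lane m of the concatenation {first, second},
   so 0..NELTS-1 read from first and NELTS..2*NELTS-1 read from second.
   Returns 0 on success, -1 if a mask value is out of range. */
int perm2(const short *first, const short *second,
          const unsigned *mask, short *out)
{
  assert(first && second && mask && out);
  for (size_t i = 0; i < NELTS; i++) {
    unsigned m = mask[i];
    if (m >= 2 * NELTS)
      return -1;
    out[i] = m < NELTS ? first[m] : second[m - NELTS];
  }
  return 0;
}
```

Under that assumption, the mask { 1, 2, 3, 4 } takes lanes 1, 2 and 3 of the
first input and lane 0 of the second, so a second input vector has to be
available for the permute to be built.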