https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #14 from sergey.shalnov at intel dot com ---
" we have a basic-block vectorizer. Do you propose to remove it? "
Definitely not! The SLP vectorizer is very good to have!
“What's the rationale for not using vector register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #16 from sergey.shalnov at intel dot com ---
«it's one vec_construct operation - it's the task of the target to turn this
into a cost comparable to vector_store»
I agree that the vec_construct operation cost is based on the t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #18 from sergey.shalnov at intel dot com ---
Yes, I agree that the vector_store stage has its own vectorization cost.
And each vector_store has a vector_construction stage. These stages are separate
in GCC SLP (as you know).
To better
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #20 from sergey.shalnov at intel dot com ---
Richard,
I did a quick static analysis of your latest patch.
With the command line “-g -Ofast -mfpmath=sse -funroll-loops -march=znver1”, your
latest patch
doesn’t affect the issue I discussed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #21 from sergey.shalnov at intel dot com ---
Thanks Richard for your comments.
Based on our discussion, I've produced the attached patch and
run it on SPEC2017 intrate/fprate on a Skylake server (with [-Ofast -flto
-march=skylake-a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #24 from sergey.shalnov at intel dot com ---
Richard,
The latest "SLP costing for constants/externs improvement" patch generates the
same code as the baseline for the test example.
Are you sure that "num_vects_to_che
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #26 from sergey.shalnov at intel dot com ---
Sorry, did you mean "arm_sve.h" on ARM?
In that case we have machine-specific code in the common part of the GCC code.
Should we make it a machine-dependent callback function beca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #28 from sergey.shalnov at intel dot com ---
Richard,
Thank you for your comments.
I see that TYPE_VECTOR_SUBPARTS is constant for the test case but
multiple_p (group_size, const_nunits) returns 1 in the code:
if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #29 from sergey.shalnov at intel dot com ---
Richard,
Thank you for your latest patch. I would like to clarify
how the multiple_p() function is used in the if() clause.
First of all, I assume that architectures with a fixed
size of HW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #31 from sergey.shalnov at intel dot com ---
Richard,
Thank you for your latest patch. This patch is exactly what
I’ve discussed in this issue.
I tested it with SPEC20[06|17] and see no performance/stability degradation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #33 from sergey.shalnov at intel dot com ---
Richard,
I'm not sure whether it is a regression or not. I see the code has been visibly
refactored in this commit
https://github.com/gcc-mirror/gcc/commit/ee6e9ba576099aed29f1097195c649fc796ecf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #36 from sergey.shalnov at intel dot com ---
The patch that fixes the issue for SKX is at
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html
I will close the PR after the patch has been merged.
Thank you very much to all involved.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
sergey.shalnov at intel dot com changed:
What|Removed |Added
Status|NEW |RESOLVED
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sergey.shalnov at intel dot com
Target Milestone: ---
I found an extra instruction generated on the x86_64 platform. I have no performance
data to prove a performance gap for this, but other
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sergey.shalnov at intel dot com
Target Milestone: ---
I found strange code generated by GCC-8.0/7.x with the following command-line
options:
-g -Ofast -march=skylake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #1 from sergey.shalnov at intel dot com ---
Created attachment 42616
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42616&action=edit
reproducer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65930
sergey.shalnov at intel dot com changed:
What|Removed |Added
CC||sergey.shalnov at intel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #5 from sergey.shalnov at intel dot com ---
(In reply to Richard Biener from comment #2)
> The strange code is because we perform basic-block vectorization resulting in
>
> vect_cst__249 = {_251, _251, _251, _251, _334, _
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #6 from sergey.shalnov at intel dot com ---
I found the bug report related to the vectorization issues in the second loop
(reduction uint->int):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65930
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #8 from sergey.shalnov at intel dot com ---
Richard,
These are great changes, and I see the first loop became vectorized for the test
example I provided with gcc-8.0 main trunk.
But I think the issue is a bit more complicated. Vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #9 from sergey.shalnov at intel dot com ---
Created attachment 42813
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42813&action=edit
New reproducer
Slightly changed first loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #11 from sergey.shalnov at intel dot com ---
Richard,
“Is this about the "stupid" attempt to use as little AVX512 as possible”
No, it is not.
I provided the asm listing at the beginning with zmm only to illustrate the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #12 from sergey.shalnov at intel dot com ---
Richard,
Your last proposal changed the generated code a bit.
Currently it shows:
test_bugzilla1.c:6:5: note: Cost model analysis:.
Vector inside of loop cost: 62576
Vector prologue