https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
Jerry DeLisle changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
Thomas Koenig changed:
What|Removed |Added
Status|NEW |WAITING
--- Comment #15 from Thomas Koen
--- Comment #14 from Thomas Koenig ---
Question: Would it make sense to add an option so that only
matrices with size known at compile-time are inlined?
Something like
-finline-matmul-size-var=0 (to disable), -finline-matmul-size-fixed=5
(to inl
--- Comment #13 from Jerry DeLisle ---
(In reply to Thomas Koenig from comment #12)
> (In reply to Jerry DeLisle from comment #11)
---snip--
>
> May I suggest reading the docs? ;-)
>
--- snip ---
> The default value for N is the value sp
--- Comment #12 from Thomas Koenig ---
(In reply to Jerry DeLisle from comment #11)
> I was experimenting some more here a few days ago. I really think that
> inlining should be disabled above some threshold. On larger arrays, the
> runtime li
--- Comment #11 from Jerry DeLisle ---
(In reply to Jerry DeLisle from comment #8)
> Created attachment 36887 [details]
> A faster version
>
> I took the example code found in
> http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/ where the registe
--- Comment #10 from Dominique d'Humieres ---
> I think you are seeing the effects of inefficiencies of assumed-shape arrays.
>
> If you want to use matmul on very small matrix sizes, it is best to
> use fixed-size explicit arrays.
Then IMO the
--- Comment #9 from Thomas Koenig ---
> I took the example code found in
> http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/ where the register based
> vector computations are explicitly called via the SSE registers and
> converted it to use the
--- Comment #8 from Jerry DeLisle ---
Created attachment 36887
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36887&action=edit
A faster version
I took the example code found in
http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/ where the r
Joost VandeVondele changed:
What|Removed |Added
CC||Joost.VandeVondele at mat dot ethz
Dominique d'Humieres changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
--- Comment #5 from Thomas Koenig ---
Another interesting data point. I deleted the DGEMM implementation from
the file and linked against the serial version of openblas. OK,
openblas is based on GOTO blas, so we have to expect a hit
for large ma
Jerry DeLisle changed:
What|Removed |Added
CC||jvdelisle at gcc dot gnu.org
--- Comment
--- Comment #3 from Thomas Koenig ---
Created attachment 36868
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36868&action=edit
Modified benchmark (really this time)
Hi Dominique,
I think you are seeing the effects of inefficiencies of as
Thomas Koenig changed:
What|Removed |Added
CC||tkoenig at gcc dot gnu.org
--- Comment #
--- Comment #1 from Dominique d'Humieres ---
Created attachment 36864
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36864&action=edit
Code used for the timings