http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-04 21:50:58 UTC ---

(In reply to comment #2)
> +1
>
> I'm seeing the same pattern.
> In fact, I'm noticing that a lot of my maths code seems to be performing a
> lot of redundant moves.



Some examples would be great here, although I can already imagine what the
code looks like.  One of the problems is the auto-inc-dec pass (see PR 50749).
A long time ago the rule of thumb for SH4 programmers was "read float values
with post-inc addressing in your C code, and write float values with pre-dec
addressing".  This does not work anymore, since all memory accesses are turned
into array-like, index-based addresses internally in the compiler.  The
auto-inc-dec RTL pass is then supposed to find post-inc and pre-dec addressing
mode opportunities, but it fails to do so in most cases.  I have started
writing a replacement RTL pass that tries to optimize addressing mode
selection.  I hope to get it in for GCC 4.9.
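
For illustration only, the old rule of thumb could be sketched in C roughly
like this.  The function name scale_reversed is made up for this example, and
whether the compiler actually emits post-inc loads and pre-dec stores for it is
exactly what cannot be relied on anymore:

```c
#include <stddef.h>

/* Hypothetical example, not taken from the bug report.
   Scales n floats from src and stores them in reverse order into the n
   floats that end at dst_end.  Reads are written as post-increment
   (*src++) and writes as pre-decrement (*--dst_end), which is the source
   level shape that used to map onto the SH4 "fmov.s @Rm+,FRn" load and
   "fmov.s FRm,@-Rn" store. */
void scale_reversed(const float *src, float *dst_end, size_t n, float k)
{
    while (n--)
        *--dst_end = *src++ * k;
}
```

Calling it with a 4-element input, out + 4 as dst_end and k = 2 fills out with
the doubled input in reverse order.  Inspecting the generated assembly
(e.g. with -S) shows whether the expected addressing modes were selected.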



Anyway, if you have some example code that you can share, it would be really
appreciated.  It would be helpful as a test case during development.



> Are there actually any builtins/intrinsics available for the SH4?
> How do I access the awesome vector operations without breaking out the inline
> asm?



There aren't that many HW vector ops on SH4, just fipr and ftrv.  At the
moment, there are no builtins for those, so you'd have to use inline-asm style
intrinsics.  As I mentioned in comment #1, I'd rather make the compiler figure
out opportunities from portable generic code.  For ftrv, though, the patterns
might be a bit ... complicated, also because the compiler then has to manage
the 2nd FPU register bank.
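
An inline-asm style intrinsic for fipr might look roughly like the sketch
below.  This is an assumption-laden example, not an existing GCC builtin: the
name dot4 is made up, the operands are pinned to fv0/fv4 with GCC local
register variables, and the asm path is only enabled for -m4-single or
-m4-single-only, on the assumption that FPSCR.PR is then 0 as fipr requires.
On any other target it falls back to plain C.  Note that fipr on real hardware
only computes an approximate result.

```c
/* Hypothetical helper: 4-element inner product of (x,y,z,w) and (a,b,c,d). */
static inline float dot4(float x, float y, float z, float w,
                         float a, float b, float c, float d)
{
#if defined(__GNUC__) && \
    (defined(__SH4_SINGLE__) || defined(__SH4_SINGLE_ONLY__))
    /* Pin the first vector to fv0 (fr0..fr3), the second to fv4 (fr4..fr7). */
    register float fx __asm__("fr0") = x;
    register float fy __asm__("fr1") = y;
    register float fz __asm__("fr2") = z;
    register float fw __asm__("fr3") = w;
    register float fa __asm__("fr4") = a;
    register float fb __asm__("fr5") = b;
    register float fc __asm__("fr6") = c;
    register float fd __asm__("fr7") = d;

    /* fipr fv4,fv0 writes the inner product to the last register of fv0,
       i.e. fr3, so fw is both an input and the output. */
    __asm__("fipr fv4,fv0"
            : "+f"(fw)
            : "f"(fx), "f"(fy), "f"(fz),
              "f"(fa), "f"(fb), "f"(fc), "f"(fd));
    return fw;
#else
    /* Portable fallback; ideally the compiler would turn this into fipr. */
    return x * a + y * b + z * c + w * d;
#endif
}
```

For example, dot4(1, 2, 3, 4, 5, 6, 7, 8) gives 70 with the portable fallback.
The fallback is also the kind of generic code the compiler should eventually
be able to recognize by itself.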



> It would be nice to have some intrinsics that understand vectors as sequences
> of 4 float regs, and automate a sequential (vector) load.



That would be the job of the address-mode-selection RTL pass.  It would also
improve overall code quality on SH.  The fastest way to load a 4-float vector
is to use 2x fmov.d.  The compiler could also do that automatically, but this
requires FPSCR switching, which unfortunately also needs some rework (e.g. see
PR 53513, PR 6526).



And on top of that, we also have PR 13423.  It seems that the proper fix for
this is a new reworked (vector) ABI for SH.



> Also, the ftrv opcode doesn't seem to be accessible either.



True.  I really hope that I'll find enough time to brush up SH FPU code
generation for GCC 4.9.  Until then, I'd suggest using inline-asm style
intrinsics.
