Nicholas Vinen schrieb:
Georg-Johann Lay wrote:
If you are sure it is really some performance issue/regression and not
due to some language standard implication, you can add a report to
http://sourceforge.net/tracker/?group_id=68108
so that the subject won't be forgotten. Also mind
http://gcc.gnu.org/bugs.html
And of course, you can ask questions here. In that case it is helpful
if you can manage to simplify the source to a small piece of code that
triggers the problem and allows others to reproduce the problem. (i.e.
no #include in the code, no ... (except for varargs), a.s.o).
Snippets of .s may point to the problem when you add -dp -fverbose-asm
And there are lots of places where avr-gcc produces suboptimal or even
bad code, so feedback is welcome.
But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to
ask questions on internals then and when...)
OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:
#include <avr/io.h>
int main(void) {
unsigned long packet = 0;
while(1) {
if( !(PINC & _BV(PC2)) ) {
packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
}
}
avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3
The avr/io stuff is just so that it won't optimise the code away to nothing.
Please avoid the #include stuff. You can use source like this:
#define PINC (*((unsigned char volatile*) 0x20))
#define PORTB (*((unsigned char volatile*) 0x21))
void foo ()
{
unsigned long packet = 0;
while(1)
{
if (!(PINC & (1 << 2)))
{
packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
}
}
I tried compiling it with both -Os and -O2:
avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c
The result includes this:
lsl r18 ; packet ; 50 *ashlsi3_const/2 [length = 4]
rol r19 ; packet
rol r20 ; packet
rol r21 ; packet
in r24,38-0x20 ; D.1214, ; 16 *movqi/4 [length = 1]
lsr r24 ; D.1214 ; 17 lshrqi3/3 [length = 1]
ldi r25,lo8(0) ; , ; 48 *movqi/2 [length = 1]
ldi r26,lo8(0) ; , ; 46 *movhi/4 [length = 2]
ldi r27,hi8(0) ; ,
andi r24,lo8(1) ; tmp52, ; 19 andsi3/2 [length = 4]
andi r25,hi8(1) ; tmp52,
andi r26,hlo8(1) ; tmp52,
andi r27,hhi8(1) ; tmp52,
or r18,r24 ; packet, tmp52 ; 20 iorsi3/1 [length
= 4]
or r19,r25 ; packet, tmp52
or r20,r26 ; packet, tmp52
or r21,r27 ; packet, tmp52
The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only have any non-zero values in
the bottom 8 bits, since it's an unsigned char which is being implicitly
expanded to 32 bits for the or operation. In fact, it's only the bottom
bit that's ever non-zero. As a result it's spending a number of cycles
and registers doing useless things. I'll copy a report to the locations
you mention in your e-mail.
Note that this may partially be covered by report 145284 (which I cannot
find, maybe Eric has closed/removed it)
I already filed a patch for that in
http://lists.gnu.org/archive/html/avr-gcc-list/2008-12/msg00019.html
that covers your issue to some extent or maybe almost complete: The new
pattern "*ior<MODE>2_<MODE>bit0" would match some parts of your code.
There are probably ways to work around this, such as making "packet" a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:
> lsl r18
> rol r19
> rol r20
> rol r21
> in r24,38-0x20
> bst r24, 1
> bld r18, 0
The result of the above patch should lead to something like
lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
sbrs r18, 0
bld r18, 0
The SBRS is necessary, because the pattern is not aware of the fact that
r18.0 is 0. Maybe the optimization is even better (or waeker); I am not
using that patch at the moment and can just estimate its effect when
peeking into rtl dumps of an unpatched gcc.
Concerning the patch itself, I don't know anything about its fate and if
it will ever make its way into gcc because of administrative obstacles
and the technique I used.
I don't like the technique I am using because it leads to complex
patterns that are hard to understand an test and will become useless if
the middleend decides to represent the stuff in a slightly different way...
Georg-Johann
_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list