It isn't only on the AVR that bswap_32() is nontrivial to get right. These two versions would rule on the i386 if GCC were just a little bit smarter:
Personally, I prefer the single bswap instruction that we now generate for __builtin_bswap32 and __builtin_bswap64...
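For comparison, here is a minimal sketch of the two usual ways to write a 32-bit byte swap. It is not a reconstruction of the two versions referred to above. The first is a portable shift-and-mask version. The second calls the GCC builtin __builtin_bswap32. The function names bswap32_shift and bswap32_builtin are hypothetical. Whether GCC turns the portable form into a single bswap instruction depends on the GCC version, optimization level and target, so treat that as an assumption to check in the generated assembly.

```c
#include <stdint.h>

/* Portable byte swap: move each byte to its mirrored position.
   Hypothetical name; depending on version and flags, GCC may or
   may not recognize this pattern and emit a single bswap. */
static uint32_t bswap32_shift(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Same operation via the GCC builtin, which the post says is now
   emitted as a single bswap instruction. Hypothetical wrapper name. */
static uint32_t bswap32_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}
```

Both functions map 0x12345678 to 0x78563412. Compiling with -O2 -S and comparing the output is a quick way to see which form a given GCC turns into one instruction.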
-eric