http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52941
--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-04-16 09:14:57 UTC ---
Created attachment 27164
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27164
WIP patch

The attached patch adds support for movco.l/movli.l insns on SH4A for -msoft-atomic. It also adds a new option -mhard-atomic.

For SImode all hard-atomic patterns should be working. I've started implementing some of the QImode and HImode hard-atomic patterns (atomic_{ior|xor|and|add|sub}_fetch{hi|qi}_hard so far) to see how it would turn out.

I'm currently using a 4 byte lookup table to get the endian dependent bit positions for the shift insns which are required to extract/insert the subwords. The HImode variants could also be done without the LUT, but I didn't want to introduce a special case for that. Ideally, the LUT would go into the constant pool, which would allow it to be shared among multiple atomic insns and would also eliminate the need to branch around it after the atomic insn. However, I don't know how to reliably get the address of a constant in the constant pool (by using the mova insn).

The atomic_{ior|xor|and|add|sub}_fetch{hi|qi}_hard patterns in the patch seem to be working OK, but the atomic sequence code turns out rather big. The address calculation code could be moved out of the atomic insns so that it could be CSE'd, but I guess that it would most likely increase register pressure. The extu.{b|w} insn in the sequences can definitely be done before the atomic sequence in a separate insn, so that it can be eliminated by other passes. Still, because of the code size of the HImode / QImode hard atomic sequences, I think it would be better to also have a copy of them in libgcc and emit function calls when compiling with -Os.

Feedback appreciated :)
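
For reference, the subword approach described above can be sketched at the C level roughly as follows. This is only an illustration of the idea, not the code in the patch: it uses a GCC `__atomic` compare-exchange loop on the containing aligned 32-bit word as a stand-in for the movli.l/movco.l sequence, and the names `byte_shift` and `atomic_add_fetch_u8` are made up for this example. It assumes the byte lives inside a naturally aligned 32-bit object.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit position of byte `idx` (0..3) inside its
   aligned 32-bit word.  The 4 byte table plays the role of the
   endian dependent LUT mentioned in the comment.  */
static unsigned byte_shift(unsigned idx)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    static const uint8_t lut[4] = { 24, 16, 8, 0 };
#else
    static const uint8_t lut[4] = { 0, 8, 16, 24 };
#endif
    assert(idx < 4);
    return lut[idx];
}

/* Hypothetical QImode add_fetch built on a word-sized atomic.
   Assumption: *p is part of a naturally aligned 32-bit object.  */
static uint8_t atomic_add_fetch_u8(uint8_t *p, uint8_t val)
{
    uintptr_t addr = (uintptr_t)p;

    /* Address calculation: containing word, shift count and mask.  */
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
    unsigned shift = byte_shift((unsigned)(addr & 3));
    uint32_t mask = (uint32_t)0xff << shift;

    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t desired;
    uint8_t result;

    do {
        /* Extract the subword, apply the operation, truncate to 8 bits.  */
        result = (uint8_t)((old >> shift) + val);
        /* Insert the result, leaving the neighbouring bytes untouched.  */
        desired = (old & ~mask) | ((uint32_t)result << shift);
        /* On failure `old` is refreshed with the current word and we retry.  */
    } while (!__atomic_compare_exchange_n(word, &old, desired, 1,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));
    return result;
}
```

The address calculation at the top of `atomic_add_fetch_u8` is the part that could be hoisted out of the atomic sequence, while the extract/insert steps inside the loop are what make the QImode / HImode sequences larger than the SImode ones.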