> It was because I had decided to expose the registers as %al, %ah,
> ... %bl, %bh, ... instead of the customary %[e]ax and friends.
I originally did this for the m32c port (which has hi/low register pairs like the i386), but discovered that reload always allocates registers in UNITS_PER_WORD chunks, and move-by-pieces uses UNITS_PER_WORD chunks too. So if you have 8-bit registers, you end up with 8-bit moves all over the place. If you have 8-bit registers and 16-bit moves, reload counts wrong. I ended up switching to the word-sized register model that i386 currently uses, even though it meant worse code generation.

I seem to recall ranting about it at the time, too. UNITS_PER_WORD must die!

The m32c has four 8-bit registers, two 16-bit registers, and five 24-bit registers. They can be combined to form 8-, 16-, 24-, 32-, 48-, and 64-bit registers. GCC has no way of expressing that.
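
To make the counting problem concrete, here is a rough sketch in C. It is not GCC source: both function names are hypothetical, and it assumes that "allocates in UNITS_PER_WORD chunks" amounts to a ceiling division of the value's size by the word size, which is one plausible reading of "reload counts wrong".

```c
#include <stddef.h>

/* Hypothetical helper, not taken from GCC: how many registers a value
   of size_bytes appears to need if registers are counted in
   word-sized chunks (ceiling division by the word size). */
size_t regs_counted_by_word(size_t size_bytes, size_t units_per_word)
{
    return (size_bytes + units_per_word - 1) / units_per_word;
}

/* Hypothetical helper: how many physical registers of reg_bytes each
   the same value really occupies. */
size_t regs_actually_needed(size_t size_bytes, size_t reg_bytes)
{
    return (size_bytes + reg_bytes - 1) / reg_bytes;
}
```

With 8-bit registers and a 2-byte word, a 16-bit value is counted as one register by the first function but occupies two according to the second. Setting the word size to 1 byte makes the two counts agree, but then every multi-byte value is handled one byte at a time, which matches the "8-bit moves all over the place" case.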