Hi Juzhe,

I would find it a bit clearer if the prepare_ternary part were a separate patch. As it's mostly mechanical replacements I don't mind too much, though, so it's LGTM from my side without that.

As to the LMUL = 8 ICE, is the problem that the register allocator would actually need 5 "registers" when doing the merge by itself and we only have 4?

Regards
 Robin