"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >> > + { >> > + /* First generate subreg of word mode if the previous mode is >> > + wider than word mode and word mode is wider than MODE. */ >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, >> > + prev_mode, 0); >> > + prev_mode = word_mode; >> > + } >> > + if (prev_rtx != nullptr) >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); >> >> This should be lowpart_subreg, since 0 isn't the right offset for >> big-endian targets. Using lowpart_subreg should also avoid the need >> for the word_size “if” above: lowpart_subreg can handle lowpart subword >> subregs of multiword values. > > I tried it. It didn't work since it caused the LRA failure. I replaced > simplify_gen_subreg with lowpart_subreg instead.
What specifically went wrong? >> > +/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) >> > + bytes from constant string DATA + OFFSET and return it as target >> > + constant. If PREV isn't nullptr, it has the RTL info from the >> > + previous iteration. */ >> > + >> > +rtx >> > +builtin_memset_read_str (void *data, void *prev, >> > + HOST_WIDE_INT offset ATTRIBUTE_UNUSED, >> > + machine_mode mode) >> > +{ >> > + rtx target; >> > const char *c = (const char *) data; >> > - char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode)); >> > + char *p; >> > + unsigned int size = GET_MODE_SIZE (mode).to_constant (); >> > + >> > + /* Don't use the previous value if size is 1. */ >> >> Why not though? The existing code does, and that seems like the right >> thing to do when operating on integers. >> >> I can see the check would be a good thing to do if MODE isn't a vector >> mode and the previous mode was. Is that the case you're worried about? >> If so, that check could go in gen_memset_value_from_prev instead. > > We are storing one byte. Doing it directly is faster. But the first thing being protected here is… >> > + if (size != 1) >> > + { >> > + target = gen_memset_value_from_prev (prev, mode); >> > + if (target != nullptr) >> > + return target; …this attempt to use the previous value. If the target uses, say, SImode for the first piece and QImode for a final byte, using the QImode lowpart of the SImode register would avoid having to move the byte value into a separate QImode register. Why's that a bad thing to do? AFAICT it's what the current code would do, so if we want to change it even for integer modes, I think it should be a separate patch with a separate justification. Like I say, I can understand that using the QImode lowpart of a vector wouldn't be a good idea. But if that's specifically what you're trying to prevent, I think we should test for it. Thanks, Richard