> I was hoping that this patch might help me fix pr121393 (on which I'm
> still stuck), if I just disabled the unsigned gather/scatter patterns,
> but no such luck: it just zero-extends the supposed-to-be-negative
> offset and suffers exactly the same bug.
>
> On the plus side, whereas previously if I disabled unsigned offsets I
> got a total failure to vectorize, with your patch it now vectorizes
> just fine, albeit with the bug.
Once things have settled down here I can also have a look at the PR, but so far I haven't had much time.

> Assuming your patch is approved, should I continue to accept signed
> offsets in the backend? The hardware instruction that accepts 32-bit
> offsets only accepts unsigned offsets, so I'm supporting signed offsets
> by calculating the absolute address explicitly. Of course, the backend
> has no way to know whether the range of possible offsets makes it safe
> to treat them as unsigned, so is your way likely to produce better
> code?

We also accept "everything" in the riscv backend and convert the offsets to unsigned; similar to yours, our instructions can only do unsigned offsets. I wouldn't expect this patch to improve on what a backend can do. For us the motivation is that costing will be easier, as will choosing the offset type. I'm not doing any range checks, though. The idea is that for signed -> unsigned we need a new pointer-sized offset and things will just work via two's complement. The range truncation for strided gather/scatter is not touched.

> The backend also accepts 64-bit offsets, but there's no hardware
> instruction for that, so both variants are converted to absolute
> addresses explicitly. This variant only exists because I found that the
> SPEC HPC "lbm" benchmark fails to vectorize at all without it. It would
> be better if the middle end didn't choose 64-bit offsets unless
> absolutely necessary, but vectorizing is better than not.

Hmm, we generally try to use smaller offset types first. If a 64-bit offset is used, that's likely due to a double array being indexed, just like in SPECfp 2017's lbm? How do you treat 64-bit offsets when the hardware can only handle 32 bits? Maybe we could use ranger's range information in vect_truncate_gather_scatter_offset?

--
Regards
Robin
