https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
Andrew Waterman <andrew at sifive dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew at sifive dot com

--- Comment #8 from Andrew Waterman <andrew at sifive dot com> ---
> In fact, I'd be rather surprised to see anything preferring tail undisturbed.

Right. To be precise, microarchitectures without register renaming absolutely
do prefer to leave the tail undisturbed. But that's why the ISA defines the
agnostic mode in such a way that undisturbed is a valid implementation of
agnostic. (The in-order microarchitectures I've worked on simply ignore the
tail-/mask-agnostic setting; the state bits that control the mode are
essentially vestigial.)

Since no plausible implementation will benefit from being in undisturbed mode,
we don't need to consider that aspect of the problem, but...

> I prefer fewer "vsetvli" (which allows more fusion) by default.

...but here's the rub. Implementations that don't benefit from the agnostic
setting would definitely prefer to avoid the extra setvl instructions, not
because they're expensive, but because they're not free.

> Some designs aren't sensitive to the number of vsetvls and I would expect
> that over time that's where high performance designs will land over time.

Low-performance ones, too. (Making vset[i]vli fast is more of an engineering
cost than a silicon cost.) But the instructions still have to be fetched and
decoded, and registers have to be read and written, so the perf cost will
converge on that of, say, an ADDI instruction, which is to say cheap but not
zero. For narrow-issue machines, this does matter.

> Obviously for your design you'll want to set the knob which says "minimize
> vsetvls" as opposed to "avoid false dependencies by preferring tail
> agnostic". That's easily handled by putting the data in the tuning structure
> for each design.

And so this is the right answer :)
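
For illustration only, the per-design knob discussed above might be sketched roughly as follows. This is a hypothetical, simplified model and not GCC's actual RISC-V tuning structure: every name here (`vsetvl_policy`, `core_tune_sketch`, `want_extra_vsetvl`, the two example cores) is an assumption made for the sketch, and it assumes vl and the rest of vtype already match, so that only the tail policy is in question.

```c
#include <stdbool.h>

/* Hypothetical policy knob; these names are illustrative, not GCC's. */
enum vsetvl_policy {
  VSETVL_PREFER_TAIL_AGNOSTIC, /* avoid false dependencies on renaming cores */
  VSETVL_MINIMIZE_COUNT        /* every vsetvli costs a fetch/decode slot */
};

/* Hypothetical per-design tuning record holding the knob. */
struct core_tune_sketch {
  const char *name;
  enum vsetvl_policy policy;
};

static const struct core_tune_sketch ooo_core_sketch = {
  "renaming-ooo", VSETVL_PREFER_TAIL_AGNOSTIC
};
static const struct core_tune_sketch inorder_core_sketch = {
  "narrow-inorder", VSETVL_MINIMIZE_COUNT
};

/* Decide whether to emit a vsetvli whose only effect is to change the
   tail policy.  Assumes vl and the rest of vtype are already compatible.

   cur_tail_agnostic:       the current state is tail-agnostic.
   next_needs_undisturbed:  the next instruction requires tail-undisturbed
                            semantics for correctness.  */
static bool
want_extra_vsetvl (const struct core_tune_sketch *tune,
                   bool cur_tail_agnostic,
                   bool next_needs_undisturbed)
{
  if (next_needs_undisturbed)
    /* Correctness: must switch if we are currently agnostic.  */
    return cur_tail_agnostic;

  if (cur_tail_agnostic)
    /* Already in the mode a renaming core would prefer.  */
    return false;

  /* Currently undisturbed and the next instruction does not care.
     Undisturbed is a valid implementation of agnostic, so staying put
     is always legal; switching is purely a tuning decision.  */
  return tune->policy == VSETVL_PREFER_TAIL_AGNOSTIC;
}
```

Under these assumptions, only the last case depends on the design: a core tuned to prefer tail-agnostic pays for one extra vsetvli, while a core tuned to minimize vsetvls stays in undisturbed mode.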