Richard Henderson writes:

> Most of the time, guest vector operations are rare enough that it doesn't
> really matter that we implement them with a loop around integer operations.
>
> But for target-alpha, there's one vector comparison operation that appears in
> every guest string operation, and is used heavily enough that it's in the top
> 10 functions in the profile: cmpbge (compare bytes greater or equal).
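
For anyone following along without the Alpha manual to hand, here is a
minimal sketch of what cmpbge computes, written once as a loop around integer
operations and once with GCC's generic vector types. This is not the snipped
patch: the names cmpbge_scalar, cmpbge_vector, cmpbge_selfcheck and v8u8 are
made up for illustration, and the 8-byte vector type is just one possible
choice. The bit-gathering tail uses the same and/shift/or pattern that shows
up in the aarch64 output quoted further down.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-lane unsigned byte vector (GCC/Clang vector extension). */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Loop around integer operations: bit i of the result is set when
 * byte i of a is >= byte i of b, compared as unsigned values. */
static uint64_t cmpbge_scalar(uint64_t a, uint64_t b)
{
    uint64_t res = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t x = a >> (i * 8);
        uint8_t y = b >> (i * 8);

        if (x >= y) {
            res |= 1ull << i;
        }
    }
    return res;
}

/* Same result using a generic vector comparison. */
static uint64_t cmpbge_vector(uint64_t a, uint64_t b)
{
    v8u8 va, vb, vm;
    uint64_t m;

    memcpy(&va, &a, sizeof(va));
    memcpy(&vb, &b, sizeof(vb));

    /* Each lane becomes 0xff where va >= vb and 0x00 elsewhere. */
    vm = (v8u8)(va >= vb);
    memcpy(&m, &vm, sizeof(m));

    /* Keep one bit per byte, then fold bits 0, 8, 16, ... 56 down
     * into bits 0..7. */
    m &= 0x0101010101010101ull;
    m |= m >> 7;
    m |= m >> 14;
    m |= m >> 28;
    return m & 0xff;
}

/* Cross-check the two versions on a few inputs. */
static void cmpbge_selfcheck(void)
{
    assert(cmpbge_vector(0, 0) == cmpbge_scalar(0, 0));
    assert(cmpbge_vector(0, 1) == cmpbge_scalar(0, 1));
    assert(cmpbge_vector(0x0102030405060708ull, 0x0807060504030201ull) ==
           cmpbge_scalar(0x0102030405060708ull, 0x0807060504030201ull));
}
```

Because the inputs and the mask go through memcpy in the same way, lane i
always maps back to the same byte position, so the sketch should not depend
on host byte order.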

For a helper function to top the profile is pretty impressive. I wonder how it
compares when you break it down by basic blocks?

> I did some experiments, where I rewrote the function using gcc's "generic"
> vector types and builtin operations.
>
> <snip>
>
> GCC doesn't do a half-bad job on other hosts either:
>
> aarch64:
>   b4:   4f000400        movi    v0.4s, #0x0
>   b8:   4ea01c01        mov     v1.16b, v0.16b
>   bc:   4e081c00        mov     v0.d[0], x0
>   c0:   4e081c21        mov     v1.d[0], x1
>   c4:   6e213c00        cmhs    v0.16b, v0.16b, v1.16b
>   c8:   4e083c00        mov     x0, v0.d[0]
>   cc:   9200c000        and     x0, x0, #0x101010101010101
>   d0:   aa401c00        orr     x0, x0, x0, lsr #7
>   d4:   aa403800        orr     x0, x0, x0, lsr #14
>   d8:   aa407000        orr     x0, x0, x0, lsr #28
>   dc:   53001c00        uxtb    w0, w0
>   e0:   d65f03c0        ret
>
> Of course aarch64 *does* have an 8-byte vector size that gcc knows how to use.
> If I adjust the patch above to use it, only the first two insns are
> eliminated -- surely not a measurable difference.
>
> power7:
>         ...
>         vcmpgtub 13,0,1
>         vcmpequb 0,0,1
>         xxlor 32,45,32
>         ...
>
>
> But I guess the larger question here is: how much of this should we accept?
>
> (0) Ignore this and do nothing?
>
> (1) No general infrastructure.  Special case this one insn with #ifdef __SSE2__
> and ignore anything else.

Not a big fan of special cases that are arch dependent.

> (2) Put in just enough infrastructure to know if compiler support for general
> vectors is available, and then use it ad hoc when such functions are shown to
> be high on the profile?
>
> (3) Put in more infrastructure and allow it to be used to implement most guest
> vector operations, possibly tidying their implementations?

<snip>

(4) Consider supporting generic vector operations in the TCG?

While making helper functions faster is good, I've wondered if there is enough
genericism across the various SIMD/vector operations that we could add TCG ops
to translate them?
The ops could fall back to generic helper functions using the GCC intrinsics
if we know there is no decent back-end support for them?

--
Alex Bennée