On Thu, 16 Oct 2014, Kirill Batuzov wrote:

> > (4) Consider supporting generic vector operations in the TCG?
>
> I gave it a go and was quite happy with the result. I have implemented the
> add_i32x4 opcode, which adds 128-bit vectors composed of four 32-bit
> integers, and used it to translate the NEON vadd.i32 instruction to the SSE
> paddd instruction.
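
For anyone who skipped the original thread, here is a minimal sketch of what
add_i32x4 is meant to compute. It is not code from the series: the type
vec_i32x4 and the functions ref_add_i32x4 and sse_add_i32x4 are hypothetical
names chosen for illustration only. The SSE2 variant assumes an x86 host and is
compiled only when the compiler defines __SSE2__.

```c
#include <stdint.h>

/* Hypothetical container for a 128-bit vector of four 32-bit lanes.
 * The name is illustrative and not taken from the patch series. */
typedef struct {
    uint32_t lane[4];
} vec_i32x4;

/* Scalar reference: each lane is added independently and wraps
 * modulo 2^32, with no carry between neighbouring lanes. */
vec_i32x4 ref_add_i32x4(vec_i32x4 a, vec_i32x4 b)
{
    vec_i32x4 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

#ifdef __SSE2__
#include <emmintrin.h>

/* Same operation through the SSE2 intrinsic that maps to paddd.
 * Unaligned loads and stores are used so the struct needs no
 * special alignment. */
vec_i32x4 sse_add_i32x4(vec_i32x4 a, vec_i32x4 b)
{
    vec_i32x4 r;
    __m128i va = _mm_loadu_si128((const __m128i *)a.lane);
    __m128i vb = _mm_loadu_si128((const __m128i *)b.lane);
    _mm_storeu_si128((__m128i *)r.lane, _mm_add_epi32(va, vb));
    return r;
}
#endif
```

The point of the per-lane wrap is that a lane holding 0xFFFFFFFF plus 1 becomes
0 without touching the lane next to it, which is what both vadd.i32 and paddd
do and what a single 128-bit integer add would not.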

<snip>

>
> Why I think all this is worth doing:
>
> (1) Performance. A 200% speedup is a lot. My test was specifically crafted,
>     and real-life applications may not have that many vector operations on
>     average, but there is a specific class of applications where it will
>     matter a lot: media processing applications like ffmpeg.
>
> (2) Some unification of common operations. Right now every target
>     reimplements common vector operations (like vector
>     add/sub/mul/min/compare etc.). We can do it once in the common TCG code.
>
> Still, there are some cons I mentioned earlier. The need to support a lot of
> opcodes is the most significant in the long run, I think. So before I commit
> my time to converting more operations, I'd like to hear your opinions on
> whether this approach is acceptable and worth the effort.
>
> Kirill Batuzov (7):
>   tcg: add support for 128bit vector type
>   tcg: store ENV global in TCGContext
>   tcg: add sync_temp opcode
>   tcg: add add_i32x4 opcode
>   target-arm: support access to 128-bit guest registers as globals
>   target-arm: use add_i32x4 opcode to handle vadd.i32 instruction
>   tcg/i386: add support for vector opcodes
>
>  target-arm/translate.c |  30 ++++++++++-
>  tcg/i386/tcg-target.c  | 103 ++++++++++++++++++++++++++++++++---
>  tcg/i386/tcg-target.h  |  24 ++++++++-
>  tcg/tcg-op.h           | 141 ++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-opc.h          |  13 +++++
>  tcg/tcg.c              |  36 +++++++++++++
>  tcg/tcg.h              |  34 ++++++++++++
>  7 files changed, 371 insertions(+), 10 deletions(-)
>

Ping? Any more comments?

--
Kirill