On 30/12/2019, Lauri Kasanen <c...@gmx.com> wrote:
> Hi,
>
> For the Libre RISC-V project, I'm going to research the popular codecs
> and design new instructions to help speed them up. With ffmpeg being
> home to lots of asm folks for many platforms, I also want to ask your
> opinion.
>
> What new instructions would you like? Anything particular you find
> missing in existing ISAs, slow, or cumbersome?

Do you mean SIMD instructions?

I have no idea what exists in RISC-V already or what capabilities or
limitations it has, so I am going to use x86 language and terms such as
byte, word, dword, qword.

Things I have found missing in old(er) x86 instruction sets are word-size
and signed/unsigned variants of existing operations. Some operations may
have byte and word variants but dword and qword might be missing, or
there might be a signed version but not an unsigned version (and vice
versa). A couple of things I had to emulate:

* packed absolute value of dwords
* packed maximum unsigned words
* packed max and min signed dwords (I might have really wanted unsigned
  for this)
* arithmetic right shift of qwords
* pack dwords to words with unsigned saturation

Shuffle instructions. pshufb is very useful and I think I read on IRC
that arm/aarch64/neon does not have an equivalent. (Or was that other
shuffles?) It allows for arbitrary reordering of bytes and setting bytes
to 0. On x86 it takes the shuffle pattern from another SIMD register, but
I usually use it with a constant pattern that gets loaded from memory.
An interesting improvement would be if you could encode the pattern, that
is 16 selectors with 17 possible values each (or however long your
vectors might be), in an immediate value so it doesn't require another
register.

Good documentation. The Intel instruction manual has pretty good
explanations of what the instructions do. The old instructions from
around the time of MMX and SSE had excellent diagrams; these might have
been mostly for shuffle operations. I need to look and jog my memory. I
think punpcklbw is an example of what I mean. The entry in the manual for
it has a good diagram IMO. (At least the version I am currently looking
at.)

No stupid lane stuff. AVX2 brought us a SIMD vector length extension from
16 to 32 bytes. Good, except for the stupid lanes they were split into,
which make it hard to "mix" data from the low 0-15 bytes and the high
16-31 bytes.

I forgot about this email for a month.
Sorry about that. Seeing RISC-V in the schedule at FOSDEM reminded me about this.
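
As a rough illustration of two of the points in this message, here is a
scalar C sketch. It is not taken from any manual or from ffmpeg; the
function names pshufb_model and abs_dword_model are made up for this
example. The first function models what a 16-byte pshufb does (arbitrary
byte reordering, with a selector whose top bit is set producing 0). The
second shows one common way to emulate absolute value of a signed dword
with a sign mask, which is the kind of workaround needed when the packed
instruction is missing.

```c
#include <stdint.h>

/* Hypothetical name: scalar model of a 16-byte pshufb.
 * For each output byte: if bit 7 of the selector is set, the result is 0;
 * otherwise the low 4 bits of the selector pick a byte from src.
 * A temporary is used so dst may alias src. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t sel[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (sel[i] & 0x80) ? 0 : src[sel[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Hypothetical name: sign-mask emulation of absolute value for one dword.
 * mask is all ones for negative x and 0 otherwise; (x ^ mask) - mask
 * negates x only when it is negative. The result is returned as unsigned,
 * so INT32_MIN maps to 0x80000000 instead of overflowing. */
static uint32_t abs_dword_model(int32_t x)
{
    uint32_t mask = (x < 0) ? 0xFFFFFFFFu : 0u;
    return ((uint32_t)x ^ mask) - mask;
}
```

With selectors 15, 14, ..., 0 the first function reverses the 16 bytes;
setting any selector to 0x80 zeroes that output byte. A SIMD version of
the second function would apply the same xor/subtract steps to every
dword lane at once.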