On Sat, 1 Feb 2020 12:53:28 +0100 James Darnley <james.darn...@gmail.com> wrote:
> On 30/12/2019, Lauri Kasanen <c...@gmx.com> wrote:
> > For the Libre RISC-V project, I'm going to research the popular codecs
> > and design new instructions to help speed them up. With ffmpeg being
> > home to lots of asm folks for many platforms, I also want to ask your
> > opinion.
> >
> > What new instructions would you like? Anything particular you find
> > missing in existing ISAs, slow, or cumbersome?
>
> Do you mean SIMD instructions? I have no idea what exists in RISC-V
> already or what capabilities or limitations it has, and I am going to
> use x86 language and terms such as byte, word, dword, qword.
>
> Things I have found missing in old(er) x86 instruction sets are
> missing word size and signed/unsigned variants for existing
> operations. Some operations may have byte and word variants but dword
> and qword might be missing, or there might be a signed version but not
> an unsigned version (and vice versa). A couple of things I had to
> emulate:
> * packed absolute value of dwords
> * packed maximum unsigned words
> * packed max and min signed dwords (I might have really wanted
>   unsigned for this)
> * arithmetic right shift of qwords
> * pack dwords to words with unsigned saturation
>
> Shuffle instructions. pshufb is very useful and I think I read on IRC
> that arm/aarch64/neon does not have an equivalent. (Or was that other
> shuffles?) It allows for arbitrary reordering of bytes and setting
> bytes to 0. On x86 it takes the shuffle pattern from another SIMD
> register but I usually use it with a constant pattern that gets loaded
> from memory. An interesting improvement would be if you can encode 17
> * 16 (or however long your vectors might be) values in an immediate
> value so it doesn't require another register.
>
> Good documentation. The intel instruction manual has pretty good
> explanation of what the instructions do. The old instructions from
> around the time of MMX and SSE had excellent diagrams, these might
> have been mostly for shuffle operations. I need to look and jog my
> memory. I think punpcklbw is an example of what I mean. The entry in
> the manual for it has a good diagram IMO. (At least the version I am
> currently looking at)
>
> No stupid lane stuff. AVX2 brought us a SIMD vector length extension
> from 16 to 32 bytes. Good except for the stupid lanes they were split
> into making it hard to "mix" data from the low 0-15 bytes and the high
> 16-31 bytes.
>
> I forgot about this email for a month. Sorry about that. Seeing
> RISC-V in the schedule at FOSDEM reminded me about this.

Thanks for your thoughts. The project scope is both SIMD and scalar, if
there's for example a particular bit packing that's slow and
unparallelizable, it might benefit from a dedicated instruction.

- Lauri
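
For anyone following the thread without the x86 manual at hand, here is a
rough scalar C sketch of what some of the operations in the quoted mail do
per element. It is only an illustration, not code from ffmpeg or from the
Libre RISC-V project. The function names (shuffle_bytes16, abs_dword,
max_uword, sra_qword) are made up for this example, and the emulation
tricks are the usual bit-twiddling ones, assumed here rather than taken
from the mail.

```c
#include <stdint.h>
#include <string.h>

/* Byte shuffle with pshufb-like semantics on one 16-byte vector:
 * if bit 7 of a mask byte is set the output byte is 0, otherwise the
 * low 4 bits of the mask byte select a source byte.
 * Hypothetical helper name; a temporary makes dst == src safe. */
static void shuffle_bytes16(uint8_t dst[16], const uint8_t src[16],
                            const uint8_t mask[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Absolute value of one dword without a dedicated instruction:
 * build an all-ones mask from the sign bit, then (x ^ m) - m.
 * Works on the raw bit pattern; 0x80000000 maps to itself. */
static uint32_t abs_dword(uint32_t x)
{
    uint32_t m = (uint32_t)0 - (x >> 31);
    return (x ^ m) - m;
}

/* Maximum of two unsigned words built from an unsigned saturating
 * subtract followed by a wrapping add: max(a, b) = satsub(a, b) + b. */
static uint16_t max_uword(uint16_t a, uint16_t b)
{
    uint16_t d = (a > b) ? (uint16_t)(a - b) : 0; /* saturating subtract */
    return (uint16_t)(d + b);
}

/* Arithmetic right shift of one qword using only a logical shift:
 * shift, then sign-extend from the bit the old sign landed on.
 * n must be in the range 0..63. */
static uint64_t sra_qword(uint64_t x, unsigned n)
{
    uint64_t m = (uint64_t)1 << (63 - n);
    return ((x >> n) ^ m) - m;
}
```

In a SIMD version each of these bodies would apply to every element of the
vector at once; the extra mask, subtract and xor steps are the cost of the
missing dword, qword or unsigned variants.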