On Mon, Apr 21, 2014 at 10:20 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Mon, Apr 21, 2014 at 12:56 PM, Matt Turner <matts...@gmail.com> wrote:
>> On Mon, Apr 21, 2014 at 8:54 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>> Hello,
>>>
>>> I've been giving some thought to catching up with core mesa on ARB_gs5
>>> support. One of the things that ARB_gs5 introduces are new operations:
>>>
>>> genType frexp(genType x, out genIType exp);
>>> genType ldexp(genType x, in genIType exp);
>>>
>>> genIType bitfieldExtract(genIType value, int offset, int bits);
>>> genUType bitfieldExtract(genUType value, int offset, int bits);
>>>
>>> genIType bitfieldInsert(genIType base, genIType insert, int offset,
>>> int bits);
>>> genUType bitfieldInsert(genUType base, genUType insert, int offset,
>>> int bits);
>>>
>>> genIType bitfieldReverse(genIType value);
>>> genUType bitfieldReverse(genUType value);
>>>
>>> genIType bitCount(genIType value);
>>> genIType bitCount(genUType value);
>>>
>>> genIType findLSB(genIType value);
>>> genIType findLSB(genUType value);
>>>
>>> genIType findMSB(genIType value);
>>> genIType findMSB(genUType value);
>>>
>>> genUType uaddCarry(genUType x, genUType y, out genUType carry);
>>> genUType usubBorrow(genUType x, genUType y, out genUType borrow);
>>>
>>> void umulExtended(genUType x, genUType y, out genUType msb,
>>> out genUType lsb);
>>> void imulExtended(genIType x, genIType y, out genIType msb,
>>> out genIType lsb);
>>>
>>> (I've skipped the packing stuff since that seems to already be
>>> supported/lowered elsewhere, i2f/f2i which is already handled, and the
>>> texture gather stuff, for which support already exists. And the
>>> interpolateAt* stuff which isn't supported by core mesa yet, and when
>>> it is, will require a very diff kind of handling than the above.)
>>>
>>> I guess the only drivers one really needs to worry about here are
>>> r600/radeonsi and nouveau.
>>> svga is largely a passthrough afaik, and
>>> llvmpipe/softpipe is software and can thus implement it however it
>>> wants.
>>>
>>> Looking at the nvc0+ shader ISA, there are instructions to directly
>>> handle all the bitfield stuff (bitfieldExtract, bitfieldInsert,
>>> bitfieldReverse, bitCount, findLSB, findMSB). There is also a "mul
>>> high", which is what the *mulExtended stuff gets translated into.
>>>
>>> There are no instructions to handle frexp/ldexp, or the add carry/sub
>>> borrow stuff. (Looking at the code the blob generates, they just do
>>> all that "by hand". Even though there is a "set cc" flag on those
>>> instructions which one might assume has the carry. But the blob didn't
>>> use it.)
>>>
>>> So I was thinking that we could just take the relevant SM5
>>> instructions and lower the rest. Specifically, these would be the new
>>> opcodes:
>>>
>>> IBFE
>>> UBFE
>>> BFI
>>> BREV (not BFREV since most instructions appear to be 3/4 letters)
>>> POPC (shorter than "countbits")
>>> LSB
>>> UMSB
>>> IMSB
>>> IMULHI
>>>
>>> I just took a look at the Radeon SI ISA, and it does seem like it has
>>> ldexp/frexp instructions, as well as setting the carry flag for
>>> addc/subb. Although since TGSI doesn't have flags or multiple
>>> destinations, not sure how the latter 2 could be easily encoded in the
>>> glsl->tgsi translation.
>>>
>>> Thoughts/opinions before I go and implement the above? Is someone else
>>> already working on this?
>>
>> I've written lowering code for ldexp/frexp. It relies on support for
>
> The lowering code for ldexp is optional, but the frexp one seems to be
> "required" (in that there is no ir_binop_frexp at all). If RadeonSI
> wants to make use of its built-in frexp instruction, they'll either
> need to change it, or have a _really_ clever peephole pass. (Didn't
> check if the r600 isa had the same thing...)

R700 is the first to have frexp/ldexp instructions.

Someone will need to convert the frexp code in builtin_functions.cpp
to a lowering pass if they want to use an frexp instruction.

>> EXT_shader_integer_mix, which disappointingly no other Mesa drivers
>> have exposed.
>
> http://gallium.readthedocs.org/en/latest/tgsi.html#opcode-UCMP
>
> I assume that's the same thing? If so, that extension can probably
> just be exposed as-is on gallium for drivers that support
> NativeIntegers.

Yeah, looks like that instruction is all that's needed.

>> For the multi-destination built-ins, i965 has multi-destination
>> instructions (addc, subb) which write the carry/borrow to the
>> accumulator register. Instead of doing a ton of infrastructure to
>> support multi-destination IR I emit an add and an addc for uaddCarry and
>> only use the carry result from addc. A peephole optimization can
>> easily combine the add/addc into a single addc.
>
> Hm, neat idea. But the same peephole pass could, instead, be used to detect
>
> UADD x, a, b
> USLT y, x, a
>
> And then you don't need the special ADDC instruction. And you get the
> advantage of being able to detect (some) people who were doing this by
> hand before.

I suppose so, but we can't implement USLT on i965 more efficiently than addc.
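
For anyone following along, the add-then-compare pattern from the thread (UADD followed by USLT) can be sketched in plain C for a single 32-bit component. This is only an illustration of the arithmetic, not code from Mesa or any driver. The helper names uadd_carry and usub_borrow are made up here, and the borrow version is my assumption of the analogous trick for usubBorrow, which the thread does not spell out.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for GLSL uaddCarry(): returns the low
 * 32 bits of a + b and stores 1 in *carry if the sum overflowed. */
static uint32_t uadd_carry(uint32_t a, uint32_t b, uint32_t *carry)
{
    uint32_t x = a + b;   /* UADD x, a, b  (wraps modulo 2^32) */
    *carry = (x < a);     /* USLT y, x, a  (true only if the add wrapped) */
    return x;
}

/* Hypothetical scalar stand-in for GLSL usubBorrow(): returns the low
 * 32 bits of a - b and stores 1 in *borrow if b is greater than a. */
static uint32_t usub_borrow(uint32_t a, uint32_t b, uint32_t *borrow)
{
    *borrow = (a < b);    /* a borrow is needed exactly when a < b */
    return a - b;         /* wraps modulo 2^32 */
}
```

The compare works because an unsigned add that wraps always produces a result smaller than either operand, so no flag register or second destination is needed to recover the carry.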