On 21.04.2014 21:10, Ilia Mirkin wrote:
> On Mon, Apr 21, 2014 at 2:52 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>> On 21.04.2014 17:54, Ilia Mirkin wrote:
>>> Hello,
>>>
>>> I've been giving some thought to catching up with core mesa on ARB_gs5
>>> support. One of the things that ARB_gs5 introduces is a set of new
>>> operations:
>>>
>>> genType frexp(genType x, out genIType exp);
>>> genType ldexp(genType x, in genIType exp);
>>>
>>> genIType bitfieldExtract(genIType value, int offset, int bits);
>>> genUType bitfieldExtract(genUType value, int offset, int bits);
>>>
>>> genIType bitfieldInsert(genIType base, genIType insert, int offset,
>>>                         int bits);
>>> genUType bitfieldInsert(genUType base, genUType insert, int offset,
>>>                         int bits);
>>>
>>> genIType bitfieldReverse(genIType value);
>>> genUType bitfieldReverse(genUType value);
>>>
>>> genIType bitCount(genIType value);
>>> genIType bitCount(genUType value);
>>>
>>> genIType findLSB(genIType value);
>>> genIType findLSB(genUType value);
>>>
>>> genIType findMSB(genIType value);
>>> genIType findMSB(genUType value);
>>>
>>> genUType uaddCarry(genUType x, genUType y, out genUType carry);
>>> genUType usubBorrow(genUType x, genUType y, out genUType borrow);
>>>
>>> void umulExtended(genUType x, genUType y, out genUType msb,
>>>                   out genUType lsb);
>>> void imulExtended(genIType x, genIType y, out genIType msb,
>>>                   out genIType lsb);
>>>
>>> (I've skipped the packing stuff since that seems to already be
>>> supported/lowered elsewhere, i2f/f2i which is already handled, and the
>>> texture gather stuff, for which support already exists. And the
>>> interpolateAt* stuff which isn't supported by core mesa yet, and when
>>> it is, will require a very different kind of handling than the above.)
>>>
>>> I guess the only drivers one really needs to worry about here are
>>> r600/radeonsi and nouveau. svga is largely a passthrough afaik, and
>>> llvmpipe/softpipe is software and can thus implement it however it
>>> wants.
>>>
>>> Looking at the nvc0+ shader ISA, there are instructions to directly
>>> handle all the bitfield stuff (bitfieldExtract, bitfieldInsert,
>>> bitfieldReverse, bitCount, findLSB, findMSB). There is also a "mul
>>> high", which is what the *mulExtended stuff gets translated into.
>>>
>>> There are no instructions to handle frexp/ldexp, or the add carry/sub
>>> borrow stuff. (Looking at the code the blob generates, they just do
>>> all that "by hand". Even though there is a "set cc" flag on those
>>> instructions which one might assume has the carry. But the blob didn't
>>> use it.)
>>>
>>> So I was thinking that we could just take the relevant SM5
>>> instructions and lower the rest. Specifically, these would be the new
>>> opcodes:
>>>
>>> IBFE
>>> UBFE
>>> BFI
>>> BREV (not BFREV since most instructions appear to be 3/4 letters)
>>> POPC (shorter than "countbits")
>>> LSB
>>> UMSB
>>> IMSB
>>> IMULHI
>> We already have imul_hi.
>
> Yeah, I noticed that after I sent it out. Only llvmpipe (and perhaps
> softpipe) supports it though, based on a quick grep. And nothing emits
> it (although presumably the vmware d3d10 st makes use of it).
>
>>
>>>
>>> I just took a look at the Radeon SI ISA, and it does seem like it has
>>> ldexp/frexp instructions, as well as setting the carry flag for
>>> addc/subb. Although since TGSI doesn't have flags or multiple
>>> destinations, not sure how the latter 2 could be easily encoded in the
>>> glsl->tgsi translation.
>> It is not entirely true that tgsi doesn't support multiple destinations.
>> The token format allows 0-3 destinations. But so far instructions with
>> more than one destination do not exist. There was some discussion about
>> it when we needed umul_hi/imul_hi (since these are also multiple
>> destination sm4 instructions) but we deemed it not worth it, partly also
>> because it didn't look like (most) gpus could actually benefit from this
>> being just 1 instruction instead of two (that is, it would emit the same
>> 2 instructions for the low and high part of the mul anyway). Mostly
>> because gpus (and cpus) usually follow the model of multiple 32bit
>> sources in, one 32bit dst out. Obviously the accumulator of intel gpus
>> is an exception there.
>> So, you could follow that same model with subb/addc - use the existing
>> sub/add and just use a new instruction for the borrow/carry part (though
>> it looks like if you do it with two instructions anyway, you could just
>> use an existing instruction for the carry/borrow part). But if gpus
>> actually can set two regs simultaneously (or otherwise benefit from this
>> being one instruction without having to "reassemble" it, for instance
>> with special carry flags), then it might be better to actually use
>> multi-dest instructions. Most likely, because this hasn't been used at
>> all until now, it will break in some places, but there should not be
>> anything major preventing this from working.
>
> You're still going to have to reassemble it one way or another --
> either detecting UADD/ADDC combinations, or UADD/USLT combinations.
> Might as well use the more general one, no? (And a similar combo can
> be used for SUBB, I think.)
Yes, if you use two instructions.

>
> Having real multiple outputs will be useful if anyone wants to pipe
> FREXP all the way through -- that'll be a bit awkward to do as 2
> opcodes. Since nvc0 doesn't support it, I won't be losing sleep over
> it :)
Well in theory it doesn't look awkward at all to me as 2 instructions -
one just returns the mantissa, the other the exponent. As far as I can
tell, this is exactly what radeonsi would do.
(Older radeons, it seems, do not support frexp, or rather they only
support it for doubles - there indeed it is one instruction returning 2
results, as it is using 4 slots out of the 4 (or 5) vliw slots.)
I guess though the things which can't be lowered reasonably would be
more important to implement.
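
For illustration, the two-instruction model for addc/subb discussed above (a plain add/sub plus an unsigned compare for the carry/borrow part) could be sketched in C roughly as follows. This is only a sketch of the arithmetic, not TGSI or driver code: the struct and function names are made up for the example, and the C comparison yields 0/1, whereas a TGSI-style unsigned compare would, as far as I recall, produce all-ones for true and need an extra mask to get the 0/1 that GLSL expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the 32-bit result plus a 0/1 carry or borrow. */
struct u32_flag {
    uint32_t value;
    uint32_t flag;
};

/* uaddCarry as "add + unsigned less-than" (UADD/USLT-style combination). */
static struct u32_flag lower_uadd_carry(uint32_t x, uint32_t y)
{
    struct u32_flag r;
    r.value = x + y;        /* wraps modulo 2^32 */
    r.flag = r.value < x;   /* the sum wrapped iff it is below an operand */
    return r;
}

/* usubBorrow as "sub + unsigned less-than" on the original operands. */
static struct u32_flag lower_usub_borrow(uint32_t x, uint32_t y)
{
    struct u32_flag r;
    r.value = x - y;        /* wraps modulo 2^32 */
    r.flag = x < y;         /* a borrow is needed iff x is below y */
    return r;
}

/* Small usage example: 0xFFFFFFFF + 1 wraps to 0 with the carry set. */
static void example_uadd_carry(void)
{
    struct u32_flag r = lower_uadd_carry(0xFFFFFFFFu, 1u);
    assert(r.value == 0u && r.flag == 1u);
}
```

Both halves only read the original 32-bit sources (or the wrapped sum), so each fits the "multiple 32bit sources in, one 32bit dst out" model without needing a second destination.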
>
>>>
>>> Thoughts/opinions before I go and implement the above? Is someone else
>>> already working on this?
>> I think this looks good overall. We're getting close to the max number
>> of different instructions though (256) but if that should become a
>> problem we can easily ditch some (or double the max number by killing a
>> bit from max number of sources - 0-15 sources is not useful, 0-7 would
>> still be more than enough).
>
> I didn't realize there was a max instruction quantity, but these will
> have to be added one way or another if gallium is to support GL4.0 :)
> There's also the Double ISA which appears to be documented but not
> actually in p_shader_tokens.h, which will take up a whole bunch of
> opcodes as well.
Yes indeed.

>
> In any case, I'm going to take a stab at implementing these and piping
> them through to nvc0 after I finish up ARB_sample_shading (coming soon
> to a patch near you).
>
> -ilia
>

Roland
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev