Am 07.05.2014 20:33, schrieb Ilia Mirkin: > On Tue, May 6, 2014 at 1:36 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote: >> On Tue, May 6, 2014 at 1:29 PM, Roland Scheidegger <srol...@vmware.com> >> wrote: >>> Am 06.05.2014 17:03, schrieb Ilia Mirkin: >>>> On Tue, May 6, 2014 at 10:48 AM, Roland Scheidegger <srol...@vmware.com> >>>> wrote: >>>>> Looks good to me. >>>> >>>> Thanks! >>>> >>>>> Does that mean if also the GATHER_SM5 cap is supported you have to >>>>> support 4 independent, non-constant offsets? >>>> >>>> Not 100% sure what you're asking... but yes, for ARB_gs5 to work, you >>>> have to support independent non-constant offsets. And if you have >>>> PIPE_CAP_TEXTURE_GATHER_OFFSETS enabled, you're making the claim that >>>> you can handle multiple independent offsets in a single texgather. >>>> Without the cap, the 4 offsets get lowered into 4 separate texgathers >>>> (with only one of the returned components used). >>>> >>>> With nvc0, the offsets are passed in via a register, so non-constant >>>> is never an issue. And with nv50, the offsets must be immediates (and >>>> there can be only 1 set of them), but it also has no hope of >>>> supporting all of ARB_gs5. >>>> >>>>> Would it make sense to reorder the caps so the gather stuff is all >>>>> together (now 5 cap bits just for this...)? >>>> >>>> The quantity of caps for texgather is a little ridiculous. I'm of the >>>> opinion that this should be the default behaviour, and it should be up >>>> to the driver to lower it into 4 texgathers if it can't handle them >>>> directly. Furthermore, this functionality is only available (via GL) >>>> with ARB_gs5, which in turn will require a whole bunch of stuff, so I >>>> don't know whether the GATHER_SM5 cap is really that useful. And for >>>> someone with a DX tracker, this functionality would again not be >>>> useful on its own, the rest of SM5 would have to be supported as well >>>> (I assume). >>>> >>>> But that's not what got implemented, and I don't care to modify >>>> radeon, which can only support 1 offset at a time. (Although I don't >>>> think the radeon impl got pushed...) I anticipate that llvmpipe >>>> doesn't care one way or another (perhaps with even a minor preference >>>> towards having it all in one instruction). >>>> >>>> If there's concensus, happy to switch this on by default and get rid >>>> of the cap :) [And also get rid of the GATHER_SM5 cap.] >>> Well I think the point was that there's really hw which can only do >>> simple gather (what d3d10.1 could do or arb_texture_gather would do). >>> This hw will not be able to do other stuff from newer gl versions anyway >>> so it should not be required to support those new features. >> >> Right. But since that hw will only ever expose ARB_texture_gather and >> not ARB_gpu_shader5, it will never receive a TG4 instruciton with >> non-const offsets or multiple offsets. So the cap to indicate that >> non-const or quad offsets are supported isn't really necessary, since >> those will only appear if ARB_gs5 support is claimed, which requires >> more than just the texgather stuff. (The >> PIPE_CAP_TEXTURE_GATHER_COMPONENTS cap _is_ necessary since it >> indicates ARB_texture_gather support, and the value that should be >> returned by some GL query about what tex gather supports.) >> >>> I'm not entirely sure to what it's actually lowered but in any case >>> llvmpipe if it implemented this definitely would want a non-lowered >>> version. >> >> Right now, it'll get lowered to 4 texgathers, with only one of the >> returned 4 components used from each one. (And it can't use texfetch >> since the min/max offsets are different, and there's probably some >> other clever reason as well.) >> >>> I think though some radeon hw could really do SM5 version but >>> not independent offsets natively, though I'm not sure if it would really >>> be all that complicated to handle it in the driver. >> >> Well, I think the claim was that SM5 doesn't actually support the 4 >> separate offsets, but GL4 does with textureGatherOffsets(). Also, I >> believe that radeon supports non-const natively, just not have 4 >> offsets in one instruction. Same deal with i965 (which is why that >> lowering pass exists in the first place). > > Getting back on topic... what should I do? :) Check this in with the > new cap? Or just make it the default behaviour and let drivers that > can't handle it do the lowering in the driver? FWIW, I believe Dave > Airlie was against that, but that might have been because he was > implementing it for r600, which can't handle 4 separate offsets. (BTW, > was that "looks good to me" == R-b?) Yes, that is Reviewed-by: Roland Scheidegger <srol...@vmware.com>
I think the code is ok as is _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev