On Thu, Aug 8, 2013 at 6:57 PM, Christian König <deathsim...@vodafone.de> wrote:
> On 08.08.2013 16:33, Marek Olšák wrote:
>>
>> On Thu, Aug 8, 2013 at 3:09 PM, Christian König <deathsim...@vodafone.de>
>> wrote:
>>>
>>> On 08.08.2013 14:38, Marek Olšák wrote:
>>>
>>>> On Thu, Aug 8, 2013 at 9:47 AM, Christian König
>>>> <deathsim...@vodafone.de> wrote:
>>>>>
>>>>> On 08.08.2013 02:20, Marek Olšák wrote:
>>>>>
>>>>>> FMASK is bound as a separate texture. For every texture, there can be
>>>>>> an FMASK. Therefore a separate array of resource slots has to be
>>>>>> added.
>>>>>>
>>>>>> This adds a new mechanism for emitting resource descriptors. Its
>>>>>> features are:
>>>>>> - resource descriptors are stored in an ordinary buffer (not in a CS)
>>>>>
>>>>>
>>>>> Having resource descriptors outside of the CS has two problems that we
>>>>> need to solve first:
>>>>>
>>>>> 1. Fine-grained descriptor updates don't work; I already tried that.
>>>>> The problem is that, unlike on previous ASICs, descriptors are now a
>>>>> memory block and so no longer part of the CP context. So when we (for
>>>>> example) have a draw command executing and the next draw command is
>>>>> using new resources for a specific slot, we would either block until
>>>>> the first draw command is finished (which is bad for performance) or
>>>>> change the descriptors while they are still in use (which results in
>>>>> VM faults).
>>>>
>>>> So what would the proper solution be here? Do I need to flush some
>>>> caches or would moving the descriptor updates to the constant IB fix
>>>> that?
>>>
>>>
>>> Actually the current implementation worked better than anything else I
>>> tried.
>>>
>>> When you really need the resource descriptors in a separate buffer, you
>>> need to use one buffer for each draw call and always write the full
>>> buffer contents (no partial updates). Flushing anything won't really
>>> help either.
>>>
>>> The only solution I see using one buffer is to block until the last draw
>>> call is finished with WAIT_REG_MEM, but that would be quite disastrous
>>> for performance.
>>>
>>>
>>>>> 2. If my understanding is correct, when they are embedded the
>>>>> descriptors are preloaded into the caches while executing the IB, so
>>>>> to achieve the same speed with descriptors outside of the IB you need
>>>>> to add additional commands to the constant IB, which is new to SI and
>>>>> which we currently don't support in the CS interface.
>>>>
>>>> There seems to be support for the constant IB. The CS ioctl chunk ID
>>>> is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
>>>> si_vm_packet3_ce_check. Is there anything missing?
>>>
>>>
>>> The userspace side seems to be missing and, except for throwing NOP
>>> packets into it, we never tested it. I know from the closed source side
>>> that it actually was quite tricky for them to get working.
>>>
>>> In addition, please note that I'm not 100% sure that just putting the
>>> descriptors into the IB is really helping here. It was just the
>>> simplest solution to avoid allocating a new buffer on each draw call.
>>
>> I understand. I don't really need to have resource descriptors in a
>> separate buffer, all I need is these 3 basic features a gallium driver
>> should support:
>> - fine-grained resource updates (mainly for performance, see below)
>> - ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
>> - no GPU crash if a shader is using SAMPLER[15] but there are no samplers
>>   bound
>>
>> FYI, partial sampler view and sampler state updates are coming to
>> gallium, Brian Paul already has some patches, it's just a matter of
>> time now. Vertex and constant buffer states already support partial
>> updates.
>
>
> That shouldn't be too much of a problem.
>
> Just allocate a state at startup and initialize it with the proper PM4
> commands for 16 samplers, then update the resource descriptors in that state
> when we change the bound textures/samplers/views/constants/whatever. All we
> need to do then is to set the emitted state to NULL so that it gets
> re-emitted in the next draw command.
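
As a rough illustration of the scheme quoted above, here is a minimal sketch in C. It is not actual radeonsi code: the struct and function names, the 8-dword descriptor size, and the dword counter standing in for real PM4 packet emission are all assumptions made for the example. It shows a state allocated once for 16 slots, a per-slot update that clears the "emitted" pointer, and a draw-time check that re-emits the whole state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS   16  /* sampler slots per shader stage, as in the thread */
#define DESC_DWORDS 8   /* assumed size of one resource descriptor */

/* Hypothetical pre-built state holding the descriptors for all slots. */
struct sampler_state {
    uint32_t desc[NUM_SLOTS][DESC_DWORDS];
};

/* Hypothetical context: 'emitted' records which state was last emitted. */
struct context {
    struct sampler_state samplers;
    const struct sampler_state *emitted;
    unsigned dwords_written;  /* stand-in for the command stream size */
};

/* Bind one slot (or unbind it when desc is NULL) and invalidate the state. */
static void set_sampler_view(struct context *ctx, unsigned slot,
                             const uint32_t *desc)
{
    assert(slot < NUM_SLOTS);
    if (desc)
        memcpy(ctx->samplers.desc[slot], desc,
               sizeof(ctx->samplers.desc[slot]));
    else
        memset(ctx->samplers.desc[slot], 0,
               sizeof(ctx->samplers.desc[slot]));
    ctx->emitted = NULL;  /* forces re-emission at the next draw */
}

/* Called at draw time: re-emit the whole state if it was invalidated. */
static void emit_samplers_if_needed(struct context *ctx)
{
    if (ctx->emitted == &ctx->samplers)
        return;  /* nothing changed since the last draw */
    ctx->dwords_written += NUM_SLOTS * DESC_DWORDS;  /* all 16 slots */
    ctx->emitted = &ctx->samplers;
}
```

In this sketch, changing a single slot costs a full NUM_SLOTS * DESC_DWORDS write at the next draw, while draws with no binding change cost nothing.
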
That would re-emit all 16 shader resources even if just one of them
needs to be changed. I was trying to avoid this inefficiency. Is it
really impossible to emit just one resource descriptor and keep the
others unchanged? This is a basic D3D10/11 feature, for example:

void ID3D11DeviceContext::VSSetShaderResources(
  [in] UINT StartSlot,
  [in] UINT NumViews,
  [in] ID3D11ShaderResourceView *const *ppShaderResourceViews
);

If the constant engine is required to implement this interface
efficiently, then I'd like to work on constant IB support.

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev