On Fri, Aug 9, 2013 at 10:34 AM, Christian König <deathsim...@vodafone.de> wrote:
> On 08.08.2013 21:38, Alex Deucher wrote:
>
>> On Thu, Aug 8, 2013 at 1:34 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>
>>> On Thu, Aug 8, 2013 at 6:57 PM, Christian König <deathsim...@vodafone.de>
>>> wrote:
>>>>
>>>> On 08.08.2013 16:33, Marek Olšák wrote:
>>>>>
>>>>> On Thu, Aug 8, 2013 at 3:09 PM, Christian König
>>>>> <deathsim...@vodafone.de> wrote:
>>>>>>
>>>>>> On 08.08.2013 14:38, Marek Olšák wrote:
>>>>>>
>>>>>>> On Thu, Aug 8, 2013 at 9:47 AM, Christian König
>>>>>>> <deathsim...@vodafone.de> wrote:
>>>>>>>>
>>>>>>>> On 08.08.2013 02:20, Marek Olšák wrote:
>>>>>>>>
>>>>>>>>> FMASK is bound as a separate texture. For every texture, there can
>>>>>>>>> be an FMASK. Therefore a separate array of resource slots has to be
>>>>>>>>> added.
>>>>>>>>>
>>>>>>>>> This adds a new mechanism for emitting resource descriptors, its
>>>>>>>>> features are:
>>>>>>>>> - resource descriptors are stored in an ordinary buffer (not in a
>>>>>>>>>   CS)
>>>>>>>>
>>>>>>>> Having resource descriptors outside of the CS has two problems that
>>>>>>>> we need to solve first:
>>>>>>>>
>>>>>>>> 1. Fine-grained descriptor updates don't work, I already tried that.
>>>>>>>> The problem is that, unlike on previous ASICs, descriptors are now a
>>>>>>>> memory block, so no longer part of the CP context. So when we (for
>>>>>>>> example) have a draw command executing and the next draw command is
>>>>>>>> using new resources for a specific slot, we would either block until
>>>>>>>> the first draw command is finished (which is bad for performance) or
>>>>>>>> change the descriptors while they are still in use (which results in
>>>>>>>> VM faults).
>>>>>>>
>>>>>>> So what would the proper solution be here?
>>>>>>> Do I need to flush some caches or would moving the descriptor updates
>>>>>>> to the constant IB fix that?
>>>>>>
>>>>>> Actually the current implementation worked better than anything else I
>>>>>> tried.
>>>>>>
>>>>>> When you really need the resource descriptors in a separate buffer you
>>>>>> need to use one buffer for each draw call and always write the full
>>>>>> buffer contents (no partial updates). Flushing anything won't really
>>>>>> help either.
>>>>>>
>>>>>> The only solution I see using one buffer is to block until the last
>>>>>> draw call is finished with WAIT_REG_MEM, but that would be quite
>>>>>> disastrous for performance.
>>>>>>
>>>>>>>> 2. If my understanding is correct, when they are embedded the
>>>>>>>> descriptors are preloaded into the caches while executing the IB, so
>>>>>>>> to achieve the same speed with descriptors outside of the IB you need
>>>>>>>> to add additional commands to the constant IB, which is new to SI and
>>>>>>>> which we currently don't support in the CS interface.
>>>>>>>
>>>>>>> There seems to be support for the constant IB. The CS ioctl chunk ID
>>>>>>> is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
>>>>>>> si_vm_packet3_ce_check. Is there anything missing?
>>>>>>
>>>>>> The userspace side seems to be missing, and except for throwing NOP
>>>>>> packets into it we never tested it. I know from the closed source side
>>>>>> that it actually was quite tricky for them to get working.
>>>>>>
>>>>>> In addition to that, please note that I'm not 100% sure that just
>>>>>> putting the descriptors into the IB is really helping here. It was just
>>>>>> the simplest solution to avoid allocating a new buffer on each draw
>>>>>> call.
>>>>>
>>>>> I understand.
>>>>> I don't really need to have resource descriptors in a separate buffer,
>>>>> all I need is these 3 basic features a gallium driver should support:
>>>>> - fine-grained resource updates (mainly for performance, see below)
>>>>> - ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
>>>>> - no GPU crash if a shader is using SAMPLER[15] but there are no
>>>>>   samplers bound
>>>>>
>>>>> FYI, partial sampler view and sampler state updates are coming to
>>>>> gallium, Brian Paul already has some patches, it's just a matter of
>>>>> time now. Vertex and constant buffer states already support partial
>>>>> updates.
>>>>
>>>> That shouldn't be too much of a problem.
>>>>
>>>> Just allocate a state at startup and initialize it with the proper pm4
>>>> commands for 16 samplers, then update the resource descriptors in that
>>>> state when we change the bound textures/samplers/views/constants/whatever.
>>>> All we need to do then is set the emitted state to NULL so that it gets
>>>> re-emitted in the next draw command.
>>>
>>> That would re-emit all 16 shader resources even if just one of them
>>> needs to be changed. I was trying to avoid this inefficiency. Is it
>>> really impossible to emit just one resource descriptor and keep the
>>> others unchanged? This is a basic D3D10/11 feature, for example:
>>>
>>> void ID3D11DeviceContext::VSSetShaderResources(
>>>   [in] UINT StartSlot,
>>>   [in] UINT NumViews,
>>>   [in] ID3D11ShaderResourceView *const *ppShaderResourceViews
>>> );
>>>
>>> If the constant engine is required to implement this interface
>>> efficiently, then I'd like to work on constant IB support.
>>
>> You'll need to either store them in memory or re-emit them if you
>> store them in the IB. The CE is mainly there so that it can prime the
>> TC in parallel with the command stream processing.
>
> Yeah indeed. The CE is just for prefetching everything into caches and
> doesn't really help here.
>
> The only two options I see are either fully emitting it into the command
> stream whenever anything changes or allocating a new buffer for the
> resources on each new draw call, copying over the old state and then setting
> just the things that changed. Both options have their pros and cons, no idea
> what might be better.
>
> The fact is that the resource descriptors are not allowed to change as long
> as the shaders are running.
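To make the second option concrete, here is a rough CPU-side sketch in C. It is only a model under assumptions: the ring size, the 8-dword descriptor size, the all-zero "unbound" descriptor and all names (desc_table, desc_ring, desc_ring_update) are hypothetical and not taken from the driver. Each update copies the previous table into a fresh one and patches only the requested slot range, so a table that an earlier draw may still be reading is never modified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS    16  /* resource slots per shader stage, as in the thread */
#define DESC_DWORDS   8  /* assumed size of one image resource descriptor */
#define RING_ENTRIES  4  /* hypothetical: number of tables that may be in flight */

/* One complete descriptor table: everything a single draw call reads. */
struct desc_table {
    uint32_t dw[NUM_SLOTS][DESC_DWORDS];
};

/* Hypothetical ring of tables standing in for GPU-visible buffers. */
struct desc_ring {
    struct desc_table tables[RING_ENTRIES];
    unsigned cur;  /* index of the table used by the most recent draw */
};

void desc_ring_init(struct desc_ring *ring)
{
    /* All slots start out unbound (modelled here as all-zero descriptors). */
    memset(ring, 0, sizeof(*ring));
}

/*
 * Partial update in the style of StartSlot/NumViews: copy the previous table
 * into the next ring entry, then overwrite only the changed slots.
 * Passing descs == NULL unbinds the range by zeroing it.
 * Returns the table the next draw should use.
 */
const struct desc_table *
desc_ring_update(struct desc_ring *ring, unsigned start_slot,
                 unsigned num_slots, const uint32_t (*descs)[DESC_DWORDS])
{
    assert(start_slot + num_slots <= NUM_SLOTS);

    unsigned next = (ring->cur + 1) % RING_ENTRIES;

    /* A real driver would have to wait on a fence here until the GPU is
     * done with tables[next]; this model does not track GPU progress. */
    ring->tables[next] = ring->tables[ring->cur];

    for (unsigned i = 0; i < num_slots; i++) {
        uint32_t *dst = ring->tables[next].dw[start_slot + i];
        if (descs)
            memcpy(dst, descs[i], DESC_DWORDS * sizeof(uint32_t));
        else
            memset(dst, 0, DESC_DWORDS * sizeof(uint32_t));
    }

    ring->cur = next;
    return &ring->tables[next];
}
```

A caller would point the next draw at the returned table. A one-slot change then costs one table copy plus one descriptor write, and tables handed to earlier draws stay untouched, at the price of extra memory and a fence before a ring entry is reused.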
I think flushing the TC before changing the descriptors should help. If
not, then PS_PARTIAL_FLUSH or some other equivalent of WAIT_UNTIL should.

Marek