On Wed, Jan 20, 2016 at 4:44 AM, Tapani Pälli <tapani.pa...@intel.com> wrote: > On 01/20/2016 11:31 AM, Ilia Mirkin wrote: >> >> On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.pa...@intel.com> >> wrote: >>> >>> On 01/20/2016 11:16 AM, Ilia Mirkin wrote: >>>> >>>> On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.pa...@intel.com> >>>> wrote: >>>>> >>>>> On 01/20/2016 10:26 AM, Ilia Mirkin wrote: >>>>>> >>>>>> On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.pa...@intel.com> >>>>>> wrote: >>>>>>> >>>>>>> On 01/19/2016 01:14 PM, Ilia Mirkin wrote: >>>>>>>> >>>>>>>> The data structure is a (memory) heap... there appears to be one in >>>>>>>> mesa/main/mm.h. There's also one in nouveau_heap.h which is quite >>>>>>>> simple and totally unreliant on nouveau, just happens to be there. >>>>>>>> How >>>>>>>> hard would it be to integrate something like that? >>>>>>>> >>>>>>>> The trouble with adding slow things is that you forget about them, >>>>>>>> and >>>>>>>> they're not _that_ slow, but this stuff adds up. >>>>>>> >>>>>>> >>>>>>> The solution I had in mind is to build a list of empty slots when >>>>>>> allocating >>>>>>> remaptable or while finding slots (keep pushing unused empty slots to >>>>>>> list) >>>>>>> ... but if possible I would prefer optimization later. First of all, >>>>>>> this >>>>>>> is >>>>>>> quite exotic path to hit with a real program (last words ... yes >>>>>>> yes). >>>>>>> Secondly, and more importantly, we can apply for certification >>>>>>> sooner, >>>>>>> there >>>>>>> are very few failures left. >>>>>> >>>>>> I see you pushed this patch without concluding this discussion. >>>>>> Certification may be something that you (personally, as a company, >>>>>> whatever) are striving for, but that doesn't mean that you get to >>>>>> ignore reviewer feedback. >>>>> >>>>> >>>>> I'm sorry if you have that impression but I'm not ignoring review >>>>> feedback. >>>>> I agree that the find function is not 'optimal' and have planned how to >>>>> optimize it and I'm happy with any changes if someone wants to optimize >>>>> and >>>>> refactor it instead. However, I've noticed this to be not a bottleneck >>>>> and >>>>> cold path so because of the schedule I'm asking to do this later. >>>>> >>>>>> Perhaps in the end you're actually right, I don't know, but we >>>>>> certainly didn't agree on anything. I'm inclined to push out a revert >>>>>> while this is being sorted out. >>>>> >>>>> >>>>> I'm surprised to see this as such a big deal. >>>>> >>>>> // Tapani >>>>> >>>> The big deal is pushing the patch before concluding the discussion. >>>> >>>> Getting back to the matter at hand, what's the absolute worst case >>>> here? How big does the UniformRemapTable get? How many times can this >>>> function get called? >>> >>> >>> As example with Intel Haswell we have max as 98304, this is the biggest >>> size >>> with HSW. >>> >>> This function gets called only if the remaptable has 'holes' in it, >>> meaning >>> that explicit uniforms locations get scattered in this available space, I >>> consider this very rare for anyone or some engine to do. It could only >>> really happen if you use both explicit locations (non continuous >>> locations) >>> and implicit locations together. >> >> So... what's the worst case? What would that test look like? How long >> would it take to execute? > > > A shader that has max amount of uniforms possible. They need to be invidual > (not arrays or structs), declare half of them with explicit location so that > they fill every other location (leaving as much holes as possible), this > should be the worst case.
I played around a bunch with this. I couldn't trigger the badness, and then I realized it's because you have a bug: You need to use "j - i < entries", otherwise you end up picking the wrong location. I also hit another issue when trying to create a bad shader, with something like: layout (location=0) uniform Foo { layout (location=0) float a; vec4 ff[2048]; }; I end up hitting link_uniforms.cpp:1241: void link_assign_uniform_locations(gl_shader_program*, unsigned int, unsigned int): Assertion `prog->UniformRemapTable[element_loc] == ((gl_uniform_storage *) -1)' failed. Not sure if that's related to your work or not. However it dawns on me that I had misunderstood your algorithm. I thought it was triggering O(n^2) behaviour *for each search*, but it is not -- it's, in essence, a linear scan over the remap table. There could, of course, be a bunch of these scans. And you skip the process if the remap table is complete, so it really has to be a bunch of holes in the thing. Of course the size is finite, so at worst you'd have a remap table of size N with N/2 holes, and you'd end up scanning up to the first hole N/2 times. Which is roughly N^2/4, which is still 4M iterations for N = 4096. Unfortunately putting such a shader together is a bit of a pain, since all the uniforms have to be used. I still really think you need to build a heap. Or at least store a "first empty slot" so that you don't repeat the iteration *every* time. -ilia _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev