Yup, most definitely. I just have one more thing to test before sending out a V2. I've toyed around with arrays and sets and stuff to see if there are better options than a linked list. At least for now the answer is: "no, there isn't", but I'm gonna test u_vector for this use later today to see if that is even better. Expect a new patch this evening (CET).
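For comparison, here is a minimal sketch of the linked-list approach described in the quoted patch: worklist nodes are carved out of chunks of 20 with plain malloc (no zeroing, in the spirit of "do not use rzalloc"), so one allocation serves twenty pushes. All names here (struct worklist, worklist_push, ELEMS_PER_CHUNK, the free list of popped nodes) are hypothetical and not Mesa's actual code. The real patch uses ralloc, and recycling popped nodes is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not Mesa's actual worklist code. */
#define ELEMS_PER_CHUNK 20

struct worklist_elem {
   void *item;
   struct worklist_elem *next;
};

/* One malloc() provides ELEMS_PER_CHUNK list nodes. */
struct elem_chunk {
   struct elem_chunk *next;
   struct worklist_elem elems[ELEMS_PER_CHUNK];
};

struct worklist {
   struct worklist_elem *head;      /* top of the LIFO stack */
   struct worklist_elem *free_list; /* popped nodes, reused by later pushes */
   struct elem_chunk *chunks;       /* all chunks, freed in worklist_fini() */
   unsigned chunk_used;             /* nodes handed out from chunks->elems */
   unsigned chunk_allocs;           /* number of chunk malloc() calls made */
};

static void worklist_init(struct worklist *wl)
{
   wl->head = NULL;
   wl->free_list = NULL;
   wl->chunks = NULL;
   wl->chunk_used = ELEMS_PER_CHUNK; /* forces a chunk allocation on first push */
   wl->chunk_allocs = 0;
}

static struct worklist_elem *alloc_elem(struct worklist *wl)
{
   /* Prefer recycling a node released by an earlier pop. */
   if (wl->free_list) {
      struct worklist_elem *e = wl->free_list;
      wl->free_list = e->next;
      return e;
   }
   /* Current chunk exhausted: allocate a new, uninitialized one. */
   if (wl->chunk_used == ELEMS_PER_CHUNK) {
      struct elem_chunk *c = malloc(sizeof(*c));
      if (!c)
         return NULL;
      c->next = wl->chunks;
      wl->chunks = c;
      wl->chunk_used = 0;
      wl->chunk_allocs++;
   }
   assert(wl->chunk_used < ELEMS_PER_CHUNK);
   return &wl->chunks->elems[wl->chunk_used++];
}

static bool worklist_push(struct worklist *wl, void *item)
{
   struct worklist_elem *e = alloc_elem(wl);
   if (!e)
      return false;
   e->item = item;
   e->next = wl->head;
   wl->head = e;
   return true;
}

static void *worklist_pop(struct worklist *wl)
{
   struct worklist_elem *e = wl->head;
   if (!e)
      return NULL;
   wl->head = e->next;
   e->next = wl->free_list; /* keep the node for reuse instead of freeing it */
   wl->free_list = e;
   return e->item;
}

static void worklist_fini(struct worklist *wl)
{
   while (wl->chunks) {
      struct elem_chunk *next = wl->chunks->next;
      free(wl->chunks);
      wl->chunks = next;
   }
   wl->head = NULL;
   wl->free_list = NULL;
}
```

Since popped nodes go back on a free list, a steady-state push/pop loop makes no further allocations; whether that still loses to a growable array such as u_vector is the open question above.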
2018-03-14 20:58 GMT+01:00 Dieter Nützel <die...@nuetzel-hh.de>:
> Hello Thomas,
>
> is this useful even after '[Mesa-dev] [PATCH 0/2] V2: Use hash table cloning
> in copy propagation' landed?
>
> I'm running both together with Dave's '[Mesa-dev] [PATCH] radv/winsys:
> replace bo list searchs with a hash table.' patch.
>
> Dieter
>
>
> On 24.01.2018 08:33, Thomas Helland wrote:
>>
>> 2018-01-21 23:58 GMT+01:00 Eric Anholt <e...@anholt.net>:
>>>
>>> Thomas Helland <thomashellan...@gmail.com> writes:
>>>
>>>> Also, allocate worklist_elem in groups of 20, to reduce the burden of
>>>> allocation. Do not use rzalloc, as there is no need. This lets us drop
>>>> the number of calls to ralloc from approximately 10% of all calls to
>>>> ralloc (130 000 calls), down to a mere 2000 calls to ralloc_array_size.
>>>> This cuts the runtime of shader-db by 1%, while at the same time
>>>> reducing the number of stalled cycles, executed cycles, and executed
>>>> instructions by about 1% as reported by perf. I did a five-run
>>>> benchmark pre and post and got a statistical variance less than 0.1% pre
>>>> and post. This was with i965's ir validation polluting the benchmark, so
>>>> the numbers are even better in release builds.
>>>>
>>>> Performance change as found with perf-diff:
>>>>  4.74%  -0.23%  libc-2.26.so            [.] _int_malloc
>>>>  1.88%  -0.21%  libc-2.26.so            [.] malloc
>>>>  2.27%  +0.16%  libmesa_dri_drivers.so  [.] match_value.part.7
>>>>  2.95%  -0.12%  libc-2.26.so            [.] _int_free
>>>>         +0.11%  libmesa_dri_drivers.so  [.] worklist_push
>>>>  1.22%  -0.08%  libc-2.26.so            [.] malloc_consolidate
>>>>  0.16%  -0.06%  libmesa_dri_drivers.so  [.] mark_live_cb
>>>>  1.21%  +0.06%  libmesa_dri_drivers.so  [.] match_expression.part.6
>>>>  0.75%  -0.05%  libc-2.26.so            [.] cfree@GLIBC_2.2.5
>>>>  0.50%  -0.05%  libmesa_dri_drivers.so  [.] ralloc_size
>>>>  0.57%  +0.04%  libmesa_dri_drivers.so  [.] nir_replace_instr
>>>>  1.29%  -0.04%  libmesa_dri_drivers.so  [.] unsafe_free
>>>
>>>
>>> I'm curious, since a NIR instruction worklist seems like a generally
>>> useful thing to have:
>>>
>>> Could nir_worklist.c keep the implementation of this?
>>>
>>> Also, I wonder if it wouldn't be even better to have a u_dynarray of
>>> instructions in the worklist, with push/pop on the end of the array, and
>>> a struct set tracking the instructions in the array to avoid
>>> double-adding. I actually don't know if that would be better or not, so
>>> I'd be happy with the worklist management just moved to nir_worklist.c.
>>
>>
>> I'll look into this to see what I can do. nir_worklist.c at this time has
>> only a block worklist. This numbers all the blocks, uses a bitset for
>> checking if the item is present, and uses an array with an index pointing
>> to the start of the queue of blocks in the buffer.
>>
>> The same scheme could easily be used for ssa-defs, as these are
>> also numbered. I actually did this for the VRP pass I wrote years ago.
>>
>> However, for instructions we do not have a way of numbering them,
>> so a different scheme would have to be used. A dynarray + set type
>> of thing, as you're suggesting, might get us where we want.
>> I'll see what I can come up with.
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
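Thomas's description of the existing block worklist (items numbered densely, a bitset for membership, an array with a start index used as a queue) carries over to any numbered item such as SSA defs. The following is a hypothetical standalone C sketch under that assumption; struct index_worklist and its functions are illustrative names, not Mesa's nir_block_worklist API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical FIFO worklist for items numbered 0..size-1 (e.g. SSA def indices). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct index_worklist {
   unsigned size;          /* number of distinct items; also the ring capacity */
   unsigned count;         /* items currently queued */
   unsigned start;         /* ring-buffer slot holding the queue head */
   unsigned *queue;        /* ring buffer of item indices */
   unsigned long *present; /* bitset: bit i is set while item i is queued */
};

static bool index_worklist_init(struct index_worklist *wl, unsigned size)
{
   size_t words = (size + WORD_BITS - 1) / WORD_BITS;
   wl->size = size;
   wl->count = 0;
   wl->start = 0;
   wl->queue = malloc((size ? size : 1) * sizeof(*wl->queue));
   wl->present = calloc(words ? words : 1, sizeof(*wl->present));
   return wl->queue != NULL && wl->present != NULL;
}

static bool index_worklist_contains(const struct index_worklist *wl, unsigned i)
{
   return (wl->present[i / WORD_BITS] >> (i % WORD_BITS)) & 1ul;
}

/* O(1) membership check via the bitset, so no item is ever queued twice. */
static void index_worklist_push_tail(struct index_worklist *wl, unsigned i)
{
   assert(i < wl->size);
   if (index_worklist_contains(wl, i))
      return;
   assert(wl->count < wl->size); /* holds because each item is queued at most once */
   wl->queue[(wl->start + wl->count) % wl->size] = i;
   wl->count++;
   wl->present[i / WORD_BITS] |= 1ul << (i % WORD_BITS);
}

/* Removes the oldest item; returns false when the worklist is empty. */
static bool index_worklist_pop_head(struct index_worklist *wl, unsigned *out)
{
   if (wl->count == 0)
      return false;
   unsigned i = wl->queue[wl->start];
   wl->start = (wl->start + 1) % wl->size;
   wl->count--;
   wl->present[i / WORD_BITS] &= ~(1ul << (i % WORD_BITS));
   *out = i;
   return true;
}

static void index_worklist_fini(struct index_worklist *wl)
{
   free(wl->queue);
   free(wl->present);
   wl->queue = NULL;
   wl->present = NULL;
}
```

Because each item can be queued at most once, a ring buffer with `size` slots never overflows. Instructions have no such numbering, which is why the thread turns to a dynarray plus a set for them.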