On Thu, Dec 07, 2017 at 10:57:31AM -0800, Jason Ekstrand wrote:
> On Thu, Dec 7, 2017 at 9:57 AM, Pohjolainen, Topi <
> topi.pohjolai...@gmail.com> wrote:
>
> > On Thu, Dec 07, 2017 at 09:25:08AM -0800, Jason Ekstrand wrote:
> > > On Thu, Dec 7, 2017 at 9:10 AM, Pohjolainen, Topi <
> > > topi.pohjolai...@gmail.com> wrote:
> > >
> > > > On Wed, Dec 06, 2017 at 08:34:19PM -0800, Jason Ekstrand wrote:
> > > > > This rewires the logic for assigning uniform locations to work in terms
> > > > > of "complex alignments".  The basic idea is that, as we walk the list of
> > > > > instructions, we keep track of the alignment and continuity requirements
> > > > > of each slot and assert that the alignments all match up.  We then use
> > > > > those alignments in the compaction stage to ensure that everything gets
> > > > > placed at a properly aligned register.  The old mechanism handled
> > > > > alignments by special-casing each of the bit sizes and placing 64-bit
> > > > > values first followed by 32-bit values.
> > > > >
> > > > > The old scheme had the advantage of never leaving a hole since all the
> > > > > 64-bit values could be tightly packed and so could the 32-bit values.
> > > > > However, the new scheme has no type size special cases so it handles not
> > > > > only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
> > > > > types as the need arises.
> > > > >
> > > > > Tested-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
> > > > > ---
> > > > >  src/intel/compiler/brw_fs.cpp | 307 ++++++++++++++++++++++++------------------
> > > > >  1 file changed, 174 insertions(+), 133 deletions(-)
> > > > >
> > > > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > > > > index 93bb6b4..41260b4 100644
> > > > > --- a/src/intel/compiler/brw_fs.cpp
> > > > > +++ b/src/intel/compiler/brw_fs.cpp
> > > > > @@ -1906,62 +1906,6 @@ fs_visitor::compact_virtual_grfs()
> > > > >     return progress;
> > > > >  }
> > > > >
> > > > > -static void
> > > > > -set_push_pull_constant_loc(unsigned uniform, int *chunk_start,
> > > > > -                           unsigned *max_chunk_bitsize,
> > > > > -                           bool contiguous, unsigned bitsize,
> > > > > -                           const unsigned target_bitsize,
> > > > > -                           int *push_constant_loc, int *pull_constant_loc,
> > > > > -                           unsigned *num_push_constants,
> > > > > -                           unsigned *num_pull_constants,
> > > > > -                           const unsigned max_push_components,
> > > > > -                           const unsigned max_chunk_size,
> > > > > -                           bool allow_pull_constants,
> > > > > -                           struct brw_stage_prog_data *stage_prog_data)
> > > > > -{
> > > > > -   /* This is the first live uniform in the chunk */
> > > > > -   if (*chunk_start < 0)
> > > > > -      *chunk_start = uniform;
> > > > > -
> > > > > -   /* Keep track of the maximum bit size access in contiguous uniforms */
> > > > > -   *max_chunk_bitsize = MAX2(*max_chunk_bitsize, bitsize);
> > > > > -
> > > > > -   /* If this element does not need to be contiguous with the next, we
> > > > > -    * split at this point and everything between chunk_start and u forms a
> > > > > -    * single chunk.
> > > > > -    */
> > > > > -   if (!contiguous) {
> > > > > -      /* If bitsize doesn't match the target one, skip it */
> > > > > -      if (*max_chunk_bitsize != target_bitsize) {
> > > > > -         /* FIXME: right now we only support 32 and 64-bit accesses */
> > > > > -         assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
> > > > > -         *max_chunk_bitsize = 0;
> > > > > -         *chunk_start = -1;
> > > > > -         return;
> > > > > -      }
> > > > > -
> > > > > -      unsigned chunk_size = uniform - *chunk_start + 1;
> > > > > -
> > > > > -      /* Decide whether we should push or pull this parameter.  In the
> > > > > -       * Vulkan driver, push constants are explicitly exposed via the API
> > > > > -       * so we push everything.  In GL, we only push small arrays.
> > > > > -       */
> > > > > -      if (!allow_pull_constants ||
> > > > > -          (*num_push_constants + chunk_size <= max_push_components &&
> > > > > -           chunk_size <= max_chunk_size)) {
> > > > > -         assert(*num_push_constants + chunk_size <= max_push_components);
> > > > > -         for (unsigned j = *chunk_start; j <= uniform; j++)
> > > > > -            push_constant_loc[j] = (*num_push_constants)++;
> > > > > -      } else {
> > > > > -         for (unsigned j = *chunk_start; j <= uniform; j++)
> > > > > -            pull_constant_loc[j] = (*num_pull_constants)++;
> > > > > -      }
> > > > > -
> > > > > -      *max_chunk_bitsize = 0;
> > > > > -      *chunk_start = -1;
> > > > > -   }
> > > > > -}
> > > > > -
> > > > >  static int
> > > > >  get_subgroup_id_param_index(const brw_stage_prog_data *prog_data)
> > > > >  {
> > > > > @@ -1977,6 +1921,98 @@ get_subgroup_id_param_index(const brw_stage_prog_data *prog_data)
> > > > >  }
> > > > >
> > > > >  /**
> > > > > + * Struct for handling complex alignments.
> > > > > + *
> > > > > + * A complex alignment is stored as a multiplier and an offset.  A value is
> > > > > + * considered to be aligned if it is congruent to the offset modulo the
> > > > > + * multiplier.
> > > > > + */
> > > > > +struct cplx_align {
> > > > > +   unsigned mul:4;
> > > > > +   unsigned offset:4;
> > > > > +};
> > > > > +
> > > > > +#define CPLX_ALIGN_MAX_MUL 8
> > > > > +
> > > > > +static void
> > > > > +cplx_align_assert_sane(struct cplx_align a)
> > > > > +{
> > > > > +   assert(a.mul > 0 && util_is_power_of_two(a.mul));
> > > > > +   assert(a.offset < a.mul);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * Combines two alignments to produce a least multiple of sorts.
> > > > > + *
> > > > > + * The returned alignment is the smallest (in terms of multiplier) such that
> > > > > + * anything aligned to both a and b will be aligned to the new alignment.
> > > > > + * This function will assert-fail if a and b are not compatible, i.e. if the
> > > > > + * offset parameters are such that no common alignment is possible.
> > > > > + */
> > > > > +static struct cplx_align
> > > > > +cplx_align_combine(struct cplx_align a, struct cplx_align b)
> > > > > +{
> > > > > +   cplx_align_assert_sane(a);
> > > > > +   cplx_align_assert_sane(b);
> > > > > +
> > > > > +   /* Assert that the alignments agree. */
> > > > > +   assert((a.offset & (b.mul - 1)) == (b.offset & (a.mul - 1)));
> > > > > +
> > > > > +   return a.mul > b.mul ? a : b;
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * Apply a complex alignment
> > > > > + *
> > > > > + * This function will return the smallest number greater than or equal to
> > > > > + * offset that is aligned to align.
> > > > > + */
> > > > > +static unsigned
> > > > > +cplx_align_apply(struct cplx_align align, unsigned offset)
> > > > > +{
> > > >
> > > > Should we assert that offset >= 4?
> > >
> > > Not here, no.  I intentionally want this code to be as generic as
> > > possible.  We may have reason to re-use it (or something similar) elsewhere
> > > and I don't want to build in lots of push-constant assumptions.
> > >
> > > > > +   return ALIGN(offset - align.offset, align.mul) + align.offset;
> > > >
> > > > Soon I'm going to continue with glsl mediump support and thought better to
> > > > read these patches.  I stumbled on something (probably missing something
> > > > again):
> > >
> > > :-)
> > >
> > > > To me it looks there can ever be only three different instances of
> > > > cplx_align:
> > >
> > > Today, yes, but there may come a day when we want to re-arrange with a
> > > finer granularity than dwords.
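The two helpers being reviewed above can be pulled out into a self-contained sketch. Assumptions to note: `ALIGN` is a local stand-in for Mesa's power-of-two alignment macro, the 4-bit bitfields are widened to plain `unsigned`, and the sanity asserts are inlined; the logic otherwise mirrors the quoted patch.

```c
#include <assert.h>

/* Local stand-in for Mesa's ALIGN macro; m must be a power of two. */
#define ALIGN(x, m) (((x) + (m) - 1) & ~((m) - 1))

struct cplx_align {
   unsigned mul;     /* power-of-two multiplier */
   unsigned offset;  /* required residue, offset < mul */
};

/* Smallest n >= offset with n congruent to align.offset mod align.mul.
 * When offset < align.offset this relies on well-defined unsigned
 * wraparound, which gives the same result as the signed arithmetic
 * walked through later in the thread. */
static unsigned
cplx_align_apply(struct cplx_align align, unsigned offset)
{
   return ALIGN(offset - align.offset, align.mul) + align.offset;
}

/* Keep the stricter (larger-multiplier) alignment; the offsets must
 * agree modulo the smaller multiplier or no common alignment exists. */
static struct cplx_align
cplx_align_combine(struct cplx_align a, struct cplx_align b)
{
   assert((a.offset & (b.mul - 1)) == (b.offset & (a.mul - 1)));
   return a.mul > b.mul ? a : b;
}
```

With the three instances Topi lists, `cplx_align_apply` rounds a dword offset up to the next slot satisfying the residue: `{8, 0}` rounds up to a multiple of 8, `{4, 0}` is a no-op on multiples of 4, and `{8, 4}` rounds up to the next value that is 4 more than a multiple of 8.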
> > > > > > > > > > {8, 0}, {8, 4} and {4, 0} and that "offset" argument here is always > > > > multiple > > > > of 4. > > > > > > > > In case of align == {8, 0} "offset" gets aligned to 8, and in case of > > > > align == {4, 0} cplx_align_apply() is nop. > > > > > > > > But in case of {8, 4} and "offset" not multiple of 8, "offset" - 4 is > > > > multiple of 8 and returned value becomes ALIGN(offset - 4, 8) + 4 which > > > > in turn isn't aligned by 8. > > > > > > > > > > Correct, and it shouldn't be. Let's walk through the calculation with a > > > more complex alignment: {8, 6} (no, that isn't possible today but the > > > calculations should work with it) and some concrete numbers: 6, 8, 10, > > 12, > > > 14, 16 > > > > > > 6 -> ALIGN(6 - 6, 8) + 6 = 6 > > > 8 -> ALIGN(8 - 6, 8) + 6 = 14 > > > 10 -> ALIGN(10 - 6, 8) + 6 = 14 > > > 12 -> ALIGN(12 - 6, 8) + 6 = 14 > > > 14 -> ALIGN(14 - 6, 8) + 6 = 14 > > > 16 -> ALIGN(16 - 6, 8) + 6 = 22 > > > > > > As you can see, the result of each is the next number that is 6 more > > than a > > > multiple of 8. There is also a question about negative numbers. Let's > > > look at one more calculation, this time with more detail about the guts > > of > > > ALIGN(): > > > > > > 4 -> ALIGN(4 - 6, 8) + 6 = (((4 - 6) + (8 - 1)) & ~(8-1)) + 6 = (5 & ~7) > > + > > > 6 = 6 > > > > > > That one's a bit more complicated but I think you see what's going on. > > If > > > it makes things more clear, we could do something such as > > > > > > return (offset + (align.mul - align.offset - 1)) & ~(align.mul - 1)) + > > > align.offset; > > > > Big thanks for the good explanation. I was also wondering why I didn't > > fully > > get the documentation of cplx_align. I kept asking myself "but offset is > > always modulo of mul". Now the comment there makes sense :) Some of your > > reply > > here would probably go nicely there. > > > > Any preference whether it goes in the documentation of struct cplx_align or > cplx_align_apply? 
I'm leaning towards cplx_align
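Jason's `{8, 6}` walkthrough, including the `offset = 4` case and the alternative masking form he suggests, can be checked mechanically. A standalone sketch with invented helper names (`apply_patch_form`, `apply_rewritten`); note the patch's form depends on well-defined unsigned wraparound when the input is below `align.offset`, while the rewritten form avoids subtraction entirely.

```c
#include <assert.h>

/* Local stand-in for Mesa's ALIGN macro; m must be a power of two. */
#define ALIGN(x, m) (((x) + (m) - 1) & ~((m) - 1))

/* The expression as written in the patch: for offset < off, the
 * subtraction wraps around in unsigned arithmetic, but the final
 * result still lands on the right residue (e.g. 4 -> 6 for {8, 6}). */
static unsigned
apply_patch_form(unsigned mul, unsigned off, unsigned offset)
{
   return ALIGN(offset - off, mul) + off;
}

/* The alternative form suggested in the thread: shift the rounding
 * bias by the offset instead of subtracting first. */
static unsigned
apply_rewritten(unsigned mul, unsigned off, unsigned offset)
{
   return ((offset + (mul - off - 1)) & ~(mul - 1)) + off;
}
```

Both forms produce the next number that is `off` more than a multiple of `mul`, and they agree on every input, which supports the claim that the rewrite is purely cosmetic.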
Sounds good.

> > I'll read the whole thing again to see if there was anything else I wanted
> > to ask.

I have a few more questions.  Replying again to the patch directly as things
are getting too indented here.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev