On Tue, 2016-12-13 at 13:22 +0100, Samuel Iglesias Gonsálvez wrote: > On Sun, 2016-12-11 at 15:00 -0800, Matt Turner wrote: > > > > i965/vec4: handle 32 and 64 bit channels in liveness analysis > > > > Please indent the returned multiline expressions in > > var_from_reg() like we do elsewhere, so that the second line > > begins on the same column as the first line. > > > > */ goes on its own line. > > > > I'm having a hard time reviewing this one. The logic is rather > > complex. I'll ask someone to help me review it on Monday at the > > office. > > > OK, no problem!
I think for this one you want to ping Curro, he reviewed parts of it when Juan wrote it and he suggested changes so he is familiar with the solution. Iago > > > > i965/vec4: add a horiz_offset() helper > > i965: move the group field from fs_inst to backend_instruction. > > i965/vec4: add a SIMD lowering pass > > > > In the commit message, you say > > > > For now the pass only handles the gen7 restriction where > > any > > instruction that writes 2 registers also needs to read 2 > > registers. This affects double-precision instructions > > reading uniforms, for example. Later patches will extend > > the > > lowering pass adding a few more cases. > > > > But the rule about if-writing-two-regs, must-read-two-regs > > says that scalar sources are an exception: > > > > "When source is scalar, the source registers are not > > incremented." > > > > I don't see any code that allows us to avoid splitting an > > instruction if it's writing two registers but sourcing a scalar > > uniform. Maybe this doesn't apply because we have to use a non > > scalar swizzle (.xy) to access a single fp64 component? > > > Right, however, this is necessary in align16 because uniforms are > vectors (hstride != 0) so they are not scalars. > > > > > i965/vec4: make the generator set correct NibCtrl for SIMD4 DF > > instructions > > i965/vec4: dump NibCtrl for instructions with execsize != 8 > > i965/disasm: print NibCtrl for instructions with execsize < 8 > > i965/vec4: teach CSE about exec_size, group and doubles > > i965/vec4: teach cmod propagation about different execution sizes > > i965/vec4: split double-precision bcsel > > > > bcsel is the NIR opcode. I'd change references to bcsel to SEL. > > > > Very interesting find... > > > > i965/vec4: add a scalarization pass for double-precision > > instructions > > > > Don't indent case inside a switch. > > > > i965/vec4: translate 64-bit swizzles to 32-bit > > i965/vec4: implement access to DF source components Z/W > > > > Wow, bien hecho! > > > > i965/disasm: fix subreg for dst in Align16 mode > > i965/vec4: teach register coalescing about 64-bit > > i965/vec4: fix pack_uniform_registers for doubles > > i965/vec4: fix indentation in pack_uniform_registers > > i965/vec4: Skip swizzle to subnr in 3src instructions with DF > > operands > > > > s/need/needs/ in the comment. > > > > i965/vec4/nir: do not emit 64-bit MAD > > i965/vec4: do not emit 64-bit MAD > > > > I might change the name of this commit to "i965/vec4: Lower > > 64-bit MAD" or "i965/vec4: Lower DF MAD" > > > > I think I'd change the name of the function as well, maybe to > > lower_64bit_mad[_to_mul_add] or something. > > > OK, we will do the rename. > > > > > i965/vec4: support multiple dispatch widths and groups in the IR > > builder. > > i965/vec4: Add a shuffle_64bit_data helper > > > > I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 > > in > > r0.1:DF was a subreg offset. But I think it's actually the > > register offset (i.e., .offset)? > > > > If that's the case, I think it would be clearer just to > > increment the register number in the example: > > > > r0.0:DF x0 y0 z0 w0 > > r1.0:DF x1 y1 z1 w1 > > > > s/opperation/operation/ in the comment. > > > > On the multiline bld.group(...), I think Curro's style is to > > align with the '.'. For instance, > > > > inst = bld.group(4, for_write ? 1 : 0) > > .MOV(writemask(dst, WRITEMASK_ZW), > > swizzle(byte_offset(src, REG_SIZE), > > BRW_SWIZZLE_XYXY)); > > > > so that group and MOV align, with the '.' on the same line as > > the MOV. > > > Regarding the example, yes, you are right. We are going to fix it. > Thanks for the rest of suggestions, we will do them too :-) > > > > > > i965/vec4: Fix UBO loads for 64-bit data > > i965/vec4: Fix SSBO loads for 64-bit data > > i965/vec4: Fix SSBO stores for 64-bit data > > i965/vec4: don't constant propagate 64-bit immediates > > i965/vec4: prevent copy-propagation from values with a different > > type > > size > > i965/vec4: Prevent copy propagation from violating pre-gen8 > > restrictions > > > > Similar comment as before about being allowed to write two > > registers while sourcing a scalar. Maybe doesn't apply because > > of the double swizzle. > > > Same reply as to the other question. > > > > > i965/vec4: don't propagate single-precision uniforms into 4-wide > > instructions > > i965/vec4: don't copy propagate misaligned registers > > i965/vec4: extend the DWORD multiply DepCtrl restriction to all > > gen8 > > platforms > > > > I don't see this in the BDW PRMs, and the internal > > documentation > > says that it applies to "CHV, BXT" > > > > I suggest dropping this patch (or replacing it with one that > > adds || devinfo->is_broxton). > > > It is present in BDW PRMs, Volume 7: 3D-Media-GPGPU, "Special > Requirements for Handling Double Precision Data Types", > page 837: > > "When source or destination datatype is 64b or operation is integer > DWord multiply, DepCtrl must not be used." > > Regarding Broxton, there is no public PRMs here [0]. So if your > internal docs say it is needed, then we must fix this patch. > > [0] https://01.org/linuxgraphics/documentation/hardware-specification > -p > rms > > > > > > i965/vec4: Do not use DepCtrl with 64-bit instructions > > i965/vec4: do not split scratch read/write opcodes > > i965/vec4: fix scratch offset for 64bit data > > i965/vec4: fix scratch reads for 64bit data > > i965/vec4: fix scratch writes for 64bit data > > i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64- > > bit data > > i965/vec4: fix indentation in > > move_push_constants_to_pull_constants() > > i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit > > data > > i965/vec4: make emit_pull_constant_load support 64-bit loads > > i965/vec4: fix indentation in lower_attributes_to_hw_regs() > > i965/vec4: fix attribute setup for doubles > > i965/vec4: fix store output for 64-bit types > > i965/vec4/gs: fix input loading for 64bit data > > i965/vec4/tcs: fix input loading for 64-bit data > > i965/vec4/tcs: fix outputs for 64-bit data > > i965/vec4/tes: fix input loading for 64bit data types > > i965/vec4/tes: fix setup_payload() for 64bit data types > > i965/vec4/tes: consider register offsets during attribute setup > > i965/vec4: dump subnr for FIXED_GRF > > i965/vec4: split instructions that read 64-bit interleaved > > attributes > > > > I think you want to make stage_uses_interleaved_attributes > > static. > > > > Also don't indent case inside switch. > > > > i965/vec4/scalarize_df: do not scalarize swizzles that we can > > support > > natively > > i965/vec4/scalarize_df: support more swizzles via vstride=0 > > i965/vec4: prevent src/dst hazards during 64-bit register > > allocation > > i965/vec4: run scalarize_df() after spilling > > > > Calling scalarize_df() on code that's already been scalarized > > seems a little scary... > > > > Are you sure that scalarize_df() will not modify any code that > > has already been scalarized? > > > Yes, it is safe. The scalarization pass operates on logical > channels, > so for something like: > > mov r0.xy:df r1.zw:df > > it produces: > > mov r0.x:df r1.z:dfmov r0.y:df r1.w:df > > and if we run the pass again nothing will change for these > instructions > because they only operate on a single channel, so we end up with the > same thing. It would have been a different story if we also > translated > the logical channels into 32-bit channels during the scalarization > process, but that won't happen until right before codegen precisely > so > we can avoid this sort of problems (specifically we do it in > convert_to_hw_regs). > > > > > i965/vec4: support basic spilling of 64-bit registers > > i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit > > access > > i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination > > i965/vec4: adjust spilling costs for 64-bit registers. > > i965/vec4: enable ARB_gpu_shader_fp64 for Haswell > > i965/gen7: expose OpenGL 4.0 on Haswell > > > > There's been a lot of discussion about this patch that I > > haven't > > been involved in. Presumably given my review of the previous > > 100 > > patches, someone previously involved can be bothered to > > rereview > > this one... > OK, no problem. Thanks for reviewing all the other patches! We will > ping Curro about this one since he was the one who triggered the v3. > > Sam _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev