On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote: > On 10/11, Iago Toral Quiroga wrote: > > > > It's been some time since > ... anyone has reviewed your patches. Sorry. :( > > I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2 > branch. There have probably been some reorderings or other changes > due > to rebasing since the patches were sent, so I'm going to paste the > list > of patches below and then attempt to list any review comments after > the > patch name.
Hey Matt, thanks for reviewing this! I am on holidays this week but I'll go through all your comments starting Monday next week. Iago > > A couple of patches have an extra newline in the commit message > between > *-by: tags. Would be nice to make a pass through and fix that. > > > i965/nir: double/dvec2 uniforms only need to be padded to a single > vec4 slot > i965/vec4/nir: simplify glsl_type_for_nir_alu_type() > i965/vec4/nir: allocate two registers for dvec3/dvec4 > i965/vec4/nir: Add bit-size information to types > i965/vec4/nir: support doubles in ALU operations > i965/vec4/nir: set the right type for 64-bit registers > i965/vec4/nir: fix emitting 64-bit immediates > i965/vec4: add support for printing DF immediates > i965/vec4: add double/float conversion pseudo-opcodes > > I wonder if we should allow MOV F/DF and DF/F operations in the > IR and then have a lowering pass that "legalizes" them. I'm > happy to leave that experiment for after this series lands. > > i965/vec4: translate d2f/f2d > i965: add brw_vecn_grf() > i965/vec4: set correct register regions for 32-bit and 64-bit > i965/disasm: align16 DF source regions have a width of 2 > > It's actually kind of weird to print width and horizontal > stride > for align16 sources, since they don't exist in the instruction > word. We should probably print only the vertical stride. I > don't > care if that's fixed a part of this series. > > i965/vec4: We only support 32-bit integer ALU operations for now > i965/vec4: add dst_null_df() > i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes > i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes > > If I understand correctly, these opcodes map to instructions > like > > mov(XXX) dst<1>UD src<8,4,2>:UD > > Is the exec_size 4? I ask, because if it's 8 (and the source > region spans two registers and the dest region spans one) > that's > not a legal instruction. If it's 4, then it's legal. > > i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT > i965/vec4: don't copy propagate vector opcodes that operate in align1 > mode > i965/vec4: implement double unpacking > > This emits > > MOV dvec4_tmp, op[0] > PICK_LO/HI uvec4_tmp, dvec4_tmp > MOV dst, uvec4_tmp > > I'm confused about the purpose of the MOVs. It seems like op[0] > should already be a dvec and dst should already be a uvec. > > i965/vec4: implement double packing > > More or less the same thing here. Looks like we don't need all > of the MOVs. > > i965/vec4/nir: implement double comparisons > > Trivial: A newline before the if() would be nice. > > I have a memory of Curro telling me that the hardware maps each > 32-bit chunk in the dst to a single bit in the flag register. > Maybe that's only on IVB, and maybe I'm misremembering. I'm > concerned that while the PICK_LOW+MOV will properly handle the > result that is written to the destination, the result written > to > the flag register might be incorrect. > > My commit d9b09f8a30 fixed some problems that seems similar in > my mind. > > i965/vec4: fix indentation in get_nir_src() > i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations > i965/vec4: make opt_vector_float ignore doubles > i965/vec4: fix register allocation for 64-bit undef sources > i965/vec4: Rename DF to/from F generator opcodes > > I'm not sure replacing "float" with "single" implies that the > opcodes can handle other 32-bit (integer) types, since "single" > is actually the name of the "float" type in some other > programming languages. > > Maybe call them VEC4_OPCODE_TO_DOUBLE and > VEC4_OPCODE_FROM_DOUBLE? > > i965/vec4: add helpers for conversions to/from doubles > > Same thing here. > > Also, same confusion about the purpose of the MOVs. > > i965/vec4: implement hardware workaround for align16 double to float > conversion > > This always seemed like a really strange hardware bug, and one > that no one should ever hit. > > I'd prefer that, instead of loading an immediate double and > then > performing a conversion to float, that we just convert the > double to float in the compiler and emit an instruction to load > that. > > i965/vec4: implement d2i, d2u, i2d and u2d > i965/vec4: implement d2b > > Trivial: s/Curo/Curro/ in commit message. > > Trivial: The comment says "predicated MOV", but it's actually a > MOV with conditional_mod. > > i965/vec4: implement fsign() for doubles > > Trivial: v2 comment and comment in code say "predicated MOV" > like previous patch > > i965/vec4: fix optimize predicate for doubles > > I guess I'm pleasantly surprised that the any/all predicates do > the right thing in the presence of doubles... > > (Are you sure? :) > > i965/vec4: add a helper function to create double immediates > > Can leave for later: Shouldn't we use the DIM instruction (on > HSW)? > > I'm not sure if this should be fixed now or later, but > shouldn't > we use NibCtrl on these two instructions instead of > force_writemask_all? I think this is a case where NibCtrl is > useful. > > i965/vec4: use the new helper function to create double immediates > i965: move exec_size from fs_instruction to backend_instruction > i965/vec4: fix size_written for doubles > i965/vec4: fix regs_read() for doubles > i965/vec4: use the IR's execution size > i965/vec4: dump the instruction execution size > > > > ==== I made it to here today. ==== > > Anything with no or trivial feedback is > > Reviewed-by: Matt Turner <matts...@gmail.com> > > I will continue with these tomorrow: > > i965/vec4: handle 32 and 64 bit channels in liveness analysis > i965/vec4: add a horiz_offset() helper > i965: move the group field from fs_inst to backend_instruction. > i965/vec4: add a SIMD lowering pass > i965/vec4: make the generator set correct NibCtrl for SIMD4 DF > instructions > i965/vec4: dump NibCtrl for instructions with execsize != 8 > i965/disasm: print NibCtrl for instructions with execsize < 8 > i965/vec4: teach CSE about exec_size, group and doubles > i965/vec4: teach cmod propagation about different execution sizes > i965/vec4: split double-precision bcsel > i965/vec4: add a scalarization pass for double-precision instructions > i965/vec4: translate 64-bit swizzles to 32-bit > i965/vec4: implement access to DF source components Z/W > i965/disasm: fix subreg for dst in Align16 mode > i965/vec4: teach register coalescing about 64-bit > i965/vec4: fix pack_uniform_registers for doubles > i965/vec4: fix indentation in pack_uniform_registers > i965/vec4: Skip swizzle to subnr in 3src instructions with DF > operands > i965/vec4/nir: do not emit 64-bit MAD > i965/vec4: do not emit 64-bit MAD > i965/vec4: support multiple dispatch widths and groups in the IR > builder. > i965/vec4: Add a shuffle_64bit_data helper > i965/vec4: Fix UBO loads for 64-bit data > i965/vec4: Fix SSBO loads for 64-bit data > i965/vec4: Fix SSBO stores for 64-bit data > i965/vec4: don't constant propagate 64-bit immediates > i965/vec4: prevent copy-propagation from values with a different type > size > i965/vec4: Prevent copy propagation from violating pre-gen8 > restrictions > i965/vec4: don't propagate single-precision uniforms into 4-wide > instructions > i965/vec4: don't copy propagate misaligned registers > i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 > platforms > i965/vec4: Do not use DepCtrl with 64-bit instructions > i965/vec4: do not split scratch read/write opcodes > i965/vec4: fix scratch offset for 64bit data > i965/vec4: fix scratch reads for 64bit data > i965/vec4: fix scratch writes for 64bit data > i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64- > bit data > i965/vec4: fix indentation in move_push_constants_to_pull_constants() > i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit > data > i965/vec4: make emit_pull_constant_load support 64-bit loads > i965/vec4: fix indentation in lower_attributes_to_hw_regs() > i965/vec4: fix attribute setup for doubles > i965/vec4: fix store output for 64-bit types > i965/vec4/gs: fix input loading for 64bit data > i965/vec4/tcs: fix input loading for 64-bit data > i965/vec4/tcs: fix outputs for 64-bit data > i965/vec4/tes: fix input loading for 64bit data types > i965/vec4/tes: fix setup_payload() for 64bit data types > i965/vec4/tes: consider register offsets during attribute setup > i965/vec4: dump subnr for FIXED_GRF > i965/vec4: split instructions that read 64-bit interleaved attributes > i965/vec4/scalarize_df: do not scalarize swizzles that we can support > natively > i965/vec4/scalarize_df: support more swizzles via vstride=0 > i965/vec4: prevent src/dst hazards during 64-bit register allocation > i965/vec4: run scalarize_df() after spilling > i965/vec4: support basic spilling of 64-bit registers > i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit > access > i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination > i965/vec4: adjust spilling costs for 64-bit registers. > i965/vec4: enable ARB_gpu_shader_fp64 for Haswell > i965/gen7: expose OpenGL 4.0 on Haswell _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev