Neil Roberts <n...@linux.intel.com> writes: > Ok, that makes sense, thanks for the detailed explanation. I think > you're right, given the extra restrictions avoiding the integer dword > multiplication seems like the simplest solution. > > I can't say that I fully understand what the documentation is trying to > say here. Does it mean that the horizontal strides must be set such that > each element in the source is offset to the same qword within its > register as the corresponding element in the destination? >
The way I understand it one's required to: - Align the stride in bytes (i.e. horizontal stride times the size of the register type) between components of an individual source or destination to a multiple of 64b (if the register region is more than a single scalar that is). The multiple must be the same for all non-scalar sources and the destination. - Use the same sub-register offsets for all non-scalar sources and the destination. - Set vertical stride to horizontal stride times width. - Don't use indirect addressing or ARF registers. > I tried changing the test caseĀ¹ so that one of the sources comes from a > uniform. That would make the register for the uniform source be a scalar > and thus all of its elements would come from the first qword, whereas > for the destination the elements would be written to different qwords. > That ends up with an instruction like this: > > mul(16) g120<1>D g9<8,8,1>D g6<0,1,0>D { align1 > 1H compacted }; > Yeah, this should definitely violate the alignment rule on both the destination and first source. > However, the test still passes on my BXT so I've probably misunderstood > something. On the other hand this does look similar to the stride you > had in your example instruction that the simulator complained about. > > Given that the documentation and simulator implies this shouldn't work > I'm happy to leave this patch and I'm not suggesting we do anything now. > Honestly I'm also not 100% sure if this restriction really applies to all BXT hardware or if it's specific to some preproduction steppings -- The documentation seems to imply the former, but I guess it could be wrong. > Regards, > - Neil > > 1. > https://github.com/bpeel/piglit/blob/test-integer-multiply-uniform/tests/general/mult32.c > > Francisco Jerez <curroje...@riseup.net> writes: > >> Neil Roberts <n...@linux.intel.com> writes: >> >>> Are you sure this patch is necessary? The documentation for the multiply >>> instruction on BDW+ says: >>> >>> SourceType : *D >>> DestinationType : *D >>> Project : EXCLUDE(CHV) >>> >>> This to me implies that it should work on BXT because it doesn't say >>> EXCLUDE(BXT). >>> >> >> In fact both CHV and BXT *can* support 32x32 multiplication, that's not >> really the reason that motivated this patch -- The problem is that 32x32 >> multiplication has QWORD execution type which has several restrictions >> on CHV and BXT, the annoying one is: >> >> "CHV, BXT >> >> When source or destination datatype is 64b or operation is integer DWord >> multiply, regioning in Align1 must follow these rules: >> >> Source and Destination horizontal stride must be aligned to the same >> qword. >> >> Example: >> [...] >> // mul (4) r10.0<2>:d r11.0<8;4,2>:d r12.0<8;4,2>:d // Source and >> Destination stride must be 2 since the execution type is Qword." >> >> So it sounds like we could use the native 32x32 multiply with some >> additional effort to pass the arguments with the correct stride, but >> it's not clear to me whether the solution would be any more better than >> splitting up the multiply into 32x16 multiplies, so doing the same as in >> CHV and pre-Gen8 seemed like the most KISS solution for now. >> >>> I made a little test case and ran it on my BXT and it seems to work even >>> without this patch. I looked at the assembler source and it is >>> definitely doing a MUL instruction with D types for the dst and two >>> sources. >>> >>> https://github.com/bpeel/piglit/blob/test-integer-multiply/tests/general/mult32.c >>> >> Hmm, I guess the docs could be wrong, but I'm not sure what the >> consequences would be of violating the alignment rule, I guess the >> failure may be non-deterministic. What stepping of BXT did you test >> this on? >> >>> Regards, >>> - Neil >>> >>> Francisco Jerez <curroje...@riseup.net> writes: >>> >>>> AFAIK BXT has the same annoying alignment limitation as CHV on the >>>> source register regions of 32x32 bit MULs, give it the same treatment. >>>> --- >>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> index 244f299..fc9f007 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> @@ -3126,9 +3126,9 @@ fs_visitor::lower_integer_multiplication() >>>> bool progress = false; >>>> >>>> /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit operation >>>> - * directly, but Cherryview cannot. >>>> + * directly, but CHV/BXT cannot. >>>> */ >>>> - if (devinfo->gen >= 8 && !devinfo->is_cherryview) >>>> + if (devinfo->gen >= 8 && !devinfo->is_cherryview && >>>> !devinfo->is_broxton) >>>> return false; >>>> >>>> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { >>>> -- >>>> 2.4.6 >>>> >>>> _______________________________________________ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev