| Field | Value |
| --- | --- |
| Issue | 83626 |
| Summary | [DXIL] implement dot intrinsic lowering |
| Labels | new issue |
| Assignees | farzonl |
| Reporter | farzonl |

There are three parts.
First, to do float/half dot products we need to support three opcodes with varying argument counts:
- `@dx.op.dot2.f32(i32 54 ...)` - 4 arguments a[0], a[1], b[0], b[1]
- `@dx.op.dot3.f32(i32 55 ...)` - 6 arguments a[0], a[1], a[2], b[0], b[1], b[2]
- `@dx.op.dot4.f32(i32 56 ...)` - 8 arguments a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]
For each of these we will need 4 to 8 `extractelement` instructions before we call the intrinsic.
Part 1 is to create a pass that flattens the vectors to scalars in the form shown above.
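
To make the flattened form concrete, here is a minimal Python sketch that prints the IR text such a pass might produce for the float case. It is not the pass itself and does not use any LLVM API. The opcode numbers come from the list above; the function name `flatten_dot` and the value names `%a0`, `%b0`, `%dot` are hypothetical.

```python
# Opcode numbers are taken from the issue text; all names below are illustrative.
DOT_OPCODES = {2: 54, 3: 55, 4: 56}


def flatten_dot(n, a="%a", b="%b", ty="float", suffix="f32"):
    """Return IR-like text that scalarizes a dot product of two <n x ty> vectors."""
    if n not in DOT_OPCODES:
        raise ValueError(f"no dot opcode for vectors of length {n}")
    lines = []
    args = []
    # Extract every element of a first, then every element of b,
    # matching the argument order a[0..n-1], b[0..n-1] listed above.
    for vec, tag in ((a, "a"), (b, "b")):
        for i in range(n):
            name = f"%{tag}{i}"
            lines.append(f"{name} = extractelement <{n} x {ty}> {vec}, i32 {i}")
            args.append(f"{ty} {name}")
    opcode = DOT_OPCODES[n]
    lines.append(
        f"%dot = call {ty} @dx.op.dot{n}.{suffix}(i32 {opcode}, {', '.join(args)})"
    )
    return lines


if __name__ == "__main__":
    print("\n".join(flatten_dot(3)))
```

For a 3-element vector this yields six `extractelement` lines followed by a single call to `@dx.op.dot3.f32`.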
Part 2 is to modify DXIL.td to represent the DXIL ops.
For good references on behavior see:
- https://godbolt.org/z/TbvshPchs
- [lib/HLSL/HLOperationLower.cpp](https://github.com/microsoft/DirectXShaderCompiler/blob/main/lib/HLSL/HLOperationLower.cpp#L6646C5-L6651C37)
Part 3 is to support integer dot products.
We will need to create a `DxilTrinaryOperation`, which we don't currently have, and then support the lowering of `DXIL::OpCode::UMad` and `DXIL::OpCode::IMad`.
See:
- https://godbolt.org/z/srbzrhMbq
- [TranslateIDot](https://github.com/microsoft/DirectXShaderCompiler/blob/main/lib/HLSL/HLOperationLower.cpp#L2451C1-L2467C1)
The format for integer dot products is slightly different. You don't fetch all the vector elements up front; you fetch them in stages:
- extract a[i] and b[i]
- multiply a[i] and b[i]
- extract a[i+1] and b[i+1]
- perform an IMad on (a[i+1], b[i+1], a[i]*b[i]), e.g. `%IMad = call i32 @dx.op.tertiary.i32(i32 48, i32 %4, i32 %5, i32 %3)`

For each additional vector element, you daisy-chain another pair of extracts and an IMad that takes the previous result as its accumulator.
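
The staged pattern can be sketched the same way. The following Python snippet only generates IR-like text for the signed case and is an illustration under assumptions, not the DXC or LLVM implementation. The opcode 48 and the `@dx.op.tertiary.i32` callee are taken from the call quoted above; `lower_idot` and the value names `%mul`, `%mad1`, etc. are hypothetical.

```python
IMAD_OPCODE = 48  # from the IMad call quoted in the issue


def lower_idot(n, a="%a", b="%b"):
    """Return (lines, result) for a staged integer dot product of two <n x i32> vectors."""
    if n < 1:
        raise ValueError("vector length must be at least 1")
    lines = []

    def extract(vec, tag, i):
        # Emit one extractelement and return the name of the scalar it defines.
        name = f"%{tag}{i}"
        lines.append(f"{name} = extractelement <{n} x i32> {vec}, i32 {i}")
        return name

    # Stage 0: plain multiply of the first pair of elements.
    a0 = extract(a, "a", 0)
    b0 = extract(b, "b", 0)
    lines.append(f"%mul = mul i32 {a0}, {b0}")
    acc = "%mul"

    # Each later stage extracts one more pair and feeds the previous
    # result in as the third (accumulator) operand of IMad.
    for i in range(1, n):
        ai = extract(a, "a", i)
        bi = extract(b, "b", i)
        res = f"%mad{i}"
        lines.append(
            f"{res} = call i32 @dx.op.tertiary.i32("
            f"i32 {IMAD_OPCODE}, i32 {ai}, i32 {bi}, i32 {acc})"
        )
        acc = res
    return lines, acc


if __name__ == "__main__":
    text, result = lower_idot(3)
    print("\n".join(text))
    print("; result in", result)
```

An unsigned variant would presumably follow the same shape with the UMad opcode substituted; that opcode number is not given in this issue, so it is left out here.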