Issue: 129978
Summary: AMDGPU should form s_addk_i32 and s_mulk_i32 earlier
Labels: backend:AMDGPU
Assignees:
Reporter: arsenm

Currently AMDGPU forms these in SIShrinkInstructions, which runs twice. The first run sets a register hint that the input and the result should be the same register. The second, post-regalloc run performs the s_add_i32 -> s_addk_i32 / s_mul_i32 -> s_mulk_i32 transform if the source and destination registers are the same.
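As a rough illustration of the post-regalloc condition described above, here is a minimal, self-contained sketch. It is a toy model, not the SIShrinkInstructions code: the `ScalarAddImm` and `ScalarAddK` structs and the `tryShrinkToK` helper are hypothetical names made up for this example. It assumes the K forms take a sign-extended 16-bit immediate and use one register as both source and destination.

```cpp
#include <cstdint>

// Hypothetical toy types for illustration; these are not LLVM's MachineInstr.
struct ScalarAddImm { // models s_add_i32 Dst, Src, Imm
  unsigned DstReg;
  unsigned SrcReg;
  int32_t Imm;
};

struct ScalarAddK {   // models s_addk_i32 Reg, Imm16 (Reg is read and written)
  unsigned Reg;
  int16_t Imm16;
};

// True if Imm can be encoded as a sign-extended 16-bit immediate.
bool fitsSImm16(int32_t Imm) { return Imm >= INT16_MIN && Imm <= INT16_MAX; }

// Returns true and fills Out only when both conditions hold: the allocator
// assigned the same register to source and destination, and the immediate
// fits in 16 signed bits. Otherwise Out is left untouched.
bool tryShrinkToK(const ScalarAddImm &Add, ScalarAddK &Out) {
  if (Add.DstReg != Add.SrcReg)
    return false; // the register hint was not honored
  if (!fitsSImm16(Add.Imm))
    return false; // immediate too wide for the K encoding
  Out = ScalarAddK{Add.DstReg, static_cast<int16_t>(Add.Imm)};
  return true;
}
```

The first early return is the fragile part: whether it is taken depends entirely on what the allocator happened to choose.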
This kind of works, but could be better. When working on register allocator and copy-related patches, I regularly see regressions where the formation of the K forms is broken. Using tied operands to begin with is a much stronger hint to the allocator to prefer the tied form. This is how we handle FMA vs. fmac cases.
This can be done in 3 parts:
1. Handle s_addk_i32 -> s_add_i32 and s_mulk_i32 -> s_mul_i32 in convertToThreeAddress
2. Teach FoldImmediate to form the K forms from the non-K forms
3. Modify constant selection patterns to directly emit the K forms for valid immediates
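To make part 1 of the plan concrete, the following toy sketch shows what untying a K form back into a three-address form amounts to. It does not use the real convertToThreeAddress interface: the `TiedAddK` and `ThreeAddrAdd` structs and the `untieToThreeAddress`, `evalTied` and `evalThreeAddr` helpers are hypothetical and exist only for this example. It assumes the K form sign-extends its 16-bit immediate.

```cpp
#include <cstdint>

// Hypothetical toy types; not LLVM's MachineInstr or SIInstrInfo API.
struct TiedAddK {     // models s_addk_i32: Reg = Reg + sext(Imm16)
  unsigned Reg;
  int16_t Imm16;
};

struct ThreeAddrAdd { // models s_add_i32: Dst = Src + Imm
  unsigned Dst;
  unsigned Src;
  int32_t Imm;
};

// Untie the operands: keep the old register as the source, write to a
// separate destination, and widen the immediate by sign extension.
ThreeAddrAdd untieToThreeAddress(const TiedAddK &K, unsigned NewDst) {
  return ThreeAddrAdd{NewDst, K.Reg, static_cast<int32_t>(K.Imm16)};
}

// Reference semantics with 32-bit wraparound, used to check that both
// forms compute the same value.
uint32_t evalTied(const TiedAddK &K, uint32_t RegVal) {
  return RegVal + static_cast<uint32_t>(static_cast<int32_t>(K.Imm16));
}

uint32_t evalThreeAddr(const ThreeAddrAdd &A, uint32_t SrcVal) {
  return SrcVal + static_cast<uint32_t>(A.Imm);
}
```

With this direction available, starting from the tied form costs nothing when the tie cannot be satisfied cheaply, since the instruction can fall back to the non-K form instead of requiring an extra copy.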