On Fri, 3 Jan 2025 08:48:37 GMT, Emanuel Peter <epe...@openjdk.org> wrote:
>> That's right. Neoverse V2 is 4 pipes of 128 bits, V1 is 2 pipes of 256 bits.
>> That comment is "interesting". Maybe it should be tunable by the back end.
>> Given that Neoverse V2 can issue 4 SVE operations per clock cycle, it might
>> still be a win.
>>
>> Galder, how about you disable that line and give it another try?
>
> FYI: I'm working on removing the line
> [here](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/share/opto/superword.cpp#L1564-L1566).
>
> The issue is that on some platforms 2-element vectors are somehow really
> slower, and we need a cost-model to give us a better heuristic, rather than
> the hard "no". See my draft https://github.com/openjdk/jdk/pull/20964.
>
> But yes: why don't you remove the line, and see if that makes it work. If so,
> then don't worry about this case for now, and maybe leave a comment in the
> test. We can then fix that later.

Yeah, this check prevents reductions like this one from working on 128-bit registers:

    // Length 2 reductions of INT/LONG do not offer performance benefits
    if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {
      retValue = false;

I tried removing it today, but then the profitability checks fail. So I'm not going down that route for now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1908608309