On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>>
>>> 5305:   assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5306:   vmovmskpd(rtmp, mask, vec_enc);
>>> 5307:   shlq(rtmp, 5);
>>
>> Might this need to be 6? If I understand it right, you want a 64-bit stride, hence 2^6, right?
>> If that is correct, then this did not show up in your tests, and you need a regression test anyway.
>
> This computes the byte offset from the start of the table; both the integer and the long permute table have the same row size, 8 int elements vs. 4 long elements.

Ah, I understand now: both rows are 32 bytes (8 × 4 bytes and 4 × 8 bytes), hence the shift by 5. Maybe leave a comment for that? (A sketch of the arithmetic follows after the links below.)

>> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76:
>>
>>> 74:     longinCol = new long[size];
>>> 75:     longoutCol = new long[size];
>>> 76:     lpivot = size / 2;
>>
>> I'd be interested to see what happens if you move the "density" of accepted elements up or down. Would simple branch prediction be faster if the density is low enough, i.e. if we accept almost no elements?
>>
>> Though maybe that is not a compiler problem but a user problem?
>
> Included fuzzy filter micro with varying mask density.
>
> [image]

You are using `VectorMask<Integer> pred = VectorMask.fromLong(ispecies, maskctr++);`. That systematically iterates over all masks, which is nice for a correctness test. But it means the density varies within a single test run, right? And the average over the loop is still at `50%`, correct? I was thinking more of a run where the percentage over the whole loop stays below maybe `1%`. That would get us to the point where the branch prediction of the non-vectorized code might be faster, what do you think? (A sketch of such a low-density setup follows after the links below.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442670411
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442676633
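
For reference on the first comment, a minimal sketch of the offset arithmetic in plain Java rather than the HotSpot macro assembler; the class and method names are made up for illustration, not from the PR:

```java
// Sketch only: models the byte-offset computation done by `shlq(rtmp, 5)`.
public class PermuteTableOffset {
    // Both permute tables use 32-byte rows:
    //   int  table: 8 elements * 4 bytes = 32 bytes
    //   long table: 4 elements * 8 bytes = 32 bytes
    // so the byte offset of row `mask` is mask * 32 = mask << 5.
    // A shift by 6 (a 64-byte stride) would only be right if a row
    // held 8 long elements instead of 4.
    static long rowByteOffset(long mask) {
        return mask << 5;
    }

    public static void main(String[] args) {
        // mask 0b0101 selects row 5, which starts at byte 5 * 32 = 160
        System.out.println(rowByteOffset(0b0101)); // prints 160
    }
}
```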
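
And on the density question, a rough sketch of how a fixed-density variant of the micro could pre-generate its mask words; the names (`LowDensityMasks`, `maskWords`, `density`) are hypothetical and not from the benchmark in the PR. Running it needs `--add-modules jdk.incubator.vector`:

```java
import java.util.Random;

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Sketch: pre-generate mask bit patterns where each lane is set with a
// fixed probability (e.g. 0.01), instead of counting through all mask
// values, which averages out to 50% set lanes over the loop.
public class LowDensityMasks {
    static final VectorSpecies<Integer> ISPECIES = IntVector.SPECIES_PREFERRED;

    static long[] maskWords(int count, double density, long seed) {
        Random rnd = new Random(seed); // seeded, so runs are reproducible
        long[] words = new long[count];
        for (int i = 0; i < count; i++) {
            long w = 0;
            for (int lane = 0; lane < ISPECIES.length(); lane++) {
                if (rnd.nextDouble() < density) {
                    w |= 1L << lane; // set this lane with probability `density`
                }
            }
            words[i] = w;
        }
        return words;
    }

    public static void main(String[] args) {
        long[] words = maskWords(1024, 0.01, 42L); // ~1% of lanes accepted
        // In the benchmark loop one would then build the predicate from a
        // pre-generated word instead of an incrementing counter:
        VectorMask<Integer> pred = VectorMask.fromLong(ISPECIES, words[0]);
        System.out.println(pred.trueCount() + " lanes set in the first mask");
    }
}
```

With the density pinned near 1% over the whole run, the branchy scalar loop and the vectorized filter could be compared at exactly the operating point where branch prediction should do best.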