>> To fix it, is it necessary to support 'vec_unpack'?
>
> With the same number of units on both sides this would be a sext, not
> vec_unpacks_{lo,hi} - the vectorizer ties its hands by choosing vector
> types early, and based on the number of incoming/outgoing vectors it
> chooses one or the other method.
>
> More precise dumping w
> it's target dependent what we choose first so it's going to be
> a bit difficult to adjust testcases like this (and it looks like
> a testsuite issue). I think for this specific testcase changing
> scan-tree-dump-times to scan-tree-dump is reasonable. Note we
> really want to check that for the
>> I am wondering whether there are situations where the
>> vec_pack/vec_unpack/vec_widen_xxx/dot_prod patterns can be
>> beneficial for RVV? I once met a situation where vec_unpack
>> was beneficial when working on SELECT_VL, but I don't remember which
>> case
>
> With fixed size vectors y
> the dump-scans. Can we do something like
> "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus
> after the dot-prod pattern dumping allow arbitrary stuff but _not_
> a "failed" and then require a "succeeded"?
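For illustration only, here is how the idea quoted above (allow arbitrary
text after the "detected" line, but no failure marker before the success
marker) can be written with a negative lookahead. This is a Python sketch,
not the Tcl pattern used in the testsuite. FAILED and SUCCEEDED are the
placeholders from the suggestion above, not the vectorizer's actual dump
wording, and the sample dump strings are made up.

```python
import re

# "Tempered" pattern: after the "detected" text, consume one character at a
# time (newlines included, via DOTALL) as long as the failure marker does not
# start at that position, then require the success marker.
# FAILED / SUCCEEDED are placeholders, assumed here for illustration.
DOT_PROD_OK = re.compile(
    r"vect_recog_dot_prod_pattern: detected"
    r"(?:(?!FAILED).)*"
    r"SUCCEEDED",
    re.DOTALL,
)


def dot_prod_vectorized(dump: str) -> bool:
    """Return True if a detection is followed by SUCCEEDED with no FAILED between."""
    return DOT_PROD_OK.search(dump) is not None


# Hypothetical dump excerpts, invented for this sketch.
good_dump = (
    "vect_recog_dot_prod_pattern: detected\n"
    "some other analysis output\n"
    "SUCCEEDED\n"
)
bad_dump = (
    "vect_recog_dot_prod_pattern: detected\n"
    "analysis FAILED\n"
    "retrying with another mode\n"
    "SUCCEEDED\n"
)

print(dot_prod_vectorized(good_dump))
print(dot_prod_vectorized(bad_dump))
```

Because the match is anchored at a "detected" occurrence, a dump in which a
first attempt fails and a later attempt is detected again and succeeds would
still match, which seems to fit the target-dependent ordering mentioned above.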
It took some fighting with Tcl syntax until I arrived at the regex
pattern be
> Hi,
>
> I think gcc is relying on undefined behaviour with the vcompress instruction.
> Unfortunately my test case isn't reproducing on mainline, but gcc looks to
> use the fields between the last mask selected field and vl while setting
> tail agnostic.
>
> This thread explains how vcompress is
I am revisiting an effort to make the number of lanes for vector segment
load/store a tunable parameter.
A year ago, Robin added minimal and not-yet-tunable
common_vector_cost::segment_permute_[2-8]
But it is tunable, just not a param? :) We have our own cost structure in our
downstream repo,
You won't see failures in the testsuite. The failures only show up when I
attempt to impose huge costs on NF above threshold. A quick & dirty way to
expose the bug is to apply the appended patch, then observe that you get
output from this only for mask_struct_store-*.c and not for
mask_struct_load-*.
There are two levels of dysfunction here:
1. Why spill & fill through the stack? Why not extract scalars from vregs
   directly into scalar regs?
2. Why involve scalar registers at all? Why not vslide or even vrgather,
   using temporary vregs as necessary?
That's how expmed does