Hi,
2014-08-23 17:48 GMT+02:00 Mickaël Raulet:
> For avx2 I have some to push to the trunk, I did merge it yesterday with
> all recent changes. But I don't remember what those tables look like.
Well, my point was hypothetical, but I guess this means some conflicts
are to be expected when either
For avx2 I have some to push to the trunk, I did merge it yesterday with
all recent changes. But I don't remember what those tables look like.
For 10 and 12 bits, ssse3 should slow down decoding, since it uses 4 more
instructions in the loop.
On Saturday, 23 August 2014, Christophe Gisquet wrote:
On 23/08/14 12:20 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-23 17:16 GMT+02:00 James Almer:
>>> What do you mean by duplicated? That the tables for 10 and 12 are?
> [...]
>> I was talking about the opt suffix since both the ssse3 and sse4 tables will
>> be the same.
>
> Oh ok, in case we have
Hi,
2014-08-23 17:16 GMT+02:00 James Almer:
>> What do you mean by duplicated? That the tables for 10 and 12 are?
[...]
> I was talking about the opt suffix since both the ssse3 and sse4 tables will
> be the same.
Oh ok, in case we have to instantiate sse4 versions. Because at the
moment there are o
On 23/08/14 12:11 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-23 16:52 GMT+02:00 James Almer:
>>> -QPEL_TABLE 8, 8, b, sse4
>>> -QPEL_TABLE 10, 4, w, sse4
>>> -QPEL_TABLE 12, 4, w, sse4
>>> +QPEL_TABLE 8, 8, b, ssse3
>>> +QPEL_TABLE 10, 4, w, ssse3
>>> +QPEL_TABLE 12, 4, w, ssse3
>>
>> Do t
Hi,
2014-08-23 16:52 GMT+02:00 James Almer:
>> -QPEL_TABLE 8, 8, b, sse4
>> -QPEL_TABLE 10, 4, w, sse4
>> -QPEL_TABLE 12, 4, w, sse4
>> +QPEL_TABLE 8, 8, b, ssse3
>> +QPEL_TABLE 10, 4, w, ssse3
>> +QPEL_TABLE 12, 4, w, ssse3
>
> Do these need to be duplicated? You could just remove the suffix a
On 23/08/14 10:22 AM, Christophe Gisquet wrote:
> The only sse4 instruction is pextrw, which is used on rather minor
> functions for small blocks. Therefore use whichever GPR is available
> to extract the output word.
>
> Before (sse4), for block_w == 6:
> 4627 decicycles in epel_uni, 16377 runs,