On Tue, Aug 28, 2012 at 12:17:43PM +0300, Jussi Kivilinna wrote:
> With this patch twofish-avx is faster than twofish-3way for 256, 1k
> and 8k tests.
>
> size  old-vs-new       new-vs-3way      old-vs-3way
>       ecb-enc ecb-dec  ecb-enc ecb-dec  ecb-enc ecb-dec
> 256   1.10x   1.11x    1.01x
Quoting Borislav Petkov:
On Wed, Aug 22, 2012 at 10:20:03PM +0300, Jussi Kivilinna wrote:
Actually it does look better, at least for encryption. Decryption had a
different ordering for the test, which appears to be as bad on Bulldozer
as it is on Sandy Bridge.
So, yet another patch then :)
Here yo
On Wed, Aug 22, 2012 at 10:20:03PM +0300, Jussi Kivilinna wrote:
> Actually it does look better, at least for encryption. Decryption had a
> different ordering for the test, which appears to be as bad on Bulldozer
> as it is on Sandy Bridge.
>
> So, yet another patch then :)
Here you go:
[ 153.736745
Quoting Jason Garrett-Glaser:
On Wed, Aug 22, 2012 at 12:20 PM, Jussi Kivilinna wrote:
Quoting Borislav Petkov:
On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
Looks like encryption lost ~0.4% while decryption gained ~1.8%.
For 256 byte test, it's still slightly slower t
On Wed, Aug 22, 2012 at 12:20 PM, Jussi Kivilinna wrote:
> Quoting Borislav Petkov:
>
>> On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
>>> Looks like encryption lost ~0.4% while decryption gained ~1.8%.
>>>
>>> For the 256-byte test, it's still slightly slower than twofish-3way
>>>
Quoting Borislav Petkov:
> On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
>> Looks like encryption lost ~0.4% while decryption gained ~1.8%.
>>
>> For the 256-byte test, it's still slightly slower than twofish-3way
>> (~3%). For 1k and 8k tests, it's ~5% faster.
>>
>> Here's very
On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
> Looks like encryption lost ~0.4% while decryption gained ~1.8%.
>
> For the 256-byte test, it's still slightly slower than twofish-3way (~3%). For 1k
> and 8k tests, it's ~5% faster.
>
> Here's the very last test patch, testing different
Quoting Borislav Petkov:
>
> Here you go:
>
> [ 52.282208]
> [ 52.282208] testing speed of async ecb(twofish) encryption
Thanks!
Looks like encryption lost ~0.4% while decryption gained ~1.8%.
For the 256-byte test, it's still slightly slower than twofish-3way (~3%). For 1k
and 8k tests, it'
On Fri, Aug 17, 2012 at 10:37:10AM +0300, Jussi Kivilinna wrote:
> I made a few further changes, mainly moving/interleaving 'vmovq/vpextrq'
> ahead so they should be completed before those target registers are
> needed. This only gave a 0.5% increase on Sandy Bridge, but might help
> more on Bulldozer.
Quoting Borislav Petkov:
>
> Yep, looks better than the previous run and also a bit better or on par
> with the initial run I did.
>
I made a few further changes, mainly moving/interleaving 'vmovq/vpextrq' ahead
so they should be completed before those target registers are needed. This
only gave 0
Quoting Borislav Petkov:
On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
About ~5% slower, probably because I was tuning for Sandy Bridge and
introduced more FPU<=>CPU register moves.
Here's a new version of the patch, with the FPU<=>CPU moves from the original
implementation.
(Note: also
On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
> About ~5% slower, probably because I was tuning for Sandy Bridge and
> introduced more FPU<=>CPU register moves.
>
> Here's a new version of the patch, with the FPU<=>CPU moves from the original
> implementation.
>
> (Note: also changes encryptio
Quoting Borislav Petkov:
> On Wed, Aug 15, 2012 at 05:22:03PM +0300, Jussi Kivilinna wrote:
>
>> Patch replaces 'movb' instructions with 'movzbl' to break false
>> register dependencies and interleaves instructions better for
>> out-of-order scheduling.
>>
>> Also move common round code to separa
On Wed, Aug 15, 2012 at 05:22:03PM +0300, Jussi Kivilinna wrote:
> Patch replaces 'movb' instructions with 'movzbl' to break false
> register dependencies and interleaves instructions better for
> out-of-order scheduling.
>
> Also move common round code to a separate function to reduce object
> size
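A Twofish round looks up each byte of a 32-bit word in a key-dependent table and XORs the four results. The C sketch below shows that lookup pattern and where the movb/movzbl distinction comes in. It is illustrative only and is not the patch's assembly. The `g_tables` struct and `g_lookup()` are hypothetical names, and the tables are left for the caller to fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for four 256-entry key-dependent tables.
 * A real Twofish key schedule would fill these from the key. */
typedef struct {
    uint32_t s[4][256];
} g_tables;

/* Look up each byte of x in its own table and XOR the four results.
 * Each byte is isolated with a shift plus a cast to uint8_t and then
 * widened back to uint32_t. That is a zero-extending byte move, which
 * is what 'movzbl' does: it writes the whole destination register, so
 * it does not depend on the register's previous contents. */
static uint32_t g_lookup(const g_tables *t, uint32_t x)
{
    assert(t != NULL);

    uint32_t b0 = (uint8_t)x;
    uint32_t b1 = (uint8_t)(x >> 8);
    uint32_t b2 = (uint8_t)(x >> 16);
    uint32_t b3 = (uint8_t)(x >> 24);

    return t->s[0][b0] ^ t->s[1][b1] ^ t->s[2][b2] ^ t->s[3][b3];
}
```

A plain 'movb' into a byte register such as %al merges into the existing register value. That creates a false dependency on whatever last wrote the full register. x86-64 compilers typically emit movzbl/movzx for byte extractions like the ones above.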
> On Wed, Aug 15, 2012 at 04:48:54PM +0300, Jussi Kivilinna wrote:
> > I posted a patch that optimizes twofish-avx a few weeks ago:
> > http://marc.info/?l=linux-crypto-vger&m=134364845024825&w=2
> >
> > I'd be interested to know whether this patch helps on Bulldozer.
>
> Sure, can you inline it here to
On Wed, Aug 15, 2012 at 04:48:54PM +0300, Jussi Kivilinna wrote:
> I posted a patch that optimizes twofish-avx a few weeks ago:
> http://marc.info/?l=linux-crypto-vger&m=134364845024825&w=2
>
> I'd be interested to know whether this patch helps on Bulldozer.
Sure, can you inline it here too please. The
Quoting Borislav Petkov:
Ok, here we go. Raw data below.
Thanks a lot!
Twofish-avx appears somewhat slower than 3way, ~9% slower with 256-byte
blocks to ~3% slower with 8 KB blocks.
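The percentages in this thread are relative speed comparisons between two implementations at the same block size. The C sketch below shows that arithmetic. It assumes both inputs are throughputs, for example bytes per second. If you start from cycle counts instead, the ratio inverts. The function names are hypothetical, and the thread does not say exactly which formula produced the quoted figures.

```c
#include <assert.h>

/* Ratio of two throughputs measured for the same block size.
 * A result above 1.0 means 'candidate' is faster than 'baseline'.
 * The "1.10x"-style figures in the thread read like ratios of this
 * kind, but that is an assumption. */
static double speedup(double candidate_tput, double baseline_tput)
{
    assert(baseline_tput > 0.0);
    return candidate_tput / baseline_tput;
}

/* The same comparison as a signed percentage: negative means the
 * candidate is slower than the baseline. */
static double percent_change(double candidate_tput, double baseline_tput)
{
    return (speedup(candidate_tput, baseline_tput) - 1.0) * 100.0;
}
```

For example, take hypothetical throughputs of 91 for twofish-avx and 100 for twofish-3way. Then `percent_change(91.0, 100.0)` is about -9, which would be reported as "~9% slower".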
Let me know if you need more tests.
I posted a patch that optimizes twofish-avx a few weeks ago:
http:
Ok, here we go. Raw data below.
On Wed, Aug 15, 2012 at 02:00:16PM +0300, Jussi Kivilinna wrote:
> >And if you tell me exactly how to run the tests and on what kernel,
> >I'll try to do so.
Ok, the box is a single-socket Bulldozer: "AMD FX(tm)-8100 Eight-Core
Processor stepping 02"; kernel is 3.6
Quoting Borislav Petkov:
On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
I started thinking about the performance on AMD Bulldozer.
vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
on AMD CPUs is a lot slower (latencies from 8 to 12 cycles) than on
Intel san
On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
> I started thinking about the performance on AMD Bulldozer.
> vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
> on AMD CPUs is a lot slower (latencies from 8 to 12 cycles) than on
> Intel sandy-bridge (where instr
Quoting Johannes Goetzfried:
This patch adds an x86_64/AVX assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (as two 4-block
chunks handled with AVX operations). The table lookups are done in
general-purpose registers.
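Processing eight blocks at a time needs some dispatch logic when a request is not a multiple of eight blocks. The C sketch below only counts how a request would be split, under these hypothetical paths:

- an 8-block AVX routine;
- a 3-block routine standing in for twofish-3way;
- a single-block fallback.

The names and the greedy order are illustrative and are not taken from the patch's glue code. The only fixed fact used is Twofish's 16-byte block size.

```c
#include <assert.h>
#include <stddef.h>

#define TF_BLOCK_SIZE 16u /* Twofish block size in bytes */

/* Hypothetical dispatch plan: number of calls each code path would get. */
struct tf_plan {
    size_t avx8; /* calls to an 8-block (2 x 4-block AVX) routine */
    size_t way3; /* calls to a 3-block routine */
    size_t way1; /* calls to a single-block routine */
};

/* Greedy split: as many 8-block batches as possible, then 3-block
 * batches, then single blocks. A trailing partial block is ignored. */
static struct tf_plan tf_plan_for(size_t nbytes)
{
    size_t total = nbytes / TF_BLOCK_SIZE;
    size_t left = total;
    struct tf_plan p;

    p.avx8 = left / 8;
    left %= 8;
    p.way3 = left / 3;
    left %= 3;
    p.way1 = left;

    /* Every full block is assigned to exactly one path. */
    assert(p.avx8 * 8 + p.way3 * 3 + p.way1 == total);
    return p;
}
```

For example, a 208-byte (13-block) request splits into one 8-block call, one 3-block call and two single-block calls. The 256-byte and 8 KB test sizes map entirely onto the 8-block path, with 2 and 64 calls respectively.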
For small blocksizes the 3way-p