Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Richard Henderson
On 08/29/2017 05:36 PM, Michael Clark wrote:
> We’re investigating an issue with redundant sign-extension instructions being 
> emitted with the riscv backend. Firstly I would like to state that riscv is 
> possibly a unique backend with respect to its canonical sign-extended 
> register form due to the following:
> 
> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit 
> operations on RV64 is sign-extended not zero-extended.
> 
> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to 
> signed DI mode values holding SI mode subregs
> 
> - RISC-V does not have register aliases for these different modes, rather 
> different instructions are selected to operate on different register widths.

This is identical to Alpha.

> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc.
> 
> - Some instructions such as logical ops only have full word width variants 
> (AND, OR, XOR) but these instructions maintain canonical form as there is no 
> horizontal movement of bits.

Alpha only had 32-bit instructions for add/sub/mul that also sign-extend,
so there is a little less scope for optimization.
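The canonical-form claims in the quoted list can be checked with a small C model of RV64 register values. This is a sketch under simplifying assumptions, not backend code. `sext_w`, `addw`, `is_canonical` and `check_canonical_ops` are hypothetical helpers, and the model only covers the three properties at issue here: full-width AND/OR/XOR preserve the sign-extended form, a full-width add can break it, and a W-form add restores it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Model of RV64 sext.w: sign-extend the low 32 bits of a 64-bit register.
   Uses int64_t arithmetic to avoid implementation-defined conversions. */
static int64_t sext_w(int64_t x)
{
    int64_t lo = (int64_t)((uint64_t)x & 0xffffffffu);
    return lo >= 0x80000000LL ? lo - 0x100000000LL : lo;
}

/* A register holds a canonical SImode value if sext.w would not change it. */
static int is_canonical(int64_t x)
{
    return x == sext_w(x);
}

/* Model of RV64 addw: add the low 32 bits, then sign-extend the result. */
static int64_t addw(int64_t a, int64_t b)
{
    return sext_w((int64_t)(((uint64_t)a + (uint64_t)b) & 0xffffffffu));
}

/* Full-width AND/OR/XOR keep canonical inputs canonical: bits 63..32 of each
   input copy bit 31, so the result's upper bits copy the result's bit 31.
   A full-width add can break this through a carry into bit 31. */
static void check_canonical_ops(void)
{
    const int64_t v[] = { 0, 1, -1, 0x7fffffff, -0x80000000LL, 0x12345678, -0x1234567 };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int i = 0; i < n; i++) {
        assert(is_canonical(v[i]));
        for (int j = 0; j < n; j++) {
            assert(is_canonical(v[i] & v[j]));
            assert(is_canonical(v[i] | v[j]));
            assert(is_canonical(v[i] ^ v[j]));
            assert(is_canonical(addw(v[i], v[j])));
        }
    }
    /* 0x7fffffff + 1 as a 64-bit add is 0x80000000: bit 31 set, bits 63..32 clear. */
    assert(!is_canonical(v[3] + 1));
}
```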

> testcase 1:
> 
> $ cat bswap.c
> unsigned bswap(unsigned p) {
>   return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) |
>          ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
> }
> $ cat bswap.s
> bswap2:
>   sllw a3,a0,24
>   srlw a5,a0,24
>   sllw a4,a0,8
>   or  a5,a5,a3
>   li  a3,16711680
>   and a4,a4,a3
>   or  a5,a5,a4
>   li  a4,65536
>   add a4,a4,-256
>   srlw a0,a0,8
>   and a0,a0,a4
>   or  a0,a5,a0
>   sext.w  a0,a0   # redundant
>   ret

I think the easiest solution to this is for combine to notice when IOR has
operands with non-zero-bits that do not overlap, convert the operation to ADD.
That allows the final two insns to fold to "addw" and the compiler need do no
further analysis.

That may even allow better code generation in other ways, such as a
three-operand LEA insn on two-operand architectures (x86, s390x).  Or for a
slightly different test case, perhaps further folding to a shift-add insn.
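The identity behind this suggestion: when two operands have no possibly-nonzero bits in common, no bit position can generate a carry, so IOR and ADD produce the same value. A minimal C sketch using the byte swap from testcase 1, with the masks 0xff, 0xff00, 0xff0000 and 0xff000000 that the quoted assembly implies. `bswap_ior`, `bswap_add`, `ior_matches_add` and `check_bswap_forms` are hypothetical names, not GCC internals:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Byte swap written with IOR, as in testcase 1. The four terms occupy
   the disjoint byte lanes 0x000000ff, 0x0000ff00, 0x00ff0000, 0xff000000. */
static uint32_t bswap_ior(uint32_t p)
{
    return ((p >> 24) & 0x000000ffu) | ((p << 8) & 0x00ff0000u) |
           ((p >> 8) & 0x0000ff00u) | ((p << 24) & 0xff000000u);
}

/* The same expression with ADD. Because no two terms share a set bit,
   no carries occur and the sum equals the IOR. */
static uint32_t bswap_add(uint32_t p)
{
    return ((p >> 24) & 0x000000ffu) + ((p << 8) & 0x00ff0000u) +
           ((p >> 8) & 0x0000ff00u) + ((p << 24) & 0xff000000u);
}

/* Whether a | b equals a + b; guaranteed when (a & b) == 0. */
static int ior_matches_add(uint32_t a, uint32_t b)
{
    return (a | b) == a + b;
}

static void check_bswap_forms(void)
{
    const uint32_t v[] = { 0u, 1u, 0x11223344u, 0xdeadbeefu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++) {
        assert(bswap_ior(v[i]) == bswap_add(v[i]));
        assert(bswap_ior(bswap_ior(v[i])) == v[i]);
    }
    assert(ior_matches_add(0xf0u, 0x0fu));  /* disjoint bits: equal */
    assert(!ior_matches_add(0x03u, 0x01u)); /* shared bit 0: 3 vs 4 */
}
```

In the ADD form the last operation is a 32-bit addition, which on RV64 can be emitted as `addw`; since `addw` already sign-extends its result, no separate `sext.w` would be needed.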

> $ cat rshift.c
> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
> $ cat rshift.s
> rs24:
>   sllw a0,a0,24
>   sext.w  a0,a0   # redundant
>   ret

That seems like a missing check somewhere (combine? simplify-rtx? both?) for
SUBREG_PROMOTED_SIGNED_P.  Since Alpha didn't have a 32-bit shift you're in new
territory for this one.


r~


Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Jeff Law
On 09/06/2017 10:43 AM, Richard Henderson wrote:
> On 08/29/2017 05:36 PM, Michael Clark wrote:
>> We’re investigating an issue with redundant sign-extension instructions 
>> being emitted with the riscv backend. Firstly I would like to state that 
>> riscv is possibly a unique backend with respect to its canonical 
>> sign-extended register form due to the following:
>>
>> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit 
>> operations on RV64 is sign-extended not zero-extended.
>>
>> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to 
>> signed DI mode values holding SI mode subregs
>>
>> - RISC-V does not have register aliases for these different modes, rather 
>> different instructions are selected to operate on different register widths.
> 
> This is identical to Alpha.
> 
>> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, 
>> etc.
>>
>> - Some instructions such as logical ops only have full word width variants 
>> (AND, OR, XOR) but these instructions maintain canonical form as there is no 
>> horizontal movement of bits.
> 
> Alpha only had 32-bit instructions for add/sub/mul that also sign-extend,
> so there is a little less scope for optimization.
> 
>> testcase 1:
>>
>> $ cat bswap.c
>> unsigned bswap(unsigned p) {
>>   return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) |
>>          ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
>> }
>> $ cat bswap.s
>> bswap2:
>>  sllw a3,a0,24
>>  srlw a5,a0,24
>>  sllw a4,a0,8
>>  or  a5,a5,a3
>>  li  a3,16711680
>>  and a4,a4,a3
>>  or  a5,a5,a4
>>  li  a4,65536
>>  add a4,a4,-256
>>  srlw a0,a0,8
>>  and a0,a0,a4
>>  or  a0,a5,a0
>>  sext.w  a0,a0   # redundant
>>  ret
> 
> I think the easiest solution to this is for combine to notice when IOR has
> operands with non-zero-bits that do not overlap, convert the operation to ADD.
> That allows the final two insns to fold to "addw" and the compiler need do no
> further analysis.
I thought we had combine support for that.  I don't think it's ever been
particularly good though.  With your alpha background you're probably
more familiar with it than anyone -- IIRC it fell out of removal of low
order bit masking to deal with alignment issues on the Alpha.

I wrote some match.pd patterns to do it in gimple as part of a larger
problem.  The work as a whole on that larger problem ultimately didn't
pan out (generated even worse code than what we have on the trunk).  But
it might be possible to resurrect those patterns and see if they are
useful independently.  My recollection was I was looking at low order
bits only, but the concepts were general enough.



> 
>> $ cat rshift.c
>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>> $ cat rshift.s
>> rs24:
>>  sllw a0,a0,24
>>  sext.w  a0,a0   # redundant
>>  ret
> 
> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
> SUBREG_PROMOTED_SIGNED_P.  Since Alpha didn't have a 32-bit shift you're in new
> territory for this one.
Yea.  I'd also expect zero/nonzero bits tracking in combine to catch
this.  Shouldn't the sign bit be known to be zero after the shift which
makes the extension redundant regardless of the SUBREG_PROMOTED flag?

jeff
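Jeff's nonzero-bits argument can be made concrete with a toy model. For a logical right shift by a constant, the bits that may be nonzero in the result are the input's may-be-nonzero mask shifted by the same amount, so after `rs1 >> 24` bit 31 is known to be zero and a sign extension cannot change the value. The sketch is only loosely analogous to combine's nonzero-bits tracking; `nonzero_after_lshiftrt`, `sext_is_redundant`, `rs24_with_sext` and `check_rs24` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Bits that may be nonzero in (x >> shift) for a 32-bit logical shift,
   given the bits that may be nonzero in x. */
static uint32_t nonzero_after_lshiftrt(uint32_t nonzero_in, unsigned shift)
{
    return shift >= 32 ? 0u : nonzero_in >> shift;
}

/* Sign-extending a 32-bit value to 64 bits gives the same result as
   zero-extending it whenever bit 31 is known to be zero. */
static int sext_is_redundant(uint32_t nonzero_mask)
{
    return (nonzero_mask & 0x80000000u) == 0;
}

/* The rs24 testcase as emitted: 32-bit logical shift, then sext.w. */
static int64_t rs24_with_sext(uint32_t rs1)
{
    int64_t r = (int64_t)(rs1 >> 24);
    return r >= 0x80000000LL ? r - 0x100000000LL : r;
}

static void check_rs24(void)
{
    uint32_t m = nonzero_after_lshiftrt(0xffffffffu, 24);
    assert(m == 0xffu && sext_is_redundant(m));
    /* The sext.w never changes the zero-extended shift result. */
    for (uint64_t x = 0; x <= 0xffffffffu; x += 0x01000001u)
        assert(rs24_with_sext((uint32_t)x) == (int64_t)((uint32_t)x >> 24));
}
```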


Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Richard Henderson
On 08/30/2017 02:43 AM, Michael Clark wrote:
> POINTERS_EXTEND_UNSIGNED -1 (which is true) is defined on some targets. I 
> assume they sign-extend but the meaning has been overloaded.

Just for your edification, this is for e.g. ia64's "addp4" instruction and it
is not a normal extension.  A 2-bit segment tag of a 32-bit pointer is remapped
to a 3-bit segment tag of a 64-bit pointer and also zeros bits {60:32}.

If you care,

  http://refspecs.linuxbase.org/IA64-softdevman-vol3

has a nice picture that better describes this.


r~
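To contrast this with an ordinary extension, here is a sketch of the remapping Richard describes. The exact bit placement is an assumption based on our reading of the Itanium `addp4` description and should be checked against the manual: the low 32 bits are kept, the 2-bit tag in bits 31:30 is copied into bits 62:61 (the lower two bits of the 3-bit field 63:61), and bit 63 and bits 60:32 are zero. `ia64_ptr_extend` and `check_not_plain_extension` are hypothetical names, and the addend that `addp4` also takes is left out:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Sketch of an ia64-style 32-to-64-bit pointer extension (assumed semantics):
   keep bits 31:0, copy the 2-bit segment tag in bits 31:30 into bits 62:61,
   and leave bit 63 and bits 60:32 zero. */
static uint64_t ia64_ptr_extend(uint32_t p)
{
    uint64_t tag = (p >> 30) & 3u;
    return (uint64_t)p | (tag << 61);
}

/* For a pointer with bit 31 set, this differs from both zero- and sign-extension. */
static void check_not_plain_extension(void)
{
    uint32_t p = 0xc0001234u;
    uint64_t zext = (uint64_t)p;
    uint64_t sext = 0xffffffff00000000ull | p;
    assert(ia64_ptr_extend(p) != zext);
    assert(ia64_ptr_extend(p) != sext);
}
```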


Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Richard Henderson
On 09/06/2017 09:53 AM, Jeff Law wrote:
>> I think the easiest solution to this is for combine to notice when IOR has
>> operands with non-zero-bits that do not overlap, convert the operation to ADD.
>> That allows the final two insns to fold to "addw" and the compiler need do no
>> further analysis.
> I thought we had combine support for that.  I don't think it's ever been
> particularly good though.  With your alpha background you're probably
> more familiar with it than anyone -- IIRC it fell out of removal of low
> order bit masking to deal with alignment issues on the Alpha.

Yes, but that would have been within AND to drop unnecessary/redundant masks.
We probably just need a few lines over in IOR to handle this case with the same
machinery.

Heh...  Just having a browse shows that we currently perform exactly the
opposite transformation:

  /* If we are adding two things that have no bits in common, convert
 the addition into an IOR.  This will often be further simplified,
 for example in cases like ((a & 1) + (a & 2)), which can
 become a & 3.  */

So managing these conflicting goals might be tricky...
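The quoted comment relies on the same identity in the opposite direction: `(a & 1)` and `(a & 2)` cannot share a set bit, so their sum equals their IOR, and an IOR of two masks of the same value folds to a single AND. An exhaustive check over 8-bit inputs, as a sketch (`check_plus_to_ior` is a hypothetical name):

```c
#include <assert.h>

/* Hypothetical helper name, for illustration only.
   (a & 1) and (a & 2) never share a set bit, so their sum is their IOR,
   and the IOR of two masks of the same value folds to one AND. */
static void check_plus_to_ior(void)
{
    for (unsigned a = 0; a < 256; a++) {
        unsigned sum = (a & 1u) + (a & 2u);
        unsigned ior = (a & 1u) | (a & 2u);
        assert(sum == ior);
        assert(ior == (a & 3u));
    }
}
```

This is the tension in a nutshell: rewriting PLUS as IOR exposes folds like `a & 3`, while rewriting IOR as PLUS would expose `addw` on RV64.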

> I wrote some match.pd patterns to do it in gimple as part of a larger
> problem.  The work as a whole on that larger problem ultimately didn't
> pan out (generated even worse code than what we have on the trunk).  But
> it might be possible to resurrect those patterns and see if they are
> useful independently.  My recollection was I was looking at low order
> bits only, but the concepts were general enough.

That would be interesting, yes.

>>> $ cat rshift.c
>>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>>> $ cat rshift.s
>>> rs24:
>>> sllw a0,a0,24
>>> sext.w  a0,a0   # redundant
>>> ret
>>
>> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
>> SUBREG_PROMOTED_SIGNED_P.  Since Alpha didn't have a 32-bit shift you're in new
>> territory for this one.
> Yea.  I'd also expect zero/nonzero bits tracking in combine to catch
> this.  Shouldn't the sign bit be known to be zero after the shift which
> makes the extension redundant regardless of the SUBREG_PROMOTED flag?

You're right, this should be irrelevant.  And anyway combine should be
constrained by modes, so it should require the SIGN_EXTEND to be present just
to make the modes match up.  Perhaps this test case is being suppressed because
of something else, e.g. failure to combine insns when a hard-reg is involved?


r~



Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Richard Henderson
On 09/06/2017 10:17 AM, Richard Henderson wrote:
>> Yea.  I'd also expect zero/nonzero bits tracking in combine to catch
>> this.  Shouldn't the sign bit be known to be zero after the shift which
>> makes the extension redundant regardless of the SUBREG_PROMOTED flag?
> You're right, this should be irrelevant.  And anyway combine should be
> constrained by modes, so it should require the SIGN_EXTEND to be present just
> to make the modes match up.  Perhaps this test case is being suppressed because
> of something else, e.g. failure to combine insns when a hard-reg is involved?

It turns out that combine would like to match

  (set (reg:DI 75)
       (zero_extract:DI (reg:DI 10 a0 [ rs1 ])
                        (const_int 8 [0x8])
                        (const_int 24 [0x18])))

which seems reasonable enough to add as a pattern

(define_code_iterator any_extract [sign_extract zero_extract])
(define_code_attr [(sign_extract "sra") (zero_extract "srl")])

(define_insn "*_si_high"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (any_extract:DI
          (match_operand:DI 1 "register_operand" "r")
          (match_operand:DI 2 "const_int_operand" "")
          (match_operand:DI 3 "const_int_operand" "")))]
  "TARGET_64BIT
   && INTVAL (operands[3]) > 0
   && INTVAL (operands[2]) + INTVAL (operands[3]) == 32"
  "w\t%0,%1,%3"
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")])
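Why the pattern is valid: when the field width plus the bit position equals 32, extracting the field is the same as a 32-bit right shift by the position, logical for `zero_extract` and arithmetic for `sign_extract`. The C model below checks that for every valid position. The functions `zero_extract`, `sign_extract`, `srlw`, `sraw`, `sext32` and `check_extract_vs_shift` are hypothetical C models of the RTL codes and RV64 instructions (bit 0 taken as least significant), not GCC or assembler APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C models, for illustration only. */

/* Sign-extend the low 32 bits to 64 bits, as RV64 W-form instructions do. */
static int64_t sext32(uint64_t x)
{
    int64_t lo = (int64_t)(x & 0xffffffffu);
    return lo >= 0x80000000LL ? lo - 0x100000000LL : lo;
}

/* zero_extract:DI (x, len, pos); valid for 0 < len < 64 and len + pos <= 64. */
static int64_t zero_extract(uint64_t x, unsigned len, unsigned pos)
{
    return (int64_t)((x >> pos) & ((1ull << len) - 1));
}

/* sign_extract:DI (x, len, pos): the same field, sign-extended from its top bit. */
static int64_t sign_extract(uint64_t x, unsigned len, unsigned pos)
{
    int64_t field = zero_extract(x, len, pos);
    int64_t sign = (int64_t)1 << (len - 1);
    return (field ^ sign) - sign;
}

/* srlw: logical right shift of the low 32 bits, result sign-extended. */
static int64_t srlw(uint64_t x, unsigned sh)
{
    return sext32((uint32_t)x >> (sh & 31));
}

/* sraw: arithmetic right shift of the low 32 bits, computed as a floor
   division so it does not rely on >> of a negative signed value. */
static int64_t sraw(uint64_t x, unsigned sh)
{
    int64_t v = sext32(x);
    int64_t d = (int64_t)1 << (sh & 31);
    return v >= 0 ? v / d : -((-v + d - 1) / d);
}

/* Under the insn condition (len + pos == 32, pos > 0), each extract
   matches the corresponding W-form shift by pos. */
static void check_extract_vs_shift(uint64_t x)
{
    for (unsigned pos = 1; pos < 32; pos++) {
        unsigned len = 32 - pos;
        assert(zero_extract(x, len, pos) == srlw(x, pos));
        assert(sign_extract(x, len, pos) == sraw(x, pos));
    }
}
```

The positive-position check in the pattern's condition matters for the `zero_extract` case: with a position of 0, `srlw` degenerates to `sext.w`, so `zero_extract(0x80000000, 32, 0)` yields 0x80000000 while `srlw(0x80000000, 0)` yields -0x80000000.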


r~


Re: Redundant sign-extension instructions on RISC-V

2017-09-06 Thread Jeff Law
On 09/06/2017 11:17 AM, Richard Henderson wrote:
> On 09/06/2017 09:53 AM, Jeff Law wrote:
>>> I think the easiest solution to this is for combine to notice when IOR has
>>> operands with non-zero-bits that do not overlap, convert the operation to ADD.
>>> That allows the final two insns to fold to "addw" and the compiler need do no
>>> further analysis.
>> I thought we had combine support for that.  I don't think it's ever been
>> particularly good though.  With your alpha background you're probably
>> more familiar with it than anyone -- IIRC it fell out of removal of low
>> order bit masking to deal with alignment issues on the Alpha.
> 
> Yes, but that would have been within AND to drop unnecessary/redundant masks.
> We probably just need a few lines over in IOR to handle this case with the same
> machinery.
Right.  My point was most of the infrastructure ought to be in place.

> 
> Heh...  Just having a browse shows that we currently perform exactly the
> opposite transformation:
> 
>   /* If we are adding two things that have no bits in common, convert
>  the addition into an IOR.  This will often be further simplified,
>  for example in cases like ((a & 1) + (a & 2)), which can
>  become a & 3.  */
> 
> So managing these conflicting goals might be tricky...
And I wonder how often that happens in practice.

> 
>> I wrote some match.pd patterns to do it in gimple as part of a larger
>> problem.  The work as a whole on that larger problem ultimately didn't
>> pan out (generated even worse code than what we have on the trunk).  But
>> it might be possible to resurrect those patterns and see if they are
>> useful independently.  My recollection was I was looking at low order
>> bits only, but the concepts were general enough.
> 
> That would be interesting, yes.
So I just dug up the BZ: 59393.  Sadly it was the other way around --
turning a PLUS into BIT_IOR like combine.  So not helpful here.

The PLUS->IOR apparently was still profitable from a codesize
standpoint.  But in general, I'm not sure how to select between the two
forms.  I guess IOR is slightly better because we know precisely what
bits are potentially changed based on the constant.
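Jeff's point about IOR can be illustrated directly: `x | c` can only change bits that are set in the constant `c`, while `x + c` may carry into bits above it. A sketch with hypothetical helper names `ior_changes_within_constant`, `add_changes_within_constant` and `check_ior_vs_add`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* x | c can only change bits that are set in the constant c. */
static int ior_changes_within_constant(uint32_t x, uint32_t c)
{
    return ((x ^ (x | c)) & ~c) == 0;
}

/* x + c has no such guarantee: a carry can flip bits above c. */
static int add_changes_within_constant(uint32_t x, uint32_t c)
{
    return ((x ^ (x + c)) & ~c) == 0;
}

static void check_ior_vs_add(void)
{
    const uint32_t xs[] = { 0u, 0x0fu, 0x7fffffffu, 0xdeadbeefu, 0xffffffffu };
    const uint32_t cs[] = { 1u, 0x10u, 0xff00u, 0x80000000u };
    for (unsigned i = 0; i < sizeof xs / sizeof xs[0]; i++)
        for (unsigned j = 0; j < sizeof cs / sizeof cs[0]; j++)
            assert(ior_changes_within_constant(xs[i], cs[j]));
    /* 0x0f + 1 == 0x10: the carry changes bits 1..4, outside c == 1. */
    assert(!add_changes_within_constant(0x0fu, 1u));
}
```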

> 
>>>> $ cat rshift.c
>>>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>>>> $ cat rshift.s
>>>> rs24:
>>>> sllw a0,a0,24
>>>> sext.w  a0,a0   # redundant
>>>> ret
>>>
>>> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
>>> SUBREG_PROMOTED_SIGNED_P.  Since Alpha didn't have a 32-bit shift you're in new
>>> territory for this one.
>> Yea.  I'd also expect zero/nonzero bits tracking in combine to catch
>> this.  Shouldn't the sign bit be known to be zero after the shift which
>> makes the extension redundant regardless of the SUBREG_PROMOTED flag?
> 
> You're right, this should be irrelevant.  And anyway combine should be
> constrained by modes, so it should require the SIGN_EXTEND to be present just
> to make the modes match up.  Perhaps this test case is being suppressed because
> of something else, e.g. failure to combine insns when a hard-reg is involved?
It could well be hard register involvement.

jeff


Re: dejagnu version update?

2017-09-06 Thread Jonathan Wakely
On 25 August 2017 at 14:55, David Edelsohn wrote:
> On Fri, Aug 25, 2017 at 9:50 AM, Rainer Orth
>  wrote:
>> Hi H.J.,
>>
>>> On Fri, Aug 25, 2017 at 6:32 AM, David Edelsohn  wrote:
>>>> On Fri, Aug 25, 2017 at 9:24 AM, H.J. Lu  wrote:
>>>>> On Fri, Aug 25, 2017 at 6:01 AM, David Edelsohn  wrote:
>>>>>> FYI, DejaGNU 1.6.1 is not compatible with the GCC Testsuite.  The GCC
>>>>>> Testsuite uses "unsetenv" in multiple instances and that feature has
>>>>>> been removed from DejaGNU.  The testsuite is going to experience
>>>>>> DejaGNU errors when Fedora or OpenSUSE upgrades to a more recent
>>>>>> DejaGNU in the 1.6 series.
>>>>>>
>>>>>
>>>>> I am running Fedora 26 with dejagnu-1.6-2.fc26.  What should I
>>>>> look for?
>>>>
>>>> ERROR: (DejaGnu) proc "unsetenv GCC_EXEC_PREFIX" does not exist.
>>>> The error code is NONE
>>>> The info on the error is:
>>>> invalid command name "unsetenv"
>>>> while executing
>>>> "::tcl_unknown unsetenv GCC_EXEC_PREFIX"
>>>> ("uplevel" body line 1)
>>>> invoked from within
>>>> "uplevel 1 ::tcl_unknown $args"
>>>>
>>>
>>> I checked my log.  I didn't see them.  Which log file do they appear in?
>>
>> unsetenv was only removed after DejaGnu 1.6 was released.  The change is
>> in the git repo; so far there exists no post-1.6 release.
>
> That is why I wrote 1.6.1.  I didn't know if 1.6-2 was from a snapshot after
> 1.6.

The -2 is just because the Fedora package got rebuilt, H.J.'s version
is 1.6, and looking at the Fedora package it's unmodified from the
upstream 1.6 release. So Fedora doesn't have the change yet, even in
rawhide.


Re: dejagnu version update?

2017-09-06 Thread David Edelsohn
On Wed, Sep 6, 2017 at 8:48 PM, Jonathan Wakely  wrote:
> On 25 August 2017 at 14:55, David Edelsohn wrote:
>> On Fri, Aug 25, 2017 at 9:50 AM, Rainer Orth
>>  wrote:
>>> Hi H.J.,
>>>
>>>> On Fri, Aug 25, 2017 at 6:32 AM, David Edelsohn  wrote:
>>>>> On Fri, Aug 25, 2017 at 9:24 AM, H.J. Lu  wrote:
>>>>>> On Fri, Aug 25, 2017 at 6:01 AM, David Edelsohn  wrote:
>>>>>>> FYI, DejaGNU 1.6.1 is not compatible with the GCC Testsuite.  The GCC
>>>>>>> Testsuite uses "unsetenv" in multiple instances and that feature has
>>>>>>> been removed from DejaGNU.  The testsuite is going to experience
>>>>>>> DejaGNU errors when Fedora or OpenSUSE upgrades to a more recent
>>>>>>> DejaGNU in the 1.6 series.
>>>>>>>
>>>>>>
>>>>>> I am running Fedora 26 with dejagnu-1.6-2.fc26.  What should I
>>>>>> look for?
>>>>>
>>>>> ERROR: (DejaGnu) proc "unsetenv GCC_EXEC_PREFIX" does not exist.
>>>>> The error code is NONE
>>>>> The info on the error is:
>>>>> invalid command name "unsetenv"
>>>>> while executing
>>>>> "::tcl_unknown unsetenv GCC_EXEC_PREFIX"
>>>>> ("uplevel" body line 1)
>>>>> invoked from within
>>>>> "uplevel 1 ::tcl_unknown $args"
>>>>>
>>>>
>>>> I checked my log.  I didn't see them.  Which log file do they appear in?
>>>
>>> unsetenv was only removed after DejaGnu 1.6 was released.  The change is
>>> in the git repo; so far there exists no post-1.6 release.
>>
>> That is why I wrote 1.6.1.  I didn't know if 1.6-2 was from a snapshot after
>> 1.6.
>
> The -2 is just because the Fedora package got rebuilt, H.J.'s version
> is 1.6, and looking at the Fedora package it's unmodified from the
> upstream 1.6 release. So Fedora doesn't have the change yet, even in
> rawhide.

I reported the impact on the DejaGnu bugs mailing list.  Ben reverted
the change.

dejagnu-git now provides the unsetenv proc.

- David


gcc-6-20170906 is now available

2017-09-06 Thread gccadmin
Snapshot gcc-6-20170906 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20170906/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 251821

You'll find:

 gcc-6-20170906.tar.xz    Complete GCC

  SHA256=dd3f78d357a3dd88b6a6cd338fd03b844ada4e8e732257ad4135d153cac37585
  SHA1=ce5604dd3ba880ee71928657d54dd75898e14fa1

Diffs from 6-20170830 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [llvm-dev] DragonEgg for GCC v8.x and LLVM v6.x is just able to work

2017-09-06 Thread Chris Lattner

> On Sep 4, 2017, at 8:13 PM, Leslie Zhai via llvm-dev 
>  wrote:
> 
> Hi LLVM and GCC developers,
> 
> LLVM China  http://www.llvm.org.cn  forked DragonEgg 
> https://github.com/LLVM-China/dragonegg  because:
> 
> * Some subprojects are impractical or uninteresting to relicense (e.g. 
> llvm-gcc  and dragonegg). These will be split off from the LLVM project (e.g. 
> to separate Github projects), allowing interested people to continue their 
> development elsewhere. 
> http://lists.llvm.org/pipermail/llvm-dev/2017-August/116266.html
> 
> * There are a lot of issues https://github.com/xiangzhai/dragonegg/issues  so 
> I need smarter developers' help.

Hi Leslie,

Out of curiosity, what is motivating this work?  What is the usecase for 
dragonegg these days, now that Clang has great C++ support?  Are you interested 
in Ada + LLVM or some other frontend?

-Chris



Power 8 in-core crypto not working as expected

2017-09-06 Thread Jeffrey Walton
Hi Everyone,

I'm on gcc rather than gcc-help because we need to talk with some GCC
devs who can help take this further.

I have an implementation of AES on Power 8 using GCC's built-ins. It's
available for inspection and download at
https://github.com/noloader/AES-Power8. The problem is, it does not
arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
endian).

The source file is the reduced, minimal test case. It uses
pre-calculated subkeys so we've removed that variable from the
equation. It also uses the null vector (string of 0's) as the message,
so that variable has been removed from the equation too.

About all we are left with is loading a subkey, calling vcipher to
perform a single round of encryption, and assigning the result back to
a variable. Lather, rinse, repeat.
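Since each step is meant to be exactly one AES encryption round, one way to localize the problem is to compare every vcipher result with a portable, byte-oriented model of a round (SubBytes, ShiftRows, MixColumns, then AddRoundKey, which as far as we know is what vcipher computes). The sketch below assumes the FIPS-197 state layout, where byte 4*c + r is row r of column c; how that byte order maps onto vector element order on little-endian Power is exactly the sort of thing the comparison would expose. `aes_round`, `sbox`, `gmul`, `xtime` and `check_zero_round` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Multiply by {02} in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply by shift-and-add. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* AES S-box entry: multiplicative inverse (0 maps to 0), then the FIPS-197
   affine transform. A brute-force inverse keeps the sketch table-free. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (unsigned c = 1; c < 256 && x != 0; c++)
        if (gmul(x, (uint8_t)c) == 1) {
            inv = (uint8_t)c;
            break;
        }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return (uint8_t)(s ^ 0x63);
}

/* One AES encryption round on a 16-byte state in FIPS-197 order:
   SubBytes, ShiftRows, MixColumns, then XOR with the round key. */
static void aes_round(uint8_t state[16], const uint8_t key[16])
{
    uint8_t t[16];

    /* SubBytes and ShiftRows: row r of column c comes from column (c + r) % 4. */
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = sbox(state[4 * ((c + r) % 4) + r]);

    /* MixColumns on each column, then AddRoundKey. */
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &t[4 * c];
        uint8_t b[4];
        b[0] = (uint8_t)(xtime(a[0]) ^ xtime(a[1]) ^ a[1] ^ a[2] ^ a[3]);
        b[1] = (uint8_t)(a[0] ^ xtime(a[1]) ^ xtime(a[2]) ^ a[2] ^ a[3]);
        b[2] = (uint8_t)(a[0] ^ a[1] ^ xtime(a[2]) ^ xtime(a[3]) ^ a[3]);
        b[3] = (uint8_t)(xtime(a[0]) ^ a[0] ^ a[1] ^ a[2] ^ xtime(a[3]));
        for (int r = 0; r < 4; r++)
            state[4 * c + r] = (uint8_t)(b[r] ^ key[4 * c + r]);
    }
}

/* Sanity value: a zero state with a zero round key becomes all 0x63 bytes. */
static void check_zero_round(void)
{
    uint8_t s[16] = { 0 }, k[16] = { 0 };
    aes_round(s, k);
    for (int i = 0; i < 16; i++)
        assert(s[i] == 0x63);
}
```

A zero state with a zero round key comes out as sixteen 0x63 bytes, which gives a quick first value to compare against.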

For the crypto side of things I've consulted with Andy Polyakov of the
OpenSSL project. I believe we are doing everything we should be as far
as the crypto goes, including the subkey byte-swaps on LE machines.
Our subkey table is exactly the same as the one OpenSSL arrives at on
both LE and BE machines.

Would someone familiar with the processor and with GCC's
built-ins please take a look at things. Suggestions for our next
steps would be greatly appreciated.

Thanks in advance,

Jeffrey Walton

==

Here are the compiler versions.

  - GCC112 (Linux, little endian)

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

  - GCC119 (AIX, big endian):

$ gcc --version
gcc (GCC) 6.1.0


Re: [llvm-dev] DragonEgg for GCC v8.x and LLVM v6.x is just able to work

2017-09-06 Thread Leslie Zhai

Dear Chris,

Thanks for your kind response!

The motivation for this work:

1. Clang cannot build Linux https://bugs.llvm.org/show_bug.cgi?id=22830 
and the LLVMLinux patches no longer seem to be maintained? 
http://llvm.linuxfoundation.org/index.php/Main_Page


2. The Clang static analyzer frontend cannot analyze glibc or Linux 
https://bugs.llvm.org/show_bug.cgi?id=31017 but its checkers are able 
to find bugs in KDE :) 
http://www.leetcode.cn/2016/11/analyzing-code-for-kde-qt-open-source-components.html


3. For learning about GCC plugins, GIMPLE, LLVM IR, GCC passes, LLVM passes, 
CodeGen, etc. from llvm-gcc and dragonegg; thanks for your great work!


The usecase for dragonegg:

1. mips64-linux-gnu-gcc (WIP), arm-linux-gnu-gcc and x86 GCC Frontend -> 
GIMPLE -> LLVM IR -> Assembly or for KLEE


2. GCC Frontend -> GIMPLE -> Clang AST (WIP)

I am interested in other frontends too, such as flang-clang 
https://github.com/flang-compiler/clang/pull/28 and I am glad to learn 
from other developers :) https://gcc.gnu.org/ml/gcc/2017-09/msg00022.html



On 2017-09-07 12:10, Chris Lattner wrote:

>> On Sep 4, 2017, at 8:13 PM, Leslie Zhai via llvm-dev  wrote:
>>
>> Hi LLVM and GCC developers,
>>
>> LLVM China  http://www.llvm.org.cn  forked DragonEgg 
>> https://github.com/LLVM-China/dragonegg  because:
>>
>> * Some subprojects are impractical or uninteresting to relicense (e.g. llvm-gcc 
>>  and dragonegg). These will be split off from the LLVM project (e.g. to 
>> separate Github projects), allowing interested people to continue their 
>> development elsewhere. 
>> http://lists.llvm.org/pipermail/llvm-dev/2017-August/116266.html
>>
>> * There are a lot of issues https://github.com/xiangzhai/dragonegg/issues  so I 
>> need smarter developers' help.
>
> Hi Leslie,
>
> Out of curiosity, what is motivating this work?  What is the usecase for 
> dragonegg these days, now that Clang has great C++ support?  Are you interested 
> in Ada + LLVM or some other frontend?
>
> -Chris




--
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/





Re: Power 8 in-core crypto not working as expected

2017-09-06 Thread R0b0t1
On Wed, Sep 6, 2017 at 11:37 PM, Jeffrey Walton  wrote:
> Hi Everyone,
>
> I'm on gcc rather than gcc-help because we need to talk with some GCC
> devs who can help take this further.
>
> I have an implementation of AES on Power 8 using GCC's built-ins. It's
> available for inspection and download at
> https://github.com/noloader/AES-Power8. The problem is, it does not
> arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
> endian).
>
> The source file is the reduced, minimal test case. It uses
> pre-calculated subkeys so we've removed that variable from the
> equation. It also uses the null vector (string of 0's) as the message,
> so that variable has been removed from the equation too.
>
> About all we are left with is loading a subkey, calling vcipher to
> perform a single round of encryption, and assigning the result back to
> a variable. Lather, rinse, repeat.
>
> For the crypto side of things I've consulted with Andy Polyakov of the
> OpenSSL project. I believe we are doing everything we should be as far
> as the crypto goes, including the subkey byte-swaps on LE machines.
> Our subkey table is exactly the same as the one OpenSSL arrives at on
> both LE and BE machines.
>
> Would someone familiar with the processor and with GCC's
> built-ins please take a look at things. Suggestions for our next
> steps would be greatly appreciated.
>

Have you inspected the generated assembly listing and machine
instructions to be sure that they are correct?

You can refer to the source for vmx-crypto
(https://github.com/torvalds/linux/tree/master/drivers/crypto/vmx) in
addition to that of OpenSSL. Are you trying to do a cleanroom
implementation of this software?


Full disclosure: despite my interest in the architecture I have not
been able to get access to a POWER8 machine. A server costs about as
much as a new car. Any account reseller recommendations or any other
options you can think of? If you don't mind responding feel free to do
it privately so it doesn't clutter this thread.

Cheers,
 R0b0t1.


Re: Power 8 in-core crypto not working as expected

2017-09-06 Thread Jeffrey Walton
On Thu, Sep 7, 2017 at 1:39 AM, R0b0t1  wrote:
> On Wed, Sep 6, 2017 at 11:37 PM, Jeffrey Walton  wrote:
>> Hi Everyone,
>>
>> I'm on gcc rather than gcc-help because we need to talk with some GCC
>> devs who can help take this further.
>>
>> I have an implementation of AES on Power 8 using GCC's built-ins. It's
>> available for inspection and download at
>> https://github.com/noloader/AES-Power8. The problem is, it does not
>> arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
>> endian).
>>
>> The source file is the reduced, minimal test case. It uses
>> pre-calculated subkeys so we've removed that variable from the
>> equation. It also uses the null vector (string of 0's) as the message,
>> so that variable has been removed from the equation too.
>>
>> About all we are left with is loading a subkey, calling vcipher to
>> perform a single round of encryption, and assigning the result back to
>> a variable. Lather, rinse, repeat.
>>
>> For the crypto side of things I've consulted with Andy Polyakov of the
>> OpenSSL project. I believe we are doing everything we should be as far
>> as the crypto goes, including the subkey byte-swaps on LE machines.
>> Our subkey table is exactly the same as the one OpenSSL arrives at on
>> both LE and BE machines.
>>
>> Would someone familiar with the processor and with GCC's
>> built-ins please take a look at things. Suggestions for our next
>> steps would be greatly appreciated.
>>
>
> Have you inspected the generated assembly listing and machine
> instructions to be sure that they are correct?

Unfortunately, I don't read PPC asm. It could be dead wrong and I
could not spot it.

> You can refer to the source for vmx-crypto
> (https://github.com/torvalds/linux/tree/master/drivers/crypto/vmx) in
> addition to that of OpenSSL. Are you trying to do a cleanroom
> implementation of this software?

Yeah, Andy's code is used for both OpenSSL and the Linux kernel. I've
spent the last two days trying to connect the dots between our code
and Andy's code. I've also been talking with Andy offline.

I'm pretty sure it is mostly apples and oranges. Andy's code is highly
optimized, hand-tuned assembly. It just does not line up well with
C/C++-based code.

I'll hit your other point privately.

Jeff