Re: Redundant sign-extension instructions on RISC-V
On 08/29/2017 05:36 PM, Michael Clark wrote:
> We’re investigating an issue with redundant sign-extension instructions being
> emitted with the riscv backend. Firstly I would like to state that riscv is
> possibly a unique backend with respect to its canonical sign-extended
> register form due to the following:
>
> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit
>   operations on RV64 is sign-extended not zero-extended.
>
> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to
>   signed DI mode values holding SI mode subregs
>
> - RISC-V does not have register aliases for these different modes, rather
>   different instructions are selected to operate on different register widths.

This is identical to Alpha.

> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc.
>
> - Some instructions such as logical ops only have full word width variants
>   (AND, OR, XOR) but these instructions maintain canonical form as there is no
>   horizontal movement of bits.

Alpha only had 32-bit instructions for add/sub/mul that also sign-extend,
so there is a little less scope for optimization.

> testcase 1:
>
> $ cat bswap.c
> unsigned bswap(unsigned p) {
>   return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) |
>          ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
> }
> $ cat bswap.s
> bswap2:
>     sllw    a3,a0,24
>     srlw    a5,a0,24
>     sllw    a4,a0,8
>     or      a5,a5,a3
>     li      a3,16711680
>     and     a4,a4,a3
>     or      a5,a5,a4
>     li      a4,65536
>     add     a4,a4,-256
>     srlw    a0,a0,8
>     and     a0,a0,a4
>     or      a0,a5,a0
>     sext.w  a0,a0    # redundant
>     ret

I think the easiest solution to this is for combine to notice when IOR has
operands with non-zero-bits that do not overlap, and convert the operation to
ADD. That allows the final two insns to fold to "addw" and the compiler need do
no further analysis.

That may even allow better code generation in other ways, such as a
three-operand LEA insn on two-operand architectures (x86, s390x). Or for a
slightly different test case, perhaps further folding to a shift-add insn.

> $ cat rshift.c
> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
> $ cat rshift.s
> rs24:
>     sllw    a0,a0,24
>     sext.w  a0,a0    # redundant
>     ret

That seems like a missing check somewhere (combine? simplify-rtx? both?) for
SUBREG_PROMOTED_SIGNED_P. Since Alpha didn't have a 32-bit shift you're in new
territory for this one.


r~
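The identity behind the IOR-to-ADD suggestion can be sketched in C: when two operands share no set bits, addition produces no carries, so OR and ADD give the same value. The helper names below (`bits_disjoint`, `ior_via_add`, `bswap32_add`) are hypothetical illustrations, not GCC internals, and the sketch says nothing about how combine would actually implement the check.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when no bit position is set in both a and b. */
static int bits_disjoint(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* With disjoint operands no carry can occur, so a + b equals a | b. */
static uint32_t ior_via_add(uint32_t a, uint32_t b) {
    assert(bits_disjoint(a, b)); /* precondition for the rewrite */
    return a + b;
}

/* The four byte lanes of a 32-bit byte swap never overlap,
   so every OR in the test case can be written as an ADD. */
static uint32_t bswap32_add(uint32_t p) {
    uint32_t r = ior_via_add(p >> 24, p << 24);
    r = ior_via_add(r, (p << 8) & 0x00ff0000u);
    r = ior_via_add(r, (p >> 8) & 0x0000ff00u);
    return r;
}
```

On RV64 the point of the rewrite is that the last 32-bit addition can use addw, whose result is already sign-extended, so no separate sext.w is needed.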
Re: Redundant sign-extension instructions on RISC-V
On 09/06/2017 10:43 AM, Richard Henderson wrote:
> On 08/29/2017 05:36 PM, Michael Clark wrote:
>> We’re investigating an issue with redundant sign-extension instructions
>> being emitted with the riscv backend. Firstly I would like to state that
>> riscv is possibly a unique backend with respect to its canonical
>> sign-extended register form due to the following:
>>
>> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit
>>   operations on RV64 is sign-extended not zero-extended.
>>
>> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to
>>   signed DI mode values holding SI mode subregs
>>
>> - RISC-V does not have register aliases for these different modes, rather
>>   different instructions are selected to operate on different register widths.
>
> This is identical to Alpha.
>
>> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub,
>>   etc.
>>
>> - Some instructions such as logical ops only have full word width variants
>>   (AND, OR, XOR) but these instructions maintain canonical form as there is no
>>   horizontal movement of bits.
>
> Alpha only had 32-bit instructions for add/sub/mul that also sign-extend,
> so there is a little less scope for optimization.
>
>> testcase 1:
>>
>> $ cat bswap.c
>> unsigned bswap(unsigned p) {
>>   return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) |
>>          ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
>> }
>> $ cat bswap.s
>> bswap2:
>>     sllw    a3,a0,24
>>     srlw    a5,a0,24
>>     sllw    a4,a0,8
>>     or      a5,a5,a3
>>     li      a3,16711680
>>     and     a4,a4,a3
>>     or      a5,a5,a4
>>     li      a4,65536
>>     add     a4,a4,-256
>>     srlw    a0,a0,8
>>     and     a0,a0,a4
>>     or      a0,a5,a0
>>     sext.w  a0,a0    # redundant
>>     ret
>
> I think the easiest solution to this is for combine to notice when IOR has
> operands with non-zero-bits that do not overlap, convert the operation to ADD.
> That allows the final two insns to fold to "addw" and the compiler need do no
> further analysis.

I thought we had combine support for that. I don't think it's ever been
particularly good though. With your Alpha background you're probably
more familiar with it than anyone -- IIRC it fell out of removal of low
order bit masking to deal with alignment issues on the Alpha.

I wrote some match.pd patterns to do it in gimple as part of a larger
problem. The work as a whole on that larger problem ultimately didn't
pan out (generated even worse code than what we have on the trunk). But
it might be possible to resurrect those patterns and see if they are
useful independently. My recollection was I was looking at low order
bits only, but the concepts were general enough.

>
>> $ cat rshift.c
>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>> $ cat rshift.s
>> rs24:
>>     sllw    a0,a0,24
>>     sext.w  a0,a0    # redundant
>>     ret
>
> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
> SUBREG_PROMOTED_SIGNED_P. Since Alpha didn't have a 32-bit shift you're in
> new territory for this one.

Yea. I'd also expect zero/nonzero bits tracking in combine to catch
this. Shouldn't the sign bit be known to be zero after the shift, which
makes the extension redundant regardless of the SUBREG_PROMOTED flag?

jeff
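The nonzero-bits argument can be sketched in C, assuming a simple mask that over-approximates which bits of a 32-bit value may be set. The function names are hypothetical and only mimic the idea; they are not the routines combine uses. The narrowing casts rely on the wrapping behaviour GCC and Clang implement for out-of-range unsigned-to-signed conversions.

```c
#include <stdint.h>

/* Upper bound on the bits that can be nonzero after a 32-bit
   logical right shift by `shift` positions. */
static uint32_t nonzero_bits_after_lshr(unsigned shift) {
    return shift < 32 ? UINT32_MAX >> shift : 0;
}

/* If bit 31 is known to be zero, sign extension and zero extension
   to 64 bits produce the same value, so an explicit sign extension
   adds nothing. */
static int sign_extension_is_redundant(uint32_t nonzero_mask) {
    return (nonzero_mask & 0x80000000u) == 0;
}

/* Reference extensions used to check the claim on concrete values. */
static int64_t sext32(uint32_t x) { return (int64_t)(int32_t)x; }
static int64_t zext32(uint32_t x) { return (int64_t)x; }
```

For the rs24 test case the mask after a shift by 24 is 0xff, so bit 31 is known zero and both extensions agree for every input.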
Re: Redundant sign-extension instructions on RISC-V
On 08/30/2017 02:43 AM, Michael Clark wrote: > POINTERS_EXTEND_UNSIGNED -1 (which is true) is defined on some targets. I > assume they sign-extend but the meaning has been overloaded. Just for your edification, this is for e.g. ia64's "addp4" instruction and it is not a normal extension. A 2-bit segment tag of a 32-bit pointer is remapped to a 3-bit segment tag of a 64-bit pointer and also zeros bits {60:32}. If you care, http://refspecs.linuxbase.org/IA64-softdevman-vol3 has a nice picture that better describes this. r~
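A rough C sketch of the pointer remapping described above, based only on the description in this message: the top two bits of the 32-bit pointer are copied to bits 62:61 of the 64-bit result, bit 63 and bits 60:32 are zero, and the low 32 bits are kept. The function name `addp4_style_extend` is hypothetical, and the sketch ignores the addition that the real instruction also performs.

```c
#include <stdint.h>

/* Extend a 32-bit pointer the way the message describes:
   2-bit tag (bits 31:30) -> 3-bit tag field (bits 63:61, top bit zero),
   bits 60:32 cleared, low 32 bits unchanged. */
static uint64_t addp4_style_extend(uint32_t ptr32) {
    uint64_t tag2 = ptr32 >> 30;            /* bits 31:30 of the input */
    return (tag2 << 61) | (uint64_t)ptr32;  /* tag lands in bits 62:61 */
}
```

This is neither a sign extension nor a zero extension, which is why such targets use the special value -1 for POINTERS_EXTEND_UNSIGNED.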
Re: Redundant sign-extension instructions on RISC-V
On 09/06/2017 09:53 AM, Jeff Law wrote:
>> I think the easiest solution to this is for combine to notice when IOR has
>> operands with non-zero-bits that do not overlap, convert the operation to
>> ADD.
>> That allows the final two insns to fold to "addw" and the compiler need do no
>> further analysis.
> I thought we had combine support for that. I don't think it's ever been
> particularly good though. With your alpha background you're probably
> more familiar with it than anyone -- IIRC it fell out of removal of low
> order bit masking to deal with alignment issues on the Alpha.

Yes, but that would have been within AND to drop unnecessary/redundant masks.
We probably just need a few lines over in IOR to handle this case with the same
machinery.

Heh... Just having a browse shows that we currently perform exactly the
opposite transformation:

  /* If we are adding two things that have no bits in common, convert
     the addition into an IOR.  This will often be further simplified,
     for example in cases like ((a & 1) + (a & 2)), which can
     become a & 3.  */

So managing these conflicting goals might be tricky...

> I wrote some match.pd patterns to do it in gimple as part of a larger
> problem. The work as a whole on that larger problem ultimately didn't
> pan out (generated even worse code than what we have on the trunk). But
> it might be possible to resurrect those patterns and see if they are
> useful independently. My recollection was I was looking at low order
> bits only, but the concepts were general enough.

That would be interesting, yes.

>>> $ cat rshift.c
>>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>>> $ cat rshift.s
>>> rs24:
>>>     sllw    a0,a0,24
>>>     sext.w  a0,a0    # redundant
>>>     ret
>>
>> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
>> SUBREG_PROMOTED_SIGNED_P. Since Alpha didn't have a 32-bit shift you're in
>> new territory for this one.
> Yea. I'd also expect zero/nonzero bits tracking in combine to catch
> this. Shouldn't the sign bit be known to be zero after the shift which
> makes the extension redundant regardless of the SUBREG_PROMOTED flag?

You're right, this should be irrelevant. And anyway combine should be
constrained by modes, so it should require the SIGN_EXTEND to be present just
to make the modes match up.

Perhaps this test case is being suppressed because of something else, e.g.
failure to combine insns when a hard-reg is involved?


r~
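The existing PLUS-to-IOR direction quoted from the source comment can be checked with a small C sketch. The three helpers are hypothetical names for the three forms of the same expression; they only show why canonicalizing to IOR lets the masks merge into a single AND.

```c
#include <stdint.h>

/* The expression as written with an addition. */
static uint32_t as_plus(uint32_t a) { return (a & 1u) + (a & 2u); }

/* The operands share no bits, so the addition can become an IOR. */
static uint32_t as_ior(uint32_t a) { return (a & 1u) | (a & 2u); }

/* Once it is an IOR of two masks of the same value, it folds to one AND. */
static uint32_t as_and(uint32_t a) { return a & 3u; }

/* Returns 1 when all three forms agree for the given input. */
static int forms_agree(uint32_t a) {
    return as_plus(a) == as_ior(a) && as_ior(a) == as_and(a);
}
```

The tension noted in the message is that this direction helps mask merging, while the opposite direction (IOR to ADD) is what would let RV64 pick addw.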
Re: Redundant sign-extension instructions on RISC-V
On 09/06/2017 10:17 AM, Richard Henderson wrote:
>> Yea. I'd also expect zero/nonzero bits tracking in combine to catch
>> this. Shouldn't the sign bit be known to be zero after the shift which
>> makes the extension redundant regardless of the SUBREG_PROMOTED flag?
> You're right, this should be irrelevant. And anyway combine should be
> constrained by modes, so it should require the SIGN_EXTEND to be present just
> to make the modes match up.
>
> Perhaps this test case is being suppressed because of something else, e.g.
> failure to combine insns when a hard-reg is involved?

It turns out that combine would like to match

(set (reg:DI 75)
     (zero_extract:DI (reg:DI 10 a0 [ rs1 ])
                      (const_int 8 [0x8])
                      (const_int 24 [0x18])))

which seems reasonable enough to add as a pattern

(define_code_iterator any_extract [sign_extract zero_extract])
(define_code_attr [(sign_extract "sra") (zero_extract "srl")])

(define_insn "*_si_high"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (any_extract:DI (match_operand:DI 1 "register_operand" "r")
                        (match_operand:DI 2 "const_int_operand" "")
                        (match_operand:DI 3 "const_int_operand" "")))]
  "TARGET_64BIT
   && INTVAL (operands[3]) > 0
   && INTVAL (operands[2]) + INTVAL (operands[3]) == 32"
  "w\t%0,%1,%3"
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")])


r~
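The condition in the proposed pattern (position greater than zero, size plus position equal to 32) can be illustrated with a C model. This is a sketch under stated assumptions: bit positions count from the least significant bit, the `zero_extract`, `sign_extract`, `srlw` and `sraw` functions are hypothetical models rather than compiler code, and the signed shifts and narrowing casts rely on the arithmetic-shift, wrapping behaviour that GCC and Clang provide.

```c
#include <stdint.h>

/* Zero-extended field of `size` bits starting at bit `pos` (0 < size < 64). */
static uint64_t zero_extract(uint64_t x, unsigned size, unsigned pos) {
    uint64_t mask = ((uint64_t)1 << size) - 1;
    return (x >> pos) & mask;
}

/* Same field, sign-extended from its top bit. */
static int64_t sign_extract(uint64_t x, unsigned size, unsigned pos) {
    uint64_t field = zero_extract(x, size, pos);
    uint64_t sign = (uint64_t)1 << (size - 1);
    return (int64_t)((field ^ sign) - sign);
}

/* Model of a 32-bit logical right shift whose result is sign-extended
   to 64 bits (0 < sh < 32). */
static int64_t srlw(uint64_t x, unsigned sh) {
    return (int64_t)(int32_t)((uint32_t)x >> sh);
}

/* Model of a 32-bit arithmetic right shift, sign-extended to 64 bits. */
static int64_t sraw(uint64_t x, unsigned sh) {
    return (int64_t)((int32_t)(uint32_t)x >> sh);
}
```

When size + pos == 32 the extracted field is exactly the top of the low word, so the extract and the 32-bit shift by pos agree. A position of zero is excluded because a 32-bit zero_extract at position 0 is a zero extension, which a sign-extending 32-bit shift by zero would not reproduce.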
Re: Redundant sign-extension instructions on RISC-V
On 09/06/2017 11:17 AM, Richard Henderson wrote:
> On 09/06/2017 09:53 AM, Jeff Law wrote:
>>> I think the easiest solution to this is for combine to notice when IOR has
>>> operands with non-zero-bits that do not overlap, convert the operation to
>>> ADD.
>>> That allows the final two insns to fold to "addw" and the compiler need do
>>> no further analysis.
>> I thought we had combine support for that. I don't think it's ever been
>> particularly good though. With your alpha background you're probably
>> more familiar with it than anyone -- IIRC it fell out of removal of low
>> order bit masking to deal with alignment issues on the Alpha.
>
> Yes, but that would have been within AND to drop unnecessary/redundant masks.
> We probably just need a few lines over in IOR to handle this case with the
> same machinery.

Right. My point was most of the infrastructure ought to be in place.

>
> Heh... Just having a browse shows that we currently perform exactly the
> opposite transformation:
>
>   /* If we are adding two things that have no bits in common, convert
>      the addition into an IOR.  This will often be further simplified,
>      for example in cases like ((a & 1) + (a & 2)), which can
>      become a & 3.  */
>
> So managing these conflicting goals might be tricky...

And I wonder how often that happens in practice.

>
>> I wrote some match.pd patterns to do it in gimple as part of a larger
>> problem. The work as a whole on that larger problem ultimately didn't
>> pan out (generated even worse code than what we have on the trunk). But
>> it might be possible to resurrect those patterns and see if they are
>> useful independently. My recollection was I was looking at low order
>> bits only, but the concepts were general enough.
>
> That would be interesting, yes.

So I just dug up the BZ. 59393. Sadly it was the other way around --
turning a PLUS into BIT_IOR like combine. So not helpful here.

The PLUS->IOR apparently was still profitable from a codesize
standpoint. But in general, I'm not sure how to select between the two
forms. I guess IOR is slightly better because we know precisely what
bits are potentially changed based on the constant.

>
>>>> $ cat rshift.c
>>>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>>>> $ cat rshift.s
>>>> rs24:
>>>>     sllw    a0,a0,24
>>>>     sext.w  a0,a0    # redundant
>>>>     ret
>>>
>>> That seems like a missing check somewhere (combine? simplify-rtx? both?) for
>>> SUBREG_PROMOTED_SIGNED_P. Since Alpha didn't have a 32-bit shift you're in
>>> new territory for this one.
>> Yea. I'd also expect zero/nonzero bits tracking in combine to catch
>> this. Shouldn't the sign bit be known to be zero after the shift which
>> makes the extension redundant regardless of the SUBREG_PROMOTED flag?
>
> You're right, this should be irrelevant. And anyway combine should be
> constrained by modes, so it should require the SIGN_EXTEND to be present just
> to make the modes match up.
>
> Perhaps this test case is being suppressed because of something else, e.g.
> failure to combine insns when a hard-reg is involved?

It could well be hard register involvement.

jeff
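The remark that IOR with a constant makes the affected bits predictable can be illustrated in C. The helper names are hypothetical; the sketch only compares which bits of `x` differ after each operation.

```c
#include <stdint.h>

/* Bits of x that differ after OR-ing in the constant c.
   These are always a subset of the bits set in c. */
static uint32_t changed_by_ior(uint32_t x, uint32_t c) {
    return x ^ (x | c);
}

/* Bits of x that differ after adding the constant c.
   Carries can flip bits well outside c. */
static uint32_t changed_by_add(uint32_t x, uint32_t c) {
    return x ^ (x + c);
}
```

For example, adding 1 to 0xff flips nine bits through the carry chain, while OR-ing 1 into 0xff changes nothing, which is the kind of precise knowledge that bit-tracking analyses can exploit.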
Re: dejagnu version update?
On 25 August 2017 at 14:55, David Edelsohn wrote:
> On Fri, Aug 25, 2017 at 9:50 AM, Rainer Orth wrote:
>> Hi H.J.,
>>
>>> On Fri, Aug 25, 2017 at 6:32 AM, David Edelsohn wrote:
>>>> On Fri, Aug 25, 2017 at 9:24 AM, H.J. Lu wrote:
>>>>> On Fri, Aug 25, 2017 at 6:01 AM, David Edelsohn wrote:
>>>>>> FYI, DejaGNU 1.6.1 is not compatible with the GCC Testsuite. The GCC
>>>>>> Testsuite uses "unsetenv" in multiple instances and that feature has
>>>>>> been removed from DejaGNU. The testsuite is going to experience
>>>>>> DejaGNU errors when Fedora or OpenSUSE upgrades to a more recent
>>>>>> DejaGNU in the 1.6 series.
>>>>>>
>>>>>
>>>>> I am running Fedora 26 with dejagnu-1.6-2.fc26. What should I
>>>>> look for?
>>>>
>>>> ERROR: (DejaGnu) proc "unsetenv GCC_EXEC_PREFIX" does not exist.
>>>> The error code is NONE
>>>> The info on the error is:
>>>> invalid command name "unsetenv"
>>>>     while executing
>>>> "::tcl_unknown unsetenv GCC_EXEC_PREFIX"
>>>>     ("uplevel" body line 1)
>>>>     invoked from within
>>>> "uplevel 1 ::tcl_unknown $args"
>>>
>>> I checked my log. I didn't see them. Which log file do they appear in?
>>
>> unsetenv was only removed after DejaGnu 1.6 was released. The change is
>> in the git repo; so far there exists no post-1.6 release.
>
> That is why I wrote 1.6.1. I didn't know if 1.6-2 was from snapshot after
> 1.6.

The -2 is just because the Fedora package got rebuilt, H.J.'s version
is 1.6, and looking at the Fedora package it's unmodified from the
upstream 1.6 release. So Fedora doesn't have the change yet, even in
rawhide.
Re: dejagnu version update?
On Wed, Sep 6, 2017 at 8:48 PM, Jonathan Wakely wrote:
> On 25 August 2017 at 14:55, David Edelsohn wrote:
>> On Fri, Aug 25, 2017 at 9:50 AM, Rainer Orth wrote:
>>> Hi H.J.,
>>>
>>>> On Fri, Aug 25, 2017 at 6:32 AM, David Edelsohn wrote:
>>>>> On Fri, Aug 25, 2017 at 9:24 AM, H.J. Lu wrote:
>>>>>> On Fri, Aug 25, 2017 at 6:01 AM, David Edelsohn wrote:
>>>>>>> FYI, DejaGNU 1.6.1 is not compatible with the GCC Testsuite. The GCC
>>>>>>> Testsuite uses "unsetenv" in multiple instances and that feature has
>>>>>>> been removed from DejaGNU. The testsuite is going to experience
>>>>>>> DejaGNU errors when Fedora or OpenSUSE upgrades to a more recent
>>>>>>> DejaGNU in the 1.6 series.
>>>>>>>
>>>>>>
>>>>>> I am running Fedora 26 with dejagnu-1.6-2.fc26. What should I
>>>>>> look for?
>>>>>
>>>>> ERROR: (DejaGnu) proc "unsetenv GCC_EXEC_PREFIX" does not exist.
>>>>> The error code is NONE
>>>>> The info on the error is:
>>>>> invalid command name "unsetenv"
>>>>>     while executing
>>>>> "::tcl_unknown unsetenv GCC_EXEC_PREFIX"
>>>>>     ("uplevel" body line 1)
>>>>>     invoked from within
>>>>> "uplevel 1 ::tcl_unknown $args"
>>>>
>>>> I checked my log. I didn't see them. Which log file do they appear in?
>>>
>>> unsetenv was only removed after DejaGnu 1.6 was released. The change is
>>> in the git repo; so far there exists no post-1.6 release.
>>
>> That is why I wrote 1.6.1. I didn't know if 1.6-2 was from snapshot after
>> 1.6.
>
> The -2 is just because the Fedora package got rebuilt, H.J.'s version
> is 1.6, and looking at the Fedora package it's unmodified from the
> upstream 1.6 release. So Fedora doesn't have the change yet, even in
> rawhide.

I reported the impact on the DejaGnu bugs mailing list. Ben reverted
the change. dejagnu-git now provides the unsetenv proc.

- David
gcc-6-20170906 is now available
Snapshot gcc-6-20170906 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20170906/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch revision 251821

You'll find:

 gcc-6-20170906.tar.xz                Complete GCC

  SHA256=dd3f78d357a3dd88b6a6cd338fd03b844ada4e8e732257ad4135d153cac37585
  SHA1=ce5604dd3ba880ee71928657d54dd75898e14fa1

Diffs from 6-20170830 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
Re: [llvm-dev] DragonEgg for GCC v8.x and LLVM v6.x is just able to work
> On Sep 4, 2017, at 8:13 PM, Leslie Zhai via llvm-dev wrote:
>
> Hi LLVM and GCC developers,
>
> LLVM China http://www.llvm.org.cn forked DragonEgg
> https://github.com/LLVM-China/dragonegg because:
>
> * Some subprojects are impractical or uninteresting to relicense (e.g.
>   llvm-gcc and dragonegg). These will be split off from the LLVM project (e.g.
>   to separate Github projects), allowing interested people to continue their
>   development elsewhere.
>   http://lists.llvm.org/pipermail/llvm-dev/2017-August/116266.html
>
> * There are a lot of issues https://github.com/xiangzhai/dragonegg/issues so
>   I need smarter developers' help.

Hi Leslie,

Out of curiosity, what is motivating this work? What is the use case for
dragonegg these days, now that Clang has great C++ support? Are you interested
in Ada + LLVM or some other frontend?

-Chris
Power 8 in-core crypto not working as expected
Hi Everyone,

I'm on gcc rather than gcc-help because we need to talk with some GCC
devs who can help take this further.

I have an implementation of AES on Power 8 using GCC's built-ins. It's
available for inspection and download at
https://github.com/noloader/AES-Power8. The problem is that it does not
arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
endian).

The source file is the reduced, minimal test case. It uses
pre-calculated subkeys, so we've removed that variable from the
equation. It also uses the null vector (string of 0's) as the message,
so that variable has been removed from the equation too.

About all we are left with is loading a subkey, calling vcipher to
perform a single round of encryption, and assigning the result back to
a variable. Lather, rinse, repeat.

For the crypto side of things I've consulted with Andy Polyakov of the
OpenSSL project. I believe we are doing everything we should be as far
as the crypto goes, including the subkey byte-swaps on LE machines.
Our subkey table is exactly the same as the one OpenSSL arrives at on
both LE and BE machines.

Would someone familiar with the processor and with knowledge of GCC
built-ins please take a look at things? Suggestions for our next
steps would be greatly appreciated.

Thanks in advance,

Jeffrey Walton

==

Here are the compiler versions.

- GCC112 (Linux, little endian)

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

- GCC119 (AIX, big endian):

$ gcc --version
gcc (GCC) 6.1.0
Re: [llvm-dev] DragonEgg for GCC v8.x and LLVM v6.x is just able to work
Dear Chris,

Thanks for your kind response!

The motivation for this work:

1. Clang cannot build Linux https://bugs.llvm.org/show_bug.cgi?id=22830 and
   the LLVMLinux patches appear to be no longer maintained?
   http://llvm.linuxfoundation.org/index.php/Main_Page

2. The Clang analyzer frontend cannot statically analyze glibc or Linux
   https://bugs.llvm.org/show_bug.cgi?id=31017 but the analyzer checkers are
   able to find bugs in KDE :)
   http://www.leetcode.cn/2016/11/analyzing-code-for-kde-qt-open-source-components.html

3. For learning about GCC plugins, GIMPLE, LLVM IR, GCC passes, LLVM passes,
   CodeGen, etc. from llvm-gcc and dragonegg. Thanks for your great job!

The use cases for dragonegg:

1. mips64-linux-gnu-gcc (WIP), arm-linux-gnu-gcc and x86 GCC Frontend ->
   GIMPLE -> LLVM IR -> Assembly or for KLEE

2. GCC Frontend -> GIMPLE -> Clang AST (WIP)

I am interested in other frontends, such as flang-clang
https://github.com/flang-compiler/clang/pull/28 and I am glad to learn from
other developers :) https://gcc.gnu.org/ml/gcc/2017-09/msg00022.html


On 2017-09-07 12:10, Chris Lattner wrote:
>> On Sep 4, 2017, at 8:13 PM, Leslie Zhai via llvm-dev wrote:
>>
>> Hi LLVM and GCC developers,
>>
>> LLVM China http://www.llvm.org.cn forked DragonEgg
>> https://github.com/LLVM-China/dragonegg because:
>>
>> * Some subprojects are impractical or uninteresting to relicense (e.g.
>>   llvm-gcc and dragonegg). These will be split off from the LLVM project
>>   (e.g. to separate Github projects), allowing interested people to continue
>>   their development elsewhere.
>>   http://lists.llvm.org/pipermail/llvm-dev/2017-August/116266.html
>>
>> * There are a lot of issues https://github.com/xiangzhai/dragonegg/issues so
>>   I need smarter developers' help.
>
> Hi Leslie,
>
> Out of curiosity, what is motivating this work? What is the usecase for
> dragonegg these days, now that Clang has great C++ support? Are you
> interested in Ada + LLVM or some other frontend?
>
> -Chris

--
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
Re: Power 8 in-core crypto not working as expected
On Wed, Sep 6, 2017 at 11:37 PM, Jeffrey Walton wrote:
> Hi Everyone,
>
> I'm on gcc rather than gcc-help because we need to talk with some GCC
> devs who can help take this further.
>
> I have an implementation of AES on Power 8 using GCC's built-ins. It's
> available for inspection and download at
> https://github.com/noloader/AES-Power8. The problem is, it does not
> arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
> endian).
>
> The source file is the reduced, minimal test case. It uses
> pre-calculated subkeys so we've removed that variable from the
> equation. It also uses the null vector (string of 0's) as the message,
> so that variable has been removed from the equation too.
>
> About all we are left with is loading a subkey, calling vcipher to
> perform a single round of encryption, and assigning the result back to
> a variable. Lather, rinse, repeat.
>
> For the crypto side of things I've consulted with Andy Polyakov of the
> OpenSSL project. I believe we are doing everything we should be as far
> as the crypto goes, including the subkey byte-swaps on LE machines.
> Our subkey table is exactly the same as the one OpenSSL arrives at on
> both LE and BE machines.
>
> Would someone familiar with the processor and knowledge of GCC
> built-ins please take a look at things. Suggestions for our next
> steps would be greatly appreciated.
>

Have you inspected the generated assembly listing and machine
instructions to be sure that they are correct?

You can refer to the source for vmx-crypto
(https://github.com/torvalds/linux/tree/master/drivers/crypto/vmx) in
addition to that of OpenSSL. Are you trying to do a cleanroom
implementation of this software?

Full disclosure: despite my interest in the architecture I have not
been able to get access to a POWER8 machine. A server costs about as
much as a new car. Any account reseller recommendations or any other
options you can think of? If you don't mind responding, feel free to do
it privately so it doesn't clutter this thread.

Cheers,
     R0b0t1.
Re: Power 8 in-core crypto not working as expected
On Thu, Sep 7, 2017 at 1:39 AM, R0b0t1 wrote:
> On Wed, Sep 6, 2017 at 11:37 PM, Jeffrey Walton wrote:
>> Hi Everyone,
>>
>> I'm on gcc rather than gcc-help because we need to talk with some GCC
>> devs who can help take this further.
>>
>> I have an implementation of AES on Power 8 using GCC's built-ins. It's
>> available for inspection and download at
>> https://github.com/noloader/AES-Power8. The problem is, it does not
>> arrive at the correct results on GCC112 (ppc64-le) or GCC119 (AIX, big
>> endian).
>>
>> The source file is the reduced, minimal test case. It uses
>> pre-calculated subkeys so we've removed that variable from the
>> equation. It also uses the null vector (string of 0's) as the message,
>> so that variable has been removed from the equation too.
>>
>> About all we are left with is loading a subkey, calling vcipher to
>> perform a single round of encryption, and assigning the result back to
>> a variable. Lather, rinse, repeat.
>>
>> For the crypto side of things I've consulted with Andy Polyakov of the
>> OpenSSL project. I believe we are doing everything we should be as far
>> as the crypto goes, including the subkey byte-swaps on LE machines.
>> Our subkey table is exactly the same as the one OpenSSL arrives at on
>> both LE and BE machines.
>>
>> Would someone familiar with the processor and knowledge of GCC
>> built-ins please take a look at things. Suggestions for our next
>> steps would be greatly appreciated.
>>
>
> Have you inspected the generated assembly listing and machine
> instructions to be sure that they are correct?

Unfortunately, I don't read PPC asm. It could be dead wrong and I
could not spot it.

> You can refer to the source for vmx-crypto
> (https://github.com/torvalds/linux/tree/master/drivers/crypto/vmx) in
> addition to that of OpenSSL. Are you trying to do a cleanroom
> implementation of this software?

Yeah, Andy's code is used for both OpenSSL and the Linux kernel.

I've spent the last two days trying to connect the dots between our
code and Andy's code. I've also been talking with Andy offline.

I'm pretty sure it is mostly apples and oranges. Andy's code is highly
optimized, hand-tuned assembly. It just does not line up well with
C/C++-based code.

I'll hit your other point privately.

Jeff