> Just checking to see if I understand this optimization correctly. In order to > find merkle roots in which the rightmost 32 bits are identical (i.e. partial > hash collisions), we want to compute as many merkle root hashes as quickly as > possible. The fastest way to do this is to take the top level of the Merkle > tree, and to collect a set of left branches and right branches which can be > independently manipulated. While the left branch can easily be manipulated by > changing the extranonce in the coinbase transaction, the right branch would > need to be modified by changing one of the transactions in the right branch > or by changing the number of transactions in the right branch. Correct so far?
Envisioning it in my head and trying to read the white paper, it sounds like the process for a non-stratum mining farm would be this: On primary server with sufficient memory, calculate ~4k-6k valid left-side merkle tree roots and ~4k-6k right-side merkle tree roots. Then try hashing every left-side option with every right-side option. I'm not sure if modern asic chips are sufficiently generic that they can also sha256-double-hash those combinations, but it seems logical to assume that the permutations of those hashes could be computed on an asic, perhaps via additional hardware installed on the server. Hashing these is easier if there are fewer steps, i.e., fewer transactions. Out of this will come N(2-16 at most, higher not needed) colliding merkle roots where the last 4 bytes are identical. Those N different merkle combinations are what can be used on the actual mining devices, and those are all that needs to be sent for the optimization to work. On the actual mining device, what is done is to take the identical (collision) right 4 bytes of the merkle root and hash it with one nonce value. Since you have N(assume 8) inputs that all work with the same value, calculating this single hash of once nonce is equivalent to calculating 8 nonce hashes during the normal process, and this step is 1/4th of the normal hashing process. This hash(or mid-value?) is then sent to 8 different cores which complete the remaining 3 hash steps with each given collision value. Then you increment the nonce once and start over. This works out to a savings of (assuming compressor and expander steps of SHA2 require computationally the same amount of time) 25% * (7 / 8) where N=8. Greg, or someone else, can you confirm that this is the right understanding of the approach? > I have not seen or heard of any hardware available that can run more > efficiently using getblocktemplate. As above, it doesn't require such a massive change. They just need to retrieve N different sets of work from the central server instead of 1 set of work. The central server itself might need substantial bandwidth if it farmed out the merkle-root hashing computational space to miners. Greg, is that what you're assuming they are doing? Now that I think about it, even that situation could be improved. Suppose you have N miners who can do either a merkle-tree combinatoric double-sha or a block-nonce double-sha. The central server calculates the left and right merkle treeset to be combined and also assigns each miner each a unique workspace within those combinatorics. The miners compute each hash in their workspace and shard the results within themselves according to the last 16 bits. Each miner then needs only the memory for 1/Nth of the workspace, and can report back to the central server only the highest number of collisions it has found until the central server is satisfied and returns the miners to normal (collided) mining. Seems quite workable in a large mining farm to me, and would allow the collisions to be found very, very quickly. That said, it strikes me that there may be some statistical method by which we can isolate which pools seem to have used this approach against the background noise of other pools. Hmm... Jared On Wed, Apr 5, 2017 at 7:10 PM, Jonathan Toomim via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote: > Just checking to see if I understand this optimization correctly. In order to > find merkle roots in which the rightmost 32 bits are identical (i.e. partial > hash collisions), we want to compute as many merkle root hashes as quickly as > possible. The fastest way to do this is to take the top level of the Merkle > tree, and to collect a set of left branches and right branches which can be > independently manipulated. While the left branch can easily be manipulated by > changing the extranonce in the coinbase transaction, the right branch would > need to be modified by changing one of the transactions in the right branch > or by changing the number of transactions in the right branch. Correct so far? > > With the stratum mining protocol, the server (the pool) includes enough > information for the coinbase transaction to be modified by stratum client > (the miner), but it does not include any information about the right side of > the merkle tree except for the top-level hash. Stratum also does not allow > the client to supply any modifications to the merkle tree (including the > right side) back to the stratum server. This means that any implementation of > this final optimization would need to be using a protocol other than stratum, > like getblocktemplate, correct? > > I think it would be helpful for the discussion to know if this optimization > were currently being used or not, and if so, how widely. > > All of the consumer-grade hardware that I have seen defaults to stratum-only > operation, and I have not seen or heard of any hardware available that can > run more efficiently using getblocktemplate. As the current pool > infrastructure uses stratum exclusively, this optimization would require > significant retooling among pools, and probably a redesign of their core > algorithms to help discover and share these partial collisions more > frequently. It's possible that some large private farms have deployed a > special system for solo mining that uses this optimization, of course, but > it's also possible that there's a teapot in space somewhere between the orbit > of Earth and Mars. > > Do you know of any ways to perform this optimization via stratum? If not, do > you have any evidence that this optimization is actually being used by > private solo mining farms? Or is this discussion purely about preventing this > optimization from being used in the future? > > -jtoomim > >> On Apr 5, 2017, at 2:37 PM, Gregory Maxwell via bitcoin-dev >> <bitcoin-dev@lists.linuxfoundation.org> wrote: >> >> An obvious way to generate different candidates is to grind the >> coinbase extra-nonce but for non-empty blocks each attempt will >> require 13 or so additional sha2 runs which is very inefficient. >> >> This inefficiency can be avoided by computing a sqrt number of >> candidates of the left side of the hash tree (e.g. using extra >> nonce grinding) then an additional sqrt number of candidates of >> the right side of the tree using transaction permutation or >> substitution of a small number of transactions. All combinations >> of the left and right side are then combined with only a single >> hashing operation virtually eliminating all tree related >> overhead. >> >> With this final optimization finding a 4-way collision with a >> moderate amount of memory requires ~2^24 hashing operations >> instead of the >2^28 operations that would be require for >> extra-nonce grinding which would substantially erode the >> benefit of the attack. > > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > _______________________________________________ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev