Natanael,
=== Metal Layers ===
One factor in chip cost, besides transistor count, is the number of metal layers
required to route all the interconnects within the desired die area constraint.
Needing fewer layers can reduce costs, for example less dependence on patented
layering technology, and fewer layers are quicker and easier to manufacture.
I'm not an expert in the field, and I can't vouch for the validity of the
entirety of the paper, but this paper discusses various factors that impact
chip design cost:
http://www.cse.psu.edu/~juz138/files/3d-cost-tcad10.pdf
=== Early nonce mixing, Variable Length Input with Near Constant Work ===
To minimize ASICBoost-like optimizations, the entirety of the input should be
mixed with the nonce data as soon as possible. For example, with Bitcoin as it
is now, the 80-byte block header doesn't fully fit in one 64-byte SHA256 input
block. This results in a second SHA256 input block that contains only 4 bytes
of nonce and otherwise constant data, all of which gets mixed much later than
the rest of the input... which allows for unexpected optimizations.
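To make the split concrete, here is a small sketch (plain Python, field offsets
taken from the standard 80-byte header layout) showing which 64-byte SHA256
message block each header field lands in; the 64-byte block size is fixed by
SHA256 itself:

    # Sketch: how an 80-byte Bitcoin block header maps onto SHA-256 message blocks.
    HEADER = [
        ("version",      0,  4),
        ("prev_block",   4, 32),
        ("merkle_root", 36, 32),
        ("timestamp",   68,  4),
        ("bits",        72,  4),
        ("nonce",       76,  4),
    ]

    BLOCK_SIZE = 64  # SHA-256 compresses the message 64 bytes at a time

    for name, offset, length in HEADER:
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        blocks = str(first + 1) if first == last else f"{first + 1} and {last + 1}"
        print(f"{name:12s} bytes {offset:2d}-{offset + length - 1:2d} -> message block(s) {blocks}")

    # The output shows the nonce (bytes 76-79) lives entirely in the second
    # 64-byte block, alongside mostly constant data and the SHA-256 padding;
    # only that second compression call depends on the nonce.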
Solution: a hash algorithm whose computation time scales more linearly with
input size could be built as a 2 stage algorithm (a rough sketch follows the
list):
1. 1st stage: a Merkle tree hash that lossily mixes and compresses the variable
length input stream down to the size of the 2nd stage state vector. Each bit of
input should have roughly equal influence on each of the output bits (minimize
information loss, maximize mixed-ness).
2. 2nd stage: multi-round mixing of that fixed-size state, where this stage
does significantly more work than the 1st stage.
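A minimal sketch of such a two-stage construction, using SHA256 as a stand-in
mixing primitive (the state size, leaf size, and round count are illustrative
assumptions on my part, not a concrete proposal):

    import hashlib

    STATE_SIZE = 32  # bytes; size of the 2nd-stage state vector (assumption)
    LEAF_SIZE = 64   # bytes of input consumed per Merkle leaf (assumption)

    def stage1_merkle_compress(data: bytes) -> bytes:
        """Stage 1: Merkle-tree compression of a variable-length input down to
        STATE_SIZE bytes, so every input bit influences the root."""
        leaves = [hashlib.sha256(data[i:i + LEAF_SIZE]).digest()
                  for i in range(0, max(len(data), 1), LEAF_SIZE)]
        while len(leaves) > 1:
            if len(leaves) % 2:          # duplicate the last node on odd levels
                leaves.append(leaves[-1])
            leaves = [hashlib.sha256(leaves[i] + leaves[i + 1]).digest()
                      for i in range(0, len(leaves), 2)]
        return leaves[0]

    def stage2_multiround_mix(state: bytes, rounds: int = 64) -> bytes:
        """Stage 2: repeated mixing of the fixed-size state; deliberately much
        more work than stage 1 so it dominates total cost."""
        for _ in range(rounds):
            state = hashlib.sha256(state).digest()
        return state

    def two_stage_pow_hash(header: bytes) -> bytes:
        return stage2_multiround_mix(stage1_merkle_compress(header))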
This is somewhat done already in Bitcoin, since the PoW runs SHA256 twice in
series. The first pass is pretty much the Merkle tree hash (a node with two
children), and the second pass is the multi-round mixing. If the Bitcoin PoW
ran SHA256 three or four times or more, ASICBoost-like optimizations would have
less of an effect.
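For reference, Bitcoin's current PoW hash corresponds to two iterations of the
loop below; the generalization to higher iteration counts is just the
suggestion above, not anything deployed:

    import hashlib

    def iterated_sha256(header: bytes, iterations: int = 2) -> bytes:
        """iterations == 2 is Bitcoin's SHA256d PoW hash; a higher count would
        leave less room for precomputation over the mostly constant second
        message block (the point being argued here, not an established fact)."""
        digest = header
        for _ in range(iterations):
            digest = hashlib.sha256(digest).digest()
        return digest

    # Bitcoin today:
    # pow_hash = iterated_sha256(serialized_80_byte_header, iterations=2)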
In actual hardware, assuming a particular input length in the design can result
in a significantly more optimized implementation than creating hardware that
can handle a variable length input. So your design goal of "not linear in
performance relative to input size" seems to me a hard one to attain... in
practice, supporting very large input sizes in a constant work fashion requires
a trade-off between memory/parallelization and die space. I think it would be
better to make an assumption about the block header size, such as that it is
exactly 80 bytes, or at least something reasonable like requiring the hardware
to support a block header size <= 128 bytes.
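One way to make that assumption explicit is a hypothetical framing helper that
pads any header up to a fixed 128-byte hardware input; the 2-byte length prefix
and zero padding here are illustrative choices of mine, not part of any spec:

    MAX_HEADER_SIZE = 128  # bytes; the assumed hardware limit from above

    def frame_header_for_fixed_width_hw(header: bytes) -> bytes:
        """Pad a variable-length header to a single fixed-width input so the
        hashing hardware never has to handle variable-length messages."""
        if len(header) > MAX_HEADER_SIZE - 2:
            raise ValueError("header exceeds the assumed hardware limit")
        return len(header).to_bytes(2, "little") + header.ljust(MAX_HEADER_SIZE - 2, b"\x00")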
Cheers,
Praxeology Guy