The entropy argument is as follows: there is a rule of thumb which says that it is safer for the plaintext to have low redundancy (see https://en.wikipedia.org/wiki/Redundancy_(information_theory)), i.e. it is better to encrypt random or compressed data than natural language. The rule is based on Shannon's information theory rather than on a concrete attack, which means that breaking the rule usually doesn't introduce a vulnerability by itself (there is no known generic attack). The rule is an application of a precautionary principle.
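
As a rough illustration of what "redundancy" means here, it can be estimated from byte frequencies as 1 - H/8, where H is the empirical entropy in bits per byte. The following minimal Python sketch (the sample text, names and sizes are merely illustrative) compares English text with random bytes; compressed data behaves much like the random case.

    import math, os

    def redundancy(data: bytes) -> float:
        # Zeroth-order estimate: relative redundancy = 1 - H(bytes) / 8,
        # where H is the empirical byte entropy in bits per byte.
        counts = [0] * 256
        for byte in data:
            counts[byte] += 1
        total = len(data)
        entropy = -sum(c / total * math.log2(c / total) for c in counts if c)
        return 1.0 - entropy / 8.0

    english = (b"it is better to encrypt random or compressed data "
               b"than natural language ") * 50
    print("english text:", round(redundancy(english), 2))                   # well above zero
    print("random bytes:", round(redundancy(os.urandom(len(english))), 2))  # close to zero

This zeroth-order estimate ignores correlations between symbols, so it understates the true redundancy of natural language, but it is enough to show the contrast.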
Nevertheless, here are some examples of cryptographic attacks which may be considered a consequence of breaking the rule:

* Related Message Attack by Coppersmith, Franklin, Patarin and Reiter (https://pdfs.semanticscholar.org/899a/4fdc048102471875e24f7fecb3fb8998d754.pdf) - given the RSA ciphertexts of two plaintexts x and a*x + b, where a and b are known, it is possible to efficiently compute x, provided the public exponent is three (a toy demonstration is sketched at the end of this message). From the information-theoretic point of view the second message is redundant, because it is fully determined by the first one, which means that the relative redundancy of the pair of messages is at least one half.

* Stereotyped Messages by Coppersmith (https://www.di.ens.fr/~fouque/ens-rennes/coppersmith.pdf, section 7) - given an RSA ciphertext of x and a (1-1/e) fraction of the plaintext (where e is the public exponent), it is possible to efficiently compute x. The message is highly redundant, because only a 1/e fraction of it is unknown, so its relative redundancy is at least (1-1/e).

A few notes:

* Nowadays there exist more sophisticated variants of the attacks mentioned above which have weaker premises.

* There is a considerable similarity between RSA and SSS: both schemes are algebraic (rather than based on boolean functions).

* CRCs (and error-correcting codes in general) introduce redundancy into the message. Moreover, the redundancy is induced by a linear relationship within the message (compare this with the premise of the Related Message Attack).

* The Related Message Attack would not be possible if the two plaintexts were x and hash(x): the relationship between the messages has to be (algebraically) simple. From the information-theoretic point of view the situation is the same, but from the practical point of view it is completely different.

To sum up, there is a precautionary principle which tells us not to increase the redundancy of a message unless the redundancy is introduced in a complicated way (for example by a hash function). That is why we use SHA rather than CRC. One more reason why we stick to the principle is that there is no randomisation in our scheme (such as padding or an initialisation vector). We do understand the advantages of error-correcting codes over hash functions (minimum codeword distance, performance) and we considered them thoroughly.

Ondřej Vejpustek
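
Here is the toy demonstration of the Related Message Attack referred to above, in its simplest setting (textbook RSA with public exponent e = 3 and no padding); the primes, the message and the coefficients below are arbitrary illustrative values. It uses the closed-form identity x = b*(c2 + 2*c1 - b^3) / (c2 - c1 + 2*b^3) mod n, which follows from expanding (x + b)^3; a general relation a*x + b is first reduced to a plain shift by multiplying c2 with a^(-3).

    def related_message_attack(n, c1, c2, a, b):
        # Recover x from c1 = x^3 mod n and c2 = (a*x + b)^3 mod n, a and b known.
        a_inv = pow(a, -1, n)                # modular inverse (Python 3.8+)
        c2 = (c2 * pow(a_inv, 3, n)) % n     # reduce a*x + b to the shift x + b/a
        b = (b * a_inv) % n
        num = b * (c2 + 2 * c1 - pow(b, 3, n)) % n
        den = (c2 - c1 + 2 * pow(b, 3, n)) % n
        return num * pow(den, -1, n) % n

    # Toy parameters, far too small to be secure, chosen so that e = 3 is a valid
    # RSA exponent (3 does not divide (p-1)*(q-1)).
    p, q, e = 104729, 999983, 3
    n = p * q
    x, a, b = 123456, 5, 42                  # secret message and known linear relation
    c1 = pow(x, e, n)                        # ciphertext of x
    c2 = pow(a * x + b, e, n)                # ciphertext of a*x + b
    assert related_message_attack(n, c1, c2, a, b) == x

The attacker never touches the private key: two ciphertexts and the known linear relation are enough, which is exactly the kind of redundancy the rule above warns against.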