Hi all,

On Mon, Apr 20, 2020 at 09:49:11PM +0100, Ian Collier wrote:
> As Arnt has implied, the current method of generating the Message-ID
> does not *guarantee* uniqueness; merely makes it highly improbable to
> be non-unique. The thing is that we are not just concerned with other
> instances of mutt on this host, but with other instances of mutt that
> have the same "hostname" setting. Derek has already mentioned that
> some people set this to their mail domain rather than the name of the
> specific host it's running on. In my domain we have a couple of hundred
> hosts so configured in their /etc/Muttrc.local. Given 15-bit PIDs, the
> birthday paradox tells us that with one instance of mutt started up on
> each machine there's a ~45% chance that two (or more) of them have the
> same PID - and the probability is higher if people habitually start mutt
> soon after booting and logging in. [Of course in practice the actual
> number of machines here with a running instance of mutt is roughly 1].
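
(As a sanity check on the ~45% figure, here is a minimal Python sketch of the birthday calculation. It assumes "a couple of hundred" means 200 hosts and that PIDs are drawn uniformly and independently from a 15-bit space; neither assumption holds exactly in practice, and the function name is only illustrative.)

```python
def collision_probability(instances: int, id_space: int) -> float:
    """Probability that at least two of `instances` uniform draws
    from `id_space` possible values are equal (birthday problem)."""
    p_all_distinct = 1.0
    for i in range(instances):
        # The (i+1)-th draw must avoid the i values already taken.
        p_all_distinct *= (id_space - i) / id_space
    return 1.0 - p_all_distinct


# Assumed inputs: 200 hosts, 15-bit PIDs (2**15 = 32768 possible values).
p = collision_probability(200, 2**15)
print(f"collision probability: {p:.2f}")
```

Under these assumptions the result lands close to the quoted ~45%.
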

Agreed. I was aware of this, but it was outside my focus because I
haven't seen complaints about it in the last decades. But when thinking
about better solutions, this can and should be addressed.

> But what if your RNG isn't good enough? I understand (but may stand
> to be corrected because I am not an expert in cryptography) that a
> cryptographic hash function such as sha1 (which we have in mutt) can be
> used to make random numbers. The size of the seed would be 160 bits,
> and if we use just 64 bits of the seed to produce the random number,
> that leaves enough hidden information that the sequence cannot be
> predicted. What to use as the initial seed? How about 128 bits
> from your imperfect RNG plus the least 16 bits of the current time
> (in microseconds if available) plus the least 16 bits of the PID
> (doesn't really matter if the top bit of that is always zero). This
> guarantees that two mutt instances started at different times (or at
> the same time on a single machine) will get different seeds even if
> the RNG is rubbish enough to duplicate itself across instances.

It seems that what is described here is called a "cryptographically
secure pseudorandom number generator" (CSPRNG, sometimes shortened to
CPRNG or CRNG), and that standards for it exist (which shouldn't be
trusted unconditionally after the incident with a US agency).

If we conclude that the risk of duplicates is big enough that we don't
trust the RNG from libc or the system (or assume that a significant
portion of systems have really weak implementations), I suggest handing
this over to the cryptographic library, which needs such functions for
its own operation anyway and seems to export them. As libssl and
libgnutls are only optional (but common) dependencies, falling back to
the RNG from the system should remain an alternative, but a warning at
configure time would be appropriate.

> Alternatively, would the sha1 of the current message body and envelope
> plus the current time in milliseconds be random and/or unique enough?
> I guess so.
> It depends on whether you want to go to the trouble of
> hashing the whole message.

IMO we should focus on the quality of the RNG. I see using the message
as a seed, and the PID as a source of entropy, as marginal questions.
It may be a waste of CPU cycles to hash a long email just to generate a
Message-ID.

> One thing, though: use base36, not base64 - as recommended in [0].
> Base64 only saves 4 characters and you don't necessarily need to put all
> 160 bits of the sha1 into the Message-ID.

Also agreed. As the standard says, if there is software treating the
Message-ID as case-insensitive, this shouldn't be exploited.

Kind regards,
Gero
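
P.S. To make the combination of "random bits from a trusted RNG" and "base36" concrete, here is a minimal sketch in Python (not mutt's C code, and not a proposed patch). The function names `to_base36` and `make_message_id`, the 64-bit default and the `<token@hostname>` layout are assumptions for illustration only; `secrets.randbits` draws from the operating system's CSPRNG.

```python
import secrets

# Lowercase digits and letters only, so the result is case-insensitive safe.
BASE36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer in lowercase base36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(BASE36_ALPHABET[rem])
    return "".join(reversed(digits))


def make_message_id(hostname: str, nbits: int = 64) -> str:
    """Build a Message-ID whose left-hand side is `nbits` random bits
    from the OS CSPRNG, encoded in base36 (hypothetical layout)."""
    token = to_base36(secrets.randbits(nbits))
    return f"<{token}@{hostname}>"


print(make_message_id("example.org"))
```

With 64 bits the left-hand side is at most 13 characters long; a real implementation might pad it to a fixed width or add a timestamp, which this sketch leaves out.
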