On Mon, Apr 20, 2020 at 05:57:00PM +0200, Vincent Lefevre wrote:
> On 2020-04-19 16:34:57 +0200, Gero Treuner wrote:
> > For the small purpose of avoiding collisions within a time frame of 1s
> > a couple of extra bytes are comparatively high cost IMO.
> >
> > But you are right, and the timestamp could also be base64-ified to
> > compensate.
>
> But why not using a cryptographic hash for the full local part, then?
> This could be based on the full message (including the generated
> headers at this point) + some random number. If there is a collision,
> this would mean that the messages are the same, so that I don't think
> this is an issue in practice... Or that cryptographic protocols are
> broken, which would be a much more important issue than Message-Id
> collisions.
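A minimal sketch of the quoted proposal (hash the full message plus a random number and use only the digest as the local part). It assumes SHA-256 as the hash and unpadded base64url as the encoding. Neither choice comes from the thread, the function names are hypothetical, and this is not Mutt's actual Message-Id code:

```python
import base64
import hashlib
import os


def message_id_local_part(message: bytes, nonce_len: int = 16) -> str:
    """Hypothetical helper: digest of (full message + random nonce)."""
    # The random number mixed into the hash input; without it, two
    # identical messages would always get the same ID.
    nonce = os.urandom(nonce_len)
    digest = hashlib.sha256(message + nonce).digest()
    # Unpadded base64url gives a compact 43-character string for a
    # 32-byte digest; '-' and '_' are legal atext characters in RFC 5322.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_message_id(message: bytes, domain: str) -> str:
    """Hypothetical helper: the hash output alone forms the part before '@'."""
    return f"<{message_id_local_part(message)}@{domain}>"


if __name__ == "__main__":
    msg = b"From: user@example.org\r\nSubject: test\r\n\r\nhello\r\n"
    print(make_message_id(msg, "example.org"))
```

As the reply below points out, anything an observer can read in the same email adds nothing to the secrecy of such an input. Only the random number (and whatever else is kept private) makes brute-forcing the other inputs hard.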
Yes, for space considerations, when using hashes it is best to use only the hash output for the part before "@".

With hash algorithms the concern is not collisions. I assume all established algorithms (even older ones) are acceptable for this purpose. The concern is that inputs based on local and/or private information can leak. To prevent this, the search space must be big enough.

For hiding our pid etc., any data that can be found in the same email, or maybe in related emails, is of no use as hash input, because it can easily be inserted as constants in brute-force searches. Only the random number remains secret, besides the data itself. We need data that is unrelated to the email but, to be deterministic with regard to other Mutt instances, is the same for all Mutt instances on the same machine (even if built from different sources; every Mutt developer has a separate "head" version, right? ;-)

In my example in another branch of this thread I proposed using inode contents of central files as a source of this kind of data. (Note: that was about random numbers, but "deterministic" seeding led to a similar question.) If we fail and leak this, the risk is limited, as it is easier to access files by path than by inode number (provided we choose files whose timestamps cannot be matched against system software distributions, so that we do not leak another kind of information).

But maybe there are better ideas for where to find irrelevant but constant system data that we can include in hashes (so that, as currently, exactly the pid and a sequence number remain to safely distinguish Mutt instances).

Kind regards,
Gero