On 2023-06-25 at 10:59:53 UTC-0400 (Sun, 25 Jun 2023 16:59:53 +0200)
Robert M. Münch <mailmate@lists.freron.com>
is rumored to have said:
[...]
One serious issue with indexing email is that email is highly divergent in data structure, and while you can do a simple index for basic standard mail metadata, "full text" and "all headers" search for mail is a nightmare because real-world mail breaks almost every rule theoretically governing it and it is not a simple matter to determine what is or is not body text. Email typically arrives with multiple alternative parts theoretically representing the same message, possibly QP or B64 encoded and usually including one version with HTML markup. And that markup can be bad, wrong, or even intentionally malicious.
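To make the "what is body text" problem concrete, here is a minimal sketch using Python's standard `email` package. It is not how MailMate does it. It walks a multipart message, skips containers and attachments, and lets `get_content()` undo the QP/Base64 transfer encoding and apply the declared charset. The function name `text_parts` and the "keep the first part of each subtype" rule are simplifying assumptions for illustration only.

```python
from email import policy
from email.parser import BytesParser


def text_parts(raw: bytes) -> dict[str, str]:
    """Collect decoded text of inline text/* parts, keyed by subtype ("plain", "html")."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts: dict[str, str] = {}
    for part in msg.walk():
        # multipart/* containers and non-text leaves carry no body text themselves.
        if part.get_content_maintype() != "text":
            continue
        # Skip text files sent as attachments rather than as the message body.
        if part.get_content_disposition() == "attachment":
            continue
        try:
            # Reverses QP/Base64 and decodes using the part's declared charset.
            text = part.get_content()
        except LookupError:
            # Unknown charset label (common in real-world mail): lossy UTF-8 fallback.
            text = part.get_payload(decode=True).decode("utf-8", errors="replace")
        # Simplifying assumption: the first part of each subtype is "the" body.
        parts.setdefault(part.get_content_subtype(), text)
    return parts
```

Even in this tidy case the two alternatives are different strings (one carries HTML markup), so an indexer still has to decide which one to tokenize, and the markup can be malformed or hostile.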

Well, MM already handles all this; otherwise we couldn't use it the way we do. Those parts are well known to MM.

I've had a bug open for quite a while regarding an MM parsing problem with plain-text messages generated by automated tools.

I don't know what the root cause of that is, but I am certain that Benny does not have all the arcana handled.

Very large mail stores are inherently tough to search.

Once all the mail mess has been pre-processed, I don't think so. Searching in Gmail is fast. MM is already much better than other clients.

I haven't looked in a long while but last I checked, GMail could not search on arbitrary headers. Have they fixed that?

That's a huge part of the scaling problem. There are not a lot of people who really use that feature, but we do value it highly. The extremely long tail of headers and full-text tokens that only appear in a small number of messages makes mail particularly hard to search efficiently if you include all the garbage spam full of 'hashbusters' and such.
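A toy in-memory inverted index shows why "all headers" search and the long tail go together. This is a sketch under stated assumptions, not how MailMate or Gmail index mail: the names `build_index`, `search` and `long_tail_share` are hypothetical, and the tokenizer (lowercase, split on `\w+`) is a deliberate oversimplification. Keying each token by its header name is what makes arbitrary headers searchable, and it is also what multiplies the number of keys.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase, then take runs of word characters.
TOKEN = re.compile(r"\w+")


def build_index(messages: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map "field:token" to the ids of messages containing that token in that field."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for msg_id, fields in messages.items():
        for field, value in fields.items():
            for tok in TOKEN.findall(value.lower()):
                # Prefixing with the field name is what makes any header searchable.
                index[f"{field.lower()}:{tok}"].add(msg_id)
    return dict(index)


def search(index: dict[str, set[str]], field: str, token: str) -> set[str]:
    """Ids of messages whose `field` contains `token` (a single dict lookup)."""
    return index.get(f"{field.lower()}:{token.lower()}", set())


def long_tail_share(index: dict[str, set[str]]) -> float:
    """Fraction of index keys that occur in exactly one message."""
    if not index:
        return 0.0
    singletons = sum(1 for ids in index.values() if len(ids) == 1)
    return singletons / len(index)


messages = {
    "m1": {"subject": "Lunch plans", "body": "lunch at noon"},
    "m2": {"subject": "Re: Lunch plans", "body": "noon works"},
    # Spam with "hashbuster" junk tokens that will never match anything else.
    "m3": {"subject": "Cheap meds", "body": "buy now xq7zv k2jd9w pl0qa"},
}
index = build_index(messages)
print(sorted(search(index, "Subject", "lunch")))  # ['m1', 'm2']
print(f"{long_tail_share(index):.2f}")  # 0.79
```

Even in this three-message set, 11 of the 14 keys point at a single message, and the spam message alone contributes 7 of them.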

IMO the use case of searching *1+ million emails as fast as possible* is just not in scope for most clients.

Right, because most users do not need that. I don't know that any mail client does it as well as MM with the same search capabilities.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
_______________________________________________
mailmate mailing list
mailmate@lists.freron.com
https://lists.freron.com/listinfo/mailmate
