Hi,

On 2009-07-05 11:39:04 +0200, Rocco Rutte wrote:
> Hi,
>
> * Vincent Lefevre wrote:
>
> > I don't know what you mean here, but by default, Mutt does bad things
> > with charsets. The $thorough_search variable is broken by design and
> > should be removed.
>
> Can you explain that a bit, please?

I've attached a testcase. Open it under a UTF-8 locale, for instance, with the $thorough_search variable unset (the problem doesn't occur when it is set). Then limit to "~Bé". Only the message "body in utf-8" is found.

Note: I wonder why Mutt doesn't encode the regexp in the encoding of the message / body part (each time it encounters a new encoding). IMHO, that would be faster than decoding the body part, as the number of different encodings remains limited in practice.

Note that the manual says:

  Users searching attachments or for non-ASCII characters should set
  this value because decoding also includes MIME parsing/decoding and
  possible character set conversions. Otherwise mutt will attempt to
  match against the raw message received (for example quoted-printable
  encoded or with encoded headers) which may lead to incorrect search
  results.

but this is worse than that. For instance, I need to set $thorough_search to search for some strings made only of ASCII characters when such strings contain a space, as some mailers encode all spaces as =20 (more generally, encoded spaces can also occur at the end of a line).

-- 
Vincent Lefèvre <vinc...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

From a...@b.invalid Sun Jul 5 00:43:52 2009
Date: Sun, 5 Jul 2009 00:43:52 +0200
Subject: body in utf-8
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 57
Lines: 1

In the index, limit to ~Bé with $thorough_search unset.

From a...@b.invalid Sun Jul 5 00:43:52 2009
Date: Sun, 5 Jul 2009 00:43:52 +0200
Subject: body in iso-8859-1
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 56
Lines: 1

In the index, limit to ~Bé with $thorough_search unset.
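To make the failure in the testcase concrete, here is a small Python sketch. It is not Mutt code (Mutt is written in C and its actual matching path is not reproduced here); it only models the byte-level situation under stated assumptions. naive_raw_match approximates what happens with $thorough_search unset: the UTF-8 bytes of the pattern are matched against the undecoded 8bit body, so only the UTF-8 message matches. per_charset_match is a hypothetical sketch of the idea above: encode the pattern once per charset encountered, cache it, and match it against the raw body. All function and variable names are made up for illustration. The per-charset trick only helps for 8bit (or 7bit) bodies; the =20 lines at the end show that quoted-printable bodies still need transfer decoding, even for ASCII-only searches.

```python
import quopri
import re

# Hypothetical reconstruction of the two test bodies
# (Content-Transfer-Encoding: 8bit, one line each).
LINE = "In the index, limit to ~Bé with $thorough_search unset.\n"
BODIES = {
    "utf-8": LINE.encode("utf-8"),            # 57 bytes
    "iso-8859-1": LINE.encode("iso-8859-1"),  # 56 bytes
}

PATTERN = "é"  # typed by the user under a UTF-8 locale


def naive_raw_match(charset: str) -> bool:
    """Approximates searching with $thorough_search unset: the UTF-8 bytes
    of the pattern are matched against the undecoded body."""
    return re.search(re.escape(PATTERN.encode("utf-8")), BODIES[charset]) is not None


_compiled = {}  # charset -> compiled bytes regexp, or None if unencodable


def per_charset_match(charset: str) -> bool:
    """Hypothetical alternative: encode the pattern once per charset seen
    and match it against the raw 8bit body, without decoding the body."""
    if charset not in _compiled:
        try:
            _compiled[charset] = re.compile(re.escape(PATTERN.encode(charset)))
        except UnicodeEncodeError:
            _compiled[charset] = None  # pattern cannot occur in this charset
    regexp = _compiled[charset]
    return regexp is not None and regexp.search(BODIES[charset]) is not None


# Even an ASCII-only search fails on a quoted-printable body whose
# spaces were encoded as =20; only decoding first makes it match.
QP_BODY = b"limit=20to"
assert b"limit to" not in QP_BODY
assert b"limit to" in quopri.decodestring(QP_BODY)

for cs in BODIES:
    print(cs, naive_raw_match(cs), per_charset_match(cs))
```

Running it prints "utf-8 True True" and "iso-8859-1 False True": the naive raw match reproduces the testcase result, while the per-charset variant finds both messages. With the cache, the pattern is encoded and compiled once per distinct charset rather than decoding every body.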