Michael M Slusarz <slus...@curecanti.org> writes:

> Quoting Eric Abrahamsen <e...@ericabrahamsen.net>:
>
>> While I've got you here, I hope you'll answer one more question: what's
>> the format for searching multiple terms with non-ascii strings? Is it
>> possible in one run to find a utf-8 encoded subject, and a utf-8 encoded
>> body?
>
> IMAP interaction would look like this:
>
> C: . UID SEARCH CHARSET UTF-8 SUBJECT {4}
> S: + OK
> C: aéb BODY {4}
> S: + OK
> C: aéb
> S: * SEARCH XXX
> S: . OK
>
> Even better... if the server supports LITERAL+, you don't have to wait
> for the server to acknowledge each synchronizing literal, which saves
> the 2 round-trips to the server:
>
> C: . UID SEARCH CHARSET UTF-8 SUBJECT {4+}
> C: aéb BODY {4+}
> C: aéb[CRLF]
> S: * SEARCH XXX
> S: . OK
>
> michael
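For anyone building these commands by hand: the number in `{4}` is the length of the literal in octets, not characters. "aéb" is 3 characters but 4 octets in UTF-8, because "é" encodes to two bytes. Below is a minimal Python sketch that assembles the same command. The function name `build_uid_search` is hypothetical, and the sketch only builds the bytes. It does not open a connection or read server responses. With synchronizing literals it returns one chunk per send, and the client must wait for a `+` continuation from the server after every chunk except the last. With LITERAL+ it returns the whole command as one chunk.

```python
def build_uid_search(tag: str, criteria: list[tuple[str, str]],
                     literal_plus: bool = False) -> list[bytes]:
    """Build a UID SEARCH CHARSET UTF-8 command using IMAP literals.

    Returns the chunks a client would send. With synchronizing literals
    ({n}), the client waits for a "+" continuation after each chunk but
    the last; with LITERAL+ ({n+}) everything goes out as one chunk.
    """
    chunks: list[bytes] = []
    current = f"{tag} UID SEARCH CHARSET UTF-8".encode("ascii")
    for key, value in criteria:
        data = value.encode("utf-8")  # literal length counts octets
        plus = "+" if literal_plus else ""
        current += f" {key} {{{len(data)}{plus}}}\r\n".encode("ascii")
        if not literal_plus:
            # Synchronizing literal: stop here and wait for "+".
            chunks.append(current)
            current = b""
        current += data
    current += b"\r\n"  # terminate the command line
    chunks.append(current)
    return chunks


if __name__ == "__main__":
    terms = [("SUBJECT", "a\u00e9b"), ("BODY", "a\u00e9b")]
    for chunk in build_uid_search(".", terms):
        print(chunk)
    print(build_uid_search(".", terms, literal_plus=True))
```

The synchronizing variant produces three chunks, matching the three `C:` lines in the first transcript. The LITERAL+ variant collapses them into one send, which is where the saved round-trips come from.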
One other question: I've set up full-text search indexing via Lucene, and it works great. But how is that index encoded? Specifically, if I use the method above to search for non-ASCII strings, do I still benefit from the speedups the search index provides? I know that some people who index non-ASCII, non-UTF-8 messages run them through a decoder to convert them to UTF-8 so that Lucene can index them properly. Is that still necessary if I'm using the method above?

Thanks!
Eric