On Mon, 2007-12-10 at 23:29 +0800, Joe Wong wrote:
> Hi Timo,
> 
> I just took your suggestion. I have another collection of emails, and 
> running full text search on it did not hit any problem, whether the 
> mails are on NFS or local disk.
> 
> You mentioned that full text search only works for English-only 
> mailboxes. What is the current limitation? Is there any plan to support 
> non-English email (conversion to UTF-8?)

It should work with any UTF-8 input, and I've tested that it works with
some mails containing non-ASCII characters. There's nothing in the design
that prevents it, so I guess there is some bug that causes these
problems. If you could send me a test mailbox where this happens, I could
take a look at fixing it.

Although now that you mention it, I wonder if the current design could
be optimized to work a bit differently with Chinese/Japanese/etc.
Currently it works by indexing 4-character blocks, so with non-ASCII
UTF-8 input it may end up indexing more than 4 bytes per block. How many
bytes does a typical Chinese UTF-8 character take? How many characters
does a typical Chinese word take? How many characters are in your
typical search word?
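
To make the size difference concrete, here is a rough Python sketch (not
the actual indexer code). It assumes the blocks are sliding windows of 4
characters, which is only an assumption for illustration, and the helper
name block_byte_lengths is made up. It prints the UTF-8 byte length of
each block for an ASCII word and for a short Chinese string whose
characters each encode to 3 bytes.

```python
def block_byte_lengths(text, block_chars=4):
    """Return (block, utf8_byte_length) for each window of block_chars characters.

    Assumption: blocks are overlapping sliding windows. The real indexer
    may split the text differently.
    """
    last_start = len(text) - block_chars
    blocks = [text[i:i + block_chars] for i in range(last_start + 1)]
    return [(block, len(block.encode("utf-8"))) for block in blocks]


# ASCII: every character is 1 byte, so each block is 4 bytes.
print(block_byte_lengths("search"))

# Common Chinese characters are 3 bytes each in UTF-8, so each block is 12 bytes.
print(block_byte_lengths("全文搜索引擎"))
```

So with this kind of input a 4-character block is three times the size
of an ASCII one.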

I was just wondering: if there are a lot of 1-3 character words, maybe
the indexing could limit each block to something like
min(4 characters, ~8 bytes). That would then take less space and memory.
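
One possible reading of that limit, as a minimal Python sketch: take
characters for a block until either 4 characters or 8 bytes would be
exceeded. The function name capped_block and the choice to treat "~8
bytes" as a hard cap are assumptions for illustration, not how the
indexer is implemented.

```python
def capped_block(text, start, max_chars=4, max_bytes=8):
    """Return the block starting at `start`, limited by characters AND bytes.

    Assumption: the byte limit is a hard cap and a character is never
    split, so a block may end up shorter than max_bytes.
    """
    chars = []
    used_bytes = 0
    for ch in text[start:start + max_chars]:
        size = len(ch.encode("utf-8"))
        if used_bytes + size > max_bytes:
            break  # adding this character would exceed the byte cap
        chars.append(ch)
        used_bytes += size
    return "".join(chars)


print(capped_block("search", 0))        # ASCII: limited by the 4-character cap
print(capped_block("全文搜索引擎", 0))  # 3-byte characters: limited by the byte cap
```

With 3-byte characters the byte cap stops the block after 2 characters
(6 bytes), while ASCII input still gets full 4-character blocks.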
