From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: search.debian.org is online
Date: Thu, 16 Jan 2003 09:35:00 +1100

> On Wed, Jan 15, 2003 at 03:32:43PM +0900, Tomohiro KUBOTA wrote:
> > I'd like the mnoGoSearch of search.debian.org to be recompiled
> > with extra-charsets enabled, because it (I expect) immediately
> > benefits Korean. (Note that Korean doesn't have the problem 2).
> > Since it doesn't need the newer version of mnoGoSearch with ChaSen
> > support (CVS version 3.2.8, to solve problem 2), it can be done now!
>
> Except we're using UTF-8, so it shouldn't matter, I think.

mnoGoSearch uses Unicode internally for indexing and searching in the
current configuration, as you wrote. It therefore has to convert HTML
files into Unicode before processing them, and for that it needs
converters. The default build of mnoGoSearch omits the converters to
Unicode from East Asian encodings (ISO-2022-JP, EUC-KR, Big5, GB2312),
which is why it can neither index nor search East Asian pages.
Compiling with the extra-charsets ./configure option mentioned above
will enable them.

Japanese and Chinese still have a further problem (problem 2), but
this should solve the Korean case.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
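
P.S. As a rough illustration of the converter point, here is a small
Python sketch. It is not mnoGoSearch code: the names to_unicode,
indexable, DEFAULT_BUILD and EXTRA_BUILD are hypothetical, and the
contents of the "default" converter set are an assumption made only
for the example. It shows that a page stored in a legacy East Asian
encoding can only reach a Unicode-based indexer if a converter for
that encoding is compiled in.

```python
from typing import Dict, Optional, Set

# Sample page text per legacy charset (the four encodings named in the mail).
SAMPLES: Dict[str, str] = {
    "iso2022_jp": "日本語",
    "euc_kr": "한국어",
    "big5": "中文",
    "gb2312": "中文",
}

# Hypothetical converter sets; the exact default list is an assumption.
DEFAULT_BUILD: Set[str] = {"utf_8", "latin_1"}
EXTRA_BUILD: Set[str] = DEFAULT_BUILD | set(SAMPLES)


def to_unicode(raw: bytes, charset: str, converters: Set[str]) -> Optional[str]:
    """Decode raw page bytes, or return None if no converter is available."""
    if charset not in converters:
        return None  # the page can be neither indexed nor searched
    return raw.decode(charset)


def indexable(converters: Set[str]) -> Dict[str, bool]:
    """Report, per charset, whether a sample page could be converted."""
    result = {}
    for charset, text in SAMPLES.items():
        raw = text.encode(charset)  # bytes as they would sit on disk
        result[charset] = to_unicode(raw, charset, converters) == text
    return result


print("default build:", indexable(DEFAULT_BUILD))
print("extra charsets:", indexable(EXTRA_BUILD))
```

Using UTF-8 for the index does not remove this step; the source pages
are still in their own encodings and have to be converted first.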