I test the lucene spellchecker and it doesn't support chinese spell checker, how can i achieve this goal as google does?
2009/4/9 Karl Wettin <karl.wet...@gmail.com> > If you use prefix grams only then you'll get a forward-only suggestion > scheme. I've seen several implementation that use that and it works quite > well. > > harry potter: ^ha, ^har, ^harr, ^harry, ^harry p, ^harry po.. > harry houdini: ^ha, ^har, ^harr, ^harry, ^harry h, ^harry ho.. > > I prefere the trie-pattern though. Just rememberd there is an old one in > LUCENE-625. > > karl > > 8 apr 2009 kl. 20.50 skrev Matt Schraeder: > > > Corerct me if I'm wrong, but I don't think n-grams is really what I'm >> looking for here. I'm not looking for a spellchecker or phrase checker >> style suggestive search, but only based on the exact phrases the user is >> currently typing. Since Lucene uses term-based searching, I'm not sure >> how to have it search on portions of a full phrase. Using a standard >> lucene search typing in "harr" will result in searching for "harr" as a >> term, which will not find "Harry Potter". Using ngrams it would find >> "Harry" as a term, but not at the beginning of an entire phrase. This >> would bring back "My Dog Harry" as a result, which isn't what I'm >> looking for. I just want phrases from fields beginning with "Harr" >> only. >> >> I could easily do this all with our database server by simply doing a >> query for "where searchqueries like 'harr%'" but we're trying to limit >> our hits to the database to keep speed up on the site. >> >> karl.wet...@gmail.com 4/8/2009 12:49:45 PM >>> >>>>> >>>> >> For this you probably want to use ngrams. Wether or not this is >> something that fits in your current index is hard to say. My guess is >> >> that you want to create a new index with one document per unique >> phrase. You might also want to try to load this index in an >> InstantiatedIndex, that could speed things up quite a bit if the >> corpus is not too large. >> >> If your suggestion text corpus is really large and you only want >> forward-only suggestions then you might want to consider a trie- >> pattern solution instead. These can be rather resource efficient, even >> >> when loaded to memory. >> >> If you have a lot of user load on your search eninge then it might be >> >> interesting to use old user queries as the base of your suggestions >> and perhaps boost a bit on trends, i.e. the more people search for >> something the more it get boosted in the suggestions list. >> >> >> karl >> >> 8 apr 2009 kl. 15.26 skrev Matt Schraeder: >> >> I want to add a suggestive search similar to google's to >>> >> autocomplete >> >>> search phrases as the user types. It doesn't have to be very >>> elaborate >>> and for the most part will just involve searching single fields. >>> >> How >> >>> can I perform a search to be able to fill in autocomplete text? >>> >>> For instance, if I start typing "Harr" it should bring up "Harry >>> Potter" "Harry Houdini" and "Harry S. Truman" >>> >>> I have tried doing search queries for "Harr*" but it's still doing >>> term-based searching rather than searching a full field. To make a >>> field both searchable as the full field as well as tokenized, would >>> >> I >> >>> have to duplicate the field and make one a keyword field? Is there a >>> more convenient way to do this? I have also considered making a >>> >> second >> >>> index for suggestive search, which would only have the fields that I >>> want to enable suggestive search on, but this seems like it would be >>> unneccesary duplication of data as well, though it would probably >>> >> make >> >>> suggestive search faster due to a smaller index. >>> >>> Ideally it would also be nice to be able to rank these terms based >>> >> on >> >>> the number of times they have been searched for so that the results >>> >> >> are >>> tailored more to our users rather than simply just the score that >>> Lucene >>> chooses. >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- 王巍巍(Weiwei Wang) Department of Computer Science Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Mobile: 86-13913310569 MSN: ww.wang...@gmail.com Homepage: http://cs.nju.edu.cn/rl/weiweiwang