KeywordAnalyzer can not handle a whole complete sentence. On Tue, Dec 15, 2009 at 7:33 PM, Ganesh <emailg...@yahoo.co.in> wrote:
> How about KeywordAnalyzer? It will treat C++ and C# as single term. > > Regards > Ganesh > > ----- Original Message ----- > From: "Chris Lu" <chris...@gmail.com> > To: <java-user@lucene.apache.org> > Sent: Saturday, December 12, 2009 5:27 AM > Subject: Re: Lucene Analyzer that can handle C++ vs C# > > > > What we did in DBSight is to provide a reserved list of words for every > > Lucene Analyzer. > > This way you can handle any special characters like C++ and C#. > > > > Any common analyzers usually are not suitable for these special words. > > > > -- > > Chris Lu > > ------------------------- > > Instant Scalable Full-Text Search On Any Database/Application > > site: http://www.dbsight.net > > demo: http://search.dbsight.com > > Lucene Database Search in 3 minutes: > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes > > DBSight customer, a shopping comparison site, (anonymous per request) got > 2.6 Million Euro funding! > > > > > > On 12/11/2009 9:09 AM, maxSchlein wrote: > >> Can someone please point me in the right direction. > >> > >> We are creating an application that needs to beable to search on C++ and > get > >> back doc's that have C++ in it. The StandardAnalyzer does not seem to > index > >> the "+", so a search for "C++" will bring back docs that contain, C++, > C, > >> C#, etc..... The WhiteSpaceAnalyzer will index the "+", but if we have > the > >> term "C++." that is, if C++ is at the end of a sentence, it will index > >> "C++." so a search for "C++" will not return the doc. I have heard of > maybe > >> a CustomAnalyzer; however, it seems like there would actually need to be > a > >> CustomFilter/CustomTokenizer, I looked at: > >> - StandardAnalyzer.java > >> - StandardFilter.java > >> - StandardTokenizer.java > >> - StandardTokenizerImpl.java > >> - StandardTokenizerImpl.jflex > >> > >> I would guess that the StandardTokenizer is where the changes would need > to > >> be made to allow the "+" character, but I am unclear as to how. > >> > >> Any and all help is greatly appreciated. > >> > >> Going thru all the documents, stripping out "+" for the word "plus" is > not > >> really an option for us. > >> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > Send instant messages to your online friends http://in.messenger.yahoo.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Weiwei Wang Alex Wang 王巍巍 Room 403, Mengmin Wei Building Computer Science Department Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Homepage: http://cs.nju.edu.cn/rl/weiweiwang