OK, I got this working, with one small caveat. If the token starts with a comma, e.g. ",dummy",
I'd like to remove the comma like so:

    public override bool IncrementToken()
    {
        // ....
        else if (bufferLength > 1 && buffer[0] == ',')
        {
            // strip the starting , off
            offsetAtt.SetOffset(offsetAtt.StartOffset + 1, offsetAtt.EndOffset);
        }
        // ....
    }

But it doesn't work. Any ideas why that would be?

Thanks
Kumar

-----Original Message-----
From: x10...@gmail.com [mailto:x10...@gmail.com]
Sent: Monday, February 25, 2013 6:05 PM
To: java-user@lucene.apache.org
Subject: RE: Searching for keywords .net,c#,...

I did search Google for "TokenFilter lucene example" and found this link:

http://sujitpal.blogspot.com/2011/07/lucene-token-concatenating-tokenfilter_30.html

which seems to override incrementToken() (a guess, as I don't know Java). However, using Lucene.NET 3.0.3, I can override

    public override Token Next(Token result)
    public override Token Next()

but I'm not able to figure out how to proceed from there. I tried to debug using

    public override Token Next(Token result)
    {
        Debug.WriteLine(string.Format(" --- {0}", result));
        return result;
    }

but went nowhere with that. Any help on how to write my custom TokenFilter?

Also, the analyzer I have is set up as below, without the use of ReusableTokenStream() as in the example in your link; not sure if that makes a difference?

    class MyAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StandardFilter(result);
            result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
            return result;
        }
    }

-----Original Message-----
From: Naresh [mailto:nnar...@gmail.com]
Sent: Monday, February 25, 2013 1:18 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for keywords .net,c#,...

Hi,

You can write your own token filter to split on some characters (comma, |, etc.) and then build an analyzer using the WhitespaceTokenizer, LowerCaseFilter and your CustomTokenFilter.
See http://stackoverflow.com/questions/9015348/lucene-custom-analyzer/9015658#9015658

On Mon, Feb 25, 2013 at 11:24 AM, kumar <x10...@gmail.com> wrote:
> Hello all,
>
> I am a Lucene novice trying to set up Lucene in a .NET app, using
> Lucene.NET to search through documents. So far it has been
> fantastic; however, given that the users' expectations are for
> "Google"-like search, I'm running into issues searching for .net and c#.
>
> Initially I tried the StandardAnalyzer, which of course does not work
> for searching .net and c#. I changed that to a custom analyzer using
> WhitespaceTokenizer and LowerCaseFilter, and it works.
>
> However, some of the documents have the keywords as
>
> oracle,.net,C#,java etc. (i.e. separated by commas without any space)
>
> and this custom analyzer fails there.
>
> Looking for suggestions on how this might work, as I'm sure it's
> possible, considering both Lucene and .NET/C# have been around for a
> long, long while.
>
> It looks like PatternAnalyzer might be of some use in this case;
> however, I'm not quite sure how to use it and have found scant
> references to it.
>
> Any help is appreciated.
>
> Thanks
> kumar

--
Regards
Naresh
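A likely explanation for the SetOffset attempt in the first message: the offset attribute only records where the token starts and ends in the original text (used for things such as highlighting), while the indexed text comes from the term attribute, so the comma survives. The filter has to change the term text itself; in Lucene.NET 3.0.x that would presumably be something like termAtt.SetTermBuffer(buffer, 1, bufferLength - 1), but treat that exact call as an assumption to check against the API docs. The sketch below is plain C# with no Lucene.NET dependency and covers only the per-token logic of the comma-splitting filter Naresh suggests: each whitespace token is split on commas, empty pieces (such as the one before a leading comma) are dropped, and each piece keeps its start/end offset relative to the token. The names Piece and CommaSplitter are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one comma-separated piece of a token, with offsets
// relative to the start of the original token.
public sealed class Piece
{
    public string Text { get; private set; }
    public int Start { get; private set; } // index of the first character
    public int End { get; private set; }   // index one past the last character

    public Piece(string text, int start, int end)
    {
        Text = text;
        Start = start;
        End = end;
    }
}

// Hypothetical helper: the per-token logic a comma-splitting TokenFilter
// could apply. Plain C#, no Lucene.NET dependency.
public static class CommaSplitter
{
    // Splits e.g. "oracle,.net,c#,java" on commas. Empty pieces are dropped,
    // so a leading comma (",dummy") simply disappears from the term text.
    public static List<Piece> Split(string token)
    {
        var pieces = new List<Piece>();
        int start = 0;
        for (int i = 0; i <= token.Length; i++)
        {
            // Treat the end of the token like a trailing comma.
            if (i == token.Length || token[i] == ',')
            {
                if (i > start)
                {
                    pieces.Add(new Piece(token.Substring(start, i - start), start, i));
                }
                start = i + 1;
            }
        }
        return pieces;
    }
}

public static class Program
{
    // Prints each piece of a token with its relative [start,end) offsets.
    static void Print(string token)
    {
        foreach (var piece in CommaSplitter.Split(token))
        {
            Console.WriteLine("{0} [{1},{2})", piece.Text, piece.Start, piece.End);
        }
    }

    public static void Main()
    {
        Print("oracle,.net,c#,java");
        Print(",dummy");
    }
}
```

Run as-is, Main prints oracle [0,6), .net [7,11), c# [12,14), java [15,19) and then dummy [1,6). Inside an actual filter, each piece's text would be copied into the term attribute and its offsets shifted by the incoming token's StartOffset. Because one input token can yield several terms, the filter would also need to hold the pending pieces between IncrementToken() calls (for example via CaptureState()/RestoreState()); that bookkeeping is left out of this sketch.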