RE: Searching for keywords .net,c#,...

x10179 Mon, 25 Feb 2013 22:04:04 -0800

OK, I've got this working, with one small caveat.

If a token starts with a comma (e.g. ,dummy), I'd like to remove the comma
like so:

public override bool IncrementToken()
{
....
                else if (bufferLength > 1 && buffer[0] == ',')
                {
                    // strip the leading ',' off
                    offsetAtt.SetOffset(offsetAtt.StartOffset + 1, offsetAtt.EndOffset);
                }
....

}

But it doesn't work. Any ideas why that would be?

Thanks
Kumar
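A likely cause: the offset attribute only records where the token sat in the original text (it is used for things like highlighting), while the text that actually gets indexed comes from the term buffer. Changing only the offsets therefore leaves the term as ",dummy". The standalone C# sketch below models a token with a hypothetical SimpleToken class (a char buffer, a logical length and start/end offsets; not a Lucene.NET type) to show the difference. In Lucene.NET 3.0.3 the equivalent change would presumably be something like termAtt.SetTermBuffer(buffer, 1, bufferLength - 1) on the term attribute (termAtt being an assumed field name), alongside the offset adjustment; treat that as an assumption to check against your version.

```csharp
using System;

// Hypothetical stand-in for a Lucene token (NOT a Lucene.NET type): a reusable
// char buffer with a logical length, plus start/end offsets into the source text.
public sealed class SimpleToken
{
    public char[] Buffer;
    public int Length;
    public int StartOffset;
    public int EndOffset;

    public SimpleToken(string text, int startOffset)
    {
        Buffer = text.ToCharArray();
        Length = text.Length;
        StartOffset = startOffset;
        EndOffset = startOffset + text.Length;
    }

    // The indexed term is built from the buffer, never from the offsets.
    public string Term => new string(Buffer, 0, Length);
}

public static class LeadingCommaStripper
{
    // What the original filter does: only the offsets move, the term is unchanged.
    public static void ShiftOffsetsOnly(SimpleToken t)
    {
        t.StartOffset += 1;
    }

    // Removes one leading ',' from the term text and keeps the offsets in step.
    public static void StripLeadingComma(SimpleToken t)
    {
        if (t.Length > 1 && t.Buffer[0] == ',')
        {
            // Shift the remaining characters left by one; Array.Copy handles overlap.
            Array.Copy(t.Buffer, 1, t.Buffer, 0, t.Length - 1);
            t.Length -= 1;
            t.StartOffset += 1;
        }
    }
}

public static class StripDemo
{
    public static void Main()
    {
        var a = new SimpleToken(",dummy", 10);
        LeadingCommaStripper.ShiftOffsetsOnly(a);
        Console.WriteLine($"{a.Term} [{a.StartOffset},{a.EndOffset})"); // prints ",dummy [11,16)"

        var b = new SimpleToken(",dummy", 10);
        LeadingCommaStripper.StripLeadingComma(b);
        Console.WriteLine($"{b.Term} [{b.StartOffset},{b.EndOffset})"); // prints "dummy [11,16)"
    }
}
```

The guard mirrors the original bufferLength > 1 check, so a token consisting of a single comma is left alone rather than turned into an empty term.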


-----Original Message-----
From: x10...@gmail.com [mailto:x10...@gmail.com] 
Sent: Monday, February 25, 2013 6:05 PM
To: java-user@lucene.apache.org
Subject: RE: Searching for keywords .net,c#,...

I searched Google for a Lucene TokenFilter example and found this link:
http://sujitpal.blogspot.com/2011/07/lucene-token-concatenating-tokenfilter_30.html
which seems to override incrementToken() (a guess, as I don't know Java).
However, with Lucene.NET 3.0.3 I can override
           public override Token Next(Token result)
           public override Token Next()
but I haven't been able to figure out how to proceed from there. I tried to
debug using
            public override Token Next(Token result)
            {
                Debug.WriteLine(string.Format(" --- {0}", result));
                return result;
            }
but got nowhere with that. Any help on how to write my custom TokenFilter?
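For reference, a Lucene token filter generally has this shape: it wraps another stream, calls the wrapped stream's IncrementToken() first, then inspects or rewrites the current token and returns true, or returns false once the input is exhausted. The Next(Token result) stub above never pulls a token from its input; it just hands back the token it was given. The sketch below mimics that pull pattern with hypothetical, simplified types (ISimpleTokenStream, WhitespaceSource, LowerCaseSketchFilter) so it runs without Lucene.NET; a real filter would derive from TokenFilter and read and write the term through its attributes instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal token stream: advance with IncrementToken(), then read Term.
public interface ISimpleTokenStream
{
    bool IncrementToken();
    string Term { get; set; }
}

// Source stage: splits the input text on whitespace only.
public sealed class WhitespaceSource : ISimpleTokenStream
{
    private readonly string[] parts;
    private int index = -1;

    public WhitespaceSource(string text)
    {
        // A null separator array means "split on whitespace".
        parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public string Term { get; set; }

    public bool IncrementToken()
    {
        index++;
        if (index >= parts.Length) return false; // input exhausted
        Term = parts[index];
        return true;
    }
}

// Filter stage: wraps another stream, pulls from it first, then rewrites the term.
public sealed class LowerCaseSketchFilter : ISimpleTokenStream
{
    private readonly ISimpleTokenStream input;

    public LowerCaseSketchFilter(ISimpleTokenStream input) { this.input = input; }

    // The filter shares the token with its input, loosely like shared attributes.
    public string Term
    {
        get => input.Term;
        set => input.Term = value;
    }

    public bool IncrementToken()
    {
        if (!input.IncrementToken()) return false; // nothing left upstream
        Term = Term.ToLowerInvariant();             // modify the current token in place
        return true;
    }
}

public static class PipelineDemo
{
    // Drains a stream the way an indexer would: advance, read, repeat.
    public static List<string> Collect(ISimpleTokenStream stream)
    {
        var terms = new List<string>();
        while (stream.IncrementToken()) terms.Add(stream.Term);
        return terms;
    }

    public static void Main()
    {
        var stream = new LowerCaseSketchFilter(new WhitespaceSource("Oracle .NET C#"));
        Console.WriteLine(string.Join(" | ", Collect(stream))); // prints "oracle | .net | c#"
    }
}
```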




Also, the analyzer I have is set up as below, without the
ReusableTokenStream() override used in the example in your link. I'm not
sure whether that makes a difference.

    class MyAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StandardFilter(result);
            result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
            return result;
        }
    }
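This chain explains why the comma-separated keywords fail: a whitespace tokenizer only breaks on whitespace, so oracle,.net,C#,java reaches the filters as one token, and lowercasing turns it into oracle,.net,c#,java; a query for .net then has no matching term. As far as I know, leaving out ReusableTokenStream() does not change which tokens are produced: it lets an analyzer reuse its stream instances, and the base Analyzer falls back to TokenStream() when it is not overridden. The plain C# approximation below covers only the whitespace split and lowercasing (it does not model StandardFilter or StopFilter).

```csharp
using System;
using System.Linq;

public static class WhitespaceChainSketch
{
    // Rough approximation of WhitespaceTokenizer + LowerCaseFilter:
    // split on whitespace only, then lowercase each piece.
    public static string[] Tokenize(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToArray();

    public static void Main()
    {
        string[] tokens = Tokenize("oracle,.net,C#,java");
        Console.WriteLine(string.Join(" | ", tokens)); // prints "oracle,.net,c#,java" (one token)
        Console.WriteLine(tokens.Contains(".net"));    // prints "False"
    }
}
```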

-----Original Message-----
From: Naresh [mailto:nnar...@gmail.com]
Sent: Monday, February 25, 2013 1:18 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for keywords .net,c#,...

Hi,
You can write your own token filter that splits on certain characters (comma,
|, etc.) and then build an analyzer from WhitespaceTokenizer, LowerCaseFilter
and your custom token filter.

See
http://stackoverflow.com/questions/9015348/lucene-custom-analyzer/9015658#9015658
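A minimal sketch of the splitting step described above, in plain C# so it runs without Lucene.NET: each whitespace token is split on commas, empty pieces (from leading, trailing or doubled commas) are dropped, each piece is lowercased, and its start/end offsets are recomputed against the original text. KeywordToken and CommaSplitter are hypothetical names. In an actual TokenFilter, one incoming token can yield several outgoing ones, so the filter would need to queue the pending pieces and emit one per IncrementToken() call. Because empty pieces are skipped, this also covers the leading-comma case (,dummy).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token record: term text plus [Start, End) offsets into the original text.
public sealed class KeywordToken
{
    public string Term { get; }
    public int Start { get; }
    public int End { get; }

    public KeywordToken(string term, int start, int end) { Term = term; Start = start; End = end; }

    public override string ToString() => $"{Term} [{Start},{End})";
}

public static class CommaSplitter
{
    // Splits one whitespace token (which begins at tokenStart in the original text)
    // into comma-separated pieces, skipping empty pieces and lowercasing the rest.
    public static IEnumerable<KeywordToken> Split(string token, int tokenStart)
    {
        int pieceStart = 0;
        for (int i = 0; i <= token.Length; i++)
        {
            // A piece ends at a comma or at the end of the token.
            if (i == token.Length || token[i] == ',')
            {
                int len = i - pieceStart;
                if (len > 0)
                {
                    string piece = token.Substring(pieceStart, len).ToLowerInvariant();
                    yield return new KeywordToken(piece, tokenStart + pieceStart, tokenStart + i);
                }
                pieceStart = i + 1; // continue after the comma
            }
        }
    }

    public static void Main()
    {
        foreach (var t in Split("oracle,.net,C#,java", 0))
            Console.WriteLine(t);
    }
}
```

Running it prints oracle [0,6), .net [7,11), c# [12,14) and java [15,19), one per line. Keeping the offsets accurate matters mainly for highlighting; the terms themselves are what make .net and c# searchable.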

On Mon, Feb 25, 2013 at 11:24 AM, kumar <x10...@gmail.com> wrote:

> Hello all
>
> I am a Lucene novice, trying to set up Lucene in a .NET app using
> Lucene.NET to search through documents. So far it has been
> fantastic; however, given that the users expect "Google"-like
> search, I'm running into issues searching for .net and c#.
>
> I initially tried the StandardAnalyzer, which of course does not work
> for searching .net and c#.
> I changed that to a custom analyzer using WhitespaceTokenizer and
> LowerCaseFilter, and that works;
> however, some of the documents list the keywords as
>
> oracle,.net,C#,java etc. (i.e. separated by commas without any spaces)
>
> and this custom analyzer fails on those.
>
> I'm looking for suggestions on how to make this work; I'm sure it's
> possible, considering both Lucene and .NET/C# have been around for a
> long while.
>
> It looks like PatternAnalyzer might be of some use in this case;
> however, I'm not quite sure how to use it and have found few
> references to it.
>
>
> Any help is appreciated
>
> Thanks
> kumar
>
>


--
Regards
Naresh
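On the PatternAnalyzer question: the idea behind a pattern-based tokenizer can be illustrated with a plain .NET regular expression, treating every maximal run of characters that are neither whitespace nor commas as a token, so the separators never end up inside a term. This sketch uses System.Text.RegularExpressions only; it does not use or describe PatternAnalyzer's own API, and RegexKeywordTokenizer is a hypothetical name. Each Match.Index supplies the start offset a tokenizer would need.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexKeywordTokenizer
{
    // A token is any maximal run of characters that are neither whitespace nor ','.
    private static readonly Regex TokenPattern = new Regex(@"[^\s,]+");

    // Returns each lowercased token with its start offset in the original text.
    public static (string Term, int Start)[] Tokenize(string text) =>
        TokenPattern.Matches(text)
            .Cast<Match>()
            .Select(m => (Term: m.Value.ToLowerInvariant(), Start: m.Index))
            .ToArray();

    public static void Main()
    {
        foreach (var (term, start) in Tokenize("Oracle,.NET, C#,java"))
            Console.WriteLine($"{start}: {term}");
    }
}
```

For that input it prints 0: oracle, 7: .net, 13: c# and 16: java. Compared with the two-stage approach (whitespace tokenizer plus a comma-splitting filter), a single pattern does the split in one place, at the cost of tying the separator rules to one regex.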


