RE: Can't get tokenization/stop works working

2010-02-02 Thread Digy
har c) { return char.IsLetterOrDigit(c); } }
DIGY
-----Original Message-----
From: jchang [mailto:jchangkihat...@gmail.com]
Sent: Tuesday, February 02, 2010 11:16 PM
To: java-user@lucene.apache.org
Subject: Re: Can't get tokenization/stop works working
I am using

Re: Can't get tokenization/stop works working

2010-02-02 Thread jchang
I am using org.apache.lucene.analysis.snowball.SnowballAnalyzer. Looking through Luke, I see that www.fubar.com was indexed as a single term, not fubar. So, clearly, I'm not stripping out the stop words www and com. Any ideas?

Re: Can't get tokenization/stop works working

2010-02-01 Thread Ian Lea
If you make com a stop word then you won't be able to search for it, but a search for fubar should have worked. Are you sure your analyzer is doing what you want? You don't tell us what analyzer you are using. Tips: use Luke to see what has been indexed; read the FAQ entry http://wiki.apache.

Can't get tokenization/stop works working

2010-01-31 Thread jchang
I want to be able to store a doc with a field that contains this as a substring: www.fubar.com. I then want this document to be returned when I query on fubar or fubar.com. I assume what I should do is make www and com stop words, and make sure the field is tokenized, so it will break it up along
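
For readers following the thread, here is a minimal plain-Java sketch of the behaviour being asked for: break the text wherever a character is not a letter or digit (the same test as the IsLetterOrDigit fragment in the reply above), lowercase each piece, then drop the stop words. It deliberately uses no Lucene classes, so it only illustrates the intended token stream and is not how SnowballAnalyzer or any Lucene tokenizer is implemented. The class name HostTokenSketch, the tokenize method and the two-word stop list are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HostTokenSketch {
    // Hypothetical stop list taken from the question; not a Lucene default.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("www", "com"));

    // Split on any character that is not a letter or digit,
    // lowercase each token, and skip tokens found in STOP_WORDS.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Run one step past the end so the final token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean inToken = i < text.length()
                    && Character.isLetterOrDigit(text.charAt(i));
            if (inToken) {
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                String token = current.toString();
                if (!STOP_WORDS.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Index-time text and the two query strings from the question.
        System.out.println(tokenize("www.fubar.com")); // prints [fubar]
        System.out.println(tokenize("fubar"));         // prints [fubar]
        System.out.println(tokenize("fubar.com"));     // prints [fubar]
    }
}
```

With this kind of analysis applied at both index time and query time, the indexed text and both queries reduce to the single term fubar, which is why they would match. It also shows the trade-off raised in the thread: once com is a stop word, a query for com alone produces no terms at all.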