RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
That was an easy fix. Everything works as expected now. Thanks again. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, December 05, 2013 1:46 PM To: java-user@lucene.apache.org Subject: RE: Analyzers aren't reusable?? (lucene 4.2.1) The problem is the Ch

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
Thanks for the quick response. I'll read through the references. Thanks again, Scott -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, December 05, 2013 1:46 PM To: java-user@lucene.apache.org Subject: RE: Analyzers aren't reusable?? (lucene 4.2.1) The p

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Uwe Schindler
The problem is the CharFilter, which cannot be reused. To implement the Analyzer correctly, wrap the incoming Reader in the protected initReader() method: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/Analyzer.html#initReader(java.lang.String, java.io.Reader). In creat
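A minimal sketch of the pattern described above, assuming Lucene 4.x (lucene-core plus lucene-analyzers-common) on the classpath. The class name `AmpersandMappingAnalyzer`, the `tokens` helper and the "&" to " and " mapping are hypothetical, chosen only for illustration. The point is where the CharFilter is created: `initReader()` runs on every `tokenStream()` call, including when the cached Tokenizer is reused, while `createComponents()` runs only once per reusable set of components and already receives the wrapped Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical example analyzer: a CharFilter rewrites "&" to " and ",
// then the text is split on whitespace.
public final class AmpersandMappingAnalyzer extends Analyzer {
  private static final NormalizeCharMap CHAR_MAP;
  static {
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("&", " and ");
    CHAR_MAP = builder.build();
  }

  // Called on EVERY tokenStream() call, so each new Reader gets its own CharFilter,
  // even when the Tokenizer below is reused.
  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    return new MappingCharFilter(CHAR_MAP, reader);
  }

  // Called only when no reusable components exist yet. The Reader passed in has
  // already been through initReader(), so it is NOT wrapped again here.
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_42, reader);
    return new TokenStreamComponents(source);
  }

  // Hypothetical helper: collects the terms produced for one input string.
  public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    } finally {
      ts.close();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    AmpersandMappingAnalyzer analyzer = new AmpersandMappingAnalyzer();
    System.out.println(tokens(analyzer, "salt&pepper")); // [salt, and, pepper]
    // Second call reuses the cached Tokenizer; the mapping still applies.
    System.out.println(tokens(analyzer, "R&D team"));    // [R, and, D, team]
    analyzer.close();
  }
}
```

Calling the analyzer twice is deliberate: if the CharFilter were instead created inside `createComponents()`, only the first call would see the mapping, because on reuse Lucene hands the new Reader straight to the cached Tokenizer.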

Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
I wrote the following to demonstrate behavior that surprised me (this is Lucene 4.2.1). You should be able to run it yourself, since it references nothing beyond the standard Lucene and Java libraries. Basically, this is an analyzer that makes everything

Re: Custom Tokenizer

2013-12-05 Thread Erick Erickson
You can also string together any of a myriad of TokenFilters; see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters I'd recommend spending some time on the admin/analysis page to understand what all the combinations do. I'd also recommend against dealing with punctuation etc. by using wi

Re: Custom Tokenizer

2013-12-05 Thread Furkan KAMACI
Hi, StandardAnalyzer includes these by default, after StandardTokenizer: StandardFilter, LowerCaseFilter and StopFilter. You could consider char filters. Have you read this page: https://cwiki.apache.org/confluence/display/solr/CharFilterFactories Thanks, Furkan KAMACI 2013/12/5 > Hi, > > I have used StandardAnalyzer in

Custom Tokenizer

2013-12-05 Thread raghavendra.k.rao
Hi, I have used StandardAnalyzer in my code and it is working fine. One challenge I face is that, by default, this Analyzer also tokenizes on some special characters, such as the hyphen, in addition to the SPACE character. I want to tokenize only on the SPACE character. Could you please
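One way to get whitespace-only tokenization while keeping the lowercasing StandardAnalyzer provides is to chain a WhitespaceTokenizer with a LowerCaseFilter. This is a minimal sketch, assuming Lucene 4.x (lucene-core plus lucene-analyzers-common) on the classpath; the class name `LowercaseWhitespaceAnalyzer` and the `tokens` helper are hypothetical. Two caveats: WhitespaceTokenizer splits on any `Character.isWhitespace` character (tabs and newlines too, not only SPACE), and punctuation stays attached to the terms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical analyzer: whitespace-only splitting plus lowercasing.
public final class LowercaseWhitespaceAnalyzer extends Analyzer {
  private static final Version MATCH_VERSION = Version.LUCENE_42;

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Splits only on whitespace, so "e-mail" stays a single token.
    Tokenizer source = new WhitespaceTokenizer(MATCH_VERSION, reader);
    // Lowercases each term so matching stays case-insensitive.
    TokenStream sink = new LowerCaseFilter(MATCH_VERSION, source);
    return new TokenStreamComponents(source, sink);
  }

  // Hypothetical helper: collects the terms produced for one input string.
  public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    } finally {
      ts.close();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    LowercaseWhitespaceAnalyzer analyzer = new LowercaseWhitespaceAnalyzer();
    System.out.println(tokens(analyzer, "Wi-Fi e-mail Setup")); // [wi-fi, e-mail, setup]
    System.out.println(tokens(analyzer, "Hello, world."));      // [hello,, world.]
    analyzer.close();
  }
}
```

If lowercasing is not wanted, Lucene's built-in `WhitespaceAnalyzer` (in `org.apache.lucene.analysis.core`) provides the whitespace split alone. The second sample above shows why trailing punctuation may still need a TokenFilter or CharFilter of its own.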