Re: Text dependent analyzer

2015-04-20 Thread Shay Hummel
Hi Rich, Thank you very much. I understand your solution and will try to do something in that spirit. Shay On Fri, Apr 17, 2015 at 8:35 PM Rich Cariens wrote: > Ahoy, ahoy! > > I was playing around with something similar for indexing multi-lingual > documents, Shay. The code is up on github > <

Re: Text dependent analyzer

2015-04-17 Thread Rich Cariens
Ahoy, ahoy! I was playing around with something similar for indexing multi-lingual documents, Shay. The code is up on github and needs attention, but you're welcome to see if anything in there helps. The basic idea is this: 1. A custom Cha

Re: Text dependent analyzer

2015-04-17 Thread Benson Margulies
If you want tokenization to depend on sentences, and you insist on being inside Lucene, you have to be a Tokenizer. Your tokenizer can set an attribute on the token that ends a sentence. Then, downstream, filters can read ahead to get the full sentence, buffering tokens as needed. On Fri
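
A minimal sketch of the approach described above, written in plain Python rather than against Lucene's Java API. The `Token` class, its `ends_sentence` flag, and both function names are hypothetical stand-ins: the flag plays the role of a custom attribute set by the tokenizer, and `sentence_filter` plays the role of a downstream filter that buffers tokens until a sentence is complete. The punctuation-based boundary rule is an assumption for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    text: str
    # Hypothetical flag standing in for a custom token attribute.
    ends_sentence: bool = False


def tokenize(text: str) -> Iterator[Token]:
    """Emit word tokens, flagging a word that is directly followed by ., ! or ?."""
    for match in re.finditer(r"(\w+)([.!?]*)", text):
        yield Token(match.group(1), ends_sentence=bool(match.group(2)))


def sentence_filter(tokens: Iterable[Token]) -> Iterator[List[str]]:
    """Buffer tokens and release them one full sentence at a time."""
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token.text)
        if token.ends_sentence:
            yield buffer
            buffer = []
    if buffer:
        # Flush trailing tokens that never saw a sentence-ending flag.
        yield buffer


sentences = list(sentence_filter(tokenize(
    "Lucene splits text. Filters see whole sentences! Trailing words")))
```

Once a filter holds the whole sentence, any sentence-level processing can run on the buffered tokens before they are passed further down the chain.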

Re: Text dependent analyzer

2015-04-17 Thread Ahmet Arslan
Hi Hummel, There was an effort to bring open-nlp capabilities to Lucene: https://issues.apache.org/jira/browse/LUCENE-2899 Lance was working on it to keep it up-to-date. But, it looks like it is not always best to accomplish all things inside Lucene. I personally would do the sentence detection

Re: Text dependent analyzer

2015-04-15 Thread Jack Krupansky
Currently, how are you indexing sentence boundaries? Are you placing sentences in distinct fields, leaving a position gap, or... what? Ultimately it comes down to how you intend to query the data in a way that respects sentence boundaries. To put it simply, why exactly do you care where the sente
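
To illustrate the position-gap option mentioned above, here is a small sketch in plain Python, not Lucene code. It assigns token positions with an extra gap after each sentence, so that a proximity check with a small slop cannot match across a sentence boundary. The gap size of 100 and the helper names `assign_positions` and `near` are assumptions made for this example.

```python
from typing import List, Tuple

SENTENCE_GAP = 100  # assumed gap size, chosen only for illustration


def assign_positions(sentences: List[List[str]],
                     gap: int = SENTENCE_GAP) -> List[Tuple[str, int]]:
    """Number tokens consecutively, skipping `gap` positions after each sentence."""
    positions: List[Tuple[str, int]] = []
    position = 0
    for sentence in sentences:
        for token in sentence:
            positions.append((token, position))
            position += 1
        position += gap
    return positions


def near(positions: List[Tuple[str, int]],
         first: str, second: str, slop: int) -> bool:
    """True if some occurrence of `first` is within `slop` positions of `second`."""
    first_pos = [p for t, p in positions if t == first]
    second_pos = [p for t, p in positions if t == second]
    return any(abs(a - b) <= slop for a in first_pos for b in second_pos)


with_gap = assign_positions([["quick", "fox"], ["lazy", "dog"]])
without_gap = assign_positions([["quick", "fox"], ["lazy", "dog"]], gap=0)
```

With the gap in place, "fox" and "lazy" are far apart in position space even though they are adjacent in the text, which is what keeps a proximity query from crossing the boundary.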

Re: Text dependent analyzer

2015-04-15 Thread Shay Hummel
Hi Ahmet, Thank you for the reply. That's exactly what I am doing. At the moment, to index a document, I break it into sentences, and each sentence is analyzed (lemmatizing, stopword removal etc.). Now, what I am looking for is a way to create an analyzer (a class which extends Lucene's Analyzer). Th

Re: Text dependent analyzer

2015-04-14 Thread Ahmet Arslan
Hi Hummel, You can perform sentence detection outside of Solr, using OpenNLP for instance, and then feed the sentences to Solr. https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect Ahmet On Tuesday, April 14, 2015 8:12 PM, Shay Hummel wrote: Hi I would l
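
A rough sketch of the "detect sentences first, then index" workflow suggested above. It uses a naive regular-expression splitter as a stand-in for a real sentence detector such as OpenNLP, and it stops at building one record per sentence; actually sending the records to Solr is left out. The field names (`id`, `doc_id`, `sentence_no`, `text`) and both function names are assumptions for illustration, not a Solr or OpenNLP API.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace.

    Only a placeholder for a trained sentence detector; it will
    mis-split abbreviations such as "e.g. this".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def to_records(doc_id: str, text: str) -> List[Dict[str, object]]:
    """Build one indexable record per sentence, keeping a link to the source document."""
    return [
        {"id": f"{doc_id}-{i}", "doc_id": doc_id, "sentence_no": i, "text": sentence}
        for i, sentence in enumerate(split_sentences(text))
    ]


records = to_records("doc1", "First sentence. Second one! Third?")
```

Keeping `doc_id` and `sentence_no` on every record makes it possible to regroup sentences by their source document at query time.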