Re: Access next token in a stream

2012-02-09 Thread Damerian
On 9/2/2012 11:12 p.m., Steven A Rowe wrote: Damerian, When I said "clear the previous token", I was referring to the pseudo-code I gave in my first response to you. There is no built-in method to do that. If you want to conditionally output tokens, you should store AttributeSource clon

RE: Access next token in a stream

2012-02-09 Thread Steven A Rowe
Damerian, When I said "clear the previous token", I was referring to the pseudo-code I gave in my first response to you. There is no built-in method to do that. If you want to conditionally output tokens, you should store AttributeSource clones, as in my pseudo-code. Steve > -Original M
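Steve's point about clones matters because Lucene reuses the same attribute instances for every token: keeping a reference to "the previous token's" CharTermAttribute only gives you whatever the stream wrote into it last. Below is a Lucene-free illustration, assuming a single StringBuilder stands in for the reused term attribute. In Lucene itself the snapshot would come from AttributeSource.cloneAttributes() or captureState().

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedAttributeDemo {

    // Simulates a token stream that, like Lucene, writes every token into
    // one shared, mutable buffer instead of allocating a new object per token.
    static List<String> collect(String[] tokens, boolean copy) {
        StringBuilder termAttr = new StringBuilder(); // stand-in for CharTermAttribute
        List<CharSequence> kept = new ArrayList<>();
        for (String t : tokens) {
            termAttr.setLength(0);
            termAttr.append(t);                // "incrementToken()" overwrites the shared buffer
            if (copy) {
                kept.add(termAttr.toString()); // snapshot, like cloneAttributes()/captureState()
            } else {
                kept.add(termAttr);            // reference to the reused buffer
            }
        }
        List<String> result = new ArrayList<>();
        for (CharSequence cs : kept) {
            result.add(cs.toString());         // read what each kept entry holds now
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"James", "Bond"};
        System.out.println(collect(tokens, false)); // [Bond, Bond]
        System.out.println(collect(tokens, true));  // [James, Bond]
    }
}
```

Without the copy, both kept entries end up showing "Bond", which is exactly the bug that storing clones avoids.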

Re: Access next token in a stream

2012-02-09 Thread Damerian
On 9/2/2012 10:51 p.m., Steven A Rowe wrote: Damerian, The technique I mentioned would work for you with a little tweaking: when you see consecutive capitalized tokens, then just set the CharTermAttribute to the joined tokens, and clear the previous token. Another idea: you could use Shi

RE: Access next token in a stream

2012-02-09 Thread Steven A Rowe
Damerian, The technique I mentioned would work for you with a little tweaking: when you see consecutive capitalized tokens, then just set the CharTermAttribute to the joined tokens, and clear the previous token. Another idea: you could use ShingleFilter with min size = max size = 2, and then u
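The first idea (join consecutive capitalized tokens into one token) can be sketched outside Lucene over a plain list of terms. Two assumptions are not from the thread: "capitalized" means the first character is upper case, and a run of any length is joined with a single space. Inside a real TokenFilter the joined text would be written into CharTermAttribute instead of a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CapitalizedRunJoiner {

    // Assumption: a token is "capitalized" if its first character is upper case.
    static boolean isCapitalized(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    // Joins each run of consecutive capitalized tokens into a single token;
    // all other tokens pass through unchanged.
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (String t : tokens) {
            if (isCapitalized(t)) {
                if (run.length() > 0) {
                    run.append(' ');
                }
                run.append(t);
            } else {
                if (run.length() > 0) {          // a run just ended: emit it
                    out.add(run.toString());
                    run.setLength(0);
                }
                out.add(t);
            }
        }
        if (run.length() > 0) {                  // flush a run at end of stream
            out.add(run.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("My", "name", "is", "James", "Bond")));
        // [My, name, is, James Bond]
    }
}
```

A lone capitalized word such as the sentence-initial "My" passes through unchanged, so only runs of two or more tokens are merged.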

Re: Access next token in a stream

2012-02-09 Thread Damerian
On 9/2/2012 8:54 p.m., Steven A Rowe wrote: Hi Damerian, One way to handle your scenario is to hold on to the previous token, and only emit a token after you reach at least the second token (or at end-of-stream). Your incrementToken() method could look something like: 1. Get current att

RE: Access next token in a stream

2012-02-09 Thread Steven A Rowe
Hi Damerian, One way to handle your scenario is to hold on to the previous token, and only emit a token after you reach at least the second token (or at end-of-stream). Your incrementToken() method could look something like: 1. Get current attributes: input.incrementToken() 2. If previous toke
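Steve's numbered steps are cut off in this archive, so the following is not his pseudo-code. It is only a Lucene-free sketch of the idea he names: keep the previous token, and emit it only once the next token (or end of stream) has been seen. Tokens are plain Strings here. Inside a real TokenFilter the held-back token would be saved with captureState() and re-emitted with restoreState(). The `decide` callback is a hypothetical hook for your own logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

public class OneTokenLookahead {

    // Holds back each token until its successor has been read (or the stream
    // has ended), so the decision about a token can look at the next one.
    // `decide` gets (current, next) and returns the text to emit, or null to
    // drop the current token; `next` is null at end of stream.
    static List<String> process(Iterator<String> input,
                                BiFunction<String, String, String> decide) {
        List<String> out = new ArrayList<>();
        String previous = null;                  // the held-back token
        while (input.hasNext()) {
            String current = input.next();       // plays the role of input.incrementToken()
            if (previous != null) {
                String emitted = decide.apply(previous, current);
                if (emitted != null) {
                    out.add(emitted);
                }
            }
            previous = current;
        }
        if (previous != null) {                  // end of stream: flush the last token
            String emitted = decide.apply(previous, null);
            if (emitted != null) {
                out.add(emitted);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("My", "name", "is", "James", "Bond");
        // Show each token together with the token that follows it.
        System.out.println(process(tokens.iterator(),
                (cur, next) -> next == null ? cur : cur + " -> " + next));
        // [My -> name, name -> is, is -> James, James -> Bond, Bond]
    }
}
```

The one-token delay is what makes "the next token" available: a token is never emitted until the filter has already pulled its successor from the input.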

Access next token in a stream

2012-02-09 Thread Damerian
Hello, I want to implement my own custom filter. My question is quite simple, but I cannot find a solution no matter what I try: how can I access the TermAttribute of the token that comes after the one I currently have in my stream? For example, in the phrase "My name is James Bond", if, let's say, I a

Re: confirm unsubscribe from java-user@lucene.apache.org

2012-02-09 Thread Christof Schablinski
Kind regards, Christof Schablinski Devoteam Danet GmbH, Waldburgstrasse 17 - 19, 70563 Stuttgart, Germany Phone: +49 6151 868 8730, Fax: +49 6151 868 8753 E-Mail: christof.schablin...@devoteam.com, URL: www.devoteam.de

Re: analyzer per document

2012-02-09 Thread Paul Libbrecht
I would use a different field per language and use PerFieldAnalyzerWrapper indeed. This is also important for queries whose language is not always clear. paul On 9 Feb 2012 at 13:01, Vinaya Kumar Thimmappa wrote: > Hello All, > > I have a requirement of using different analyzer per document. How

Re: Index writing performance of 3.5

2012-02-09 Thread Simon Willnauer
One major thing that changed from 3.0.3 to 3.5 is that we use TieredMergePolicy by default. Can you try to use the same merge policy on both 3.0.3 and 3.5 and report back? I.e., LogByteSizeMergePolicy or whatever you are using... simon On Thu, Feb 9, 2012 at 5:28 AM, Vitaly Funstein wrote: > Hello,

Re: analyzer per document

2012-02-09 Thread Francisco A. Lozano
Why don't you store each "file" in a single document, add a field for each "line" and use a PerFieldAnalyzerWrapper? Francisco A. Lozano On Thu, Feb 9, 2012 at 13:01, Vinaya Kumar Thimmappa wrote: > Hello All, > > I have a requirement of using different analyzer per document. How can > we do t
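Both replies suggest one field per language plus PerFieldAnalyzerWrapper, which chooses an analyzer by field name and falls back to a default analyzer for unlisted fields. The dispatch itself can be sketched without Lucene. The `text_<language>` field names and the whitespace "analyzers" below are hypothetical stand-ins; in Lucene the wrapper would be given real per-language Analyzer instances instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldDispatch {

    // Toy "analyzer": text in, list of terms out.
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Hypothetical naming scheme: one field per language, e.g. "text_fr".
    static String fieldFor(Locale locale) {
        return "text_" + locale.getLanguage();
    }

    void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Uses the analyzer registered for this field, else the default one.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldDispatch d = new PerFieldDispatch(t -> Arrays.asList(t.split("\\s+")));
        d.register(fieldFor(Locale.FRENCH),
                t -> Arrays.asList(t.toLowerCase(Locale.FRENCH).split("\\s+")));
        System.out.println(d.analyze(fieldFor(Locale.FRENCH), "Bonjour le Monde")); // [bonjour, le, monde]
        System.out.println(d.analyze(fieldFor(Locale.GERMAN), "Hallo Welt"));      // [Hallo, Welt]
    }
}
```

Each line of the file would then be indexed into the field for its own locale, and queries can target the field (or fields) whose language matches.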

Re: IndexWriter in 3.5

2012-02-09 Thread Ian Lea
Yes, this changed at some point. In recent releases nothing is written to the index unless you close(), or maybe commit(), the writer. -- Ian. On Thu, Feb 9, 2012 at 12:02 PM, Ganesh wrote: > Hello all, > > In 3.0.3 the following code works fine but in 3.5, it throws exception "No > segments

IndexWriter in 3.5

2012-02-09 Thread Ganesh
Hello all, In 3.0.3 the following code works fine, but in 3.5 it throws the exception "No segments found". In 3.0.3, just creating the writer creates the files segments.gen, segments_1 and write.lock. In 3.5, only write.lock is created. Create IndexWriter; Open Reader; Add documents

analyzer per document

2012-02-09 Thread Vinaya Kumar Thimmappa
Hello All, I have a requirement to use a different analyzer per document. How can we do this? My analyzer would be locale-specific. I have a file with 10 lines, each in a different language. Each document would be one line, and I want the analyzer to change based on the locale of the line. Is this po

Fwd: Delete words in a specific increment Position with Lucene

2012-02-09 Thread Damerian
Original Message Subject: Delete words in a specific increment Position with Lucene Date: Tue, 07 Feb 2012 18:48:03 +0100 From: Damerian To: java-user-subscr...@lucene.apache.org Greetings, I used Lucene to make a simple filter that recognizes main names (Two c