Re: Supported way to get segment from IndexWriter?

2010-01-14 Thread Chris Hostetter
: Since SegmentInfos is now public, you could use SegmentInfos.read to
: read the current segments_N file, and then call its .size() method?
:
: But, this will only count as of the last commit... which is probably
: not sufficient for SOLR-1559?

Honestly: i have no idea, I'm a little out of touc

Re: IllegalArgumentException when IndexWriter.addDocument

2010-01-14 Thread Chris Lu
I am using a custom analyzer and upgrading from Lucene 2.x. I need to get more familiar with Lucene 3.0 behavior. I think this is one of the upgrade pitfalls. Thanks for the help!

Chris

Uwe Schindler wrote:
This problem occurs, if you have a Tokenizer or TokenFilter that produces new tokens but doe

RE: IllegalArgumentException when IndexWriter.addDocument

2010-01-14 Thread Uwe Schindler
This problem occurs if you have a Tokenizer or TokenFilter that produces new tokens but does not call clearAttributes(). Which TokenStreams do you use in your analyzer? If you do not call clearAttributes() (see the javadocs of Tokenizer!) whenever you produce new tokens (in any type of TokenStream), t
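To illustrate why stale attribute state could end in a negative increment, here is a minimal toy model in plain Java. It is not Lucene code: the class `StaleAttributeDemo`, the `PosIncr` stand-in and the `run` method are hypothetical names made up for this sketch, and the filter logic (adding the skipped positions to whatever increment is currently stored) is an assumption about how a stop-word filter might behave. Under that assumption, a producer that never resets the shared increment lets it double at every stop word until the int overflows.

```java
class StaleAttributeDemo {
    /** Stand-in for a shared position-increment attribute; NOT Lucene's class. */
    static final class PosIncr {
        int value = 1;

        void set(int v) {
            if (v < 0) {
                throw new IllegalArgumentException("Increment must be zero or greater: " + v);
            }
            value = v;
        }

        /** Resets to the default, which is what clearing attributes is meant to achieve. */
        void clear() {
            value = 1;
        }
    }

    /**
     * Feeds tokens through a simulated stop-word filter that adds the number of
     * skipped positions to the increment currently stored in the shared attribute.
     * Returns the increment seen on the last token that was kept.
     */
    static int run(String[] tokens, String stopWord, boolean clearBeforeEachToken) {
        PosIncr att = new PosIncr();
        int skipped = 0;
        int last = 0;
        for (String token : tokens) {
            if (clearBeforeEachToken) {
                att.clear(); // a well-behaved producer resets state for every new token
            }
            if (token.equals(stopWord)) {
                skipped += att.value; // remember the positions that were dropped
                continue;
            }
            att.set(att.value + skipped); // stale value + skipped grows without a reset
            skipped = 0;
            last = att.value;
        }
        return last;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "the", "b", "the", "c", "the", "d"};
        System.out.println("with reset:    " + run(tokens, "the", true));
        System.out.println("without reset: " + run(tokens, "the", false));
    }
}
```

With the reset in place the increment stays small no matter how long the input is. Without it, a long enough run of kept tokens separated by stop words pushes the sum past Integer.MAX_VALUE, and the setter rejects the resulting negative value.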

Re: IllegalArgumentException when IndexWriter.addDocument

2010-01-14 Thread Simon Willnauer
Which analyzer are you using?

simon

On Thu, Jan 14, 2010 at 10:40 PM, Chris Lu wrote:
> Notes: I am using Lucene 3.0
>>
>> Seems a integer overflow problem?
>>
>> java.lang.IllegalArgumentException: Increment must be zero or greater:
>> -472893952
>>  at
>> org.apache.lucene.analysis.tokenattribu

Re: IllegalArgumentException when IndexWriter.addDocument

2010-01-14 Thread Chris Lu
Notes: I am using Lucene 3.0

Seems like an integer overflow problem?

java.lang.IllegalArgumentException: Increment must be zero or greater: -472893952
    at org.apache.lucene.analysis.tokenattributes.PositionIncrementAttributeImpl.setPositionIncrement(PositionIncrementAttributeImpl.java:58)
    at org

IllegalArgumentException when IndexWriter.addDocument

2010-01-14 Thread Chris Lu
Seems like an integer overflow problem?

java.lang.IllegalArgumentException: Increment must be zero or greater: -472893952
    at org.apache.lucene.analysis.tokenattributes.PositionIncrementAttributeImpl.setPositionIncrement(PositionIncrementAttributeImpl.java:58)
    at org.apache.lucene.analysis.StopFilt

RE: RangeFilter

2010-01-14 Thread AlexElba
> Did you completely re-index?

Yes, I did. Here is the method that creates the index:

public void write(List data, Directory directory, Analyzer analyzer) {
    IndexWriter indexWriter = new IndexWriter(directory, analyzer, MaxFieldLength.LIMITED);
    try {

Re: Extracting contact data

2010-01-14 Thread mark harwood
>
> Do you think I can get any advantage from building a solution on
> Lucene?

Lucene is generally about information retrieval not information extraction (as suggested, GATE or UIMA are more commonly used for extraction). However, Lucene can play a role in extraction if you use it for determining

Re: Extracting contact data

2010-01-14 Thread Erick Erickson
Ooooh boy, glad I asked the question, because I was thinking in terms of real-world locations of the addresses, so nothing I would have written would have had any relevance whatsoever.

Erick

On Wed, Jan 13, 2010 at 12:05 PM, Erick Erickson wrote:
> Before answering, how do you measure "prox

Re: Extracting contact data

2010-01-14 Thread Julien Nioche
Hi,

Tools like GATE (http://www.gate.ac.uk) or Apache UIMA would be good candidates for what you are trying to achieve.

HTH

--
DigitalPebble Ltd
http://www.digitalpebble.com

2010/1/14 Ortelli, Gian Luca
>
> Well, the exact definition we're going to find out empirically,
> as we run an impleme

Re: incremental document field update

2010-01-14 Thread Michael McCandless
Parallel incremental indexing (http://issues.apache.org/jira/browse/LUCENE-1879) is one way to solve this.

Mike

On Thu, Jan 14, 2010 at 4:27 AM, Babak Farhang wrote:
>> Reading that trail, I wish the original poster gave up on his idea (
>
> Err, that should have read..
>
> "Reading that trail,

RE: Extracting contact data

2010-01-14 Thread Ortelli, Gian Luca
Well, the exact definition we're going to find out empirically, as we run an implementation through our data and look at the quality of results... For now, I would use the number of tokens between the finding ("a...@def.com") and the word that gives context ("Contact"). Anyway, replying to karl
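The proximity measure described above (number of tokens between the finding and the context word) could be sketched roughly as follows. This is only an illustration under stated assumptions: tokens are split on whitespace, the e-mail pattern is deliberately loose, and the class `TokenProximity`, the method `tokensBetween` and the sample address are hypothetical names invented here, not part of Lucene, GATE or UIMA.

```java
import java.util.regex.Pattern;

class TokenProximity {
    // Loose e-mail pattern, good enough for a sketch; not RFC-compliant.
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    /** Strips leading and trailing punctuation so that "Contact:" matches "Contact". */
    private static String stripPunct(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    /**
     * Returns the smallest number of tokens lying strictly between an e-mail-like
     * token and the context word, or -1 if either one is missing from the text.
     */
    static int tokensBetween(String text, String contextWord) {
        String[] tokens = text.trim().split("\\s+");
        int best = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (!EMAIL.matcher(tokens[i]).matches()) {
                continue; // not a finding
            }
            for (int j = 0; j < tokens.length; j++) {
                if (i == j || !stripPunct(tokens[j]).equalsIgnoreCase(contextWord)) {
                    continue; // not the context word
                }
                int between = Math.abs(i - j) - 1;
                if (best < 0 || between < best) {
                    best = between;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "Contact: please write to jane@example.com today";
        System.out.println(tokensBetween(text, "Contact"));
    }
}
```

A smaller result means the context word sits closer to the address. What threshold counts as "close enough" is exactly the part that would have to be tuned empirically against real data.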

Re: incremental document field update

2010-01-14 Thread Babak Farhang
> Reading that trail, I wish the original poster gave up on his idea (

Err, that should have read..

"Reading that trail, I wish the original poster hadn't given up on his idea"

On Thu, Jan 14, 2010 at 2:23 AM, Babak Farhang wrote:
> Hi,
>
> I've been thinking about how to update a single field

incremental document field update

2010-01-14 Thread Babak Farhang
Hi, I've been thinking about how to update a single field of a document without touching its other fields. This is an old problem and I was considering a solution along the lines of Andrzej Bialecki's post to the dev list back in '07: http://markmail.org/message/tbkgmnilhvrt6bii > I have the fo