Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Ok. Just to follow up, I performed the same steps with another of our indexes and did not have the same issue:
Opening index @ /lucenedata/index4
Segments file=segments_85 numSegments=1 version=FORMAT_HAS_PROX [Lucene 2.4]
1 of 1: name=_42 docCount=3986767 compound=true hasProx=true

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Michael McCandless-2 wrote:
> That exception seems to indicate that the fdx file being opened by
> FieldsReader is 0 length (it's trying to read the first int from that
> file).
>
> Is the exception repeatable, if you try again to call
> IndexReader.open?
>
> It's odd that CheckIndex finds
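One way to test the zero-length-file hypothesis quoted above is to scan the index directory for empty files before opening it. The following is only a minimal sketch using the JDK file API; the class `ZeroLengthFileCheck` and its method `findEmptyFiles` are hypothetical names, not part of Lucene. It also has a clear limit: when an index uses the compound format, the .fdx data is packed inside the .cfs file, so this check can only spot an empty top-level file, not an empty sub-file inside the compound file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene: reports zero-length regular files
// found directly inside a directory (for example, an index directory).
class ZeroLengthFileCheck {

    // Returns the sorted names of all regular files in indexDir whose size is 0 bytes.
    static List<String> findEmptyFiles(Path indexDir) throws IOException {
        List<String> empty = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && Files.size(file) == 0) {
                    empty.add(file.getFileName().toString());
                }
            }
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // The directory to scan is taken from the first argument; defaults to ".".
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : findEmptyFiles(indexDir)) {
            System.out.println("zero-length file: " + name);
        }
    }
}
```

Running it on both the machine where CheckIndex passes and the one where the open fails would show whether the two actually see the same file sizes.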

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Toke Eskildsen wrote:
> A quick check when a corrupt index problem is encountered:
> Does any of your machines run Java 1.6.0_04-1.6.0_10b25?

Thanks Toke. As I mentioned in my response to Erick, this is complicated by the fact that the error is within a Java stored procedure in Oracle. Th

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-07 Thread 1world1love
Erick Erickson wrote:
> I guess my first question, based on your statement that you ran
> checkindex from a different machine would be whether you have
> the same version of Lucene installed on both machines? And how
> did you get your index where it is now? Did you optimize it in place
> or d

java.io.IOException: read past EOF non-corrupt index

2009-01-06 Thread 1world1love
Greetings all. I have an index that I have optimized, and when I try to open the index I get this:
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
    at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInte

Re: optimize: went from 14488449 to 38449

2008-12-22 Thread 1world1love
Michael McCandless-2 wrote:
> How did you delete the documents? EG, by docID using IndexReader, by
> Term or Query using IndexWriter?
>
> And when you said your previous index had 14488449 docs, was numDocs()
> or maxDoc()?

I deleted by docID. I got the number from numDocs(). Jus

Re: optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ganesh - yahoo wrote:
> Optimize will remove the deletes and rearrange the document numbers.
>
> Have you done some deletes before deleting 1.3 million docs?

No, that is the crazy part. I haven't done anything to this index since it was first compiled until I did the deletes. That is
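The behaviour described in the quote (deletes are only marked until an optimize drops them and renumbers the remaining documents) can be illustrated with a toy model. This is only a sketch: `ToySegment` is a hypothetical class written for illustration, not Lucene's API or implementation. Its `numDocs()` and `maxDoc()` methods are named after the Lucene methods mentioned in this thread and are assumed here to mean "live documents" and "highest document number plus one" respectively.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only (hypothetical, not Lucene code): documents are stored in a
// list, and a delete just marks the slot until optimize() compacts the list.
class ToySegment {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    void add(String doc) {
        docs.add(doc);
    }

    // Mark a document as deleted by its document number; the slot stays occupied.
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such doc: " + docId);
        }
        deleted.set(docId);
    }

    // Counts every slot, including documents that are marked deleted.
    int maxDoc() {
        return docs.size();
    }

    // Counts only live documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // Drop deleted documents; survivors are renumbered from 0 without gaps.
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs = live;
        deleted = new BitSet();
    }

    String get(int docId) {
        return docs.get(docId);
    }
}
```

In this model a document number saved before `optimize()` can point at a different document afterwards, which is why deleting by document number from a stale list is risky.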

optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ok. This is crazy. I have an index with 14,488,449 docs in it. Today I did a CheckIndex on it and everything looked fine. I made a copy of the index, ran a delete on about 1.3 million docs and then did an optimize and now my doc count is 38449. The index was originally built with 2.3, but I am no

RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-17 Thread 1world1love
Just an FYI in case anyone runs into something similar. Essentially I had indexes that I have been searching from a Java stored procedure in Oracle without issue for a while. All of a sudden, I started getting the error I alluded to above when there were more than a certain number of terms (4,5, o

Re: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-16 Thread 1world1love
OK, a little more information: I run this query via a Java stored procedure within Oracle. However, I just ran the same query using the same code compiled in a separate class from a CL on a different server that has the same filesystem mounted. The queries ran fine from there. So I am wondering

help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-16 Thread 1world1love
Greetings all. I am having an issue that is driving me mad. I have many indexes ranging in size from 500K docs to 40 million docs. When I do a simple query containing multiple terms on any of the indexes, I get this:
java.lang.ArrayIndexOutOfBoundsException
    at org.apache.lucene.util.ScorerDoc

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread 1world1love
Thanks Erick. That is what I was assuming, but I couldn't confirm whether it was worth going down those paths to achieve what I was hoping. Your essay was very informative about realistic expectations with the FieldSelector. I actually just got through reading the discussion on deprecating Hits which ess

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread 1world1love
karl wettin-3 wrote:
> I might be missing something here -- can't you just add the age field
> to the index and include that in your query?

Thanks for the response Karl: I just used the age field as an example, but in reality the structured data is copious and complex relationshi

retrieve all docs efficiently - just one field

2008-06-10 Thread 1world1love
Greetings all. I have read many posts concerning similar use cases, but I am still a little hazy on the best way to achieve what I need to do. Here is the background: 2 million documents with multiple sections, some sections contain structured data, some unstructured. We parse the docs and place

Re: storing position - keyword

2008-03-05 Thread 1world1love
First off Karl, thanks for your reply and your time.

karl wettin-3 wrote:
> One could also say you are classifying your data based on keywords in
> the text?

I probably didn't explain myself very well or, more specifically, provide a good example. In my case, there really isn't any relatio

storing position - keyword

2008-03-05 Thread 1world1love
Greetings all. I am indexing a set of documents where I am extracting terms and mapping them to a controlled vocabulary and then placing the matched vocabulary in a keyword field. What I want to know is if there is a way to store the original term location with the keyword field? Example Text: "T