John,

Sure, you can add identical documents to the index if you like. I don't think Lucene requires a unique ID field; only Solr does. Lucene documents have internal doc IDs that are auto-generated when indexing or merging index segments.

If I remember correctly, Lucene 4.1 started doing cross-document compression of stored fields, so duplicated stored values should compress well.
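Something like this should work (a minimal sketch against the Lucene 4.x API; the index path and field names are just placeholders):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DuplicateStoredFields {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/tmp/demo-index")),
        new IndexWriterConfig(Version.LUCENE_46,
            new StandardAnalyzer(Version.LUCENE_46)));

    for (String line : new String[] { "first entry", "second entry" }) {
      Document doc = new Document();
      // Same stored field name and value in every document -- Lucene doesn't mind.
      doc.add(new StoredField("filename", "server.log"));
      // The indexed text differs per document.
      doc.add(new TextField("contents", line, Field.Store.NO));
      writer.addDocument(doc);
    }
    writer.close();
  }
}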
Thanks Tri. I've tried a variation of the approach you suggested here and it appears to work well. Just one question: will there be
a problem with adding multiple Document objects to the IndexWriter that have the same field names and values for the StoredFields?
They all have different TextField values.
As docIDs are ints too, it's most likely he'll hit the limit of ~2B documents per index with that approach though :)

I do agree that indexing huge documents doesn't seem to have a lot of value; even when you know a doc is a hit for a certain query, how are you going to display the results to users?
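For a rough sense of scale (back-of-envelope only; the ~100-byte average line length is just an assumption):

public class DocCountEstimate {
  public static void main(String[] args) {
    long fileBytes   = 2L << 30;  // one 2 GB log file
    long avgLineSize = 100;       // assumed average bytes per line
    long docsPerFile = fileBytes / avgLineSize;          // ~21.5M line-docs
    long filesToCap  = Integer.MAX_VALUE / docsPerFile;  // ~100 such files
    System.out.println(docsPerFile + " docs per file; ~" + filesToCap
        + " files before the per-index docID limit");
  }
}

So a single 2 GB file is nowhere near the cap; it's a large collection of them that gets you there.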
You should consider making each _line_ of the log file a (Lucene)
document (assuming it is a log-per-line log file).
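Roughly along these lines (untested sketch; the field names are made up, and it assumes an IndexWriter configured as usual):

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

public class LogLineIndexer {
  public static void indexLogLines(IndexWriter writer, String path) throws Exception {
    BufferedReader reader = new BufferedReader(new FileReader(path));
    try {
      String line;
      int lineNo = 0;
      while ((line = reader.readLine()) != null) {
        Document doc = new Document();
        doc.add(new StoredField("path", path));        // which file the line came from
        doc.add(new StoredField("lineNo", ++lineNo));  // position within the file
        doc.add(new TextField("contents", line, Field.Store.NO)); // searchable text
        writer.addDocument(doc);
      }
    } finally {
      reader.close();
    }
  }
}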
-Glen
On Fri, Feb 14, 2014 at 4:12 PM, John Cecere wrote:
> I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At
> any rate, I don't have control over the size of the documents that go into
> my database.
I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At any rate, I don't have control over the size of the
documents that go into my database. Sometimes my customer's log files end up really big. I'm willing to have huge indexes for these
things.
Wouldn't just changing from int to long for the offsets solve this?
Hmm, why are you indexing such immense documents?
In 3.x Lucene never sanity checked the offsets, so we would silently
index negative (int overflow'd) offsets into e.g. term vectors.
But in 4.x, we now detect this and throw the exception you're seeing,
because it can lead to index corruption.
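Just to illustrate the arithmetic (not actual Lucene code):

public class OffsetOverflow {
  public static void main(String[] args) {
    int offset = Integer.MAX_VALUE; // offset of a token near the 2 GB mark
    offset += 10;                   // the next token pushes it past 2^31-1 ...
    System.out.println(offset);     // prints -2147483639: a negative offset
  }
}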