Yes, in 4.x IndexWriter.addDocument now takes an Iterable<? extends
IndexableField> that enumerates the fields one at a time.

You can also pass a Reader to a Field.
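Putting those two pieces together, here is a rough sketch of streaming fields one at a time instead of materializing a full Document. It assumes Lucene 4.x on the classpath; the `ReaderSource` interface is a hypothetical placeholder for however you open each field's content:

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

/** Streams one field at a time; only the current field's Reader is live. */
class LazyFields implements Iterable<IndexableField> {

    /** Hypothetical hook: open a Reader over one field's content on demand. */
    interface ReaderSource {
        Reader open(String fieldName);
    }

    private final List<String> names;
    private final ReaderSource source;

    LazyFields(List<String> names, ReaderSource source) {
        this.names = names;
        this.source = source;
    }

    @Override
    public Iterator<IndexableField> iterator() {
        final Iterator<String> it = names.iterator();
        return new Iterator<IndexableField>() {
            public boolean hasNext() {
                return it.hasNext();
            }
            public IndexableField next() {
                String name = it.next();
                // TextField(String, Reader) tokenizes from the Reader, so
                // the full field text never has to exist as one String.
                return new TextField(name, source.open(name));
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

Since addDocument accepts any Iterable of fields in 4.x, you can then pass this directly: `writer.addDocument(new LazyFields(fieldNames, mySource));` -- only one field's Reader is open at a time, though IndexWriter still buffers the resulting postings in RAM (see below).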

That said, there will still be massive RAM required by IW to hold the
inverted postings for that one document, likely much more RAM than the
original document's String contents.

And, such huge documents are rarely useful in practice.  E.g., how
will you "deliver" that hit to the end user at search time?  Will
scores actually make sense for such enormous documents?  It's better
to break them up into more manageable sizes.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
<ishalymi...@yandex-team.ru> wrote:
> Hello!
>
> I've run into a problem indexing huge documents. The indexing itself goes
> fine, but when document processing becomes concurrent, OutOfMemoryErrors
> start appearing (even with a heap of about 32GB).
> The issue, as I see it, is that I have to create a Document instance to send 
> it to IndexWriter, and Document is just a collection of all the fields, all 
> in RAM.
> With my huge fields, it would be much better to be able to send document
> fields for writing one by one, keeping no more than a single field in RAM.
> Is it possible in the latest Lucene?
>
> --
> Best Regards,
> Igor Shalyminov
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
