Re: external file stored field codec

Michael Sokolov Sun, 13 Oct 2013 17:11:17 -0700

On 10/13/2013 1:52 PM, Adrien Grand wrote:

Hi Michael,


I'm not aware enough of operating system internals to know what
exactly happens when a file is open but it sounds to me like having
separate files per document or field adds levels of indirection when
loading stored fields, so I would be surprised if it actually proved
to be more efficient than storing everything in a single file.

That's true, Adrien, there's definitely a cost to using files. There are some gnarly challenges in here (mostly to do with the large number of files, as you say, and with cleaning up after deletes - deletion is always hard). I'm not sure it's going to be possible to both clean up and maintain files for stale commits; this will become problematic in the way that having index files on NFS mounts is problematic.

I think the hope is that there will be countervailing savings during writes and merges (mostly) because we may be able to cleverly avoid copying the contents of stored fields being merged. There may also be savings when querying due to reduced RAM requirements since the large stored fields won't be paged in while performing queries. As I said, some simple tests do show improvements under at least some circumstances, so I'm pursuing this a bit further. I have a preliminary implementation as a codec now, and I'm learning a bit about Lucene's index internals. BTW SimpleTextCodec is a great tool for learning and debugging.
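
To make the "avoid copying during merges" idea concrete, here is a minimal sketch in plain Java. It is an illustration only: it does not use Lucene's codec API, and the class name ExternalFieldStore, the file naming scheme, and the mergeInto method are all hypothetical. It assumes one file per (document, field) pair and a file system that supports hard links, falling back to a copy otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: one external file per stored field. Not Lucene API. */
class ExternalFieldStore {
    private final Path dir;

    ExternalFieldStore(Path dir) throws IOException {
        // Create the directory (and any missing parents) that holds the field files.
        this.dir = Files.createDirectories(dir);
    }

    /** File holding one field of one document; the naming scheme is an assumption. */
    Path fileFor(int docId, String field) {
        return dir.resolve(docId + "_" + field + ".dat");
    }

    void write(int docId, String field, byte[] value) throws IOException {
        Files.write(fileFor(docId, field), value);
    }

    byte[] read(int docId, String field) throws IOException {
        return Files.readAllBytes(fileFor(docId, field));
    }

    /**
     * "Merges" a field into another store under a new document id by
     * hard-linking the existing file, so the bytes are not rewritten.
     * Falls back to a plain copy where hard links are unsupported.
     */
    void mergeInto(ExternalFieldStore target, int docId, String field, int newDocId)
            throws IOException {
        Path src = fileFor(docId, field);
        Path dst = target.fileFor(newDocId, field);
        try {
            Files.createLink(dst, src);
        } catch (UnsupportedOperationException e) {
            Files.copy(src, dst);
        }
    }
}
```

With hard links, the merged store and the old one share the same bytes on disk, and removing the old file later does not affect the merged one. This does nothing to address the cleanup and stale-commit problems mentioned above.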

The background for this is a document store with large files (think PDFs, but lots of formats) that have to be tracked, and have associated metadata. We've been storing these externally, but it would be beneficial to have a single data management layer: i.e. to push this down into Lucene, for a variety of reasons. For one, we could rely on Solr to do our replication for us.


I'll post back when I have some measurements.

-Mike
