On 10/11/2013 03:19 PM, Michael Sokolov wrote:
On 10/11/2013 03:04 PM, Adrien Grand wrote:
On Fri, Oct 11, 2013 at 7:03 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
I've been running some tests comparing storing large fields (documents of, say, 100K .. 10M) as files vs. storing them in Lucene as stored fields. Initial results seem to indicate that storing them externally is a win (at least for binary docs, which don't compress, and presumably we could compress the external files if we wanted to), which seems to make sense. There will be some issues with huge directories, but that might be worth solving.

So I'm wondering if there is a codec that does that? I haven't seen one
talked about anywhere.
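Just to make the experiment concrete, this is roughly the shape of what I've been testing -- not a codec at all, just application-level code that writes the value to an external file and indexes only a relative path (the class and field names here are made up for illustration, not anything in Lucene):

    import java.io.IOException;
    import java.nio.file.*;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;

    // Hypothetical helper: write the large value to an external file and
    // index only its relative path, instead of using a stored field.
    public class ExternalStoredField {
        private final Path baseDir;  // root of the external blob store

        public ExternalStoredField(Path baseDir) {
            this.baseDir = baseDir;
        }

        public void addDocument(IndexWriter writer, String id, byte[] largeValue)
                throws IOException {
            // two-level hierarchy to keep any single directory from getting huge
            String bucket = id.length() >= 2 ? id.substring(0, 2) : "00";
            Path file = baseDir.resolve(bucket).resolve(id + ".bin");
            Files.createDirectories(file.getParent());
            Files.write(file, largeValue);

            Document doc = new Document();
            doc.add(new StringField("id", id, Field.Store.YES));
            // store only the path; the body itself lives outside the index
            doc.add(new StringField("body_path",
                    baseDir.relativize(file).toString(), Field.Store.YES));
            writer.addDocument(doc);
        }
    }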
I don't know of any codec that works this way, but such a codec would quickly exceed the number of available file descriptors.

I'm not sure I understand. I was thinking that the stored fields would be accessed infrequently (only when writing or reading a particular stored field value), and a file descriptor would only be in use during that read/write operation -- it wouldn't be held open. So, for example, during query scoring one wouldn't need to visit these fields, I think? But I may have a fundamental misunderstanding about how Lucene uses its codecs: this is new to me.
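In other words, something like the following at retrieval time, where the descriptor only lives for the duration of one read (again just a sketch, reusing the made-up "body_path" field from above; searcher setup omitted):

    import java.io.IOException;
    import java.nio.file.*;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.IndexSearcher;

    // Hypothetical retrieval path: look up the stored relative path, then read
    // the external file. The file is opened and closed inside this one call,
    // so no descriptor is held open between requests.
    public class ExternalFieldReader {
        public static byte[] loadBody(IndexSearcher searcher, int docId, Path baseDir)
                throws IOException {
            Document doc = searcher.doc(docId);
            String relPath = doc.get("body_path");
            return Files.readAllBytes(baseDir.resolve(relPath));
        }
    }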

-Mike
My thought was to keep a folder hierarchy (per segment, I think) to avoid having too many files in a single folder -- maybe that's the problem you were referring to, Adrien? But there is a real problem in that there doesn't seem to be sufficient information available when merging to avoid copying the files. It would be nice to hard-link a file in order to move it to a new segment; without that, I think the gains will be much less attractive.
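What I mean is, if a merge could just re-link an existing blob into the new segment's folder instead of rewriting it, something like this (assuming the filesystem supports hard links; the paths and class name are made up):

    import java.io.IOException;
    import java.nio.file.*;

    // Sketch: "move" a per-document blob into a new segment's folder without
    // copying the bytes, by creating a hard link (falling back to a real copy
    // if the filesystem doesn't support links).
    public class BlobRelinker {
        public static void relink(Path oldSegmentFile, Path newSegmentDir)
                throws IOException {
            Files.createDirectories(newSegmentDir);
            Path target = newSegmentDir.resolve(oldSegmentFile.getFileName());
            try {
                Files.createLink(target, oldSegmentFile);  // new link, same inode
            } catch (UnsupportedOperationException | FileSystemException e) {
                Files.copy(oldSegmentFile, target);        // fallback: copy the bytes
            }
        }
    }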

-Mike
