On 10/11/2013 03:19 PM, Michael Sokolov wrote:
On 10/11/2013 03:04 PM, Adrien Grand wrote:
On Fri, Oct 11, 2013 at 7:03 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
I've been running some tests comparing storing large fields (documents, say 100K .. 10M) as files vs. storing them in Lucene as stored fields. Initial results seem to indicate storing them externally is a win (at least for binary docs which don't compress, and presumably we can compress the external files if we want, too), which seems to make sense. There will be some issues with huge directories, but that might be worth solving.

So I'm wondering if there is a codec that does that? I haven't seen one talked about anywhere.
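For illustration, a minimal sketch of the two variants being compared -- bytes kept inside the index as a stored field vs. written to an external file whose path is stored instead. The field names and file layout are made up for the example; this is not the actual benchmark code from this thread:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.StoredField;

    public class LargeFieldStorageSketch {

        // Variant 1: keep the bytes inside the index as a binary stored field.
        static Document asStoredField(byte[] content) {
            Document doc = new Document();
            doc.add(new StoredField("body", content));
            return doc;
        }

        // Variant 2: write the bytes to an external file and store only its path.
        static Document asExternalFile(byte[] content, Path externalDir, String docId)
                throws IOException {
            Path file = externalDir.resolve(docId + ".bin");
            Files.write(file, content);
            Document doc = new Document();
            doc.add(new StoredField("bodyPath", file.toString()));
            return doc;
        }
    }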
I don't know of any codec that works this way, but such a codec would quickly exceed the number of available file descriptors.
I'm not sure I understand. I was thinking that the stored fields would be accessed infrequently (only when writing or reading the particular stored field value), and the file descriptor would only be in use during the read/write operation - they wouldn't be held open. So, for example, during query scoring one wouldn't need to visit these fields, I think? But I may have a fundamental misunderstanding about how Lucene uses its codecs: this is new to me.
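To make that access pattern concrete, a small hypothetical example of reading one externally stored value: the file is opened, read, and closed within a single call, so the descriptor is only held for the duration of that read (the file-per-value layout is just an assumption for the sketch):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class ExternalFieldReaderSketch {
        // Read one externally stored field value. Files.readAllBytes opens the
        // file, reads it fully, and closes it again, so no descriptor stays
        // open between calls.
        static byte[] readFieldValue(Path file) throws IOException {
            return Files.readAllBytes(file);
        }
    }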
-Mike
My thought was to keep a folder hierarchy (per-segment, I think) to avoid having too many files in a single folder -- maybe that's the problem you were referring to, Adrien? But there is a real problem in that there doesn't seem to be sufficient information available when merging to avoid copying files. It would be nice to be able to hard-link a file in order to move it to a new segment; without that, I think the gains will be much less attractive.
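A rough sketch of that idea (hypothetical layout, not an existing Lucene codec): each segment gets its own folder of per-document files, and a merge hard-links a file into the new segment's folder instead of copying the bytes. Files.createLink only works on filesystems that support hard links:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class ExternalFieldMergeSketch {
        // "Move" a per-document file from an old segment's folder into a new
        // segment's folder by creating a hard link; the bytes are not copied,
        // both folders end up pointing at the same underlying file.
        static Path linkIntoNewSegment(Path indexRoot, String oldSegment,
                                       String newSegment, String fileName)
                throws IOException {
            Path existing = indexRoot.resolve(oldSegment).resolve(fileName);
            Path target = indexRoot.resolve(newSegment).resolve(fileName);
            Files.createDirectories(target.getParent());
            Files.createLink(target, existing);
            return target;
        }
    }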
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org