I would like to try replacing our external document storage with Lucene
stored fields, so a few questions before we proceed:

Background: We currently store complete documents in a simple binary file and
keep only offsets into this file as a stored field in the Lucene index.
Documents (compressed) are short, on average 300-400 bytes per document.

What we would like to try is to store these documents as binary stored fields
(compressed with a simple/fast static Huffman coder plus a dictionary), in
order to make maintenance of this setup easier: Lucene already handles updates
and fetching of stored fields, and indexing now works very well from multiple
threads. The complete document would be the only stored field we have in the
index (I know about lazy loading).
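
Roughly what I have in mind on the indexing side, as a sketch only: compress()
stands in for our static Huffman + dictionary coder, the field name is made
up, and the exact binary-field constructor differs between Lucene versions:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Sketch: index one document whose only stored field is its compressed
    // body. compress() is our static Huffman + dictionary coder.
    void addCompressedDocument(IndexWriter writer, String originalText)
            throws IOException {
        byte[] compressed = compress(originalText);             // avg 300-400 bytes
        Document doc = new Document();
        doc.add(new Field("doc", compressed, Field.Store.YES)); // binary stored field
        // ... the usual indexed-but-unstored fields for searching go here ...
        writer.addDocument(doc);
    }

Fetching would then just be searcher.doc(n).getBinaryValue("doc"), or whatever
the binary accessor is called in the Lucene version at hand, followed by
decompression.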
Put simply, what we do today is: search the index, get the offsets from the
found documents, and then fetch the documents from the binary file.
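
For comparison, today's retrieval path looks roughly like this (again just a
sketch; the length-prefixed record layout and all names are assumptions, and
decompress() is our Huffman decoder):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.IndexSearcher;

    // Sketch: resolve one hit to its document bytes via the external file.
    byte[] fetchDocument(IndexSearcher searcher, RandomAccessFile data, int docId)
            throws IOException {
        Document hit = searcher.doc(docId);       // stored field holds only the offset
        long offset = Long.parseLong(hit.get("offset"));
        data.seek(offset);                        // jump into the binary file
        int length = data.readInt();              // records assumed length-prefixed
        byte[] compressed = new byte[length];
        data.readFully(compressed);
        return decompress(compressed);            // static Huffman + dictionary
    }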

So the questions would be:

1. Does anyone see any theoretical or practical reason why our homemade path,
"fetch offset from Lucene stored field -> jump to this offset in the document
file", would be faster than "fetch the stored field from Lucene directly"?

2. We use the Lucene index with MMapDirectory now, so the concern is that the
index could grow too large for MMAP with a stored field like that. Is there a
way to say "do not use MMapDirectory for stored fields, use FSDirectory
instead"? I think not, but it is worth asking, as I think this could be a
useful thing... Imagine being able to say "load terms with RAMDirectory,
postings with MMAP, and stored fields with FSDirectory" (see the sketch after
this question)... of course, this is only for searching, not indexing.
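
What I imagine is something like the sketch below. FileSwitchDirectory and its
constructor are hypothetical here (the idea is a wrapper that routes index
files by extension); what is real is that the stored fields live in the
.fdt/.fdx files, so routing by extension would be enough:

    import java.io.File;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.MMapDirectory;

    // Hypothetical helper: stored-field files (.fdt/.fdx) are served by a
    // plain FSDirectory, everything else (terms, postings, ...) stays
    // memory-mapped. FileSwitchDirectory is the imagined routing class.
    Directory openSplitDirectory(File path) throws IOException {
        Set<String> storedFieldExtensions =
            new HashSet<String>(Arrays.asList("fdt", "fdx"));
        return new FileSwitchDirectory(
            storedFieldExtensions,
            FSDirectory.open(path),    // primary: files matching the extensions
            new MMapDirectory(path),   // secondary: all other index files
            true);                     // close both wrapped directories together
    }

The terms-in-RAM part would just be another level of the same switch, routed
on the term dictionary files.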

Thanks a lot.
Eks
 




