Re: FieldCache and 2.9

Yonik Seeley Tue, 11 May 2010 06:41:46 -0700

You are requesting the FieldCache entry from the top-level reader and
hence a whole new FieldCache entry must be created.
Lucene 2.9 sorting requests FieldCache entries at the segment level
and hence reuses entries for those segments that haven't changed.


-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Tue, May 11, 2010 at 9:27 AM, Carl Austin <carl.aus...@detica.com> wrote:
> Hi,
>
> I have been using the FieldCache in lucene version 2.9 compared to that
> in 2.4. The load time is massively decreased, however I am not seeing
> any benefit in getting a field cache after re-open of an index reader
> when I have only added a few extra documents.
> A small test class is included below (based off one from Lucid
> Imagination), that creates 5Mil docs, gets a field cache, creates
> another few docs and gets the field cache again. I though the second get
> would be very very fast, as only 1 segment should have changed, however
> it takes more time for the reopen and cache get than it did the
> original.
>
> Am I doing something wrong here or have I misunderstood the new segment
> changes?
>
> Thanks
>
> Carl
>
>
> import java.io.File;
>
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.FieldCache;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> public class ContrivedFCTest {
>
>        public static void main(String[] args) throws Exception {
>                Directory dir = FSDirectory.open(new File(args[0]));
>                IndexWriter writer = new IndexWriter(dir, new
> SimpleAnalyzer(), true,
>                                IndexWriter.MaxFieldLength.LIMITED);
>                for (int i = 0; i < 5000000; i++) {
>                        if (i % 100000 == 0) {
>                                System.out.println(i);
>                        }
>                        Document doc = new Document();
>                        doc.add(new Field("field", "String" + i,
> Field.Store.NO,
>                                        Field.Index.NOT_ANALYZED));
>                        writer.addDocument(doc);
>                }
>                writer.close();
>
>                IndexReader reader = IndexReader.open(dir, true);
>                long start = System.currentTimeMillis();
>                FieldCache.DEFAULT.getStrings(reader, "field");
>                long end = System.currentTimeMillis();
>                System.out.println("load time for initial field cache:"
> + (end - start)
>                                / 1000.0f + "s");
>
>                writer = new IndexWriter(dir, new SimpleAnalyzer(),
> false,
>                                IndexWriter.MaxFieldLength.LIMITED);
>                for (int i = 5000001; i < 5000005; i++) {
>                        if (i % 100000 == 0) {
>                                System.out.println(i);
>                        }
>                        Document doc = new Document();
>                        doc.add(new Field("field", "String" + i,
> Field.Store.NO,
>                                        Field.Index.NOT_ANALYZED));
>                        writer.addDocument(doc);
>                }
>                writer.close();
>
>                IndexReader reader2 = reader.reopen(true);
>                System.out.println("reader size = " +
> reader2.numDocs());
>                long start2 = System.currentTimeMillis();
>                FieldCache.DEFAULT.getStrings(reader2, "field");
>                long end2 = System.currentTimeMillis();
>                System.out.println("load time for re-opened field
> cache:"
>                                + (end2 - start2) / 1000.0f + "s");
>        }
> }
>
> This message should be regarded as confidential. If you have received this 
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy by 
> an authorised signatory.  The contents of this email may relate to dealings 
> with other companies within the Detica Limited group of companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: FieldCache and 2.9

Reply via email to