It looks like the attachment containing the algorithm was missing from my last
email, so I have pasted the text here. Thanks in advance for any help.

#Start of the wikipedia-default.alg file

merge.factor=mrg:10:10:10
max.field.length=2147483647
#max.buffered=buf:10:10:100:100
ram.flush.mb=flush:16:16:16

compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
log.step=5000

docs.file=temp/enwiki-20070527-pages-articles.xml

content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# tasks at this depth or less print when they start
task.max.depth.log=2

log.queries=false
# -------------------------------------------------------------------------------------

{ "Rounds"

    ResetSystemErase

    { "Populate"
        CreateIndex
        { "MAddDocs" AddDoc > : 200000
        CloseIndex
    }

    NewRound

} : 3

RepSumByName
RepSumByPrefRound MAddDocs

#End of wikipedia-default.alg file
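In the algorithm above, properties such as `merge.factor=mrg:10:10:10` and `ram.flush.mb=flush:16:16:16` supply one value per round. The text before the first colon (`mrg`, `flush`) is the column name that appears in the reports below. The following is a minimal Java sketch of that interpretation. It assumes that values cycle by round index when there are more rounds than values. The class and method names are hypothetical and are not part of the Lucene benchmark API.

```java
import java.util.Arrays;

/** Hypothetical helper: interprets a per-round .alg property value such as "mrg:10:10:10". */
public class RoundValueSketch {

    /** Report column label, e.g. "mrg" for "mrg:10:10:10"; null for a plain single value. */
    static String label(String spec) {
        int colon = spec.indexOf(':');
        return colon < 0 ? null : spec.substring(0, colon);
    }

    /** Value used in the given round; assumes values wrap around when rounds outnumber them. */
    static String valueForRound(String spec, int round) {
        String[] parts = spec.split(":");
        if (parts.length == 1) {
            return parts[0]; // plain value, identical in every round
        }
        String[] values = Arrays.copyOfRange(parts, 1, parts.length);
        return values[round % values.length];
    }

    public static void main(String[] args) {
        String mergeFactor = "mrg:10:10:10";
        String flushMb = "flush:16:16:16";
        // The algorithm runs 3 rounds ("} : 3"), so print the values each round would use.
        for (int round = 0; round < 3; round++) {
            System.out.println("round " + round
                + " " + label(mergeFactor) + "=" + valueForRound(mergeFactor, round)
                + " " + label(flushMb) + "=" + valueForRound(flushMb, round));
        }
    }
}
```

Because every round in this file uses the same values (10 and 16), the three rounds are repeated measurements of a single configuration rather than a parameter sweep.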

Thanks,

Sean


From: Sean Tong [mailto:st...@jamasoftware.com]
Sent: Sunday, December 11, 2011 11:54 PM
To: java-user@lucene.apache.org
Subject: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

Hi,

We plan to upgrade the Lucene library in our application from 2.4.1 to 3.5.0. I
have been running the benchmark tests that come with Lucene, and to my surprise,
indexing the Wikipedia data is significantly slower in 3.5.0 than in 2.4.1.

Attached is the algorithm file used for the tests. The tests used the default
Lucene settings for the RAM flush size (16 MB) and merge factor (10), and 512 MB
of memory was allocated to the tasks. The test machine is a 64-bit Windows 7
machine with an Intel Core i7 processor.

The command:
%ant -Dtask.alg=conf/wikipedia-default.alg -Dtask.mem=512M run-task

Here are the test results:

Lucene 2.4.1

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
     [java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
     [java] MAddDocs_200000      0  16.00   10       1      200000   1,609.1      124.29   89,218,496  241,631,232
     [java] MAddDocs_200000      1  16.00   10       1      200000   1,746.4      114.52  102,365,864  241,762,304
     [java] MAddDocs_200000      2  16.00   10       1      200000   1,566.8      127.65   69,428,144  174,194,688


Lucene 2.9.4

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
     [java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
     [java] MAddDocs_200000      0  16.00   10       1      200000  1,046.49      191.12   82,676,152  139,657,216
     [java] MAddDocs_200000      1  16.00   10       1      200000  1,165.35      171.62  119,364,128  156,762,112
     [java] MAddDocs_200000      2  16.00   10       1      200000  1,245.86      160.53   50,361,760  137,625,600

Lucene 3.5.0

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
     [java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
     [java] MAddDocs_200000      0  16.00   10       1      200000    676.48      295.65   70,917,592  129,695,744
     [java] MAddDocs_200000      1  16.00   10       1      200000    626.13      319.42   50,329,552   94,240,768
     [java] MAddDocs_200000      2  16.00   10       1      200000    687.68      290.83   57,732,640   92,864,512


Indexing with 2.4.1 is about 2.3x as fast as with 3.5.0 (per round, the ratio
ranges from roughly 2.3x to 2.8x). Did I miss any settings or configurations?
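The speed ratio can be checked against the rec/s column. The short Java sketch below uses only the numbers reported above (the class name is illustrative). It computes the per-round ratio between 2.4.1 and 3.5.0 and the ratio of the mean throughputs. It also checks that rec/s is simply recsPerRun divided by elapsedSec.

```java
/** Compares the per-round indexing throughput reported above for Lucene 2.4.1 and 3.5.0. */
public class ThroughputRatio {

    // rec/s values copied from the MAddDocs_200000 rows, rounds 0..2
    static final double[] LUCENE_241 = {1609.1, 1746.4, 1566.8};
    static final double[] LUCENE_350 = {676.48, 626.13, 687.68};

    /** Ratio of 2.4.1 throughput to 3.5.0 throughput, one entry per round. */
    static double[] perRoundRatios() {
        double[] ratios = new double[LUCENE_241.length];
        for (int i = 0; i < ratios.length; i++) {
            ratios[i] = LUCENE_241[i] / LUCENE_350[i];
        }
        return ratios;
    }

    /** Arithmetic mean of an array of values. */
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Ratio of the mean throughputs across all rounds. */
    static double meanRatio() {
        return mean(LUCENE_241) / mean(LUCENE_350);
    }

    public static void main(String[] args) {
        double[] r = perRoundRatios();
        for (int i = 0; i < r.length; i++) {
            System.out.printf("round %d: %.2fx%n", i, r[i]);
        }
        System.out.printf("mean: %.2fx%n", meanRatio());
        // Sanity check: rec/s = recsPerRun / elapsedSec (2.4.1, round 0)
        System.out.printf("200000 / 124.29 = %.1f rec/s%n", 200000 / 124.29);
    }
}
```

Using the ratio of means rather than the mean of ratios keeps the comparison tied to total throughput. With only three rounds, both approaches give roughly 2.5x here.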

Thanks,

Sean

