hey,

can you try the ClassicAnalyzer instead of the StandardAnalyzer in 3.5?
In 3.5 the StandardAnalyzer is a different implementation than in 2.9
and 2.4. Alternatively, rerun the 2.4 benchmarks with a
WhitespaceAnalyzer, just for the comparison.
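
Roughly, that should only mean changing the analyzer line in your .alg
file. A sketch, assuming the class names below are right for the release
you run (I have not run this myself, so check them against your jars):

```
# 3.5.0 run: ClassicAnalyzer in place of the 3.x StandardAnalyzer
analyzer=org.apache.lucene.analysis.standard.ClassicAnalyzer

# comparison run: WhitespaceAnalyzer (swap this line in instead)
#analyzer=org.apache.lucene.analysis.WhitespaceAnalyzer
```

Everything else in the file can stay as it is, so the only variable
between runs is the analyzer.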

simon

On Mon, Dec 12, 2011 at 7:08 PM, Sean Tong <st...@jamasoftware.com> wrote:
> Looks like the attachment with the algorithm is missing from my last email. I
> have pasted the text here. Thanks in advance for any help.
>
> #Start of the wikipedia-default.alg file
>
> merge.factor=mrg:10:10:10
> max.field.length=2147483647
> #max.buffered=buf:10:10:100:100
> ram.flush.mb=flush:16:16:16
>
> compound=true
>
> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> directory=FSDirectory
>
> doc.stored=true
> doc.tokenized=true
> doc.term.vector=false
> log.step=5000
>
> docs.file=temp/enwiki-20070527-pages-articles.xml
>
> content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
>
> query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
>
> # task at this depth or less would print when they start
> task.max.depth.log=2
>
> log.queries=false
> # -------------------------------------------------------------------------------------
>
> { "Rounds"
>
>    ResetSystemErase
>
>    { "Populate"
>        CreateIndex
>        { "MAddDocs" AddDoc > : 200000
>        CloseIndex
>    }
>
>    NewRound
>
> } : 3
>
> RepSumByName
> RepSumByPrefRound MAddDocs
>
> #End of wikipedia-default.alg file
>
> Thanks,
>
> Sean
>
>
> From: Sean Tong [mailto:st...@jamasoftware.com]
> Sent: Sunday, December 11, 2011 11:54 PM
> To: java-user@lucene.apache.org
> Subject: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?
>
> Hi,
>
> We plan to upgrade the Lucene library in our application from 2.4.1 to 3.5.0.
> I have been running the benchmark tests that come with Lucene. To my surprise,
> I found that indexing in 3.5.0 is significantly slower than in 2.4.1 for the
> Wikipedia data.
>
> Attached is the algorithm for the tests. The tests used the default Lucene
> settings for flush memory size and merge factor. 512M of memory was used for
> the tasks. The test machine is a 64-bit Windows 7 machine with an Intel Core i7.
>
> The command:
> %ant -Dtask.alg=conf/wikipedia-default.alg -Dtask.mem=512M run-task
>
> Here are the test results:
>
> Lucene 2.4.1
>
>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>     [java] Operation       round flush mrg runCnt recsPerRun    rec/s elapsedSec  avgUsedMem avgTotalMem
>     [java] MAddDocs_200000     0 16.00  10      1     200000  1,609.1     124.29  89,218,496 241,631,232
>     [java] MAddDocs_200000     1 16.00  10      1     200000  1,746.4     114.52 102,365,864 241,762,304
>     [java] MAddDocs_200000     2 16.00  10      1     200000  1,566.8     127.65  69,428,144 174,194,688
>
>
> Lucene 2.9.4
>
>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>     [java] Operation       round flush mrg runCnt recsPerRun    rec/s elapsedSec  avgUsedMem avgTotalMem
>     [java] MAddDocs_200000     0 16.00  10      1     200000 1,046.49     191.12  82,676,152 139,657,216
>     [java] MAddDocs_200000     1 16.00  10      1     200000 1,165.35     171.62 119,364,128 156,762,112
>     [java] MAddDocs_200000     2 16.00  10      1     200000 1,245.86     160.53  50,361,760 137,625,600
>
> Lucene 3.5.0
>
>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>     [java] Operation       round flush mrg runCnt recsPerRun    rec/s elapsedSec  avgUsedMem avgTotalMem
>     [java] MAddDocs_200000     0 16.00  10      1     200000   676.48     295.65  70,917,592 129,695,744
>     [java] MAddDocs_200000     1 16.00  10      1     200000   626.13     319.42  50,329,552  94,240,768
>     [java] MAddDocs_200000     2 16.00  10      1     200000   687.68     290.83  57,732,640  92,864,512
>
>
> Indexing with 2.4.1 is about 2.3x as fast as with 3.5.0. Did I miss any
> settings or configuration?
>
> Thanks,
>
> Sean
>
>
