hey,

so what I wonder in general is whether the benchmarks are comparable.
The benchmark code has changed a lot since 2.4, so there might be
additional fields and/or different settings for what gets indexed and
how.
Could you check with Luke whether the indexes have the same fields and
whether the settings are the same or similar, and report back? I also
wonder if it now uses update instead of add, i.e. buffers and applies
deletes etc.
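
To put numbers on the gap before digging into fields and settings, here is a minimal sketch that averages the rec/s values from the three MAddDocs rounds quoted below and compares each run against 2.4.1. The class and method names (ThroughputGap, mean, slowdown) are made up for illustration, and averaging over rounds is just one reasonable way to summarize the runs; it is not something the benchmark reports itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputGap {

    /** Arithmetic mean of the per-round rec/s values. */
    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Ratio of mean baseline throughput to mean candidate throughput. */
    public static double slowdown(double[] baseline, double[] candidate) {
        return mean(baseline) / mean(candidate);
    }

    public static void main(String[] args) {
        // rec/s per round, copied from the MAddDocs_200000 reports in this thread.
        double[] lucene241 = {1609.1, 1746.4, 1566.8};

        Map<String, double[]> runs = new LinkedHashMap<String, double[]>();
        runs.put("2.9.4", new double[] {1046.49, 1165.35, 1245.86});
        runs.put("3.5.0 StandardAnalyzer", new double[] {676.48, 626.13, 687.68});
        runs.put("3.5.0 ClassicAnalyzer", new double[] {715.76, 679.04, 761.95});

        System.out.printf("%-24s mean %8.2f rec/s%n", "2.4.1", mean(lucene241));
        for (Map.Entry<String, double[]> run : runs.entrySet()) {
            System.out.printf("%-24s mean %8.2f rec/s, 2.4.1 runs at %.2fx this speed%n",
                    run.getKey(), mean(run.getValue()), slowdown(lucene241, run.getValue()));
        }
    }
}
```

Compile and run with `javac ThroughputGap.java && java ThroughputGap`. Comparing single rounds gives slightly different ratios than comparing the averages, so treat the output as a rough guide only.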

simon

On Mon, Dec 12, 2011 at 10:03 PM, Sean Tong <st...@jamasoftware.com> wrote:
> Thanks Simon for your response.
>
> I just re-ran the 3.5 benchmark with the ClassicAnalyzer. Here are the 
> results:
>
>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
>     [java] MAddDocs_200000     0 16.00  10        1       200000       715.76      279.42    48,828,144    128,057,344
>     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -  -  679.04 -  - 294.53 -  68,321,424 -   85,721,088
>     [java] MAddDocs_200000     2 16.00  10        1       200000       761.95      262.49    63,139,256     91,881,472
>
> The performance is slightly better than with StandardAnalyzer, but it is
> still much worse than the performance with 2.4.1.
>
> Sean
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Monday, December 12, 2011 12:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia 
> data?
>
> hey,
>
> can you try the ClassicAnalyzer instead of the StandardAnalyzer in 3.5? In
> 3.5 the StandardAnalyzer is a different implementation than in 2.9 and 2.4.
> Or rerun the 2.4 benchmarks with a WhitespaceAnalyzer, just for the
> comparison.
>
> simon
>
> On Mon, Dec 12, 2011 at 7:08 PM, Sean Tong <st...@jamasoftware.com> wrote:
>> Looks like the attachment with the algorithm is missing from my last email.
>> I have pasted the text here. Thanks in advance for any help.
>>
>> #Start of the wikipedia-default.alg file
>>
>> merge.factor=mrg:10:10:10
>> max.field.length=2147483647
>> #max.buffered=buf:10:10:100:100
>> ram.flush.mb=flush:16:16:16
>>
>> compound=true
>>
>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>> directory=FSDirectory
>>
>> doc.stored=true
>> doc.tokenized=true
>> doc.term.vector=false
>> log.step=5000
>>
>> docs.file=temp/enwiki-20070527-pages-articles.xml
>>
>> content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
>>
>> query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
>>
>> # task at this depth or less would print when they start
>> task.max.depth.log=2
>>
>> log.queries=false
>> # -------------------------------------------------------------------------------------
>>
>> { "Rounds"
>>
>>    ResetSystemErase
>>
>>    { "Populate"
>>        CreateIndex
>>        { "MAddDocs" AddDoc > : 200000
>>        CloseIndex
>>    }
>>
>>    NewRound
>>
>> } : 3
>>
>> RepSumByName
>> RepSumByPrefRound MAddDocs
>>
>> #End of wikipedia-default.alg file
>>
>> Thanks,
>>
>> Sean
>>
>>
>> From: Sean Tong [mailto:st...@jamasoftware.com]
>> Sent: Sunday, December 11, 2011 11:54 PM
>> To: java-user@lucene.apache.org
>> Subject: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?
>>
>> Hi,
>>
>> We plan to upgrade the Lucene library in our application from 2.4.1 to
>> 3.5.0. I have been running the benchmark tests that come with Lucene. To my
>> surprise, I found that indexing in 3.5.0 is significantly slower than in
>> 2.4.1 for the Wikipedia data.
>>
>> Attached is the algorithm for the tests. The tests used the default Lucene
>> settings for flush memory size and merge factor. 512M of memory was used for
>> the tasks. The test machine is a 64-bit Windows 7 machine with an Intel Core
>> i7.
>>
>> The command:
>> %ant -Dtask.alg=conf/wikipedia-default.alg -Dtask.mem=512M run-task
>>
>> Here are the test results:
>>
>> Lucene 2.4.1
>>
>>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>>     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
>>     [java] MAddDocs_200000     0 16.00  10        1       200000      1,609.1      124.29    89,218,496    241,631,232
>>     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -  - 1,746.4 -  - 114.52 - 102,365,864 -  241,762,304
>>     [java] MAddDocs_200000     2 16.00  10        1       200000      1,566.8      127.65    69,428,144    174,194,688
>>
>>
>> Lucene 2.9.4
>>
>>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>>     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
>>     [java] MAddDocs_200000     0 16.00  10        1       200000     1,046.49      191.12    82,676,152    139,657,216
>>     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -   1,165.35 -  - 171.62 - 119,364,128 -  156,762,112
>>     [java] MAddDocs_200000     2 16.00  10        1       200000     1,245.86      160.53    50,361,760    137,625,600
>>
>> Lucene 3.5.0
>>
>>     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
>>     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
>>     [java] MAddDocs_200000     0 16.00  10        1       200000       676.48      295.65    70,917,592    129,695,744
>>     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -  -  626.13 -  - 319.42 -  50,329,552 -   94,240,768
>>     [java] MAddDocs_200000     2 16.00  10        1       200000       687.68      290.83    57,732,640     92,864,512
>>
>>
>> The indexing speed with 2.4.1 is about 2.3x the speed with 3.5.0. Did I
>> miss any settings or configurations?
>>
>> Thanks,
>>
>> Sean
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
