This *may* be relevant, I haven't needed to investigate it yet...
http://issues.apache.org/jira/browse/LUCENE-845 Also, see the thread titled "MergeFactor and MaxBufferedDocs value should ...?" for an interesting discussion of how to optimize indexing, although I'm not sure the notion of using IndexWriter.ramSizeInBytes is of too much use when merging indexes.... Erick On 4/18/07, d m <[EMAIL PROTECTED]> wrote:
I'd like to share index merge performance data and have a couple of questions about it... We (AXS-One, www.axsone.com) build one "master" index per day. For backup and recovery purposes, we also build many individual "mini" indexes from the docs added to the master index. Should one of our master indexes become unusable (for whatever reason - and I'm glad to say this has not yet happened), we plan to reconstruct it by merging its mini indexes. I've done some merge testing so we have an idea of how long it will take to reconstruct a master index. For testing purposes, I have created 1,000 mini indexes. Each: - contains 1,000 documents - is optimized - uses the compound file format The avg doc size across the 1 million docs is: 10.8 KB My testing has been to merge the 1,000 mini indexes to an empty destination index. Destination index settings: - mergeFactor: 40 - minMergeDocs: 10,000 - maxMergeDocs: Integer.MAX_VALUE - use compound file format Those values were obtained from some empirical (but not exhaustive) merge testing. In each test run I merge N mini indexes into a single destination index. Each merge starts with an empty destination index. N increases by 25 for each data point. This means our test merges 25 minis to an empty index. Then merges 50 minis to an empty index. Etc... until it merges 1,000 minis to an empty index. Mini indexes merged with V2.1 were indexed with V2.1. Mini indexes merged with V2.0 were indexed with V2.0. Hardware: - 64-bit - 4 CPUs: AMD Opteron 280, 2.41 GHz - 12.8 GB RAM - 1.63 TB disk space / 6 SCSI drives / RAID ?? Software: - Lucene V2.1 and V2.0 - Java 1.6 - Windows Server 2003, SP 1 I did tests with: - Lucene 2.1 using addIndexesNoOptimize() - 2 identical runs - Lucene 2.1 using addIndexes() - 1 run - Lucene 2.0 using addIndexes() - 1 run The recorded merge times (in seconds) include a final call to optimize() the destination index after returning from addIndexesNoOptimize() or addIndexes(). I've included the test data below. (If you'd like, I can email an Excel version of the data with a graph.) A few things caught my attention (seen easily when graphing "Indexes" vs "Merge Time (secs)"): 1. The runs (3 & 4) using addIndexes() show a relatively smooth increase in merge times (as expected). 2. The 2.1 runs (1 & 2) using addIndexesNoOptimize() show multiple spikes in times for a particular merge count - with the next merge counts running faster. The pattern of spikes was identical in both runs. The most notable spike occurs in the addIndexesNoOptimize() merge of 900 indexes with took 44:26 (mm:ss) in one run and 43:37 in the other. In both runs the merge times for 925, 950, 975, and 1000 indexes took less time than the 900 merge. 3. Overall, using addIndexes() appears to be faster than addIndexesNoOptimize(). 4. V2.0 addIndexes() performs better than V2.1 addIndexes(). Look at the very last row of data below - it is the merge rate (in docs/min) for each test run. Can someone explain what might be happening to cause the spikes in 2.1, not seen in 2.0? Any thoughts on 2.0 merging faster than 2.1? Thanks, david. Run 1: Lucene 2.1 / addIndexesNoOptimize() Run 2: Repeat Run 1 Run 3: Lucene 2.1 / addIndexes() Run 4: Lucene 2.0 / addIndexes() All runs include a final call to optimize() Merge Times (seconds) Numb Run Run Run Run Idxs 1 2 3 4 25 39 59 44 38 50 73 93 83 81 75 113 131 128 120 100 147 169 163 154 125 179 198 193 220 150 222 241 227 231 175 246 261 249 239 200 266 273 269 266 225 297 301 288 283 250 323 325 312 308 275 461 471 343 337 300 393 388 376 364 325 424 423 410 401 350 465 467 445 438 375 498 504 475 466 400 527 528 516 503 425 586 567 608 587 450 677 656 703 675 475 876 880 786 750 500 841 832 872 821 525 920 914 937 924 550 1213 1206 1038 995 575 1094 1065 1137 1069 600 1250 1207 1275 1189 625 1367 1337 1385 1315 650 1473 1433 1454 1396 675 1600 1575 1499 1468 700 1570 1552 1563 1516 725 1605 1587 1602 1581 750 1852 1808 1687 1627 775 1761 1719 1732 1668 800 1829 1821 1876 1753 825 2167 2138 1882 1832 850 2045 2042 2057 1887 875 2169 2207 2101 2025 900 2666 2617 2138 2014 925 2174 2218 2206 2057 950 2390 2391 2193 2121 975 2304 2322 2247 2149 1000 2321 2322 2321 2227 20500 43423 43248 41820 40095 <- Totals 28326 28441 29412 30667 <- Merge rate: Docs per minute