I half remember that this has come up before, but I couldn't find the thread. I was running some tests over the weekend that involved indexing 1.9M documents from the English Wiki dump.
I'm consistently seeing that trunk takes about twice as long to index the docs as 1.4.1, 3.2 and 3x. Optimize is also taking quite a bit longer (159 seconds versus 21 to 26).

I admit that these aren't very sophisticated tests, and I only ran the trunk process twice (although both runs were consistent). I'm pretty sure my ramBufferSizeMB and autoCommit settings are identical. I remove the data/index directory before each run. For these results I ran the indexing program in IntelliJ on my Mac, with both the server and the indexing program running locally. No, trunk isn't compiling before running <G>.

Here's the server definition:

    new StreamingUpdateSolrServer(url, 10, 4);

and I'm sending the documents to Solr in batches of 1,000.

So, my question is whether this should be pursued. Note that I'm still getting around 3K docs/second, which I can't complain about. Not that that stops me, you understand. And in return for a memory footprint reduction from 389M to 90M after some off-the-wall sorting and faceting, I'll take it!

Hmmmm, speaking of which, the memory usage changes seem like a good candidate for a page on the Wiki. Anyone want to suggest a home?

Solr 1.4.1
Total Time Taken-> 257 seconds
Total documents added-> 1917728
Docs/sec-> 7461
starting optimize
optimizing took 26 seconds

Solr 3.2
Total Time Taken-> 243 seconds
Total documents added-> 1917728
Docs/sec-> 7891
starting optimize
optimizing took 21 seconds

Solr 3x
Total Time Taken-> 269 seconds
Total documents added-> 1917728
Docs/sec-> 7129
starting optimize
optimizing took 21 seconds

Solr trunk. 2011-6-11: 17:24 EST
Total Time Taken-> 592 seconds
Total documents added-> 1917728
Docs/sec-> 3239
starting optimize
optimizing took 159 seconds

What do folks think? Is there anything I can/should do to narrow this down?

Erick
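
For anyone who wants to compare numbers, here is a minimal sketch of the kind of harness described above. It is not the actual indexing program. `IndexBench`, `sendInBatches` and the `Consumer` sink are hypothetical stand-ins; in a real run the sink would presumably hand each batch to the Solr client. The Docs/sec figures above look like total documents divided by elapsed whole seconds, truncated, and that is what `docsPerSecond` assumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IndexBench {
    // Assumed batch size, matching the batches of 1,000 mentioned above.
    public static final int BATCH_SIZE = 1000;

    // Whole documents per second, truncated (integer division).
    public static long docsPerSecond(long totalDocs, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return totalDocs / elapsedSeconds;
    }

    // Hands docs to the sink in lists of at most batchSize; returns the batch count.
    // The sink is a stand-in for whatever actually sends a batch to the server.
    public static <T> int sendInBatches(Iterable<T> docs, int batchSize,
                                        Consumer<List<T>> sink) {
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        for (T doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list; the sink may keep the old one
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Dummy documents, only to exercise the batching loop.
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        int batches = sendInBatches(docs, BATCH_SIZE, b -> { /* send to server here */ });
        System.out.println("batches sent: " + batches);

        // Elapsed times taken from the results above.
        long total = 1917728L;
        long[] seconds = {257L, 243L, 269L, 592L};
        String[] labels = {"1.4.1", "3.2", "3x", "trunk"};
        for (int i = 0; i < seconds.length; i++) {
            System.out.println(labels[i] + " docs/sec: " + docsPerSecond(total, seconds[i]));
        }
    }
}
```

On the times reported above, trunk's indexing run (592 seconds) is roughly 2.4 times the 3.2 run (243 seconds), and the optimize step (159 versus 21 seconds) is about 7.5 times slower.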
