Awesome, thanks for bringing closure Vitaly.

Mike McCandless
http://blog.mikemccandless.com

On Mon, Jun 4, 2012 at 3:10 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
> Thanks for the tip, Mike. After changing the three calls
>
> IndexWriter.commit();
>
> <revert merge policy to allow merging to happen>
>
> IndexWriter.maybeMerge();
> IndexWriter.waitForMerges();
>
> to simply calling IndexWriter.close(true) the disk size and run time are
> now very close to the case of parallel segment merges.
>
> On Sat, Jun 2, 2012 at 6:43 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> On Fri, Jun 1, 2012 at 8:09 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
>> > Yes, I am only calling IndexWriter.addDocument()
>>
>> OK.
>>
>> > Interestingly, relative performance of either approach seems to greatly
>> > depend on the number of documents per index. In both types of runs, I used
>> > 10 writer threads, each writing documents with the same set of fields (but
>> > random values), into its own index as fast as possible, on a 16 core box,
>> > using a rotational disk for index storage (results from my original post
>> > were obtained from a Fusion IO drive, and an even higher # of cores per
>> > machine).
>>
>> Mmmmm Fusion IO drive :)
>>
>> > For smaller index sizes, the choice of whether to merge segments
>> > in parallel makes much less of a difference, if at all.
>> >
>> > So the matrix looks like this:
>> >
>> > # docs/index   concurrent merges?   total time, sec   total disk size
>> > =====================================================================
>> > 200K           Y                    56.8              1.5 G
>> > 200K           N                    59.6              2.6 G
>> > 1M             Y                    304               7.4 G
>> > 1M             N                    493               14 G
>> >
>> > As you can see, the total size on disk is always much larger when merging
>> > at the end; here are directory listings, for each case:
>>
>> OK so for a biggish index merging concurrently is faster; this is what
>> I'd expect.
>>
>> > Concurrent merging:
>> >
>> > total 150M
>> > -rw-r--r-- 1 bench perf    0 2012-06-01 16:33 write.lock
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _a.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _a.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _a.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _a.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _a.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _l.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _l.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _l.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _l.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _l.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _w.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _w.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _w.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _w.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _w.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _17.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _17.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _17.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _17.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _17.frq
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1j.cfs
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _1i.fnm
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1k.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1m.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1l.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1n.cfs
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _1i.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _1i.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _1i.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _1i.frq
>> > -rw-r--r-- 1 bench perf 148K 2012-06-01 16:33 _1p.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1o.cfs
>> > -rw-r--r-- 1 bench perf  28M 2012-06-01 16:33 _0.cfx
>> > -rw-r--r-- 1 bench perf 2.8K 2012-06-01 16:33 segments_2
>> > -rw-r--r-- 1 bench perf   20 2012-06-01 16:33 segments.gen
>> >
>> > Deferred merging:
>> >
>> > total 261M
>> > -rw-r--r-- 1 bench perf    0 2012-06-01 16:41 write.lock
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _0.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _3.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _2.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _4.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _6.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _5.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _7.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _9.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _8.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _a.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _c.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _b.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _d.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _f.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _e.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _g.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _i.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _h.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _j.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _l.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _k.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _m.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _n.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _p.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _o.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _q.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _s.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _r.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _t.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _v.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _u.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _w.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _x.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _z.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _y.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _11.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _10.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _13.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _12.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _16.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _15.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _14.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _18.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _17.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1b.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1a.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _19.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1d.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1c.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1g.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1f.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1e.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1j.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1i.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1h.cfs
>> > -rw-r--r-- 1 bench perf  28M 2012-06-01 16:41 _0.cfx
>> > -rw-r--r-- 1 bench perf 137K 2012-06-01 16:42 _1k.cfs
>> > -rw-r--r-- 1 bench perf  12K 2012-06-01 16:42 segments_2
>> > -rw-r--r-- 1 bench perf   20 2012-06-01 16:42 segments.gen
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1l.fnm
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1n.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1l.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1l.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1l.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1l.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1o.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1n.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1n.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1n.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1n.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1p.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1o.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1o.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1o.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1o.frq
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1p.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1p.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1p.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1p.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1m.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1m.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1m.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1m.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1m.frq
>>
>> Hmm: you should close the writer (or do a final commit) before testing
>> the size of the index. I suspect in the 2nd case because no final
>> commit happened, the original segments are still around.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
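
Following up on Mike's point that index size should only be measured after the writer has been closed (or a final commit done): below is a minimal sketch, in plain Java, of how the on-disk total for one index directory might be computed. It assumes a flat directory of segment files like the listings above. The names IndexDirSize and totalBytes are hypothetical and not part of Lucene; the IndexWriter.close(true) call from the thread appears only in a comment, since the sketch does not depend on Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene: sums file sizes in one index directory.
class IndexDirSize {

    /** Returns the total size, in bytes, of the regular files directly under dir. */
    static long totalBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Lucene index directories are flat, so subdirectories are skipped.
                if (Files.isRegularFile(entry)) {
                    total += Files.size(entry);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IndexDirSize <index-directory>");
            return;
        }
        // Assumption: the benchmark has already called IndexWriter.close(true)
        // (or done a final commit) before this runs, as suggested in the thread;
        // otherwise segments that were merged away may still be counted.
        Path dir = Paths.get(args[0]);
        System.out.println(dir + ": " + totalBytes(dir) + " bytes");
    }
}
```

Run against each of the per-thread index directories after the writers are closed, this would give totals comparable to the "total disk size" column in the matrix.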