Hi list! I'm forwarding as somehow I did not put the list in the CC but the answer I think is noteworthy, so here it is. Please remember to use StringBuffer before blaming lucene ;-)
Actual time consumed by lucene is now ~130 minutes as opposed to 20 hours which is neat. I can do much more passes during the day. Mateusz On Wed, Jun 10, 2009 at 2:31 AM, Michael McCandless<luc...@mikemccandless.com> wrote: > Oh actually could you send this response to the full list (so they see > closure too)? Thanks. > > Mike > > On Tue, Jun 9, 2009 at 6:32 PM, Mateusz Berezecki<mateu...@gmail.com> wrote: >> Hi Michael, >> >> It took a while but thanks to your suggestions I've started poking >> around and it turned out that the bottleneck was with GC for JVM and >> in my use of String instead of StringBuffer. The thing is now over 20 >> times faster ! >> >> Thanks a lot ! >> >> Mateusz >> >> On Mon, Jun 8, 2009 at 2:13 PM, Michael >> McCandless<luc...@mikemccandless.com> wrote: >>> On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki<mateu...@gmail.com> wrote: >>> >>>> Thanks for a prompt response. >>> >>> You're welcome! >>> >>>>> A mergeFactor of 150 is way too high; I'd put that back to 10 and see >>>>> if the problem persists. Also make sure you're using >>>>> autoCommit=false, and try the suggestions here: >>>>> >>>>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed >>>> >>>> I've set mergeFactor to 10, 15 and 20 before trying out 150 and the >>>> problem persisted, although I have to admit that 2.9 gives some >>>> serious speed improvements as compared to 2.4.1 which I believe is a >>>> good sign, i.e. it reaches the same document that causes deadlock much >>>> faster than 2.4.1 does >>> >>> Hmm: do you know for certain that a particular document causes this? >>> If you make a standalone test indexing only that document, does the >>> problem happen? >>> >>>>> You're sure the JRE's heap size is big enough? >>>> >>>> I've set it to 3.8 GB and I'm running it on a desktop with 4 GB of RAM. >>> >>> OK sounds like plenty, though likely the OS won't give you 3.8 GB (if >>> the JRE is 32-bit). >>> >>>>> If the problem persists... can you turn on IndexWriter's infoStream >>>>> and post the resulting output leading up to the 100% CPU? You might >>>>> also try "kill -QUIT" when the 100% CPU problem is happening, to catch >>>>> the stack trace of all threads, and post that too... >>>> >>>> Not sure how do I turn on the infoStream and autoCommit? WRT to >>>> autoCommit I did not use the deprecated API with autoCommit flags in >>>> constructors, so assuming I used the recommended API is the autoCommit >>>> on/off by default? >>> >>> For infoStream, eg: IndexWriter.setInfoStream(System.out); >>> >>> And, yes, since you're using a non-deprecated ctor of IndexWriter, you >>> are getting autoCommit=false, so that's good. >>> >>> Mike >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org