Re: semi-infinite loop during merging

2009-04-24 Thread Christiaan Fluit
Michael McCandless wrote: - even though the commitMerge returns false, it should probably not get into an infinite loop. Is this an internal Lucene problem or is there something I can/should do about it myself? Yes, something is wrong with Lucene's handling of OOME. It certainly should not lea

Re: semi-infinite loop during merging

2009-04-21 Thread Christiaan Fluit
Christiaan Fluit wrote: It seems that it gets up to the point to commit, but the "IW: commitMerge done" message is never reached. Furthermore, no exceptions are printed to the output, so handleMergeException does not seem to have been invoked. Should I add more debug statements

Re: semi-infinite loop during merging

2009-04-21 Thread Christiaan Fluit
Michael McCandless wrote: One question: are you using IndexWriter.close(false)? I wonder if there's some path whereby the merges fail to abort (and simply keep retrying) if you do that... No, I don't. More inlined below... On Thu, Apr 16, 2009 at 5:42 AM, Christiaan Fluit wrote

Re: MergeException

2009-04-21 Thread Christiaan Fluit
Michael McCandless wrote: On Tue, Apr 21, 2009 at 4:26 PM, Christiaan Fluit wrote: I have experienced similar problems (see the "semi-infinite loop during merging" thread - still working out the problem): the merger gets into an infinite loop and causes my drive to be filled with

Re: MergeException

2009-04-21 Thread Christiaan Fluit
I have experienced similar problems (see the "semi-infinite loop during merging" thread - still working out the problem): the merger gets into an infinite loop and causes my drive to be filled with temporary files that are not deleted, until it runs out of space. Sometimes it exits with a Merge

Re: semi-infinite loop during merging

2009-04-16 Thread Christiaan Fluit
threads; do you know how to do this on Windows? If so, can you do that at the end when IW starts doing this infinite merging? That would be very helpful towards understanding why this recursion is happening (though it is spooky that this is all happening under JET...) Mike On Tue, Apr 14, 200

semi-infinite loop during merging

2009-04-14 Thread Christiaan Fluit
Hello all, I have a very peculiar problem that is driving me crazy: on some of our datasets and at some point in time during indexing, the merge operation runs into a (semi-)infinite loop and keeps adding files to the index until it runs out of free disk space. The situation: I have an index

Re: Exchange/PST/Mail parsing

2007-07-02 Thread Christiaan Fluit
Hello Grant (cc-ing aperture-devel), I am one of the Aperture admins, so I can tell you a bit more about Aperture's mail facilities. Short intro: Aperture is a framework for crawling and full-text and metadata extraction of a growing number of sources and file formats. We try to select the best

Re: which way to index pdf,word,excel

2006-09-06 Thread Christiaan Fluit
Kind regards, Christiaan Fluit -- Aduna - Guided Exploration www.aduna-software.com Prinses Julianaplein 14-b 3817 CS Amersfoort The Netherlands +31-33-4659987 (office)

Re: Lucene indexing RDF

2006-06-28 Thread Christiaan Fluit
adasal wrote: As far as I have researched this I know that the Gnowsis project uses both RDF and Lucene, but I have not had time to determine their relationship. www.gnowsis.org/ I can tell you a bit about Gnowsis, as we (Aduna) are cooperating with the Gnowsis people on RDF creation, storage

Aperture 2006.1 alpha 2 released

2006-03-09 Thread Christiaan Fluit
aperture-devel mailing list. Regards, Christiaan Fluit. -- [EMAIL PROTECTED] Aduna Prinses Julianaplein 14-b 3817 CS Amersfoort The Netherlands +31 33 465 9987 phone +31 33 465 9987 fax http://aduna.biz

Re: Word files & Build vs. Buy?

2006-02-10 Thread Christiaan Fluit
Dmitry Goldenberg wrote: Awesome stuff. A few questions: is your Excel extractor somehow better than POI's? and, what do you see as the timeframe for adding WordPerfect support? Are you considering supporting any other sources such as MS Project, Framemaker, etc? I just committed a WordPerfectE

Re: Word files & Build vs. Buy?

2006-02-09 Thread Christiaan Fluit
Nick Burch wrote: You could try using org.apache.poi.hwpf.HWPFDocument, and getting the range, then the paragraphs, and grab the text from each paragraph. If there's interest, I could probably commit an extractor that does this to poi. Yes, that's exactly what I'm doing. Having this in POI wo

Re: Word files & Build vs. Buy?

2006-02-09 Thread Christiaan Fluit
Hello all, I'm replying to two threads at once as what I have to say relates to both. My company recently started an open source project called Aperture (http://sourceforge.net/projects/aperture), together with the German DFKI institute. The project is still very much in alpha stage, but I do