Hello all, I am experiencing some performance problems indexing large(ish) amounts of text using the Field.Store.COMPRESS option when creating a Field in Lucene.
I have a sample document with about 4.5MB of text to be stored as compressed data within the field, and indexing this one document takes an inordinate amount of time (over 10 minutes!). When debugging I can see that it's stuck in the deflate() calls of the Deflater used by Lucene. I noted that Lucene by default uses the Deflater.BEST_COMPRESSION compression level when it encounters a compressed field.

I'm not sure if it would help my particular situation, but is there any way to provide the option of specifying the compression level? The level used by Lucene (level 9) is the maximum possible. Ideally I would like to be able to alter the compression level based on the field size, so that I can smooth out compression times across the various document sizes. I am more interested in consistent time than in consistent compression.

Or... could there be some other reason my document takes this long to index (and holds up all the other threads)?

Thanks.
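For what it's worth, here is the kind of workaround I've been experimenting with: skip the built-in compressed-field handling and run java.util.zip.Deflater myself at a caller-chosen level, then store the resulting bytes (e.g. as a binary stored field). This is just a standalone sketch with no Lucene dependency; the class name, buffer size, and test data are my own, and the timing comparison in main() is only illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

public class CompressAtLevel {

    // Compress input at an explicit level instead of the hard-wired
    // Deflater.BEST_COMPRESSION (level 9) that Lucene uses.
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length / 2 + 16);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Roughly the size of the problem document (contents are made up).
        byte[] data = new byte[4_500_000];
        Arrays.fill(data, (byte) 'a');

        long t0 = System.nanoTime();
        byte[] fast = compress(data, Deflater.BEST_SPEED);       // level 1
        long t1 = System.nanoTime();
        byte[] best = compress(data, Deflater.BEST_COMPRESSION); // level 9
        long t2 = System.nanoTime();

        System.out.println("level 1: " + fast.length + " bytes in "
                + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("level 9: " + best.length + " bytes in "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The compressed bytes could then be stored directly in the document instead of relying on Field.Store.COMPRESS, at the cost of having to inflate them yourself at retrieval time.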