: I need to store all the attributes of the document i index as part of the : index. And I need to get the size of the files as close to 20% of the : original size as possible. If anyone can help with this I can pay a nominal : fee. Please contact me if anyone can help.
Let's be clear about something here: Lucene is an "indexing" technology, not a compression technology. Lucene indexes *can* be smaller then the source corpus they index for a variety of reasons (most notably because the inverted index structure allows a more susinect represetation of the same words appearing in multiple docuemnts, but things like stop word removal play a big part as well) but this reduced size only applies to the *indexed* fields. If you "store" data in Lucene there is nothing intrinisic in the Lucene that makes it smaller. If you have a chunk of text, and you tell Lucene to store that text, and index that text, it doesn't matter how space efficient the "index" is, Lucene still still needs to store all of that text. Here's a few simple steps to achieve what you want, and i won't charge you anything for it... 1) take your existing code for building an index, and change it so that all of the options to "store" field values are commented out, and only the "indexed" fields exist. 2) index your corpus of size X, note the size of the index Y. compute the value Z such that: Z = (0.20 * X) - Y 3) go find a compression library A that can compress your corpus of size X down to size Z 4) go edit your orriginal code to use algorithm A to compress all of your stored fields before adding them to Lucene. Viola! -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]