RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
That presumably isn't healthy.
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 21:27
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index
It kind of sounds like those files are corrupted, but I can...

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Grant Ingersoll
...reckon I can merge the .fdt, .prx and .frq into a compound index?
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index
Can you try a smaller s...

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
...Lucene documents indexed in it now, do you reckon I can merge the .fdt, .prx and .frq into a compound index?
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
> Note that IndexReader has a main() that will list the contents of compound index files.
It looks like some of my index is compound and some isn't. My not-very-well-informed guess is that an optimize() got interrupted somewhere along the line. If I try to optimize the index now, it throws except...

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
I just tried to optimise my index, using the lucli command line client, and got:
8<
lucli> optimize
Starting to optimize index.
java.io.IOException: Cannot overwrite: /mnt/sdb1/lucene-index/index-1/_2lhqi.fnm
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.j...

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Doug Cutting
Rob Staveley (Tom) wrote:
> Is there a tool I can use to see how much of the index is occupied by the different fields I am indexing?
Note that IndexReader has a main() that will list the contents of compound index files.
Doug

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
...Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index
It seems odd to me that, if you are using the CFS format, you would have the .fdt, .frq and .prx files in addition to the .cfs files. My understanding is that all files (e...

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Grant Ingersoll
It seems odd to me that, if you are using the CFS format, you would have the .fdt, .frq and .prx files in addition to the .cfs files. My understanding is that all files (except deletable and segments) get put inside the CFS file. Looking at my indices, I only have the CFS file. Are you optim...
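Grant's point can be checked mechanically by grouping the directory listing by segment name. The sketch below is a stdlib-only Java illustration, not a Lucene tool. It assumes per-segment files are named _<segment>.<ext>, and the set of extensions it treats as belonging inside _<segment>.cfs (the hypothetical CFS_MEMBERS) is an assumption based on the file types mentioned in this thread plus the term dictionary files; files such as .del can legitimately sit outside the compound file. The class name SegmentFormatCheck is also hypothetical.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SegmentFormatCheck {

    // Assumption: per-segment extensions that normally end up inside
    // _<segment>.cfs when the compound format is in use.
    static final Set<String> CFS_MEMBERS =
            Set.of("fnm", "fdt", "fdx", "frq", "prx", "tis", "tii");

    // Group file names of the form "_<segment>.<ext>" by segment name.
    // Names not starting with "_" (e.g. "segments", "deletable") are skipped.
    static Map<String, Set<String>> extensionsBySegment(String[] fileNames) {
        Map<String, Set<String>> bySegment = new TreeMap<>();
        for (String name : fileNames) {
            int dot = name.indexOf('.');
            if (!name.startsWith("_") || dot < 0) {
                continue;
            }
            bySegment.computeIfAbsent(name.substring(0, dot), k -> new TreeSet<>())
                     .add(name.substring(dot + 1));
        }
        return bySegment;
    }

    // Segments that have a .cfs file AND loose files that would normally be
    // inside it: the mixed compound/non-compound state described in the thread.
    static Set<String> mixedSegments(String[] fileNames) {
        Set<String> mixed = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : extensionsBySegment(fileNames).entrySet()) {
            Set<String> exts = e.getValue();
            if (exts.contains("cfs") && exts.stream().anyMatch(CFS_MEMBERS::contains)) {
                mixed.add(e.getKey());
            }
        }
        return mixed;
    }

    public static void main(String[] args) {
        String[] names = new File(args.length > 0 ? args[0] : ".").list();
        if (names == null) {
            System.err.println("Not a readable directory");
            return;
        }
        // One line per segment, listing the extensions present for it.
        for (Map.Entry<String, Set<String>> e : extensionsBySegment(names).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        System.out.println("Segments with both .cfs and loose files: " + mixedSegments(names));
    }
}
```

Run it against the index directory (for example /mnt/sdb1/lucene-index/index-1); any segment reported as mixed is one where, on Grant's understanding, loose files and a .cfs coexist when they shouldn't.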

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
...Subject: RE: Seeing what's occupying all the space in the index
Are you by any chance using different field names for each document, or do you have a wide range of field names that aren't the same for each document? ... You mentioned indexing emails, and email has a very loose header structur...
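If field names do vary a lot, one cost that scales with the number of distinct names is norms. Assuming the Lucene 1.x behaviour of one norm byte per document for every indexed field in a segment (including documents that never use that field), a rough estimate is simply documents times distinct indexed field names. The class and method names are hypothetical, and the 5,000-name figure is invented purely for illustration.

```java
final class NormsEstimate {

    // Assumption: one norm byte per document per indexed field name,
    // charged to every document in the segment, not only those using the field.
    static long normBytes(long docCount, long distinctIndexedFields) {
        return docCount * distinctIndexedFields;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        // 37 fields per message part, as described in the thread.
        System.out.println("37 field names:    " + normBytes(docs, 37) + " bytes");
        // Hypothetical: loose e-mail headers indexed under 5,000 distinct names.
        System.out.println("5,000 field names: " + normBytes(docs, 5_000) + " bytes");
    }
}
```

With 37 names that is 37 bytes per message part; with thousands of distinct header names it grows to kilobytes per part, which is the order of magnitude that would show up against 23K per part. Whether it is the actual cause here would need checking against the real field list.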

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
> Is there anything I can learn from the index directory's file listing?
Running this nasty little BASH one-liner...
$ for i in `ls * | perl -nle 'if (/^.+(\..+)/) {print $1;}' | sort | uniq`;do ls -l *$i | awk '{SUM = SUM + $5} END {if (SUM > 1e10) {print "'$i': ", SUM}}'; done
... I see...
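The same per-extension totals can be produced without Perl and awk, for example on a machine without a Unix shell. This is a plain Java sketch, not a Lucene utility: like the Perl regex in the one-liner, it treats everything from the last dot as the extension, but it prints every extension rather than only those above 1e10 bytes, and it groups dot-less files (such as segments or deletable) under a hypothetical "(none)" label. The class name SizeByExtension is also hypothetical.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

final class SizeByExtension {

    // Extension is everything from the last '.' onward (".cfs", ".fdt", ...),
    // matching the greedy Perl regex /^.+(\..+)/ in the one-liner.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "(none)" : name.substring(dot);
    }

    // Sum file lengths in bytes per extension; directories are ignored.
    static Map<String, Long> totals(File[] files) {
        Map<String, Long> sums = new TreeMap<>();
        for (File f : files) {
            if (f.isFile()) {
                sums.merge(extensionOf(f.getName()), f.length(), Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        totals(files).forEach((ext, bytes) -> System.out.printf("%-8s %,d%n", ext, bytes));
    }
}
```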

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Chris Hostetter
: PS: I am a newbie to the mailing list - I hope I've got the etiquette right
You may have figured this out already, but please don't CC email to multiple Lucene mailing lists. In this particular case, [EMAIL PROTECTED] is just a legacy alias that points at [EMAIL PROTECTED], so there's *really* no...

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Chris Hostetter
..."Rob Staveley (Tom)" <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: RE: Seeing what's occupying all the space in the index
:
: > I can't see how Luke is going to show me what's occupying most of my index.
:
: I...

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
> I can't see how Luke is going to show me what's occupying most of my index.
I do, however, notice that none of my stored fields are stored compressed. Presumably Field.Store.COMPRESS is something that is new in Lucene 1.9 and wasn't available in 1.4.3? However, it is still hard to see what's c...
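Field.Store.COMPRESS stores a field's value in compressed form. Our understanding is that Lucene does this with java.util.zip, but the exact settings it uses are an assumption here. To get a feel for what compression buys on a short stored value such as a 300-character synopsis, the sketch below deflates a string with the JDK's Deflater at BEST_COMPRESSION and inflates it back. The class name and sample text are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StoredFieldCompression {

    // Deflate the whole input at the highest compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate data produced by compress(); corrupt input becomes an unchecked exception.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Hypothetical synopsis text standing in for a real message-part synopsis.
        String synopsis = "Hi all, please find attached the minutes of Tuesday's meeting. "
                + "Action items are listed at the end; please review them before Friday "
                + "and reply to the list with any corrections or additions you may have.";
        byte[] raw = synopsis.getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
        System.out.println("Round trip OK: "
                + new String(decompress(packed), StandardCharsets.UTF_8).equals(synopsis));
    }
}
```

Deflate output carries a fixed header and checksum, so a value of a few hundred characters may shrink much less than a large body would; measuring on real synopses is the only reliable guide.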

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
...-----Original Message-----
From: Karel Tejnora [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 14:42
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index
Or you can use ssh -X for X11 forwarding. I don't know how it works on Windows (some X client app), bu...

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Karel Tejnora
Or you can use ssh -X for X11 forwarding. I don't know how it works on Windows (some X client app), but it works great on Linux with plenty of bandwidth.

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Grant Ingersoll
...have Luke's whistles and bells. Does Luke have a non-GUI equivalent, Grant?
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 12:41
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index
Give Luke a try.

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
...Subject: Re: Seeing what's occupying all the space in the index
Give Luke a try. Google for "Luke Lucene" and you should find it. Otherwise, check the Lucene website for a reference.

Re: Seeing what's occupying all the space in the index

2006-05-26 Thread Grant Ingersoll
Give Luke a try. Google for "Luke Lucene" and you should find it. Otherwise, check the Lucene website for a reference.
Rob Staveley (Tom) wrote:
> In my index of e-mail message parts, it looks like 23K is being used up for each indexed message part, which is way more than I'd expect. I have a...

RE: Seeing what's occupying all the space in the index

2006-05-26 Thread Rob Staveley (Tom)
In my index of e-mail message parts, it looks like 23K is being used up for each indexed message part, which is way more than I'd expect. I have a total of 37 fields per message part. I tokenize, index and do not store message part bodies. I store a <= 300 character synopsis of each message part.