Re: read past EOF

2006-08-30 Thread Bhavin Pandya
Mike wrote: Were there any other errors leading up to this? For example, when you move your index after it's built, is this actually a copy (and maybe the disk filled up when copying)? Or a previous [initial] exception when building the index? Are you really sure only one writer is building th

Re: read past EOF

2006-08-30 Thread Michael McCandless
Bhavin Pandya wrote: My guess is ... "One of my indexes got corrupted, so whenever I try to search the index, optimize it, or merge multiple indexes, it throws the same exception but from a different class: sometimes from IndexReader and sometimes from IndexWriter, depending on how

Re: SpanRegex speed

2006-08-30 Thread Mark Miller
Ignore that last question. I see that you said prefix wildcard query and not wildcard query. A quick look at the code seems to show it grabbing a prefix as well. Do you think one would be any faster than the other? Should I use Wildcardqueries outside of spanqueries and the regexquery inside

Re: SpanRegex speed

2006-08-30 Thread Mark Miller
Erik Hatcher wrote: On Aug 30, 2006, at 6:13 PM, Mark Miller wrote: * An implementation tying Java's built-in java.util.regex to RegexQuery. * * Note that because this implementation currently only returns null from * {@link #prefix} that queries using this implementation will enume

Re: Reviving a dead index

2006-08-30 Thread Michael McCandless
Stanislav Jordanov wrote: For a moment I wondered what exactly do you mean by "compound file"? Then I read http://lucene.apache.org/java/docs/fileformats.html and got the idea. I do not have access to that specific machine that all this is happening at. It is a 80x86 machine running Win 2003

Re: SpanRegex speed

2006-08-30 Thread Erik Hatcher
On Aug 30, 2006, at 6:13 PM, Mark Miller wrote: * An implementation tying Java's built-in java.util.regex to RegexQuery. * * Note that because this implementation currently only returns null from * {@link #prefix} that queries using this implementation will enumerate and * attem

RE: Lucene maxing CPU on Solaris 10

2006-08-30 Thread Salman Shahid
No..index is local. -Original Message- From: karl wettin [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 30, 2006 4:19 PM To: java-user@lucene.apache.org Subject: Re: Lucene maxing CPU on Solaris 10 On Thu, 2006-08-31 at 01:13 +0200, karl wettin wrote: > On Wed, 2006-08-30 at 11:00 -

Re: Lucene maxing CPU on Solaris 10

2006-08-30 Thread karl wettin
On Thu, 2006-08-31 at 01:13 +0200, karl wettin wrote: > On Wed, 2006-08-30 at 11:00 -0700, Salman Shahid wrote: > > > > Oddly this only happens on our production Solaris 10 boxes, with JDK > > 1.5. The load test passes with flying colors on a Windows XP box, even > > for 50+ concurrent access. >

Re: Lucene maxing CPU on Solaris 10

2006-08-30 Thread karl wettin
On Wed, 2006-08-30 at 11:00 -0700, Salman Shahid wrote: > > Oddly this only happens on our production Solaris 10 boxes, with JDK > 1.5. The load test passes with flying colors on a Windows XP box, even > for 50+ concurrent access. This will not help you, but I've got a Solaris 10 box on AMD64 an

SpanRegex speed

2006-08-30 Thread Mark Miller
* An implementation tying Java's built-in java.util.regex to RegexQuery. * * Note that because this implementation currently only returns null from * {@link #prefix} that queries using this implementation will enumerate and * attempt to {@link #match} each term for the speci

Re: Sorting based on a selling rate

2006-08-30 Thread Chris Hostetter
: I don't know what is the best way: that depends on your needs ... if Selling rate changes very infrequently, or if you are dealing with the sell rate for lots of books per query then i'd put it in your index ... if it's constantly in flux and you only care about the sell rate of one or two book

word frequency list?

2006-08-30 Thread Jason Pump
Is there a large list of words and their frequency in the english language? Obviously it would differ by corpus but I would like to see what's already available.

Lucene maxing CPU on Solaris 10

2006-08-30 Thread Salman Shahid
Folks, Recently we are observing an issue with one of our web applications (using Lucene 1.4) maxing out on CPU under concurrent access of ~20 users doing a lucene query. At this point web application fails to serve any more pages and thread dump shows all lucene threads "waiting for monitor

lucene1.4 queries locking up in Solaris

2006-08-30 Thread Salman Shahid
Folks, We had an issue recently in which one of our web applications using Lucene 1.4-final started to max out CPU under heavy load with ~20 concurrent users during testing. The web application was not able to serve any more pages after this and we could see all lucene threads "waiting for monitor" i

Re: Creating a new index from an existing index

2006-08-30 Thread Xiaocheng Luan
Thanks Erick, it looks like we'll have to recreate from the sources ... Erick Erickson <[EMAIL PROTECTED]> wrote: This just in from the thread "Re-created fields consistently indexed": Erik Hatcher replied as below, and believe me, Erik knows waay more about this than I do. On Aug 30

Re: Creating a new index from an existing index

2006-08-30 Thread Erick Erickson
This just in from the thread "Re-created fields consistently indexed": Erik Hatcher replied as below, and believe me, Erik knows waay more about this than I do. On Aug 30, 2006, at 11:07 AM, Jason Polites wrote: I understand that it is possible to "re-create" fields which are indexed

Re: Creating a new index from an existing index

2006-08-30 Thread Erick Erickson
Well, assuming you can get all the information you need out of your index, you really only have two choices that I see. 1> iterate through your documents and delete and re-add each document to that same index. 2> iterate through your documents and add the doc to a *new* index, then replace your ol

Re: Re-created fields consistently indexed?

2006-08-30 Thread Erik Hatcher
On Aug 30, 2006, at 11:07 AM, Jason Polites wrote: I understand that it is possible to "re-create" fields which are indexed but not stored (as is done by Luke), and that this is a lossy process, however I am wondering whether the indexed version of this remains consistent. That is, if I re-

First BETA release of MUTIS

2006-08-30 Thread Mario Alejandro M.
I'm happy to announce the first BETA of MUTIS (mutis.sourceforge.net). I include a small demo on searching txt files (this kind of demo is like a must ;) ). I'm looking for more developers to achieve: 1- Platform independence. Mutis runs only under .NET, and I want it to run under native Win32/

Re: Reviving a dead index

2006-08-30 Thread Michael McCandless
Stanislav Jordanov wrote: After all, the Lucene's CFS format is abstraction over the OS's native FS and the App should not be trying to open a native FS file named *.fnm when it is supposed to open the corresponding *.cfs file and "manually" extract the *.fnm file from it. Right? Yes, good c

Re-created fields consistently indexed?

2006-08-30 Thread Jason Polites
Hi all, I understand that it is possible to "re-create" fields which are indexed but not stored (as is done by Luke), and that this is a lossy process, however I am wondering whether the indexed version of this remains consistent. That is, if I re-create a non-stored field, then re-index this fi

Re: Reviving a dead index

2006-08-30 Thread Stanislav Jordanov
I missed something that may be very important: I find it really strange, that the exception log reads: java.io.FileNotFoundException: F:\Indexes\index1\_16f6.fnm (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method)

Re: Reviving a dead index

2006-08-30 Thread Stanislav Jordanov
Michael McCandless wrote: This means the segments file is referencing a segment named _1j8s and in trying to load that segment, the first thing Lucene does is load the "field infos" (_1j8s.fnm). It tries to do so from a compound file (if you have it turned on & it exists), else from the fi

Re: read past EOF

2006-08-30 Thread Bhavin Pandya
Hi Mike, Here is the full stack trace of the error which I got at search time: java.io.IOException: read past EOF at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:451) at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:45) at org.apache.lucene.

segments' size and getMaxMergeDocs()

2006-08-30 Thread Stanislav Jordanov
If IndexWriter.getMaxMergeDocs() always returns M then which one is true: 1) No segment file will ever contain > M documents; 2) Any segment that participates in a merge contains <= M documents (but the resulting segment of the merge may contain > M documents) Obviously (1) implies (2) but my g

Re: Sorting based on a selling rate

2006-08-30 Thread John Pailet
Yes, that is exactly what I want to do ! My external system gives me sell rate/Top N Selling books matching the user terms (query) I don't know what is the best way: Storing sell rate into lucene Fields of the documents... (multiple combination) and sort by this field, or doing something like

Re: read past EOF

2006-08-30 Thread Michael McCandless
Bhavin Pandya wrote: I am running Lucene 1.9 on a Unix machine, updating my index very frequently... after a few updates it says "read past EOF". I know this exception generally comes when one of the indexes has been corrupted, but I don't know why it got corrupted. Maybe it is a problem in my code, but i am

read past EOF

2006-08-30 Thread Bhavin Pandya
Hi, Does anyone have any idea what could cause a Lucene index to become corrupted? I am running Lucene 1.9 on a Unix machine, updating my index very frequently... after a few updates it says "read past EOF". I know this exception generally comes when one of the indexes has been corrupted, but I don