Sorting with ParallelReader
Hi Guys, does anybody know if it is possible for results to be sorted when using the ParallelReader?

Best Regards,
Ivan

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: sharing SearchIndexer
Mark Miller schrieb:
> simon litwan wrote:
>> hi all
>>
>> i tried to reuse the IndexSearcher among all of the threads that are doing
>> searches, as described in
>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82
>>
>> this works fine, but our application does continuous indexing, so the index
>> is changing, and the IndexSearcher initialized at startup does not seem to
>> be notified to reload the index.
>>
>> is there a way to force the IndexSearcher to reload the index if the index
>> has changed?
>>
>> thanks in advance
>>
>> simon
>
> You want to reopen the Reader under the IndexSearcher, or open a new
> IndexSearcher.

I want to reopen the Reader under the IndexSearcher when the index has changed. is there a way to do so?

simon
Re: sharing SearchIndexer
Simon

There is nothing in Lucene to detect that an index has changed and automagically reopen an IndexReader. You can do the notification from your indexing thread, or every nnn mins, or whatever makes sense for your application. Note that IndexReader.reopen() does nothing if the index has not changed - see the javadocs.

--
Ian.

On Fri, Sep 26, 2008 at 8:41 AM, simon litwan <[EMAIL PROTECTED]> wrote:
> I want to reopen the Reader under the IndexSearcher when the index has
> changed. is there a way to do so?
>
> simon
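Ian's suggestion can be sketched roughly as follows. This is a minimal sketch, not Lucene API: it assumes IndexReader.reopen() (available in Lucene 2.4 / the 2.3-era trunk discussed here), and the SearcherHolder class, its method names, and the synchronization strategy are all illustrative.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Holds the shared searcher; the indexing thread (or a timer) calls
// refresh() after commits, and search threads call get().
public class SearcherHolder {
    private IndexSearcher searcher;

    public SearcherHolder(IndexSearcher initial) {
        this.searcher = initial;
    }

    // reopen() returns the same reader instance if the index is unchanged,
    // so this is cheap to call often.
    public synchronized void refresh() throws IOException {
        IndexReader oldReader = searcher.getIndexReader();
        IndexReader newReader = oldReader.reopen();
        if (newReader != oldReader) {
            searcher = new IndexSearcher(newReader);
            // Safe only once no in-flight searches still use the old reader;
            // a real application needs reference counting or a delayed close.
            oldReader.close();
        }
    }

    public synchronized IndexSearcher get() {
        return searcher;
    }
}
```

The key point is the identity check: reopen() only builds a new reader (sharing unchanged segments with the old one) when the index actually changed.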
Re: sharing SearchIndexer
Ian Lea schrieb:
> There is nothing in Lucene to detect that an index has changed and
> automagically reopen an IndexReader. You can do the notification from
> your indexing thread, or every nnn mins, or whatever makes sense for
> your application. Note that IndexReader.reopen() does nothing if the
> index has not changed - see the javadocs.

thanks very much. i will try it this way.

cheers

simon
Re: CorruptIndexException workaround in 2.3-SNAPSHOT? (Attn: Michael McCandless)
Ari Miller wrote:
> According to
> https://issues.apache.org/jira/browse/LUCENE-1282?focusedCommentId=12596949#action_12596949
> (Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene), a workaround for
> the bug which causes the CorruptIndexException was put into the 2.3 branch
> and 2.4. However, we are still experiencing this issue (intermittent
> creation of a corrupt index) with a 2.3-SNAPSHOT from maven. Was the
> workaround put into 2.3-SNAPSHOT? Are there other issues which would cause
> the same error (detailed below)? We would prefer to avoid upgrading to JDK
> 6u10 (http://java.sun.com/javase/downloads/ea/6u10/6u10RC.jsp) until it is
> a final release, thus the use of the 2.3-SNAPSHOT dated July 22.

Can you look in the Lucene core JAR's manifest and report back what version information you see? (It should contain the timestamp that the JAR was built.)

I committed this workaround to the 2.3 branch on May 22. We don't currently have any automated build that pushes snapshot builds into maven on the 2.3 branch; my guess is that 2.3-SNAPSHOT in maven was from the last trunk build before 2.3 was released (which would not contain this fix), though I'm not sure why it has the timestamp July 22.

Mike
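For reference, one way to read the version information out of the JAR's manifest (a sketch; the JAR filename is illustrative, substitute your actual artifact):

```shell
# Print the manifest of the Lucene core JAR without extracting it.
# The Implementation-Version / build timestamp entries identify the build.
unzip -p lucene-core-2.3-SNAPSHOT.jar META-INF/MANIFEST.MF
```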
Re: Caused by: java.io.IOException: read past EOF on Slave
Can you describe the sequence of steps that your replication process goes through? Also, which filesystem is the index being accessed through?

Mike

rahul_k123 wrote:
> First of all, thanks to all the people who helped me in getting the Lucene
> replication setup working; right now it's live in our production :-)
>
> Everything is working fine, except that I am seeing some exceptions on the
> slaves. The following is the one occurring most often on the slaves:
>
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> Caused by: com.IndexingException: [SYSTEM_ERROR] Cannot access index [data_dir/index]: [read past EOF]
>   at com.lucene.LuceneSearchService.getSearchResults(LuceneSearchService.java:964)
>   ... 12 more
> Caused by: java.io.IOException: read past EOF
>   at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
>   at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>   at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:66)
>   at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:89)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:147)
>   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
>   at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
>
> and the second one is:
>
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.IllegalArgumentException: attempt to access a deleted document
>   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:657)
>   at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
>
> This one is on the master index. Any help is appreciated.
>
> Thanks.
> --
> View this message in context: http://www.nabble.com/Caused-by%3A-java.io.IOException%3A-read-past-EOF-on-Slave-tp19682684p19682684.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: Sorting with ParallelReader
Sorry about the spam with this thread. We started using ParallelReader in our app and had a bug in the app with the sorts. I tested ParallelReader with a simple standalone app and discovered that sorting works perfectly, the same way as with the other Readers. Sorry once again.

Best Regards,
Ivan

Ivan Vasilev wrote:
> Hi Guys, does anybody know if it is possible for results to be sorted when
> using the ParallelReader?
Re: sharing SearchIndexer
Ian Lea schrieb:
> There is nothing in Lucene to detect that an index has changed and
> automagically reopen an IndexReader. You can do the notification from
> your indexing thread, or every nnn mins, or whatever makes sense for
> your application. Note that IndexReader.reopen() does nothing if the
> index has not changed - see the javadocs.

would it make sense for Lucene to introduce this as a feature?

Cheers

Michael
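What such a feature amounts to in application code today can be sketched as a background polling thread. This is a minimal sketch, not Lucene API: it assumes IndexReader.isCurrent() and reopen() are available (Lucene 2.4-era), and the class name, field names, and polling interval are illustrative.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;

// Daemon thread that periodically checks whether the on-disk index has
// changed and, if so, swaps in a reopened reader.
public class ReaderRefresher extends Thread {
    private IndexReader reader;

    public ReaderRefresher(IndexReader initial) {
        this.reader = initial;
        setDaemon(true);
    }

    public synchronized IndexReader getReader() {
        return reader;
    }

    public void run() {
        try {
            while (true) {
                synchronized (this) {
                    // isCurrent() is false once a writer has committed changes
                    if (!reader.isCurrent()) {
                        IndexReader newReader = reader.reopen();
                        if (newReader != reader) {
                            // Assumes no searches are still using the old
                            // reader; real code needs reference counting.
                            reader.close();
                            reader = newReader;
                        }
                    }
                }
                Thread.sleep(60 * 1000L); // poll once a minute (tune to taste)
            }
        } catch (IOException e) {
            // log and give up in a real application
        } catch (InterruptedException e) {
            // shutdown requested
        }
    }
}
```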
How to restore corrupted index
We have an application in which the index will be updated frequently. During development we found that the index files get corrupted, i.e. more than one .cfs file, plus files with other extensions (e.g. .frq, .fnm, .nrm), remain in the index directory. Is there any way to ensure this does not occur at all, or, if it happens, a way to recover the index data? It would be a great help if someone can advise.

Regards,

Chaula
Re: CorruptIndexException workaround in 2.3-SNAPSHOT? (Attn: Michael McCandless)
On Sep 26, 2008, at 6:30 AM, Michael McCandless wrote:
> I committed this workaround to the 2.3 branch on May 22. We don't
> currently have any automated build that pushes snapshot builds into maven
> on the 2.3 branch; my guess is that 2.3-SNAPSHOT in maven was from the
> last trunk build before 2.3 was released (which would not contain this
> fix), though I'm not sure why it has the timestamp July 22.

Yes, a 2.3-SNAPSHOT would definitely be from before 2.3.0.
Re: 2.4 release candidate 2
Looks good.

On Sep 25, 2008, at 11:11 AM, Michael McCandless wrote:
> Hi,
>
> I just created the second release candidate for Lucene 2.4, here:
>
> http://people.apache.org/~mikemccand/staging-area/lucene2.4rc2
>
> These are the fixes since RC1:
>
> * Issues with CheckIndex (LUCENE-1402)
> * Removed new yet deprecated ctors for IndexWriter, and made
>   autoCommit=false the default for the new ctors (LUCENE-1401)
> * Cases where optimize throws an IOException because a BG merge had
>   problems, yet fails to include the root cause exception (LUCENE-1397)
> * Improved PhraseQuery.toString (LUCENE-1396)
> * NullPointerException in NearSpansUnordered.isPayloadAvailable (LUCENE-1404)
> * A bunch of small javadoc issues, unnecessary import lines, missing
>   copyright headers
>
> Please continue testing and reporting any issues you find! Thanks.
>
> Mike
Re: How to restore corrupted index
There is the CheckIndex tool included in the distribution for checking/fixing bad indexes, but it can't solve everything.

The bigger question is why it is happening to begin with. Can you describe your indexing process? How do you know the index is actually corrupted? Are you seeing exceptions when opening it?

-Grant

On Sep 26, 2008, at 6:49 AM, Chaula Ganatra wrote:
> We have an application in which the index will be updated frequently.
> During development we found that the index files get corrupted, i.e.
> more than one .cfs file, plus files with other extensions (e.g. .frq,
> .fnm, .nrm), remain in the index directory. Is there any way to ensure
> this does not occur at all, or, if it happens, a way to recover the
> index data?

--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
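For reference, the tool is typically run from the command line, roughly like this (a sketch; the JAR name and index path are illustrative, and the -fix behavior described applies to the 2.4-era tool):

```shell
# Diagnose a suspect index with the CheckIndex tool shipped in lucene-core.
java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index

# With -fix, CheckIndex removes segments it cannot read. This makes the
# index usable again but permanently loses the documents in those segments,
# so back the index directory up first.
java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index -fix
```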
Re: How to restore corrupted index
You say that there are multiple files, but you don't say if the index still works. Does it? If using the index gives you unexpected results, can you tell us what the failure modes are?

Best
Erick

On Fri, Sep 26, 2008 at 6:49 AM, Chaula Ganatra <[EMAIL PROTECTED]> wrote:
> We have an application in which index will be updated frequently.
>
> During development time, found that index files gets corrupted, i.e.
> more than one cfs files, some other extension files e.g. frq, fnm, nrm
> remain in the index directory.
>
> Is there any way that such issue does not occur at all or if it happens
> we can recover the index data again?
>
> Regards,
>
> Chaula
RE: How to restore corrupted index
I found one case where such multiple files remain: when we call writer.optimize() it throws an exception, and multiple files remain in the index dir.

After that, when we add a document to the index by calling writer.addDocument, it throws java.lang.NegativeArraySizeException.

Regards,
Chaula

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]]
Sent: 26 September, 2008 6:02 PM
To: java-user@lucene.apache.org
Subject: Re: How to restore corrupted index

There is the CheckIndex tool included in the distribution for checking/fixing bad indexes, but it can't solve everything.

The bigger question is why it is happening to begin with. Can you describe your indexing process? How do you know the index is actually corrupted? Are you seeing exceptions when opening it?

-Grant
Re: How to restore corrupted index
Can you post the full stack trace in both cases?

Mike

Chaula Ganatra wrote:
> I found one case where such multiple files remain: when we call
> writer.optimize() it throws an exception, and multiple files remain in
> the index dir. After that, when we add a document to the index by
> calling writer.addDocument, it throws
> java.lang.NegativeArraySizeException.
Search warmup from Tomcat
Hello all,

I need to warm up the searcher object from my JSP pages. Currently I have a static object, and I frequently check whether the index got updated; if so, I close the searcher and re-open it. These JSP pages are invoked by the User. When the User performs a search operation, some searches are faster and some are slower, because the searcher object is being updated. How do I warm up my searcher object without the intervention of a User request?

Regards
Ganesh
RE: How to restore corrupted index
It was a Reader on the same index, which I did not close, that caused the exception in writer.optimize().

Chaula

-----Original Message-----
From: Michael McCandless [mailto:[EMAIL PROTECTED]]
Sent: 26 September, 2008 7:17 PM
To: java-user@lucene.apache.org
Subject: Re: How to restore corrupted index

Can you post the full stack trace in both cases?

Mike
Re: How to restore corrupted index
It's perfectly fine to have a reader open on an index while an IndexWriter runs optimize. Which version of Lucene are you using? And which OS & filesystem?

Mike

Chaula Ganatra wrote:
> It was a Reader on the same index, which I did not close, that caused
> the exception in writer.optimize().
Getting all found document ids from a search result
Hello you all,

is it somehow possible to get all document ids found by a search, not only 50 or 100?

If it is possible and someone knows it, please help me :-)

Thanks and best regards,

Gregor

TREND MICRO Deutschland GmbH, Lise-Meitner-Str. 4, D-85716 Unterschleissheim, Germany
Geschaeftsfuehrer: Raimund Genes, Amtsgericht Muenchen - HRB 114739

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.
Re: Search warmup from Tomcat
Ganesh:
> I need to warm up the searcher object from my JSP pages. Currently I have
> a static object, and I frequently check whether the index got updated; if
> so, I close the searcher and re-open it. These JSP pages are invoked by
> the User. When the User performs a search operation, some searches are
> faster and some are slower, because the searcher object is being updated.
>
> How do I warm up my searcher object without the intervention of a User
> request.

I would use a set of predefined queries which are frequently typed into your searcher. Typically, queries with stopwords are good to use in a warmup phase.

--
Asbjørn A. Fellinghaug
[EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
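A warmup along those lines can be sketched as follows. This is a minimal sketch against the Lucene 2.x API: the field name "contents", the canned query strings, and the Warmer class are all placeholders; in practice you would use queries mined from your own search logs, and run warm() on a freshly reopened searcher before swapping it in for user traffic.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Runs a handful of representative queries against a new searcher so that
// the OS file cache and Lucene's internal caches are populated before
// real users hit it.
public class Warmer {
    // Placeholder queries; replace with frequent queries from your logs.
    private static final String[] WARMUP_QUERIES = {
        "quick brown fox", "lucene index", "replication setup"
    };

    public static void warm(IndexSearcher searcher) throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        for (int i = 0; i < WARMUP_QUERIES.length; i++) {
            Query q = parser.parse(WARMUP_QUERIES[i]);
            // Discard the hits; the side effect of running the search is
            // what warms the caches.
            searcher.search(q, null, 10);
        }
    }
}
```

Calling this from the background thread that reopens the searcher, rather than from a JSP, is what removes the warmup cost from the user's request.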
RE: How to restore corrupted index
Lucene 2.2.0, Windows XP

-----Original Message-----
From: Michael McCandless [mailto:[EMAIL PROTECTED]]
Sent: 26 September, 2008 8:00 PM
To: java-user@lucene.apache.org
Subject: Re: How to restore corrupted index

It's perfectly fine to have a reader open on an index while an IndexWriter runs optimize. Which version of Lucene are you using? And which OS & filesystem?

Mike
Re: Getting all found document ids from a search result
Gregor,

You could loop through the results or collect them using a custom HitCollector.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, September 26, 2008 10:31:37 AM
> Subject: Getting all found document ids from a search result
>
> is it somehow possible to get all document ids found by a search, not only
> 50 or 100?
>
> If it is possible and someone knows it, please help me :-)
>
> Gregor
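The HitCollector approach can be sketched like this against the Lucene 2.x API (the class name and the use of a List are illustrative; for very large result sets a BitSet keyed by doc id would be more memory-friendly):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.HitCollector;

// Collects every matching document id, with no paging limit. Lucene calls
// collect() once per matching document.
public class AllDocIdsCollector extends HitCollector {
    private final List docIds = new ArrayList();

    public void collect(int doc, float score) {
        docIds.add(new Integer(doc));
    }

    public List getDocIds() {
        return docIds;
    }
}

// Usage:
//   AllDocIdsCollector collector = new AllDocIdsCollector();
//   searcher.search(query, collector);
//   List allIds = collector.getDocIds();
```

Note that collect() receives ids in index order with no score sorting, and doc ids are only stable until the index changes, so use them promptly.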
Re: sharing SearchIndexer
I think somebody provided a patch (might have been a whole new IndexReader impl?) many moons ago (2005?), but it never attracted enough interest to get committed. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Michael Wechner <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, September 26, 2008 6:49:24 AM > Subject: Re: sharing SearchIndexer > > Ian Lea schrieb: > > Simon > > > > There is nothing in lucene to detect that an index has changed and > > automagically reopen an IndexReader. > > > > You can do the notification from your indexing thread, or every nnn > > mins, or whatever makes sense for your application. Note that > > IndexReader.reopen() does nothing if the index has not changed - see > > the javadocs. > > would it make sense for Lucene to introduce this as a feature? > > Cheers > > Michael > > -- > > Ian.
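The reopen pattern discussed in this thread could be sketched roughly as below, using the Lucene 2.3+ API (IndexReader.reopen() arrived in 2.3). The holder class and its names are made up for illustration; real code would also need reference counting so searches still in flight on the old reader are not cut off:

```java
// Sketch: a shared searcher that is refreshed after the index changes.
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class SearcherHolder {
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherHolder(Directory dir) throws IOException {
        reader = IndexReader.open(dir);
        searcher = new IndexSearcher(reader);
    }

    public synchronized IndexSearcher getSearcher() {
        return searcher;
    }

    // Call this from the indexing thread (or a timer) after each commit.
    // IndexReader.reopen() returns the same instance when nothing has
    // changed, and a new reader (sharing unchanged segments) otherwise.
    public synchronized void maybeRefresh() throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            // Closing the old reader here assumes no search is still
            // using it; production code should ref-count instead.
            reader.close();
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
    }
}
```

The call to maybeRefresh() is exactly the "notification from your indexing thread, or every nnn mins" that Ian describes.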
Re: Index time Document Boosting and Query Time Sorts
Cheers All 2008/9/24 Karl Wettin <[EMAIL PROTECTED]> > > 24 sep 2008 kl. 12.40 skrev Grant Ingersoll: > > One side note based on your example, below: index-time boosting does not >> have much granularity (only 255 values); in other words, there is a loss of >> precision. You therefore >> want to make sure your boosts are different enough that you can >> distinguish between the two. Maybe 1/(2*depth) or something like that. You >> can alter how these 255 values are encoded, but that is fairly advanced >> stuff. >> > > Just a note: the granularity is 255 only if you turn off length > normalization; otherwise it's something like 25. > > karl -- d i n o k o r a h Tel: +44 7956 66 52 83 --- 51°21'50.5902"N 0°6'11.8116"W
Re: Caused by: java.io.IOException: read past EOF on Slave
Michael: I just started testing 2.4rc2 running inside OJVM. I found a similar stack trace during indexing: IW 3 [Root Thread]: flush: segment=_3 docStoreSegment=_3 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=2 numBufDelTerms=2 IW 3 [Root Thread]: index before flush _1:C2->_1 _2:C2->_2 IW 3 [Root Thread]: DW: flush postings as segment _3 numDocs=2 IW 3 [Root Thread]: DW: oldRAMSize=111616 newFlushedSize=264 docs/MB=7,943.758 new/old=0.237% IW 3 [Root Thread]: DW: apply 2 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 3 segments. IW 3 [Root Thread]: hit exception flushing deletes Exception in thread "Root Thread" java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java) at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java) at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java) at org.apache.lucene.index.TermBuffer.read(TermBuffer.java) at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java) at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java) at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java) at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java) at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java) at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:918) at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java) at org.apache.lucene.indexer.LuceneDomainIndex.sync(LuceneDomainIndex.java:1308) I'll reinstall with full debug info to see all line numbers in the Lucene 
java code. Is there a list of semantic changes in the BufferedIndexInput code? I mean, does it do sequential or random I/O, for example? But anyway, I just compiled with the latest code and ran my test suites; I'll investigate the problem a bit more. Best regards, Marcelo. On Fri, Sep 26, 2008 at 7:32 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Can you describe the sequence of steps that your replication process goes > through? > > Also, which filesystem is the index being accessed through? > > Mike > > rahul_k123 wrote: > >> >> First of all, thanks to all the people who helped me in getting the lucene >> replication setup working and right now its live in our production :-) >> >> Everything working fine, except that i am seeing some exceptions on >> slaves. >> >> The following is the one which is occuring more often on slaves >> >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >> at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> at >> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) >> at >> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) >> at java.lang.Thread.run(Thread.java:619) >> Caused by: com.IndexingException: [SYSTEM_ERROR] Cannot access index >> [data_dir/index]: [read past EOF] >> at >> >> com.lucene.LuceneSearchService.getSearchResults(LuceneSearchService.java:964) >> ... 
12 more >> Caused by: java.io.IOException: read past EOF >> at >> >> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146) >> at >> >> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) >> at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:66) >> at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:89) >> at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:147) >> at >> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659) >> at >> >> org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257) >> at >> org.apache.lucene.index.IndexReader.document(IndexReader.java:525) >> >> and the second one is >> >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >> at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> at >> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) >> at >> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) >> at java.lang.Thread.run(Thread.java:619) >> Caused by: java.lang.IllegalArgumentEx
Re: CorruptIndexException workaround in 2.3-SNAPSHOT? (Attn: Michael McCandless)
Confirmed that the manifest date on the 2.3-SNAPSHOT is much older than the file date: Implementation-Version: 2.3-SNAPSHOT 613047 - hudson - 2008-01-18 04:11:25 Is there an available SNAPSHOT of the 2.3 branch with this fix? I've downloaded the 2.4 SNAPSHOT to see if this will resolve the corruption issue -- based on the date in the Manifest, I'm assuming this is a slightly later version than 2.4rc2. 2.4 SNAPSHOT Implementation-Version: 2.4-SNAPSHOT 699151 - 2008-09-26 02:10:14 2.4 RC2 Implementation-Version: 2.4.0-rc2 698976 - 2008-09-25 10:12:47 Best, Ari On Fri, Sep 26, 2008 at 3:30 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Ari Miller wrote: > >> According to >> https://issues.apache.org/jira/browse/LUCENE-1282?focusedCommentId=12596949#action_12596949 >> (Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene), a workaround >> for the bug which causes the CorruptIndexException was put into the >> 2.3 branch and 2.4. >> However, we are still experiencing this issue (intermittent creation >> of a corrupt index) with a 2.3-SNAPSHOT from maven. >> Was the workaround put into 2.3-SNAPSHOT? Are there other issues >> which would cause the same error (detailed below)? >> >> We would prefer to avoid upgrading to JDK 6u10 >> (http://java.sun.com/javase/downloads/ea/6u10/6u10RC.jsp) until it is >> a final release, thus the use of the 2.3-SNAPSHOT dated July 22. > > Can you look in the Lucene core JAR's manifest and report back what version > information you see? (It should contain the timestamp that the JAR was > built). > > I committed this workaround to the 2.3 branch on May 22. > > We don't currently have any automated build that pushes snapshot builds into > maven on the 2.3 branch; my guess is that 2.3-SNAPSHOT in maven was from the > last trunk build before 2.3 was released (which would not contain this fix), > though I'm not sure why it has timestamp July 22. 
> > Mike
Re: How to restore corrupted index
Mike, As part of my goal of trying to use Lucene as the primary storage mechanism (perhaps not the best idea), what do you think is the best way to handle storing data in Lucene and preventing corrupted data, the way something like an SQL database handles corrupted data? Or is there simply no good way to do this? Jason On Fri, Sep 26, 2008 at 10:30 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > It's perfectly fine to have a reader open on an index, while an IndexWriter > runs optimize. > > Which version of Lucene are you using? And which OS & filesystem? > > Mike > [earlier thread history snipped]
How to get the index file entries (IndexReader) in different order?
Hi all, The index has millions of entries. I need to display the index content in a JTable with columns (term, field, freq), and the user can choose the sorting order: (field, freq, term), (freq, term, field), etc. What is the best way to manage the index sorting? I only need some entries at a time -- the ones displayed in the table. At the moment I sort the needed entries in a TreeMap while reading the whole index file, which is quite slow. Any idea would be welcome. Best wishes to all, JCD -- View this message in context: http://www.nabble.com/How-to-get-the-index-file-entries-%28IndexReader%29--in-different-order--tp19691563p19691563.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
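One way to avoid reading the whole index into a TreeMap is to stream terms from IndexReader.terms(), which already yields them sorted by field and then term text, and only materialize the page currently shown in the table. A rough Lucene 2.x sketch follows; the class and method names are illustrative, and other sort orders (e.g. by freq) would still need an external sort or a precomputed view:

```java
// Sketch: page through (field, term, docFreq) rows straight from the
// term dictionary without loading it all into memory.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class TermTable {
    // Return rows [offset, offset+count) as "field \t text \t docFreq",
    // in the dictionary's natural order (field, then term text).
    public static List page(IndexReader reader, int offset, int count)
            throws IOException {
        List rows = new ArrayList();
        TermEnum te = reader.terms();
        try {
            int row = 0;
            while (rows.size() < count && te.next()) {
                if (row++ >= offset) {
                    Term t = te.term();
                    rows.add(t.field() + "\t" + t.text() + "\t" + te.docFreq());
                }
            }
        } finally {
            te.close();
        }
        return rows;
    }
}
```

For deep pages this still scans from the start; reader.terms(new Term(field, text)) can seek directly to a term if the UI remembers the last term of the previous page.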
Re: How to restore corrupted index
Corrupted data in what sense? EG if you don't trust your IO system to store data properly? Mike Jason Rutherglen wrote: Mike, As part of my goal of trying to use Lucene as primary storage mechanism (perhaps not the best idea), what do you think is the best way to handle storing data in Lucene and preventing corrupted data the way something like an SQL database handles corrupted data? Or is there simply no good way to do this? Jason [earlier thread history snipped]
Re: Caused by: java.io.IOException: read past EOF on Slave
The following are the steps: 1. We do indexing every 5 minutes on the master, and when indexing is done a snapshot is taken. 2. On the slave we have a cronjob which runs snappuller every 3 minutes to check for new snapshots, and installs a snapshot on the slave if it finds a new one. 3. Master and slave are continuously serving search requests. (I am not using SOLR for indexing.) The file system is ext3. Thanks in advance. Michael McCandless-2 wrote: > > Can you describe the sequence of steps that your replication process > goes through? > > Also, which filesystem is the index being accessed through? > > Mike > > rahul_k123 wrote: > >> First of all, thanks to all the people who helped me in getting the >> lucene replication setup working and right now its live in our production :-) >> >> Everything working fine, except that i am seeing some exceptions on >> slaves. >> [stack traces snipped; quoted in full earlier in this thread] -- View this message in context: http://www.nabble.com/Caused-by%3A-java.io.IOException%3A-read-past-EOF-on-Slave-tp19682684p19691799.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: How to restore corrupted index
OK. I really need to see those stack traces to better understand this issue. Also, does the issue still happen on 2.3, or 2.4 RC2? Mike Chaula Ganatra wrote: Lucene 2.2.0, windows XP -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 26 September, 2008 8:00 PM To: java-user@lucene.apache.org Subject: Re: How to restore corrupted index It's perfectly fine to have a reader open on an index, while an IndexWriter runs optimize. Which version of Lucene are you using? And which OS & filesystem? Mike [earlier thread history snipped]
Re: How to restore corrupted index
I'm thinking more in terms of the CRC32 checks performed on database pages. Is there a way to incorporate this technique in Lucene in a way that does not affect performance too much? The question is: when is the CRC32 check performed, and to which files is it applied, if any? On Fri, Sep 26, 2008 at 12:13 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Corrupted data in what sense? > > EG if you don't trust your IO system to store data properly? > > Mike > [earlier thread history snipped]
Re: Caused by: java.io.IOException: read past EOF on Slave
This one looks spooky! Is it easily repeated? If you could print out which 2 terms you had tried to delete, and then zip up the index just before deleting those docs (after closing the writer) and send it to me, I can try to understand what's wrong with the index. It looks as if the *.tis file for one of the segments is truncated. If you capture the series of add/update/delete documents, can you get a standalone Java test to show this? Does this test create an entirely new index? We did change the index format in 2.4 to use "true" UTF-8 encoding for all text content; not sure that this applies here (to BufferedIndexInput it's all bytes), but it may. BufferedIndexInput in general can do random IO, especially when reading the term dict file (*.tis). Mike Marcelo Ochoa wrote: Michael: I just started testing 2.4rc2 running inside OJVM. I found a similar stack trace during indexing: [flush log and "read past EOF" stack trace snipped; quoted in full earlier in this thread] I'll reinstall with full debug info to see all line numbers in the Lucene java code. Is there a list of semantic changes in the BufferedIndexInput code? I mean, does it do sequential or random I/O, for example? But anyway, I just compiled with the latest code and ran my test suites; I'll investigate the problem a bit more. Best regards, Marcelo. On Fri, Sep 26, 2008 at 7:32 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: Can you describe the sequence of steps that your replication process goes through? Also, which filesystem is the index being accessed through? Mike rahul_k123 wrote: [original message and stack traces snipped]
Re: Caused by: java.io.IOException: read past EOF on Slave
Which version of Lucene is this? Looks like 2.3.x -- what's the x? Can you run your app server with assertions enabled for org.apache.lucene.*? It may catch something sooner. Can you try running CheckIndex after the snapshot is produced, just to see if there is any corruption? Your first exception (on the slave) seems like the *.fdx file of that one segment is somehow truncated, or, you are passing an out-of-bounds document number to IndexReader.document. Your 2nd one (on the master) looks like an invalid (deleted) document number is being passed to IndexReader.document. What is the context of these IndexReader.document(...) calls? How are you getting the doc numbers that you're passing to them? In both cases, an invalid doc number would explain your exception. Are you doing any search caching, where you cache hits and then much later try to load the documents for each hit, or, something? More questions below... rahul_k123 wrote: The following are steps.. 1.We do indexing every 5 minutes on master and when indexing is done a snapshot is taken The IndexWriter is definitely closed before the snapshot is taken? Are you creating a new index, or, just adding to an existing one? 2. On slave we have a cronjob which runs snappuller every 3 minutes to check for new snapshots and installs it on slave if it finds new one Sounds OK. Does this entail a restart of the reader after the snapshot is installed? (I am not using SOLR for indexing ) The file system is ext3 OK. Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
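Mike's two suggestions above (assertions and CheckIndex) look roughly like this from the command line. This is a sketch only: the jar name, index path, and server class are placeholder assumptions, not taken from the thread.

```shell
# Run CheckIndex against the snapshot before installing it on a slave;
# it walks each segment and reports any corruption it finds.
java -cp lucene-core-2.3.1.jar org.apache.lucene.index.CheckIndex /path/to/index

# Run the app server JVM with assertions enabled for Lucene classes only
# (the "..." suffix enables assertions for the package and all subpackages).
java -ea:org.apache.lucene... -cp app.jar com.example.Server
```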
Re: How to restore corrupted index
OK, it does sound like you're primarily protecting against an untrustworthy storage system (or, maybe, Lucene bugs ;). Probably the best option is to do this fully externally, i.e., compute the digest yourself, store it away in a separate Lucene field, then test the digest when loading the field. Mike Jason Rutherglen wrote: I'm thinking more in terms of CRC32 checks performed on database pages. Is there a way to incorporate this technique in a way that does not affect performance too much in Lucene? The question is, when is the CRC32 check performed, and to which files is it applied, if any? On Fri, Sep 26, 2008 at 12:13 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Corrupted data in what sense? E.g. if you don't trust your IO system to store data properly? Mike Jason Rutherglen wrote: Mike, As part of my goal of trying to use Lucene as a primary storage mechanism (perhaps not the best idea), what do you think is the best way to handle storing data in Lucene and preventing corrupted data the way something like an SQL database does? Or is there simply no good way to do this? Jason On Fri, Sep 26, 2008 at 10:30 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: It's perfectly fine to have a reader open on an index while an IndexWriter runs optimize. Which version of Lucene are you using? And which OS & filesystem? Mike Chaula Ganatra wrote: It was the Reader on the same index, which I did not close, so it gave an exception in writer.optimize() Chaula -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: 26 September, 2008 7:17 PM To: java-user@lucene.apache.org Subject: Re: How to restore corrupted index Can you post the full stack trace in both cases? Mike Chaula Ganatra wrote: I found one case where such multiple files remain: when we call writer.optimize() it throws an exception and multiple files remain in the index dir.
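A minimal sketch of the externally computed digest Mike describes, using java.util.zip.CRC32 from the JDK. The class and method names here are hypothetical; in Lucene you would store the digest value as a separate stored field next to the payload and re-check it after IndexReader.document() returns the document.

```java
import java.util.zip.CRC32;

public class DigestedPayload {

    // Compute a CRC32 over the payload bytes; store this value alongside
    // the payload (e.g. in a separate stored Lucene field).
    static long digest(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // On load, recompute and compare; a mismatch means the stored bytes
    // were corrupted somewhere between write and read.
    static boolean verify(byte[] payload, long storedDigest) {
        return digest(payload) == storedDigest;
    }

    public static void main(String[] args) {
        byte[] data = "some stored document bytes".getBytes();
        long d = digest(data);
        System.out.println(verify(data, d));   // true: payload intact
        data[0] ^= 0xFF;                       // simulate a single-byte corruption
        System.out.println(verify(data, d));   // false: corruption detected
    }
}
```

CRC32 is cheap but only detects corruption; for tamper resistance you would swap in a cryptographic hash such as SHA-256 via java.security.MessageDigest.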
Once the index is in that state with the multiple files, when we add a document by calling writer.addDocument it throws java.lang.NegativeArraySizeException Regards, Chaula -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: 26 September, 2008 6:02 PM To: java-user@lucene.apache.org Subject: Re: How to restore corrupted index There is the CheckIndex tool included in the distribution for checking/fixing bad indexes, but it can't solve everything. The bigger question is why it is happening to begin with. Can you describe your indexing process? How do you know the index is actually corrupted? Are you seeing exceptions when opening it? -Grant On Sep 26, 2008, at 6:49 AM, Chaula Ganatra wrote: We have an application in which the index will be updated frequently. During development we found that the index files get corrupted, i.e. more than one cfs file, plus files with some other extensions, e.g. frq, fnm, nrm, remain in the index directory. Is there any way to prevent this from happening at all, or, if it happens, to recover the index data? It would be a great help if someone can.
Regards, Chaula -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
ANNOUNCE: Application Period Opens for Travel Assistance to ApacheCon US 2008
NOTE: This is a cross-posted announcement to all Lucene sub-projects; please confine any replies to [EMAIL PROTECTED] - The Travel Assistance Committee is taking applications from those wanting to attend ApacheCon US 2008, held between the 3rd and 7th of November 2008 in New Orleans. The Travel Assistance Committee is looking for people who would like to attend ApacheCon US 2008 but need some financial support in order to get there. There are VERY few places available and the criteria are high; that aside, applications are open to all open source developers who feel that their attendance would benefit themselves, their project(s), the ASF, and open source in general. Financial assistance is available for flights, accommodation and entrance fees, either in full or in part, depending on circumstances. It is intended that all our ApacheCon events will be covered, so it may be prudent for those in Europe and/or Asia to wait until an event closer to them comes up. You are all welcome to apply for ApacheCon US, of course, but there must be compelling reasons for you to attend an event further from your home location for your application to be considered above those closer to the event location. More information can be found on the main Apache website at http://www.apache.org/travel/index.html - where you will also find a link to the application form and details for submitting. Time is very tight for this event, so applications are open now and will close on the 2nd of October 2008, to give enough time for travel arrangements to be made. Good luck to all those who apply. Regards, The Travel Assistance Committee - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Caused by: java.io.IOException: read past EOF on Slave
Mike: Actually there are more issues at first glance with the OJVMDirectory integration. Note: I am creating an index with two simple documents:
INFO: Performing: SELECT /*+ DYNAMIC_SAMPLING(0) RULE NOCACHE(T1) */ T1.rowid,F1,extractValue(F2,'/emp/name/text()') "name",extractValue(F2,'/emp/@id') "id" FROM LUCENE.T1 for update nowait
Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
FINE: Document indexed,tokenized indexed,tokenized indexed,tokenized>
Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
FINE: Document indexed,tokenized indexed,tokenized indexed,tokenized>
IW 10 [Root Thread]: flush: segment=_0 docStoreSegment=_0 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=2 numBufDelTerms=0
IW 10 [Root Thread]: index before flush
IW 10 [Root Thread]: DW: flush postings as segment _0 numDocs=2
IW 10 [Root Thread]: DW: oldRAMSize=111616 newFlushedSize=166 docs/MB=12,633.446 new/old=0.149%
IFD [Root Thread]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IW 10 [Root Thread]: LMP: findMerges: 1 segments
IW 10 [Root Thread]: LMP: level -1.0 to 2.2741578: 1 segments
IW 10 [Root Thread]: CMS: now merge
IW 10 [Root Thread]: CMS: index: _0:C2->_0
IW 10 [Root Thread]: CMS: no more merges pending; now return
IW 10 [Root Thread]: now flush at close
IW 10 [Root Thread]: flush: segment=null docStoreSegment=_0 docStoreOffset=2 flushDocs=false flushDeletes=true flushDocStores=true numDocs=0 numBufDelTerms=0
IW 10 [Root Thread]: index before flush _0:C2->_0
IW 10 [Root Thread]: flush shared docStore segment _0
IW 10 [Root Thread]: DW: closeDocStore: 2 files to flush to segment _0 numDocs=2
IW 10 [Root Thread]: CMS: now merge
IW 10 [Root Thread]: CMS: index: _0:C2->_0
IW 10 [Root Thread]: CMS: no more merges pending; now return
IW 10 [Root Thread]: now call final commit()
IW 10 [Root Thread]: startCommit(): start sizeInBytes=0
IW 10 [Root Thread]: startCommit index=_0:C2->_0 changeCount=2
IW 10 [Root Thread]: now sync _0.fnm
IW 10 [Root Thread]: now sync _0.frq
IW 10 [Root Thread]: now sync _0.prx
IW 10 [Root Thread]: now sync _0.tis
IW 10 [Root Thread]: now sync _0.tii
IW 10 [Root Thread]: now sync _0.nrm
IW 10 [Root Thread]: now sync _0.fdx
IW 10 [Root Thread]: now sync _0.fdt
IW 10 [Root Thread]: done all syncs
IW 10 [Root Thread]: commit: pendingCommit != null
IFD [Root Thread]: now checkpoint "segments_2" [1 segments ; isCommit = true]
IFD [Root Thread]: deleteCommits: now decRef commit "segments_1"
IFD [Root Thread]: delete "segments_1"
IW 10 [Root Thread]: commit: done
IW 10 [Root Thread]: at close: _0:C2->_0
Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
FINER: RETURN 0
Index created. And when I try to read the index I get:
INFO: Analyzer: [EMAIL PROTECTED]
Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIStart
INFO: qryStr: DESC(name:ravi)
Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIStart
INFO: storing cachingFilter: -1378376940 and searcher: 781713581 qryStr: DESC(name:ravi)
Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex getSort
INFO: using sort: ,
Exception in thread "Root Thread" java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
	at java.util.ArrayList.RangeCheck(ArrayList.java)
	at java.util.ArrayList.get(ArrayList.java)
	at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java)
	at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java)
	at org.apache.lucene.index.TermBuffer.read(TermBuffer.java)
	at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java)
	at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java)
	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
	at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java)
	at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java)
	at org.apache.lucene.search.Similarity.idf(Similarity.java)
	at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java)
	at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java)
	at org.apache.lucene.search.Query.weight(Query.java)
	at org.apache.lucene.search.Hits.<init>(Hits.java:85)
	at org.apache.lucene.search.Searcher.search(Searcher.java)
	at org.apache.lucene.indexer.LuceneDomainIndex.ODCIStart(LuceneDomainIndex.java)
Which definitely means that something is not being saved correctly in the OJVM directory BLOB storage :( These are my files:
SQL> select file_size,name from it1$t;
 FILE_SIZE NAME
---------- ------------
        10 parameters
         1 updateCount
        28 segments_1
        20 segments.gen
         8 _0.frq
         8 _0.prx
       103 _0.tis
        35 _0.tii
        12 _0.nrm
        22 _0.f
ApacheCon US promo
Cross-posting... Just wanted to let everyone know that there will be a number of Lucene/ Solr/Mahout/Tika related talks, training sessions, and Birds of a Feather (BOF) gatherings at ApacheCon New Orleans this fall. Details: When: November 3-7 Where: Sheraton, New Orleans, USA URL: http://us.apachecon.com/c/acus2008/ Lucene: Advanced Indexing Techniques by Michael Busch: http://us.apachecon.com/c/acus2008/sessions/7 Lucene Boot Camp (2 day hands-on training by me): http://us.apachecon.com/c/acus2008/sessions/69 Solr: Solr out of the Box by Chris Hostetter: http://us.apachecon.com/c/acus2008/sessions/9 Beyond the Box by Hoss: http://us.apachecon.com/c/acus2008/sessions/10 Solr Boot Camp (1 day hands-on training by Erik Hatcher): http://us.apachecon.com/c/acus2008/sessions/91 Mahout: Intro to Mahout and Machine Learning (by me): http://us.apachecon.com/c/acus2008/sessions/11 Tika: Content Analysis for ECM with Apache Tika by Paolo Mottadelli : http://us.apachecon.com/c/acus2008/sessions/12 There's also one more Lucene session that is TBD, but it will be on that same Wednesday as everything else. Chances are it will be an intro to Lucene type talk. BOFs: http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08 Cheers, Grant - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Please help to interpret Lucene Boost results
I am baffled by the results of the following queries. Could it have something to do with the boost factor? All of these queries are performed in the same environment with the same crawled index/data. A. query1 = +(content:(Pepsi)) resulted in 228 hits. B. query2 = +(content:(Pepsi) ) +(host:(ca)^10 ) resulted in 398 hits. C. query3 = +(host:(ca)^10 ) resulted in 212 hits. Two questions (strictly just one): 1. Since query1 (any content containing Pepsi) yielded 228 hits, how could the more limiting query2 (all docs that have Pepsi in them AND a domain of ca) yield more hits (398)? 2. Since there are only 212 hits for Canadian domains, how can query2 return 398 hits? Thanks for any pointers! Cheers, student_t -- View this message in context: http://www.nabble.com/Please-help-to-interpret-Lucene-Boost-results-tp19695313p19695313.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Please help to interpret Lucene Boost results
On Freitag, 26. September 2008, student_t wrote: > A. query1 = +(content:(Pepsi)) I guess this is the string input you use for your queries, isn't it? It's more helpful to look at the toString() output of the parsed query to see how Lucene interpreted your input. Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Please help to interpret Lucene Boost results
That certainly doesn't look right. What analyzers are you using at index and query time? Two things that will help track down what's really happening: 1> query.toString() is your friend. 2> get a copy of the excellent Luke tool and have it do its explain magic on your query. Watch that the analyzer you choose when querying is what you expect If neither of those things sheds any light on the problem, let us know what you find Best Erick On Fri, Sep 26, 2008 at 3:55 PM, student_t <[EMAIL PROTECTED]> wrote: > > I am baffled by the results of the following queries. Can it be something > to > do with the boosting factor? All of these queries are performed in the same > environment with the same crawled index/data. > > A. query1 = +(content:(Pepsi)) resulted in 228 > hits. > B. query2 = +(content:(Pepsi) ) +(host:(ca)^10 ) resulted in 398 hits. > C. query3 = +(host:(ca)^10 )resulted in 212 > hits. > > Two questions (strictly just one): > 1. query1 of any content contains Pepsi yielded 228 hits, how could a more > limiting query2 (give me all docs that have Pepsi in it with a domain of > ca) > yield more hits (398)? > 2. Since there are 212 hits of Canadian domains, how can query2 return 398 > hits? > > Thanks for any pointers! > Cheers, > student_t > > > -- > View this message in context: > http://www.nabble.com/Please-help-to-interpret-Lucene-Boost-results-tp19695313p19695313.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >
Re: Caused by: java.io.IOException: read past EOF on Slave
I will try the other stuff and will let you know. This is how we do the search: we get the Hits in one call, and we make another call to get the data from Lucene. My guess is that when it gets the matching Hits it is getting them from the master, and when it tries to retrieve the actual data it is hitting the slave, which doesn't have the updated data yet. I will test this scenario and let you know. Some answers below too. Michael McCandless-2 wrote: > > > Which version of Lucene is this? Looks like 2.3.x -- what's the x? > > 2.3.1 > > Can you run your app server with assertions enabled for > org.apache.lucene.*? It may catch something sooner. > > > Can you try running CheckIndex after the snapshot is produced, just to > see if there is any corruption? > > Your first exception (on the slave) seems like the *.fdx file of that > one segment is somehow truncated, or, you are passing an out-of-bounds > document number to IndexReader.document. > > Your 2nd one (on the master) looks like an invalid (deleted) document > number is being passed to IndexReader.document. > > What is the context of these IndexReader.document(...) calls? How are > you getting the doc numbers that you're passing to them? In both > cases, an invalid doc number would explain your exception. Are you > doing any search caching, where you cache hits and then much later try > to load the documents for each hit, or, something? > > More questions below... > > rahul_k123 wrote: > >> >> The following are the steps.. >> >> 1. We do indexing every 5 minutes on the master, and when indexing is done a >> snapshot is taken > > The IndexWriter is definitely closed before the snapshot is taken? > > Are you creating a new index, or, just adding to an existing one? Adding > to the existing one > >> 2. On the slave we have a cronjob which runs snappuller every 3 minutes >> to check >> for new snapshots and installs any new one it finds on the slave > > Sounds OK.
Does this entail a restart of the reader after the > snapshot is installed? Yes: before reading we check > IndexReader.isCurrent(). > >> (I am not using SOLR for indexing ) >> >> >> The file system is ext3 > > OK. > > Mike > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/Caused-by%3A-java.io.IOException%3A-read-past-EOF-on-Slave-tp19682684p19697398.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
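The "check isCurrent(), then reopen" pattern rahul describes can be sketched generically. This is plain Java with a hypothetical `Index` class standing in for a reader bound to one index version; in Lucene 2.3 the real calls would be IndexReader.isCurrent() and IndexReader.reopen() (reopen() returns the same reader if nothing changed).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an index reader stamped with the version it saw.
class Index {
    final long version;
    Index(long version) { this.version = version; }
    boolean isCurrent(long latestVersion) { return version == latestVersion; }
}

public class ReaderHolder {
    // All search threads share this one reference.
    private final AtomicReference<Index> current =
            new AtomicReference<Index>(new Index(0));

    // Before searching: if the on-disk version moved on, swap in a fresh reader.
    Index acquire(long latestVersion) {
        Index r = current.get();
        if (!r.isCurrent(latestVersion)) {
            Index fresh = new Index(latestVersion);  // reopen() in real Lucene
            current.compareAndSet(r, fresh);         // lose the race harmlessly
            r = current.get();
        }
        return r;
    }

    public static void main(String[] args) {
        ReaderHolder h = new ReaderHolder();
        System.out.println(h.acquire(0).version);  // 0: index unchanged, same reader
        System.out.println(h.acquire(3).version);  // 3: index advanced, reopened
    }
}
```

In real code the retired reader must also be closed once in-flight searches drain; that bookkeeping is omitted here.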
Re: Please help to interpret Lucene Boost results
Hi Dan, Thanks for your suggestion. I will definitely check that out. -student_t Daniel Naber-10 wrote: > > On Freitag, 26. September 2008, student_t wrote: > >> A. query1 = +(content:(Pepsi)) > > I guess this is the string input you use for your queries, isn't it? It's > more helpful to look at the toString() output of the parsed query to see > how Lucene interpreted your input. > > Regards > Daniel > > -- > http://www.danielnaber.de > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/Please-help-to-interpret-Lucene-Boost-results-tp19695313p19697619.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Please help to interpret Lucene Boost results
Hi Erick, Thanks a bunch for your pointers. I will need to find out the analyzers used at index and query time. But is it critical to use the same analyzers at those two times? I had tested with lucli on some of my local segment data and the result sets appeared reasonable. Is Luke part of Lucene contrib? I recall there is a GUI that lets you view the indices. Would you please elaborate? Thanks again! student_t Erick Erickson wrote: > > That certainly doesn't look right. What analyzers are you using at index > and query time? > > Two things that will help track down what's really happening: > > 1> query.toString() is your friend. > 2> get a copy of the excellent Luke tool and have it do its explain magic > on > your query. Watch that the analyzer you choose when querying is what you > expect > > If neither of those things sheds any light on the problem, let us know > what > you find > > Best > Erick > > On Fri, Sep 26, 2008 at 3:55 PM, student_t <[EMAIL PROTECTED]> wrote: > >> >> I am baffled by the results of the following queries. Can it be something >> to >> do with the boosting factor? All of these queries are performed in the >> same >> environment with the same crawled index/data. >> >> A. query1 = +(content:(Pepsi)) resulted in >> 228 >> hits. >> B. query2 = +(content:(Pepsi) ) +(host:(ca)^10 ) resulted in 398 >> hits. >> C. query3 = +(host:(ca)^10 )resulted in >> 212 >> hits. >> >> Two questions (strictly just one): >> 1. query1 of any content contains Pepsi yielded 228 hits, how could a >> more >> limiting query2 (give me all docs that have Pepsi in it with a domain of >> ca) >> yield more hits (398)? >> 2. Since there are 212 hits of Canadian domains, how can query2 return >> 398 >> hits? >> >> Thanks for any pointers!
>> Cheers, >> student_t >> >> >> -- >> View this message in context: >> http://www.nabble.com/Please-help-to-interpret-Lucene-Boost-results-tp19695313p19695313.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> - >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > > -- View this message in context: http://www.nabble.com/Please-help-to-interpret-Lucene-Boost-results-tp19695313p19697605.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Search warmup from Tomcat
In my opinion: set up two IndexSearchers, say #1 holding the old index data to serve queries while #2 is updating, then switch queries over to #2 once the update is complete. 2008/9/26 Asbjørn A. Fellinghaug <[EMAIL PROTECTED]> > Ganesh: > > Hello all, > > > > I need to warm up the searcher object from my JSP pages. Currently I have a > > static object, and I frequently check whether the index got updated; > > if so, I close the indexer and re-open it. These JSP pages > > are invoked by the user. When the user performs a search operation, some are > > faster and some are slower. This is because the searcher object is being > > updated. > > > > How do I warm up my searcher object without the intervention of a user > > request? > > I would have used a set of predefined queries which are frequently typed > into your searcher. Typically, queries with stopwords are good to use in a > warmup phase. > > -- > Asbjørn A. Fellinghaug > [EMAIL PROTECTED] > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Sorry for my English!! 明 Please help me correct my English expression and error in syntax
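The two-searcher scheme suggested above can be sketched as an atomic swap. This is plain Java; `Searcher` here is a hypothetical placeholder for an IndexSearcher bound to one index snapshot, so the warmup queries and old-searcher cleanup are only indicated in comments.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical placeholder for an IndexSearcher over one index snapshot.
class Searcher {
    final String snapshot;
    Searcher(String snapshot) { this.snapshot = snapshot; }
    String search(String q) { return "hit for '" + q + "' in " + snapshot; }
}

public class DoubleBufferedSearch {
    // User queries always go to whatever this reference holds (searcher #1).
    private final AtomicReference<Searcher> active =
            new AtomicReference<Searcher>(new Searcher("snapshot-old"));

    String query(String q) { return active.get().search(q); }

    // Build searcher #2 against the new snapshot off to the side, warm it
    // with predefined queries, then flip user traffic over in one step.
    void refresh(String newSnapshot) {
        Searcher warmed = new Searcher(newSnapshot);
        warmed.search("warmup query");  // run warmup queries BEFORE the swap
        active.set(warmed);             // atomic cut-over; users never see the
                                        // cold searcher (close the old one later)
    }

    public static void main(String[] args) {
        DoubleBufferedSearch s = new DoubleBufferedSearch();
        System.out.println(s.query("pepsi"));  // served by snapshot-old
        s.refresh("snapshot-new");
        System.out.println(s.query("pepsi"));  // served by snapshot-new
    }
}
```

The design point is that warming happens on the standby searcher, so user requests never pay the first-query cost after an index update.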