Problem with Add method
This code generates an error; kindly tell me what parameters should be used with the constructors.

Document doc = new Document();
doc.add(Field.Keyword("id", keywords[i]));
doc.add(Field.UnIndexed("country", unindexed[i]));
doc.add(Field.UnStored("contents", unstored[i]));
doc.add(Field.Text("city", text[i]));
writer.addDocument(doc);
Deprecated API
I am studying LIA, but there is a problem with the code. When I run it I get errors related to the use of deprecated APIs. Kindly suggest the right APIs, and also how to handle this situation with other code.

package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges", "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  protected void setUp() throws IOException {
    String indexDir = System.getProperty("java.io.tmpdir", "tmp") +
        System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir) throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(), true);
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();
  }

  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  protected boolean isCompound() {
    return true;
  }

  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(), false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}
Re: Problem with Add method
Which Lucene version do you use? If it's 2.2, then Field.Keyword, Field.UnIndexed etc. were removed. You should instead do:

Document doc = new Document();
doc.add(new Field("id", keywords[i], Store.NO, Index.UN_TOKENIZED));
doc.add(new Field("country", unindexed[i], Store.YES, Index.UN_TOKENIZED));
etc...

On Nov 29, 2007 10:25 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> This code generates an error; kindly tell me what parameters should be used with the constructors.
>
> Document doc = new Document();
> doc.add(Field.Keyword("id", keywords[i]));
> doc.add(Field.UnIndexed("country", unindexed[i]));
> doc.add(Field.UnStored("contents", unstored[i]));
> doc.add(Field.Text("city", text[i]));
> writer.addDocument(doc);

--
Regards,
Shai Erera
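For reference, a sketch of the usual one-for-one translation of the removed factory methods onto the Field constructor (based on the Field.Store/Field.Index constants in Lucene 2.x; note that the old Keyword and UnIndexed variants also stored their value, so Store.YES preserves the original behaviour -- double-check against the Field javadoc for your exact version):

Document doc = new Document();
// Field.Keyword(name, value): stored, indexed as a single untokenized term
doc.add(new Field("id", keywords[i], Field.Store.YES, Field.Index.UN_TOKENIZED));
// Field.UnIndexed(name, value): stored only, never indexed
doc.add(new Field("country", unindexed[i], Field.Store.YES, Field.Index.NO));
// Field.UnStored(name, value): indexed and tokenized, not stored
doc.add(new Field("contents", unstored[i], Field.Store.NO, Field.Index.TOKENIZED));
// Field.Text(name, value): stored, indexed and tokenized
doc.add(new Field("city", text[i], Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);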
Re: CorruptIndexException
That exception means your index was written with a newer version of Lucene than the version you are using to open the IndexReader. It looks like you used the unreleased (2.3 dev) version of Lucli from the Lucene trunk and then went back to an older Lucene JAR (maybe 2.2?) for accessing it? In general, writing an index with a newer version of Lucene and then trying to access it using an older version doesn't work (whereas the opposite does). I'm afraid you either have to switch to 2.3-dev for reading your index (but beware it could have sneaky bugs...), or rebuild your index with the 2.2 version of Lucene and use the 2.2 Lucli in the future.

Mike

"Melanie Langlois" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I used Lucli to optimize my index while my application was stopped. After restarting my application, I could not search my index anymore; I got the following exception:
>
> org.apache.lucene.index.CorruptIndexException: Unknown format version: -4
>         at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:204)
>         at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:190)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:167)
>
> I have two questions:
> - Why does it occur? Should I use another tool to access the index outside of my application?
> - Is there a way to recover?
>
> Thanks,
>
> Mélanie
Error with lucene-core-2.2.0.jar
Hi all, I'm using a program that uses the Lucene library. I've downloaded the lucene-core-2.2.0.jar file and I'm trying it, but I get this error while trying to index my documents:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.document.Document.add(Lorg/apache/lucene/document/Field;)V

I've searched among old threads and I think that the error is caused by a wrong version of the library. So, which one must I use?

Thanks.
Ale
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > > Hmmm ... how many chunks of "about 50 pages" do you do before > > hitting this? Roughly how many docs are in the index when it > > happens? > > Oh, gosh, not sure. I'm guessing it's about half done. Ugh, OK. If we could boil this down to a smaller set that is easily reproducible (and transferable to me) then I could try to track it down. Do you have another PPC machine to reproduce this on? (To rule out bad RAM/hard-drive on the first one). Can you try running with the trunk version of Lucene (2.3-dev) and see if the error still occurs? EG you can download this AM's build here: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts Another thing to try is turning on the infoStream (IndexWriter.setInfoStream(...)) and capture & post the resulting log. It will be very large since it takes quite a while for the error to occur... > So, I ran the same codebase with lucene-core-2.2.0.jar on an Intel > Mac Pro, OS X 10.5.0, Java 1.5, and no exception is raised. > Different corpus, about 5 pages instead of 2. This is > reinforcing my thinking that it's a big-endian issue. That's a good question. Lucene is endian independent: all writes to files boil eventually down to a writeByte/writeBytes calls in o.a.l.store.IndexOutput such that the ordering is controlled by Lucene, not the underlying CPU architecture. That said, it is clearly a difference in your test so it seems like a compelling lead... is it possible to run this different corpus back on the PPC machine, to rule out a corpus difference leading to the exception? > I've got 1735 documents, 18969 pages -- average page size 10.9, max > page size 1235 (a physics textbook), 578 one-page documents. These > are Web pages, PDFs, articles, photos, scanned stuff, technical > papers, etc. I index six documents at a time, so I guess I'm > averaging about 65 pages per chunk. For each document, I index the > whole text of the document as a Lucene Document, and I index the > text of each page separately as a Document. I use the "contents" > fields and "pagecontents" fields for those two uses. I also add > metadata information to each: "title", multiple "author" fields, > "date", "abstract", etc. OK, sounds like a nice rich corpus :) Are you using term vectors, stored fields, payloads on any of these? Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Deprecated API
Have a look at the Field.java class and it's constructors. The other option is to look at what was deprecated on Lucene 1.9 and then look at Lucene 2.x. Also, I have some up to date example files of indexing, etc. at http://www.lucenebootcamp.com (follow the link to the SVN repository) which you can check out and compare. Cheers, Grant On Nov 29, 2007, at 3:42 AM, Liaqat Ali wrote: i m studying LIA. but there is a problem with code. When i run the code i get errorsThe errors are related with the use of deprecated APIs.Kindly suggest me the right APIs and also instructions how to handle this situation with other code.. package lia.indexing; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexReader; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.SimpleAnalyzer; import junit.framework.TestCase; import java.io.IOException; /** * */ public abstract class BaseIndexingTestCase extends TestCase { protected String[] keywords = {"1", "2"}; protected String[] unindexed = {"Netherlands", "Italy"}; protected String[] unstored = {"Amsterdam has lots of bridges", "Venice has lots of canals"}; protected String[] text = {"Amsterdam", "Venice"}; protected Directory dir; protected void setUp() throws IOException { String indexDir = System.getProperty("java.io.tmpdir", "tmp") + System.getProperty("file.separator") + "index-dir"; dir = FSDirectory.getDirectory(indexDir, true); addDocuments(dir); } protected void addDocuments(Directory dir) throws IOException { IndexWriter writer = new IndexWriter(dir, getAnalyzer(), true); writer.setUseCompoundFile(isCompound()); for (int i = 0; i < keywords.length; i++) { Document doc = new Document(); doc.add(Field.Keyword("id", keywords[i])); doc.add(Field.UnIndexed("country", unindexed[i])); doc.add(Field.UnStored("contents", unstored[i])); doc.add(Field.Text("city", text[i])); writer.addDocument(doc); } writer.optimize(); writer.close(); } protected Analyzer getAnalyzer() { return new SimpleAnalyzer(); } protected boolean isCompound() { return true; } public void testIndexWriter() throws IOException { IndexWriter writer = new IndexWriter(dir, getAnalyzer(), false); assertEquals(keywords.length, writer.docCount()); writer.close(); } public void testIndexReader() throws IOException { IndexReader reader = IndexReader.open(dir); assertEquals(keywords.length, reader.maxDoc()); assertEquals(keywords.length, reader.numDocs()); reader.close(); } } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://lucene.grantingersoll.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
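For completeness, here is roughly what the book's addDocuments() loop looks like once ported to the Field constructor, using the same Store/Index mapping as in Shai's reply in the earlier thread (my translation against the 2.2 API, not the official LIA update):

protected void addDocuments(Directory dir) throws IOException {
  IndexWriter writer = new IndexWriter(dir, getAnalyzer(), true);
  writer.setUseCompoundFile(isCompound());
  for (int i = 0; i < keywords.length; i++) {
    Document doc = new Document();
    doc.add(new Field("id", keywords[i], Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("country", unindexed[i], Field.Store.YES, Field.Index.NO));
    doc.add(new Field("contents", unstored[i], Field.Store.NO, Field.Index.TOKENIZED));
    doc.add(new Field("city", text[i], Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);
  }
  writer.optimize();
  writer.close();
}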
how to kill IndexSearcher object after every search
Hi All, Is there any possibility to kill the IndexSearcher object after every search?
--
View this message in context: http://www.nabble.com/how-to-kill-IndexSearcher-object-after-every-search-tf4897436.html#a14026451
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: Compute the co-occurence beteen a phrase and a word
You run your SpanQuery, and get back the Spans. From there, you need to load the document (either by reanalyzing the tokens or by using Term Vectors) and then you just have to setup your window around the position match. Unfortunately, I don't think there is a better way in Lucene to get those terms in a window around a given position. You might be able to if you altered Lucene to support moving both forward and backwards over positions, but I am not sure how difficult this is to do w/o looking more into it (and it isn't high on my list at the moment.) Also, I adhere to Hoss' philosophy on private email: http://people.apache.org/~hossman/#private_q -Grant On Nov 28, 2007, at 10:46 AM, bigdoginuk wrote: Hi, thanks for the reply. But can anyone give me some more hints? I have checked SpanQuery, but still haven't found out a solution. Thanks. Grant Ingersoll-6 wrote: Have a look at SpanQuery and it's derivatives. You will need to do some post-processing as well. -Grant On Nov 28, 2007, at 6:41 AM, bigdoginuk wrote: Hi all, I want to compute the co-occurence frequency between a word and a phrase( this phrase contains some words, and the words in it should be successive and in order). It's like an NEAR operation (like setting slop at 3...) Does anyone know how to implement this? Thanks in advance. Rooney -- View this message in context: http://www.nabble.com/Compute-the-co-occurence-beteen-a-phrase-and-a-word-tf4887952.html#a13990651 Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://lucene.grantingersoll.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- View this message in context: http://www.nabble.com/Compute-the-co-occurence-beteen-a-phrase-and-a-word-tf4887952.html#a13995126 Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll http://lucene.grantingersoll.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
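A rough sketch of the first half of that recipe against the 2.x span API (the field name and words are made up for illustration, "reader" is an already-open IndexReader, and the window extraction from term vectors or re-analysis is left exactly as Grant describes):

// needs org.apache.lucene.index.Term and org.apache.lucene.search.spans.*
SpanQuery phrase = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("contents", "lots")),
    new SpanTermQuery(new Term("contents", "bridges"))
  }, 3, true);                        // slop 3, in order: behaves like a NEAR on the phrase
Spans spans = phrase.getSpans(reader);
while (spans.next()) {
  int doc = spans.doc();              // document containing the phrase match
  int start = spans.start();          // position of the first matching term
  int end = spans.end();              // one past the last matching position
  // Look at positions [start - N, end + N) in this document -- via its term
  // vector or by re-analyzing the stored text -- and count how often the
  // other word falls inside that window.
}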
Closing index searchers ...
Hi, My application needs to close/open the index searcher periodically so that newly added documents are visible. Is there a way to determine if there are any pending searches running against an index searcher or do I have to do my own reference counting? Thank you.
Re: Closing index searchers ...
I had the same issue, and ended up doing my own reference counting using an "acquire/release" strategy. I used a single instance per searcher: every "acquire" counts +1 and every "release" counts -1. When an index is switched, the instance receives a "dispose" signal; each release then checks whether there are still in-flight searches, and once all releases have been made, the last release closes the searcher. The interface looked like this:

public interface Acquirable<R> {
  public R acquire();
  public void release();
  public boolean isAcquired();
  public boolean dispose();
}

In my implementation, I use a ThreadLocal to attach the searcher's referenced instance (although it's a single instance per index switch). Hope it helps...

German-K

On Nov 29, 2007 12:15 PM, Dragon Fly <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My application needs to close/open the index searcher periodically so that
> newly added documents are visible. Is there a way to determine if there are
> any pending searches running against an index searcher or do I have to do my
> own reference counting? Thank you.
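A minimal sketch of one way to implement that idea around an IndexSearcher (my own illustration using java.util.concurrent.atomic, not German's actual code; it assumes the manager stops handing out this instance before calling dispose()):

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.lucene.search.IndexSearcher;

public class AcquirableSearcher {
  private final IndexSearcher searcher;
  private final AtomicInteger refCount = new AtomicInteger(0);
  private volatile boolean disposed = false;

  public AcquirableSearcher(IndexSearcher searcher) {
    this.searcher = searcher;
  }

  public IndexSearcher acquire() {            // +1 before every search
    refCount.incrementAndGet();
    return searcher;
  }

  public void release() throws IOException {  // -1 after every search
    if (refCount.decrementAndGet() == 0 && disposed) {
      searcher.close();                       // last release of a retired searcher closes it
    }
  }

  public void dispose() throws IOException {  // called when a new index version is swapped in
    disposed = true;
    if (refCount.get() == 0) {
      searcher.close();                       // nobody is searching, safe to close now
    }
  }
}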
Re: how to kill IndexSearcher object after every search
Yes, you just call "close()" method. But, why would you like to do that? The performance tips remarks exactly the opposite, keeping it alive as long as possible favors internal lucene's caching of terms, query and other internal objects. On Nov 29, 2007 11:14 AM, Sebastin <[EMAIL PROTECTED]> wrote: > > Hi All, >Is there any possibility to kill the IndexSearcher Object after every > search. > -- > View this message in context: > http://www.nabble.com/how-to-kill-IndexSearcher-object-after-every-search-tf4897436.html#a14026451 > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Error with lucene-core-2.2.0.jar
I'm confused about what's going on here. Could you post the raw Java code that produces this error?

Best
Erick

On Nov 29, 2007 5:32 AM, ing.sashaa <[EMAIL PROTECTED]> wrote:
> Hi all,
> I'm using a program that uses the Lucene library. I've downloaded the
> lucene-core-2.2.0.jar file and I'm trying it, but I get this error while
> trying to index my documents:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.lucene.document.Document.add(Lorg/apache/lucene/document/Field;)V
>
> I've searched among old threads and I think that the error is caused by a
> wrong version of the library. So, which one must I use?
>
> Thanks.
> Ale
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Do you have another PPC machine to reproduce this on? (To rule out > bad RAM/hard-drive on the first one). I'll dig up an old laptop and try it there. > Another thing to try is turning on the infoStream > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > It will be very large since it takes quite a while for the error to > occur... I can do that. > Lucene is endian independent: all writes to files boil eventually down > to a writeByte/writeBytes calls in o.a.l.store.IndexOutput such that > the ordering is controlled by Lucene, not the underlying CPU > architecture. I was actually thinking about the implementation of the bitstrings, rather than data storage proper. > Are you using term vectors, > stored fields, payloads on any of these? Stored fields. I store a document ID (a 23-character string) for each. Bill - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> > Another thing to try is turning on the infoStream > > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > > It will be very large since it takes quite a while for the error to > > occur... > > I can do that. Here's what I see: Optimizing... merging segments _ram_a (1 docs) _ram_b (1 docs) _ram_c (1 docs) _ram_d (1 docs) _ram_e (1 docs) _ram_f (1 docs) _ram_g (1 docs) _ram_h (1 docs) _ram_i (1 docs) into _1va (9 docs) [EMAIL PROTECTED] main: now checkpoint "segments_3ql" [isCommit = true] [EMAIL PROTECTED] main: IncRef "_1v4.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v4_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v8.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v8_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v9.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1va.fnm": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.fdx": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.fdt": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.tii": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.tis": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.frq": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.prx": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1va.nrm": pre-incr count is 0 [EMAIL PROTECTED] main: deleteCommits: now remove commit "segments_3qk" [EMAIL PROTECTED] main: DecRef "_1v4.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v4_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v5.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v5_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v6.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v6_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v7.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v7_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v8.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v8_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v9.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "segments_3qk": pre-decr count is 1 [EMAIL PROTECTED] main: delete "segments_3qk" [EMAIL PROTECTED] main: now checkpoint "segments_3qm" [isCommit = true] [EMAIL PROTECTED] main: IncRef "_1v4.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v4_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v8.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v8_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v9.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1va.cfs": pre-incr count is 0 [EMAIL PROTECTED] main: deleteCommits: now remove commit "segments_3ql" [EMAIL PROTECTED] main: DecRef 
"_1v4.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v4_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v5.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v5_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v6.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v6_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v7.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v7_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v8.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v8_1.del": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1v9.cfs": pre-decr count is 2 [EMAIL PROTECTED] main: DecRef "_1va.fnm": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.fnm" [EMAIL PROTECTED] main: DecRef "_1va.fdx": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.fdx" [EMAIL PROTECTED] main: DecRef "_1va.fdt": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.fdt" [EMAIL PROTECTED] main: DecRef "_1va.tii": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.tii" [EMAIL PROTECTED] main: DecRef "_1va.tis": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.tis" [EMAIL PROTECTED] main: DecRef "_1va.frq": pre-decr count is 1 [EMAIL PROTECTED] main: delete "_1va.frq" [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> > Another thing to try is turning on the infoStream > > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > > It will be very large since it takes quite a while for the error to > > occur... > > I can do that. Here's a more complete dump. I've modified the code so that I now remove any existing versions of the document before re-indexing it and its pages. Bill /Library/Java/Home/bin/java '-Dcom.parc.uplib.indexing.debugMode=true' '-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*' -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.2.0.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01174-15-2815-270 01174-15-2552-042 01173-98-5675-575 01173-98-4457-188 01173-83-8266-533 01173-80-8759-205 updating doc_root_dir is /local/janssen-uplib/docs Working on document /local/janssen-uplib/docs/01174-15-2815-270 Adding header 'apparent-mime-type' I to 01174-15-2815-270 Adding header 'authors' IT to 01174-15-2815-270 Adding header 'categories' I (article) to 01174-15-2815-270 Adding header 'date' I (20070317) to 01174-15-2815-270 Adding header 'sha-hash' I to 01174-15-2815-270 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (3566): human nature Full-Mental Nudit page 1 (3100): I know what you're thinking: W Using charset utf8 for contents.txt Using language en for contents.txt Added 01174-15-2815-270 (3 versions) Working on document /local/janssen-uplib/docs/01174-15-2552-042 Adding header 'abstract' IT to 01174-15-2552-042 Adding header 'apparent-mime-type' I to 01174-15-2552-042 Adding header 'categories' I (photo) to 01174-15-2552-042 Adding header 'date' I (20070316) to 01174-15-2552-042 Adding header 'sha-hash' I to 01174-15-2552-042 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Added 01174-15-2552-042 (1 versions) Working on document /local/janssen-uplib/docs/01173-98-5675-575 Adding header 'apparent-mime-type' I to 01173-98-5675-575 Adding header 'authors' IT to 01173-98-5675-575 Adding header 'categories' I (article) to 01173-98-5675-575 Adding header 'categories' I (medical) to 01173-98-5675-575 Adding header 'date' I (20070313) to 01173-98-5675-575 Adding header 'sha-hash' I to 01173-98-5675-575 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2730): March 13, 2007 DOW JONES REPRI page 1 (4445): But just how far -- and how fa page 2 (2638): "We don't sell snow tires," sa page 3 (981): A spokeswoman for Rite Aid say Using charset utf8 for contents.txt Using language en for contents.txt Added 01173-98-5675-575 (5 versions) Working on document /local/janssen-uplib/docs/01173-98-4457-188 Adding header 'apparent-mime-type' I to 01173-98-4457-188 Adding header 'authors' IT to 01173-98-4457-188 Adding header 'categories' I (article) to 01173-98-4457-188 Adding header 'date' I (19911006) to 01173-98-4457-188 Adding header 
'sha-hash' I to 01173-98-4457-188 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2897): The Economics of the Colonial merging segments _ram_0 (1 docs) _ram_1 (1 docs) _ram_2 (1 docs) _ram_3 (1 docs) _ram_4 (1 docs) _ram_5 (1 docs) _ram_6 (1 docs) _ram_7 (1 docs) _ram_8 (1 docs) _ram_9 (1 docs) into _1v9 (10 docs) flush 6 buffered deleted terms on 6 segments. [EMAIL PROTECTED] main: now checkpoint "segments_3qj" [isCommit = true] [EMAIL PROTECTED] main: IncRef "_1v4.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v4_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v5_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v6_1.del": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v7_1.del": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1v8.cfs": pre-incr count is 1 [EMAIL PROTECTED] main: IncRef "_1v8_1.del": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1v9.fnm": pre-incr count is 0 [EMAIL PROTECTED] main: IncRef "_1v9
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Can you try running with the trunk version of Lucene (2.3-dev) and see > if the error still occurs? EG you can download this AM's build here: > > > http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts Still there. Here's the dump with last night's build: /Library/Java/Home/bin/java '-Dcom.parc.uplib.indexing.debugMode=true' '-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*' -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01179-00-0750-547 01178-90-9186-558 01178-81-4212-772 01178-81-3305-217 01178-73-1029-141 01178-72-8365-803 updating doc_root_dir is /local/janssen-uplib/docs IFD [main]: setInfoStream [EMAIL PROTECTED] IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index autoCommit=true [EMAIL PROTECTED] [EMAIL PROTECTED] ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=_21:c19686 _22:c92 IW 0 [main]: setMaxFieldLength 2147483647 Working on document /local/janssen-uplib/docs/01179-00-0750-547 Adding header 'abstract' IT to 01179-00-0750-547 Adding header 'apparent-mime-type' I to 01179-00-0750-547 Adding header 'authors' IT to 01179-00-0750-547 Adding header 'categories' I (ebooks) to 01179-00-0750-547 Adding header 'categories' I (economics) to 01179-00-0750-547 Adding header 'categories' I (paper) to 01179-00-0750-547 Adding header 'citation' I to 01179-00-0750-547 Adding header 'date' I (20070128) to 01179-00-0750-547 Adding header 'sha-hash' I to 01179-00-0750-547 Adding header 'title' IT (Heterogeneity in Price Stickiness and the Real Effects of Monetary Shocks) to 01179-00-0750-547 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2181): Heterogeneity in Price Stickin page 1 (2927): 1 Introduction There is ample page 2 (3135): In the presence of strategic c page 3 (3128): Motivated by those questions, page 4 (3214): ploring the tractability of th page 5 (2491): model with Taylor staggered wa page 6 (1548): real rigidities (Ball and Rome page 7 (3098): 2.2 Calibrating the sectoral d page 8 (1913): distribution of price stickine page 9 (1952): reported in Table 1. Hencefort page 10 (1635): Figure 2 presents analogous re page 11 (1743): In the absence of strategic co page 12 (2806): Corollary 1 For an arbitrary h page 13 (2380): 2.4.2 Growth rate shocks In th page 14 (2962): price changes. With heterogene page 15 (3265): ties and heterogeneity in the page 16 (1962): complementarities. The results page 17 (751): to the response of the heterog page 18 (489): economies are embedded into th page 19 (3295): 2.6 Fitting IRFs with an ident page 20 (2066): Table 3a: Best-Fitting Duratio page 21 (2444): This is an important step beca page 22 (1976): where ? is the discount factor page 23 (1183): Et "? Ct+1 Ct ¦?? 
It Pt Pt+1 # page 24 (2188): can be rewritten as: Pk,t = £ page 25 (1370): pt = Z 1 0 f (k) pk,tdk, (10) page 26 (3269): Heterogeneity in price stickin page 27 (3117): Irrespective of the net effect page 28 (2084): set of parameters involve high page 29 (575): 0 5 10 15 20 25 30 35 40 0 x 1 page 30 (2185): output and falling prices in a page 31 (2358): price changes that minimizes t page 32 (2689): These results are fully consis page 33 (3600): different sources of real rigi page 34 (3168): work in a model with heterogen page 35 (2557): single equation estimation of page 36 (1326): Taking the limit as Æ ? 0 in e page 37 (1796): The output gap is constant at page 38 (1066): The corresponding path for the page 39 (1347): 4) Proof of Corollaries 1 and page 40 (2421): Therefore, for ? Å 0, the expe page 41 (1343): p (t) = Z 1 0 f (k) ? ?? ?? R page 42 (2117): As ? ? 0, this clearly converg page 43 (1375): model around the zero inflatio page 44 (1497): pt = Z 1 0 f (k) pk,tdk, yt = page 45 (1128): Table A.3: Best-Fitting Durati page 46 (898): Multiplying by f (k) ?k and in page 47 (1072): Now, from (23): ?kxk,t = pk,t page 48 (268): Finally, let ¹t ? pt ? pt?1 de page 49 (1694): References [1] Altissimo, F
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Are you still getting the original exception too or just the Array out of bounds one now? Also, are you doing anything else to the index while this is happening? The code at the point in the exception below is trying to properly handle deleted documents. -Grant On Nov 29, 2007, at 1:34 PM, Bill Janssen wrote: Can you try running with the trunk version of Lucene (2.3-dev) and see if the error still occurs? EG you can download this AM's build here: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts Still there. Here's the dump with last night's build: /Library/Java/Home/bin/java '- Dcom.parc.uplib.indexing.debugMode=true' '- Dcom.parc.uplib.indexing.indexProperties=contents:title:categories $,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email- message-id*:email-guid*:email-subject:email-from-name:email-from- address*:email-attachment-to*:email-thread-index*:email-references $,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music- genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*' - classpath "/local/uplib/share/UpLib-1.7/code/lucene- core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/ LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01179-00-0750-547 01178-90-9186-558 01178-81-4212-772 01178-81-3305-217 01178-73-1029-141 01178-72-8365-803 updating doc_root_dir is /local/janssen-uplib/docs IFD [main]: setInfoStream deletionPolicy [EMAIL PROTECTED] IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/ local/janssen-uplib/index autoCommit=true [EMAIL PROTECTED] mergeScheduler [EMAIL PROTECTED] ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=_21:c19686 _22:c92 IW 0 [main]: setMaxFieldLength 2147483647 Working on document /local/janssen-uplib/docs/01179-00-0750-547 Adding header 'abstract' IT to 01179-00-0750-547 Adding header 'apparent-mime-type' I to 01179-00-0750-547 Adding header 'authors' IT to 01179-00-0750-547 Adding header 'categories' I (ebooks) to 01179-00-0750-547 Adding header 'categories' I (economics) to 01179-00-0750-547 Adding header 'categories' I (paper) to 01179-00-0750-547 Adding header 'citation' I to 01179-00-0750-547 Adding header 'date' I (20070128) to 01179-00-0750-547 Adding header 'sha-hash' I to 01179-00-0750-547 Adding header 'title' IT (Heterogeneity in Price Stickiness and the Real Effects of Monetary Shocks) to 01179-00-0750-547 Created empty doc Document01179-00-0750-547> stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2181): Heterogeneity in Price Stickin page 1 (2927): 1 Introduction There is ample page 2 (3135): In the presence of strategic c page 3 (3128): Motivated by those questions, page 4 (3214): ploring the tractability of th page 5 (2491): model with Taylor staggered wa page 6 (1548): real rigidities (Ball and Rome page 7 (3098): 2.2 Calibrating the sectoral d page 8 (1913): distribution of price stickine page 9 (1952): reported in Table 1. Hencefort page 10 (1635): Figure 2 presents analogous re page 11 (1743): In the absence of strategic co page 12 (2806): Corollary 1 For an arbitrary h page 13 (2380): 2.4.2 Growth rate shocks In th page 14 (2962): price changes. With heterogene page 15 (3265): ties and heterogeneity in the page 16 (1962): complementarities. 
The results page 17 (751): to the response of the heterog page 18 (489): economies are embedded into th page 19 (3295): 2.6 Fitting IRFs with an ident page 20 (2066): Table 3a: Best-Fitting Duratio page 21 (2444): This is an important step beca page 22 (1976): where ? is the discount factor page 23 (1183): Et "? Ct+1 Ct ¦?? It Pt Pt+1 # page 24 (2188): can be rewritten as: Pk,t = £ page 25 (1370): pt = Z 1 0 f (k) pk,tdk, (10) page 26 (3269): Heterogeneity in price stickin page 27 (3117): Irrespective of the net effect page 28 (2084): set of parameters involve high page 29 (575): 0 5 10 15 20 25 30 35 40 0 x 1 page 30 (2185): output and falling prices in a page 31 (2358): price changes that minimizes t page 32 (2689): These results are fully consis page 33 (3600): different sources of real rigi page 34 (3168): work in a model with heterogen page 35 (2557): single equation estimation of page 36 (1326): Taking the limit as Æ ? 0 in e page 37 (1796): The output gap is constant at page 38 (1066): The corresponding path for the page 39 (1347): 4) Proof of Corollaries 1 and page 40 (2421): Therefore, for ? Å 0, the expe page 41 (1343): p (t) = Z 1 0 f (k) ? ?? ?? R page 42 (2117): As ? ? 0, this clearly converg page 43 (1375): mode
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Are you still getting the original exception too or just the Array out
> of bounds one now? Also, are you doing anything else to the index
> while this is happening? The code at the point in the exception below
> is trying to properly handle deleted documents.

Just the array-out-of-bounds one, now.

The current version of the code creates a writer, then deletes all old Lucene 'Document' instances belonging to the specified UpLib doc-id, using that writer, then re-indexes that UpLib doc-id (which consists of one-to-N Lucene 'Document's). After doing the six UpLib documents, it calls optimize().

I'm going back to the old code now. It uses the 2.0 APIs, so it uses an IndexReader to delete the existing instances, then closes that reader (which if I understand it properly should flush the index back to disk), then creates a new writer to re-index the same documents, then does the optimize with that writer, which is where the CorruptIndexException started coming up. I'm going to run that again with 2.0, then with last night's build.

I'm not sure if the success with 2.0 meant that a corrupted index wasn't being detected, or if it wasn't being corrupted in the first place.

Bill
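For readers following the thread, the single-writer flow Bill describes maps onto the 2.2+ IndexWriter API roughly like this (a sketch only; buildDocuments() and the "id" field name stand in for the UpLib-specific parts, which are not shown in the thread):

IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
for (String docId : uplibDocIds) {
  // drop any previously indexed Lucene Documents for this UpLib doc-id
  writer.deleteDocuments(new Term("id", docId));
  // re-add the one-to-N Lucene Documents that make up this UpLib document
  for (Document page : buildDocuments(docId)) {
    writer.addDocument(page);
  }
}
writer.optimize();
writer.close();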
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > Here's the dump with last night's build: Those logs look healthy up until the exception. One odd thing is when you instantiate your writer, your index has 2 segments in it. I expected only 1 since each time you visit your index you leave it optimized. (Or, maybe you're not calling setInfoStream immediately after opening the writer?). The error still only happens on the one PPC machine, even after upgrading to trunk? EG not on an Intel box? Have you tried another PPC machine? > I'm going back to the old code now. It uses the 2.0 APIs, so it > uses an IndexReader to delete the existing instances, then closes > that reader (which if I understand it properly should flush the > index back to disk), then creates a new writer to re-index the same > documents, then does the optimize with that writer, which is where > the CorruptIndexException started coming up. I'm going to run that > again with 2.0, then with last night's build. Could you post this part of the code (deleting) too? > I'm not sure if the success with 2.0 meant that a corrupted index > wasn't being detected, or if it wasn't being corrupted in the first > place. Likely the corruption really isn't happening. That particular check for "docs out of order" is present in 2.0 as well. Is it possible to whittle down your test to a smaller set of documents? EG if you only re-index one document at a time, does the exception happen sooner? Ideally we can reduce this to a test I can reproduce then I can track it down... Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
On Nov 29, 2007, at 2:26 PM, Bill Janssen wrote:
>> Are you still getting the original exception too or just the Array out
>> of bounds one now? Also, are you doing anything else to the index
>> while this is happening? The code at the point in the exception below
>> is trying to properly handle deleted documents.
>
> Just the array-out-of-bounds one, now.
>
> The current version of the code creates a writer, then deletes all old
> Lucene 'Document' instances belonging to the specified UpLib doc-id,
> using that writer, then re-indexes that UpLib doc-id (which consists of
> one-to-N Lucene 'Document's). After doing the six UpLib documents, it
> calls optimize().

I'm curious what happens if you call optimize after doing the deletion but before the re-indexing. Also, could you try out the CheckIndex tool in 2.3-dev before and after the deletes?
Boost One Term Query
Boosting a one-term query does not have an effect on the score. For example:

apple

has the same score as:

apple^3

But repeating the term will raise the score:

apple apple apple

I expected the score to go up when boosting a one-term query. Is that a wrong expectation? Thanks!
--
View this message in context: http://www.nabble.com/Boost-One-Term-Query-tf4900128.html#a14035572
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Could you post this part of the code (deleting) too? Here it is: private static void remove (File index_file, String[] doc_ids, int start) { String number; String list; Term term; TermDocs matches; if (debug_mode) System.err.println("index file is " + index_file + " and it " + (index_file.exists() ? "exists." : "does not exist.")); try { if (index_file.exists() && (doc_ids.length > start)) { IndexReader reader = IndexReader.open(index_file); try { for (int i = start; i < doc_ids.length; i++) { term = new Term("id", doc_ids[i]); int deleted = reader.deleteDocuments(term); System.out.println("Deleted " + deleted + " existing instances of " + doc_ids[i]); } } finally { reader.close(); } } } catch (Exception e) { if (debug_mode) { e.printStackTrace(System.err); } else { System.out.println("* LuceneIndexing 'remove' raised " + e.getClass() + " with message " + e.getMessage()); System.err.println("LuceneIndexing 'remove': caught a " + e.getClass() + "\n with message: " + e.getMessage()); System.out.flush(); } System.exit(JAVA_EXCEPTION); } System.out.flush(); } private static void update (File index_file, File doc_root_dir, String[] ids, int start) { ExtractIndexingInfo.DocumentIterator docit; String number; remove (index_file, ids, start); try { // Now add the documents to the index IndexWriter writer = new IndexWriter(index_file, new StandardAnalyzer(), !index_file.exists()); if (debug_mode) writer.setInfoStream(System.err); writer.setMaxFieldLength(Integer.MAX_VALUE); try { for (int i = start; i < ids.length; i ++) { docit = build_document_iterator(doc_root_dir, ids[i]); int count = 0; while (docit.hasNext()) { writer.addDocument((Document)(docit.next())); count += 1; } System.out.println("Added " + docit.id + " (" + count + " versions)"); System.out.flush(); } } finally { // And close the index System.out.println("Optimizing..."); // See http://www.gossamer-threads.com/lists/lucene/java-dev/47895 about optimize // Can fail if low on disk space writer.optimize(); writer.close(); } } catch (Exception e) { if (debug_mode) { e.printStackTrace(System.err); } else { System.out.println("* Lucene search engine raised " + e.getClass() + " with message " + e.getMessage()); System.err.println(" 'update' caught a " + e.getClass() + "\n with message: " + e.getMessage()); System.out.flush(); } System.exit(JAVA_EXCEPTION); } System.out.flush(); } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Have you tried another PPC machine? No. It's in another location, but perhaps I can get it tomorrow. On the other hand, the success when using 2.0 makes it likely to me that the machine isn't the problem. OK, I've reverted to my original codebase (where I first create a reader and do the deletions, then create a writer and do the additions and optimize), and it works fine with lucene-core-2.0.0, but fails with lucene-core-2.3.-whatever (last night's build). Here's the dump: indexing with /Library/Java/Home/bin/java -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*" -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01160-06-3246-773 01159-97-2914-663 01159-89-7507-719 01159-89-5614-073 01159-89-1159-244 01159-89-0665-499 thr001: acquiring lock: LuceneIndex... thr001: acquired lock: LuceneIndex* thr001: releasing lock: LuceneIndex* thr001: indexing output is stored/uncompressed,indexed stored/uncompressed,indexed> Added 01160-06-3246-773 (1 versions) Working on document /local/janssen-uplib/docs/01159-97-2914-663 Adding header 'apparent-mime-type' I to 01159-97-2914-663 Adding header 'authors' IT to 01159-97-2914-663 Adding header 'categories' I (cartoon) to 01159-97-2914-663 Adding header 'date' I (19951004) to 01159-97-2914-663 Adding header 'sha-hash' I to 01159-97-2914-663 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Added 01159-97-2914-663 (1 versions) Working on document /local/janssen-uplib/docs/01159-89-7507-719 Adding header 'apparent-mime-type' I to 01159-89-7507-719 Adding header 'sha-hash' I to 01159-89-7507-719 Adding header 'title' IT (Photoshop Metal Texture) to 01159-89-7507-719 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (580): Tutorials\xa5 News\xa5 Exclusives\xa5 S page 1 (1680): On a new layer create a gradie page 2 (1118): Scrapes and scratches are irre page 3 (470): Bevel Settings\xa5 Contour Settin Using charset utf8 for contents.txt Using language en for contents.txt Added 01159-89-7507-719 (5 versions) Working on document /local/janssen-uplib/docs/01159-89-5614-073 Adding header 'apparent-mime-type' I to 01159-89-5614-073 Adding header 'sha-hash' I to 01159-89-5614-073 Adding header 'title' IT (Creating Virtual Mats and Frames with The GIMP) to 01159-89-5614-073 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (663): All photographs and articles o page 1 (600): Although real mats and frames page 2 (999): The Procedure First of all you page 3 (615): Run the Add Mat script (Script page 4 (693): in the GIMP toolbox Pattern: U page 5 (703): 3D lighted/shaded appearance. page 6 (719): Bevel Fill Color, pops up a di page 7 (714): texture afterwards. 
Default: o page 8 (797): recommended, especially if you page 9 (461): moving outwards, as in adding page 10 (67): 11 Creating Virtual Mats and F page 11 (67): 12 Creating Virtual Mats and F page 12 (378): Time to add a frame. Run Scrip page 13 (498): in Frame Fill Color FG color: page 14 (717): and background colors, not in page 15 (685): the pattern to for texturing t page 16 (721): added along the inner boundary page 17 (1006): leave a selection in place cov page 18 (904): A drop shadow on the entire fr page 19 (629): threshold sliders to the right page 20 (901): Bump Map" and fill it with whi page 21 (786): image window, do a Select All page 22 (393): In the Layers dialog, choose t page 23 (937): "Keep Trans." option near the page 24 (239): Last modified: Mon May 9 23:36 Using charset utf8 for contents.txt Using language en for contents.txt Added 01159-89-5614-073 (26 versions) Working on document /local/janssen-uplib/docs/01159-89-1159-244 Adding header 'apparent-mime-type' I to 01159-89-1159-244 Adding header 'authors' IT to 01159-89-1159-244 Adding header 'categories' I (ebooks) to 01159-89-1159-244 Adding header 'categories' I (article) to 01159-89-1159-244 Adding header 'date' I (20050100) to 01159-89-1159-244 Adding header 's
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
> Also, could you try out the CheckIndex tool in 2.3-dev before and
> after the deletes?

Great idea! I don't suppose there's a jar file of it?

Bill
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
So, it's a little clearer. I get the Array-out-of-bounds exception if I'm re-indexing some already indexed documents -- if there are deletions involved. I get the CorruptIndexException if I'm indexing freshly -- no deletions. Here's an example of that (with the latest nightly). I removed the existing index, then reindexed the collection six UpLib docs at a time, till I hit the corruption. Bill /Library/Java/Home/bin/java -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*" -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01113-86-6099-767 01113-86-5485-936 01113-86-0975-795 01113-62-2881-882 01113-44-7730-580 01113-44-7684-477 thr002: acquiring lock: LuceneIndex... thr002: acquired lock: LuceneIndex* thr002: releasing lock: LuceneIndex* thr002: indexing output is stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (4219): Question: My chives have grown Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-6099-767 (2 versions) Working on document /local/janssen-uplib/docs/01113-86-5485-936 Adding header 'abstract' IT to 01113-86-5485-936 Adding header 'apparent-mime-type' I to 01113-86-5485-936 Adding header 'authors' IT to 01113-86-5485-936 Adding header 'categories' I (paper) to 01113-86-5485-936 Adding header 'categories' I (sensepad) to 01113-86-5485-936 Adding header 'citation' I to 01113-86-5485-936 Adding header 'date' I (20040524) to 01113-86-5485-936 Adding header 'sha-hash' I to 01113-86-5485-936 Adding header 'title' IT (Designing Interaction, not Interfaces) to 01113-86-5485-936 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (3855): Designing Interaction, not Int page 1 (5688): Figure 1. Interaction as a phe page 2 (5831): Interaction models can be eval page 3 (5770): Reification turns concepts and page 4 (5558): Figure 6. A mock-up of the DPI page 5 (5963): In joint work with Yves Guiard page 6 (6819): I propose making interactions page 7 (5622): Graphical Application. Proc. 
A Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-5485-936 (9 versions) Working on document /local/janssen-uplib/docs/01113-86-0975-795 Adding header 'apparent-mime-type' I to 01113-86-0975-795 Adding header 'categories' I (article) to 01113-86-0975-795 Adding header 'date' I (20050414) to 01113-86-0975-795 Adding header 'sha-hash' I to 01113-86-0975-795 Adding header 'source' IT to 01113-86-0975-795 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (1851): About sponsorship Simplifying page 1 (2900): Latvia and Lithuania, Estonia' page 2 (3088): How much fairness is gained fo page 3 (5317): At the time of its reform, Est page 4 (1101): In part, the tax system is bur Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-86-0975-795 (6 versions) Working on document /local/janssen-uplib/docs/01113-62-2881-882 Adding header 'apparent-mime-type' I to 01113-62-2881-882 Adding header 'categories' I (article) to 01113-62-2881-882 Adding header 'date' I (20050328) to 01113-62-2881-882 Adding header 'keywords' I (neuroeconomics) to 01113-62-2881-882 Adding header 'sha-hash' I to 01113-62-2881-882 Adding header 'title' IT (Neuroeconomics: Why Logic Often Takes a Backseat) to 01113-62-2881-882 Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed> Using charset utf8 for contents.txt Using language en for contents.txt page 0 (2957): Close Window MARCH 28, 2005 EC page 1 (3856): these attacks on rationality ? page 2 (484): Even believers in neuroeconomi Using charset utf8 for contents.txt Using language en for contents.txt Added 01113-62-2881-882 (4 versions) Working on document /local/janssen-uplib/docs/01113-44-7730-580 Adding header 'apparent-mime-type' I to 01113-44-7730-580 Adding header 'categories' I (flowport) to 01113-44-7730-580 Adding header 'categories' I (receipt) to 01113-44-7730-580
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
This is in the nightly JAR. It's o.a.l.index.CheckIndex (it defines a static main).

Mike

"Bill Janssen" <[EMAIL PROTECTED]> wrote:
> > Also, could you try out the CheckIndex tool in 2.3-dev before and
> > after the deletes?
>
> Great idea! I don't suppose there's a jar file of it?
>
> Bill
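For anyone else who wants to try it, running that static main against an index directory should look roughly like this (the jar name is whatever nightly build you downloaded, and the index path here is just Bill's example from earlier in the thread):

java -classpath lucene-core-2.3-2007-11-29_02-49-31.jar org.apache.lucene.index.CheckIndex /local/janssen-uplib/index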
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Just a theory (make that a guess), Mike, but is it possible that the one merge scheduler is hitting a synchronization issue with the deletedDocuments bit vector? That is one thread is cleaning it up and the other is accessing and they aren't synchronizing their access? This doesn't explain the original problem, but maybe this one? On Nov 29, 2007, at 4:46 PM, Bill Janssen wrote: Have you tried another PPC machine? No. It's in another location, but perhaps I can get it tomorrow. On the other hand, the success when using 2.0 makes it likely to me that the machine isn't the problem. OK, I've reverted to my original codebase (where I first create a reader and do the deletions, then create a writer and do the additions and optimize), and it works fine with lucene-core-2.0.0, but fails with lucene-core-2.3.-whatever (last night's build). Here's the dump: indexing with /Library/Java/Home/bin/java - Dcom.parc.uplib.indexing.debugMode=true "- Dcom.parc.uplib.indexing.indexProperties=contents:title:categories $,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email- message-id*:email-guid*:email-subject:email-from-name:email-from- address*:email-attachment-to*:email-thread-index*:email-references $,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music- genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*" - classpath "/local/uplib/share/UpLib-1.7/code/lucene- core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/ LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=2 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01160-06-3246-773 01159-97-2914-663 01159-89-7507-719 01159-89-5614-073 01159-89-1159-244 01159-89-0665-499 thr001: acquiring lock: LuceneIndex... 
thr001: acquired lock: LuceneIndex*
thr001: releasing lock: LuceneIndex*
thr001: indexing output is
IFD [main]: setInfoStream deletionPolicy [EMAIL PROTECTED]
IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index autoCommit=true [EMAIL PROTECTED] mergeScheduler [EMAIL PROTECTED] ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=_4j:c19686
IW 0 [main]: setMaxFieldLength 2147483647
Working on document /local/janssen-uplib/docs/01160-06-3246-773
Adding header 'apparent-mime-type' I to 01160-06-3246-773
Adding header 'authors' IT to 01160-06-3246-773
Adding header 'categories' I (cartoon) to 01160-06-3246-773
Adding header 'date' I (19951005) to 01160-06-3246-773
Adding header 'sha-hash' I to 01160-06-3246-773
Created empty doc
Document01160-06-3246-773> stored/uncompressed,indexed stored/uncompressed,indexed>
Added 01160-06-3246-773 (1 versions)
Working on document /local/janssen-uplib/docs/01159-97-2914-663
Adding header 'apparent-mime-type' I to 01159-97-2914-663
Adding header 'authors' IT to 01159-97-2914-663
Adding header 'categories' I (cartoon) to 01159-97-2914-663
Adding header 'date' I (19951004) to 01159-97-2914-663
Adding header 'sha-hash' I to 01159-97-2914-663
Created empty doc
Document01159-97-2914-663> stored/uncompressed,indexed stored/uncompressed,indexed>
Added 01159-97-2914-663 (1 versions)
Working on document /local/janssen-uplib/docs/01159-89-7507-719
Adding header 'apparent-mime-type' I to 01159-89-7507-719
Adding header 'sha-hash' I to 01159-89-7507-719
Adding header 'title' IT (Photoshop Metal Texture) to 01159-89-7507-719
Created empty doc
Document01159-89-7507-719> stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (580): Tutorials\xa5 News\xa5 Exclusives\xa5 S
page 1 (1680): On a new layer create a gradie
page 2 (1118): Scrapes and scratches are irre
page 3 (470): Bevel Settings\xa5 Contour Settin
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01159-89-7507-719 (5 versions)
Working on document /local/janssen-uplib/docs/01159-89-5614-073
Adding header 'apparent-mime-type' I to 01159-89-5614-073
Adding header 'sha-hash' I to 01159-89-5614-073
Adding header 'title' IT (Creating Virtual Mats and Frames with The GIMP) to 01159-89-5614-073
Created empty doc
Document01159-89-5614-073> stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (663): All photographs and articles o
page 1 (600): Although real mats and frames
page 2 (999): The Procedure First of all you
page 3 (615): Run the Add Mat script (Script
page 4 (693): in the GIMP toolbox Pattern: U
page 5 (703): 3D lighted/shaded appearance.
page 6 (719): Bevel Fill Color, pops up a di
page 7 (714): texture afterwards. Default: o
page 8 (797): recommended, especially if you
page 9 (461): moving outwards, as in adding
page 10 (67): 11 Creating Virtual Mats and F
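(For readers unfamiliar with the pattern Bill describes above, his original codebase, a reader pass for the deletions followed by a writer pass for the additions and an optimize, corresponds roughly to the sketch below against the Lucene 2.x API. This is not UpLib's actual code; the "id" field name, the SimpleAnalyzer, and the helper method are all made up for illustration.

    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;

    public class UpdateSketch {
        // Delete any existing copies of the documents, then add the new versions.
        static void update(Directory dir, String[] ids, Document[] newDocs) throws Exception {
            // Pass 1: deletions through an IndexReader.
            IndexReader reader = IndexReader.open(dir);
            for (int i = 0; i < ids.length; i++) {
                reader.deleteDocuments(new Term("id", ids[i]));  // hypothetical "id" keyword field
            }
            reader.close();

            // Pass 2: additions through an IndexWriter on the existing index, then optimize.
            IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), false);
            for (int i = 0; i < newDocs.length; i++) {
                writer.addDocument(newDocs[i]);
            }
            writer.optimize();
            writer.close();
        }
    }

The point of interest for this thread is that the same delete-then-add sequence works against lucene-core-2.0.0 but fails against the 2.3 nightly.)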
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > No. It's in another location, but perhaps I can get it tomorrow. > On the other hand, the success when using 2.0 makes it likely to me > that the machine isn't the problem. Yeah good point. Seems like a long shot (wishful thinking on my part!). Your errors seem to happen around the same area (~20K docs). If you skip the first say ~18K docs does the error still happen? We need to somehow narrow this down. Or is there any way I could get a temporary account to log into this box and try to track this down? (If indeed it doesn't happen on an x86 box -- I unfortunately don't have access to a PPC machine). Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote: > Just a theory (make that a guess), Mike, but is it possible that the > one merge scheduler is hitting a synchronization issue with the > deletedDocuments bit vector? That is one thread is cleaning it up and > the other is accessing and they aren't synchronizing their access? Well, in trunk I think we are hitting the bit vector in synchronized contexts, correctly. (I sure think/hope so :). Also, in the context of merging, the deleted docs bit vector is read only. This sure does spookily sound like LUCENE-140!! I hope that one is not coming back from the dead! Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
I have PPC and Intel access if that helps. Just need a test case. On Nov 29, 2007, at 5:37 PM, Michael McCandless wrote: "Bill Janssen" <[EMAIL PROTECTED]> wrote: No. It's in another location, but perhaps I can get it tomorrow. On the other hand, the success when using 2.0 makes it likely to me that the machine isn't the problem. Yeah good point. Seems like a long shot (wishful thinking on my part!). Your errors seem to happen around the same area (~20K docs). If you skip the first say ~18K docs does the error still happen? We need to somehow narrow this down. Or is there any way I could get a temporary account to log into this box and try to track this down? (If indeed it doesn't happen on an x86 box -- I unfortunately don't have access to a PPC machine). Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: CorruptIndexException
Thank you, I did indeed use a newer version of Lucli by mistake.

-----Original Message-----
From: Michael McCandless [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2007 6:30 PM
To: java-user@lucene.apache.org
Subject: Re: CorruptIndexException

That exception means your index was written with a newer version of
Lucene than the version you are using to open the IndexReader.

It looks like you used the unreleased (2.3 dev) version of Lucli from
the Lucene trunk and then went back to an older Lucene JAR (maybe 2.2?)
for accessing it?

In general, writing an index with a newer version of Lucene and then
trying to access it using an older version of Lucene doesn't work
(whereas the opposite does).

I'm afraid you either have to switch to 2.3-dev for reading your index
(but beware it could have sneaky bugs ...), or rebuild your index with
the 2.2 version of Lucene and use the 2.2 Lucli in the future.

Mike

"Melanie Langlois" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I used Lucli to optimize my index while my application was stopped. After
> restarting my application, I could not search my index anymore; I got the
> following exception:
>
> org.apache.lucene.index.CorruptIndexException: Unknown format version: -4
>         at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:204)
>         at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:190)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:167)
>
> I have two questions:
>
> - Why does it occur? Should I use another tool to access the index
>   outside of my application?
> - Is there a way to recover?
>
> Thanks,
>
> Mélanie

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
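(As a concrete illustration of the failure mode Mike explains, the version check happens when the reader opens the segments file, so code along the lines of the sketch below, run against a 2.2 JAR on an index touched by a 2.3-dev tool, is where the exception surfaces. The directory path is just a placeholder; catching the exception only lets you report it cleanly, it does not make the index readable by the older JAR.

    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class OpenIndexSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.getDirectory("/path/to/index");  // placeholder path
            try {
                IndexReader reader = IndexReader.open(dir);
                System.out.println("Opened index with " + reader.numDocs() + " docs");
                reader.close();
            } catch (CorruptIndexException e) {
                // Thrown when the on-disk format is newer than this Lucene JAR understands,
                // e.g. "Unknown format version: -4" after optimizing with a 2.3-dev Lucli.
                System.err.println("Index was written by a newer Lucene version: " + e.getMessage());
            }
        }
    }

The recovery options are exactly the two Mike lists: read the index with the same (newer) version that wrote it, or rebuild it with the older version you intend to keep using.)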