Re: ClassCastException when writing to index writer

2008-10-03 Thread Edwin Lee
I think it is very likely that you have another copy of java.util.Vector loaded, and this one tries to be too clever with its implementation of clone() (it instantiates a new Vector instance) instead of delegating to its super class (Object). HTH, Edwin --- Chris Hostetter <[EMAIL PROTECTED]> wrote: > > :
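Edwin's theory can be reproduced in plain Java. The sketch below uses hypothetical classes (not Lucene's actual code): a base class whose clone() instantiates its own type instead of delegating to Object.clone() makes the cast in every subclass fail with exactly this kind of ClassCastException.

```java
import java.util.Vector;

// Hypothetical classes reproducing Edwin's theory; this is NOT Lucene code.
// BadBase is "too clever": its clone() instantiates its own type instead of
// delegating to super.clone(), so subclasses can never get back their own type.
class BadBase extends Vector<String> {
    @Override
    public Object clone() {
        return new BadBase();   // always BadBase, even when called on a subclass
    }
}

class Derived extends BadBase { }

public class BadCloneDemo {
    public static void main(String[] args) {
        Derived d = new Derived();
        try {
            Derived copy = (Derived) d.clone();   // clone() returned a BadBase
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the reported trace");
        }
    }
}
```

This is the same shape as the failure in the stack trace: SegmentInfos casts super.clone()'s result, which is only safe when every clone() in the chain delegates up to Object.clone().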

Re: ClassCastException when writing to index writer

2008-10-03 Thread Edwin Lee
Hi Paul, The clone() in SegmentInfos is correct. The best practice for clone() is to delegate to the super class (if you look at the source code for Vector, it too delegates to its super class, Object) to create a shallow copy, and then clone each of its mutable field
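For contrast, here is a minimal sketch of the correct pattern Edwin describes, using a hypothetical Vector subclass (InfoList is a stand-in for SegmentInfos, not Lucene code). Object.clone() allocates an instance of the runtime class, so delegating up the chain keeps the downcast safe.

```java
import java.util.Vector;

// Hypothetical subclass mirroring the SegmentInfos pattern (InfoList is a
// stand-in, not Lucene code). Object.clone() allocates an instance of the
// RUNTIME class, so delegating up through Vector.clone() keeps the cast safe.
class InfoList extends Vector<String> {
    @Override
    public Object clone() {
        InfoList copy = (InfoList) super.clone();  // Vector.clone() -> Object.clone()
        // ...deep-clone any mutable fields of InfoList here (shallow copy so far)...
        return copy;
    }
}

public class CloneDemo {
    public static void main(String[] args) {
        InfoList list = new InfoList();
        list.add("segment_1");
        InfoList copy = (InfoList) list.clone();   // no ClassCastException
        System.out.println(copy.getClass().getSimpleName() + " contains " + copy.get(0));
    }
}
```

Running this prints `InfoList contains segment_1`: the shallow copy produced by the delegation chain really is an InfoList, which is why the cast in SegmentInfos.clone() is legitimate on a correct JRE.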

Re: ClassCastException when writing to index writer

2008-10-03 Thread Chris Hostetter
:SegmentInfos sis = (SegmentInfos) super.clone(); : We see that it is trying to cast a Vector into SegmentInfos which explains : the ClassCastException. This is definitely a bug. That is the correct and specified use of clone() ... note the javadocs for Object.clone() and the Clonea

Re: ClassCastException when writing to index writer

2008-10-03 Thread Paul Chan
I am using Sun's JRE 1.6.0_02 on Windows XP. Actually... are you sure it would work? java.util.Vector has the following clone() method: public Object *clone*() { } I didn't think you could cast a base class (Vector) to its derived clas

Re: ClassCastException when writing to index writer

2008-10-03 Thread Michael McCandless
That's Sun's JRE? That should be fine, unless there's something seriously wrong with its java.util.Vector implementation. But this is an exceptionally strange exception. Maybe try a different version of the JRE? Any odd JARs on your CLASSPATH? What hardware/OS? Mike Paul Chan wrote

Re: ClassCastException when writing to index writer

2008-10-03 Thread Paul Chan
I am using Java 1.6.0_02. Is this a problem? On Fri, Oct 3, 2008 at 5:35 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Which Java environment are you running? > > super.clone() from SegmentInfos should produce a new SegmentInfos object. > > It seems like in your case it's somehow produc

Re: ClassCastException when writing to index writer

2008-10-03 Thread Michael McCandless
Which Java environment are you running? super.clone() from SegmentInfos should produce a new SegmentInfos object. It seems like in your case it's somehow producing a Vector instead? Mike Paul Chan wrote: Hi Mike, I am actually using the Compass Search Engine which in turn makes use of

Re: ClassCastException when writing to index writer

2008-10-03 Thread Paul Chan
Hi Mike, I am actually using the Compass Search Engine which in turn makes use of Lucene. They are doing the following in their code: IndexWriter indexWriter = new IndexWriter(dir, autoCommit, analyzer, create, deletionPolicy); where autoCommit = false. In turn, Lucene will do the foll

Re: ClassCastException when writing to index writer

2008-10-03 Thread Michael McCandless
Can you describe what led up to this exception? Ie, what calls you made to Lucene before this. Mike Paul Chan wrote: I think I know what the problem is looking at the code: In SegmentInfos.java (line 321): class SegmentInfos extends Vector { public Object clone() { SegmentInfos

Re: ClassCastException when writing to index writer

2008-10-03 Thread Paul Chan
I think I know what the problem is looking at the code: In SegmentInfos.java (line 321): class SegmentInfos extends Vector { public Object clone() { SegmentInfos sis = (SegmentInfos) super.clone(); for(int i=0;i wrote: > Hi, > > I am using lucene 2.3.2 and I encounter the follo

ClassCastException when writing to index writer

2008-10-03 Thread Paul Chan
Hi, I am using lucene 2.3.2 and I encounter the following exception when I try to insert a object into the index. Caused by: java.lang.ClassCastException: java.util.Vector cannot be cast to org.apache.lucene.index.SegmentInfos at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:321) a

Re: Extracting Dates

2008-10-03 Thread Otis Gospodnetic
David, this is not really a Lucene issue. Here is some Perl code that you could use, or rewrite in Java if you need it in Java: http://search.cpan.org/dist/Date-Extract/ Tika won't help with this, and I believe UIMA itself will not help either, although there may be components for date ex

Re: Single searcher vs Multi Searcher

2008-10-03 Thread Anshum
Hi Ganesh, I have experimented with sharded indexes and they seem to benefit me (at least in my case). I would like to know a few things before I answer your question: 1. Do you have a reasonable criterion (a calculated one) for sharding the indexes? 2. How do you plan to split the index? Is it going to

Re: Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Michael McCandless
Note that large stored fields do not use up any RAM in IndexWriter's RAM buffer because these stored fields are immediately written to the directory and not stored in RAM for very long. Aditi, I'd love to see the full stack trace of the OOME that was originally hit if you still have it...

Re: Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Ganesh
A single document of 16 MB seems big. I think you are trying to store the entire document content. If so, drop the stored field and store its reference information in the database, which would help to retrieve the content later. Regards Ganesh - Original Message - From: "Adi

Re: Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Michael McCandless
First off, IndexWriter's RAM buffer size is "approximate": after each doc is added, we check whether the RAM consumed is greater than our budget, and if so, we flush. When you add a doc that's larger than the RAM buffer size, all that will happen is that after that doc is indexed, we flush. In oth
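The policy Mike describes can be modeled in isolation. The class below is a hypothetical stand-in, not Lucene's implementation: the document is buffered first, and only afterwards is the budget checked, so an oversized document simply triggers an immediate flush rather than an error.

```java
// Standalone model (hypothetical, not Lucene's implementation) of the
// flush policy Mike describes: buffer the doc first, then check the budget.
public class RamBudgetWriter {
    private final long budgetBytes;
    private long usedBytes = 0;
    private int flushCount = 0;

    public RamBudgetWriter(long budgetBytes) { this.budgetBytes = budgetBytes; }

    public void addDoc(long docBytes) {
        usedBytes += docBytes;          // the whole doc is buffered first...
        if (usedBytes > budgetBytes) {  // ...and only then is the budget checked
            flushCount++;               // stand-in for flushing to the directory
            usedBytes = 0;
        }
    }

    public int getFlushCount() { return flushCount; }

    public static void main(String[] args) {
        RamBudgetWriter w = new RamBudgetWriter(16L * 1024 * 1024);
        w.addDoc(20L * 1024 * 1024);    // one doc larger than the whole budget
        System.out.println("flushes: " + w.getFlushCount());
    }
}
```

Running the main prints `flushes: 1`: the 20 MB doc exceeds the 16 MB budget, so it is accepted and then immediately flushed, which is why a document larger than setRAMBufferSizeMB is not itself a problem.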

Re: Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Aditi Goyal
Thanks Anshum. Although it raises another query: committing the current buffer will commit the docs added before it, but what will happen to the current doc, the one that threw an error while a field was being added? Will that also get partially committed? Thanks a lot Aditi On Fri, Oct 3, 2008 at 2:12 PM, Anshum

Single searcher vs Multi Searcher

2008-10-03 Thread Ganesh
Hello all, My index is growing by 1 million records per day and the memory consumption of the searcher object is quite high. There are different opinions in the groups: some suggest using a single database and some suggest sharding. My database has 10 million records now and it might go till 3

Re: Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Anshum
Hi Aditi, I guess increasing the buffer size would be a solution here, but only if you know the expected max doc size in advance. I guess the best way to handle it would be a regular try/catch block in which you could commit the current buffer. At the least you could just continue the loop after d
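Anshum's try/catch suggestion can be sketched generically. The addDocument below is a hypothetical stand-in that rejects "bad" documents; it is not the Lucene IndexWriter API. The point is only the control flow: one failing document is recorded and skipped instead of aborting the whole batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic sketch of the try/catch approach. addDocument is a hypothetical
// stand-in that rejects "bad" documents; it is not the Lucene IndexWriter API.
public class SkipBadDocs {
    static void addDocument(String doc) {
        if (doc.length() > 5) {
            throw new RuntimeException("document too large: " + doc);
        }
    }

    // Index every doc we can; collect the failures instead of aborting the loop.
    public static List<String> indexAll(List<String> docs) {
        List<String> failed = new ArrayList<>();
        for (String doc : docs) {
            try {
                addDocument(doc);
            } catch (RuntimeException e) {
                failed.add(doc);   // record the failure and continue with the next doc
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(indexAll(Arrays.asList("ab", "toolongdoc", "cd")));
    }
}
```

Running the main prints `[toolongdoc]`: the oversized document is the only one that fails, and the documents before and after it are still processed.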

Document larger than setRAMBufferSizeMB()

2008-10-03 Thread Aditi Goyal
Hi Everyone, I have an index which I open only once at a time. I keep adding documents to it until I reach a limit of 500. After this, I close the index and open it again. (This is done in order to save the time taken by opening and closing the index.) Also, I have set setRAMBufferSizeMB to 16MB
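The batching scheme described above can be modeled in isolation. This is a sketch of the workflow only, not Lucene code: closeAndReopen stands in for calling indexWriter.close() and then constructing a new IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;

// Model (not Lucene code) of the batching scheme: add docs to a writer and,
// after every batchSize adds, close and reopen it. closeAndReopen stands in
// for indexWriter.close() followed by constructing a new IndexWriter.
public class BatchedIndexer {
    private final int batchSize;
    private int pending = 0;
    private final List<Integer> commitSizes = new ArrayList<>();

    public BatchedIndexer(int batchSize) { this.batchSize = batchSize; }

    public void addDocument() {
        pending++;
        if (pending == batchSize) {
            closeAndReopen();          // commit the batch once the limit is hit
        }
    }

    public void closeAndReopen() {
        if (pending > 0) {
            commitSizes.add(pending);  // docs made durable by this "close"
            pending = 0;
        }
    }

    public List<Integer> getCommitSizes() { return commitSizes; }

    public static void main(String[] args) {
        BatchedIndexer indexer = new BatchedIndexer(500);
        for (int i = 0; i < 1200; i++) indexer.addDocument();
        indexer.closeAndReopen();      // flush the final partial batch
        System.out.println(indexer.getCommitSizes());
    }
}
```

The main prints `[500, 500, 200]`: two full batches plus the final partial one, which also shows why the last explicit close matters, since without it the trailing 200 documents would never be committed.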

Re: Concurrent search

2008-10-03 Thread Mindaugas Žakšauskas
Hello Carmelo, Can you clarify what "index" means in your case? Concurrency issues, I believe, are well explained in: http://darksleep.com/lucene/ (see "Digression: Thread Safety" chapter). Regards, Mindaugas On Thu, Oct 2, 2008 at 7:49 PM, Carmelo Saffioti <[EMAIL PROTECTED]> wrote: > Hi everybo
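The usual pattern from that thread-safety discussion, a single searcher built once and shared read-only across threads, can be sketched with a hypothetical stand-in (the class below is not Lucene's IndexSearcher; it only illustrates the sharing pattern):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the shared-searcher pattern with a hypothetical stand-in:
// one immutable "index", built once, queried concurrently with no locking.
public class SharedSearcherDemo {
    static final List<String> INDEX = Arrays.asList("apache", "lucene", "index");

    static boolean search(String term) {
        return INDEX.contains(term);   // read-only access, so no synchronization needed
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Boolean>> tasks = new ArrayList<>();
        tasks.add(() -> search("lucene"));
        tasks.add(() -> search("solr"));
        List<Future<Boolean>> results = pool.invokeAll(tasks);  // run searches concurrently
        pool.shutdown();
        System.out.println(results.get(0).get() + " " + results.get(1).get());
    }
}
```

The main prints `true false`. The design point is that the shared object is never mutated after construction, which is the same reason a single Lucene searcher is safe to share among query threads.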

Re: Concurrent search

2008-10-03 Thread Anshum
Hi Carmelo, It is pretty straightforward to open index searchers on the same index directory. In other words, just open multiple searchers pointing to the same index location and it would work fine. -- -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the op