Michael:
 Yes, the previous implementation had its own buffer to reduce BLOB I/O, overriding the Input/OutputBufferStream base classes. Now the Input/Output stream classes use the BufferStream implementation and only implement the flushInternal and setSize methods, among other minimal changes :)
 Best regards, Marcelo.
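The refactoring described above (the base class owns the buffer, the storage-specific subclass only supplies a flush hook) can be sketched roughly as follows. This is only an illustration and not the OJVMDirectory code: the class names BufferedSink and InMemorySink are hypothetical, the flushInternal signature is assumed, and a ByteArrayOutputStream stands in for the BLOB column.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical base class: owns the buffer and all buffering logic. */
abstract class BufferedSink {
    private final byte[] buffer;
    private int used = 0;      // bytes currently waiting in the buffer
    private long flushed = 0;  // bytes already handed to the backing store

    BufferedSink(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.buffer = new byte[bufferSize];
    }

    /** Storage-specific hook (assumed signature): persist len bytes of b from offset. */
    protected abstract void flushInternal(byte[] b, int offset, int len) throws IOException;

    void writeByte(byte value) throws IOException {
        if (used == buffer.length) {
            flush();  // buffer is full, push it to the store first
        }
        buffer[used++] = value;
    }

    void writeBytes(byte[] src, int offset, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            writeByte(src[offset + i]);
        }
    }

    void flush() throws IOException {
        if (used > 0) {
            flushInternal(buffer, 0, used);
            flushed += used;
            used = 0;
        }
    }

    /** Logical length: flushed bytes plus bytes still buffered. */
    long length() {
        return flushed + used;
    }

    void close() throws IOException {
        flush();  // never drop the last partial buffer
    }
}

/** Hypothetical subclass: an in-memory stand-in for a BLOB-backed store. */
class InMemorySink extends BufferedSink {
    final ByteArrayOutputStream store = new ByteArrayOutputStream();
    int flushCalls = 0;

    InMemorySink(int bufferSize) {
        super(bufferSize);
    }

    @Override
    protected void flushInternal(byte[] b, int offset, int len) {
        store.write(b, offset, len);
        flushCalls++;
    }
}

class BufferedSinkDemo {
    public static void main(String[] args) throws IOException {
        InMemorySink sink = new InMemorySink(4);
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        sink.writeBytes(data, 0, data.length);
        sink.close();
        System.out.println(sink.store.size() + " bytes in " + sink.flushCalls + " flushes");
    }
}
```

With this split the subclass cannot get the buffering arithmetic wrong, because it never sees it; it only has to store exactly the bytes it is handed.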
PS: I am not at my notebook now, so I cannot post the diff to show the differences :(

On Tue, Sep 30, 2008 at 9:32 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> I'm glad to hear this!
>
> But: what was the root cause of the problems? Why didn't your previous
> Input/Output implementations, which worked with Lucene 2.3, work with 2.4?
> It's kinda spooky.
>
> Mike
>
> Marcelo Ochoa wrote:
>
>> Michael:
>> I have OJVMDirectory working with the 2.4rc2 code base.
>> I have refactored the Output and Input stream classes according to the
>> latest implementation of the Buffered base classes, and they work OK.
>> All of my test suites run OK; also, the bug in the 11g JIT compiler
>> is not reproducible.
>> I'll test whether this new implementation of the Input/Output stream
>> classes is compatible with the 2.3 code base, to release a new binary
>> version of the Oracle Lucene integration with the 2.3 distribution, and
>> once the 2.4 release is in production I'll release another binary
>> version too.
>> Best regards, Marcelo.
>>
>> On Mon, Sep 29, 2008 at 6:58 AM, Michael McCandless
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Marcelo,
>>>
>>> Do you have any sense whether this is an issue with your integration (eg
>>> your Directory implementation that stores data in BLOB columns) vs
>>> something with Lucene 2.4?
>>>
>>> It seems odd to me that there would be a bug in your Directory
>>> implementation that 2.3 didn't tickle but 2.4 did.
>>>
>>> Can you dump the index, as stored in BLOB columns, out into the
>>> filesystem, and run CheckIndex on it? (Or maybe run CheckIndex within
>>> Oracle).
>>>
>>> Mike
>>>
>>> Marcelo Ochoa wrote:
>>>
>>>> Mike:
>>>> Actually there are more issues at first glance with the OJVMDirectory
>>>> integration.
>>>> Note this, I am creating an index with two simple documents:
>>>> INFO: Performing: SELECT /*+ DYNAMIC_SAMPLING(0) RULE NOCACHE(T1) */
>>>> T1.rowid,F1,extractValue(F2,'/emp/name/text()')
>>>> "name",extractValue(F2,'/emp/@id') "id" FROM LUCENE.T1 for update nowait
>>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
>>>> FINE: Document<stored/uncompressed,indexed<rowid:AAARLCAAEAAAm2QAAA>
>>>> indexed,tokenized<F1:001> indexed,tokenized<name:ravi>
>>>> indexed,tokenized<id:01>>
>>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
>>>> FINE: Document<stored/uncompressed,indexed<rowid:AAARLCAAEAAAm2QAAB>
>>>> indexed,tokenized<F1:003> indexed,tokenized<name:murthy>
>>>> indexed,tokenized<id:03>>
>>>> IW 10 [Root Thread]: flush: segment=_0 docStoreSegment=_0
>>>> docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false
>>>> numDocs=2 numBufDelTerms=0
>>>> IW 10 [Root Thread]: index before flush
>>>> IW 10 [Root Thread]: DW: flush postings as segment _0 numDocs=2
>>>> IW 10 [Root Thread]: DW: oldRAMSize=111616 newFlushedSize=166
>>>> docs/MB=12,633.446 new/old=0.149%
>>>> IFD [Root Thread]: now checkpoint "segments_1" [1 segments ; isCommit = false]
>>>> IW 10 [Root Thread]: LMP: findMerges: 1 segments
>>>> IW 10 [Root Thread]: LMP: level -1.0 to 2.2741578: 1 segments
>>>> IW 10 [Root Thread]: CMS: now merge
>>>> IW 10 [Root Thread]: CMS: index: _0:C2->_0
>>>> IW 10 [Root Thread]: CMS: no more merges pending; now return
>>>> IW 10 [Root Thread]: now flush at close
>>>> IW 10 [Root Thread]: flush: segment=null docStoreSegment=_0
>>>> docStoreOffset=2 flushDocs=false flushDeletes=true flushDocStores=true
>>>> numDocs=0 numBufDelTerms=0
>>>> IW 10 [Root Thread]: index before flush _0:C2->_0
>>>> IW 10 [Root Thread]: flush shared docStore segment _0
>>>> IW 10 [Root Thread]: DW: closeDocStore: 2 files to flush to segment _0
>>>> numDocs=2
>>>> IW 10 [Root Thread]: CMS: now merge
>>>> IW 10 [Root Thread]: CMS: index: _0:C2->_0
>>>> IW 10 [Root Thread]: CMS: no more merges pending; now return
>>>> IW 10 [Root Thread]: now call final commit()
>>>> IW 10 [Root Thread]: startCommit(): start sizeInBytes=0
>>>> IW 10 [Root Thread]: startCommit index=_0:C2->_0 changeCount=2
>>>> IW 10 [Root Thread]: now sync _0.fnm
>>>> IW 10 [Root Thread]: now sync _0.frq
>>>> IW 10 [Root Thread]: now sync _0.prx
>>>> IW 10 [Root Thread]: now sync _0.tis
>>>> IW 10 [Root Thread]: now sync _0.tii
>>>> IW 10 [Root Thread]: now sync _0.nrm
>>>> IW 10 [Root Thread]: now sync _0.fdx
>>>> IW 10 [Root Thread]: now sync _0.fdt
>>>> IW 10 [Root Thread]: done all syncs
>>>> IW 10 [Root Thread]: commit: pendingCommit != null
>>>> IFD [Root Thread]: now checkpoint "segments_2" [1 segments ; isCommit = true]
>>>> IFD [Root Thread]: deleteCommits: now decRef commit "segments_1"
>>>> IFD [Root Thread]: delete "segments_1"
>>>> IW 10 [Root Thread]: commit: done
>>>> IW 10 [Root Thread]: at close: _0:C2->_0
>>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.LuceneDomainIndex
>>>> ODCIIndexCreate
>>>> FINER: RETURN 0
>>>>
>>>> Index created.
>>>>
>>>> And when I try to read the index I get:
>>>> INFO: Analyzer: [EMAIL PROTECTED]
>>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>>> ODCIStart
>>>> INFO: qryStr: DESC(name:ravi)
>>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>>> ODCIStart
>>>> INFO: storing cachingFilter: -1378376940 and searcher: 781713581
>>>> qryStr: DESC(name:ravi)
>>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>>> getSort
>>>> INFO: using sort: <score>,<doc>
>>>> Exception in thread "Root Thread" java.lang.IndexOutOfBoundsException:
>>>> Index: 6, Size: 4
>>>>     at java.util.ArrayList.RangeCheck(ArrayList.java)
>>>>     at java.util.ArrayList.get(ArrayList.java)
>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java)
>>>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java)
>>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java)
>>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java)
>>>>     at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java)
>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>     at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java)
>>>>     at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java)
>>>>     at org.apache.lucene.search.Similarity.idf(Similarity.java)
>>>>     at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java)
>>>>     at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java)
>>>>     at org.apache.lucene.search.Query.weight(Query.java)
>>>>     at org.apache.lucene.search.Hits.<init>(Hits.java:85)
>>>>     at org.apache.lucene.search.Searcher.search(Searcher.java)
>>>>     at org.apache.lucene.indexer.LuceneDomainIndex.ODCIStart(LuceneDomainIndex.java)
>>>>
>>>> Which definitely means that something is not saved correctly in the
>>>> OJVM directory BLOB storage :(
>>>> These are my files:
>>>> SQL> select file_size,name from it1$t;
>>>>
>>>>  FILE_SIZE NAME
>>>> ---------- ------------------------------
>>>>         10 parameters
>>>>          1 updateCount
>>>>         28 segments_1
>>>>         20 segments.gen
>>>>          8 _0.frq
>>>>          8 _0.prx
>>>>        103 _0.tis
>>>>         35 _0.tii
>>>>         12 _0.nrm
>>>>         22 _0.fnm
>>>>         48 _0.fdt
>>>>         20 _0.fdx
>>>>         62 segments_2
>>>> I'll add some debugging information to my classes which save/load
>>>> buffers to see how many calls are made and which arguments are used.
>>>> Marcelo.
>>>>
>>>> On Fri, Sep 26, 2008 at 1:41 PM, Michael McCandless
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> This one looks spooky!
>>>>>
>>>>> Is it easily repeated? If you could print out which 2 terms you had
>>>>> tried to delete, and then zip up the index just before deleting those
>>>>> docs (after closing the writer) and send to me, I can try to understand
>>>>> what's wrong with the index. It looks as if the *.tis file for one of
>>>>> the segments is truncated.
>>>>>
>>>>> If you capture the series of add/update/delete documents, can you get a
>>>>> standalone Java test to show this?
>>>>>
>>>>> Does this test create an entirely new index?
>>>>>
>>>>> We did change the index format in 2.4 to use "true" UTF8 encoding for
>>>>> all text content; not sure that this applies here (to BufferedIndexReader
>>>>> it's all bytes) but it may.
>>>>>
>>>>> BufferedIndexReader in general can do random IO, especially when
>>>>> reading the term dict file (*.tis), when you
>>>>>
>>>>> Mike
>>>>>
>>>>> Marcelo Ochoa wrote:
>>>>>
>>>>>> Michael:
>>>>>> I just started testing 2.4rc2 running inside OJVM.
>>>>>> I found a similar stack trace during indexing:
>>>>>> IW 3 [Root Thread]: flush: segment=_3 docStoreSegment=_3
>>>>>> docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false
>>>>>> numDocs=2 numBufDelTerms=2
>>>>>> IW 3 [Root Thread]: index before flush _1:C2->_1 _2:C2->_2
>>>>>> IW 3 [Root Thread]: DW: flush postings as segment _3 numDocs=2
>>>>>> IW 3 [Root Thread]: DW: oldRAMSize=111616 newFlushedSize=264
>>>>>> docs/MB=7,943.758 new/old=0.237%
>>>>>> IW 3 [Root Thread]: DW: apply 2 buffered deleted terms and 0 deleted
>>>>>> docIDs and 0 deleted queries on 3 segments.
>>>>>> IW 3 [Root Thread]: hit exception flushing deletes
>>>>>> Exception in thread "Root Thread" java.io.IOException: read past EOF
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java)
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
>>>>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java)
>>>>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java)
>>>>>>     at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java)
>>>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>>>     at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java)
>>>>>>     at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java)
>>>>>>     at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java)
>>>>>>     at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:918)
>>>>>>     at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java)
>>>>>>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java)
>>>>>>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java)
>>>>>>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java)
>>>>>>     at org.apache.lucene.indexer.LuceneDomainIndex.sync(LuceneDomainIndex.java:1308)
>>>>>>
>>>>>> I'll reinstall with full debug info to see all line numbers in the
>>>>>> Lucene Java code.
>>>>>> Is there a list of semantic changes in the BufferedIndexInput code?
>>>>>> I mean, does it do sequential or random writes, for example?
>>>>>> But anyway, I just compiled with the latest code and ran my test suites;
>>>>>> I'll investigate the problem a bit more.
>>>>>> Best regards, Marcelo.
>>>>>>
>>>>>> On Fri, Sep 26, 2008 at 7:32 AM, Michael McCandless
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> Can you describe the sequence of steps that your replication process
>>>>>>> goes through?
>>>>>>>
>>>>>>> Also, which filesystem is the index being accessed through?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> rahul_k123 wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> First of all, thanks to all the people who helped me in getting the
>>>>>>>> Lucene replication setup working; right now it's live in our
>>>>>>>> production :-)
>>>>>>>>
>>>>>>>> Everything is working fine, except that I am seeing some exceptions
>>>>>>>> on the slaves.
>>>>>>>>
>>>>>>>> The following is the one which is occurring more often on the slaves
>>>>>>>>
>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>>> Caused by: com.IndexingException: [SYSTEM_ERROR] Cannot access index
>>>>>>>> [data_dir/index]: [read past EOF]
>>>>>>>>     at com.lucene.LuceneSearchService.getSearchResults(LuceneSearchService.java:964)
>>>>>>>>     ... 12 more
>>>>>>>> Caused by: java.io.IOException: read past EOF
>>>>>>>>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
>>>>>>>>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>>>>>>>>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:66)
>>>>>>>>     at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:89)
>>>>>>>>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:147)
>>>>>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
>>>>>>>>     at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
>>>>>>>>
>>>>>>>> and the second one is
>>>>>>>>
>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>>> Caused by: java.lang.IllegalArgumentException: attempt to access a
>>>>>>>> deleted document
>>>>>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:657)
>>>>>>>>     at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
>>>>>>>> This is on the master index.
>>>>>>>>
>>>>>>>> Any help is appreciated
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://www.nabble.com/Caused-by%3A-java.io.IOException%3A-read-past-EOF-on-Slave-tp19682684p19682684.html
>>>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>>>
>>>>>> --
>>>>>> Marcelo F. Ochoa
>>>>>> http://marceloochoa.blogspot.com/
>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>> ______________
>>>>>> Do you Know DBPrism? Look @ DB Prism's Web Site
>>>>>> http://www.dbprism.com.ar/index.html
>>>>>> More info?
>>>>>> Chapter 17 of the book "Programming the Oracle Database using Java &
>>>>>> Web Services"
>>>>>> http://www.amazon.com/gp/product/1555583296/
>>>>>> Chapter 21 of the book "Professional XML Databases" - Wrox Press
>>>>>> http://www.amazon.com/gp/product/1861003587/
>>>>>> Chapter 8 of the book "Oracle & Open Source" - O'Reilly
>>>>>> http://www.oreilly.com/catalog/oracleopen/
>>>>
>>>> --
>>>> Marcelo F. Ochoa
>>>> http://marceloochoa.blogspot.com/
>>>> http://marcelo.ochoa.googlepages.com/home
>>>> ______________
>>>> Want to integrate Lucene and Oracle?
>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>> Is Oracle 11g REST ready?
>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html

--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
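
Mike's reading of the first trace was that the *.tis file looked truncated. The thread does not establish the actual root cause, so the following is only a sketch of how a buffered reader can end up reporting "read past EOF" when the store holds fewer bytes than the reader goes on to request. It does not depend on Lucene: BufferedSource and ArraySource are hypothetical names, the refill logic is a simplified assumption, and a byte array stands in for the stored file.

```java
import java.io.IOException;

/** Hypothetical buffered reader: refills its buffer from a backing store. */
abstract class BufferedSource {
    private final byte[] buffer;
    private long bufferStart = 0;    // file position of buffer[0]
    private int bufferLength = 0;    // number of valid bytes in the buffer
    private int bufferPosition = 0;  // next byte to hand out

    BufferedSource(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.buffer = new byte[bufferSize];
    }

    /** Storage-specific hook: copy len bytes starting at filePos into b. */
    protected abstract void readInternal(byte[] b, int offset, int len, long filePos)
            throws IOException;

    /** Number of bytes the store actually holds. */
    abstract long length();

    byte readByte() throws IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    private void refill() throws IOException {
        long start = bufferStart + bufferPosition;
        long end = Math.min(start + buffer.length, length());
        int newLength = (int) (end - start);
        if (newLength <= 0) {
            // The caller asked for data beyond what the store holds.
            throw new IOException("read past EOF");
        }
        readInternal(buffer, 0, newLength, start);
        bufferStart = start;
        bufferLength = newLength;
        bufferPosition = 0;
    }
}

/** Hypothetical store: a byte array standing in for a stored file. */
class ArraySource extends BufferedSource {
    private final byte[] stored;

    ArraySource(byte[] stored, int bufferSize) {
        super(bufferSize);
        this.stored = stored;
    }

    @Override
    long length() {
        return stored.length;
    }

    @Override
    protected void readInternal(byte[] b, int offset, int len, long filePos) {
        System.arraycopy(stored, (int) filePos, b, offset, len);
    }
}

class TruncatedReadDemo {
    public static void main(String[] args) {
        // Suppose the writer meant to store 8 bytes but only 5 reached the store.
        ArraySource source = new ArraySource(new byte[] {10, 20, 30, 40, 50}, 4);
        try {
            for (int i = 0; i < 8; i++) {
                System.out.println("byte " + i + " = " + source.readByte());
            }
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this sketch the reader itself is behaving correctly; the failure only shows that the stored bytes and the reader's expectations disagree, which is why comparing stored file sizes against what the writer flushed is a reasonable first check.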