Hi Erick and Mike
  
  Thanks a lot for the advice... =)
  
  I will fix my code. I'll let you guys know if any problems arise.
  
  
  Many thanks and best regards ^ ^
  
  Maureen
  
  

Michael McCandless <[EMAIL PROTECTED]> wrote:

Erick Erickson wrote:
> Don't do it that way. You're opening and closing your IndexWriter for each
> document, which is extremely wasteful. And given that locking has been a source
> of much discussion on this list, it's not clear that locking will withstand
> this kind of hammering. You want to do something like

Agreed, this is very inefficient.  You will get much better
performance by creating the writer once, doing all your adds, then
closing it.

But, locking should be fine even for this "hammering" use case (and if
it's not, that's a bug, and I'd really like to know about it!).  So it
would still be good to get to the root cause if this comes up again.

Also I see you're ignoring all exceptions coming out of
writer.addDocument(...) and writer.close().  At least you should print
the exception and save it somewhere.  Your locking problem could very
well be caused by an exception that occurred in addDocument, which
would mean the writer was never closed.

Also it's best to do this:

   writer = new IndexWriter(...);

   try {
     ...do lots of stuff with writer...
   } finally {
     writer.close();
   }
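
Putting those pieces together, a rough (untested) sketch of the whole
indexing loop might look like this; the paths are taken from your
snippet, and the BatchIndexer class name is just a placeholder:

   import java.io.File;

   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.index.IndexWriter;

   public class BatchIndexer {
     public static void main(String[] args) throws Exception {
       // Open the writer once; "true" creates a new index, replacing any old one.
       IndexWriter writer = new IndexWriter("C:/sweetpea/dual_index/DI",
                                            new StandardAnalyzer(), true);
       try {
         File[] files = new File("C:/sweetpea/wikipedia_xmlfiles/part-0").listFiles();
         if (files == null) return;
         for (int i = 0; i < files.length; i++) {
           if (!files[i].getName().endsWith(".xml")) continue;
           Document doc = new Document();
           doc.add(new Field("filename", files[i].getAbsolutePath(),
                             Field.Store.YES, Field.Index.TOKENIZED,
                             Field.TermVector.YES));
           // ... add the per-tag fields here, as setIndex() does ...
           writer.addDocument(doc);
         }
       } finally {
         // Always runs, so the write lock is released even if something
         // throws; letting exceptions propagate out of main() also prints
         // them instead of silently swallowing them.
         writer.close();
       }
     }
   }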

Mike

> IndexWriter writer = new IndexWriter();
> 
> for each document {
>    writer.add(document);
> }
> 
> close writer.
> 
> 
> I'd guess this will be much faster as well.
> 
> And one style note: You'll save yourself endless grief if you avoid
> starting an operation in one place (especially in a called method) and
> cleaning up after it someplace else. In your code snippet, opening the
> IndexWriter in your DocumentIndexer then having to remember to close it
> in main is a recipe for disaster. Trust me on this one, I've spent
> waaaaay more time than I'd like to admit debugging this kind of problem.
> 
> Best
> Erick
> 
> 
> On 1/25/07, maureen tanuwidjaja  wrote:
>>
>>
>>   Hi Mike, thanks for the reply...
>>
>>
>>   1. Here is the class that I use for indexing:
>>
>>   package edu.ntu.ce.maureen.index;
>>
>>   import org.apache.lucene.index.IndexWriter;
>>   import org.apache.lucene.analysis.Analyzer;
>>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>   import org.apache.lucene.document.Document;
>>   import org.apache.lucene.document.Field;
>>   import edu.ntu.ce.maureen.search.MySimilarity;
>>
>>   import java.lang.String;
>>
>>
>>   public class DocumentIndexer
>>   {
>>       public IndexWriter writer;
>>       public Analyzer analyzer;
>>       public Document doc;
>>       public String indexDir;
>>
>>       public Document getDoc(){
>>           return doc;
>>       }
>>       public DocumentIndexer(int index,String f)
>>       {
>>       indexDir = "C:/sweetpea/dual_index/DI";
>>       analyzer = new StandardAnalyzer();
>>
>>       try{
>>       writer = new IndexWriter(indexDir, analyzer, index == 0);
>>       // true if index == 0, i.e. the first document being indexed, which
>>       // means it creates a new Lucene index file; false for the following
>>       // documents (append to the existing index file)
>>
>>       writer.setSimilarity(new MySimilarity());
>>           doc = new Document();
>>           doc.add(new Field("filename", f, Field.Store.YES,
>>               Field.Index.TOKENIZED, Field.TermVector.YES));
>>       }catch (Exception e) {
>>           System.out.println(e.toString());
>>       }
>>
>>       }
>>
>>
>>       public void setIndex(XMLData current,String f){
>>       try{
>>       doc.add(new Field(current.tag, current.value, Field.Store.YES,
>>           Field.Index.TOKENIZED, Field.TermVector.YES));
>>       }catch(Exception e){}
>>       }
>>
>>       public Document closeIndex(){
>>           try{
>>           writer.addDocument(doc);
>>           writer.close();
>>           //System.out.println("index closed..");
>>           }
>>           catch(Exception e){
>>
>>           }
>>           return doc;
>>       }
>>   }
>>
>>   2. The class above (DocumentIndexer.java) is called by main; here is
>> the relevant part of the code:
>>
>>
>>           File[] listfiles = dir.listFiles();
>>           for (int i = 0;i< listfiles.length ;i++)
>>           {
>>               if (listfiles[i].isDirectory())
>>               {
>>                   FileTraverse(listfiles[i], myMap2,counter);
>>                   continue;
>>               }
>>
>>               if (!listfiles[i].toString().endsWith(".xml")) continue;
>>
>>               String filename = listfiles[i].getAbsolutePath();
>>               DocumentIndexer luceneIndex = new DocumentIndexer(counter, filename);
>>               // counter is initially set to 0, so for the first file indexed
>>               // it creates a new index (the IndexWriter create flag is true,
>>               // and false otherwise); counter is incremented later
>>
>>               System.out.println("Indexing " + luceneIndex.getDoc().get("filename"));
>>
>>               indexing(luceneIndex, filename, myMap2);
>>               // this method basically calls setIndex(XMLData current, String f)
>>               // of the DocumentIndexer class
>>
>>               AssignTagNormFactor("article/", 1.0, myMap2, luceneIndex.getDoc());
>>
>>               counter++; // a file has successfully been indexed, increment the counter
>>
>>               luceneIndex.closeIndex(); // ==> I close the index already,
>>                                         // so it should have released the lock...
>>           }
>>
>>
>>   3. I have deleted the directory where the index files exist and tried
>> to index from the beginning... I don't know whether 7 hrs later it will
>> raise the same "Lock obtain timed out" problem.
>>
>>   4. I use the latest version of Lucene (nightly build).
>>
>>   Thanks and Regards,
>>   Maureen
>>
>> Michael McCandless wrote:
>>
>> maureen tanuwidjaja wrote:
>>
>> >   I am indexing thousands of XML documents, then it stops after
>> > indexing for about 7 hrs
>> >
>> >   ...
>> >   Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37027.xml
>> >   java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]
>> :\sweetpea\dual_index\DI\write.lock
>> >   java.lang.NullPointerException
>> >
>> >   Can anyone suggest how to overcome this lock time out error?
>>
>> Which version of Lucene are you using?
>>
>> (Looks like it may be a recent trunk nightly build since the
>> write.lock is stored in the index directory?).
>>
>> This often happens if the JVM exited un-gracefully previously and left
>> the "write.lock" in the filesystem.  If that's what's hitting you then
>> you can just remove that file and restart your indexing process.  You
>> also might want to try switching to the NativeFSLockFactory locking
>> implementation (if you are using trunk).  Because it uses native
>> (OS-level) locking, the locks are always freed by the OS when the JVM
>> exits.
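>>
>> A minimal, untested sketch of what that could look like on trunk (the
>> exact FSDirectory/NativeFSLockFactory signatures may differ slightly
>> in your nightly build):
>>
>>    import java.io.File;
>>
>>    import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>    import org.apache.lucene.index.IndexWriter;
>>    import org.apache.lucene.store.FSDirectory;
>>    import org.apache.lucene.store.NativeFSLockFactory;
>>
>>    // Open the index directory with native (OS-level) locks, so a
>>    // crashed JVM cannot leave a stale write.lock behind.
>>    File path = new File("C:/sweetpea/dual_index/DI");
>>    FSDirectory dir =
>>        FSDirectory.getDirectory(path, new NativeFSLockFactory(path));
>>    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);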
>>
>> Also make sure you're not accidentally trying to create two writers
>> in the same index.  There can be only one.
>>
>> If that's not what's happening here, can you provide more details
>> about how you are using Lucene?  Is there only one IndexWriter, that you
>> periodically close / reopen?  The IndexWriter only acquires this lock
>> when it's first instantiated.
>>
>> Also: where is the java.lang.NullPointerException coming from?  Can
>> you provide a traceback?
>>
>> Mike
>>
>>
>>
>>
> 





 