Erick Erickson wrote:
Don't do it that way. You're opening and closing your IndexWriter for each
document, which is extremely wasteful. And given locking has been a source
of much discussion on this list, it's not clear that locking will withstand
this kind of hammering. You want to do something like

Agreed, this is very inefficient.  You will get much better
performance by creating the writer once, doing all your adds, then
closing it.

But, locking should be fine even for this "hammering" use case (and if
it's not, that's a bug, and I'd really like to know about it!).  So it
would still be good to get to the root cause if this comes up again.

Also I see you're ignoring all Exceptions coming out of the
writer.addDocument(...) and writer.close().  At least you should print
the exception & save it somewhere.  Your locking problem could very
well be an exception that occurred in addDocument and thus the writer
failed to be closed.

Also it's best to do this:

  writer = new IndexWriter(...);

  try {
    ...do lots of stuff with writer...
  } finally {
    writer.close();
  }
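
To make the two points above concrete (open the writer once, and never
swallow exceptions), here is a minimal, self-contained sketch. FakeWriter
and BatchIndexer are hypothetical stand-ins invented for illustration, not
Lucene classes, so the snippet runs without Lucene on the classpath; with
the real IndexWriter the shape of the loop would be the same.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer (NOT Lucene's IndexWriter).
// It counts opens and closes so the open-once pattern can be checked.
class FakeWriter {
    static int openCount = 0;
    static int closeCount = 0;
    final List<String> docs = new ArrayList<String>();

    FakeWriter() { openCount++; }

    // Rejects empty documents to simulate a failure while adding.
    void addDocument(String doc) throws Exception {
        if (doc.isEmpty()) {
            throw new Exception("empty document");
        }
        docs.add(doc);
    }

    void close() { closeCount++; }
}

class BatchIndexer {
    // Opens one writer, adds every document, and always closes the writer.
    // Returns the messages of any per-document failures instead of hiding them.
    static List<String> indexAll(List<String> inputs) {
        List<String> failures = new ArrayList<String>();
        FakeWriter writer = new FakeWriter();
        try {
            for (String doc : inputs) {
                try {
                    writer.addDocument(doc);
                } catch (Exception e) {
                    // Report and record the failure; do not swallow it.
                    System.err.println("addDocument failed: " + e);
                    failures.add(e.getMessage());
                }
            }
        } finally {
            // Runs even if something unexpected is thrown above.
            writer.close();
        }
        return failures;
    }
}
```

Calling BatchIndexer.indexAll(...) on a list that contains one empty string
opens and closes the writer exactly once and returns one recorded failure,
rather than leaving the writer open.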

Mike

IndexWriter writer = new IndexWriter(...);

for (each document) {
    writer.addDocument(document);
}

writer.close();


I'd guess this will be much faster as well.

And one style note: You'll save yourself endless grief if you avoid starting
an operation in one place (especially in a called method) and cleaning up
after it someplace else. In your code snippet, opening the IndexWriter in
your DocumentIndexer then having to remember to close it in main is a recipe
for disaster. Trust me on this one, I've spent waaaaay more time than I'd
like to admit debugging this kind of problem <G>.

Best
Erick


On 1/25/07, maureen tanuwidjaja <[EMAIL PROTECTED]> wrote:


  Hi Mike, thanks for the reply...


  1. Here is the class that I use for indexing:

  package edu.ntu.ce.maureen.index;

  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import edu.ntu.ce.maureen.search.MySimilarity;

  import java.lang.String;


  public class DocumentIndexer
  {
      public IndexWriter writer;
      public Analyzer analyzer;
      public Document doc;
      public String indexDir;

      public Document getDoc(){
          return doc;
      }
      public DocumentIndexer(int index,String f)
      {
      indexDir = "C:/sweetpea/dual_index/DI";
      analyzer = new StandardAnalyzer();

      try{
      // Pass true (create a new index) only when index == 0, i.e. for the
      // first document indexed; pass false for later documents so they are
      // appended to the existing index.
      writer = new IndexWriter(indexDir, analyzer, index == 0);

      writer.setSimilarity(new MySimilarity());
          doc = new Document();
          doc.add(new Field("filename", f, Field.Store.YES,
                  Field.Index.TOKENIZED, Field.TermVector.YES));
      }catch (Exception e) {
          System.out.println(e.toString());
      }

      }


      public void setIndex(XMLData current,String f){
      try{
      doc.add(new Field(current.tag, current.value, Field.Store.YES,
              Field.Index.TOKENIZED, Field.TermVector.YES));
      }catch(Exception e){}
      }

      public Document closeIndex(){
          try{
          writer.addDocument(doc);
          writer.close();
          //System.out.println("index closed..");
          }
          catch(Exception e){

          }
          return doc;
      }
  }

  2. The class above (DocumentIndexer.java) is called from main; here is
the relevant part of the code:


          File[] listfiles = dir.listFiles();
          for (int i = 0;i< listfiles.length ;i++)
          {
              if (listfiles[i].isDirectory())
              {
                  FileTraverse(listfiles[i], myMap2,counter);
                  continue;
              }

              if (!listfiles[i].toString().endsWith(".xml")) continue;

              String filename = listfiles[i].getAbsolutePath();
              // counter is initially 0, so the first file indexed creates a
              // new index (the IndexWriter create flag is true, and false
              // otherwise); counter is incremented later.
              DocumentIndexer luceneIndex =
                      new DocumentIndexer(counter, filename);

              System.out.println("Indexing "
                      + luceneIndex.getDoc().get("filename"));
              // This method basically calls setIndex(XMLData current, String f)
              // of the DocumentIndexer class.
              indexing(luceneIndex, filename, myMap2);

              AssignTagNormFactor("article/", 1.0, myMap2, luceneIndex.getDoc());
              // A file has been successfully indexed; increment the counter.
              counter++;
              // ==> I close the index already, so it should have released the lock....
              luceneIndex.closeIndex();
          }


  3. I have deleted the directory where the index files were and am trying
to index from the beginning... I don't know whether it will raise the same
"Lock obtain timed out" problem 7 hrs later.

  4. I use the latest version of Lucene (nightly build).

  Thanks and Regards,
  Maureen

Michael McCandless <[EMAIL PROTECTED]> wrote:

maureen tanuwidjaja wrote:

>   I am indexing thousands of XML documents, then it stops after indexing
>   for about 7 hrs
>
>   ...
>   Indexing C:\sweetpea\wikipedia_xmlfiles\part-0\37027.xml
>   java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]:\sweetpea\dual_index\DI\write.lock
>   java.lang.NullPointerException
>
>   Can anyone suggest how to overcome this lock time out error?

Which version of Lucene are you using?

(Looks like it may be a recent trunk nightly build since the
write.lock is stored in the index directory?).

This often happens if the JVM exited un-gracefully previously and left
the "write.lock" in the filesystem.  If that's what's hitting you then
you can just remove that file and restart your indexing process.  You
also might want to try switching to the NativeFSLockFactory (if you
are using trunk) locking implementation.  Because it uses native (OS
level) locking, the locks are always freed by the OS when the JVM
exits.
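
As a rough illustration of the stale-lock scenario described above, the
following self-contained sketch shows how a lock that is nothing more than a
marker file blocks later writers until someone deletes it. MarkerFileLock
and StaleLockDemo are hypothetical names made up for this example, and the
code does not use any Lucene locking classes; it only assumes the lock is
represented by the existence of a file named write.lock.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical marker-file lock: the lock is held while the file exists.
class MarkerFileLock {
    private final File marker;

    MarkerFileLock(File marker) { this.marker = marker; }

    // Returns true only if this call created the marker file.
    boolean obtain() throws IOException {
        return marker.createNewFile();
    }

    void release() {
        marker.delete();
    }
}

class StaleLockDemo {
    // Returns {firstObtained, secondBlocked, obtainedAfterCleanup}.
    static boolean[] run() {
        try {
            File dir = Files.createTempDirectory("lockdemo").toFile();
            File marker = new File(dir, "write.lock");

            // First "writer" takes the lock, then "crashes": release() is never called.
            boolean firstObtained = new MarkerFileLock(marker).obtain();

            // A later "writer" cannot get the lock because the file is still there.
            MarkerFileLock second = new MarkerFileLock(marker);
            boolean secondBlocked = !second.obtain();

            // Manual cleanup, i.e. removing the leftover write.lock file.
            marker.delete();
            boolean obtainedAfterCleanup = second.obtain();

            second.release();
            dir.delete();
            return new boolean[] { firstObtained, secondBlocked, obtainedAfterCleanup };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A single process cannot demonstrate the other half of the argument, namely
that an OS-level lock disappears when the JVM exits, so that part is left
out of the sketch.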

Also make sure you're not accidentally trying to create two writers
on the same index.  There can be only one.

If that's not what's happening here, can you provide more details
about how you are using Lucene?  Is there only one IndexWriter, that you
periodically close / reopen?  The IndexWriter only acquires this lock
when it's first instantiated.

Also: where is the java.lang.NullPointerException coming from?  Can
you provide a traceback?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



