Hello again,

The files are local. Sorry for using the confusing /mounts.... path; I can see why that is confusing.
What exactly is a RAMDirectory? I didn't see it mentioned on that page. Is there example code for using it? Do I just create a RAMDirectory and then use it like a normal directory?

--JP

On 8/13/07, Kai Hu <[EMAIL PROTECTED]> wrote:
>
> Hi, John
> I think you are spending too much time on I/O; it will be better if you
> use a RAMDirectory first. See
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> kai
>
> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 13, 2007 1:57
> To: java-user@lucene.apache.org
> Subject: Re: Indexing correctly?
>
> Where are your source files and index? If they're somewhere
> out there on the network, you may be having some slowdown
> because of network latency (the part about "/mount/....." leads
> me to ask this one).
>
> If this is the case, you might get an improvement if all the files are
> local...
>
> Best
> Erick
>
> On 8/11/07, John Paul Sondag <[EMAIL PROTECTED]> wrote:
> >
> > It takes roughly 6 hours for me to index a Gig of data. The benchmarks
> > take quite a bit less if I'm reading it correctly. I'll try out the
> > StringBuffer/Builder and let you know. Thanks for the quick response,
> > and if you have any more suggestions please let me know.
> >
> > --JP
> >
> > On 8/11/07, karl wettin <[EMAIL PROTECTED]> wrote:
> > >
> > > How much slower than anticipated is it?
> > >
> > > I would start by using a StringBuffer/Builder rather than appending
> > > (immutable) strings to each other.
> > >
> > >
> > > On 11 Aug 2007, at 19:05, John Paul Sondag wrote:
> > >
> > > > Hi,
> > > >
> > > > I was hoping that maybe you guys could see if I'm somehow indexing
> > > > inefficiently. I'm putting relevant parts of my code below. I've
> > > > looked at the "benchmarks" page on Lucene and my indexing is taking
> > > > substantially more time than what I see posted.
> > > > I'm not sure when I should call flush() (I saw that I should be
> > > > doing that on the ImproveIndexingSpeed page). I'd really appreciate
> > > > any advice.
> > > >
> > > > Here's my code:
> > > >
> > > > File directory = new File("/mounts/falcon5/disks/0/tcheng3/Dataset");
> > > > File[] theFiles = directory.listFiles();
> > > >
> > > > //go through each file inside the directory and index it
> > > > for(int curFile = 0; curFile < theFiles.length; curFile++)
> > > > {
> > > >     File fin=theFiles[curFile];
> > > >
> > > >     //open up the file
> > > >     FileInputStream inf = new FileInputStream(fin);
> > > >     InputStreamReader isr = new InputStreamReader(inf, "US-ASCII");
> > > >     BufferedReader in = new BufferedReader(isr);
> > > >     String text="";
> > > >     String docid="";
> > > >
> > > >     while (true) {
> > > >
> > > >         //read in the file one line at a time, and act accordingly
> > > >         String line = in.readLine();
> > > >         if (line == null) { break;}
> > > >
> > > >         if (line.startsWith("<DOC>") ) {
> > > >             //get docID
> > > >             line = in.readLine();
> > > >             String tempStr = line.substring(8,line.length());
> > > >             int pos = tempStr.indexOf(' ');
> > > >             docid = tempStr.substring(0,pos);
> > > >         }else if (line.startsWith("</DOC>")) {
> > > >
> > > >             Document doc = new Document();
> > > >
> > > >             doc.add(new Field("contents",text, Field.Store.NO,
> > > >                     Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS ));
> > > >             doc.add(new Field("DocID",docid, Field.Store.YES,
> > > >                     Field.Index.NO));
> > > >             writer.addDocument(doc);
> > > >             text="";
> > > >         } else {
> > > >             text = text + "\n" + line;
> > > >         }
> > > >     }
> > > >
> > > > }
> > > >
> > > >
> > > > int numIndexed = writer.docCount();
> > > >
> > > > writer.optimize();
> > > > writer.close();
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > --JP
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
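
For anyone following the thread, Karl's StringBuffer/Builder suggestion might look roughly like the sketch below. It mirrors the read loop from the quoted code but accumulates each document body in a StringBuilder instead of concatenating immutable Strings, and it closes the reader when done. The names DocSplitter, ParsedDoc and parse are hypothetical, the "<DOCNO> id </DOCNO>" sample line is only a guess at what the 8-character offset in the original code skips, and the Lucene calls are left out so the sketch runs with the JDK alone. Treat it as an illustration under those assumptions, not as the fix for the 6-hour indexing time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits <DOC>...</DOC> records the way the quoted loop does,
// but builds each body with a StringBuilder.
final class DocSplitter {

    // One parsed record: the ID taken from the line after <DOC>, plus the body text.
    static final class ParsedDoc {
        final String docId;
        final String text;

        ParsedDoc(String docId, String text) {
            this.docId = docId;
            this.text = text;
        }
    }

    static List<ParsedDoc> parse(Reader source) {
        List<ParsedDoc> docs = new ArrayList<>();
        StringBuilder text = new StringBuilder(); // reused for every document
        String docId = "";

        // try-with-resources closes the reader, which the original loop never did
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("<DOC>")) {
                    // The ID is on the next line: skip an 8-character prefix and
                    // read up to the next space, as in the original code.
                    line = in.readLine();
                    if (line == null || line.length() < 8) {
                        docId = "";
                        continue;
                    }
                    String rest = line.substring(8);
                    int pos = rest.indexOf(' ');
                    docId = pos >= 0 ? rest.substring(0, pos) : rest;
                } else if (line.startsWith("</DOC>")) {
                    // This is where the Lucene Document would be built and added.
                    docs.add(new ParsedDoc(docId, text.toString()));
                    text.setLength(0); // reset the buffer without reallocating
                } else {
                    // Amortized constant-time append instead of copying the whole String
                    text.append('\n').append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Assumed sample input; the real data set format is not shown in the thread.
        String sample = "<DOC>\n<DOCNO> A-1 </DOCNO>\nfirst line\nsecond line\n</DOC>\n";
        for (ParsedDoc doc : parse(new StringReader(sample))) {
            System.out.println(doc.docId + " has " + doc.text.length() + " chars of text");
        }
    }
}
```

With "text = text + line", every appended line copies everything read so far, so the cost grows roughly quadratically with document size; the StringBuilder version only copies once, in toString(), when the document ends.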