I understand why buffered indexing seems running faster. It seems that initialization operation takes obvious time and impact the indexing performance. I found ram indexing is faster if I run buffered indexing prior to ram indexing. So I think the method "addDocuments" will take more time at first running than second time. This is why the buffered indexing runs faster when ram indexing prior to buffered .
> -----Original Message----- > From: Flik Shen [mailto:[EMAIL PROTECTED] > Sent: Monday, June 12, 2006 11:29 AM > To: java-user@lucene.apache.org > Subject: RE: An interesting thing > > 1. I use buffered indexing and ram indexing to index same 3000 > documents. So I think they have same total sizes. > 2. I store them in ram directory firstly no matter ram or buffered. I > think it should have same performances. Then I take further step to hold > index into file system directory. > So why do we consume less time when we take more actions based on same > previous action? > > -----Original Message----- > From: yueyu lin [mailto:[EMAIL PROTECTED] > Sent: Monday, June 12, 2006 10:38 AM > To: java-user@lucene.apache.org > Subject: Re: An interesting thing > > 1. Buffered index is using ram. They are small and samll enough to be > easy > for OS to allocate several(or only one) pages to store them. > 2. RAMDirectory will have to apply huge blocks of ram from OS. Sometimes > OS > cannot allocate so many ram efficiently. So some of pages are moved to > disk > and a ram-disk mapping is constructed. When using those kinds of pages, > OS > has to first remove some using pages in memory and then load some pages > from > disk into ram. That's a great cost. > > For any program, applying huge memory is always slow than applying small > memory. > > On 6/12/06, Flik Shen <[EMAIL PROTECTED]> wrote: > > > > One thing could not be explained clearly. > > That is why "RAM" ALWAYS take more time than buffered indexing. > > On other hand the buffered indexing is to use "RAM" as a buffer. > > Is there some difference between these two "RAM"? > > To use "RAM" as a buffer I take additional step to convert buffered > > index to FS index eventually. How could it save time? > > > > Following is the class declaration: > > ================= Class FSversusRAMDirectoryTest Begin > ================= > > package study.lucene.chapter2; > > > > import java.io.IOException; > > import java.util.ArrayList; > > import java.util.Collection; > > import java.util.Iterator; > > > > import junit.framework.TestCase; > > > > import org.apache.lucene.analysis.SimpleAnalyzer; > > import org.apache.lucene.document.Document; > > import org.apache.lucene.document.Field; > > import org.apache.lucene.index.IndexWriter; > > import org.apache.lucene.store.Directory; > > import org.apache.lucene.store.FSDirectory; > > import org.apache.lucene.store.RAMDirectory; > > > > public class FSversusRAMDirectoryTest extends TestCase { > > > > private Directory fsDir; > > > > private Directory ramDir; > > > > private Directory fsBufDir; > > > > private Collection docs = loadDocuments(3000, 5); > > > > protected void setUp() throws Exception { > > > > String fsIndexDir = System.getProperty("java.io.tmpdir", > "tmp") > > + System.getProperty("file.separator") > > + "fs-index"; > > String fsBufIndexDir = System.getProperty("java.io.tmpdir", > > "tmp") > > + > System.getProperty("file.separator") > > + "fs-buffered-index"; > > // Create Directory whose content is held in RAM > > ramDir = new RAMDirectory(); > > > > // Create Directory whose content is stored on disk > > fsDir = FSDirectory.getDirectory(fsIndexDir, true); > > > > // Create Directory whose content is stored on disk > > fsBufDir = FSDirectory.getDirectory(fsBufIndexDir, true); > > } > > > > public void testTiming() throws IOException { > > > > // RAMDirectory is faster than FSDirectory > > long ramTiming = timeIndexWriter(ramDir); > > long fsTiming = timeIndexWriter(fsDir); > > assertTrue(fsTiming > ramTiming); > > > > long bfsTiming = timeBufIndexWriter(fsBufDir); > > > > System.out.println("RAMDirectory Time: " + (ramTiming) + " > ms"); > > System.out.println("FSDirectory Time : " + (fsTiming) + " > ms"); > > System.out.println("BufferedDirectory Time : " + (bfsTiming) + > " > > ms"); > > } > > > > private long timeBufIndexWriter(Directory dir) throws IOException > { > > > > long start = System.currentTimeMillis(); > > addBufDocuments(dir); > > > > long stop = System.currentTimeMillis(); > > return (stop - start); > > } > > > > private void addBufDocuments(Directory dir) throws IOException { > > > > Directory[] tempDirs = {new RAMDirectory()}; > > addDocuments(tempDirs[0]); > > > > IndexWriter writer = new IndexWriter(dir, new > SimpleAnalyzer(), > > true); > > writer.addIndexes(tempDirs); > > > > writer.optimize(); > > writer.close(); > > } > > > > private long timeIndexWriter(Directory dir) throws IOException { > > long start = System.currentTimeMillis(); > > addDocuments(dir); > > long stop = System.currentTimeMillis(); > > return (stop - start); > > } > > > > private void addDocuments(Directory dir) throws IOException { > > IndexWriter writer = new IndexWriter(dir, new > SimpleAnalyzer(), > > true); > > > > /* > > // change to adjust performance of indexing with FSDirectory > > // Parameters that affect performance of FSDirectory > > writer.setMergeFactor(writer.getMergeFactor()); > > writer.setMaxMergeDocs(writer.getMaxMergeDocs()); > > writer.setMaxBufferedDocs(writer.getMaxBufferedDocs()); > > */ > > > > for (Iterator iter = docs.iterator(); iter.hasNext();) { > > > > Document doc = new Document(); > > String word = (String) iter.next(); > > doc.add(new Field("keyword", word, Field.Store.YES, > > Field.Index.UN_TOKENIZED)); > > doc.add(new Field("unindexed", word, Field.Store.YES, > > Field.Index.NO)); > > doc.add(new Field("unstored", word, Field.Store.NO, > > Field.Index.NO_NORMS)); > > doc.add(new Field("text", word, Field.Store.YES, > > Field.Index.TOKENIZED)); > > writer.addDocument(doc); > > } > > writer.optimize(); > > writer.close(); > > } > > > > private Collection loadDocuments(int numDocs, int wordsPerDoc) { > > Collection docs = new ArrayList(numDocs); > > for (int i = 0; i < numDocs; i++) { > > StringBuffer doc = new StringBuffer(wordsPerDoc); > > for (int j = 0; j < wordsPerDoc; j++) { > > doc.append("Bibamus "); > > } > > docs.add(doc.toString()); > > } > > return docs; > > } > > } > > ================= Class FSversusRAMDirectoryTest End > ================= > > -----Original Message----- > > From: yueyu lin [mailto:[EMAIL PROTECTED] > > Sent: Sunday, June 11, 2006 7:31 PM > > To: java-user@lucene.apache.org > > Subject: Re: An interesting thing > > > > In some OS, the ram is not only "RAM". The virtual ram uses the disk. > > That's > > very slow. > > In some windows platform, you will find half of some application's ram > > is > > virtual ram. > > That's some why windows is slow in some fields. > > > > On 6/11/06, Flik Shen <[EMAIL PROTECTED]> wrote: > > > > > > Hi, > > > > > > > > > > > > I am freshman to Lucene and I am reading the book "Lucene In > Action". > > > > > > Just as that we know, there are two kinds of directory to hold > index, > > one > > > is File System and the other is RAM. > > > > > > There is a sample to compare performances of these two kind > > directories > > > and there is also a piece of code about "Batch indexing by using > > > RAMDirectory as a buffer". > > > > > > When I follow some samples, I found an interesting thing about > > indexing > > > performance. > > > > > > > > > > > > I combine these two pieces of codes and time each kind directory > > indexing. > > > (Please refer the attachment for details processes) > > > > > > I load 3000 docs and 5 words per doc. I use File System Directory > and > > RAM > > > Directory to indexing these docs directly. The time of these two are > > 10737ms > > > and 1575ms. > > > > > > Then I use a RAM directory as a buffer for indexing and use method > > > "addIndexes" of a new Index writer which finally holds index in a > File > > > System directory. > > > > > > The time it consumed is 1348ms. > > > > > > How could this be? > > > > > > I think the time that buffered indexing consumes should base on the > > time > > > of RAM indexing. > > > > > > I wonder why a buffered indexing even has a good performance than a > > ram > > > indexing. > > > > > > So interesting! > > > > > > > > > > > > Best regards, > > > > > > Flik Shen > > > **************** CAUTION - Disclaimer ***************** > > > This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION > intended > > > solely for the use of the addressee(s). If you are not the intended > > > recipient, please notify the sender by e-mail and delete the > original > > > message. Further, you are not to copy, disclose, or distribute this > > e-mail > > > or its contents to any other person and any such actions are > unlawful. > > This > > > e-mail may contain viruses. Infosys has taken every reasonable > > precaution to > > > minimize this risk, but is not liable for any damage you may sustain > > as a > > > result of any virus in this e-mail. You should carry out your own > > virus > > > checks before opening the e-mail or attachment. Infosys reserves the > > right > > > to monitor and review the content of all messages sent to or from > this > > > e-mail address. Messages sent to or from this e-mail address may be > > stored > > > on the Infosys e-mail system. > > > ***INFOSYS******** End of Disclaimer ********INFOSYS*** > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > -- > > -- > > Yueyu Lin > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > -- > -- > Yueyu Lin > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]