How are you measuring? There is a bunch of setup work for the first few queries that go through the system. In either case (RAM or FS), you should fire a few representative warmup queries at the search engine before you go ahead and measure the response time.
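Something along these lines is what I mean by warming up before you measure. Treat it as a rough sketch, not a real benchmark harness: SearchTiming, timeAfterWarmup and the IntSupplier standing in for the search call are names I made up for illustration, not Lucene API. Plug your own searcher call in where fakeSearch() is.

```java
import java.util.function.IntSupplier;

final class SearchTiming {

    /**
     * Runs the search a few times untimed (warmup), then times further runs
     * and returns the mean wall-clock time per timed run in milliseconds.
     * The IntSupplier is a hypothetical stand-in for "run one query and
     * return the hit count".
     */
    static double timeAfterWarmup(IntSupplier search, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Warmup: results and timings are deliberately thrown away.
        for (int i = 0; i < warmupRuns; i++) {
            search.getAsInt();
        }
        // Timed runs: only these count toward the reported number.
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            search.getAsInt();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedRuns;
    }

    /** Placeholder workload; replace with a call into your own searcher. */
    static int fakeSearch() {
        int matches = 0;
        for (int i = 0; i < 100_000; i++) {
            if (i % 7 == 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        double meanMs = timeAfterWarmup(SearchTiming::fakeSearch, 10, 50);
        System.out.println("mean ms per query after warmup: " + meanMs);
    }
}
```

Run it once against the FS version and once against the RAM version with the same queries, and compare the numbers only after the warmup runs.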
You also *must* isolate your search time from your response assembly time. That is, if you have something like

    Hits hits = search()
    for (each element of hits) {
        do something with the hit
    }

you MUST measure the time for the search() call exclusive of the for loop before you know where to concentrate your efforts. In this example, if you get more than 100 hits, your query is actually re-executed every 100 times through the above loop. There are other gotchas if you process your query results in other ways, so be sure you know exactly what is taking the time before worrying about the proper way to speed things up.

I strongly suspect that the RAMDir is a complete red herring. A 17MB index will almost certainly be cached by the system after a bit of use.

There's a whole section up on the Lucene website that talks about various ways to speed up processing.

Measure, *then* optimize <G>

Best
Erick

On Wed, Aug 13, 2008 at 7:42 AM, Darren Govoni <[EMAIL PROTECTED]> wrote:

> Hoss,
> Thank you for the detailed response. What I found weird was that it
> seemed to take 0.09 seconds to create a RAMDirectory off a 17MB index.
> Suspiciously fast, but ok.
>
> Yet, when I do a simple fuzzy search on a single field
>
> "word: someword~0.76"
>
> it was taking 0.35 seconds. That's a very, very long time, all things
> considered. I understand about the OS paging and such, but in
> doing some variations of this to "throw the OS off", I still saw
> no difference between on-disk and RAM times. But despite that, the
> times are really slow.
>
> Any ideas?
>
> thanks again,
> Darren
>
> On Tue, 2008-08-12 at 19:55 -0700, Chris Hostetter wrote:
> > : On one index, I am seeing no speed change when flipping between
> > : RAMDirectory IndexSearcher and file system version.
> >
> > That is probably because even if you just use an FSDirectory, your OS will
> > cache the disk "pages" in RAM for you -- all using a RAMDirectory does for
> > you is guarantee that the entire index is copied into the heap you allocate
> > for your JVM. If you've got 16GB of RAM, and a 5GB index, and you
> > allocated 12GB of RAM to the JVM and read your index into a RAMDirectory,
> > your index will always be in RAM, no matter what other processes do on
> > your machine.
> >
> > If instead you only allocate 6GB of RAM to the JVM, and nothing else is
> > using up the rest of your RAM, the OS has plenty to load the whole index
> > into RAM as part of the filesystem cache once you use it -- but if another
> > process comes along and really needs that RAM (or if something reads a lot
> > of other pages of disk) your index might get bumped from the filesystem
> > cache, and the next few reads could be slow.
> >
> > : Creating the RAMDirectory from the on-disk index only takes 0.09
> > : seconds. It appears it is not loading the data into memory, but maybe
> > : just the file names of the index?
> >
> > Passing an FSDirectory to the constructor of a RAMDirectory uses the
> > Directory.copy() method, whose source is fairly straightforward and easy
> > to read -- unless your index is ginormous it's not surprising that it's
> > "fast", particularly if it's already in the filesystem cache.
> >
> > -Hoss