Hi Jamie,

Are you running concurrent searches on the index, i.e. spawning multiple threads without managing them? I have been having similar issues, and I am planning to try a workaround using Java's Executor interface:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executor.html

This might help because it would control resource utilization. As far as I know, an executor pools threads so that concurrent requests are handled by a bounded set of workers, which should reduce overall search time. Let me know in case you come across something better.
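A minimal sketch of the Executor idea, assuming each search is an independent task. `runSearch` is a hypothetical placeholder for the real Lucene call, and the pool size is an arbitrary example value. A fixed pool bounds how many searches run at once and queues the rest; whether that lowers per-search latency depends on the workload and the disks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: route every search through a fixed-size pool instead of spawning an
// unmanaged thread per search. runSearch() is a HYPOTHETICAL stand-in for the
// real Lucene call; it is not part of any library.
class BoundedSearchPool {

    // Hypothetical placeholder: replace with the actual index search.
    static String runSearch(String query) {
        return "results for: " + query;
    }

    // Runs all queries on at most poolSize threads and returns the results
    // in the same order as the input queries.
    static List<String> searchAll(List<String> queries, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<String> task = () -> runSearch(q);
                futures.add(pool.submit(task)); // queued if all threads are busy
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until this particular search is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for searches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a search failed", e.getCause());
        } finally {
            pool.shutdown(); // worker threads exit once queued work is finished
        }
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("invoice", "contract", "meeting");
        for (String r : searchAll(queries, 2)) {
            System.out.println(r);
        }
    }
}
```

Because results are collected from the futures in submission order, `main` prints the three result lines in the same order as the queries, even though the searches themselves may finish in any order.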
--
Anshum

On Wed, 2008-02-27 at 07:37 +0530, h t wrote:
> Hi Michael,
> I guess the hotspot in Lucene is
> org.apache.lucene.search.IndexSearcher.search()
>
> Hi Jamie,
> What's the original text size of a million emails?
> I estimate that an email is around 100 KB; is this true?
> When you search, what kind of keywords do you enter: single words or
> short sentences?
> How many results are returned?
> Do you use a filter to shrink the result set?
>
> 2008/2/27, Michael Stoppelman <[EMAIL PROTECTED]>:
> >
> > So you're saying searches are taking 10 seconds on a 5 GB index? If so,
> > that seems ungodly slow.
> > If you're on *nix, have you watched your iostat statistics? Maybe
> > something is hammering your hard drives.
> > Something seems amiss.
> >
> > Which Lucene methods were pointed to as hotspots by YourKit?
> >
> > -M
> >
> > On Tue, Feb 26, 2008 at 2:13 PM, Jamie <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Michael
> > >
> > > Perhaps this will help. We are using Lucene to index emails and provide
> > > a search interface to search through those emails. Many of our
> > > customers have 3-5 TB or more of email data. The index size tends to
> > > be around 5 GB per million messages. On a 3 GHz Intel Core Duo with a
> > > standard 7200 RPM drive, it takes approx. 10 seconds to search across a
> > > million emails. We need sub-second search times, especially since, as
> > > time progresses, some of our archives are expected to reach 10-20 TB of
> > > data. In the future, we will be recommending the use of SSD drives, but
> > > I'd like to know if there are any other strategies that can be pursued.
> > > One such strategy is to automatically create a new index after the
> > > index gets to a certain size. Then, when a search is conducted, based
> > > on date, search only those indexes that fall between the specified
> > > dates. I've run my code through the YourKit profiler. The time appears
> > > to be consumed by Lucene itself and not by my code.
> > >
> > > Any other ideas?
> > >
> > > Michael Stoppelman wrote:
> > > > On Tue, Feb 26, 2008 at 10:18 AM, Jamie <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> I am looking for a way to improve the search performance of my
> > > >> application. I've followed every suggestion in the Lucene Wiki, but
> > > >> the search is still too slow with large indexes. I was wondering
> > > >> whether
> > > >
> > > > Did you optimize your index yet? That gave me a 2x bump.
> > > >
> > > > Have you put timers around parts of your code? Maybe it's something
> > > > unrelated to Lucene.
> > > > You should probably give more details on your setup if you want more
> > > > helpful advice.
> > > >
> > > >> there was a way to restrict a search to a specific time period, and
> > > >> in doing so sacrifice the quality of search results? Any other
> > > >> suggestions on how to improve search performance?
> > > >>
> > > >> Much appreciated
> > > >>
> > > >> Jamie
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > >> For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > > --
> > > Stimulus Software - MailArchiva
> > > Email Archiving And Compliance
> > > USA Tel: +1-713-366-8072 ext 3
> > > UK Tel: +44-20-80991035 ext 3
> > > Email: [EMAIL PROTECTED]
> > > Web: http://www.mailarchiva.com
> > >
> > > To receive MailArchiva Enterprise Edition product announcements, send a
> > > message to: <[EMAIL PROTECTED]>
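Jamie's idea of rolling over to a new index once the current one reaches a certain size, and then searching only the indexes whose dates overlap the requested period, can be sketched as follows. This is a minimal illustration, not MailArchiva's implementation: the index names (`index-N`), the threshold, and the use of a message count as a proxy for index size are all hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch of "roll over to a new index at a size threshold, then search only the
// indexes whose date range overlaps the query". Index names ("index-N"), the
// threshold, and counting messages as a proxy for index size are HYPOTHETICAL.
class DatePartitionedIndexes {

    // One index partition plus the range of message dates it holds.
    static final class Partition {
        final String name;     // hypothetical index directory name
        LocalDate first;       // earliest message date in this index
        LocalDate last;        // latest message date in this index
        long messageCount;

        Partition(String name, LocalDate date) {
            this.name = name;
            this.first = date;
            this.last = date;
        }

        // True if [first, last] overlaps [from, to]; both ends inclusive.
        boolean overlaps(LocalDate from, LocalDate to) {
            return !last.isBefore(from) && !first.isAfter(to);
        }
    }

    private final long maxMessagesPerIndex;
    private final List<Partition> partitions = new ArrayList<>();

    DatePartitionedIndexes(long maxMessagesPerIndex) {
        this.maxMessagesPerIndex = maxMessagesPerIndex;
    }

    // Records one message, starting a new partition when the current one is full.
    void add(LocalDate messageDate) {
        Partition current = partitions.isEmpty() ? null : partitions.get(partitions.size() - 1);
        if (current == null || current.messageCount >= maxMessagesPerIndex) {
            current = new Partition("index-" + partitions.size(), messageDate);
            partitions.add(current);
        }
        current.messageCount++;
        // Widen the date range so out-of-order messages are still covered.
        if (messageDate.isBefore(current.first)) current.first = messageDate;
        if (messageDate.isAfter(current.last)) current.last = messageDate;
    }

    // Names of the partitions that can contain messages dated from..to.
    List<String> partitionsFor(LocalDate from, LocalDate to) {
        List<String> selected = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.overlaps(from, to)) {
                selected.add(p.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        DatePartitionedIndexes idx = new DatePartitionedIndexes(2);
        idx.add(LocalDate.of(2008, 1, 5));
        idx.add(LocalDate.of(2008, 1, 20)); // index-0 is now full: Jan 5 .. Jan 20
        idx.add(LocalDate.of(2008, 2, 3));
        idx.add(LocalDate.of(2008, 2, 25)); // index-1: Feb 3 .. Feb 25
        idx.add(LocalDate.of(2008, 3, 10)); // index-2: Mar 10 .. Mar 10
        // Only index-1 overlaps February 2008, so this prints [index-1].
        System.out.println(idx.partitionsFor(LocalDate.of(2008, 2, 1), LocalDate.of(2008, 2, 29)));
    }
}
```

The selected index directories would then be opened and searched together; in the Lucene 2.x API of the time, `MultiSearcher` is one way to combine searchers over several indexes, though that part is not shown or tested here. Tracking the actual date range per index (rather than assuming each index covers a fixed calendar period) keeps the selection correct even if messages are archived out of order.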