Hi Michael, I guess the hotspot of lucene is org.apache.lucene.search.IndexSearcher.search()
Hi Jamie, What's the original text size of a million emails? I estimate the size of an email is around 100k, is this true? When you doing search, what kind keywords did you input, words or short sentence? How many results return? Did you use filter to shrink the results size? 2008/2/27, Michael Stoppelman <[EMAIL PROTECTED]>: > > So you're saying searches are taking 10 seconds on a 5G index? If so that > seems ungodly slow. > If you're on *nix, have you watched your iostat statistics? Maybe > something > is hammering your hds. > Something seems amiss. > > What lucene methods were pointed to as hotspots by YourKit? > > > -M > > > On Tue, Feb 26, 2008 at 2:13 PM, Jamie <[EMAIL PROTECTED]> wrote: > > > Hi Michael > > > > Perhaps this will help. We are using Lucene to index emails and provide > > a search interface to search through those emails. Many of our customers > > have 3-5 TB's or more of email data. The index size tends to be around 5 > > GB per million messages. On a 3 GHZ intel core duo with standard 7200 mb > > drive, it takes approx. 10 seconds to search across a million emails. We > > need sub second search times, especially since, as time progresses, some > > of our archives are expected to reach 10-20 TB of data. In future, we > > will be recommending the use of SSD drives, but I'd like to know if they > > are any other strategies can pursued. One such strategy is to > > automatically create a new index after the index gets to a certain size. > > Then, when a search is conducted, based on date, search only those > > indexes that fall between specified dates. I've run my code through the > > YourKit profiler. The time appears to be consumed by Lucene itself and > > not by my code. > > > > Any other ideas? > > > > > > Michael Stoppelman wrote: > > > On Tue, Feb 26, 2008 at 10:18 AM, Jamie <[EMAIL PROTECTED]> > wrote: > > > > > > > > >> Hi > > >> > > >> I am looking for a way to improve the search performance of my > > >> application. I've followed every suggestion in the Lucene Wiki but > the > > >> search is still too slow with large indexes. I was wondering whether > > >> > > > > > > > > > Did you optimize your index yet? That gave me a 2x bump. > > > > > > Have you put timers around parts of your code? Maybe it's something > > > unrelated to lucene. > > > You should probably give more details on your setup if you want more > > helpful > > > advice. > > > > > > > > > > > >> there was a way to restrict a search to a specific time period and in > > >> doing so sacrifice the quality of search results? Any other > suggestions > > >> on how to improve search performance? > > >> > > >> Much appreciate > > >> > > >> Jamie > > >> > > >> > > >> --------------------------------------------------------------------- > > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > > >> For additional commands, e-mail: [EMAIL PROTECTED] > > >> > > >> > > >> > > > > > > > > > > > > -- > > Stimulus Software - MailArchiva > > Email Archiving And Compliance > > USA Tel: +1-713-366-8072 ext 3 > > UK Tel: +44-20-80991035 ext 3 > > Email: [EMAIL PROTECTED] > > Web: http://www.mailarchiva.com > > > > To receive MailArchiva Enterprise Edition product announcements, send a > > message to: <[EMAIL PROTECTED]> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > >