Urm and uploaded here: http://ankeschwarzer.de/tmp/graph.jpg
Sorry.

Thomas Becker wrote:
> Missed the attachment, sorry.
>
> Thomas Becker wrote:
>> Hi all,
>>
>> I'm experiencing a performance degradation after migrating to 2.9 and running
>> some tests. I'm running out of ideas, and any help identifying why 2.9 is
>> slower than 2.4 would be highly appreciated.
>>
>> We had some issues with custom sorting in Lucene 2.4.1. We worked around them
>> by sorting the result sets manually and caching the results after sorting
>> (memory consuming, but fast).
>>
>> I have now migrated to Lucene 2.9.0 RC4, built a new FieldComparatorSource
>> implementation for sorting, and refactored all deprecated API calls to the
>> new Lucene 2.9 API.
>>
>> Everything works fine from a functional perspective, but performance is
>> severely (negatively) affected by Lucene 2.9.
>>
>> I profiled the application for a couple of hours, built a JMeter load test
>> and compared the following scenarios:
>>
>> 1. Lucene 2.9 - new API
>> 2. Lucene 2.9 - old API and custom sorting after Lucene
>> 3. Lucene 2.4.1 - old API and custom sorting after Lucene (what we had until now)
>>
>> Please find attached an RRD graph showing the results. The lighter the color,
>> the faster the request has been served. y = # requests, x = time.
>>
>> Most interestingly, simply switching the Lucene jars between 2.4 and 2.9
>> degraded response times and therefore throughput (see the results of test
>> cases 2 and 3). Adapting to the new API decreased performance again. The
>> difference between test cases 1 and 2 is most probably due to the precached
>> custom-sorted results.
>>
>> The application under test is a dedicated Lucene search engine that does
>> nothing but serve search requests. We run a cluster of them in production,
>> and it is incredibly fast: with the old implementation and production
>> traffic, above 98% of requests are served within 200 ms.
>> The index under test contains about 3 million documents (with lots of
>> fields), consumes about 2.5 GB of disk space and is stored on a tmpfs RAM
>> disk provided by the Linux kernel.
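For context, here is a minimal sketch of what a FieldCache-backed comparator can
look like under the 2.9 FieldComparator API. The class names and the assumption
of an int-valued sort field are invented for illustration; the actual
FieldComparatorSource implementation is not shown in the mail.

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;

    // Sketch only: a FieldCache-backed comparator for an int field,
    // written against the Lucene 2.9 FieldComparator API.
    public class IntFieldComparatorSource extends FieldComparatorSource {

        @Override
        public FieldComparator newComparator(String fieldname, int numHits,
                int sortPos, boolean reversed) throws IOException {
            return new IntFieldComparator(fieldname, numHits);
        }

        private static class IntFieldComparator extends FieldComparator {
            private final String field;
            private final int[] slotValues;    // values of docs held in the queue slots
            private int[] currentReaderValues; // FieldCache values of the current segment
            private int bottom;                // value of the weakest queue entry

            IntFieldComparator(String field, int numHits) {
                this.field = field;
                this.slotValues = new int[numHits];
            }

            @Override
            public int compare(int slot1, int slot2) {
                final int v1 = slotValues[slot1];
                final int v2 = slotValues[slot2];
                return v1 < v2 ? -1 : (v1 == v2 ? 0 : 1);
            }

            @Override
            public int compareBottom(int doc) {
                final int v = currentReaderValues[doc];
                return bottom < v ? -1 : (bottom == v ? 0 : 1);
            }

            @Override
            public void copy(int slot, int doc) {
                slotValues[slot] = currentReaderValues[doc];
            }

            @Override
            public void setBottom(int slot) {
                bottom = slotValues[slot];
            }

            @Override
            public void setNextReader(IndexReader reader, int docBase) throws IOException {
                // Called once per segment reader; loads (or reuses) the per-segment cache.
                currentReaderValues = FieldCache.DEFAULT.getInts(reader, field);
            }

            @Override
            public Comparable value(int slot) {
                return Integer.valueOf(slotValues[slot]);
            }
        }
    }

One point worth keeping in mind when benchmarking: in 2.9 the comparator is
driven per segment (setNextReader is called once for every segment reader), so
the first searches after (re)opening a reader pay the FieldCache warm-up cost,
which can distort short load tests.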
>>
>> The most interesting methods used for searching are:
>>
>> getHitsCount (is there a way to speed this up?):
>>
>>     public int getHitsCount(String query, Filter filter) throws LuceneServiceException {
>>         log.debug("getHitsCount('{}, {}')", query, filter);
>>         if (StringUtils.isBlank(query)) {
>>             log.warn("getHitsCount: empty lucene query");
>>             return 0;
>>         }
>>         long startTimeMillis = System.currentTimeMillis();
>>         int count = 0;
>>
>>         if (indexSearcher == null) {
>>             return 0;
>>         }
>>
>>         BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>         Query q = null;
>>         try {
>>             q = createQuery(query);
>>             TopScoreDocCollector tsdc = TopScoreDocCollector.create(1, true);
>>             indexSearcher.search(q, filter, tsdc);
>>             count = tsdc.getTotalHits();
>>             log.info("getHitsCount: count = {}", count);
>>         } catch (ParseException ex) {
>>             throw new LuceneServiceException("invalid lucene query:" + query, ex);
>>         } catch (IOException e) {
>>             throw new LuceneServiceException("indexSearcher could be corrupted", e);
>>         } finally {
>>             long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>             if (durationMillis > slowQueryLimit) {
>>                 log.warn("getHitsCount: Slow query: {} ms, query={}", durationMillis, query);
>>             }
>>             log.debug("getHitsCount: query took {} ms", durationMillis);
>>         }
>>         return count;
>>     }
>>
>> search:
>>
>>     public List<Document> search(String query, Filter filter, Sort sort, int from,
>>             int size) throws LuceneServiceException {
>>         log.debug("{} search('{}', {}, {}, {}, {})",
>>                 new Object[] { indexAlias, query, filter, sort, from, size });
>>         long startTimeMillis = System.currentTimeMillis();
>>
>>         List<Document> docs = new ArrayList<Document>();
>>         if (indexSearcher == null) {
>>             return docs;
>>         }
>>         Query q = null;
>>         try {
>>             if (query == null) {
>>                 log.warn("search: lucene query is null...");
>>                 return docs;
>>             }
>>             q = createQuery(query);
>>             BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>             if (size < 0 || size > maxNumHits) {
>>                 // set hard limit for numHits
>>                 size = maxNumHits;
>>                 if (log.isDebugEnabled())
>>                     log.debug("search: Size set to hardlimit: {} for query: {} with filter: {}",
>>                             new Object[] { size, query, filter });
>>             }
>>             TopFieldCollector collector = TopFieldCollector.create(sort, size + from,
>>                     true, false, false, true);
>>             indexSearcher.search(q, filter, collector);
>>             if (size > collector.getTotalHits())
>>                 size = collector.getTotalHits();
>>             if (size > 100000)
>>                 log.info("search: size: {} bigger than 100,000 for query: {} with filter: {}",
>>                         new Object[] { size, query, filter });
>>             TopDocs td = collector.topDocs(from, size);
>>             ScoreDoc[] scoreDocs = td.scoreDocs;
>>             for (ScoreDoc scoreDoc : scoreDocs) {
>>                 docs.add(indexSearcher.doc(scoreDoc.doc));
>>             }
>>         } catch (ParseException e) {
>>             log.warn("search: ParseException: {}", e.getMessage());
>>             if (log.isDebugEnabled())
>>                 log.warn("search: ParseException: ", e);
>>             return Collections.emptyList();
>>         } catch (IOException e) {
>>             log.warn("search: IOException: ", e);
>>             return Collections.emptyList();
>>         } finally {
>>             long durationMillis = System.currentTimeMillis() - startTimeMillis;
>>             if (durationMillis > slowQueryLimit) {
>>                 log.warn("search: Slow query: {} ms, query={}, indexUsed={}",
>>                         new Object[] { durationMillis, query,
>>                                 indexSearcher.getIndexReader().directory() });
>>             }
>>             log.debug("search: query took {} ms", durationMillis);
>>         }
>>         return docs;
>>     }
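On the getHitsCount question ("is there a way to speed this up?"): if only the
total count is needed, a collector that neither computes scores nor maintains a
top-N queue can be cheaper than TopScoreDocCollector.create(1, true). A minimal
sketch against the 2.9 Collector API (the class name HitCountCollector is made
up for this example):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Sketch of a counting-only Collector for Lucene 2.9: no scoring,
    // no priority queue, just a running total of matching docs.
    public class HitCountCollector extends Collector {
        private int totalHits;

        @Override
        public void setScorer(Scorer scorer) {
            // Scores are never read, so the scorer is ignored.
        }

        @Override
        public void collect(int doc) {
            totalHits++;
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) {
            // Nothing per-segment to prepare when only counting.
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            // Order does not matter for counting, which lets Lucene
            // pick the (often faster) out-of-order scorers.
            return true;
        }

        public int getTotalHits() {
            return totalHits;
        }
    }

In getHitsCount this could replace the TopScoreDocCollector along the lines of:

    HitCountCollector counter = new HitCountCollector();
    indexSearcher.search(q, filter, counter);
    count = counter.getTotalHits();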
>> I'm wondering why others are experiencing better performance with 2.9 while
>> our implementation's performance is getting worse. Maybe our way of using the
>> 2.9 API is not the best, and sorting is definitely expensive.
>>
>> Any ideas are appreciated. I've been tearing my hair out for hours and days
>> trying to find the root cause. Hints on how I could track down the
>> bottlenecks myself would also be appreciated.
>>
>> Cheers,
>> Thomas

--
Thomas Becker
Senior JEE Developer

net mobile AG
Zollhof 17
40221 Düsseldorf
GERMANY

Phone:    +49 211 97020-195
Fax:      +49 211 97020-949
Mobile:   +49 173 5146567 (private)
E-Mail:   mailto:thomas.bec...@net-m.de
Internet: http://www.net-m.de

Registergericht: Amtsgericht Düsseldorf, HRB 48022
Vorstand: Theodor Niehues (Vorsitzender), Frank Hartmann, Kai Markus Kulas, Dieter Plassmann
Vorsitzender des Aufsichtsrates: Dr. Michael Briem