On Wednesday 19 November 2008 00:43:56, Tim Sturge wrote:
> I've finished a query time implementation of a column stride filter,
> which implements DocIdSetIterator. This just builds the filter at
> process start and uses it for each subsequent query. The index itself
> is unchanged.
>
> The results are very impressive. Here are the results on a 45M
> document index:
>
> Firstly without an age constraint as a baseline:
>
> Query "+name:tim"
> startup: 0
> Hits: 15089
> first query: 1004
> 100 queries: 132 (1.32 msec per query)
>
> Now with a cached filter. This is ideal from a speed standpoint but
> there are too many possible start/end combinations to cache all the
> filters.
>
> Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached
> RangeFilter)
> startup: 3
> Hits: 11156
> first query: 1830
> 100 queries: 287 (2.87 msec per query)
>
> Now with an uncached filter. This is awful.
>
> Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
> startup: 3
> Hits: 11156
> first query: 1665
> 100 queries: 51862 (yes, 518 msec per query, 200x slower)
>
> A RangeQuery is slightly better but still bad (and has a different
> result set)
>
> Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
> startup: 0
> Hits: 10147
> first query: 1517
> 100 queries: 27157 (271 msec is 100x slower than the filter)
>
> Now with the prebuilt column stride filter:
>
> Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt
> column stride filter)

With "Allow Filter as clause to BooleanQuery"
(https://issues.apache.org/jira/browse/LUCENE-1345), one could even skip
the ConstantScoreQuery here. Unfortunately, LUCENE-1345 is unfinished for now.

> startup: 2811
> Hits: 11156
> first query: 1395
> 100 queries: 441 (back down to 4.41msec per query)
>
> This is less than 2x slower than the dedicated bitset and more than
> 50x faster than the range boolean query.
>
> Mike, Paul, I'm happy to contribute this (ugly but working) code if
> there is interest. Let me know and I'll open a JIRA issue for it.

In case you think more performance improvements based on this are possible...

Regards,
Paul Elschot.
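For readers following the thread, here is a minimal sketch of the column stride idea as described above: one int value per document (the age field), loaded once into an array, plus an iterator that yields doc ids whose value lies in [lower, upper]. This is not Tim's code and does not use Lucene's actual DocIdSetIterator class; the names ColumnStrideRangeFilter, RangeIterator, nextDoc, advance, intersect and NO_MORE_DOCS are hypothetical and only mirror the general shape of a doc id iterator. The intersect method assumes the other clause (e.g. the hits for name:tim) is available as a sorted array of doc ids.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's API): a range filter over a per-document
 * int column that is built once and reused for every query.
 */
public class ColumnStrideRangeFilter {
    /** Assumed sentinel meaning "iterator exhausted". */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] values; // values[docId] = field value (e.g. age) for that doc

    public ColumnStrideRangeFilter(int[] values) {
        this.values = values;
    }

    /** Per-query work is just creating an iterator with the query's bounds. */
    public RangeIterator iterator(int lower, int upper) {
        return new RangeIterator(lower, upper);
    }

    public final class RangeIterator {
        private final int lower;
        private final int upper;
        private int doc = -1;

        RangeIterator(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        public int docID() {
            return doc;
        }

        /** Moves to the next matching doc, or NO_MORE_DOCS. */
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // stay exhausted; avoids overflow of doc + 1
            }
            return advance(doc + 1);
        }

        /** Moves to the first matching doc >= target, or NO_MORE_DOCS. */
        public int advance(int target) {
            for (int d = Math.max(target, 0); d < values.length; d++) {
                int v = values[d]; // one array read per candidate, no term enumeration
                if (v >= lower && v <= upper) {
                    return doc = d;
                }
            }
            return doc = NO_MORE_DOCS;
        }
    }

    /** Leapfrog intersection of sorted doc ids (another clause's hits) with the range iterator. */
    public static List<Integer> intersect(int[] sortedDocs, RangeIterator it) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int d = it.nextDoc();
        while (i < sortedDocs.length && d != NO_MORE_DOCS) {
            if (sortedDocs[i] < d) {
                i++;                            // other clause lags behind: step it
            } else if (sortedDocs[i] > d) {
                d = it.advance(sortedDocs[i]);  // filter lags behind: skip ahead
            } else {
                out.add(d);                     // both agree: a hit
                i++;
                d = it.nextDoc();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ages = {17, 18, 40, 35, 22, 36};      // toy data, doc ids 0..5
        ColumnStrideRangeFilter filter = new ColumnStrideRangeFilter(ages);
        int[] nameHits = {0, 1, 2, 4, 5};           // toy hits for the other clause
        System.out.println(intersect(nameHits, filter.iterator(18, 35))); // prints [1, 4]
    }
}
```

In this sketch the cost of building the array is paid once, while each query only reads array entries for candidates the conjunction actually visits; that is one plausible reading of why the prebuilt filter shows a large startup figure but low per-query times.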