Hey Mark, thanks for your reply. Will do. Results will follow in a couple of minutes.
Yes, the custom sorts are doing something tricky. :) I'll try to explain them in a few words and paste the code. But even without them, 2.9 is slower: testcases 2 and 3 differ only in the Lucene jars.

CustomFieldComparatorPrefix.java: sorts on a field containing, for example, releaseDates. A single document can have different releaseDates for different countries, for example. They're prefixed and comma-separated in a single field. Here's the code:

public final class CustomFieldComparatorPrefix extends FieldComparatorSource {
    /**
     *
     */
    private static final long serialVersionUID = 200907240001L;
    private final String prefix;

    public CustomFieldComparatorPrefix(String prefix) {
        this.prefix = prefix;
    }

    /*
     * (non-Javadoc)
     *
     * @see
     * org.apache.lucene.search.FieldComparatorSource#newComparator(java.lang
     * .String, int, int, boolean)
     */
    public FieldComparator newComparator(final String fieldname,
            final int numHits, int sortPos, boolean reversed)
            throws IOException {
        return new FieldComparator() {
            private int[] currentReaderValues;
            private int[] values = new int[numHits];
            private int bottom;

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#compare(int, int)
             */
            public int compare(int slot1, int slot2) {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return values[slot1] - values[slot2] because that
                // may overflow
                final int v1 = values[slot1];
                final int v2 = values[slot2];
                if (v1 > v2) {
                    return 1;
                } else if (v1 < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#compareBottom(int)
             */
            public int compareBottom(int doc) throws IOException {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return bottom - values[slot2] because that
                // may overflow
                final int v2 = currentReaderValues[doc];
                if (bottom > v2) {
                    return 1;
                } else if (bottom < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            /*
             * (non-Javadoc)
             *
             * @see
             * org.apache.lucene.search.FieldComparator#copy(int, int)
             */
            public void copy(int slot, int doc) throws IOException {
                values[slot] = currentReaderValues[doc];
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#setBottom(int)
             */
            public void setBottom(int slot) {
                this.bottom = values[slot];
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#sortType()
             */
            public int sortType() {
                return SortField.CUSTOM;
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#value(int)
             */
            public Comparable<Integer> value(int slot) {
                return new Integer(values[slot]);
            }

            @Override
            public void setNextReader(IndexReader reader, int docBase)
                    throws IOException {
                currentReaderValues = FieldCache.DEFAULT.getInts(reader,
                        fieldname, new PrefixedIntParser(prefix));
            }
        };
    }
}

CustomFieldComparatorPosition.java: works similarly to the one above, but the field holds different positions for different contentgroups. Example: "10_1,11_5,14_1". The prefix defines the contentgroup id, and the number after the underscore is the actual position that determines the document's place in the sort order. This one could in theory use the same comparator as above, but the app would run out of memory because there are too many different contentgroups. Since only a few (~280,000) documents have positions set, I wrote my own Lucene cache implementation that stores only non-default values.
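To make the field format concrete, here is a minimal sketch of one way to read it. This is not the actual PrefixedIntParser (which isn't pasted here); the class and method names are made up for illustration, and it assumes the whole comma-separated string is available as a single value. It resolves the position for one contentgroup prefix and shows a sparse docId-to-position map with a default for everything that is not stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names are not from the real code base.
final class PrefixedPositionSketch {

    // Sentinel for "no position set", the same value the comparator
    // code in this mail falls back to.
    static final int DEFAULT_POSITION = 999999;

    // Returns the number after "<prefix>_" in a value such as
    // "10_1,11_5,14_1", or the default if the prefix is absent.
    static int positionFor(String fieldValue, String prefix) {
        String wanted = prefix + "_";
        for (String entry : fieldValue.split(",")) {
            if (entry.startsWith(wanted)) {
                return Integer.parseInt(entry.substring(wanted.length()));
            }
        }
        return DEFAULT_POSITION;
    }

    // Sparse lookup: only non-default positions are stored in the map,
    // so a missing docId means "default".
    static int lookup(Map<Integer, Integer> sparsePositions, int doc) {
        Integer value = sparsePositions.get(doc);
        return value == null ? DEFAULT_POSITION : value;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> sparse = new HashMap<Integer, Integer>();
        // Assume doc 7 carries the example field value from above.
        sparse.put(7, positionFor("10_1,11_5,14_1", "11"));
        System.out.println(lookup(sparse, 7));
        System.out.println(lookup(sparse, 8));
    }
}
```

The trailing underscore in the match keeps prefix "1" from matching the entry "10_1". The map only grows with the number of documents that actually have a position, which is the point of not using a full int[] per reader and prefix.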

Comparator Source:

public final class CustomFieldComparatorPosition extends FieldComparatorSource {
    private final Logger log = LoggerFactory.getLogger(this.getClass());
    /**
     *
     */
    private static final long serialVersionUID = 200907240001L;
    private final String prefix;
    private static PositionCache cache;
    private StopWatch sw = new StopWatch();

    public CustomFieldComparatorPosition(String prefix) {
        if (cache == null) {
            log.debug("CustomFieldComparatorPosition:initializing PositionCache");
            cache = new PositionCache();
        }
        this.prefix = prefix;
    }

    /*
     * (non-Javadoc)
     *
     * @see
     * org.apache.lucene.search.FieldComparatorSource#newComparator(java.lang
     * .String, int, int, boolean)
     */
    public FieldComparator newComparator(final String fieldname,
            final int numHits, int sortPos, boolean reversed)
            throws IOException {
        return new FieldComparator() {
            private Map<Integer, Integer> currentReaderValues;
            private int[] values = new int[numHits];
            private int bottom;

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#compare(int, int)
             */
            public int compare(int slot1, int slot2) {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return values[slot1] - values[slot2] because that
                // may overflow
                final int v1 = values[slot1];
                final int v2 = values[slot2];
                if (v1 > v2) {
                    return 1;
                } else if (v1 < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#compareBottom(int)
             */
            public int compareBottom(int doc) throws IOException {
                int i;
                try {
                    i = currentReaderValues.get(doc);
                } catch (NullPointerException e) {
                    i = 999999;
                }
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return bottom - values[slot2] because that
                // may overflow
                final int v2 = i;
                if (bottom > v2) {
                    return 1;
                } else if (bottom < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#copy(int, int)
             */
            public void copy(int slot, int doc) throws
                    IOException {
                // This will be executed n times where n is the amount of
                // documents in the index.
                int value;
                try {
                    value = currentReaderValues.get(doc);
                } catch (NullPointerException e) {
                    value = 999999;
                }
                values[slot] = value;
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#setBottom(int)
             */
            public void setBottom(int slot) {
                this.bottom = values[slot];
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#sortType()
             */
            public int sortType() {
                return SortField.CUSTOM;
            }

            /*
             * (non-Javadoc)
             *
             * @see org.apache.lucene.search.FieldComparator#value(int)
             */
            public Comparable<Integer> value(int slot) {
                return new Integer(values[slot]);
            }

            @Override
            public void setNextReader(IndexReader reader, int docBase)
                    throws IOException {
                sw.start();
                try {
                    currentReaderValues = cache.getInts(reader, fieldname,
                            new PrefixedIntParser(prefix));
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e.getCause());
                }
                sw.stop();
                if (sw.getTime() > 3000) {
                    log.info("setNextReader: Slow: Time to get currentReaderValues from cache: {}ms. Items in cache: {}",
                            sw.getTime(), cache.getItemCount());
                }
            }
        };
    }
}

PositionCache.java:

/**
 * This cache implementation caches the position values of items stored as
 * documents in lucene. This cache has a WeakHashMap with an IndexReader
 * reference as a key thus if the indexReader reference gets deleted, the cache
 * is marked to be gced. The innerCache is a Map containing field + parser
 * (contracttocontentgroup prefix) as the key and as a value yet another map.
 * The latter map finally contains the docIds as key and positionvalue for this
 * prefix as value.
 *
 * @author Thomas Becker (thomas.bec...@net-m.de)
 *
 */
public class PositionCache {
    final Map<Object, Map<String, Future<Map<Integer, Integer>>>> readerCache =
            new WeakHashMap<Object, Map<String, Future<Map<Integer, Integer>>>>();
    AtomicInteger itemCount = new AtomicInteger(0); // only for debugging...only increasing

    public Map<Integer, Integer> getInts(final IndexReader reader,
            final String field, final IntParser parser)
            throws InterruptedException {
        String key = parser.toString();
        // Future<Map<Integer,Integer> contains docId as key and position as
        // value
        HashMap<String, Future<Map<Integer, Integer>>> innerCache;
        final Object readerKey = reader.getFieldCacheKey();
        synchronized (readerCache) {
            innerCache = (HashMap<String, Future<Map<Integer, Integer>>>) readerCache.get(readerKey);
            if (innerCache == null) {
                innerCache = new HashMap<String, Future<Map<Integer, Integer>>>();
                readerCache.put(readerKey, innerCache);
            }
        }
        Future<Map<Integer, Integer>> f = innerCache.get(key);
        if (f == null) {
            Callable<Map<Integer, Integer>> eval = new Callable<Map<Integer, Integer>>() {
                public Map<Integer, Integer> call()
                        throws InterruptedException, IOException {
                    HashMap<Integer, Integer> docPositions = new HashMap<Integer, Integer>();
                    TermDocs termDocs = reader.termDocs();
                    TermEnum termEnum = reader.terms(new Term(field));
                    try {
                        do {
                            Term term = termEnum.term();
                            if (term == null || term.field() != field)
                                break;
                            int termval = parser.parseInt(term.text());
                            termDocs.seek(termEnum);
                            while (termDocs.next()) {
                                // do not store defaults to save memory
                                if (termval < 999999) {
                                    itemCount.incrementAndGet();
                                    docPositions.put(termDocs.doc(), termval);
                                }
                            }
                        } while (termEnum.next());
                    } finally {
                        termDocs.close();
                        termEnum.close();
                    }
                    return docPositions;
                }
            };
            FutureTask<Map<Integer, Integer>> ft = new FutureTask<Map<Integer, Integer>>(eval);
            f = innerCache.put(key, ft);
            if (f == null) {
                f = ft;
                ft.run();
            }
        }
        try {
            return f.get();
        } catch (CancellationException e) {
            innerCache.remove(key);
        } catch
                (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return null;
    }

Cheers,
Thomas

Mark Miller wrote:
> Hey Thomas - any chance you can do some quick profiling and grab the
> hotspots from the 3 configurations?
>
> Are your custom sorts doing anything tricky?
>