On Thu, 2009-12-17 at 07:48 +0100, Ganesh wrote: > I am using v2.9.1 I am having multiple shards and i need to do only > date time sorting. Sorting consumes 50% of RAM.
I'm guessing that your date-times are representable in 32 bits (signed seconds since epoch or such - that'll work until 2038)? If so, it should be possible to do very efficient sorting, both memory- and performance-wise. Make your own sorter by implementing SortComparatorSource. The SortComparatorSource returns a ScoreDocComparator which contains an array of longs, in which the first 32 bits designates the order of the document at the given index (docID) and the last 32 bits holds the date time. The ScoreDocComparator's methods are trivial: public int compare (ScoreDoc i, ScoreDoc j) { return order[i.doc] - order[j.doc]; // Or is it the other way? I always mix them up } public Comparable sortValue (ScoreDoc i) { return Integer.valueOf((int)(order[i.doc] & 0xFFFFFFFF)); } public int sortType(){ return SortField.CUSTOM; } Now, for generating the order-array, we do something like this in the SortComparatorSource: TermDocs termDocs = reader.termDocs(); // inverted[docID] == datetime | docID long[] inverted = new long[reader.maxDoc]; TermEnum termEnum = reader.terms(new Term(fieldname, "")); do { Term term = termEnum.term(); if (term == null || !fieldname.equals(term.field())) { break; } long dateTime = (long)stringDateTimeToInt(term.text()) << 32; termDocs.seek(termEnum); while (termDocs.next()) { inverted[termDocs.doc()] = dateTime | termDocs.doc(); } } while (termEnum.next()); termEnum.close(); // inverted[order] == datetime | docID Arrays.sort(inverted); // works for date time 1970+ // order[docID] == order | datetime long[] order = new long[inverted.length]; for (long o = 0 ; o < inverted.length ; o++) { int docID = (int)(inverted[o] & 0xFFFFFFFF); order[docID] = (o << 32) | (inverted[o] >>> 32); } It would be nice to avoid the extra array needed for reordering, but I'm fresh out of ideas. Still, the memory-overhead is just 8 bytes (long) * 2 (arrays) * maxDoc and performance should high as Arrays.sort(long[]) is fast and everything runs without taxing the garbage collector. Caveat lector: I haven't implemented the stuff above, so it's just an idea written in not-so-pseudo code. Regards, Toke Eskildsen --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org