RE: Alternative way to simulate sorting without doing actual sort

Uwe Schindler Thu, 23 Jul 2009 02:47:20 -0700

I would propose not sorting the date/time by its string value. Instead, I
would represent the date/time as an integer value (e.g. the long
returned by Date.getTime()). If you do not need millisecond precision,
you could divide it by some value, e.g.
Date.getTime()/(1000L*60L) to get full minutes. Because of the lower
precision, you can then cast the value to an int and index this integer as a
number, e.g. by using the new NumericField in 2.9 (if you also need fast
numeric range queries on this field) or simply by using Integer.toString().
You can then use SortField.INT to sort. The only problem is that all NULL
values are represented by the integer 0, so if you have negative values, the
0 value is somewhere in the middle.
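
A minimal sketch of the conversion described above, in plain Java without
any Lucene dependency. The class and method names (MinuteResolution,
toMinutes, toIndexString) are made up for illustration; only
Date.getTime() and the division by 1000L*60L come from the text. The
overflow check via Math.toIntExact is an addition of this sketch, not part
of the proposal.

```java
import java.util.Date;

// Hypothetical helper: reduces a Date to minute resolution so that it fits
// into an int, which could then be indexed as a number and sorted as an int.
final class MinuteResolution {

    private static final long MILLIS_PER_MINUTE = 1000L * 60L;

    private MinuteResolution() {
    }

    // Whole minutes since the epoch. Integer division truncates toward zero,
    // so timestamps before 1970 round up toward 0 instead of down; the
    // resulting values still sort in chronological order.
    public static int toMinutes(Date date) {
        long minutes = date.getTime() / MILLIS_PER_MINUTE;
        // Throws ArithmeticException instead of silently wrapping if the
        // value does not fit into an int (an assumption of this sketch).
        return Math.toIntExact(minutes);
    }

    // The "simply using Integer.toString()" variant: the string form of the
    // minute value, to be indexed as a single un-analyzed term.
    public static String toIndexString(Date date) {
        return Integer.toString(toMinutes(date));
    }
}
```

The int returned by toMinutes(...) is the value that would go into the
numeric field; toIndexString(...) covers the plain-string variant. An int
holding minutes since 1970 covers several thousand years, so overflow is not
expected for realistic dates.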


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Ganesh [mailto:emailg...@yahoo.co.in]
> Sent: Thursday, July 23, 2009 8:01 AM
> To: java-user@lucene.apache.org
> Subject: Re: Alternative way to simulate sorting without doing actual sort
> 
> Hello Eric,
> 
> I agree that the number of unique terms might be smaller, but the
> [ 4 * reader.maxDoc() * number of sort fields ] array will increase the
> memory consumption. I have 100 million records spread across 10 DBs;
> 4 * 100M is itself 400 MB. If I sort on 2 fields, that becomes 800 MB.
> The unique terms may be few, but this array alone consumes a huge amount
> of memory.
> 
> I do not understand how your approach of splitting the date/time would
> reduce memory consumption.
> 
> Regards
> Ganesh
> 
> ----- Original Message -----
> From: "Erick Erickson" <erickerick...@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Thursday, July 23, 2009 1:55 AM
> Subject: Re: Alternative way to simulate sorting without doing actual sort
> 
> 
> > I was assuming you were storing things as strings, in which case
> > it works something like this:
> > Let's say you broke it up into
> > YYYY
> > MM
> > DD
> > HH
> > mm
> >
> > The number of unique terms that need to be kept in
> > memory to sort is just (let's say your documents
> > span 100 years)
> > 100 + 12 + 31 + 24 + 60 = 227.
> >
> > But that's a much different case.
> >
> > On Wed, Jul 22, 2009 at 5:31 AM, Ganesh <emailg...@yahoo.co.in> wrote:
> >
> >> Hello Eric,
> >>
> >> Thanks for your reply.
> >>
> >> Memory required for sorting: 4 * reader.maxDoc().
> >>
> >> I am sorting on datetime with minute resolution. If 100 records
> >> represent a minute, then in a 1 million record database there will be
> >> around 20000 unique terms. The amount of memory consumed would be
> >> 4 * 1000000 + 20000 * 8 [considering the date/time as a long].
> >>
> >> Most of the memory is consumed by 4 * reader.maxDoc(). If I have two
> >> or three fields, say (YYYYMMDD, hh, mm), then the memory consumption
> >> would be too high. How can you say that splitting the field will help
> >> in reducing the memory usage?
> >>
> >> Please correct me if I am wrong. I need some justification for
> >> splitting the date into multiple terms.
> >>
> >> Regards
> >> Ganesh
> >>
> >>
> >> ----- Original Message -----
> >> From: "Erick Erickson" <erickerick...@gmail.com>
> >> To: <java-user@lucene.apache.org>
> >> Sent: Tuesday, July 21, 2009 7:29 PM
> >> Subject: Re: Alternative way to simulate sorting without doing actual sort
> >>
> >>
> >> > Have you tried splitting your times into separate fields, perhaps one
> >> > with YYYYMMDD and another with HHMM, then doing a primary sort on the
> >> > YYYYMMDD field and a secondary sort on HHMM? That'll reduce your total
> >> > unique values greatly and should improve your memory consumption.
> >> > Best
> >> > Erick
> >> >
> >> > On Tue, Jul 21, 2009 at 4:27 AM, Ganesh <emailg...@yahoo.co.in> wrote:
> >> >
> >> >> Hello all
> >> >>
> >> >> I am sorting on datetime with minute resolution. It easily reaches the
> >> >> maximum heap size. I have almost 100M records and it is using 1.5 GB.
> >> >> I am now in a situation where I have to stop sorting and find some
> >> >> other way.
> >> >>
> >> >> I tried adding a document boost and a field boost for the date/time.
> >> >> Document boost alone does not work. Document boost and field boost
> >> >> have an impact on the score. A search on datetime gives me results
> >> >> sorted by datetime, but a search on any other field does not work.
> >> >>
> >> >> I am doing updates, and they change the doc id. I want to get the
> >> >> results sorted in FIRST TIME inserted order. Updates should not disturb
> >> >> the result set. I think Solr has some facilities to get the list of
> >> >> recently added documents.
> >> >>
> >> >> Any ideas are greatly appreciated.
> >> >>
> >> >> Regards
> >> >> Ganesh
> >> >
> >



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
