1. Redefine the archivedate field in YYmmDD format (day resolution).
2. Add a second field holding the timestamp, used only for sorting.
3. Use a RangeFilter to get the results, then sort by the timestamp field.
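
To illustrate points 1 and 2, here is a minimal sketch of how the two field values could be derived from one archive date. It uses only the JDK, not Lucene, and the names (ArchiveDateFields, dayKey, sortTimestamp, inRange) are hypothetical, made up for this example. It also assumes that "YYmmDD" means a two-digit year, month and day, and that dates are handled in UTC. The idea is that a day-resolution key yields at most one distinct term per day for the range to cover, while the full-precision timestamp is kept in a separate field for sorting.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives the two values suggested above from one date.
final class ArchiveDateFields {

    // Assumption: "YYmmDD" is read as two-digit year, month, day.
    // Two-digit years only sort correctly within a single century.
    private static final DateTimeFormatter DAY_KEY =
            DateTimeFormatter.ofPattern("yyMMdd");

    // Value for the coarse, range-filtered field (one term per day).
    static String dayKey(LocalDateTime archived) {
        return archived.format(DAY_KEY);
    }

    // Value for the separate sort field: seconds since the epoch, UTC assumed.
    static long sortTimestamp(LocalDateTime archived) {
        return archived.toEpochSecond(ZoneOffset.UTC);
    }

    // Inclusive lexicographic range check, mimicking how a term range
    // compares string keys.
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        LocalDateTime archived = LocalDateTime.of(2008, 2, 27, 14, 5, 0);
        String key = dayKey(archived);
        System.out.println("archivedate key: " + key);
        System.out.println("sort timestamp:  " + sortTimestamp(archived));
        System.out.println("in range:        " + inRange(key, "071229", "080228"));
    }
}
```

In the index, the first value would go into the archivedate field that the RangeFilter runs over, and the second into the extra sort field. The Lucene calls themselves are left out here on purpose.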

2008/2/27, Jamie <[EMAIL PROTECTED]>:
>
> Hi Michael & Others
>
> Ok. I've gathered some more statistics from a different machine for your
> analysis.
> (I had to switch machines because the original one was in production and
> my tests were interfering).
>
> Here are the statistics from the new machine:
>
> Total Documents: 1.2 million
> Results Returned:  900k
> Store Size 238G (size of original documents)
> Index Size 1.2G (lucene index size)
> Index / Store Ratio 0.5%
>
> The search query is as follows:
>
> archivedate:[d20071229010000 TO d20080228235900]
>
[h t] Why is there an extra 'd'?
>
> As you can see, I am using a range query to search between specific dates.
> Question: should this query be moved to a filter instead? I did not do
> this as I needed to have the option to sort on date.
>
> There are no other specific filters applied and in this example sorting
> is turned off.
>
> On this particular machine the search time varies between 2.64 seconds
> and about 5 seconds.
>
> The limitation of this machine is that it uses a normal IDE drive
> to house the index, not a SATA drive.
>
> IOStat Statistics
>
> Linux 2.6.20-15-server 27/02/2008
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          20.25    0.00    3.23    0.34    0.00   76.19
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               7.12        50.67       186.41   38936841  143240688
>
> See attached for hardware info and the CPU call tree (taken from YourKit).
>
> I would appreciate your recommendations.
>
>
> Jamie
>
>
> h t wrote:
> Hi Michael,
> I guess the hotspot of lucene is
> org.apache.lucene.search.IndexSearcher.search()
>
> Hi Jamie,
> What's the original text size of a million emails?
> I estimate the size of an email is around 100k, is this true?
> When you search, what kind of keywords do you enter: single words or
> short sentences?
> How many results are returned?
> Did you use a filter to shrink the result set?
>
> 2008/2/27, Michael Stoppelman <[EMAIL PROTECTED]>:
> So you're saying searches are taking 10 seconds on a 5G index? If so, that
> seems ungodly slow.
> If you're on *nix, have you watched your iostat statistics? Maybe something
> is hammering your hds.
> Something seems amiss.
>
> What lucene methods were pointed to as hotspots by YourKit?
>
