Good stuff, Daniel...

Thanks for taking the time to tabulate the results and present them. If your results hold, they may have a significant impact on my application. I'm working on a Perl/XS port, and I think a lot of people who want to run it won't be running mod_perl, so startup times are quite important to me. I may end up setting the default IndexInterval considerably higher than 128 as a result of this discussion.

The formatting of the results turned up a little screwy in my email reader, so here's a reformatted version...

Timings for a simple TermQuery on the term "one" (docFreq = 22):

   skip    time range for query (ms)    approx mem usage of JVM (MB)
     1      28 ~  30                     49.2
     2      28 ~  30
     4      28 ~  30
     8      29 ~  31
    16      29 ~  32                     15.9 (!!)
    32      29 ~  33
    64      38 ~  42
   128      59 ~  61
   256      99 ~ 102                     14.1

Timings for a simple TermQuery on the term "test" (docFreq = 31,356):

   skip    time range for query (ms)
     1       6.8 ~  7.6
    16       9.7 ~ 10.2
   256      69   ~ 72

So, more frequent terms get a larger penalty due to this modification,
but the time was relatively fast to start with. Rarer terms get less of
a penalty, perhaps because they already take so much longer to find.

This doesn't sound right to me. The time to locate the term via the TermInfosReader shouldn't have anything to do with the doc_freq, since that's kept as a single number in .tis and .tii. Within the term dictionary, all terms are more or less created equal.
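
To make that reasoning concrete, here is a toy Python sketch of a sorted term dictionary with a sparse in-memory index. It is not Lucene's TermInfosReader; the class name, method names and in-memory layout are all assumptions made up for illustration. It only models the general idea: binary-search an index holding every Nth term, then scan forward at most N entries. The doc freq is just a number stored alongside the term and plays no part in the search.

```python
from bisect import bisect_right


class TermDictionary:
    """Toy model (hypothetical, not Lucene code) of a term dictionary
    with a sparse index that keeps every index_interval-th term."""

    def __init__(self, term_doc_freqs, index_interval=128):
        # The full sorted list stands in for the on-disk term file.
        self.entries = sorted(term_doc_freqs.items())
        self.index_interval = index_interval
        # Sparse index: every index_interval-th term, held in memory.
        self.index_terms = [t for t, _ in self.entries[::index_interval]]

    def lookup(self, term):
        """Return (doc_freq, scan_steps); doc_freq is None if term is absent."""
        # Binary search for the last indexed term <= the target.
        block = bisect_right(self.index_terms, term) - 1
        if block < 0:
            return None, 0
        start = block * self.index_interval
        steps = 0
        # Linear scan over at most index_interval entries.
        for t, doc_freq in self.entries[start:start + self.index_interval]:
            steps += 1
            if t == term:
                return doc_freq, steps
            if t > term:
                break
        return None, steps


if __name__ == "__main__":
    # Made-up terms; two neighbours with very different doc freqs.
    terms = {f"t{i:04d}": 1 for i in range(1000)}
    terms["t0500"] = 31356
    terms["t0501"] = 22
    for interval in (16, 256):
        d = TermDictionary(terms, interval)
        print(interval, len(d.index_terms), d.lookup("t0500"), d.lookup("t0501"))
```

In this model a larger interval shrinks the in-memory index and lengthens the worst-case scan, but the number of scan steps depends only on where the term falls within its block, never on its doc freq.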

I'm only passingly familiar with the org.apache.lucene.search package, so I'm not sure what could account for this. I would normally expect a more common term to take longer, as there are more docs to score. Anybody got an explanation handy?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

