Good stuff, Daniel...
Thanks for taking the time to tabulate the results and present them.
If your results hold, it may have a significant impact on my
application. I'm working on a Perl/XS port, and I think a lot of
people who want to run it won't be running mod_perl, so startup times
are quite important to me. I may end up setting the default
IndexInterval considerably higher than 128 as a result of this
discussion.
The formatting of the results turned up a little screwy in my email
reader, so here's a reformatted version...
Timings for a simple TermQuery on the term "one" (docFreq = 22):
skip time range for query (ms) approx mem usage of JVM (MB)
1 28 ~ 30 49.2
2 28 ~ 30
4 28 ~ 30
8 29 ~ 31
16 29 ~ 32 15.9 (!!)
32 29 ~ 33
64 38 ~ 42
128 59 ~ 61
256 99 ~ 102 14.1
Timings for a simple TermQuery on the term "test" (docFreq = 31,356):
skip time range for query (ms)
1 6.8 ~ 7.6
16 9.7 ~ 10.2
256 69 ~ 72
So, more frequent terms get a larger penalty due to this modification,
but the time was relatively fast to start with. Rarer terms get
less of
a penalty, perhaps because they already take so much longer to find.
This doesn't sound right to me. The time to locate the term via the
TermInfosReader shouldn't have anything to do with the doc_freq,
since that's kept as a single number in .tis and .tii. Within the
term dictionary, all terms are more or less created equal.
I'm only passingly familiar with the org.apache.lucene.search
package, so I'm not sure what could account for this; I would
normally expect a more common term to take longer, as there are more
docs to score. Anybody got a expanation handy?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]