I'm not intimately familiar with FVH myself, but that sounds reasonable.
Tests usually don't lie. I'd definitely like to see a patched version
that avoids that!
Itamar.
On 22/06/2011 05:29, Michael Sokolov wrote:
OK - it seems as if there is a blow-up in FieldPhraseList if a document
has a large number of occurrences of a term that is in the query. In
one example, I searched for "1", and this occurs just under 2000 times
in one of my test documents (as the value of HTML attributes).
Admittedly a weird
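
For anyone who wants to try a similar input, here is a minimal sketch that builds a synthetic document in which "1" appears about 2000 times as an HTML attribute value, then counts its token positions. It uses only the JDK and does not call FVH or FieldPhraseList; the class name, method names, markup, and the crude tokenizer are all assumptions made for illustration, not the actual test document or Lucene's analysis.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatedTermDoc {
    // Builds a hypothetical HTML-like document in which the term "1"
    // appears once per row, as an attribute value.
    static String buildDocument(int occurrences) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < occurrences; i++) {
            sb.append("<td colspan=\"1\">cell</td>\n");
        }
        return sb.toString();
    }

    // Returns the token positions of `term`, using a crude tokenizer that
    // splits on runs of non-alphanumeric characters (a stand-in for a real
    // analyzer, which may tokenize differently).
    static List<Integer> termPositions(String text, String term) {
        List<Integer> positions = new ArrayList<>();
        int pos = 0;
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // skip the empty leading token, if any
            }
            if (token.equals(term)) {
                positions.add(pos);
            }
            pos++;
        }
        return positions;
    }

    public static void main(String[] args) {
        String doc = buildDocument(2000);
        // Number of positions a highlighter would have to consider for "1".
        System.out.println(termPositions(doc, "1").size());
    }
}
```

Indexing a document like this with term vectors (positions and offsets) and then searching for "1" should give a comparable number of term occurrences to the case described above.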
I did that, and the benchmark indicates FVH is 10x faster than
Highlighter now. I ran with a subset of the wikipedia data since I
didn't want to deal with the whole thing. I'm trying to reconcile these
weirdly varying results. One difference is that the benchmark doesn't
use PhraseQueries -
Koji, I'm not familiar with the benchmarking system, but maybe I'll see
if I can run that benchmark on my test data as a point of comparison.
Thanks for the pointer!
-Mike
On 6/20/2011 8:21 PM, Koji Sekiguchi wrote:
Mike,
FVH used to be faster for large docs. I wrote the FVH section for Lucene in Action,
and it said:
In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the difference
between two highlighters in processing time. As of
Our apps use highlighting, and I expect that highlighting is an
expensive operation since it requires processing the text of the
documents, but I ran a test and was surprised just how expensive it is.
I made a test index with three fields: path, modified, and contents. I
made the index using
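
A rough way to measure this kind of per-document highlighting cost is a small timing harness like the sketch below. It uses only the JDK; the `Function<String, String>` argument is a stand-in for whatever highlighter call is being measured, and `naiveHighlight` is a hypothetical placeholder, not Lucene's Highlighter or FVH. The names and the warm-up count are assumptions for illustration.

```java
import java.util.function.Function;

public class HighlightTiming {
    // Returns the average wall-clock milliseconds per call over `runs`
    // invocations, after a few warm-up calls so the JIT can settle.
    static double averageMillis(Function<String, String> highlighter,
                                String text, int runs) {
        for (int i = 0; i < 5; i++) {
            highlighter.apply(text); // warm-up, result discarded
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            highlighter.apply(text);
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / runs;
    }

    // Hypothetical placeholder: wraps every literal occurrence of `term`
    // in <b> tags. A real test would call the highlighter under study.
    static String naiveHighlight(String text, String term) {
        return text.replace(term, "<b>" + term + "</b>");
    }

    public static void main(String[] args) {
        String text = "some contents with the term repeated: term term";
        double ms = averageMillis(t -> naiveHighlight(t, "term"), text, 1000);
        System.out.println("avg ms per call: " + ms);
    }
}
```

Passing each highlighter as a function keeps the measurement loop identical for both, so any difference in the averages comes from the highlighters and not from the harness.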