On May 4, 2008, at 7:28 PM, DanaWhite wrote:
> I arrived at this MAP by modifying IndexFiles to use a StopAnalyzer and
> to work in a way that was acceptable for TREC files. SearchFiles was
> modified to use a StopAnalyzer and to output data in a format suitable
> for trec_eval. trec_eval reports a MAP of about 11% at this setting.
>
> I am not competing in TREC; I am just doing an evaluation of different
> search engines. At this point I am not going to add anything to Lucene
> to get a higher MAP, because I am trying to get a feel for its "out of
> the box" performance.
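For reference, the MAP figure above is the mean over queries of average precision, and trec_eval reads a run as whitespace-separated lines of query id, the literal Q0, document id, rank, score and run tag. Below is a minimal plain-Java sketch of both, written for illustration only: it is not taken from trec_eval or from Lucene's SearchFiles demo, and the class and method names are hypothetical. It also averages over every judged query, whereas how trec_eval treats judged queries missing from a run depends on its options.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Lucene or trec_eval.
public class TrecMetrics {

    // One line of a trec_eval run file: query id, literal "Q0", doc id, rank, score, run tag.
    // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.
    static String runLine(String queryId, String docNo, int rank, float score, String runTag) {
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s", queryId, docNo, rank, score, runTag);
    }

    // Average precision for one query: sum the precision at the rank of each relevant
    // document retrieved, then divide by the total number of relevant documents, so
    // relevant documents that were never retrieved contribute zero.
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    // MAP: mean of per-query average precision over every query that has judgments.
    static double meanAveragePrecision(Map<String, List<String>> runs,
                                       Map<String, Set<String>> qrels) {
        if (qrels.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : qrels.entrySet()) {
            List<String> ranked = runs.getOrDefault(e.getKey(), List.of());
            total += averagePrecision(ranked, e.getValue());
        }
        return total / qrels.size();
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d9");
        // Relevant hits at ranks 1 and 3 give precisions 1/1 and 2/3; d9 is never retrieved.
        System.out.println(averagePrecision(ranked, relevant)); // (1 + 2/3) / 3, about 0.556
        System.out.println(runLine("301", "d1", 1, 12.5f, "lucene-stop"));
        // prints "301 Q0 d1 1 12.5000 lucene-stop"
    }
}
```

Dividing by the total number of relevant documents, rather than by the number retrieved, is what makes MAP penalize relevant documents a run never returns.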
It's kind of tough to say what an "out of the box" experience is in
Lucene, so I frankly wouldn't read too much into any numbers you arrive
at on TREC. For instance, it is curious that you chose the
StopAnalyzer over the more "out of the box" StandardAnalyzer. If
anything were out of the box, I guess it would be, given the name, the
StandardAnalyzer, but that isn't to say it will do any better; I
haven't tried it. Most studies have also shown that stemming is
beneficial, but neither of those analyzers offers stemming. Remember,
Lucene really is just the canvas, the paint and the brushes; it's up to
you to do the actual painting.
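To see why stemming can matter for recall, here is a plain-Java sketch that deliberately avoids Lucene's classes. It lowercases, splits on non-letters, drops words from a small hypothetical stop list (not Lucene's actual list), and applies a crude suffix-stripping rule standing in for a real stemmer such as Porter's. In Lucene itself this would mean chaining a stemming filter after the stop filter in a custom analyzer. The class name and the suffix rules are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer for illustration; not a Lucene Analyzer.
public class ToyAnalyzer {

    // Tiny illustrative stop list; Lucene's StopAnalyzer uses its own list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "in", "to", "is");

    // Crude suffix stripping standing in for a real stemmer such as Porter's.
    // Only strips when at least three characters would remain.
    static String crudeStem(String term) {
        String[] suffixes = {"ing", "es", "ed", "s"};
        for (String suffix : suffixes) {
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    // Lowercase, split on runs of non-letters, drop stop words, optionally stem.
    static List<String> analyze(String text, boolean stem) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stem ? crudeStem(token) : token);
        }
        return terms;
    }

    public static void main(String[] args) {
        String doc = "Searching the engines";
        String query = "searches";
        System.out.println(analyze(doc, false));   // [searching, engines]
        System.out.println(analyze(query, false)); // [searches]: no term in common with doc
        System.out.println(analyze(doc, true));    // [search, engin]
        System.out.println(analyze(query, true));  // [search]: now matches doc
    }
}
```

Without stemming, the query term "searches" never matches "searching" in the document; with even this crude stemmer both reduce to "search". A real stemmer handles far more cases, which is part of why stemmed runs tend to score better on TREC-style collections.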
Just my advice: make sure you are comparing apples to apples, or at
least getting as close as you reasonably can. I think you will find that
Lucene stacks up quite well.
Cheers,
Grant