On Tue, 28 Apr 2009, Max Lynch wrote:
> I am trying to get a list of all terms that matched a document. So, if I
> search for "John Smith", I want to know if I found "John Smith" specifically
> in the document. I can use the lucene results but I need to do more
> processing based on exactly what was found. I am using a highlighter and
> formatter for this, but if I use the QueryScorer it breaks up the phrase
> into "John" and "Smith", but only if the whole name was found. I have
> uncovered that maybe the SpanScorer would preserve the whole phrase, but
> when I try to use it I get NotImplementedError. Has it not been interfaced
> yet? Is it a difficult thing to do?

If you are trying to use the highlighter package's SpanScorer class, there
may be a problem with it clashing (by name) with the
org.apache.lucene.search.spans.SpanScorer class:
>>> import lucene
>>> lucene.initVM(lucene.CLASSPATH)
>>> lucene.SpanScorer.class_
<Class: class org.apache.lucene.search.spans.SpanScorer>
But without a specific example of what you're trying to do, it's mostly
just guesswork here.
If I guessed this right, enhancing JCC so that specific classes involved in
a name clash can be renamed in Python (because java packages are flattened
out in Python, yet not in the underlying generated C++) shouldn't be too
hard.
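
To make the suspected clash concrete, here is a rough pure-Python sketch of what happens when two Java packages are flattened into one namespace, and how a per-class rename would avoid it. This is only an illustration, not how JCC works internally: the flatten() helper and the HighlighterSpanScorer name are made up for the example, the highlighter class's package name is my assumption, and which class wins in the sketch (the first one seen) is arbitrary.

```python
# Hypothetical illustration only: this does not reflect JCC's implementation.

def flatten(qualified_names, renames=None):
    """Map Java classes into one flat namespace keyed by simple class name.

    renames is an optional dict of fully qualified Java name -> Python name,
    used to give one side of a clash a distinct name.
    Returns (namespace, clashes).
    """
    renames = renames or {}
    namespace = {}
    clashes = []
    for qualified in qualified_names:
        # Default Python name is the part after the last dot.
        name = renames.get(qualified, qualified.rsplit(".", 1)[-1])
        if name in namespace:
            # Two classes want the same flat name; the first one seen wins here.
            clashes.append((name, namespace[name], qualified))
            continue
        namespace[name] = qualified
    return namespace, clashes


classes = [
    "org.apache.lucene.search.spans.SpanScorer",
    # Assumed package of the highlighter's SpanScorer.
    "org.apache.lucene.search.highlight.SpanScorer",
]

# Without renaming, only one SpanScorer is reachable by its simple name.
flat, clashes = flatten(classes)

# With a (hypothetical) rename, both classes are reachable.
renamed, no_clashes = flatten(
    classes,
    {"org.apache.lucene.search.highlight.SpanScorer": "HighlighterSpanScorer"},
)
```

In the sketch, flat["SpanScorer"] refers to the spans class and the highlighter class is reported in clashes, which matches what the session above shows for lucene.SpanScorer. With the rename in place, both classes get their own name.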
Could you please include a piece of code that reproduces the problem?
Thanks!
Andi..