The following addresses the reuse of the Spans object and fixes a bug in
checking for required clauses (boolean test was reversed). Again, the
only testing I have done involves one doc for "real" hit highlighting so
your mileage may vary. Attempt number two:
import org.apache.lucene.index.Ind
That may be the difference then. I'm actually working with both a complete
index and a memory index, depending on what phase I'm in. It turns out that
I probably can't put the document in a memoryindex on the fly
because...well...because ... That said, though, I can pretty easily use
this as a bas
1> my test cases throw some exceptions with the code as-is. The spans.get(0)
is a problem in that it's not guaranteed that the spans returned will have
anything in them. Also, I don't think that the test for reqSpans.get(0).next
in queryClauses[i].isRequired is correct (even if it doesn't th
It must be time to eat lunch, since the more I stare at this code, the less
sense it makes to me. Which is a sure sign that I need a break.
But a couple of things.
1> my test cases throw some exceptions with the code as-is. The spans.get(0)
is a problem in that it's not guaranteed that the
Excellent! I'll give it a whirl in the morning. This may keep me from having
to rebuild my index as well, oh joy!
Thanks
Erick
On 2/15/07, Mark Miller <[EMAIL PROTECTED]> wrote:
Here is my initial attempt...I believe it might be sufficient:
import org.apache.lucene.index.IndexReader;
import o
Here is my initial attempt...I believe it might be sufficient:
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apac
Mark:
Thanks, that reassures me that I'm not hallucinating. If it gets on my
priority list I can certainly share the code, since I stole it in the first
place. I have a semi-solution for now that gets me out from under the
immediate problem, but it really wants a more robust solution than the on
Good catch Erick! I'll have to tackle this as well. Mark H is the
originator of that code so maybe he will chime in, but what I am thinking
is this:
In the getSpansFromBooleanquery, keep track of which clauses are
required. Then, based on whether any Spans are actually returned from
getSpansFromTerm f
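The idea described above might be sketched roughly as follows. This is a stand-alone illustration only, not the SpansExtractor code from the thread: RequiredClauseSketch, SpanRange, Clause and collectSpans are hypothetical names, and spans are modeled as plain token positions rather than Lucene Spans objects. It assumes that when any required clause yields no spans (a query like "a AND z" against text that has no "z"), nothing should be highlighted at all.

```java
import java.util.ArrayList;
import java.util.List;

class RequiredClauseSketch {

    /** Hypothetical stand-in for one matched span: token positions [start, end). */
    static final class SpanRange {
        final int start;
        final int end;

        SpanRange(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for a boolean clause plus the spans found for it. */
    static final class Clause {
        final boolean required;
        final List<SpanRange> spans;

        Clause(boolean required, List<SpanRange> spans) {
            this.required = required;
            this.spans = spans;
        }
    }

    /**
     * Collects the spans of all clauses. If a required clause produced no
     * spans, the document cannot match, so an empty list is returned.
     */
    static List<SpanRange> collectSpans(List<Clause> clauses) {
        List<SpanRange> result = new ArrayList<>();
        for (Clause clause : clauses) {
            // Check for emptiness before touching any element of the list.
            if (clause.spans.isEmpty()) {
                if (clause.required) {
                    return new ArrayList<>();
                }
                continue; // an optional clause with no hits is simply skipped
            }
            result.addAll(clause.spans);
        }
        return result;
    }
}
```

Checking isEmpty() first also sidesteps the exception from calling get(0) on a list that may hold nothing.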
I hope you're all following this old thread, because I've just run into
something I don't quite know what to do about with the SpansExtractor code
that I shamelessly stole.
Let's say my text is "a b c d e f g h" and my query is "a AND z". The
implementation I stole for SpansExtractor (mentioned s
mark harwood wrote:
Hi Mark,
Have you looked at the returned spans from any other potential problem scenarios (other than the 3
word one you suggest) e.g. complex nested "SpanOr" or "SpanNot" logic?
Nothing super intense, but I have looked at some semi-complex nesting and
it all looks great
g API so would probably have to create a new SpansBasedHighlighter.
Cheers,
Mark
----- Original Message -----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 2 February, 2007 3:58:01 PM
Subject: Re: Multiword Highlighting
I have been away from this for a week,
I have been away from this for a week, but my interest has started
building again. The whole spans implementation seems to work great for
finding the actual hits but there is a somewhat annoying limitation:
because I am using Spans it seems I can only either highlight the entire
found span or j
I do use the NullFragmenter now. I have no interest in the fragments at
the moment, just in showing hits on the source document. It would be
great if I could just show the real hits though. The span approach seems
to work fine for me. I have even tested the highlighting using my
sentence and pa
>>For what it's worth Mark (Miller), there *is* a need for "just
highlight the query terms without trying to get excerpts" functionality
>>- something a la Google cache (different colours...mmm, nice).
FWIW, the existing highlighter doesn't *have* to fragment - just pass a
NullFragmenter to the
nality, too. Please contrib to
contrib/ if you end up working on this.
Otis
--
Simpy -- http://www.simpy.com/ -- Tag. Search. Share.
----- Original Message -----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, January 28, 2007 7:39:29 AM
Subject:
Maybe a new highlighter with no attempt at summarising could more
easily address phrase support for small pieces of content. It will
always be hard to faithfully represent all possible query match logic
- especially if there are NOTs, ANDs and ORs mixed in with all the
term proximity logic
markharw00d wrote:
>>Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? I
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in SERP
>>Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? I
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in SERPS. I would expect that
Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? Isn't it you that suggested
turning the query into a SpanQuery, extracting the spans and then doing
the highlighting after a rewrite? This seems somewhat trivial so what am
I missing? I
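As a rough illustration of the fragment-free approach raised here, the sketch below marks up whole-document text from spans that have already been extracted. It is stand-alone and makes several assumptions: the spans arrive as token position pairs with an inclusive start and exclusive end, the text is tokenized on single spaces, and the SpanHighlightSketch class, its highlight method and the <b> markers are hypothetical choices for the example rather than anything from the Lucene highlighter.

```java
class SpanHighlightSketch {

    /**
     * Wraps every token covered by a span in <b>...</b> and returns the
     * full text; no fragmenting or summarising is attempted.
     *
     * @param text  source text, tokens separated by single spaces (assumption)
     * @param spans pairs of {start, end} token positions, end exclusive
     */
    static String highlight(String text, int[][] spans) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];

        // Mark each token position that falls inside any span,
        // clamping the span to the bounds of the token array.
        for (int[] span : spans) {
            int from = Math.max(0, span[0]);
            int to = Math.min(tokens.length, span[1]);
            for (int i = from; i < to; i++) {
                hit[i] = true;
            }
        }

        // Rebuild the text, wrapping only the marked tokens.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (hit[i]) {
                out.append("<b>").append(tokens[i]).append("</b>");
            } else {
                out.append(tokens[i]);
            }
        }
        return out.toString();
    }
}
```

Highlighting each covered token separately, rather than the span as one block, keeps overlapping spans from producing nested or doubled markers.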
This is a deficiency in the highlighter functionality that has been
discussed several times before. The summary is - not a trivial fix.
See here for background:
http://marc2.theaimsgroup.com/?l=lucene-user&m=114631181214303&w=1
http://www.gossamer-threads.com/lists/engine?do=post_view_printa