If I were feeling adventurous, and I wanted to help out Mark with
LUCENE-1001, I'd try this:
1. Get the trunk and apply LUCENE-1001.
2. Index all of your docs with the highlight coords as payloads.
3. At highlight time, do something like the SpanHighlighter does - I've got
a class called something lik
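Storing highlight coords as payloads means packing them into the opaque byte[] that a payload carries, then unpacking them again at highlight time for each matching position. The sketch below shows one way to do that. The field set (page, x, y, width, height), the fixed 20-byte big-endian layout and the class name HighlightCoords are all hypothetical choices for illustration. They are not part of Lucene or of the LUCENE-1001 patch.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical codec for a word's highlight box. Lucene never interprets
 * payload bytes, so the layout (five big-endian ints) is an assumption.
 */
final class HighlightCoords {
    // five 4-byte ints -> 20 bytes per term occurrence
    static final int SIZE = 5 * Integer.BYTES;

    final int page, x, y, width, height;

    HighlightCoords(int page, int x, int y, int width, int height) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Serialize to the byte[] you would attach as this token's payload. */
    byte[] toPayload() {
        return ByteBuffer.allocate(SIZE)
                .putInt(page).putInt(x).putInt(y).putInt(width).putInt(height)
                .array();
    }

    /** Inverse of toPayload(); used when reading the payload back at highlight time. */
    static HighlightCoords fromPayload(byte[] payload) {
        if (payload.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + payload.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        // Java evaluates arguments left to right, so fields are read in write order
        return new HighlightCoords(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
    }

    @Override
    public String toString() {
        return "page=" + page + " x=" + x + " y=" + y + " w=" + width + " h=" + height;
    }

    public static void main(String[] args) {
        HighlightCoords c = new HighlightCoords(3, 120, 455, 64, 12);
        byte[] payload = c.toPayload();
        HighlightCoords back = HighlightCoords.fromPayload(payload);
        System.out.println(payload.length + " bytes -> " + back);
        // prints "20 bytes -> page=3 x=120 y=455 w=64 h=12"
    }
}
```

A fixed-width layout keeps decoding trivial. The cost is 20 bytes per term occurrence, so a variable-length encoding might be worth considering for very large indexes.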
> Following the history of Payloads from its beginnings
> (https://issues.apache.org/jira/browse/LUCENE-755,
> https://issues.apache.org/jira/browse/LUCENE-761,
> https://issues.apache.org/jira/browse/LUCENE-834,
> http://wiki.apache.org/lucene-java/Payload_Planning) it looks like
> TermP
Yeah, unfortunately, they are two distinct things, completely
unrelated in terms of storage, access, etc. I've often felt the
access to position/offset information is hard in Lucene, which makes
it harder to do things like highlighting, co-occurrence analysis, etc.
LUCENE-1001 (which Mark
Hi,
Following the history of Payloads from its beginnings
(https://issues.apache.org/jira/browse/LUCENE-755,
https://issues.apache.org/jira/browse/LUCENE-761,
https://issues.apache.org/jira/browse/LUCENE-834,
http://wiki.apache.org/lucene-java/Payload_Planning) it looks like
TermPostionsV
Dear Lucene Users and Tricia Williams,
The way we're operating our Lucene index is one where we index all the
terms but do not store the text. From your SOLR-380 patch example, Tricia, I
was able to get a very good idea of how to set things up. Historically I
have used TermPositionsVector instead of Te
Thank you very much. I'm using Solr, so it's very relevant to me. Even
though the indexing is done by a smaller RMI method (since Solr
doesn't support streaming of very large files and has term limits),
all the searching is done through Solr.
Thanks again,
Best Regards, Martin Owens
Hi Martin,
Take a look at what I've done with SOLR-380
(https://issues.apache.org/jira/browse/SOLR-380). It might solve your
problem, or at least give you a good starting point.
Tricia
Michael McCandless wrote:
I think you could use payloads (= arbitrary/opaque byte[]) for this?
You ca
I think you could use payloads (= arbitrary/opaque byte[]) for this?
You can attach a payload to each term occurrence during tokenization
(indexing), and then retrieve the payload during searching.
Mike
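The idea of attaching a payload to each term occurrence at indexing time and reading it back at search time can be modelled in a few lines. The sketch below is a toy in-memory model, not the Lucene API. In the Lucene 2.x releases of this era, the real hooks were (to my knowledge) Token.setPayload(...) in a TokenFilter and TermPositions.getPayload(...) while iterating positions. Here a whitespace split stands in for tokenization, and the class name PayloadPostingsSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

/**
 * Toy model (NOT Lucene): each term occurrence is stored with its docId,
 * token position and an opaque byte[] payload, then handed back at query time.
 */
final class PayloadPostingsSketch {

    /** One occurrence of a term plus whatever bytes were attached to it. */
    static final class Posting {
        final int docId;
        final int position;
        final byte[] payload;

        Posting(int docId, int position, byte[] payload) {
            this.docId = docId;
            this.position = position;
            this.payload = payload;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** "Indexing": whitespace tokenization, with one caller-supplied payload per token. */
    void addDocument(int docId, String text, List<byte[]> payloads) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        if (tokens.length != payloads.size()) {
            throw new IllegalArgumentException("need exactly one payload per token");
        }
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                    .add(new Posting(docId, pos, payloads.get(pos)));
        }
    }

    /** "Searching": every occurrence of the term, each with its payload. */
    List<Posting> occurrences(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        PayloadPostingsSketch index = new PayloadPostingsSketch();
        // Hypothetical payloads: a page label per token, stored as UTF-8 bytes
        List<byte[]> payloads = new ArrayList<>();
        for (String page : new String[] {"p1", "p1", "p2"}) {
            payloads.add(page.getBytes(StandardCharsets.UTF_8));
        }
        index.addDocument(7, "lucene payloads lucene", payloads);
        for (Posting p : index.occurrences("lucene")) {
            System.out.println("doc=" + p.docId + " pos=" + p.position
                    + " payload=" + new String(p.payload, StandardCharsets.UTF_8));
        }
    }
}
```

Because the payload is stored per occurrence rather than per document, the same term can carry different bytes at each position. That per-position data is what makes payloads usable for things like coordinate-based highlighting.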
Martin Owens wrote:
Hello Users,
I'm working on a project which attempts to store dat