Re: Term Based Meta Data

2008-08-11 Thread Mark Miller
If I were feeling adventurous, and I wanted to help out Mark with LUCENE-1001, I'd try this: Get the trunk and apply LUCENE-1001. Index all of your docs with the highlight coords as payloads. At highlight time, do something like the SpanHighlighter does; I've got a class called something lik…
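Indexing "highlight coords as payloads" comes down to packing each occurrence's coordinates into the opaque byte[] a payload carries, then unpacking them at highlight time. The sketch below is a minimal, library-free illustration of that round trip. The class name `HighlightCoordsPayload` and the four-int layout (x, y, width, height) are hypothetical choices made only for this example, not anything from LUCENE-1001.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical helper: packs one term occurrence's highlight box into the
 * opaque byte[] a payload carries, and unpacks it again at highlight time.
 * The (x, y, width, height) int layout is an illustrative assumption only.
 */
public final class HighlightCoordsPayload {
    /** Four ints of four bytes each. */
    public static final int SIZE = 4 * Integer.BYTES;

    private HighlightCoordsPayload() {}

    /** Encodes the box as 16 big-endian bytes. */
    public static byte[] encode(int x, int y, int width, int height) {
        return ByteBuffer.allocate(SIZE)
                .putInt(x).putInt(y).putInt(width).putInt(height)
                .array();
    }

    /**
     * Decodes bytes produced by encode(). The offset/length pair allows the
     * payload to sit inside a larger, reused buffer.
     */
    public static int[] decode(byte[] data, int offset, int length) {
        if (length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data, offset, length);
        return new int[] {buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt()};
    }

    public static void main(String[] args) {
        byte[] payload = encode(120, 340, 56, 14);
        int[] box = decode(payload, 0, payload.length);
        System.out.println(Arrays.toString(box)); // prints [120, 340, 56, 14]
    }
}
```

In the Lucene 2.x API these bytes would typically be wrapped in an `org.apache.lucene.index.Payload` and set on each `Token` during analysis, then read back with `TermPositions.getPayload(...)` while walking positions; that wiring is not shown here and should be checked against the javadocs for the version in use. A fixed-size layout keeps decoding trivial, at the cost of a few bytes per occurrence.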

Re: Term Based Meta Data

2008-08-11 Thread Martin Owens
> Following the history of Payloads from its beginnings (https://issues.apache.org/jira/browse/LUCENE-755, https://issues.apache.org/jira/browse/LUCENE-761, https://issues.apache.org/jira/browse/LUCENE-834, http://wiki.apache.org/lucene-java/Payload_Planning) it looks like TermP…

Re: Term Based Meta Data

2008-08-09 Thread Grant Ingersoll
Yeah, unfortunately, they are two distinct things, completely unrelated in terms of storage, access, etc. I've often felt the access to position/offset information is hard in Lucene, which makes it harder to do things like highlighting, co-occurrence analysis, etc. LUCENE-1001 (which Mark…

Re: Term Based Meta Data

2008-08-08 Thread Tricia Williams
Hi, Following the history of Payloads from its beginnings (https://issues.apache.org/jira/browse/LUCENE-755, https://issues.apache.org/jira/browse/LUCENE-761, https://issues.apache.org/jira/browse/LUCENE-834, http://wiki.apache.org/lucene-java/Payload_Planning) it looks like TermPositionsV…

Re: Term Based Meta Data

2008-08-08 Thread Martin Owens
Dear Lucene Users and Tricia Williams, The way we operate our Lucene index, we index all the terms but do not store the text. From your SOLR-380 patch example, Tricia, I was able to get a very good idea of how to set things up. Historically I have used TermPositionsVector instead of Te…

Re: Term Based Meta Data

2008-08-05 Thread Martin Owens
Thank you very much; I'm using Solr, so it's very relevant to me. Although the indexing is done by a smaller RMI method (since Solr doesn't support streaming of very large files and has term limits), all the searching is done through Solr. Thanks again, Best Regards, Martin Owens On T…

Re: Term Based Meta Data

2008-08-05 Thread Tricia Williams
Hi Martin, Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). It might solve your problem, or at least give you a good starting point. Tricia
Michael McCandless wrote: I think you could use payloads (= arbitrary/opaque byte[]) for this? You ca…

Re: Term Based Meta Data

2008-08-05 Thread Michael McCandless
I think you could use payloads (= arbitrary/opaque byte[]) for this? You can attach a payload to each term occurrence during tokenization (indexing), and then retrieve the payload during searching. Mike
Martin Owens wrote: Hello Users, I'm working on a project which attempts to store dat…
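Michael's suggestion above (attach a payload to each term occurrence while tokenizing, read it back while searching) can be illustrated with a toy, in-memory inverted index. This is a minimal sketch of the idea only, not Lucene's API: the class `ToyPayloadIndex`, its whitespace tokenizer, the single-document scope, and the caller-supplied `payloadFor` function are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Toy illustration (hypothetical, not Lucene): every token occurrence gets its
 * own opaque byte[] at "index" time, stored next to its position, and the bytes
 * come back when the term is looked up at "search" time. Models one document.
 */
public final class ToyPayloadIndex {
    /** One term occurrence: its position in the token stream plus its payload. */
    public static final class Posting {
        public final int position;
        public final byte[] payload;

        Posting(int position, byte[] payload) {
            this.position = position;
            this.payload = payload;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Whitespace-tokenizes text; payloadFor(term, position) supplies each occurrence's bytes. */
    public void index(String text, BiFunction<String, Integer, byte[]> payloadFor) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String term = tokens[pos];
            if (term.isEmpty()) {
                continue; // only happens for blank input
            }
            postings.computeIfAbsent(term, k -> new ArrayList<>())
                    .add(new Posting(pos, payloadFor.apply(term, pos)));
        }
    }

    /** Returns every occurrence of the term, with its payload, in position order. */
    public List<Posting> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyPayloadIndex idx = new ToyPayloadIndex();
        // Hypothetical metadata: label each occurrence with its own position.
        idx.index("the cat saw the dog",
                (term, pos) -> ("pos-" + pos).getBytes(StandardCharsets.UTF_8));
        for (Posting p : idx.lookup("the")) {
            System.out.println(p.position + " -> " + new String(p.payload, StandardCharsets.UTF_8));
        }
        // prints "0 -> pos-0" then "3 -> pos-3"
    }
}
```

The point of the design is that the metadata lives per occurrence rather than per document, which is what distinguishes payloads from stored fields; in real Lucene the payload is written into the postings alongside positions rather than kept in a map.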