On 10/3/13 6:04 PM, Alice Wong wrote:
Mike,
That's an interesting idea. The only drawback is we have to re-parse
the doc and find where it matches and what the associated values are.
It could be a performance issue if the doc becomes bigger and more
complex.
It's true there is some overhead for document-oriented processing. Lux
ameliorates this by storing a predigested binary XML form that can be
traversed efficiently without the need for XML parsing. However,
I am wondering if there is a way to index a value a1 for a field A and
store a different value "1,2" associated with a1 in Lucene. Or there
might be a hack for this?
If you want to use only low-level Lucene constructs, I think payloads
and/or complicated field values are the way to go. You could, for
example, index for document D, a field called "extra" with values like
"a1:1,2", "a2:2,3". I think that's what Aditya suggested. You still
have to parse these though, so why not use a prebuilt flexible parsing
infrastructure?
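
To make the "extra" field idea concrete, here is a minimal sketch of the
parsing step in plain Python, with no Lucene calls. It assumes the stored
values for the matched document have already been retrieved as strings of
the form "term:v1,v2"; the function name parse_extra and the variable
names are hypothetical, chosen only for illustration.

```python
def parse_extra(values):
    """Map each indexed term to its associated values.

    `values` is assumed to be the list of strings stored in the
    hypothetical "extra" field, each shaped like "a1:1,2".
    """
    associated = {}
    for entry in values:
        # Split on the first ":" only, so the payload may itself contain ":".
        term, _, payload = entry.partition(":")
        associated[term] = payload.split(",") if payload else []
    return associated


# Values as in the example above for document D.
extra = ["a1:1,2", "a2:2,3"]
lookup = parse_extra(extra)
print(lookup["a1"])  # ['1', '2']
```

After a query on A=a1 returns D, the application would look up "a1" in
the parsed map to get the values to return.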
Thanks.
On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
On 10/02/2013 07:12 PM, Alice Wong wrote:
Hello,
We would like to index some documents. Each field of a document may
have multiple values. And for each (field,value) pair there are some
associated values. These associated values are just for retrieving,
not searching.
For example, a document D could have a field named A. This field has
two values a1 and a2.
It is easy to index D, adding terms a1 and a2 to field A, so either
query "A=a1" or "A=a2" will return D.
Assume we have other values associated with (A,a1) and (A,a2) for D.
We would like to retrieve these associated values depending on whether
"A=a1" or "A=a2" is queried.
For example, if query "A=a1" returns D, we would like to return values
1 and 2. And if query "A=a2" returns D, we want to return values 3 and
10.
Is it possible to do this with Lucene? Initially we wanted to hack
postings to return associated values, but this seems quite complex.
Thanks!
Why not store a (nonindexed) text field with some internal
structure (XML, JSON, CSV) that you can analyze after retrieving?
For example:
<D>
  <A>
    <value>a1</value>
    <associated-values>
      ... whatever you want ...
    </associated-values>
  </A>
</D>
If you use Lux (http://luxdb.org), which is XML query
support on top of Lucene, you can do this all automatically, and
retrieve the results with a simple query like:
/D/A[value="a1"]/associated-values
Plus, if you want to pull out the values and manipulate them, you
have XQuery to do it with.
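
For the plain stored-field variant (without Lux), the post-retrieval
analysis could look roughly like the sketch below. It uses only Python's
standard xml.etree.ElementTree and does not touch Lucene or Lux. The <v>
child elements, the STORED sample and the function name
associated_values are assumptions made for illustration; the original
example leaves the contents of <associated-values> open.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored field content for document D. The <v> elements are
# an assumed encoding of the associated values, not part of the original.
STORED = (
    "<D>"
    "<A><value>a1</value>"
    "<associated-values><v>1</v><v>2</v></associated-values></A>"
    "<A><value>a2</value>"
    "<associated-values><v>3</v><v>10</v></associated-values></A>"
    "</D>"
)


def associated_values(stored_xml, term):
    """Return the associated values for the <A> whose <value> equals term."""
    root = ET.fromstring(stored_xml)  # root is the <D> element
    # ElementTree's limited XPath supports the [child='text'] predicate.
    # Note: term is interpolated directly, so it must not contain quotes.
    node = root.find(f"./A[value='{term}']/associated-values")
    if node is None:
        return []
    return [v.text for v in node.findall("v")]


print(associated_values(STORED, "a1"))  # ['1', '2']
print(associated_values(STORED, "a2"))  # ['3', '10']
```

The application runs the Lucene query as usual, loads the stored field of
each hit, and calls something like associated_values with the term that
was queried.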
-Mike