On 10/3/13 6:04 PM, Alice Wong wrote:
Mike,

That's an interesting idea. The only drawback is that we have to re-parse the doc to find where it matches and what the associated values are. This could become a performance issue as docs get bigger and more complex.
It's true there is some overhead for document-oriented processing. Lux ameliorates this by storing a predigested binary XML form that can be traversed efficiently without the need for XML parsing.

I am wondering if there is a way in Lucene to index a value a1 for a field A and store a different value "1,2" associated with a1. Or is there a hack for this?
If you want to use only low-level Lucene constructs, I think payloads and/or complicated field values are the way to go. For example, for document D you could index a field called "extra" with values like "a1:1,2" and "a2:2,3". I think that's what Aditya suggested. You still have to parse these, though, so why not use a prebuilt, flexible parsing infrastructure?
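To illustrate the delimited-value idea, here is a minimal Python sketch of the parsing step only. It assumes the "extra" values for a hit have already been fetched (the field would need to be stored for that) and represents them as a plain list of strings. `lookup_extra` is a hypothetical helper, and the "term:v1,v2" format assumes terms never contain a colon.

```python
def lookup_extra(stored_values, term):
    """Return the associated values for `term` from "term:v1,v2" strings.

    `stored_values` stands in for the stored "extra" field values of one
    hit; fetching them from Lucene is not shown (assumption).
    """
    for entry in stored_values:
        # Split at the first colon only: "a1:1,2" -> ("a1", ":", "1,2").
        key, sep, rest = entry.partition(":")
        if sep and key == term:
            # Comma-separated associated values; drop empty pieces.
            return [v for v in rest.split(",") if v]
    return []  # the queried term has no associated values on this hit


extra = ["a1:1,2", "a2:2,3"]
print(lookup_extra(extra, "a1"))  # ['1', '2']
print(lookup_extra(extra, "a2"))  # ['2', '3']
```

The caller would pass in whichever term the query matched, so the same stored field serves both "A=a1" and "A=a2".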

Thanks.


On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:

    On 10/02/2013 07:12 PM, Alice Wong wrote:

        Hello,

        We would like to index some documents. Each field of a document
        may have multiple values, and for each (field, value) pair there
        are some associated values. These associated values are only for
        retrieval, not for searching.

        For example, a document D could have a field named A with two
        values, a1 and a2.

        It is easy to index D by adding the terms a1 and a2 to field A,
        so that either query "A=a1" or "A=a2" will return D.

        Assume we also have other values associated with (A,a1) and
        (A,a2) for D. We would like to retrieve these associated values
        depending on whether "A=a1" or "A=a2" is queried.

        For example, if query "A=a1" returns D, we would like to return
        values 1 and 2. And if query "A=a2" returns D, we want to return
        values 3 and 10.

        Is it possible to do this with Lucene? Initially we wanted to
        hack the postings to return associated values, but this seems
        quite complex.

        Thanks!

    Why not store a (non-indexed) text field with some internal
    structure (XML, JSON, CSV) that you can parse after retrieving it?
    For example:

    <D>
      <A>
         <value>a1</value>
         <associated-values>
           ... whatever you want ...
         </associated-values>
      </A>
    </D>

    If you use Lux (luxdb.org), which adds XML query support on top of
    Lucene, you can do all of this automatically and retrieve the
    results with a simple query like:

    /D/A[value='a1']/associated-values

    Plus, if you want to pull out the values and manipulate them, you
    have XQuery to do it with.

    -Mike
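For the stored-XML approach quoted above, the "analyze after retrieving" step can be sketched with Python's standard xml.etree.ElementTree. This is not Lux, and ElementTree supports only a small XPath subset. It is a minimal sketch, assuming the stored field holds the <D>/<A> layout shown, with each associated value wrapped in a hypothetical <v> element; the value passed in must not contain a single quote, because it is spliced into the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of the stored (non-indexed) field for document D.
# The <v> wrapper for each associated value is an assumption.
STORED_XML = """
<D>
  <A>
    <value>a1</value>
    <associated-values><v>1</v><v>2</v></associated-values>
  </A>
  <A>
    <value>a2</value>
    <associated-values><v>3</v><v>10</v></associated-values>
  </A>
</D>
"""


def associated_values(xml_text, value):
    """Return the associated values of each <A> whose <value> equals `value`."""
    root = ET.fromstring(xml_text.strip())  # root is the <D> element
    result = []
    # [tag='text'] keeps <A> elements having a <value> child with that text.
    for assoc in root.findall(f"A[value='{value}']/associated-values"):
        result.extend(v.text for v in assoc.findall("v"))
    return result


print(associated_values(STORED_XML, "a1"))  # ['1', '2']
print(associated_values(STORED_XML, "a2"))  # ['3', '10']
```

Each call re-parses the stored text, which is the overhead Alice raised; Lux's predigested binary form is meant to avoid that step.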


