Okay, it makes complete sense. Thanks.
On Fri, Oct 4, 2013 at 5:15 AM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:

> On 10/3/13 6:04 PM, Alice Wong wrote:
>
> Mike,
>
> That's an interesting idea. The only drawback is we have to re-parse the
> doc and find where it matches and what the associated values are. It could
> be a performance issue if the doc becomes bigger and more complex.
>
> It's true there is some overhead for document-oriented processing. Lux
> ameliorates this by storing a predigested binary xml form that can be
> traversed efficiently without the need for xml parsing. However,
>
>
> I am wondering if there is a way to index a value a1 for a field A and
> store a different value "1,2" associated with a1 in Lucene. Or there might
> be a hack for this?
>
> If you want to use only low-level Lucene constructs, I think payloads
> and/or complicated field values are the way to go. You could, for example,
> index for document D, a field called "extra" with values like "a1:1,2",
> "a2:2,3". I think that's what Aditya suggested. You still have to parse
> these though, so why not use a prebuilt flexible parsing infrastructure?
>
>
> Thanks.
>
>
> On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> On 10/02/2013 07:12 PM, Alice Wong wrote:
>>
>>> Hello,
>>>
>>> We would like to index some documents. Each field of a document may have
>>> multiple values. And for each (field,value) pair there are some associated
>>> values. These associated values are just for retrieving, not searching.
>>>
>>> For example, a document D could have a field named A. This field has two
>>> values a1 and a2.
>>>
>>> It is easy to index D, adding term a1 and a2 to field A, so either query
>>> "A=a1" or "A=a2" will return D.
>>>
>>> Assuming we have other values associated with (A,a1) and (A,a2) for D. We
>>> would like to retrieve these associated values depending on whether "A=a1"
>>> or "A=a2" is queried.
>>>
>>> For example, if query "A=a1" returns D, we would like to return values 1
>>> and 2. And if query "A=a2" returns D, we want to return values 3 and 10.
>>>
>>> Is it possible to do this with Lucene? Initially we want to hack postings
>>> to return associated values, but this seems quite complex.
>>>
>>> Thanks!
>>>
>>
>> Why not store a (nonindexed) text field with some internal structure
>> (XML, JSON, CSV) that you can analyze after retrieving. For example,
>>
>> <D>
>>   <A>
>>     <value>a1</value>
>>     <associated-values>
>>       ... whatever you want ...
>>     </associated-values>
>>   </A>
>> </D>
>>
>> If you use Lux (luxdb.org), which is XML query support on top of Lucene,
>> you can do this all automatically, and retrieve the results with a simple
>> query like:
>>
>> /D[A=a1]/associated-values
>>
>> plus if you want to pull out the values and manipulate them, you have
>> XQuery to do it with.
>>
>> -Mike
>>
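
For anyone finding this thread later: the "extra" field idea quoted above (one stored value per indexed term, laid out as "a1:1,2") might be sketched roughly as follows. This is only an illustration of the encoding and of the parsing step Mike mentions, not Lucene code. The helper names encode_extra and decode_extra are made up for the example, and it assumes the associated values are integers and that terms never contain ":".

```python
# Hypothetical helpers; the "term:v1,v2" layout follows the "extra"
# field example quoted in the thread. Nothing here calls Lucene.

def encode_extra(term, associated):
    """Build one stored value such as "a1:1,2" for a (term, values) pair."""
    return term + ":" + ",".join(str(v) for v in associated)


def decode_extra(stored_values):
    """Map each indexed term to its list of associated values."""
    result = {}
    for entry in stored_values:
        # Split on the first ":" only; assumes terms contain no ":".
        term, _, raw = entry.partition(":")
        result[term] = [int(v) for v in raw.split(",")] if raw else []
    return result


# Values that a stored (nonindexed) "extra" field might hold for document D.
stored = [encode_extra("a1", [1, 2]), encode_extra("a2", [3, 10])]

# After the query "A=a1" returns D, look up the values for the matched term.
lookup = decode_extra(stored)
matched_term = "a1"
print(lookup[matched_term])
```

The application still has to know which term matched (here matched_term) in order to pick the right entry, which is the re-parsing cost discussed earlier in the thread.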