Off the top of my head...

1) is certainly the easiest. This looks suspiciously like synonyms. That is, at index time you inject the ID as a synonym in the text, and it gets indexed at the same position as the token it belongs to. This helps because phrase queries then continue to work. Lucene in Action has an example of creating a synonym analyzer.

2) I don't see how payloads really help you here. I confess I'm not intimately familiar with payloads, but what I've seen is that they're useful when you match the *term* and want to do something special with that match. Uses I've seen are, for instance, parts of speech, so one can boost the score of matches on nouns. But I don't recall seeing anything that allows the payload data itself to be the match.

3) I have no idea what an attribute is in this context <G>.....

Although you could simply create another field that contains all of the IDs for the document and add a SHOULD clause on that field to all your queries.
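To make the "same position" point in 1) concrete, here is a minimal sketch. It is a toy positional index in plain Java, not the Lucene API; the class name, the entity map, and the ID "ent_42" are all made up for illustration. In Lucene itself the equivalent step would be a TokenFilter that emits the ID with a position increment of 0 (PositionIncrementAttribute), but treat the details as something to check against your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-document positional index (hypothetical; NOT Lucene code). */
final class SynonymPositionSketch {
    // term -> positions at which the term occurs
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int position = -1;

    /** Increment 1 moves to the next position; 0 stacks on the previous token. */
    void add(String term, int positionIncrement) {
        position += positionIncrement;
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(position);
    }

    /** Indexes text, injecting the entity ID at the position of each recognized token. */
    static SynonymPositionSketch index(String text, Map<String, String> entityIds) {
        SynonymPositionSketch idx = new SynonymPositionSketch();
        for (String token : text.toLowerCase().split("\\s+")) {
            idx.add(token, 1);
            String id = entityIds.get(token);
            if (id != null) {
                idx.add(id, 0); // same position as the token, like a synonym
            }
        }
        return idx;
    }

    /** True if the terms occur at consecutive positions (a one-term phrase is a term match). */
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                List<Integer> p = postings.get(terms[i]);
                ok = p != null && p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<>();
        ids.put("aspirin", "ent_42"); // hypothetical entity ID
        SynonymPositionSketch idx = index("patients took aspirin daily", ids);
        System.out.println(idx.matchesPhrase("ent_42"));                    // true
        System.out.println(idx.matchesPhrase("took", "aspirin", "daily"));  // true
        System.out.println(idx.matchesPhrase("took", "ent_42", "daily"));   // true
        System.out.println(idx.matchesPhrase("aspirin", "ent_42"));         // false
    }
}
```

The last line prints false because the ID shares a position with "aspirin" rather than following it, which is exactly why the original phrase queries are undisturbed.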
HTH
Erick

On Tue, Sep 21, 2010 at 3:11 PM, Christopher Condit <con...@sdsc.edu> wrote:

> I'm curious about embedding extra information in an index (and being able
> to search the extra information as well). In this case certain tokens
> correspond to recognized entities with ids. I'd like to get the ids into the
> index so that searching for the id of the entity will also return that
> document. I can think of three ways and I was curious if there's a preferred
> way:
> 1) Add the id as another token during filtering
> 2) Add the id as a payload
> 3) Add the id as an attribute (although I don't know how to search on the
> attribute value)
>
> Thanks,
> -Chris