Re: Fields with the same name?? - Was Re: Payloads and tokenizers

Antony Bowesman Mon, 18 Aug 2008 16:16:22 -0700

Doron Cohen wrote:

The API definitely doesn't promise this.
AFAIK implementation wise it happens to be like this but I can be wrong and
plus it might change in the future. It would make me nervous to rely on
this.

I made some tests and it 'seems' to work, but I agree, it also makes me nervous to rely on empirical evidence for the design rather than a clearly documented API!

Anyhow, for your need I can think of two options:

Option 1: just index the ownerID, do not store it, do not index or store
accessID (unless you wish to search by it, in this case just index it). In
addition store a dedicated mapping field that maps from ownerID to accessID.
E.g. with serialization of HashMap or something thinner. At runtime retrieve
this map from the document and it has all that information.

Hey, that's an interesting idea! I'd not considered storing the mapping, only re-creating it from fields at runtime. I'll explore this.
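
For illustration, the "something thinner" than Java serialization mentioned in Option 1 might look like the sketch below. It is only a sketch under assumptions: the class name OwnerAccessMapCodec is hypothetical, and both IDs are assumed to be strings. The resulting byte[] would then go into a stored, non-indexed binary field on the document and be decoded after the document is retrieved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: packs an ownerID -> accessID map into a compact byte[]. */
public class OwnerAccessMapCodec {

    /** Layout: entry count (int), then one (ownerID, accessID) UTF pair per entry. */
    public static byte[] encode(Map<String, String> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            for (Map.Entry<String, String> e : map.entrySet()) {
                out.writeUTF(e.getKey());    // ownerID (assumed to be a string)
                out.writeUTF(e.getValue());  // accessID (assumed to be a string)
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            // Not expected for in-memory streams; rethrow unchecked.
            throw new IllegalStateException(ex);
        }
    }

    /** Rebuilds the map from bytes produced by encode(). */
    public static Map<String, String> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            Map<String, String> map = new HashMap<String, String>();
            for (int i = 0; i < count; i++) {
                String ownerId = in.readUTF();
                String accessId = in.readUTF();
                map.put(ownerId, accessId);
            }
            return map;
        } catch (IOException ex) {
            // Truncated or corrupt input ends up here.
            throw new IllegalStateException(ex);
        }
    }
}
```

Compared with serializing a HashMap directly, this avoids the class metadata that Java serialization writes, which matters when the field is stored on every document.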

Option 2: as you describe above, just index the ownerID with accessID as
payload, and then for the hitting docid of interest use termPositions to get
the payload, i.e. something like:
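    TermPositions tp = reader.termPositions();
    tp.seek(new Term("ownerID", oid));
    // skipTo() stops at the first doc >= docid, so confirm it is the one we want
    if (tp.skipTo(docid) && tp.doc() == docid) {
      tp.nextPosition();  // payloads are read per position
      if (tp.isPayloadAvailable()) {
        byte[] accessIDBytes = tp.getPayload(...);
        ...
      }
    }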

Yes, I was playing with this technique yesterday. It's not easy to determine the performance implications of this method. I will be using caches, but my volumes are potentially so large that I may never be able to cache everything (perhaps 500M Docs), so this has to be very quick.
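
For the payload route in Option 2, the accessID has to be turned into bytes at index time and back again at search time. A minimal sketch follows, assuming the accessID fits in an int (the thread does not say what type it is) and using a hypothetical class name AccessIdPayload. A fixed 4-byte encoding keeps the per-posting payload small, which may help given the volumes mentioned above.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: fixed-width encoding of an int accessID for use as a payload. */
public class AccessIdPayload {

    /** Encodes the accessID as 4 big-endian bytes. */
    public static byte[] toPayload(int accessId) {
        return ByteBuffer.allocate(4).putInt(accessId).array();
    }

    /** Decodes an accessID from 4 bytes starting at the given offset. */
    public static int fromPayload(byte[] payload, int offset) {
        return ByteBuffer.wrap(payload, offset, 4).getInt();
    }
}
```

At search time, the bytes read from the term position would be passed to fromPayload() together with the offset at which they start.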


I'll play with both approaches and see which works best.

Thanks for your time, and I appreciate your valuable insight, Doron.
Antony


