Fields with the same name?? - Was Re: Payloads and tokenizers

Antony Bowesman Sun, 17 Aug 2008 21:42:09 -0700

> I assume you already know this, but just to make sure what I meant was clear:
> no tokenization but still indexing just means that the entire field's text
> becomes a single unchanged token. I believe this is exactly what
> SingleTokenTokenStream can buy you: a single token, for which you can
> preset a payload.


Yes, I was with you :)

> It is. Field maintains its value, which is either a string, a stream, etc.
> Once you set it to a tokenStream, the string value is lost and there's no
> way to store it.

Thanks for that - I delved a little further into FieldsWriter and see what you mean.

> How about adding this field in two parts, one part for indexing with the
> payload and the other part for storing, i.e. something like this:
>
>     Token token = new Token(...);
>     token.setPayload(...);
>     SingleTokenTokenStream ts = new SingleTokenTokenStream(token);
>
>     Field f1 = new Field("f", "some-stored-content", Store.YES, Index.NO);
>     Field f2 = new Field("f", ts);

Now that got me thinking, and it has exposed a rather large misconception in my understanding of the Lucene internals when considering fields of the same name.

Your idea above looked like a good one. However, I realise I am probably trying to use payloads wrongly. I have the following information to store for a single Document:


contentId - 1 instance
ownerId - 1..n instances
accessId - 1..n instances

One ownerId has a corresponding accessId for the contentId.

My search criteria are ownerId:XXX + user criteria. When there is a hit, I need the contentId and the corresponding accessId (for the owner) back. So, I wanted to store the accessId as a payload to the ownerId.
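
A payload is just a byte array attached to a token, so the accessId has to be turned into bytes before it goes in and decoded again when it is read back. Below is a minimal sketch of that round trip in plain Java. The class and method names are hypothetical, UTF-8 is an assumed encoding, and the Lucene calls that attach the bytes to the token and read them back at search time are deliberately left out.

```java
import java.nio.charset.StandardCharsets;

public class AccessIdPayloadCodec {

    // Turn an accessId into the raw bytes that would be handed to a payload.
    public static byte[] encode(String accessId) {
        return accessId.getBytes(StandardCharsets.UTF_8);
    }

    // Rebuild the accessId from a slice of a payload buffer.
    public static String decode(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("AID1");
        System.out.println(decode(bytes, 0, bytes.length));
    }
}
```

The offset and length parameters are there because a payload buffer may be larger than the payload it holds.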

This is where I came unstuck. For 'n=3' above, I used the SingleTokenTokenStream as you suggested, with the accessId as the payload for ownerId. However, at the Document level, I cannot get the payloads from the field, so, in trying to understand fields with the same name, I discovered that there is a big difference between


(a)
doc.add(new Field("ownerId", "OID1", Store.YES, Index.NO_NORMS));
doc.add(new Field("ownerId", "OID2", Store.YES, Index.NO_NORMS));
doc.add(new Field("ownerId", "OID3", Store.YES, Index.NO_NORMS));

and (b)
doc.add(new Field("ownerId", "OID1 OID2 OID3", Store.YES, Index.NO_NORMS));

as Document.getFields("ownerId") will return 3 Fields for (a) and 1 for (b).

My question then is, if I do

for (int i = 0; i < owners; i++)
{
    f = new Field("ownerId", oid[i], Store.YES, Index.NO_NORMS);
    doc.add(f);
    f = new Field("accessId", aid[i], Store.YES, Index.NO_NORMS);
    doc.add(f);
}

then will the array elements for the corresponding Field arrays returned by

Document.getFields("ownerId")
Document.getFields("accessId")

**guarantee** that the array element order is the same as the order they were 
added?
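
A minimal sketch of the lookup this would allow, in plain Java. The two String arrays stand in for the values of the "ownerId" and "accessId" fields read back from the Document, and the class and method names are hypothetical. It is only correct under the assumption being asked about, namely that both arrays come back in the order the fields were added.

```java
public class OwnerAccessLookup {

    // Returns the accessId at the same position as the given ownerId,
    // or null if the ownerId is not present.
    // ASSUMPTION: ownerIds[i] and accessIds[i] were added as a pair and
    // both arrays preserve insertion order.
    public static String accessIdFor(String ownerId, String[] ownerIds, String[] accessIds) {
        if (ownerIds.length != accessIds.length) {
            throw new IllegalArgumentException("ownerId and accessId counts differ");
        }
        for (int i = 0; i < ownerIds.length; i++) {
            if (ownerIds[i].equals(ownerId)) {
                return accessIds[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] ownerIds = {"OID1", "OID2", "OID3"};
        String[] accessIds = {"AID1", "AID2", "AID3"};
        System.out.println(accessIdFor("OID2", ownerIds, accessIds));
    }
}
```

If the order is not guaranteed, a positional lookup like this would silently pair an owner with the wrong accessId, which is why the guarantee matters.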

Antony



