Hi Jan,

On Wed, Nov 24, 2010 at 9:12 AM,  <jan.kure...@nokia.com> wrote:
> Of course:
>
> We are trying to search in documents that contain text in several languages. 
> We are also investigating other approaches*, so this is not about finding 
> other variants.
> The goal is to match only tokens from one or more given languages, and not 
> to match a token that happens to be identical in another language.
>
> For the payloads my plan is to add the correct language to each and every 
> token during indexing (I'm not sure how to solve this best, but I'm sure this 
> can be solved at least with lucene directly).
> On the search side, my current idea is to wrap a TermPositions and skip all 
> docs where the current payload is not one of the requested languages.
> I probably need to use my own Query/Weight for this?

You don't need to start from scratch here. I suggest you look at
SpanTermQuery and TermSpans, which use DocsAndPositionsEnum (or rather
TermPositions in non-trunk versions). TermSpans gives you the ability
to override #next() and #skipTo(), which, from what I understand, is
what you are looking for, right?
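
A minimal sketch of that idea in plain Java, with no Lucene dependency.
Posting, PostingEnum, ListPostingEnum and LanguageFilteredEnum are
hypothetical stand-ins for illustration, not Lucene classes; the language
string stands in for a decoded payload. The point is only the shape of the
next()/skipTo() overrides: keep advancing the wrapped enumeration until the
payload matches one of the requested languages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LanguageFilterSketch {

    /** Hypothetical posting: document id, token position, language taken from the payload. */
    static final class Posting {
        final int doc;
        final int position;
        final String language;

        Posting(int doc, int position, String language) {
            this.doc = doc;
            this.position = position;
            this.language = language;
        }
    }

    /** Hypothetical stand-in for a doc-ordered enumeration such as TermSpans. */
    interface PostingEnum {
        boolean next();              // move to the next posting; false when exhausted
        boolean skipTo(int target);  // move forward to the first posting with doc >= target
        Posting current();           // posting at the current location
    }

    /** Unfiltered enumeration over an in-memory list sorted by doc id. */
    static final class ListPostingEnum implements PostingEnum {
        private final List<Posting> postings;
        private int index = -1;

        ListPostingEnum(List<Posting> postings) {
            this.postings = postings;
        }

        public boolean next() {
            index++;
            return index < postings.size();
        }

        public boolean skipTo(int target) {
            do {
                if (!next()) {
                    return false;
                }
            } while (current().doc < target);
            return true;
        }

        public Posting current() {
            return postings.get(index);
        }
    }

    /** Wrapper whose next() and skipTo() pass over postings in other languages. */
    static final class LanguageFilteredEnum implements PostingEnum {
        private final PostingEnum in;
        private final Set<String> languages;

        LanguageFilteredEnum(PostingEnum in, Set<String> languages) {
            this.in = in;
            this.languages = languages;
        }

        public boolean next() {
            while (in.next()) {
                if (languages.contains(in.current().language)) {
                    return true;
                }
            }
            return false;
        }

        public boolean skipTo(int target) {
            if (!in.skipTo(target)) {
                return false;
            }
            // Accept the landing posting if its language matches, otherwise keep going.
            return languages.contains(in.current().language) || next();
        }

        public Posting current() {
            return in.current();
        }
    }

    /** Collects the distinct doc ids that have at least one posting in a requested language. */
    static List<Integer> matchingDocs(List<Posting> postings, String... languages) {
        PostingEnum e = new LanguageFilteredEnum(
                new ListPostingEnum(postings), new HashSet<>(Arrays.asList(languages)));
        List<Integer> docs = new ArrayList<>();
        while (e.next()) {
            int doc = e.current().doc;
            if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

In a real implementation the language check would read the payload at the
current position instead of a field, and the wrapper would sit inside your
own SpanQuery/Weight rather than over a list.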

> Another approach would be to just override the Similarity, but this will 
> only influence scoring and, depending on the underlying query, not completely 
> skip the token - I have to test the difference in the final score between 
> these approaches.

Well, as you correctly figured, that approach really only affects scoring.
>
> This one blog made me curious if there is already something similar, that 
> skips TermPositions based on given attributes? I could imagine something 
> similar to the current Tokenattribute concept during index time, but also 
> available during search and controlled by a similarity...

Actually, in Lucene 4.0 each Flex-Enum has an AttributeSource that
allows you to add custom attributes to your enumerations. There is no
logic yet that skips based on them, though.

Simon
>
> Jan
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Dienstag, 23. November 2010 17:50
> To: java-user@lucene.apache.org
> Subject: Re: custom attributs in tokens
>
> On Tue, Nov 23, 2010 at 4:50 PM,  <jan.kure...@nokia.com> wrote:
>> Yes, payloads I will use. But they perform at score time and not at search 
>> time. I just wanted to know if there is anything like that.
>>
> So what is the difference? Maybe you can elaborate a little what are
> you trying to do?
>
> simon
>> "not even on trunk" does this mean there is a discussion about this ongoing 
>> somewhere? I'm just curious.
>>
>> Jan
>>
>> -----Original Message-----
>> From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
>> Sent: Dienstag, 23. November 2010 16:44
>> To: java-user@lucene.apache.org
>> Subject: Re: custom attributs in tokens
>>
>> Attribute Serialization is not implemented yet, not even in trunk. You
>> can use payloads instead.
>>
>> Simon
>>
>> On Tue, Nov 23, 2010 at 2:43 PM,  <jan.kure...@nokia.com> wrote:
>>> Hi,
>>>
>>> I found a blog post from 2008 where it says, there will be additional 
>>> custom attributes for tokens in the future, that will be searchable.
>>> What is the status of these?
>>>
>>> Jan
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
