Hi Jan,

On Wed, Nov 24, 2010 at 9:12 AM, <jan.kure...@nokia.com> wrote:
> Of course:
>
> We are trying to search in documents that contain text in several languages.
> We are also investigating other approaches*, so this is not about finding
> other variants.
> the goal is to only match tokens from 1 or more given languages and not to
> match the token if it is by accident the same in another language.
>
> For the payloads my plan is to add the correct language to each and every
> token during indexing (I'm not sure how to solve this best, but I'm sure this
> can be solved at least with lucene directly).
> On search side my current idea is to wrap around a TermPosition and skip all
> docs, where the current payload has not one of the requested languages.
> I probably need to use my own Query/Weight for this?

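
The skipping described in the quoted plan can be modelled without any Lucene
classes. Below is a minimal sketch in plain Java, assuming each token
occurrence carries its language as a payload. Every type in it (Posting,
PostingCursor, ListCursor, LanguageFilteredCursor, LanguageFilterDemo) is
hypothetical and only mimics a next()/skipTo() contract; it is not Lucene's
API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model types for illustration; none of these are Lucene classes.

/** One occurrence of a term: document, position, and a language "payload". */
final class Posting {
    final int doc;
    final int position;
    final String language; // stands in for the per-token payload

    Posting(int doc, int position, String language) {
        this.doc = doc;
        this.position = position;
        this.language = language;
    }
}

/** Minimal cursor over occurrences sorted by (doc, position). */
interface PostingCursor {
    boolean next();             // advance to the next occurrence; false at the end
    boolean skipTo(int target); // advance to the next occurrence with doc >= target
    Posting current();          // valid only after next()/skipTo() returned true
}

/** Unfiltered cursor backed by an in-memory list. */
final class ListCursor implements PostingCursor {
    private final List<Posting> postings;
    private int index = -1;

    ListCursor(List<Posting> postings) {
        this.postings = postings;
    }

    public boolean next() {
        if (index < postings.size()) {
            index++;
        }
        return index < postings.size();
    }

    public boolean skipTo(int target) {
        // Always moves forward at least once, then stops at doc >= target.
        while (next()) {
            if (current().doc >= target) {
                return true;
            }
        }
        return false;
    }

    public Posting current() {
        return postings.get(index);
    }
}

/** Wrapper that hides every occurrence whose language was not requested. */
final class LanguageFilteredCursor implements PostingCursor {
    private final PostingCursor in;
    private final Set<String> wanted;

    LanguageFilteredCursor(PostingCursor in, Set<String> wanted) {
        this.in = in;
        this.wanted = wanted;
    }

    public boolean next() {
        while (in.next()) {
            if (wanted.contains(in.current().language)) {
                return true;
            }
        }
        return false;
    }

    public boolean skipTo(int target) {
        if (!in.skipTo(target)) {
            return false;
        }
        // Accept the landing occurrence or keep scanning for a wanted one.
        return wanted.contains(in.current().language) || next();
    }

    public Posting current() {
        return in.current();
    }
}

final class LanguageFilterDemo {
    /** Collects the distinct documents the cursor still exposes, in order. */
    static List<Integer> matchingDocs(PostingCursor cursor) {
        List<Integer> docs = new ArrayList<>();
        while (cursor.next()) {
            int doc = cursor.current().doc;
            if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

Wrapping a cursor as new LanguageFilteredCursor(cursor, wantedLanguages) and
passing it to LanguageFilterDemo.matchingDocs returns only documents with at
least one occurrence in a requested language. In a real index the language
would have to be decoded from the payload bytes at each position, which this
sketch leaves out.
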
You don't need to start from nothing here. I suggest you look at
SpanTermQuery and TermSpans, which use DocsAndPositionsEnum (or rather
TermPositions in non-trunk versions). TermSpans lets you override #next()
and #skipTo(), which, from what I understand, is what you are looking for,
right?

> Another approach would be to just overwrite the Similarity, but this will
> only influence scoring and depending on the underlying query not completely
> skip the token - I have to test the difference for the final score between
> this approaches.

As you figured correctly, that only affects scoring.

>
> This one blog made me curious if there is already something similar, that
> skips TermPositions based on given attributes? I could imagine something
> similar to the current Tokenattribute concept during index time, but also
> available during search and controlled by a similarity...

Actually, in Lucene 4.0 each Flex-Enum has an AttributeSource that allows you
to add custom attributes to your enumerations. There is no logic yet that
skips based on them, though.

Simon

>
> Jan
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Tuesday, 23 November 2010 17:50
> To: java-user@lucene.apache.org
> Subject: Re: custom attributs in tokens
>
> On Tue, Nov 23, 2010 at 4:50 PM, <jan.kure...@nokia.com> wrote:
>> Yes, payloads I will use. But they perform at score time and not at search
>> time. I just wanted to know if there is anything like that.
>>
> So what is the difference? Maybe you can elaborate a little what are
> you trying to do?
>
> simon
>> "not even on trunk" does this mean there is a discussion about this ongoing
>> somewhere? I'm just curious.
>>
>> Jan
>>
>> -----Original Message-----
>> From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
>> Sent: Tuesday, 23 November 2010 16:44
>> To: java-user@lucene.apache.org
>> Subject: Re: custom attributs in tokens
>>
>> Attribute Serialization is not implemented yet, not even in trunk. You
>> can use payloads instead.
>>
>> Simon
>>
>> On Tue, Nov 23, 2010 at 2:43 PM, <jan.kure...@nokia.com> wrote:
>>> Hi,
>>>
>>> I found a blog post from 2008 where it says, there will be additional
>>> custom attributes for tokens in the future, that will be searchable.
>>> What is the status of these?
>>>
>>> Jan
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>