Hi Markus, thanks for your response!

Now I feel stupid: that is clearly a much simpler approach, and it has the
added benefit that it would not require me to meddle with the scoring
process, which I'm still a bit terrified of. Thanks for the tip.

I guess the question is still valid though, i.e. how would one take
payloads into account when scoring entire spans? Does this make sense at
all? Any links to a more-or-less straightforward example?

On the length of payloads: I understand that you have other restrictions,
but payloads take a BytesRef as their value, so you can store arbitrary data
in them as long as you encode and decode it properly. E.g. you could encode
the long array that backs a FixedBitSet as a BytesRef and pass that, though
I'm not sure it would be efficient unless you have at least 64 flags.
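
To make the encode/decode idea concrete, here is a minimal sketch in plain
Java. It is only an illustration under assumptions: PayloadFlags is a
hypothetical helper (not a Lucene class), and java.util.BitSet stands in for
FixedBitSet so the example has no Lucene dependency. The resulting byte array
is what one would wrap in a BytesRef when setting the payload.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene: packs flag words into payload bytes. */
final class PayloadFlags {
    private PayloadFlags() {}

    /** Serialize the long words backing a bitset, 8 big-endian bytes per word. */
    static byte[] encode(long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES);
        for (long word : words) {
            buf.putLong(word);
        }
        return buf.array();
    }

    /** Inverse of encode: rebuild the long words from a slice of payload bytes. */
    static long[] decode(byte[] bytes, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("payload length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long[] words = new long[length / Long.BYTES];
        for (int i = 0; i < words.length; i++) {
            words[i] = buf.getLong();
        }
        return words;
    }

    /** Convenience round trip for java.util.BitSet (stand-in for FixedBitSet). */
    static BitSet toBitSet(byte[] bytes, int offset, int length) {
        return BitSet.valueOf(decode(bytes, offset, length));
    }
}
```

The offset and length parameters are there because a payload usually arrives
as a slice of a larger shared buffer rather than as its own array. Each word
costs 8 bytes per position, so this only pays off when there are many flags.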

thanks!
jta



On Wed, Jun 14, 2017 at 4:45 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> We use POS-tagging too, and encode the tags as payload bitsets for scoring,
> which is, as far as I know, the only possibility with payloads.
>
> So, instead of encoding them as payloads, why not index your treebank's
> POS tags as tokens at the same position, like synonyms? If you do that, you
> can use spans and phrase queries to find chunks of multiple POS tags.
>
> This would be the first approach I can think of. Treating them as regular
> tokens enables you to use regular search for them.
>
> Regards,
> Markus
>
>
>
> -----Original message-----
> > From: José Tomás Atria <jtat...@gmail.com>
> > Sent: Wednesday 14th June 2017 22:29
> > To: java-user@lucene.apache.org
> > Subject: Using POS payloads for chunking
> >
> > Hello!
> >
> > I'm not particularly familiar with Lucene's search API (as I've been using
> > the library mostly as a dumb index rather than a search engine), but I am
> > almost certain that, using its payload capabilities, it would be trivial to
> > implement a regular chunker to look for patterns in sequences of payloads.
> >
> > (trying not to be too pedantic, a regular chunker looks for 'chunks' based
> > on part-of-speech tags, e.g. noun phrases can be searched for with patterns
> > like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero or
> > more adjectives preceding one or more nouns, etc.)
> >
> > Assuming my index has POS tags encoded as payloads for each position, how
> > would one search for such patterns, irrespective of terms? I started
> > studying the spans search API, as this seemed like the natural place to
> > start, but I quickly got lost.
> >
> > Any tips would be extremely appreciated. (or references to this kind of
> > thing, I'm sure someone must have tried something similar before...)
> >
> > thanks!
> > ~jta
> > --
> >
> > sent from a phone. please excuse terseness and tpyos.
> >
> > enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
> >
>
