Hello,

We use POS tagging too, and encode the tags as payload bitsets for scoring,
which, as far as I know, is the only thing payloads can be used for.

So instead of encoding them as payloads, why not index your treebank's POS tags
as tokens at the same position as the word, like synonyms? If you do that, you
can use span and phrase queries to find chunks made up of multiple POS tags.
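
To make the idea concrete, here is a minimal sketch in plain Java, without
Lucene, that simulates what ends up in the index. It assumes one tag per word,
emitted right after the word with a position increment of 0 (in Lucene that
would be a custom TokenFilter setting PositionIncrementAttribute, much like a
synonym filter does). The "pos:" prefix is a hypothetical convention to keep
tags apart from real words. phraseStarts() only mimics the matching semantics
of a PhraseQuery (or an in-order SpanNearQuery with slop 0) over those
positions; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class StackedPosTags {

    // One emitted token: its text plus its position increment relative to the
    // previous token (the value Lucene exposes through PositionIncrementAttribute).
    static final class Token {
        final String text;
        final int posInc;

        Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Emits each word, then its POS tag at the same position (increment 0),
    // the way synonyms are stacked. Assumes words.length == tags.length.
    // The "pos:" prefix is a hypothetical naming convention.
    static List<Token> stack(String[] words, String[] tags) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            out.add(new Token("pos:" + tags[i], 0));
        }
        return out;
    }

    // Groups tokens by position, i.e. what the postings would record.
    static List<List<String>> positions(List<Token> tokens) {
        List<List<String>> byPos = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc > 0 || byPos.isEmpty()) {
                byPos.add(new ArrayList<>());
            }
            byPos.get(byPos.size() - 1).add(t.text);
        }
        return byPos;
    }

    // Returns every start position where query term k occurs at start + k for all k:
    // exact, in-order, adjacent matching, like a phrase query without slop.
    static List<Integer> phraseStarts(List<List<String>> byPos, List<String> query) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + query.size() <= byPos.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < query.size() && ok; k++) {
                ok = byPos.get(start + k).contains(query.get(k));
            }
            if (ok) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "fox", "saw", "a", "dog"};
        String[] tags = {"DT", "JJ", "NN", "VBD", "DT", "NN"};
        List<List<String>> byPos = positions(stack(words, tags));
        System.out.println(phraseStarts(byPos, List.of("pos:DT", "pos:JJ", "pos:NN"))); // [0]
        System.out.println(phraseStarts(byPos, List.of("pos:DT", "pos:NN")));           // [4]
        System.out.println(phraseStarts(byPos, List.of("saw", "pos:DT")));              // [3]
    }
}
```

Because words and tags share positions, a query can also mix them, as in the
last line: the word "saw" followed by any determiner.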

This would be the first approach I'd try. Treating the tags as regular tokens
lets you search for them with the ordinary query types.
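
One caveat: phrase and span queries match sequences of a fixed length, and as
far as I know there is no repetition operator over positions in core Lucene, so
a pattern like (DT)?(JJ)*(NN|NP)+ has to be unrolled up to some maximum chunk
length. The sketch below does only that unrolling, in plain Java; maxLen is an
assumed cap you would choose yourself. Each resulting sequence could then become
an in-order SpanNearQuery with slop 0, with one SpanTermQuery per tag slot (a
SpanOrQuery of two SpanTermQuerys for the NN|NP slot), and all sequences OR'ed
together in a single SpanOrQuery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkPatternExpansion {

    // Slot for (NN|NP): either tag may fill it, so it stays a single slot
    // (an OR over two tag terms at one position) instead of doubling the sequences.
    static final String NOUN = "NN|NP";

    // Unrolls (DT)?(JJ)*(NN|NP)+ into every fixed-length slot sequence of at most
    // maxLen positions. maxLen is a hypothetical cap: the unbounded pattern has
    // infinitely many expansions, while phrase/span queries match fixed lengths.
    static List<List<String>> expand(int maxLen) {
        List<List<String>> out = new ArrayList<>();
        for (int dt = 0; dt <= 1; dt++) {                    // optional determiner
            for (int jj = 0; dt + jj + 1 <= maxLen; jj++) {  // zero or more adjectives
                for (int nn = 1; dt + jj + nn <= maxLen; nn++) { // one or more nouns
                    List<String> seq = new ArrayList<>();
                    if (dt == 1) {
                        seq.add("DT");
                    }
                    seq.addAll(Collections.nCopies(jj, "JJ"));
                    seq.addAll(Collections.nCopies(nn, NOUN));
                    out.add(seq);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each printed line would become one in-order, slop-0 sequence query.
        for (List<String> seq : expand(3)) {
            System.out.println(String.join(" ", seq));
        }
    }
}
```

The number of sequences grows with the square of maxLen (9 for maxLen = 3),
which stays manageable for realistic chunk lengths.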

Regards,
Markus

-----Original message-----
> From: José Tomás Atria <jtat...@gmail.com>
> Sent: Wednesday 14th June 2017 22:29
> To: java-user@lucene.apache.org
> Subject: Using POS payloads for chunking
> 
> Hello!
> 
> I'm not particularly familiar with Lucene's search API (I've been using
> the library mostly as a dumb index rather than a search engine), but I am
> almost certain that, using its payload capabilities, it would be trivial to
> implement a regular chunker that looks for patterns in sequences of payloads.
> 
> (Trying not to be too pedantic: a regular chunker looks for 'chunks' based
> on part-of-speech tags, e.g. noun phrases can be searched for with patterns
> like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero or
> more adjectives preceding one or more nouns, etc.)
> 
> Assuming my index has POS tags encoded as payloads for each position, how
> would one search for such patterns, irrespective of terms? I started
> studying the span query API, as this seemed like the natural place to
> start, but I quickly got lost.
> 
> Any tips would be greatly appreciated (or references to this kind of
> thing; I'm sure someone must have tried something similar before...).
> 
> thanks!
> ~jta
> -- 
> 
> sent from a phone. please excuse terseness and tpyos.
> 
