Markus - how are you encoding payloads as bitsets and using them for scoring? Curious to see how folks are leveraging them.
Erik

> On Jun 14, 2017, at 4:45 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Hello,
>
> We use POS-tagging too, and encode the tags as payload bitsets for scoring,
> which is, as far as I know, the only possibility with payloads.
>
> So, instead of encoding them as payloads, why not index your treebank's
> POS tags as tokens at the same position, like synonyms? If you do that, you
> can use spans and phrase queries to find chunks of multiple POS tags.
>
> This would be the first approach I can think of. Treating them as regular
> tokens enables you to use regular search for them.
>
> Regards,
> Markus
>
> -----Original message-----
>> From: José Tomás Atria <jtat...@gmail.com>
>> Sent: Wednesday 14th June 2017 22:29
>> To: java-user@lucene.apache.org
>> Subject: Using POS payloads for chunking
>>
>> Hello!
>>
>> I'm not particularly familiar with Lucene's search API (as I've been using
>> the library mostly as a dumb index rather than a search engine), but I am
>> almost certain that, using its payload capabilities, it would be trivial to
>> implement a regular chunker to look for patterns in sequences of payloads.
>>
>> (Trying not to be too pedantic: a regular chunker looks for 'chunks' based
>> on part-of-speech tags, e.g. noun phrases can be searched for with patterns
>> like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero or
>> more adjectives preceding a bunch of nouns, etc.)
>>
>> Assuming my index has POS tags encoded as payloads for each position, how
>> would one search for such patterns, irrespective of terms? I started
>> studying the spans search API, as this seemed like the natural place to
>> start, but I quickly got lost.
>>
>> Any tips would be extremely appreciated. (Or references to this kind of
>> thing, I'm sure someone must have tried something similar before...)
>>
>> thanks!
>> ~jta
>> --
>>
>> sent from a phone. please excuse terseness and tpyos.
>>
>> enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
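
For reference, the regular chunker pattern quoted in the thread, "(DT)?(JJ)*(NN|NP)+", can be tried outside Lucene on a plain array of POS tags. The sketch below is only an illustration built on java.util.regex. The class name RegexChunker and the method chunk are hypothetical, the tags are assumed to contain no whitespace, and nothing here uses payloads, spans, or any other Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
final class RegexChunker {
    // The pattern from the thread, rewritten so that every tag is followed by
    // one space. The lookbehind keeps a match from starting in the middle of a tag.
    private static final Pattern NOUN_PHRASE =
        Pattern.compile("(?<!\\S)(?:DT )?(?:JJ )*(?:(?:NN|NP) )+");

    /** Returns one {from, to} pair of token indexes (to is exclusive) per chunk. */
    static List<int[]> chunk(String[] tags) {
        // Join the tags into one string and remember where each token starts.
        StringBuilder text = new StringBuilder();
        int[] starts = new int[tags.length + 1];
        for (int i = 0; i < tags.length; i++) {
            starts[i] = text.length();
            text.append(tags[i]).append(' ');
        }
        starts[tags.length] = text.length();

        List<int[]> chunks = new ArrayList<>();
        Matcher m = NOUN_PHRASE.matcher(text);
        while (m.find()) {
            // Matches begin and end on token boundaries, so both character
            // offsets are present in the starts array.
            int from = Arrays.binarySearch(starts, m.start());
            int to = Arrays.binarySearch(starts, m.end());
            chunks.add(new int[] {from, to});
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tags = {"DT", "JJ", "NN", "VBZ", "DT", "NN", "NN", "IN", "NP"};
        for (int[] c : chunk(tags)) {
            System.out.println("[" + c[0] + ", " + c[1] + ")");
        }
    }
}
```

For the tag sequence in main, the chunks cover token positions [0, 3), [4, 7) and [8, 9). Doing the same against an index would still need the position-by-position tags to be read back, which is where the suggestion of indexing tags as tokens at the same position comes in.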