Markus - how are you encoding payloads as bitsets and using them for scoring?
Curious to see how folks are leveraging them.

        Erik

> On Jun 14, 2017, at 4:45 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> 
> Hello,
> 
> We use POS tagging too, and encode the tags as payload bitsets for scoring,
> which, as far as I know, is the only thing you can do with payloads.
> 
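For readers wondering what "payload bitsets" might look like, here is a minimal sketch in plain Java. It assumes one bit per POS tag, packed into a 4-byte payload. The PosBits class, its tag list and its bit order are hypothetical and are not the scheme Markus describes. In Lucene, the resulting byte[] would typically be wrapped in a BytesRef and set on each token's PayloadAttribute during analysis, then decoded again at scoring time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical encoding: each POS tag owns one bit of a 32-bit mask,
// and the mask is stored as a 4-byte big-endian payload.
public class PosBits {
    // Assumed tag inventory; a real tagset (e.g. Penn Treebank) is larger.
    static final List<String> TAGS = Arrays.asList("DT", "JJ", "NN", "NNS", "NP", "VB", "IN", "RB");

    // Set the bit for every tag assigned to a token (a token may carry several).
    static int mask(String... tags) {
        int m = 0;
        for (String t : tags) {
            int i = TAGS.indexOf(t);
            if (i < 0) throw new IllegalArgumentException("unknown tag: " + t);
            m |= 1 << i;
        }
        return m;
    }

    // Serialize the mask as 4 bytes, most significant byte first.
    static byte[] toPayload(int m) {
        return new byte[] {(byte) (m >>> 24), (byte) (m >>> 16), (byte) (m >>> 8), (byte) m};
    }

    // Decode a 4-byte payload back into the mask.
    static int fromPayload(byte[] p) {
        return ((p[0] & 0xFF) << 24) | ((p[1] & 0xFF) << 16) | ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
    }

    // At scoring time: does this token's payload carry the given tag?
    static boolean hasTag(byte[] payload, String tag) {
        return (fromPayload(payload) & mask(tag)) != 0;
    }

    public static void main(String[] args) {
        byte[] p = toPayload(mask("NN", "NP"));
        System.out.println(hasTag(p, "NN")); // true
        System.out.println(hasTag(p, "JJ")); // false
    }
}
```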
> So, instead of encoding them as payloads, why not index your treebank's
> POS tags as tokens at the same position as their words, like synonyms? If you
> do that, you can use span and phrase queries to find chunks of multiple POS tags.
> 
> This would be the first approach I can think of. Treating the tags as regular
> tokens lets you search for them with regular queries.
> 
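To illustrate the "tokens at the same position" idea, here is a minimal sketch in plain Java that only models the resulting (term, position) stream. In Lucene itself this would be done in a TokenFilter that emits each tag with a PositionIncrementAttribute of 0, the way synonym filters stack synonyms. The StackedPos class, the "pos:" prefix and the phraseAt helper are hypothetical. phraseAt merely stands in for what a phrase query would do against the index.

```java
import java.util.ArrayList;
import java.util.List;

// Models the token stream an analyzer would emit if each POS tag were indexed
// as an extra token stacked on its word (position increment 0), the way
// synonyms are stacked. Plain Java only; no Lucene classes are used.
public class StackedPos {
    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Hypothetical prefix so a tag such as "IN" never collides with a word such as "in".
    static final String PREFIX = "pos:";

    static List<Token> stack(String[] words, String[] tags) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], i));         // the word itself: a new position
            out.add(new Token(PREFIX + tags[i], i)); // its tag: same position as the word
        }
        return out;
    }

    // Phrase-style check: do `terms` occur at consecutive positions from `start`?
    static boolean phraseAt(List<Token> stream, int start, String... terms) {
        for (int k = 0; k < terms.length; k++) {
            boolean found = false;
            for (Token t : stream) {
                if (t.position == start + k && t.term.equals(terms[k])) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] words = {"the", "big", "dog", "barked"};
        String[] tags  = {"DT", "JJ", "NN", "VB"};
        List<Token> stream = stack(words, tags);
        // Tags and words can be mixed in one phrase because they share positions.
        System.out.println(phraseAt(stream, 1, "pos:JJ", "dog"));    // true
        System.out.println(phraseAt(stream, 0, "pos:DT", "pos:NN")); // false
    }
}
```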
> Regards,
> Markus
> 
> 
> 
> -----Original message-----
>> From: José Tomás Atria <jtat...@gmail.com>
>> Sent: Wednesday 14th June 2017 22:29
>> To: java-user@lucene.apache.org
>> Subject: Using POS payloads for chunking
>> 
>> Hello!
>> 
>> I'm not particularly familiar with Lucene's search API (as I've been using
>> the library mostly as a dumb index rather than a search engine), but I am
>> almost certain that, using its payload capabilities, it would be trivial to
>> implement a regular chunker to look for patterns in sequences of payloads.
>> 
>> (Trying not to be too pedantic: a regular chunker looks for 'chunks' based
>> on part-of-speech tags. For example, noun phrases can be searched for with
>> patterns like "(DT)?(JJ)*(NN|NP)+", that is, an optional determiner and zero
>> or more adjectives preceding one or more nouns, etc.)
>> 
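The chunking step itself, independent of how the tags are stored in the index, can be sketched with java.util.regex. The idea is to write the tag sequence as a string with one space after each tag and apply the pattern from the message to it. The RegexChunker class and its chunks helper are hypothetical. Matches are mapped back to [start, end) token ranges by counting spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A regular chunker over a sequence of POS tags. The tag string is built as
// "DT JJ NN ...", one space after each tag, so the pattern below is
// (DT)?(JJ)*(NN|NP)+ with token boundaries made explicit.
public class RegexChunker {
    // (?<![^ ]) forces matches to start at a tag boundary, so "XNN" never
    // matches as "NN"; the trailing space stops "NN" from matching inside "NNS".
    static final Pattern NOUN_PHRASE =
            Pattern.compile("(?<![^ ])(?:DT )?(?:JJ )*(?:(?:NN|NP) )+");

    // Returns a [start, end) token range for each non-overlapping match.
    static List<int[]> chunks(String[] tags, Pattern pattern) {
        StringBuilder sb = new StringBuilder();
        for (String t : tags) sb.append(t).append(' ');
        String s = sb.toString();
        List<int[]> out = new ArrayList<>();
        Matcher m = pattern.matcher(s);
        while (m.find()) {
            int start = countSpaces(s, 0, m.start());             // tags before the match
            int end = start + countSpaces(s, m.start(), m.end()); // tags inside the match
            out.add(new int[] {start, end});
        }
        return out;
    }

    static int countSpaces(String s, int from, int to) {
        int n = 0;
        for (int i = from; i < to; i++) if (s.charAt(i) == ' ') n++;
        return n;
    }

    public static void main(String[] args) {
        String[] words = {"the", "old", "man", "saw", "the", "steel", "bridge"};
        String[] tags  = {"DT", "JJ", "NN", "VB", "DT", "NN", "NN"};
        for (int[] c : chunks(tags, NOUN_PHRASE)) {
            System.out.println(String.join(" ", Arrays.copyOfRange(words, c[0], c[1])));
        }
        // prints "the old man", then "the steel bridge"
    }
}
```

Because the pattern runs over the tags only, the search is "irrespective of terms" in the sense asked about; the words are looked up afterwards only to print each chunk.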
>> Assuming my index has POS tags encoded as payloads for each position, how
>> would one search for such patterns, irrespective of terms? I started
>> studying the spans search API, as this seemed like the natural place to
>> start, but I quickly got lost.
>> 
>> Any tips would be greatly appreciated (as would references to this kind of
>> thing; I'm sure someone must have tried something similar before).
>> 
>> thanks!
>> ~jta
>> -- 
>> 
>> sent from a phone. please excuse terseness and tpyos.
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

