If you want to search on part-of-speech tags, I'd just index a parallel
field (for example, "text_pos" alongside the field "text") and search on
that field (e.g., text_pos:noun).
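
A minimal sketch of how the two field values could be built, in case it
helps. It is plain Java with no Lucene calls, and the names in it
(ParallelPosField, TaggedToken, textValue, posValue) are made up for
illustration; they are not Lucene APIs. It assumes you already have one
tag per token from your own tagger, and that both fields are analyzed by
splitting on whitespace, so the tag at position N in "text_pos" lines up
with the term at position N in "text".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): turns tagged tokens into the
// string values for a "text" field and a parallel "text_pos" field.
class ParallelPosField {

    // One token plus the part-of-speech tag assigned by an external tagger.
    static final class TaggedToken {
        final String term;
        final String pos;

        TaggedToken(String term, String pos) {
            this.term = term;
            this.pos = pos;
        }
    }

    // Value for the "text" field: the terms, separated by single spaces.
    static String textValue(List<TaggedToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (TaggedToken t : tokens) {
            terms.add(t.term);
        }
        return String.join(" ", terms);
    }

    // Value for the "text_pos" field: exactly one tag per token, in the
    // same order, so positions match the "text" field one-to-one.
    static String posValue(List<TaggedToken> tokens) {
        List<String> tags = new ArrayList<>();
        for (TaggedToken t : tokens) {
            tags.add(t.pos);
        }
        return String.join(" ", tags);
    }

    public static void main(String[] args) {
        List<TaggedToken> tokens = new ArrayList<>();
        tokens.add(new TaggedToken("the", "det"));
        tokens.add(new TaggedToken("dog", "noun"));
        tokens.add(new TaggedToken("barks", "verb"));

        System.out.println("text:     " + textValue(tokens));
        System.out.println("text_pos: " + posValue(tokens));
    }
}
```

You would then add the two strings to the same document as separate
fields. Because the tags live in their own field, the term statistics of
"text" are the same as they would be without tagging.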
Kuro
On 3/14/12 9:37 AM, Mark McGuire wrote:
I'm working on a project where I need to tag both the part of speech
and other syntactic information on tokens so that this information is
searchable. I have read the threads on the mailing list regarding
part of speech tagging here
<http://mail-archives.apache.org/mod_mbox/lucene-java-user/201105.mbox/%3cbanlktimwqcq_gf2pxe8hyc_r75ncwdr...@mail.gmail.com%3E>
and the many responses to similar questions. To me, inserting
0-increment tokens seems rather clunky, especially when TypeAttribute
appears to be what one would want to use. Does Lucene do anything
extra depending on whether the type is left at its default, "word"? Is
it possible to write a search that combines multiple token attributes
(i.e., a search for the CharTermAttribute "dog" followed by a token
whose TypeAttribute is "verb")?
Also, if I were to use 0-increment tokens for tagging, would statistics
such as document length or sumTotalTermFreq differ from those of a
document indexed without these tags? If so, how would I counteract the
differences?
Thanks,
Mark McGuire