Re: Lucene 4 - POS and Syntactic Tagging

T. Kuro Kurosaka Mon, 09 Apr 2012 13:10:39 -0700

If you want to search on part-of-speech tags, I'd just make a parallel field ("text_pos" for the field "text", for example) and search on that field (text_pos:noun).
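As a rough illustration of the parallel-field idea, here is a toy in-memory model in plain Java. It is not Lucene code. It assumes the POS tags have already been produced by some external tagger, one tag per word, and treats a query such as text_pos:noun as a plain term lookup in the second field. The class and method names are hypothetical. In a real Lucene index, both fields would be added to the same Document and the tag field would be queried with a term query.

```java
import java.util.*;

/**
 * Toy model (not Lucene code) of the "parallel field" approach: each document
 * has a "text" field holding the words and a "text_pos" field holding one POS
 * tag per word. Searching text_pos:noun is then an ordinary term lookup.
 */
public class ParallelFieldSketch {
    // field name -> term -> sorted set of matching document ids
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    /** Index the given tokens into one field of document docId. */
    void addField(int docId, String field, List<String> tokens) {
        Map<String, SortedSet<Integer>> terms =
            postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String t : tokens) {
            terms.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Single-term search on one field, e.g. search("text_pos", "noun"). */
    SortedSet<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ParallelFieldSketch index = new ParallelFieldSketch();
        // Document 0: "dogs bark"; tags assumed to come from an external tagger.
        index.addField(0, "text", Arrays.asList("dogs", "bark"));
        index.addField(0, "text_pos", Arrays.asList("noun", "verb"));
        // Document 1: "run fast"
        index.addField(1, "text", Arrays.asList("run", "fast"));
        index.addField(1, "text_pos", Arrays.asList("verb", "adverb"));

        System.out.println(index.search("text_pos", "noun")); // [0]
        System.out.println(index.search("text_pos", "verb")); // [0, 1]
    }
}
```

Keeping the tags in their own field means the word field and its statistics are left exactly as they would be without tagging.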


Kuro


On 3/14/12 9:37 AM, Mark McGuire wrote:

I'm working on a project where I need to tag both the part of speech and other syntactic information on tokens so that this information is searchable. I have read the threads on the mailing list regarding part-of-speech tagging here <http://mail-archives.apache.org/mod_mbox/lucene-java-user/201105.mbox/%3cbanlktimwqcq_gf2pxe8hyc_r75ncwdr...@mail.gmail.com%3E> and the many responses to similar questions. To me, inserting 0-increment tokens seems rather clunky, especially when TypeAttributes appear to be what one would want to use. Does Lucene do anything extra when the Type is set to, or not set to, its default, "word"? Is it possible to write a search that uses multiple attributes from TokenAttributes (i.e., a search for the CharTermAttribute "dog" followed by a TypeAttribute of verb)?
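For comparison, the 0-increment approach mentioned above can be modeled in a few lines of plain Java (again, not Lucene code). Each tag is emitted as an extra token with a position increment of 0, so it lands on the same position as the word it labels, and "dog followed by a verb" becomes a two-term phrase match on "dog" and the tag term at the next position. The "_" prefix on tag terms and all class and method names are illustrative assumptions. In Lucene, the same effect would presumably come from a TokenFilter that sets PositionIncrementAttribute to 0 for the tag token, combined with a PhraseQuery over the two terms.

```java
import java.util.*;

/**
 * Toy model (not Lucene code) of stacking POS tags as extra tokens with a
 * position increment of 0, so each tag shares its word's position.
 */
public class StackedTagSketch {
    static final class Token {
        final String term;
        final int posInc; // 1 = advance to next position, 0 = same position as previous token
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Resolve position increments into absolute positions (first token -> 0). */
    static Map<Integer, Set<String>> positions(List<Token> stream) {
        Map<Integer, Set<String>> byPos = new TreeMap<>();
        int pos = -1;
        for (Token t : stream) {
            pos += t.posInc;
            byPos.computeIfAbsent(pos, p -> new HashSet<>()).add(t.term);
        }
        return byPos;
    }

    /** True if some position holds `first` and the following position holds `second`. */
    static boolean phraseMatch(Map<Integer, Set<String>> byPos, String first, String second) {
        for (Map.Entry<Integer, Set<String>> e : byPos.entrySet()) {
            Set<String> next = byPos.get(e.getKey() + 1);
            if (e.getValue().contains(first) && next != null && next.contains(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "the dog barks", each word followed by its tag at increment 0
        List<Token> stream = Arrays.asList(
            new Token("the", 1), new Token("_det", 0),
            new Token("dog", 1), new Token("_noun", 0),
            new Token("barks", 1), new Token("_verb", 0));
        Map<Integer, Set<String>> byPos = positions(stream);
        System.out.println(phraseMatch(byPos, "dog", "_verb")); // true
        System.out.println(phraseMatch(byPos, "the", "_verb")); // false
    }
}
```

The sketch shows why the stacked-token approach is often suggested: once the tag is an indexed term at the word's position, an ordinary positional query can combine words and tags.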
Also, if I were to use 0-increment tokens for tagging, would data like document length or sumTotalTermFreq be different from a document indexed without these tags? How would I counteract these differences, if any occur?
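The counting question can be sketched with toy arithmetic over position increments. This assumes (a) the length used for normalization either counts every token or, when overlaps are discounted, skips tokens with an increment of 0, which is what the discountOverlaps option on Lucene's DefaultSimilarity is intended to do, and (b) sumTotalTermFreq counts every indexed occurrence in the field, tag terms included. The method names are hypothetical and the numbers come only from the example arrays.

```java
/**
 * Toy arithmetic (not Lucene code) showing how tag tokens stacked at
 * position increment 0 affect two per-field counts.
 */
public class OverlapCountSketch {

    /** Field length: all tokens, or only tokens with increment > 0 when overlaps are discounted. */
    static int fieldLength(int[] posIncs, boolean discountOverlaps) {
        int length = 0;
        for (int inc : posIncs) {
            if (!discountOverlaps || inc > 0) length++;
        }
        return length;
    }

    /** Total token occurrences in one field across all docs; tag tokens are terms too, so they count. */
    static long sumTotalTermFreq(int[][] docs) {
        long sum = 0;
        for (int[] doc : docs) sum += doc.length;
        return sum;
    }

    public static void main(String[] args) {
        int[] tagged = {1, 0, 1, 0, 1, 0}; // "the dog barks", one tag stacked on each word
        int[] plain  = {1, 1, 1};          // the same text indexed without tags

        System.out.println(fieldLength(tagged, false)); // 6
        System.out.println(fieldLength(tagged, true));  // 3
        System.out.println(fieldLength(plain, true));   // 3
        System.out.println(sumTotalTermFreq(new int[][] {tagged})); // 6
        System.out.println(sumTotalTermFreq(new int[][] {plain}));  // 3
    }
}
```

Under these assumptions, discounting overlaps brings the tagged document's length back in line with the untagged one, while the field's sumTotalTermFreq still grows with the added tag terms.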
Thanks,
Mark McGuire


