get wordno, lineno, pageno for term/phrase

arun r Tue, 03 Aug 2010 07:58:51 -0700

hi all,
I am new to Lucene. I am trying to use Lucene to generate
data for a document classifier, and I need the wordno, lineno and
pageno for each term/phrase. I was able to use SpanQuery/SpanNearQuery
to get the wordno (span.start()) for a term/phrase. Does getting the
pageno and lineno require writing a custom Analyzer? Can an Analyzer
be made to recognize newline and page feed (form feed) characters and
keep track of the lineno and pageno for each token?


Is this possible with an existing Lucene Analyzer?
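One possible alternative to a custom Analyzer, sketched here as an assumption rather than a confirmed answer: keep the original text and map each token's character start offset (for example the value of startOffset() on the OffsetAttribute that Lucene tokenizers populate) to a page and line number after the fact. The class LinePageLocator and its methods are hypothetical names for illustration. It assumes lines end with '\n', pages are separated by the form feed character '\f', and line numbers restart at 1 on each page.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper: maps a character offset in the original text to a
// 1-based page number and a 1-based line number within that page.
public class LinePageLocator {
    private final int[] newlines;   // offsets of '\n' characters, ascending
    private final int[] formFeeds;  // offsets of '\f' characters, ascending

    public LinePageLocator(String text) {
        newlines = positionsOf(text, '\n');
        formFeeds = positionsOf(text, '\f');
    }

    // Collects every index at which the target character occurs.
    private static int[] positionsOf(String text, char target) {
        return IntStream.range(0, text.length())
                .filter(i -> text.charAt(i) == target)
                .toArray();
    }

    // Number of entries in a sorted, duplicate-free array that are < offset.
    private static int countBefore(int[] sorted, int offset) {
        int r = Arrays.binarySearch(sorted, offset);
        return r >= 0 ? r : -r - 1;
    }

    // Page = 1 + number of form feeds before the offset.
    public int pageOf(int offset) {
        return countBefore(formFeeds, offset) + 1;
    }

    // Line within the page = 1 + newlines between the page start and the offset.
    public int lineOf(int offset) {
        int page = pageOf(offset);
        int pageStart = page == 1 ? 0 : formFeeds[page - 2] + 1;
        return countBefore(newlines, offset) - countBefore(newlines, pageStart) + 1;
    }

    public static void main(String[] args) {
        String text = "alpha beta\ngamma\fdelta\nepsilon";
        LinePageLocator loc = new LinePageLocator(text);
        int off = text.indexOf("epsilon");  // stands in for a token's start offset
        System.out.println(loc.pageOf(off) + " " + loc.lineOf(off));  // prints "2 2"
    }
}
```

Binary search over the precomputed newline and form-feed positions keeps each lookup at O(log n), so it stays cheap even when called for every token. Note that span.start() from a SpanQuery gives a token position, not a character offset, so this sketch only helps if the character offsets are also available (for example by re-analyzing the text or storing offsets).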

Thanks,
Arun

-- 
Where there is a will, there is a way !