Re: get wordno, lineno, pageno for term/phrase

2010-08-07 Thread arun r
I tried indexing each page as a separate document, but if a phrase is spread across two pages, the span search does not capture it. Is there a workaround for this? On Sat, Aug 7, 2010 at 8:00 PM, Babak Farhang wrote: > How about making each line a separate document? You'd worry about > scaling it late

Re: get wordno, lineno, pageno for term/phrase

2010-08-06 Thread arun r
I am trying to create a custom analyzer that checks for page breaks and line breaks and adds payload data to each term. In the custom filter I have this code: public boolean incrementToken() throws IOException { if(input.incrementToken()) {
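The bookkeeping such a filter needs can be sketched without any Lucene classes. The example below is only an illustration, not the poster's code: the names `TokenPositions`, `Position`, and `locate` are hypothetical, and it assumes that a form feed (`\f`) marks a page break, that a newline marks a line break, and that terms are runs of letters or digits. Word numbers are zero-based so that they line up with what `span.start()` reports. In a real token filter the same three counters would presumably be encoded into each term's payload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks raw text and records where each term occurs.
final class TokenPositions {

    // One term plus the counters in effect when it was read.
    static final class Position {
        final String term;
        final int wordNo; // zero-based, counted across the whole text
        final int lineNo; // one-based, reset at every page break
        final int pageNo; // one-based

        Position(String term, int wordNo, int lineNo, int pageNo) {
            this.term = term;
            this.wordNo = wordNo;
            this.lineNo = lineNo;
            this.pageNo = pageNo;
        }
    }

    // Assumption: '\f' is a page break and '\n' is a line break.
    static List<Position> locate(String text) {
        List<Position> out = new ArrayList<>();
        int wordNo = 0;
        int lineNo = 1;
        int pageNo = 1;
        StringBuilder term = new StringBuilder();

        for (int i = 0; i <= text.length(); i++) {
            // A trailing newline acts as a sentinel that flushes the last term.
            char c = i < text.length() ? text.charAt(i) : '\n';

            if (Character.isLetterOrDigit(c)) {
                term.append(c);
                continue;
            }
            // Any other character ends the current term, if there is one.
            if (term.length() > 0) {
                out.add(new Position(term.toString(), wordNo, lineNo, pageNo));
                wordNo++;
                term.setLength(0);
            }
            if (c == '\f') {
                pageNo++;
                lineNo = 1; // line numbering restarts on each page
            } else if (c == '\n') {
                lineNo++;
            }
        }
        return out;
    }
}
```

Because the word counter keeps running across page breaks, a phrase that straddles two pages still occupies consecutive word numbers, while each of its terms keeps its own page and line number.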

Re: get wordno, lineno, pageno for term/phrase

2010-08-04 Thread arun r
>>> No, you can't do this with any existing analyzers I know of. Part
>>> of the problem here is that there's no good generic way to KNOW
>>> what a page and line are.
>>>
>>> Have you investigated payloads? I'm not sure that's a good fit for

get wordno, lineno, pageno for term/phrase

2010-08-03 Thread arun r
Hi all, I am new to Lucene. I am trying to use Lucene to generate data for a document classifier. I need to generate wordno, lineno, and pageno for each term/phrase. I was able to use SpanQuery/SpanNearQuery to get the wordno (span.start()) for the term/phrase. To get pageno and lineno, a c