I tried indexing each page as a separate document, but if a phrase is
spread across two pages, the span search does not capture it. Is there
a workaround for this?
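
One possible workaround, sketched here as an assumption rather than something proposed in this thread: index the whole text as a single document so that spans can cross page boundaries, and keep a side table of the word position at which each page starts. A span's start position can then be mapped back to a page with a binary search. `PageLookup`, `pageOf` and `crossesPageBreak` are hypothetical names, the sketch uses only the JDK, and it assumes the span end is exclusive and that index positions line up with the word positions recorded in the table.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: maps a word position in a
// whole-document index back to a 1-based page number.
final class PageLookup {
    // pageStarts[i] is the position of the first word on page i + 1,
    // in ascending order; pageStarts[0] is expected to be 0.
    private final int[] pageStarts;

    PageLookup(int[] pageStarts) {
        this.pageStarts = pageStarts.clone();
    }

    // Returns the 1-based page containing the given word position.
    int pageOf(int wordPosition) {
        int idx = Arrays.binarySearch(pageStarts, wordPosition);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the page
        // we want is the one that starts just before the insertion point.
        if (idx < 0) {
            idx = -idx - 2;
        }
        return idx + 1;
    }

    // True when a span [spanStart, spanEnd) begins and ends on different pages.
    boolean crossesPageBreak(int spanStart, int spanEnd) {
        return pageOf(spanStart) != pageOf(spanEnd - 1);
    }

    public static void main(String[] args) {
        // Assumed layout: page 1 starts at word 0, page 2 at 250, page 3 at 520.
        PageLookup pages = new PageLookup(new int[] {0, 250, 520});
        System.out.println("page of word 249: " + pages.pageOf(249));
        System.out.println("span 248..252 crosses a page: "
                + pages.crossesPageBreak(248, 252));
    }
}
```

The same table idea would work for line numbers, with one array of line start positions per document.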
On Sat, Aug 7, 2010 at 8:00 PM, Babak Farhang wrote:
> How about making each line a separate document? You'd worry about
> scaling it late
I am trying to create a custom analyzer that checks for page breaks
and line breaks and adds payload data for each term. In the custom
filter I have this code:
public boolean incrementToken() throws IOException {
    if (input.incrementToken())
    {
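
The filter code above is cut off, so the following is only a stand-alone sketch of the bookkeeping such a filter might do, not the poster's implementation. It does not use the Lucene API: it walks a string, keeps running page, line and word counters, and packs page and line into a byte array of the kind a payload could carry. `PositionTagger`, `Tagged` and `tag` are hypothetical names, and the sketch assumes that a form feed (`\f`) marks a page break, a newline marks a line break, and line numbers restart on each page.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use the Lucene API.
final class PositionTagger {

    // One term with its 0-based word number and 1-based line and page.
    static final class Tagged {
        final String term;
        final int wordNo;
        final int lineNo;
        final int pageNo;

        Tagged(String term, int wordNo, int lineNo, int pageNo) {
            this.term = term;
            this.wordNo = wordNo;
            this.lineNo = lineNo;
            this.pageNo = pageNo;
        }

        // Packs page and line into 8 bytes (page first, then line).
        byte[] payload() {
            return ByteBuffer.allocate(8).putInt(pageNo).putInt(lineNo).array();
        }
    }

    // Assumes '\f' is a page break and '\n' a line break; any other
    // character that is not a letter or digit only separates terms.
    static List<Tagged> tag(String text) {
        List<Tagged> out = new ArrayList<>();
        int page = 1;
        int line = 1;
        int word = 0;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // A trailing sentinel newline flushes the last term.
            char c = i < text.length() ? text.charAt(i) : '\n';
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            if (current.length() > 0) {
                out.add(new Tagged(current.toString(), word++, line, page));
                current.setLength(0);
            }
            if (c == '\f') {
                page++;
                line = 1; // line numbering restarts on a new page
            } else if (c == '\n') {
                line++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tagged t : tag("alpha beta\ngamma\fdelta")) {
            System.out.println(t.term + " word=" + t.wordNo
                    + " line=" + t.lineNo + " page=" + t.pageNo);
        }
    }
}
```

In a real token filter the same counters would presumably be updated inside incrementToken() and the packed bytes attached to each token as its payload; how the break markers survive tokenization depends on the tokenizer in use.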
>>> No, you can't do this with any existing analyzers I know of. Part
>>> of the problem here is that there's no good generic way to KNOW
>>> what a page and line are.
>>>
>>> Have you investigated payloads? I'm not sure that's a good fit for
>>>
Hi all,
I am new to Lucene. I am trying to use Lucene to generate
data for a document classifier. I need to generate wordno, lineno,
pageno for each term/phrase. I was able to use SpanQuery/SpanNearQuery
to get the wordno (span.start()) for the term/phrase. To get pageno
and lineno, a c