Re: Indexing and Hit Highlighting OCR Data

2005-06-03, Corey Keith
With this approach, all work is done at the word level. When we run a phrase query, the results will contain pages with the entire phrase, but when we highlight the document, _all_ occurrences of the phrase's words will be highlighted, regardless of whether they appear as part of the phrase. Is that correct? It would also be
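To illustrate the concern, here is a minimal Python sketch (not from the thread) contrasting word-level highlighting, which marks every token matching any query term, with phrase-aware highlighting, which marks only tokens inside a contiguous match of the whole phrase. The function names and the sample page are hypothetical. Real OCR tokens would need the same normalization the analyzer applies at index time; here that is assumed to be simple lowercasing.

```python
def word_level_hits(tokens, phrase):
    """Indices of every token matching any word of the phrase (word-level highlighting)."""
    terms = {t.lower() for t in phrase}
    return [i for i, tok in enumerate(tokens) if tok.lower() in terms]


def phrase_hits(tokens, phrase):
    """Indices of tokens that belong to a contiguous occurrence of the whole phrase."""
    n = len(phrase)
    wanted = [t.lower() for t in phrase]
    hits = set()
    # Slide a window of the phrase's length across the page's tokens.
    for start in range(len(tokens) - n + 1):
        window = [t.lower() for t in tokens[start:start + n]]
        if window == wanted:
            hits.update(range(start, start + n))
    return sorted(hits)


# Hypothetical page of OCR words and a two-word phrase query.
page = ["The", "New", "York", "fire", "spread", "to", "New", "Jersey", "and", "York"]
query = ["New", "York"]
print(word_level_hits(page, query))  # [1, 2, 6, 9]
print(phrase_hits(page, query))      # [1, 2]
```

The word-level version also lights up the stray "New" and "York" at positions 6 and 9, which is the over-highlighting described above; the phrase-aware version needs token positions, which is why it has to look at neighbouring words rather than each word in isolation.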

Indexing and Hit Highlighting OCR Data

2005-06-02, Corey Keith
Hi, I am involved in a project that is trying to provide searching and hit highlighting on the scanned images of historical newspapers. We have an XML-based OCR format. A sample is below. We need to index the CONTENT attribute of the String element, which is the easy part. We would like to
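The "easy part" mentioned above, pulling the CONTENT attribute out of every String element, might look like the following minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `String` element and `CONTENT` attribute names come from the post; the surrounding `Page` and `TextLine` elements in the sample are hypothetical placeholders, and the function name `string_contents` is illustrative only. The resulting word list would then be passed to whatever indexer is in use.

```python
import xml.etree.ElementTree as ET


def string_contents(xml_text):
    """Return the CONTENT attribute of every String element, in document order."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so both namespaced and plain tags match.
        local = elem.tag.rsplit("}", 1)[-1]
        if local == "String" and "CONTENT" in elem.attrib:
            words.append(elem.attrib["CONTENT"])
    return words


# Hypothetical OCR fragment; only String/CONTENT are taken from the post.
sample = """<Page>
  <TextLine>
    <String CONTENT="Historical"/>
    <String CONTENT="newspapers"/>
  </TextLine>
</Page>"""

print(" ".join(string_contents(sample)))  # Historical newspapers
```

Matching on the local tag name keeps the sketch working if the real format declares an XML namespace, at the cost of not distinguishing String elements from different namespaces.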