With this approach, all work is done at the word level. When we have a phrase
query, the results will contain pages with the entire phrase, but when we go to
highlight the document, _all_ occurrences of the words in the phrase will be
highlighted, regardless of whether they are part of the phrase. Is that
correct? It would also be
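
To make the concern above concrete, here is a minimal sketch of the difference between word-level and phrase-aware highlighting. It is not tied to any particular search library; the function names and the sample tokens are made up for illustration.

```python
from typing import List


def word_level_hits(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return the index of every token that equals any word of the phrase."""
    wanted = {w.lower() for w in phrase}
    return [i for i, t in enumerate(tokens) if t.lower() in wanted]


def phrase_level_hits(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return indices of tokens that belong to a full, in-order phrase match."""
    target = [w.lower() for w in phrase]
    lowered = [t.lower() for t in tokens]
    n = len(target)
    if n == 0:
        return []
    hits = set()
    for start in range(len(lowered) - n + 1):
        # Only mark tokens when the whole phrase matches at this position.
        if lowered[start:start + n] == target:
            hits.update(range(start, start + n))
    return sorted(hits)


# Hypothetical page text, already split into words.
tokens = "the new york times reported a new bridge in york".split()
phrase = ["new", "york"]

print(word_level_hits(tokens, phrase))    # [1, 2, 6, 9]
print(phrase_level_hits(tokens, phrase))  # [1, 2]
```

The word-level version also marks the stray "new" and "york" later in the text, which is the behaviour being asked about.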
Hi,
I am involved in a project that is trying to provide searching and hit
highlighting on the scanned images of historical newspapers. We have an
XML-based OCR format. A sample is below. We need to index the CONTENT attribute
of the String element, which is the easy part. We would like to
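
As a rough illustration of the indexing step, pulling the CONTENT attribute out of each String element could look like the sketch below. Only the String element and its CONTENT attribute come from the message; the XML fragment, the TextLine wrapper, and the HPOS/VPOS/WIDTH/HEIGHT coordinate attributes are assumptions made for the example and are not the poster's actual sample. The sketch also assumes the elements are not namespaced.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR fragment; attribute names other than CONTENT are assumed.
SAMPLE = """<TextLine>
  <String CONTENT="Daily" HPOS="100" VPOS="50" WIDTH="80" HEIGHT="20"/>
  <String CONTENT="Gazette" HPOS="190" VPOS="50" WIDTH="120" HEIGHT="20"/>
</TextLine>"""


def extract_words(xml_text):
    """Return (word, (hpos, vpos, width, height)) for each String element."""
    root = ET.fromstring(xml_text)
    words = []
    for el in root.iter("String"):
        content = el.get("CONTENT")
        if content is None:
            continue  # nothing to index for this element
        # Missing coordinates default to 0.0 in this sketch.
        box = tuple(
            float(el.get(name, "0"))
            for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        words.append((content, box))
    return words


words = extract_words(SAMPLE)
for word, box in words:
    print(word, box)
```

Keeping the bounding box next to each word is one way to map a search hit back to a region of the scanned image later on.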