Thanks, gives me food for thought. So no { N, N+1 } ideas specifically...
---
Ideally you would chunk a document at logical boundaries that will make
sense as units of both search and presentation. For some content, these
boundaries don't align; for example you might want to search for matches
within a paragraph scope, or within a section, chapter, or part of a
book, bu
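
[Editor's sketch] The paragraph-scope chunking described above could look roughly like the following. This is a minimal illustration, not anything from the thread: it assumes plain text in which blank lines separate paragraphs, and the function name chunk_paragraphs is hypothetical. Each chunk keeps its character offsets into the original text, so a hit inside a chunk can still be mapped back to the full document for presentation.

```python
import re


def chunk_paragraphs(text):
    """Split text at blank lines and return (start, end, paragraph) tuples.

    Assumption: paragraphs are separated by at least one blank line.
    start/end are character offsets into the original text.
    """
    chunks = []
    pos = 0
    # The capturing group makes re.split return the separators as well,
    # so summing the lengths of all parts keeps the running offset exact.
    for i, part in enumerate(re.split(r"(\n\s*\n)", text)):
        is_separator = (i % 2 == 1)
        if not is_separator and part.strip():
            chunks.append((pos, pos + len(part), part))
        pos += len(part)
    return chunks


if __name__ == "__main__":
    sample = "First para.\nstill first.\n\nSecond para."
    for start, end, para in chunk_paragraphs(sample):
        print(start, end, repr(para))
```

Each tuple could then be indexed as its own unit of search, with the offsets stored alongside it. Whether paragraph, section, or chapter is the right unit depends on the content, as the message notes.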
On 2/4/2014 2:50 PM, Earl Hood wrote:
On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote:
You might be interested in looking at Lux, which layers XML services like
XQuery on top of Lucene and Solr, and includes an XML-aware highlighter:
https://github.com/msokolov/lux/blob/master/src/main/ja
Hi,
This question may well be very familiar to experienced Lucene people... in
which case all I need is to be pointed somewhere. I am new.
If you have a large document, e.g. a large Word file, and you want to split
it into text, e.g. by using Apache POI, what techniques are best used?
It seem
On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote:
> You might be interested in looking at Lux, which layers XML services like
> XQuery on top of Lucene and Solr, and includes an XML-aware highlighter:
> https://github.com/msokolov/lux/blob/master/src/main/java/lux/search/highlight/XmlHighligh
On 2/4/14 12:16 PM, Earl Hood wrote:
On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote:
I'm trying to find a precise and reasonably efficient way to highlight
all occurrences of terms in the query, only highlighting fields which
...
[snip]
I am in a similar situation with a web-based applica
On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote:
> I'm trying to find a precise and reasonably efficient way to highlight
> all occurrences of terms in the query, only highlighting fields which
> match the corresponding fields used in the query. This seems like it
> would be a fairly common require
This will be of no immediate help, but in the next iteration of LUCENE-5317,
which I'll post in a few weeks (if I can find the time), I'll have an option to
pull concordance windows from character offsets which can be stored at index
time (so you wouldn't have to re-analyze). The current versio