Re: question about using lucene on large documents

2014-02-04 Thread mrodent
Thanks, gives me food for thought. So no { N, N+1 } ideas specifically...

Re: question about using lucene on large documents

2014-02-04 Thread Michael Sokolov
Ideally you would chunk a document at logical boundaries that will make sense as units of both search and presentation. For some content, these boundaries don't align; for example you might want to search for matches within a paragraph scope, or within a section, chapter, or part of a book, bu
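
As an illustration of chunking at a logical boundary, here is a minimal sketch in plain Java that splits extracted text into paragraphs at blank lines. It is not taken from the thread: the class name ParagraphChunker and the method chunkByParagraph are hypothetical, and it assumes the text has already been extracted (for example by Apache POI) with blank lines between paragraphs. Each returned chunk could then be indexed as its own unit of search; that step is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene or POI.
final class ParagraphChunker {

    private ParagraphChunker() {}

    /**
     * Splits text into paragraphs, treating one or more blank lines
     * as the boundary. Assumes paragraphs are separated by blank lines.
     */
    static List<String> chunkByParagraph(String text) {
        List<String> chunks = new ArrayList<>();
        // \R matches any line break; \R\s*\R matches a blank-line gap.
        for (String part : text.split("\\R\\s*\\R")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                chunks.add(trimmed); // skip empty pieces from leading gaps
            }
        }
        return chunks;
    }
}
```

Paragraphs are only one possible boundary. Where search scope and presentation scope differ, as the message notes, the same text may need to be chunked at more than one level (section, chapter), which this sketch does not attempt.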

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Michael Sokolov
On 2/4/2014 2:50 PM, Earl Hood wrote: On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote: You might be interested in looking at Lux, which layers XML services like XQuery on top of Lucene and Solr, and includes an XML-aware highlighter: https://github.com/msokolov/lux/blob/master/src/main/ja

question about using lucene on large documents

2014-02-04 Thread mrodent
Hi, This question may well be very familiar to experienced Lucene people... in which case all I need is to be pointed somewhere. I am new. If you have a large document, e.g. a large Word file, and you want to split it into text, e.g. by using Apache POI, what techniques are best used? It seem

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Earl Hood
On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote: > You might be interested in looking at Lux, which layers XML services like > XQuery on top of Lucene and Solr, and includes an XML-aware highlighter: > https://github.com/msokolov/lux/blob/master/src/main/java/lux/search/highlight/XmlHighligh

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Michael Sokolov
On 2/4/14 12:16 PM, Earl Hood wrote: On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote: I'm trying to find a precise and reasonably efficient way to highlight all occurrences of terms in the query, only highlighting fields which ... [snip] I am in a similar situation with a web-based applica

Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Earl Hood
On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote: > I'm trying to find a precise and reasonably efficient way to highlight > all occurrences of terms in the query, only highlighting fields which > match the corresponding fields used in the query. This seems like it > would be a fairly common require

RE: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-04 Thread Allison, Timothy B.
This will be of no immediate help, but in the next iteration of LUCENE-5317, which I'll post in a few weeks (if I can find the time), I'll have an option to pull concordance windows from character offsets which can be stored at index time (so you wouldn't have to re-analyze). The current versio
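
To make the idea of a concordance window built from character offsets concrete, here is a minimal sketch in plain Java. It is not the LUCENE-5317 code and uses no Lucene API: the class ConcordanceWindow, the method window, and the bracket markers are all hypothetical. It assumes the start and end character offsets of a hit are already known (for example because they were stored at index time), so the text does not need to be re-analyzed to find the match.

```java
// Hypothetical helper, for illustration only; not the LUCENE-5317 implementation.
final class ConcordanceWindow {

    private ConcordanceWindow() {}

    /**
     * Returns the hit at [start, end) wrapped in brackets, with up to
     * `context` characters of surrounding text on each side.
     */
    static String window(String text, int start, int end, int context) {
        if (start < 0 || end > text.length() || start > end || context < 0) {
            throw new IllegalArgumentException("offsets out of range");
        }
        // Clamp the window to the bounds of the text.
        int from = Math.max(0, start - context);
        int to = Math.min(text.length(), end + context);
        return text.substring(from, start)
                + "[" + text.substring(start, end) + "]"
                + text.substring(end, to);
    }
}
```

For the text "the quick brown fox jumps" with a hit at offsets 10 to 15 and a context of 6 characters, window returns "quick [brown] fox j". A real implementation would probably cut the window at token boundaries instead of raw character counts.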