Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Robert Muir
On Tue, Oct 15, 2013 at 10:57 AM, Michael McCandless wrote:
> On Tue, Oct 15, 2013 at 10:11 AM, Robert Muir wrote:
>> On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless wrote:
>>> Well, unfortunately, this is a trap that users do hit.
>>>
>>> By requiring the user to think about the limit on

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Michael McCandless
On Tue, Oct 15, 2013 at 10:11 AM, Robert Muir wrote:
> On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless wrote:
>> Well, unfortunately, this is a trap that users do hit.
>>
>> By requiring the user to think about the limit on creating
>> PostingsHighlighter, he/she would think about it and re

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Jon Stewart
I'm very grateful for the assistance. It'd be great to know the value of DEFAULT_MAX_LENGTH in the documentation. I know the majority of applications care more about precision than recall... but I know of a lot of people using Lucene for high recall applications, too. Working in high recall domains

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Robert Muir
On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless wrote:
> Well, unfortunately, this is a trap that users do hit.
>
> By requiring the user to think about the limit on creating
> PostingsHighlighter, he/she would think about it and realize they are
> in fact setting a limit.
>
> Silent limits ar

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Michael McCandless
Well, unfortunately, this is a trap that users do hit. By requiring the user to think about the limit on creating PostingsHighlighter, he/she would think about it and realize they are in fact setting a limit. Silent limits are dangerous because you don't offhand know what's wrong / why you see no

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Robert Muir
I strongly disagree: there is no trap, it's a reasonable default for good summarization, and the behavior is no different than the other highlighters here. Typically people *do* care about performance, and it's important to have a clean, simple API too. In my opinion increasing this limit is very eso

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Michael McCandless
Maybe we should make the max length a required argument to PostingsHighlighter ctor? Because it's trappy now, since you don't realize offhand that it's silently enforcing a limit ...

Mike McCandless
http://blog.mikemccandless.com

On Tue, Oct 15, 2013 at 9:31 AM, Robert Muir wrote:
> Thanks Jo

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Robert Muir
Thanks Jon. I'll add some stuff to the javadocs here to try to make it more obvious.

On Tue, Oct 15, 2013 at 5:54 AM, Jon Stewart wrote:
> Awesome, that did it! I didn't realize that DEFAULT_MAX_LENGTH was
> only 10,000. I've now upped it to 16MB (I'm not doing the usual thing
> and performance is

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-15 Thread Jon Stewart
Awesome, that did it! I didn't realize that DEFAULT_MAX_LENGTH was only 10,000. I've now upped it to 16MB (I'm not doing the usual thing and performance is not a particular concern).

Thanks,
Jon

On Mon, Oct 14, 2013 at 9:58 PM, Robert Muir wrote:
> are your documents large?
>
> try PostingsHig

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-14 Thread Robert Muir
are your documents large?

try PostingsHighlighter(int) ctor with a larger value than DEFAULT_MAX_LENGTH.

sounds like the passages you see with matches are very deep into the document and it's just hitting the default limit and returning the default summarization (getEmptyHighlight())

otherwise, p
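
A toy model of the effect described in this message may help. It is not Lucene code and not how PostingsHighlighter is implemented: the class MaxLengthSketch and the method matchOffsets are hypothetical names, and the only assumption is that content past the configured maximum length is never examined. The 10,000 figure is the DEFAULT_MAX_LENGTH value reported elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce Lucene's API.
class MaxLengthSketch {
    // Value of DEFAULT_MAX_LENGTH as reported in this thread.
    static final int DEFAULT_MAX_LENGTH = 10_000;

    /** Returns the start offsets of term found within the first maxLength characters of content. */
    static List<Integer> matchOffsets(String content, String term, int maxLength) {
        List<Integer> offsets = new ArrayList<>();
        if (term.isEmpty() || maxLength <= 0) {
            return offsets; // nothing to search for, or nothing to search in
        }
        // Assumption: anything past maxLength is invisible to the highlighter.
        String window = content.substring(0, Math.min(content.length(), maxLength));
        int from = 0;
        while (true) {
            int idx = window.indexOf(term, from);
            if (idx < 0) {
                break;
            }
            offsets.add(idx);
            from = idx + term.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Build a document whose only match sits past the 10,000-character mark.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12_000; i++) {
            sb.append('x');
        }
        sb.append(" needle");
        String doc = sb.toString();

        // With the default limit the match is never seen.
        System.out.println(matchOffsets(doc, "needle", DEFAULT_MAX_LENGTH).size()); // prints 0
        // With a 16MB limit, as tried later in the thread, the match is found.
        System.out.println(matchOffsets(doc, "needle", 16 * 1024 * 1024).size());   // prints 1
    }
}
```

In the real library the equivalent change is the one suggested above: construct the highlighter with the PostingsHighlighter(int) ctor and pass a value larger than DEFAULT_MAX_LENGTH.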

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-14 Thread Jon Stewart
I upgraded to 4.5. Same results, unfortunately. Most docs in the result set will have a Passage where numMatches() > 0, but some do not. In these cases, the Passage array's length is greater than zero.

Jon

On Mon, Oct 14, 2013 at 5:24 PM, Robert Muir wrote:
> did you try the latest release? Th

Re: PostingsHighlighter/PassageFormatter has zero matches for some results

2013-10-14 Thread Robert Muir
did you try the latest release? There are some bugs fixed...

On Mon, Oct 14, 2013 at 2:11 PM, Jon Stewart wrote:
> Hello,
>
> I've observed that when using PostingsHighlighter in Lucene 4.4 that
> some of the responsive documents in TopDocs will have zero matches in
> the associated array of Pass