Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
On 31/10/2011 21:42, Petite Abeille wrote: On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: similarity-preserving hash function was calculated on each sentence, and the hash was added as a field. The property of the hash was that similar documents (sentences) would produce a similar hash

Re: idf calculation in Lucene ?

2011-10-31 Thread Robert Muir
Yes: override the method idfExplain(java.util.Collection, org.apache.lucene.search.Searcher). On Mon, Oct 31, 2011 at 5:24 PM, David Ryan wrote: > Thanks! Is there any way to extend the Similarity class to overwrite the > behavior (e.g., using the max idf instead of the sum of each term idfs)?
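
To make the difference between the two aggregations concrete, here is a minimal standalone sketch. It does not use Lucene classes. The per-term formula 1 + ln(numDocs / (docFreq + 1)) mirrors what DefaultSimilarity uses in Lucene 3.x, while the class name, the helper names and the document frequencies are hypothetical and chosen only for illustration. Inside a real Similarity subclass, the same max-versus-sum choice would presumably be made in the overridden idfExplain.

```java
class MaxIdfSketch {
    // Per-term idf in the shape used by Lucene 3.x DefaultSimilarity:
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Default phrase behavior described in the thread: sum of the term idfs.
    static double sumIdf(int[] docFreqs, int numDocs) {
        double sum = 0.0;
        for (int df : docFreqs) {
            sum += idf(df, numDocs);
        }
        return sum;
    }

    // Alternative asked about in the thread: take the largest term idf.
    static double maxIdf(int[] docFreqs, int numDocs) {
        double max = 0.0;
        for (int df : docFreqs) {
            max = Math.max(max, idf(df, numDocs));
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: two terms in a 1000-document index.
        int[] docFreqs = {9, 99};
        int numDocs = 1000;
        System.out.println("sum idf = " + sumIdf(docFreqs, numDocs));
        System.out.println("max idf = " + maxIdf(docFreqs, numDocs));
    }
}
```

With the max variant, a phrase made of one rare and one common term is weighted by its rare term alone, so adding common terms to a phrase no longer raises its idf.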

Re: idf calculation in Lucene ?

2011-10-31 Thread David Ryan
Thanks! Is there any way to extend the Similarity class to overwrite the behavior (e.g., using the max idf instead of the sum of each term idfs)? On Thu, Oct 27, 2011 at 5:41 AM, Robert Muir wrote: > On Thu, Oct 20, 2011 at 3:11 PM, David Ryan wrote: > > > > > However, in some case, when I

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Petite Abeille
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: > similarity-preserving hash function was calculated on each sentence, and the > hash was added as a field. The property of the hash was that similar > documents (sentences) would produce a similar hash, with only some bit-level > perturbati
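
The message does not say which similarity-preserving hash was used. As an illustration only, the sketch below uses SimHash, one well-known member of that family: each token casts a vote on every bit of a 64-bit fingerprint, so changing a few tokens tends to flip only a few bits. The class name, the FNV-1a token hash and the tokenization are assumptions, and indexing the fingerprint as a Lucene field is not shown.

```java
import java.nio.charset.StandardCharsets;

class SimHashSketch {
    // 64-bit FNV-1a hash of one token; any well-mixed 64-bit hash would do.
    static long fnv1a64(String token) {
        long h = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // SimHash: every token votes +1 or -1 on each bit position;
    // the fingerprint keeps the bits with a positive vote total.
    static long simHash(String sentence) {
        int[] votes = new int[64];
        for (String token : sentence.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= (1L << bit);
            }
        }
        return fingerprint;
    }

    // Number of differing bits between two fingerprints.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simHash("Lucene can index sentences with a similarity hash");
        long b = simHash("Lucene can index sentences using a similarity hash");
        long c = simHash("Completely unrelated text about cooking pasta at home");
        System.out.println("near-duplicate distance: " + hammingDistance(a, b));
        System.out.println("unrelated distance:      " + hammingDistance(a, c));
    }
}
```

Near-duplicate sentences would then be candidates whose fingerprints differ in only a few bits, which is the "bit-level perturbation" property the message describes.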

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Andrzej Bialecki
On 22/10/2011 11:11, Grant Ingersoll wrote: Hi All, I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." (http://na11.apachecon.com/talks/18396). It's based on my observation, that over the years, a number of us in the community have done some pretty cool things using Luc

Re: multiple phrase search for topic

2011-10-31 Thread Ian Lea
Nice not to have to worry about performance. You say there is another question, but not what it is. The code you show looks like it should do what you want. For anything non-trivial I prefer to build the queries directly in code rather than concatenating strings to be parsed, because I find it h

Re: multiple phrase search for topic

2011-10-31 Thread deb.lucene
Thanks, Ian, for your response. This is a one-time offline program, so I am not bothered about performance (i.e., speed). One more question: there are some situations where I need to run an AND clause (i.e., more than one phrase, such as "Apple" AND "Steve Jobs"). My approach was something like:

Re: IndexReader#reopen() on externally changed index

2011-10-31 Thread Michael McCandless
That's a good idea, if your index is "large enough", and/or you make heavy use of FieldCache (eg, sorting by field), regardless of whether you use NRT or "normal" commit + reopen to reopen your reader. Mike McCandless http://blog.mikemccandless.com On Sun, Oct 30, 2011 at 7:36 PM, Denis Bazhenov

Re: index bigger than it should be?

2011-10-31 Thread Ian Lea
Do the individual docs get bigger after 28 million? Can you try loading the last few million docs, from when the size jumps, and see what happens? Or load them in reverse order or something, again to see what happens? I don't have indexes with that many docs, but I believe that plenty of people

Re: Weighted Query Sequence

2011-10-31 Thread Ian Lea
Sounds custom made for boosting. Depending on how you are structuring your fields and queries you could use either index or query time boosts, or even both. http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F -- Ian.