Re: SweetSpotSimilarity

2012-03-06 Thread Robert Muir
On Tue, Mar 6, 2012 at 5:57 PM, Paul Taylor wrote: >> Hello, >> >> what was previously Similarity in older releases has moved to >> TFIDFSimilarity: it extends Similarity and exposes a vector-space API, >> with the same formulas in the javadocs: >> >> https://builds.apache.org/view/G-L/view/Lucene/j

Re: SweetSpotSimilarity

2012-03-06 Thread Paul Taylor
On 05/03/2012 23:24, Robert Muir wrote: On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote: I would definitely not suggest using SSS for fields like legal brief text or emails where there is huge variability in the length of the content -- i can't think of any context where a "short" email is de

Re: SweetSpotSimilarity

2012-03-06 Thread Paul Taylor
On 05/03/2012 19:26, Chris Hostetter wrote: : very small to occasionally very large. It also might be the case that : cover letters and e-mails while short might not be really something to : heavily discount. The lower discount range can be ignored by setting : the min of any sweet spot to 1.

Re: SweetSpotSimilarity

2012-03-05 Thread Robert Muir
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote: >> I would definitely not suggest using SSS for fields like legal brief text or >> emails where there is huge >> variability in the length of the content -- i can't think of any context >> where a "short" email is >> definitively better/worse than

RE: SweetSpotSimilarity

2012-03-05 Thread Paul Hill
> -Original Message- > My only thought is that the new stuff seems to be at the expense of the > formulas listed in the old > class overview for Similarity. > http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/Similarity.html Oops, my bad

RE: SweetSpotSimilarity

2012-03-05 Thread Paul Hill
> I would definitely not suggest using SSS for fields like legal brief text or > emails where there is huge > variability in the length of the content -- i can't think of any context > where a "short" email is > definitively better/worse than a "long" email. more traditional TF/IDF seems > like

RE: SweetSpotSimilarity

2012-03-05 Thread Chris Hostetter
: very small to occasionally very large. It also might be the case that : cover letters and e-mails while short might not be really something to : heavily discount. The lower discount range can be ignored by setting : the min of any sweet spot to 1. Then one starts to wonder if there is : r
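The remark about setting the min of a sweet spot to 1 can be illustrated with a small sketch of a plateau-shaped length norm. It assumes the shape 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), which is how I recall the SweetSpotSimilarity javadocs describing it; the function name and the sample values below are hypothetical and not taken from this thread.

```python
import math


def sweet_spot_length_norm(num_terms, ln_min, ln_max, steepness):
    """Plateau-shaped length norm (assumed formula, see lead-in).

    Returns 1.0 for any length inside [ln_min, ln_max] and decays
    on either side at a rate controlled by `steepness`.
    """
    # Inside the sweet spot the two absolute distances sum to
    # (ln_max - ln_min), so `excess` is zero and the norm is 1.
    excess = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * excess + 1.0)


# Hypothetical sweet spot of 1..100 terms: with min=1 no real document
# is shorter than the plateau, so short documents are not discounted.
short_doc = sweet_spot_length_norm(1, 1, 100, 0.5)
mid_doc = sweet_spot_length_norm(50, 1, 100, 0.5)
long_doc = sweet_spot_length_norm(102, 1, 100, 0.5)
```

With these values only documents longer than 100 terms are discounted, which is the effect the quoted text describes for a min of 1.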

RE: SweetSpotSimilarity

2012-03-01 Thread Paul Hill
;thetamin4=0&thetamax0=2pi&thetamax1=2pi&thetamax2=2pi&thetamax3=2pi&thetamax4=2pi&ipw=1&ixmin=-50&ixmax=150&iymin=-0.5&iymax=1.5&igx=10&igy=0.25&igl=1&igs=1&iax=0&ila=1&xmin=-50&xmax=150&ymin=-0.5&ymax=1.5 It is hard

RE: SweetSpotSimilarity

2012-02-28 Thread Chris Hostetter
: i'll try to get some graphs committed and linked to from the javadocs that : make it more clear how tweaking the settings affects the formula http://svn.apache.org/viewvc?rev=1294920&view=rev -Hoss

RE: SweetSpotSimilarity

2012-02-28 Thread Chris Hostetter
: A picture -- or more precisely a graph -- would be worth a 1000 words. fair enough. I think the reason i never committed one initially was because the formula in the javadocs was trivial to plot in gnuplot... gnuplot> min=0 gnuplot> max=2 gnuplot> base=1.3 gnuplot> xoffset=10 gnuplot> set
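For readers without gnuplot, the same curve can be evaluated with a short sketch. It reuses the values quoted above (min=0, max=2, base=1.3, xoffset=10) and assumes the hyperbolic tf formula as I recall it from the SweetSpotSimilarity javadocs: min + (max - min)/2 * ((base^(x - xoffset) - base^-(x - xoffset)) / (base^(x - xoffset) + base^-(x - xoffset)) + 1). The Python function name is hypothetical, and the zero-frequency special case is also an assumption.

```python
import math


def hyperbolic_tf(freq, tf_min=0.0, tf_max=2.0, base=1.3, xoffset=10.0):
    """Hyperbolic tf curve (assumed formula, see lead-in).

    The ratio (b^x - b^-x) / (b^x + b^-x) equals tanh(x * ln b), so
    math.tanh is used here to avoid overflow for large frequencies.
    """
    if freq == 0:
        # Assumption: a term that does not occur contributes nothing.
        return 0.0
    x = freq - xoffset
    return tf_min + (tf_max - tf_min) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)


# Sample the curve at a few term frequencies.
curve = [(freq, hyperbolic_tf(freq)) for freq in (1, 5, 10, 15, 20, 50)]
for freq, value in curve:
    print(f"freq={freq:3d}  tf={value:.4f}")
```

The curve passes through the midpoint of min and max at freq = xoffset and flattens out toward max, which is the hard ceiling on tf regardless of how often the term occurs.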

RE: SweetSpotSimilarity

2012-02-17 Thread Paul Allan Hill
of these (or any other) tf functions, general discussion of document collection sweet spot, and any insight into parameters of this class (hyperbolic tangent or otherwise)? -Paul > > : > And I am aware that SweetSpotSimilarity resulted from this paper > : > > : > http:

RE: SweetSpotSimilarity

2012-02-15 Thread Chris Hostetter
s a convenience for subclasses that want to use a hyperbolic tf function." As for what hyperbolicTf is trying to do ... it creates a hyperbolic function letting you specify a hard max no matter how many terms there are. : > And I am aware that SweetSpotSimilarity resulted from this pa

RE: SweetSpotSimilarity

2012-02-15 Thread Paul Allan Hill
ginal Message- > From: Peyman Faratin [mailto:pey...@robustlinks.com] > Sent: Wednesday, February 15, 2012 6:40 AM > To: java-user@lucene.apache.org > Subject: SweetSpotSimilarity > > Hi > > I have a noobie question. I am trying to use the SweetSpotSimilarity (SSS) > c

SweetSpotSimilarity

2012-02-15 Thread Peyman Faratin
Hi I have a noobie question. I am trying to use the SweetSpotSimilarity (SSS) class. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html I understand the scoring behavior of Lucene http://lucene.apache.org/core

Re: SweetSpotSimilarity

2011-07-21 Thread Ian Lea
engths. Then we started to use SweetSpotSimilarity and it seemed to > solve the problem. But now we face another difficulty: it's hard to set the > correct parameters for SweetSpotSimilarity. For example we want the title of > a page to always have a higher boost than its content no matt

SweetSpotSimilarity

2011-07-20 Thread Tajti Ákos
Dear List, in our application there are many long documents that we index. Previously we had a problem with Lucene's scoring: some documents got low scores because of their lengths. Then we started to use SweetSpotSimilarity and it seemed to solve the problem. But now we face another diffi