On Tue, Mar 6, 2012 at 5:57 PM, Paul Taylor wrote:
>> Hello,
>>
>> what was previously Similarity in older releases has moved to
>> TFIDFSimilarity: it extends Similarity and exposes a vector-space API,
>> with the same formulas in the javadocs:
>>
>> https://builds.apache.org/view/G-L/view/Lucene/j
On 05/03/2012 23:24, Robert Muir wrote:
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote:
I would definitely not suggest using SSS for fields like legal brief text or
emails where there is huge
variability in the length of the content -- i can't think of any context where a
"short" email is
de
On 05/03/2012 19:26, Chris Hostetter wrote:
: very small to occasionally very large. It also might be the case that
: cover letters and e-mails while short might not be really something to
: heavily discount. The lower discount range can be ignored by setting
: the min of any sweet spot to 1.
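To illustrate the point about setting the min of a sweet spot to 1, here is a minimal sketch of the length-norm curve. It assumes the formula documented for SweetSpotSimilarity's length norm, 1/sqrt(steepness * (|x-min| + |x-max| - (max-min)) + 1). The function name, parameter names, and the steepness value below are illustrative assumptions, not Lucene's API.

```python
import math

def sweet_spot_length_norm(num_terms, ln_min=1, ln_max=1, steepness=0.5):
    """Sketch of a plateau-shaped length norm (assumed formula, hypothetical names).

    Lengths inside [ln_min, ln_max] get a norm of 1.0; lengths outside the
    plateau are discounted, more sharply as steepness grows.
    """
    distance = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * distance + 1.0)

# With ln_min=1 there is no discount for short fields: everything from
# 1 term up to ln_max sits on the plateau.
short_norm = sweet_spot_length_norm(1, ln_min=1, ln_max=1000)
mid_norm = sweet_spot_length_norm(500, ln_min=1, ln_max=1000)
long_norm = sweet_spot_length_norm(1002, ln_min=1, ln_max=1000)
```

With ln_min=1, only fields longer than ln_max are discounted, which is the "lower discount range can be ignored" case described above.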
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote:
>> I would definitely not suggest using SSS for fields like legal brief text or
>> emails where there is huge
>> variability in the length of the content -- i can't think of any context
>> where a "short" email is
>> definitively better/worse than
> -Original Message-
> My only thought is that the new stuff seems to be at the expense of the
> formulas listed in the old
> class overview for Similarity.
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/searc
> h/Similarity.html
Oops, my bad
> I would definitely not suggest using SSS for fields like legal brief text or
> emails where there is huge
> variability in the length of the content -- i can't think of any context
> where a "short" email is
> definitively better/worse than a "long" email. more traditional TF/IDF seems
> like
: very small to occasionally very large. It also might be the case that
: cover letters and e-mails while short might not be really something to
: heavily discount. The lower discount range can be ignored by setting
: the min of any sweet spot to 1. Then one starts to wonder if there is
: r
;thetamin4=0&thetamax0=2pi&thetamax1=2pi&thetamax2=2pi&thetamax3=2pi&thetamax4=2pi&ipw=1&ixmin=-50&ixmax=150&iymin=-0.5&iymax=1.5&igx=10&igy=0.25&igl=1&igs=1&iax=0&ila=1&xmin=-50&xmax=150&ymin=-0.5&ymax=1.5
It is hard
: i'll try to get some graphs committed and linked to from the javadocs that
: make it more clear how tweaking the settings affects the formula
http://svn.apache.org/viewvc?rev=1294920&view=rev
-Hoss
: A picture -- or more precisely a graph -- would be worth a 1000 words.
fair enough. I think the reason i never committed one initially was
because the formula in the javadocs was trivial to plot in gnuplot...
gnuplot> min=0
gnuplot> max=2
gnuplot> base=1.3
gnuplot> xoffset=10
gnuplot> set
of these
(or any other) tf functions, general discussion of document collection sweet
spot, and any insight into parameters of this class (hyperbolic tangent or
otherwise)?
-Paul
>
> : > And I am aware that SweetSpotSimilarity resulted from this paper
> : >
> : > http:
s a convenience for subclasses
that want to use a hyperbolic tf function."
As for what hyperbolicTf is trying to do ... it creates a hyperbolic
function letting you specify a hard max no matter how many terms there
are.
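As a rough illustration of the "hard max" behaviour, here is a minimal sketch. It assumes the hyperbolic tf formula given in the SweetSpotSimilarity javadocs, min + (max-min)/2 * (tanh-like term + 1), with the tanh-like term built from base^(x-xoffset). The function and parameter names are hypothetical, and the defaults are the values from the gnuplot session quoted in this thread.

```python
import math

def hyperbolic_tf(freq, tf_min=0.0, tf_max=2.0, base=1.3, xoffset=10.0):
    """Sketch of a hyperbolic tf curve (assumed formula, hypothetical names).

    (b^x - b^-x) / (b^x + b^-x) equals tanh(x * ln(b)); using math.tanh
    avoids overflow for very large term frequencies.
    """
    x = freq - xoffset
    tanh_term = math.tanh(x * math.log(base))
    return tf_min + (tf_max - tf_min) / 2.0 * (tanh_term + 1.0)

# The curve passes through the midpoint at freq == xoffset and flattens
# out at tf_max, however many times the term occurs.
midpoint = hyperbolic_tf(10)
saturated = hyperbolic_tf(10000)
```

The score contribution never exceeds tf_max, so a term repeated thousands of times counts for no more than one repeated a few dozen times.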
: > And I am aware that SweetSpotSimilarity resulted from this pa
ginal Message-
> From: Peyman Faratin [mailto:pey...@robustlinks.com]
> Sent: Wednesday, February 15, 2012 6:40 AM
> To: java-user@lucene.apache.org
> Subject: SweetSpotSimilarity
>
> Hi
>
> I have a noobie question. I am trying to use the SweetSpotSimilarity (SSS)
> c
Hi
I have a noobie question. I am trying to use the SweetSpotSimilarity (SSS)
class.
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html
I understand the scoring behavior of Lucene
http://lucene.apache.org/core
engths. Then we started to use SweetSpotSimilarity and it seemed to
> solve the problem. But now we face another difficulty: it's hard to set the
> correct parameters for SweetSpotSimilarity. For example we want the title of
> a page to always have a higher boost than its content no matt
Dear List,
in our application there are many long documents that we index. Previously we
had a problem with lucene's scoring: some documents got low scores because
of their lengths. Then we started to use SweetSpotSimilarity and it seemed to
solve the problem. But now we face another diffi
16 matches