>>So I'm afraid I can't use the technique you recommend.
Ah, right - so the TermVector you use from the index will return mixed-case
and lower-case versions of the same text.
One point to note - this would mean that of the 25 or so top terms
selected by MoreLikeThis for querying there is a reasonable
>>the case matters only for those words that should be included.
Jong, just want to check we're on the same page - you do know
MoreLikeThis has a kind of automatic stop-wording built in, yes?
MoreLikeThis looks at the document frequency of all terms in the "this"
text you provide and only sele
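
To make the document-frequency point concrete, here is a minimal sketch of that kind of selection. It is not the MoreLikeThis source: the class name, method names and the tf * idf ranking with a classic Lucene-style idf are assumptions for illustration, and the real class applies further thresholds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: this is not the MoreLikeThis source code.
class TermSelectionSketch {

    // Classic Lucene-style inverse document frequency (assumed here for illustration).
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Returns up to maxTerms terms ranked by termFreq * idf, highest score first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int maxTerms) {
        List<String> terms = new ArrayList<>(termFreqs.keySet());
        terms.sort(Comparator.comparingDouble(
                (String t) -> -termFreqs.get(t) * idf(docFreqs.getOrDefault(t, 0), numDocs))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

A word that appears in nearly every document gets a low idf, so it tends to fall out of the top terms even when it is frequent in the "this" text, which is why an explicit stop list often matters less than expected.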
- Original Message
From: Jong Kim <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 9 July, 2007 3:55:03 PM
Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
>>Or are you saying that you have deliberately chosen t
the
useful class even more useful.
/Jong
-Original Message-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 11:54 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
OK. I can see the
ying
different analyzer is no good for my case.
/Jong
-Original Message-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 5:01 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
>>
supply stop words in a
case-insensitive fashion?
- Original Message
From: Jong Kim <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 9 July, 2007 3:00:05 PM
Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
My application stores term vecto
-Original Message-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 5:01 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
>>I need this comparison to be case-insensitive
The choice of case-sensitivity (and preservation of punctuation, numbers, etc.)
is controlled by the analyzer that you pass to MoreLikeThis. If you want to
ensure your list of stop words adheres to the same logic - use the
same analyz
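
A rough sketch of that idea, in plain Java rather than against the Lucene API: the `normalizer` function below is a hypothetical stand-in for whatever your analyzer does to a single token (here, lower-casing), and the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch: "normalizer" is a hypothetical stand-in for the per-token work of an analyzer.
class StopWordNormalizer {

    // Example normalizer: lower-casing, as a lower-casing analyzer would do.
    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Pushes every raw stop word through the same normalization as the indexed terms,
    // so that a later exact-match lookup sees both sides in the same form.
    static Set<String> normalize(Iterable<String> rawStopWords, UnaryOperator<String> normalizer) {
        Set<String> result = new HashSet<>();
        for (String word : rawStopWords) {
            result.add(normalizer.apply(word));
        }
        return result;
    }
}
```

This only helps when the terms being compared went through the same normalization; if the index deliberately keeps mixed case, the stop list and the terms will still disagree.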
: I need this comparison to be case-insensitive, but I don't see any way of
: achieving it by extending this class. I would have created a subclass of
: MoreLikeThis and overridden the isNoiseWord() method. However, the problem is
: that neither the isNoiseWord() method nor the instance variables refer
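
One possible way around subclassing, offered as a sketch under an assumption: if the class checks stop words with `Set.contains` on the set you hand it (worth confirming in the MoreLikeThis source for your version), then a set whose own comparison ignores case would do the job without touching the analyzer. The helper class below is hypothetical and uses only the JDK.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: builds a stop-word set whose lookups ignore case.
class CaseInsensitiveStopWords {

    // A TreeSet ordered by String.CASE_INSENSITIVE_ORDER treats "The" and "the"
    // as the same element, so contains() matches regardless of case.
    static Set<String> of(Collection<String> words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set;
    }
}
```

The mixed-case terms in the term vectors stay as they are; only the membership test becomes case-insensitive.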