Hi,
Mahout and Carrot2 can cluster the documents from a Lucene index.
ahmet
On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali
wrote:
Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a Document-term matrix and a Document-Document similarity matrix
The project semanticvectors might be doing what you are looking for.
paul
On 11 nov. 2014, at 22:37, parnab kumar wrote:
> hi,
>
> While indexing the documents, store the Term Vectors for the content
> field. Now for each document you will have an array of terms and their
> corresponding frequency
hi,
While indexing the documents, store the Term Vectors for the content
field. Now for each document you will have an array of terms and their
corresponding frequency in the document. Using the IndexReader you can
retrieve these term vectors. Similarity between two documents can be
computed as
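The suggestion above trails off at the similarity measure; assuming the intent is cosine similarity over per-document term-frequency maps (which can be built by walking the TermsEnum returned from IndexReader.getTermVector), a minimal plain-Java sketch of the arithmetic, with the Lucene map-building step not shown:

```java
import java.util.Map;

public class CosineSimilarity {
    // Cosine similarity between two term-frequency vectors:
    // dot(a, b) / (|a| * |b|). Returns 0 if either vector is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
            normA += e.getValue() * (double) e.getValue();
        }
        for (int freq : b.values()) {
            normB += freq * (double) freq;
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Computing this for every document pair gives the document-document similarity matrix; the maps themselves form the document-term matrix in sparse form.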
Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a Document-term matrix and a Document-Document similarity matrix
in order to use them to cluster the documents. My questions: 1. How can I extract
the matrix and compute the similarity between documents in
Ahmet,
Yes that is quite true. But as this is only a proof of concept application,
I'm prepared for things to be 'imperfect'.
Martin O'Shea.
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 11 Nov 2014 18:26
To: java-user@lucene.apache.org
Subject: Re: How
Hi,
With that analyser, your searches (for the same word, but with different
capitalisation) could return different results.
Ahmet
On Tuesday, November 11, 2014 6:57 PM, Martin O'Shea
wrote:
In the end I edited the code of the StandardAnalyzer and the
SnowballAnalyzer to disable the calls to the LowerCaseFilter.
In the end I edited the code of the StandardAnalyzer and the
SnowballAnalyzer to disable the calls to the LowerCaseFilter. This seems to
work.
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 10 Nov 2014 15:19
To: java-user@lucene.apache.org
Subject: Re: How
Hi Rajendra,
Thanks for your reply. Normalization is a good way to solve it. But there is a
problem: if I normalize by your formula, the score of the top doc would always
be 100. Although it maps scores into the range 0~100, the scores may not show
the similarity between the query and the hit docs.
My system is t
Harry,
basically, converting scores into the range 0 to 100 requires
normalization (dividing each score by the highest score and multiplying by 100),
but this score doesn't represent a matching %.
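The normalization described here (divide each score by the top score, multiply by 100) can be sketched in plain Java; the class and method names are illustrative, and as noted, the result is a relative ranking scale, not a match percentage:

```java
public class ScoreNormalizer {
    // Map raw Lucene scores onto 0..100 by dividing by the highest score
    // in the result set. The top hit always becomes 100, so this shows
    // relative rank within one result set, not absolute query-doc similarity.
    public static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) if (s > max) max = s;
        float[] out = new float[scores.length];
        if (max == 0f) return out; // all-zero scores stay zero
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max * 100f;
        }
        return out;
    }
}
```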
On Tue, Nov 11, 2014 at 7:48 PM, Harry Yu <502437...@qq.com> wrote:
> Hi everyone,
>
>
> I met a new trouble.
Hi everyone,
I met a new trouble. In my system, we should score the docs in a range from 0
to 100. Are there any easy ways to map Lucene scores to this range? Thanks for
your help~
Yu
On Tue, Nov 11, 2014 at 4:26 AM, Ian Lea wrote:
> Telling us the version of lucene and the OS you're running on is
> always a good idea.
>
Oops, yes. Lucene 4.10.0, Linux.
> A guess here is that you aren't closing index readers, so the JVM will
> be holding on to deleted files until it exits.
>
Hi Harry,
Maybe you can use the BooleanQuery#setMinimumNumberShouldMatch method. What
happens when you set it to half of numTerms?
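BooleanQuery#setMinimumNumberShouldMatch takes an integer clause count, so "half of numTerms" has to be computed first. A minimal sketch of that arithmetic; the round-up and the floor of 1 are assumptions, and the Lucene call itself (not shown, since it needs an index) would be query.setMinimumNumberShouldMatch(MinShouldMatch.half(numTerms)):

```java
public class MinShouldMatch {
    // Half the optional (SHOULD) clauses, rounded up, and never less
    // than one, so a single-term query still requires its term to match.
    public static int half(int numTerms) {
        return Math.max(1, (int) Math.ceil(numTerms / 2.0));
    }
}
```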
ahmet
On Tuesday, November 11, 2014 8:35 AM, Harry Yu <502437...@qq.com> wrote:
Hi everyone,
I have been using Lucene to build a POI searching & geocoding
Telling us the version of lucene and the OS you're running on is
always a good idea.
A guess here is that you aren't closing index readers, so the JVM will
be holding on to deleted files until it exits.
A combination of du, ls, and lsof commands should prove it, or just
lsof: run it against the j