date:20141111

Re: Document Term matrix

2014-11-11 Thread Ahmet Arslan

Hi, Mahout and Carrot2 can cluster the documents from a Lucene index. ahmet On Tuesday, November 11, 2014 10:37 PM, Elshaimaa Ali wrote: Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents, I need to extract a Document-term matrix, and Document Document similarity matri

Re: Document Term matrix

2014-11-11 Thread Paul Libbrecht

The project semanticvectors might be doing what you are looking for.
paul
On 11 nov. 2014, at 22:37, parnab kumar wrote:
> hi,
> While indexing the documents, store the Term Vectors for the content field. Now for each document you will have an array of terms and their corresponding fre

Re: Document Term matrix

2014-11-11 Thread parnab kumar

Hi, while indexing the documents, store the term vectors for the content field. For each document you will then have an array of terms and their corresponding frequencies in that document. Using the IndexReader you can retrieve these term vectors. Similarity between two documents can be computed as
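The snippet above is cut off before it names the similarity measure. As a minimal sketch, assuming cosine similarity (a common choice, not confirmed by the thread) and assuming the per-document term frequencies have already been read out of the index, a document-term matrix and a pairwise similarity could be computed like this. The function names and the input shape (a dict of document id to term counts) are hypothetical, and this is plain Python rather than Lucene API code.

```python
import math
from collections import Counter


def document_term_matrix(term_freqs):
    """Build a document-term matrix from per-document term frequencies.

    term_freqs: dict mapping doc id -> Counter of term -> frequency
    (hypothetical input shape; in practice these counts would come
    from the stored term vectors).
    Returns (vocabulary, rows) where rows maps doc id -> list of counts
    aligned with the sorted vocabulary.
    """
    vocabulary = sorted({term for tf in term_freqs.values() for term in tf})
    rows = {
        doc_id: [tf.get(term, 0) for term in vocabulary]
        for doc_id, tf in term_freqs.items()
    }
    return vocabulary, rows


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity` on every pair of rows would give the document-document similarity matrix asked about in the original question; for 584 documents that is small enough to hold in memory.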

Document Term matrix

2014-11-11 Thread Elshaimaa Ali

Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to extract a document-term matrix and a document-document similarity matrix in order to cluster the documents. My questions: 1- How can I extract the matrix and compute the similarity between documents in

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea

Ahmet, Yes that is quite true. But as this is only a proof of concept application, I'm prepared for things to be 'imperfect'. Martin O'Shea. -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 11 Nov 2014 18:26 To: java-user@lucene.apache.org Subject: Re: How

Re: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Ahmet Arslan

Hi, With that analyzer, searches for the same word with different capitalisation could return different results. Ahmet On Tuesday, November 11, 2014 6:57 PM, Martin O'Shea wrote: In the end I edited the code of the StandardAnalyzer and the SnowballAnalyzer to disable the calls to the LowerCa

RE: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

2014-11-11 Thread Martin O'Shea

In the end I edited the code of the StandardAnalyzer and the SnowballAnalyzer to disable the calls to the LowerCaseFilter. This seems to work. -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: 10 Nov 2014 15:19 To: java-user@lucene.apache.org Subject: Re: How

Re: How to map lucene scores to range from 0~100?

2014-11-11 Thread Harry Yu

Hi Rajendra, Thanks for your reply. Normalization is a good way to solve it. But there is a problem: if I normalize by your formula, the score of the top doc will always be 100. Although it maps scores to the range 0~100, the score may not reflect the similarity between the query and the hit docs. My system is t

Re: How to map lucene scores to range from 0~100?

2014-11-11 Thread Rajendra Rao

Harry, basically converting scores into the range 0 to 100 requires normalization (dividing each score by the highest score and multiplying by 100), but this score doesn't represent a matching %. On Tue, Nov 11, 2014 at 7:48 PM, Harry Yu <502437...@qq.com> wrote: > Hi everyone, > > > I met a new trouble.
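The normalization described in this reply can be sketched as follows. This is a plain-Python illustration on a list of raw scores, not Lucene API code, and the function name is hypothetical. As the reply itself points out, the result is only a rescaling relative to the best hit, not a match percentage.

```python
def normalize_scores(scores):
    """Rescale raw scores so that the highest one becomes 100.

    scores: list of non-negative raw scores for one result list.
    Returns a new list in the same order. The top hit always gets
    100, however weak the match actually is.
    """
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Nothing meaningful to scale against.
        return [0.0 for _ in scores]
    return [100.0 * score / top for score in scores]
```

Because the scaling factor depends on the top score of each result list, normalized values from two different queries are not comparable with each other.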

How to map lucene scores to range from 0~100?

2014-11-11 Thread Harry Yu

Hi everyone, I have run into a new problem. In my system, we need to score docs in the range 0 to 100. Are there any easy ways to map Lucene scores to this range? Thanks for your help~ Yu

Re: Index keeps growing, then shrinks on restart

2014-11-11 Thread Rob Nikander

On Tue, Nov 11, 2014 at 4:26 AM, Ian Lea wrote:
> Telling us the version of lucene and the OS you're running on is always a good idea.
Oops, yes. Lucene 4.10.0, Linux.
> A guess here is that you aren't closing index readers, so the JVM will be holding on to deleted files until it exits.

Re: How to improve the performance in Lucene when query is long?

2014-11-11 Thread Ahmet Arslan

Hi Harry, Maybe you can use the BooleanQuery#setMinimumNumberShouldMatch method. What happens when you set it to half of the numTerms? ahmet On Tuesday, November 11, 2014 8:35 AM, Harry Yu <502437...@qq.com> wrote: Hi everyone, I have been using Lucene to build a POI searching & geocoding
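To illustrate the idea behind this suggestion: a minimum-should-match threshold means a document only qualifies if at least that many of the optional query terms occur in it, so documents matching just one term of a long query are dropped. The sketch below models that rule in plain Python on in-memory term lists. It is not Lucene code, the function names are hypothetical, and it says nothing about how Lucene evaluates such queries internally.

```python
def count_matching_terms(doc_terms, query_terms):
    """Number of distinct query terms that occur in the document."""
    doc_set = set(doc_terms)
    return sum(1 for term in set(query_terms) if term in doc_set)


def filter_by_minimum_should_match(docs, query_terms, minimum):
    """Keep only documents that contain at least `minimum` query terms.

    docs: dict mapping doc id -> list of terms (hypothetical input).
    Returns the qualifying doc ids in the dict's iteration order.
    """
    return [
        doc_id
        for doc_id, terms in docs.items()
        if count_matching_terms(terms, query_terms) >= minimum
    ]
```

With a four-term query, the "half of the numTerms" setting from the reply corresponds to `minimum=2`.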

Re: Index keeps growing, then shrinks on restart

2014-11-11 Thread Ian Lea

Telling us the version of lucene and the OS you're running on is always a good idea. A guess here is that you aren't closing index readers, so the JVM will be holding on to deleted files until it exits. A combination of du, ls, and lsof commands should prove it, or just lsof: run it against the j


13 matches
