Re: for check similarity of two sentences

2015-04-02 Thread Gimantha Bandara
Hi Heshan, I think you can achieve what you are looking for. You may read "Lucene in Action, 2nd edition" about the Lucene scoring system and FuzzyQuery. Hope this helps. Maybe someone can suggest a much better approach. On Wed, Apr 1, 2015 at 8:14 AM, hesh jay wrote: > hi, > I am second year under

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi All, I have successfully set up the merged indices, and drilldown and the usual search operations work perfectly. But I have a side question. If I select RAMDirectory as the destination for the merged indices, the JVM can probably run out of memory if the merged indices are too big. Is there a way I can ha

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Christoph Kaser
Hi Gimantha, why do you use a RAMDirectory? If your merged index fits into RAM completely, an MMapDirectory should offer almost the same performance. And if not, it is definitely the better choice. Regards Christoph On 02.04.2015 at 12:38, Gimantha Bandara wrote: Hi All, I have successfully

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Shai Erera
In some cases, MMapDirectory offers even better performance, since the JVM doesn't need to manage that RAM when it's doing GC. Also, using only RAMDirectory is not safe in that if the JVM crashes, your index is lost. On Thu, Apr 2, 2015 at 12:54 PM, Christoph Kaser wrote: > Hi Gimantha, > > why

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi Christoph and Shai, Thanks for the quick response! Indices are stored in a relational database (using a custom Directory implementation). The problem arises because the indices are sharded (both taxonomy indices and normal doc indices): when a user wants to drill down, I have to merge all the in

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Btw, I was using a RAMDirectory just for testing purposes. On Thu, Apr 2, 2015 at 5:16 PM, Gimantha Bandara wrote: > Hi Christoph and Shai, > > Thanks for the quick response!. > Indices are stored in a relational database ( using a custom Directory > implementation ). The Problem comes since the

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Shai Erera
MMapDirectory uses memory-mapped files. This is an operating system level feature, where even though the file resides on disk, the OS can memory-map it and access it more efficiently. It is loaded into memory outside the JVM heap, and usually on a properly configured server you should not worry abo
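The operating-system feature described in this message can be tried directly from the JDK. The following is a minimal sketch of memory-mapping a file with `FileChannel.map`; it is not Lucene's MMapDirectory code, and the class name `MmapSketch`, the method `readViaMmap` and the temporary file are assumptions made only for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
public class MmapSketch {

    /** Maps the whole file read-only and returns its contents decoded as UTF-8. */
    public static String readViaMmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapped region lives outside the Java heap; the OS pages it in on access.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Copy into a heap array only to build a String for this demo.
            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed demo input: a small temporary file.
        Path file = Files.createTempFile("mmap-sketch", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "hello mmap".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaMmap(file));
    }
}
```

The copy into a `byte[]` is there only so the demo can print something; code that wants the benefit of the mapping would read from the `MappedByteBuffer` directly.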

Re: for check similarity of two sentences

2015-04-02 Thread Robust Links
Hi Heshan, one approach could be something like this: 1- Vectorize each n-gram of each sentence. One vectorization strategy is to use word2vec (the deep learning package). I believe someone has ported word2vec (originally in C) to Lucene; a Google search should find it. 2- Aggregate each word vector (i.e some clu
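The vectorize-then-aggregate idea in this message can be sketched in plain Java. The snippet above is cut off before it says how the vectors are aggregated, so the averaging used here is an assumption, as is comparing the two sentence vectors with cosine similarity. The tiny hand-written vectors stand in for real word2vec output, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; the vectors below are made up, not word2vec output.
public class SentenceSimilaritySketch {

    /** Averages the vectors of all known tokens; unknown tokens are skipped. */
    public static double[] average(String sentence, Map<String, double[]> vectors, int dim) {
        double[] sum = new double[dim];
        int count = 0;
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            double[] v = vectors.get(token);
            if (v == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += v[i];
            }
            count++;
        }
        if (count > 0) {
            for (int i = 0; i < dim; i++) {
                sum[i] /= count;
            }
        }
        return sum;
    }

    /** Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Assumed toy 2-dimensional vectors, for illustration only.
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("cat", new double[] {1.0, 0.1});
        vectors.put("kitten", new double[] {0.9, 0.2});
        vectors.put("car", new double[] {0.1, 1.0});

        double[] s1 = average("the cat", vectors, 2);
        double[] s2 = average("a kitten", vectors, 2);
        double[] s3 = average("the car", vectors, 2);
        System.out.println("cat vs kitten: " + cosine(s1, s2));
        System.out.println("cat vs car:    " + cosine(s1, s3));
    }
}
```

Averaging discards word order, so it is only one of several possible aggregation choices.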

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Gimantha Bandara
Hi Shai, currently I am using a DB, but the platform we are developing needs to support RDBMS, HBase and other datasource types for storing indices. So the user should be able to use whatever underlying filesystem he wants to use. I am not sure if Solr can support multiple datasource types

RE: [EXTERNAL] Re: general question

2015-04-02 Thread Fielder, Todd Patrick
I can't get the suggested way to work (either the child scorer or creating a query wrapper), so may end up doing a query on each field, just not sure how expensive that will end up being... Additional thoughts? -Todd -Original Message- From: Sanne Grinovero [mailto:sanne.grinov...@gmai