Re: Cosine similarity

2009-07-24 Thread Otis Gospodnetic
Yes, have a look at this: http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html Otis
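For readers unfamiliar with the measure being asked about, here is a minimal sketch of plain cosine similarity between two term-count vectors. It uses only the JDK and is not Lucene's scoring formula (the Similarity javadoc linked above describes that); the class name CosineSketch and the whitespace tokenization are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine between raw term-count vectors.
final class CosineSketch {

    /** Counts how often each lower-cased, whitespace-separated token occurs. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Dot product divided by the product of the Euclidean norms; 0.0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termCounts("cosine similarity");
        Map<String, Integer> doc = termCounts("Does Lucene use cosine similarity");
        System.out.println(cosine(query, doc));
    }
}
```

The score depends only on the angle between the two vectors, so repeating a document's text does not change its score against a query.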

Re: most frequent term in the index

2009-07-24 Thread Otis Gospodnetic
Hello, Here is a class you can use for that: ./contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java Otis
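As a rough illustration of the general idea (keeping only the N most frequent terms and returning them in descending order), here is a JDK-only sketch using a bounded min-heap. It does not reproduce HighFreqTerms.java and makes no Lucene calls; the class name TopTermsSketch and the input map of term to frequency are hypothetical stand-ins for whatever the index reports.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not the contrib class: top-N terms by frequency.
final class TopTermsSketch {

    /** Returns at most n entries, highest frequency first; ties are ordered alphabetically. */
    static List<Map.Entry<String, Integer>> topTerms(Map<String, Integer> freqs, int n) {
        // Ascending by frequency; among equal frequencies the alphabetically later term sorts first,
        // so it is the one evicted from the heap.
        Comparator<Map.Entry<String, Integer>> ascending = (x, y) -> {
            int c = Integer.compare(x.getValue(), y.getValue());
            return c != 0 ? c : y.getKey().compareTo(x.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(ascending);
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e));
            if (heap.size() > n) {
                heap.poll(); // drop the current least frequent term
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(ascending.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 5);
        freqs.put("index", 7);
        freqs.put("the", 9);
        for (Map.Entry<String, Integer> e : topTerms(freqs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The heap never holds more than n + 1 entries, so memory stays small even when the number of distinct terms is large.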

RE: How to get rid of unused fields?

2009-07-24 Thread Chris Hostetter
: : The same here, even with trunk from yesterday. If you create a field, it : : stays there forever, even after deleting *all* documents from index, : : reindexing without the field and optimizing. : : Uwe: if you have a quick test case already written can you try it against : 2.4 (and maybe 2.

most frequent term in the index

2009-07-24 Thread starz10de
How can I get the most frequent terms in the index, in descending order of frequency? Thanks

Cosine similarity

2009-07-24 Thread starz10de
Does Lucene use the cosine similarity measure to score the similarity between the query and the indexed documents? Thanks

Re: Doc IDs via IndexReader?

2009-07-24 Thread Shai Erera
There are a couple of things I can think of: 1) From IndexReader's javadoc ( http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexReader.html#deleteDocument%28int%29): "An IndexReader can be opened on a directory for which an IndexWriter is opened already, but it cannot be used to

Re: arabic analyzer

2009-07-24 Thread Robert Muir
walid, it is true that some of what you mentioned (from aramorph) works in the light stemming version, and some does not. The problem is that it's not clear to me that what aramorph is doing is really the best. From the paper I sent you: The best stemmer in our experiments, light8-s was very simple and did no

Re: Removing diacritics with ISOLatin1AccentFilter

2009-07-24 Thread luther blisset
Yes Ahmet Arslan, this works! I've just tested it and it works nicely. Many thanks. Ahmet Arslan wrote: > Or alternatively: > String test = "HÄllo HÄllo HÄllo HÄllo HÄllo"; > ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(new KeywordTokenizer(new St

Re: Removing diacritics with ISOLatin1AccentFilter

2009-07-24 Thread luther blisset
I'm trying to index all the words without accents. I do the same when querying: I remove the accents and lower-case the search term. Why should I pass the string through the analyzer? What is wrong if I don't pass it through the analyzer, and what are the benefits? I'm just a newbie with Lucene

Re: Removing diacritics with ISOLatin1AccentFilter

2009-07-24 Thread AHMET ARSLAN
Or alternatively: String test = "HÄllo HÄllo HÄllo HÄllo HÄllo"; ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(new KeywordTokenizer(new StringReader(test))); final Token reusableToken = new Token(); Token nextToken; if ((nextToken = filter.next(re
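For comparison only, here is a JDK-only sketch of accent folding that applies the same normalization to indexed text and to query text, which is the consistency this thread is concerned with. It swaps in a different technique (Unicode NFD decomposition via java.text.Normalizer, then dropping combining marks) and is not ISOLatin1AccentFilter; the class name AccentFoldSketch is hypothetical, and characters without a canonical decomposition, such as ß or Æ, are left unchanged here, which may differ from what the Lucene filter does.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not the Lucene filter: strip diacritics and lower-case.
final class AccentFoldSketch {

    /** Decomposes accented characters, removes the combining marks, then lower-cases. */
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = fold("H\u00C4llo"); // text as stored at index time
        String query = fold("hallo");        // search term at query time
        System.out.println(indexed.equals(query));
    }
}
```

Whatever folding is chosen, applying the identical function on both the indexing and the query side is what makes the terms match.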

Re: Removing diacritics with ISOLatin1AccentFilter

2009-07-24 Thread Simon Willnauer
On Fri, Jul 24, 2009 at 11:41 AM, luther blisset wrote: > > Hi folks, > I just upgrading Hibernate Search library of my app and so I had to upgrade > Lucene too and pass from 2.2 to 2.4 version. > In Lucene 2.4 the ISOLatin1AccentFilter class has changed and I can't figure > how it works. > I use a

Removing diacritics with ISOLatin1AccentFilter

2009-07-24 Thread luther blisset
Hi folks, I am upgrading the Hibernate Search library of my app, so I had to upgrade Lucene too, moving from version 2.2 to 2.4. In Lucene 2.4 the ISOLatin1AccentFilter class has changed and I can't figure out how it works. I use a TwoWayFieldBridge to index the data and this is my set method: publ

Re: Loading an index into memory

2009-07-24 Thread Thomas Becker
We have a centralized Lucene index on an NFS share. The index is updated every 30 minutes. The LuceneServer nodes notice the update and copy the index (about 2.5 GB) to a local tmpfs directory. Searching is much faster in our case than on a local disk. However, eks' concerns are valid an
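The copy step described above could look roughly like the following JDK-only sketch. It assumes the index directory is flat (regular files only, no subdirectories) and that the caller supplies both paths; the class name IndexCopySketch is hypothetical, and how the nodes detect an update or switch searchers afterwards is not covered.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copy a flat index directory to a local (e.g. tmpfs) directory.
final class IndexCopySketch {

    /** Copies every regular file in source into target and returns the number of files copied. */
    static int copyFlatDirectory(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, target.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexCopySketch.java <shared-index-dir> <local-dir>
        if (args.length == 2) {
            System.out.println(copyFlatDirectory(Paths.get(args[0]), Paths.get(args[1])));
        }
    }
}
```

Copying into a fresh directory and opening the searcher on it only after the copy completes avoids searching a half-copied index.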

Re: arabic analyzer

2009-07-24 Thread walid
We used the aramorph library for some time, so we mapped out the set of features it provides. They are as follows: The ء and ~