RE: two applications accessing same index

2006-02-05 Thread Vanlerberghe, Luc
Sure, the only danger is that you have to make sure both processes store their lock files in the same directory (by default they are in your home directory, I believe) unless you use a different locking mechanism. There are supposed to be problems when accessing indices over network shares, but I use

RE: index merging

2006-02-05 Thread Vanlerberghe, Luc
Sorry to contradict you Yonik, but I'm pretty sure the commit lock is *not* locked during a merge, only while the "segments" file is being updated. The merge process takes a set of 'old' segment files, writes new segment files and 'registers' them in the "segments" file when they are ready to be

Re: two problems of using the lucene.

2006-02-05 Thread jason
Hi, I tried to read the Lucene source code, but the only place I found where the tf/idf measure is actually implemented is TermScorer.java. I guess the QueryParser class first converts each word into a TermQuery, and then queries such as BooleanQuery are built from those. The source code o

Re: Question.

2006-02-05 Thread jason
You can build the term frequency matrix first, then select the most frequent terms. An earlier message on this list explained how to build the term frequency matrix. regards jiang xing On 2/6/06, Pranay Jain <[EMAIL PROTECTED]> wrote: > > I have earlier used lucene and I must say it has performed bug free for > the >
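The suggestion above (count term frequencies, then pick the most frequent term) can be sketched in plain Java. This is only an illustration of the counting idea, not Lucene's API: the class name `TermCounts`, its methods, and the simple lowercase/split tokenizer are assumptions made for the example, and a real setup would more likely read terms from the index or run text through an analyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts terms in raw text.
class TermCounts {

    // Build a term -> frequency map using a naive tokenizer
    // (lowercase, split on anything that is not a letter or digit).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Return the term with the highest count, or null for empty input.
    // Ties are broken arbitrarily (HashMap iteration order).
    static String mostFrequent(String text) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : count(text).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

For a whole document set, the same map could be accumulated across documents before picking the maximum; stop words would normally be filtered out first, otherwise words like "the" will dominate.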

Re: Inappropriate content detection

2006-02-05 Thread gekkokid
Hi, what scale is this website? Millions of posts or fewer? Wouldn't it be easier to use a Bayesian algorithm to scan each new post before it is posted, to detect whether it is acceptable or not? Just a quick idea off the top of my head. _gk - Original Message - From: "Jeff Thorne" <[EMAIL PRO

Question.

2006-02-05 Thread Pranay Jain
I have used Lucene before, and I must say it performed bug-free for the limited use I deployed it for. I now want to use Lucene for something more. Once the documents are indexed, I want to know which word occurs most often across the whole document set. Does lucene already provide s

Re: Inappropriate content detection

2006-02-05 Thread Jeff Rodenburg
You can generate a token stream for a block of text without having to index it. Take a look at the highlighter code, it does this very thing. On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote: > > I am trying to figure out whether or not Lucene is an appropriate solution > for a problem that our
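The idea of scanning a post's tokens without building an index can be sketched in plain Java. This does not use Lucene's analyzer or highlighter classes: the class `PostScanner`, the method `flaggedTerms`, and the naive lowercase/split tokenizer are all assumptions standing in for a real token stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: checks a post against a word list
// by walking its tokens directly, with no index involved.
class PostScanner {

    // Return every token of the post that appears in the blocked set,
    // in the order encountered. The blocked set is expected in lowercase.
    static List<String> flaggedTerms(String post, Set<String> blocked) {
        List<String> hits = new ArrayList<>();
        for (String token : post.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && blocked.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }
}
```

A post would be held back for review when the returned list is non-empty. Matching multi-word expressions would need a window over consecutive tokens, which this sketch leaves out.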

Re: Inappropriate content detection

2006-02-05 Thread Daniel Noll
Jeff Thorne wrote: I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. I would like to analyze each user's post for various words and expressions before publishing their post to the DB. I am reading through the Lucene in Action book and

Inappropriate content detection

2006-02-05 Thread Jeff Thorne
I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. Our site allows users to post their opinions on various topics. Due to various government legislation around the world, our management would like us to scan each user's post against various

two applications accessing same index

2006-02-05 Thread Pradeep Sharma
I have two applications: one that will generate all the indexes and a second that will read those indexes. I cannot keep them in the same application, because one will run all the time in batches via some sort of scheduler to generate the indexes and the application which wil

AW: two problems of using the lucene.

2006-02-05 Thread Klaus
Hi, you have to write your own Similarity object and set it on your IndexWriter and Searcher (setSimilarity) rather than on the analyzer. http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html Cheers, Klaus -Original Message- From: xing jiang [mailto:[EMAIL PROTECTED] Sent: Sunday, 5 February 2006 04:27 To

Reducing Inflated Similarity Scores

2006-02-05 Thread Eugene Ezekiel
Hi All, I'm currently using the DefaultSimilarity with the BooleanQuery add function to append clauses. The problem I face is this: given a query , where = a term, it returns a document that has just ONE term in it, say , and nothing else. Surprisingly, the hits score for this

Re: Field search problem(only single word query works)

2006-02-05 Thread Erik Hatcher
I recommend you take a look at your indexes with Luke and see what actually is indexed. Erik On Feb 4, 2006, at 11:54 PM, Xin Herbert Wu wrote: Hi, I have two libraries A and B indexed from database tables where A has about 10 fields and B has about 30 fields (with about a couple