Re: lucene index in a cluster.

2007-10-18 Thread Otis Gospodnetic
Alexander, I'd stay away from NFS (slow). It sounds like you'd benefit from moving from vanilla Lucene to Solr, which your app(s) could query from different boxes. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Alexander Wallace <[EMAIL PROTEC

Re: Scoring algorithm suggestion?

2007-10-18 Thread Otis Gospodnetic
Uwe, I don't have the answer to your main question, but will point you to the ngram set of tokenizers in Lucene's contrib/, in case you want to use that instead of maintaining your own bi-gram tokenizer. Otis -- Simpy -- http://www.simpy.
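For readers unfamiliar with the term, here is a minimal sketch of what a character bi-gram split looks like. It is plain Python, not the contrib/ tokenizer API, and the function name `char_bigrams` and the sample word are hypothetical choices made only for illustration.

```python
def char_bigrams(term):
    """Split a term into overlapping two-character grams (illustrative only)."""
    if len(term) < 2:
        # A term too short to form a bi-gram is returned as-is (empty input gives no grams).
        return [term] if term else []
    return [term[i:i + 2] for i in range(len(term) - 1)]


# Hypothetical sample word; any string works.
grams = char_bigrams("lucene")
print(grams)
```

A tokenizer built this way emits one gram per adjacent character pair, so a word of n characters yields n - 1 grams.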

Re: Sort by date with Lucene 2.2.0 ...

2007-10-18 Thread Erick Erickson
Maybe I'm missing something, but that looks like the correct order to me: they are both on September 02, 2007, 11:30 P.M., and 24 seconds is before 48 seconds. Or is it just late and I'm missing the obvious (a specialty of mine)... Erick On 10/18/07, Dragon Fly <[EMAIL PROTECTED]> wrote: > >
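The ordering argument above can be checked with a small sketch. It assumes the date field holds fixed-width `yyyyMMddHHmmss` strings, so that plain string comparison matches chronological order. The helper name `to_sortable` and the use of UTC are assumptions for illustration, and the code does not call Lucene itself.

```python
from datetime import datetime, timezone


def to_sortable(dt):
    """Format a datetime as a fixed-width yyyyMMddHHmmss string (UTC assumed)."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


# The two timestamps discussed in the thread: same minute, 24 s versus 48 s.
earlier = datetime(2007, 9, 2, 23, 30, 24, tzinfo=timezone.utc)
later = datetime(2007, 9, 2, 23, 30, 48, tzinfo=timezone.utc)

# Sorting the strings lexicographically puts the 24-second value first.
ordered = sorted([to_sortable(later), to_sortable(earlier)])
print(ordered)
```

Because every field is zero-padded to a fixed width, an ascending string sort puts the 24-second value ahead of the 48-second one.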

Sort by date with Lucene 2.2.0 ...

2007-10-18 Thread Dragon Fly
Hi, I am trying to sort a date field in my index but I'm seeing strange results. I have searched the Lucene user mail archive for DateTools but still couldn't figure out the problem. The date field is indexed as follows (i.e. DateTools is used, date field is stored and untokenized): Strin

Re: Chinese test resources wanted

2007-10-18 Thread Ivan Vasilev
Hi Guys, Can anyone who tests the Analyzers give me some CJK test resources, or advise me where to obtain them? Best Regards, Ivan Ivan Vasilev wrote: Hi Guys, We just implemented multi-language support in our application. We tested it with some files whose content is copy/pasted from s

Scoring algorithm suggestion?

2007-10-18 Thread Uwe Goetzke
We have used Lucene in our product since version 1.2. I have developed a new bigram stemmer and would like a suggestion on how to implement the scorer it needs. Using a Boolean query with a slope I get the correct documents most of the time. For example: The bigram split for "docume

Re: contrib/benchmark Parallel tasks ?

2007-10-18 Thread Doron Cohen
Hi Grant, Grant Ingersoll wrote: > I think the answer is: > [{ "MAddDocs" AddDoc } : 5000] : 4 > > Is this the functional equivalent of doing: > { "MAddDocs" AddDoc } : 2 > > in parallel? Yes, this is correct; it reads as "create 4 threads, each adding 5000 docs to the index, and start/run t
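As a rough analogy for the reading given above (4 threads, each adding 5000 docs), here is a minimal Python sketch. It is not the contrib/benchmark implementation: the shared list stands in for the index, and `run_parallel` and `add_docs` are hypothetical names.

```python
import threading


def run_parallel(num_threads, docs_per_thread):
    """Start num_threads workers that each append docs_per_thread items to one shared list."""
    index = []               # stand-in for the shared index (assumption)
    lock = threading.Lock()  # guards the shared list

    def add_docs(worker_id):
        for i in range(docs_per_thread):
            with lock:
                index.append((worker_id, i))

    threads = [threading.Thread(target=add_docs, args=(w,)) for w in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return index


docs = run_parallel(4, 5000)
print(len(docs))
```

Each worker performs its own 5000 additions, so the shared list ends up with 4 x 5000 entries once all threads have joined.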