Re: Why not normalization?

2010-07-08 Thread manjula wijewickrema
Hi Rebecca, Thanks for your valuable comments. Yes, I observed that once the number of terms in the document goes up, the fieldNorm value goes down correspondingly. I think, therefore, there won't be any default due to the variation of the total number of terms in the document. Am I right? Manjula. On Thu, Jul 8, 2
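
The relationship described above (more terms, lower fieldNorm) can be sketched roughly as follows. This assumes the length norm used by Lucene's DefaultSimilarity of that era, 1/sqrt(numTerms), computed per field. It leaves out the index-time boosts and the lossy single-byte encoding that also go into the stored fieldNorm, so the values you see in explain() output will differ somewhat. The function name is made up for illustration and is not a Lucene API.

```python
import math


def length_norm(num_terms: int) -> float:
    """Approximate length norm: 1 / sqrt(number of terms in the field).

    Assumption: mirrors the formula in DefaultSimilarity.lengthNorm;
    boosts and the one-byte norm encoding are deliberately ignored.
    """
    if num_terms <= 0:
        # Assumption for this sketch only: treat an empty field as norm 0.
        return 0.0
    return 1.0 / math.sqrt(num_terms)


if __name__ == "__main__":
    # Longer fields get a smaller norm, so a single match counts for less.
    for n in (1, 4, 16, 100):
        print(n, length_norm(n))
```

Because the norm shrinks as the field grows, a match in a short field is weighted more heavily than the same match in a long one, which is the length compensation being asked about.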

RE: Best way to use Lucene from perl

2010-07-08 Thread Uwe Schindler
I forgot, if you still want it in-process (inside your scripts, not outside in a separate server like Solr), have a look at Lucy / KinoSearch. The index format is different but the API is like Lucene's (it's a loose port). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.

RE: Best way to use Lucene from perl

2010-07-08 Thread Uwe Schindler
Have a look at Apache Solr - it implements a server that manages and queries the index, supports schema configurations and makes the "API" accessible in a RESTful way. You have to run Lucene in a Servlet Container like Jetty or Tomcat, but your application can be written in any language that supports REST/JSON/

Best way to use Lucene from perl

2010-07-08 Thread Igor Chudov
I am extremely impressed with Lucene and would like to thank Naveen and Otis for your kind help. I am not really a Java person; I am a Perl and C++ guy and my website is done with mod_perl. So, my obvious question is which Perl implementation of Lucene access you would recommend. It would seem th

Re: Personal Intro and a question on "find top 10 similar items" functionality

2010-07-08 Thread Otis Gospodnetic
Igor, You can treat that question as the query and use it to search the index where you've indexed other questions. More Like This is another option. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message

Re: Personal Intro and a question on "find top 10 similar items" functionality

2010-07-08 Thread Naveen Kumar
Hi, Check out a class called MoreLikeThis in Lucene. It should solve your problem. Naveen Kumar On Fri, Jul 9, 2010 at 3:42 AM, Igor Chudov wrote: > Hello, > > My name is Igor and I own a website algebra.com. I just joined. > > I have a database of answered algebra questions (208,000 and growing

Personal Intro and a question on "find top 10 similar items" functionality

2010-07-08 Thread Igor Chudov
Hello, My name is Igor and I own a website algebra.com. I just joined. I have a database of answered algebra questions (208,000 and growing). A typical question is here (original spelling): ``who long does it take 2 people to finish painting a house if the first one takes 6 days and the second

Re: Retrieve term payloads / custom PayloadFilter

2010-07-08 Thread Erick Erickson
If you know this at index time, could you index language-specific fields? i.e. text_en, text_de, title_en, title_de etc? Perhaps you could have a catch-all that contained everything too. Then your searching would be on a per field_lang basis. PerFieldAnalyzerWrapper would automatically use the pro
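
The per-language field idea above can be illustrated with a small stand-in. This is plain Python, not the Lucene API: the class below only imitates the dispatch that PerFieldAnalyzerWrapper performs (look up an analyzer by field name, fall back to a default). The field names follow the text_en / text_de pattern from the message; the stopword sets and all function and class names are hypothetical.

```python
from typing import Callable, Dict, List, Set

Analyzer = Callable[[str], List[str]]


def default_analyzer(text: str) -> List[str]:
    """Fallback analysis: lowercase and split on whitespace."""
    return text.lower().split()


def make_stop_analyzer(stopwords: Set[str]) -> Analyzer:
    """Build an analyzer that also drops the given stopwords."""
    def analyze(text: str) -> List[str]:
        return [tok for tok in default_analyzer(text) if tok not in stopwords]
    return analyze


class PerFieldAnalyzer:
    """Picks an analyzer by field name, else uses the default one."""

    def __init__(self, default: Analyzer, per_field: Dict[str, Analyzer]):
        self.default = default
        self.per_field = dict(per_field)

    def analyze(self, field: str, text: str) -> List[str]:
        return self.per_field.get(field, self.default)(text)


# Hypothetical, deliberately tiny stopword lists for the two languages.
wrapper = PerFieldAnalyzer(
    default_analyzer,
    {
        "text_en": make_stop_analyzer({"the", "a", "of"}),
        "text_de": make_stop_analyzer({"der", "die", "das"}),
    },
)

if __name__ == "__main__":
    print(wrapper.analyze("text_en", "The house"))
    print(wrapper.analyze("text_de", "Das Haus"))
    print(wrapper.analyze("all", "The Haus"))  # catch-all field uses the default
```

The same wrapper would be used at both index time and query time, so that a query against text_de is analyzed the same way the German text was indexed.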

Re: Issue Lucene-2421 and NativeFSLockFactory.clearLock behaviour?

2010-07-08 Thread Ted McFadden
Thanks for that. Cheers, Ted On 8 July 2010 23:12, Shai Erera wrote: > I committed a fix earlier today. clearLock will fail if the lock cannot be > released (meaning someone else holds it), however ignore the result of > file.delete(). > > Shai > > On Wed, Jul 7, 2010 at 7:41 PM, Shai Erera

Re: Issue Lucene-2421 and NativeFSLockFactory.clearLock behaviour?

2010-07-08 Thread Shai Erera
I committed a fix earlier today. clearLock will fail if the lock cannot be released (meaning someone else holds it), however ignore the result of file.delete(). Shai On Wed, Jul 7, 2010 at 7:41 PM, Shai Erera wrote: > Double-checking the code, this isn't that simple :). Someone can call > clear

Retrieve term payloads / custom PayloadFilter

2010-07-08 Thread Bernhard Haslhofer
Hi, in my application I have documents that may contain terms and term translations in multiple languages. The language tag of each term is explicitly given and should be available in the index in order to enable queries for documents that contain a certain term (optionally in a given language)