Can I use HFS in Lucene 2.3.1?

2008-04-25 Thread Alex Chew
Hi, does anybody have experience building a distributed application with Lucene and Hadoop/HFS? Lucene 2.3.1 does not seem to expose an HFSDirectory. Any advice would be appreciated. Regards, Alex

Re: Can I use HFS in Lucene 2.3.1?

2008-04-25 Thread Mathieu Lecarme
Alex Chew wrote: > Hi, does anybody have experience building a distributed application with Lucene and Hadoop/HFS? Lucene 2.3.1 does not seem to expose an HFSDirectory. Any advice would be appreciated. Regards, Alex Have a look at Nutch. M.

Re: Sorting consumes hundreds of MBytes RAM

2008-04-25 Thread Nadav Har'El
On Mon, Apr 14, 2008, Chris Hostetter wrote about "Re: Sorting consumes hundreds of MBytes RAM": > : And question #2: what can I do about it? Index sharding? > > The only suggestion I can offer is to take a look at LUCENE-769 ... it > takes a completely different approach of using a F
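Why sorting costs so much memory can be seen with a rough back-of-the-envelope sketch. It assumes an int sort field, for which the Lucene 2.x FieldCache holds one int per document (maxDoc entries) for the reader being searched, so the cost scales with index size rather than with the number of hits. The document count below is hypothetical.

```java
// Back-of-the-envelope FieldCache estimate for sorting (hypothetical numbers).
final class SortMemoryEstimate {
    // Bytes for one int[] of length maxDoc, ignoring the small array header.
    static long intSortFieldBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    // Total for several int sort fields cached for the same reader.
    static long totalBytes(long maxDoc, int sortFields) {
        return intSortFieldBytes(maxDoc) * sortFields;
    }

    public static void main(String[] args) {
        long maxDoc = 50_000_000L; // hypothetical index size
        System.out.printf("1 field: %d MB%n", intSortFieldBytes(maxDoc) / 1_000_000);
        System.out.printf("3 fields: %d MB%n", totalBytes(maxDoc, 3) / 1_000_000);
    }
}
```

String sort fields cost more than this estimate, because the cache also keeps the distinct term values themselves.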

Re: Binding lucene instance/threads to a particular processor(or core)

2008-04-25 Thread Glen Newton
Hi Anshum, I looked at the log and couldn't make much sense of it. But I have an update to my original suggestion: try the following command-line parameters: -Xms1024m -XX:-UseParallelGC -XX:+ScavengeBeforeFullGC. I got rid of "-XX:+AggressiveOpts" because perhaps these are not always
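Before comparing runs with different flags, it can help to confirm that the JVM actually picked them up. A minimal sketch using the standard java.lang.management API (nothing Lucene-specific); the class and method names are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class JvmFlagCheck {
    // JVM options the process was started with (e.g. -Xms1024m, -XX:... flags).
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the garbage collectors in use, to confirm a GC flag took effect.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + inputArguments());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Logging these once at startup makes it easier to tie a given log or timing result to the exact flag set that produced it.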

Really dumb search problem

2008-04-25 Thread Chris Mannion
Hi all, I'm going to start this with an apology because, as you'll see below, I've probably missed something quite fundamental about how Lucene works. However, I'll explain the problem: we have a system set up which indexes and searches records about business properties. One of the fields we i

Re: Really dumb search problem

2008-04-25 Thread Ulf Dittmer
Have you tried double-quoting the postcode instead of using parentheses: postcode:"M11 1LQ" Ulf --- Chris Mannion <[EMAIL PROTECTED]> wrote: > "(postcode:(M11 1LQ) )" > > However, the postcode search never returns any results.

Re: Really dumb search problem

2008-04-25 Thread Mark Miller
You can't tokenize the search query if it's on that field... maybe use a per-field analyzer and the keyword analyzer? Check them out if you haven't. On Fri, 2008-04-25 at 16:01 +0100, Chris Mannion wrote: > Hi All > > I'm going to start this with an apology because, as you'll see below, I've > p
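The mismatch Ulf and Mark describe can be sketched without Lucene. Because Chris's message is cut off, this assumes the postcode field was indexed as a single untokenized term while the query side split and lowercased the text the way a typical analyzer does; the tokenizer below is a simplified stand-in, not StandardAnalyzer itself. In Lucene, the fix Mark points to would be a PerFieldAnalyzerWrapper that routes the postcode field to KeywordAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PostcodeMatch {
    // Simplified stand-in for a tokenizing analyzer: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Keyword-style analysis: the whole value becomes one unchanged term.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    // Phrase-style check: true only if every query term exists among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("M11 1LQ"); // assumed untokenized at index time
        System.out.println(matches(indexed, tokenize("M11 1LQ"))); // false
        System.out.println(matches(indexed, keyword("M11 1LQ")));  // true
    }
}
```

The key design point is that index-time and query-time analysis must agree per field; which side changes is a choice, but they have to match.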

boosting relevance of certain documents

2008-04-25 Thread Daniel Freudenberger
Hello, I'm using Lucene in a new project and I'm not sure how to solve the following problem: My index consists of the two attributes "id" and "searchable". "id" is the id of a product and "searchable" is a combination of the product name and its category name. Example: id

Re: boosting relevance of certain documents

2008-04-25 Thread Jonathan Ariel
How are you analyzing the searchable field? On Fri, Apr 25, 2008 at 12:49 PM, Daniel Freudenberger < [EMAIL PROTECTED]> wrote: > Hello, > > > > I'm using lucene within a new project and I'm not sure about how to solve > the following problem: My index consists of the two attributes "id" and > "se

RE: boosting relevance of certain documents

2008-04-25 Thread Daniel Freudenberger
I'm using the StandardAnalyzer; I hope this answers your question (I'm quite new to Lucene). -----Original Message----- From: Jonathan Ariel [mailto:[EMAIL PROTECTED] Sent: Friday, April 25, 2008 6:59 PM To: java-user@lucene.apache.org Subject: Re: boosting relevance of certain documents

Re: boosting relevance of certain documents

2008-04-25 Thread Jonathan Ariel
OK. I'm not an expert on the scoring algorithm, but based on tf*idf you can tell that the returned document is more relevant because it has a higher term frequency. Using explain you can see the following: Doc 1 0.643841 = (MATCH) fieldWeight(searchable:fifa in 0), product of: 1.0 = tf(termF
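Jonathan's tf*idf point can be checked by recomputing the Lucene 2.x DefaultSimilarity formulas by hand: tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, and lengthNorm = 1 / sqrt(numTerms). This is a plain-Java sketch that does not call Lucene; it assumes no boosts and a hypothetical field containing only the quoted product names, whereas the real "searchable" field also holds the category name.

```java
final class TfIdfSketch {
    // Recomputes Lucene 2.x DefaultSimilarity formulas; Lucene itself is not called.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // fieldWeight = tf * idf * fieldNorm, assuming no boosts and ignoring norm encoding.
    static double fieldWeight(int freq, int numTerms, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int docFreq = 2, numDocs = 2; // hypothetical two-document index
        // "fifa 2003 fifa 03": "fifa" appears twice among 4 terms
        System.out.println(fieldWeight(2, 4, docFreq, numDocs));
        // "fifa 08": "fifa" appears once among 2 terms
        System.out.println(fieldWeight(1, 2, docFreq, numDocs));
    }
}
```

Under those assumptions the two documents tie, because the longer field's higher tf is cancelled by its smaller length norm. In a real index the order can still differ, for example because of the category text or because Lucene stores norms in a lossy single byte.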

Quickie Luke Question

2008-04-25 Thread Matthew Hall
Does anyone know how to set the MaxClauseCount in Luke? I'm in a situation where I've had to override it when searching against my indexes, but now I can't use Luke to examine what's going on with my queries anymore. Any help would be appreciated. Matt -- Matthew Hall Software Engineer Mous

RE: boosting relevance of certain documents

2008-04-25 Thread Daniel Freudenberger
Thanks for your response. I already knew that the relevance is based on the term frequency, but in some cases it's just not what the user expects. As I already mentioned, "fifa 2003 fifa 03" vs. "fifa 08" is such a case: searching for "fifa" would return the "fifa 2003 fifa 03" document first but

Re: boosting relevance of certain documents

2008-04-25 Thread Grant Ingersoll
It really depends. Hand tuning scoring algs for a specific query is very prone to local maxima problems. In other words, you fix one query and break 50 others. Sometimes, a good old "configurable" hard code is the way to go. If you want a specific doc to be #1, make it number one. You
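Grant's "configurable hard code" can be as simple as a hand-maintained map from a normalized query string to the document that must come first, applied after the normal search. A minimal sketch with hypothetical names and ids; it works on plain id lists rather than Lucene Hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PinnedResults {
    // Hypothetical config: normalized query -> doc id to force into position #1.
    private final Map<String, String> pins;

    PinnedResults(Map<String, String> pins) {
        this.pins = pins;
    }

    // Returns the ranked ids with the pinned id moved (or added) to the front.
    List<String> apply(String query, List<String> rankedIds) {
        String pinned = pins.get(query.trim().toLowerCase(Locale.ROOT));
        if (pinned == null) {
            return rankedIds; // no override: keep Lucene's ordering
        }
        List<String> result = new ArrayList<>();
        result.add(pinned);
        for (String id : rankedIds) {
            if (!id.equals(pinned)) result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        PinnedResults pinner = new PinnedResults(Map.of("fifa", "fifa-08")); // hypothetical ids
        System.out.println(pinner.apply("FIFA", List.of("fifa-2003", "fifa-08"))); // [fifa-08, fifa-2003]
    }
}
```

Because the override lives outside the scoring code, it fixes one query without touching how the other 50 are ranked.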

Re: boosting relevance of certain documents

2008-04-25 Thread Anshum
Hi Daniel, Just a suggestion: how about storing an extra field while indexing that holds the "length" of the document? You could then divide the score of the document by its length (changing the Lucene code, or something along those lines) while calculating the score. In this manner,

Re: boosting relevance of certain documents

2008-04-25 Thread Otis Gospodnetic
If this is really about adjusting score based on field length (didn't follow the thread closely), then this sounds like a job for a custom Similarity with a custom implementation of lengthNorm method. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message
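Otis's suggestion amounts to choosing a length norm that punishes long fields more than the default 1/sqrt(numTerms). In Lucene that would mean subclassing DefaultSimilarity, overriding lengthNorm(String fieldName, int numTerms), and setting it on both the IndexWriter and the Searcher; since norms are computed at index time, documents would need reindexing. The plain-Java sketch below only compares the arithmetic, and 1/numTerms is one hypothetical choice among many.

```java
import java.util.function.IntToDoubleFunction;

final class LengthNormComparison {
    // Default-style norm: 1/sqrt(numTerms), as in Lucene 2.x DefaultSimilarity.
    static final IntToDoubleFunction DEFAULT_NORM = n -> 1.0 / Math.sqrt(n);

    // Hypothetical steeper norm: 1/numTerms, penalizing long fields more strongly.
    static final IntToDoubleFunction STEEP_NORM = n -> 1.0 / n;

    // Simplified single-term score: sqrt(freq) * norm(numTerms). idf is omitted
    // because it is identical for every document matching the same term.
    static double score(int freq, int numTerms, IntToDoubleFunction norm) {
        return Math.sqrt(freq) * norm.applyAsDouble(numTerms);
    }

    public static void main(String[] args) {
        // "fifa 2003 fifa 03" (freq 2, 4 terms) vs "fifa 08" (freq 1, 2 terms)
        System.out.println(score(2, 4, STEEP_NORM)); // about 0.354
        System.out.println(score(1, 2, STEEP_NORM)); // 0.5
    }
}
```

With the steeper norm the shorter "fifa 08" field wins for the query "fifa"; the trade-off is that every query on that field is affected, which is the local-maxima risk Grant mentions.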

Please help with Gradient Formatter of Highlighter

2008-04-25 Thread Mohammad Hasan
Dear all, I need a working example of Gradient Formatter. I want to highlight a searched word after it is found in the database. I am using NHibernate Search & Lucene, but I am an entry-level programmer, so I do not know how to use Gradient Formatter. There are plenty of examples of HTML Format
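As I understand it, the highlighter's GradientFormatter colours each highlighted term on a scale between a minimum and a maximum colour according to the term's score relative to a maximum score. The plain-Java sketch below illustrates only that idea (linear interpolation between two "#RRGGBB" colours); it does not call the highlighter, and the colours and span markup are arbitrary choices. The same arithmetic ports directly to C#.

```java
final class GradientSketch {
    // Linearly interpolates between two "#RRGGBB" colours; ratio is clamped to [0, 1].
    static String interpolate(String minColor, String maxColor, double ratio) {
        int min = Integer.parseInt(minColor.substring(1), 16);
        int max = Integer.parseInt(maxColor.substring(1), 16);
        double r = Math.max(0.0, Math.min(1.0, ratio));
        int red = channel(min >> 16, max >> 16, r);
        int green = channel((min >> 8) & 0xFF, (max >> 8) & 0xFF, r);
        int blue = channel(min & 0xFF, max & 0xFF, r);
        return String.format("#%02X%02X%02X", red, green, blue);
    }

    // Moves one 0-255 colour channel part of the way from 'from' to 'to'.
    private static int channel(int from, int to, double ratio) {
        return from + (int) ((to - from) * ratio);
    }

    // Wraps a hit in a span whose background deepens with score / maxScore (maxScore > 0).
    static String highlight(String term, float score, float maxScore) {
        String bg = interpolate("#FFFFFF", "#FF0000", score / maxScore);
        return "<span style=\"background:" + bg + "\">" + term + "</span>";
    }

    public static void main(String[] args) {
        System.out.println(highlight("fifa", 1f, 2f)); // <span style="background:#FF8080">fifa</span>
    }
}
```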

Does lucene support distributed indexing?

2008-04-25 Thread Samuel Guo
Hi all, I am a Lucene newbie :) It seems that Lucene doesn't support distributed indexing :( As some IR research papers mention, when the document collection becomes large, the index becomes large too. When a single machine can't hold the whole index, some strategies are used to solve it. such a
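One common strategy for an index too large for one machine is document partitioning: each machine holds a complete index over a subset of the documents, every query goes to all shards, and the per-shard top hits are merged. A minimal Lucene-independent sketch of the routing and merge steps, with hypothetical names; as far as I know, Lucene 2.3 itself offered MultiSearcher, ParallelMultiSearcher and RemoteSearchable for the search side, while Nutch builds on Hadoop for the rest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ShardedSearchSketch {
    record Hit(String docId, double score) {}

    // Routes a document to one of numShards shards by hashing its id.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Merges every shard's hits into one global top-k list, highest score first.
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> shard : perShardHits) {
            for (Hit h : shard) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // drop the current lowest score
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Hit>> hits = List.of(
                List.of(new Hit("a", 1.0), new Hit("b", 3.0)), // shard 0
                List.of(new Hit("c", 2.0)));                  // shard 1
        System.out.println(mergeTopK(hits, 2));
    }
}
```

Merging raw scores assumes they are comparable across shards, which holds only roughly, since each shard computes idf from its own document statistics.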