Hi,
Does anybody have experience building a distributed application with Lucene
and Hadoop/HFS?
Lucene 2.3.1 does not seem to expose an HFSDirectory.
Any advice would be appreciated.
Regards,
Alex
Alex Chew wrote:
> Hi,
> Does anybody have experience building a distributed application with Lucene
> and Hadoop/HFS?
> Lucene 2.3.1 does not seem to expose an HFSDirectory.
> Any advice would be appreciated.
> Regards,
> Alex
Have a look at Nutch.
M.
--
On Mon, Apr 14, 2008, Chris Hostetter wrote about "Re: Sorting consumes
hundreds of MBytes RAM":
> : And question #2: what am I going to do against it? Index sharding?
>
> The only suggestion I can offer is to take a look at LUCENE-769 ... it
> takes a completely different approach of using a F
Hi Anshum,
I looked at the log and couldn't make too much sense of it.
But I have an update to my original suggestion: try the following
command line parameters:
-Xms1024m -XX:-UseParallelGC -XX:+ScavengeBeforeFullGC
I got rid of the "-XX:+AggressiveOpts" because perhaps these are not
always
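One way to confirm that flags like the ones above actually reached the JVM is to ask the running process. The following is a minimal sketch using only the standard java.lang.management API; the class name JvmFlagCheck is made up for illustration, and the collector names it prints depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Lucene or of the original thread.
class JvmFlagCheck {

    // Arguments passed to the JVM itself (e.g. -Xms1024m), excluding program arguments.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the garbage collectors the JVM is currently running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Collectors:    " + collectorNames());
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Running it with the same command line as the indexing or search process shows whether a wrapper script dropped or overrode any of the options.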
Hi All
I'm going to start this with an apology because, as you'll see below, I've
probably missed out something quite fundamental about how Lucene works.
However, I'll explain the problem. We have a system set up which indexes and
searches records about business properties. One of the fields we i
Have you tried double-quoting the postcode instead of
using parentheses:
postcode:"M11 1LQ"
Ulf
--- Chris Mannion <[EMAIL PROTECTED]>
wrote:
> "(postcode:(M11 1LQ) )"
>
> However, the postcode search never returns any results.
You can't tokenize the search query if it's on that field. Maybe use a
per-field analyzer together with the keyword analyzer? Check them out if you
haven't.
On Fri, 2008-04-25 at 16:01 +0100, Chris Mannion wrote:
> Hi All
>
> I'm going to start this with an apology because, as you'll see below, I've
> p
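To illustrate the tokenization point made in the replies above, here is a minimal plain-Java sketch. It does not call Lucene. The two helper methods are rough, hypothetical stand-ins for a tokenizing analyzer (split on whitespace and lowercase) and for a keyword-style analyzer (the whole value as one term). It also assumes the postcode field was indexed as a single untokenized term, which the truncated original message does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; real analyzers apply more rules than this.
class PostcodeTokenSketch {

    // Stand-in for a tokenizing analyzer: split on whitespace and lowercase each part.
    static List<String> tokenized(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for a keyword-style analyzer: the whole value is one term, unchanged.
    static List<String> keyword(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    public static void main(String[] args) {
        String postcode = "M11 1LQ";
        List<String> indexedTerms = keyword(postcode);   // assumed: field indexed untokenized
        List<String> queryTerms = tokenized(postcode);   // query text run through a tokenizing analyzer

        System.out.println("Indexed terms: " + indexedTerms);
        System.out.println("Query terms:   " + queryTerms);
        System.out.println("Same terms?    " + indexedTerms.equals(queryTerms));
    }
}
```

If the index holds one term for the whole postcode while the query is split into two lowercase terms, neither query term matches the indexed term. Analyzing that field the same way at index time and at query time avoids the mismatch.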
Hello,
I'm using Lucene within a new project and I'm not sure how to solve
the following problem: My index consists of the two attributes "id" and
"searchable". "id" is the id of a product and "searchable" is a combination
of the product name and its category name.
example:
id
How are you analyzing the searchable field?
On Fri, Apr 25, 2008 at 12:49 PM, Daniel Freudenberger <
[EMAIL PROTECTED]> wrote:
> Hello,
>
>
>
> I'm using Lucene within a new project and I'm not sure how to solve
> the following problem: My index consists of the two attributes "id" and
> "se
I'm using the StandardAnalyzer. I hope this answers your question (I'm quite
new to Lucene).
-----Original Message-----
From: Jonathan Ariel [mailto:[EMAIL PROTECTED]
Sent: Friday, April 25, 2008 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: boosting relevance of certain documents
OK. I'm not an expert on the scoring algorithm, but based on tf*idf you
can tell that the returned document ranks as more relevant because it has a
higher term frequency.
Using explain, you can see the following:
Doc 1
0.643841 = (MATCH) fieldWeight(searchable:fifa in 0), product of:
1.0 = tf(termF
Does anyone know how to set the MaxClauseCount in Luke?
I'm in a situation where I've had to override it when searching against
my indexes, but now I can't use Luke to examine what's going on with my
queries anymore.
Any help would be appreciated.
Matt
--
Matthew Hall
Software Engineer
Mous
Thanks for your response. I already knew that the relevance is based on the
term frequency but in some cases it's just not what the user expects.
As I already mentioned, "fifa 2003 fifa 03" vs. "fifa 08" is such a case -
searching for "fifa" would return the "fifa 2003 fifa 03" document first but
It really depends. Hand tuning scoring algs for a specific query is
very prone to local maxima problems. In other words, you fix one
query and break 50 others. Sometimes, a good old "configurable" hard
code is the way to go. If you want a specific doc to be #1, make it
number one. You
Hi Daniel,
Just a suggestion: how about storing an extra field while indexing that holds
the "length" of the document? You could then divide the score of the
document (by changing the Lucene code) by the length of the document (or
something along the same lines) while calculating the score. In this manner,
If this is really about adjusting the score based on field length (I didn't
follow the thread closely), then this sounds like a job for a custom Similarity
with a custom implementation of the lengthNorm method.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
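To make the field-length discussion above concrete, here is a minimal plain-Java sketch of the two per-document factors involved. It does not use the Lucene API. The tf and lengthNorm formulas follow Lucene's default similarity (square root of the term frequency, and one over the square root of the number of terms in the field). The steeperLengthNorm variant is a hypothetical alternative for illustration, not something proposed in the thread. The term counts assume each whitespace-separated word is one term, and idf is left out because it is the same for both documents when the query is the single term "fifa".

```java
// Hypothetical illustration of tf and length normalization; not Lucene code.
class LengthNormSketch {

    // Default-style term frequency factor: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Default-style length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical steeper normalization that penalizes long fields more: 1 / numTerms.
    static double steeperLengthNorm(int numTerms) {
        return 1.0 / numTerms;
    }

    // Per-field weight without idf: tf multiplied by the chosen length norm.
    static double fieldWeight(int freq, int numTerms, boolean steeper) {
        double norm = steeper ? steeperLengthNorm(numTerms) : lengthNorm(numTerms);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        // "fifa 2003 fifa 03": "fifa" occurs twice among 4 terms.
        // "fifa 08":           "fifa" occurs once among 2 terms.
        System.out.println("default, long doc:  " + fieldWeight(2, 4, false));
        System.out.println("default, short doc: " + fieldWeight(1, 2, false));
        System.out.println("steeper, long doc:  " + fieldWeight(2, 4, true));
        System.out.println("steeper, short doc: " + fieldWeight(1, 2, true));
    }
}
```

With the default-style formulas the two documents come out nearly tied on paper. Lucene stores norms with reduced precision, so actual scores can differ from this arithmetic. With the steeper norm the shorter "fifa 08" document gets the higher weight (0.5 against roughly 0.35), which is the kind of change a custom lengthNorm could make.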
Dear all,
I need a working example of Gradient Formatter. I want to highlight a
searched word after it is found in the database. I am using NHibernate
Search & Lucene. But I am an entry-level programmer, so I do not know how
to use Gradient Formatter. There are plenty of examples of HTML Format
Hi all,
I am a Lucene newbie :)
It seems that Lucene doesn't support distributed indexing :(
As some IR research papers mention, when the document collection becomes
large, the index becomes large as well. When a single machine can't hold the
whole index, some strategies are used to solve it, such a
19 matches