The formula hasn't changed, but the first printing of the book had a portion of
it missing; check the javadoc for (Default?)Similarity for the real, current
formula.
Here is a simple IDF example, or at least how I "visualize" IDF.
You have an index with a bunch of documents and terms in it. A t
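To make the IDF intuition concrete, here is a minimal sketch in plain Java. It assumes the formula documented in the DefaultSimilarity javadoc of this era, idf = ln(numDocs / (docFreq + 1)) + 1. The class name IdfSketch and the sample numbers are made up for illustration, so check the javadoc for your version before relying on it.

```java
final class IdfSketch {
    // Assumed formula (from the DefaultSimilarity javadoc of this era):
    //   idf = ln(numDocs / (docFreq + 1)) + 1
    // docFreq = number of documents containing the term
    // numDocs = total number of documents in the index
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size
        System.out.println("rare term (in 1 doc):      " + idf(1, numDocs));
        System.out.println("common term (in 900 docs): " + idf(900, numDocs));
    }
}
```

A term found in only a few documents gets a high IDF, while one found in nearly every document gets an IDF close to 1, so rare terms contribute more to the score.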
: I was recently looking through the Lucene in Action book and came across the
: scoring formula. I was wondering if the formula has changed since the book
: was written?
no, but the book has some mistakes, and the scoring formula is one of
them...
http://lucenebook.com/blog/errata/
http://lucene
Thanks, Erick, for the reply. It will help us.
Regards,
Amit
-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 05, 2006 6:32 PM
To: java-user@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: Function writing using lucene
Amit:
You can make a
Hello,
I was recently looking through the Lucene in Action book and came across the
scoring formula. I was wondering if the formula has changed since the book
was written?
Also, I was wondering if someone could briefly explain what the IDF(t) term in
the formula means? In the book it says that it's th
Look for MoreLikeThis class in Lucene's contrib/ directory.
Otis
----- Original Message -----
From: Dominik Bruhn <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, July 6, 2006 7:54:20 PM
Subject: Searching for similiar texts
Hi,
I index articles using two fields, one for the ti
Hi,
I index articles using two fields, one for the title and one for the text. Now
I want to display 5 similar articles alongside each article being viewed. How
can I manage this? Are there any ready-made solutions?
Thanks
--
Dominik Bruhn
mailto: [EMAIL PROTECTED]
http://www.dbruhn.de
--
: Could the file name (fully qualified filepath/filename) be used as the
: search
: term ?
:
: Could the entire file be stringified (one long string, with or without
: new-lines)
: and that be used as the term (probably not, since not tokenized) ?
either of those can work -- it all depends on how
: I found this thread to be very useful when deciding
: upon an indexing strategy.
:
: http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html
FYI: that thread was the basis of the mechanism Solr uses to create
"snapshots" of indexes for replication from a Master to multiple Slav
Van Nguyen wrote:
I just want results that have:
ID: 1234 OR 2344 OR 2323
LOCATION: A1
LANGUAGE: ENU
This query returns everything from my index. How would I create a query
that returns only results that have the given LOCATION and LANGUAGE and
one of those three IDs?
I think you'll ne
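One shape that expresses "LOCATION and LANGUAGE are required, and the ID must be one of three values" is a required group of optional ID clauses. In Lucene query syntax that is `+LOCATION:A1 +LANGUAGE:ENU +(ID:1234 ID:2344 ID:2323)`; with the programmatic API the equivalent is an inner BooleanQuery holding the ID terms as optional clauses, added to the outer query as a required clause. The sketch below only builds the query string in plain Java. The class and method names are hypothetical, it assumes the ID list is non-empty and that the values need no escaping, and parsing the string (and how the analyzer treats these fields) is left to your own setup.

```java
import java.util.Arrays;
import java.util.List;

final class RequiredIdsQuerySketch {
    // Builds "+LOCATION:x +LANGUAGE:y +(ID:a ID:b ...)".
    // '+' marks a clause as required; the parenthesised ID clauses are
    // optional among themselves, so at least one of them has to match.
    // Assumes ids is non-empty and values contain no special characters.
    static String build(List<String> ids, String location, String language) {
        StringBuilder sb = new StringBuilder();
        sb.append("+LOCATION:").append(location);
        sb.append(" +LANGUAGE:").append(language);
        sb.append(" +(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("ID:").append(ids.get(i));
        }
        sb.append(')');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("1234", "2344", "2323"), "A1", "ENU"));
    }
}
```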
I have a BooleanQuery that looks like this:
BooleanQuery query = new BooleanQuery();
TermQuery term1 = new TermQuery(new Term(ID, "1234"));
TermQuery term2 = new TermQuery(new Term(ID, "2344"));
TermQuery term3 = new TermQuery(new Term(ID, "2323"));
TermQuery termLocation = new TermQuery
Hello Lucene community
So, having looked at the API and at numerous email postings and exchanges,
I see that updating a particular document in the index that represents a
given file that has changed involves
1) deleting with deleteDocument (of either IndexReader or IndexModifier)
and then
2
Hey,
I found this thread to be very useful when deciding
upon an indexing strategy.
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html
The system I work on has 3 million or so documents and
it was (until a non-Lucene performance issue came up)
set up to add/delete new docum
We have a similar setup, although probably only 1/5th the number of
documents and updates. I'd suggest just making periodic index backups.
I've been storing my index as follows:
//data/ (lucene index directory)
//backups/
The "data" is what's passed into IndexWriter/IndexReader. Additionally,
I've been asked to do a project which provides full-text search for a
large database of articles. The expectation is that most of the
articles are fairly small (<2k bytes). There will be an initial
population of around 400,000 articles. There will then be approximately
2000 new articles added ea
Thanks, I always appreciate someone else doing work for me
Best
Erick
Hi James,
A paper was mentioned on this list in the last couple of months which
presents a solution to your sampling problem without having to know the
total results size in advance. The paper
(http://www2005.org/cdrom/docs/p245.pdf) presents two solutions which
utilize a random variable.
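For reference, one standard way to draw a uniform sample of k items from a stream whose length is not known in advance is reservoir sampling (Algorithm R). The sketch below shows that technique in plain Java. It is not claimed to be either of the two solutions in the paper, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

final class ReservoirSketch {
    // Returns up to k items chosen uniformly at random from the iterator,
    // reading it once and never needing the total count up front.
    static <T> List<T> sample(Iterator<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<T>(k);
        int seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Keep the new item with probability k / seen by picking
                // a slot in [0, seen) and replacing only if it is < k.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

In a search setting the iterator would be whatever yields matching documents one at a time; the sample is available as soon as the last hit has been seen.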
Hey,
Sorry, I will explain a bit more about my collect
method. Currently my collect method is executing
IndexSearcher.doc(id) and storing some stuff in a Map
which I can then retrieve from the HitCollector (much
like the example in the Lucene In Action book). Of
course that's somewhat expensive, s
Hi all.
I just want to share my experience with the Berkeley DB JEDirectory
implementation from the contrib area. I spent two days evaluating and
testing it and found that it does work, but it has very bad performance
and very high disk requirements for a medium-sized document volume.
I indexed
> i guess I needed to RTFC
I found that recently too. My only contribution to Lucene has been asking
for a Javadoc addition to prevent others from falling into a trap, which I
fell into. My issue was http://issues.apache.org/jira/browse/LUCENE-594.
Similarly, you could ask for a Javadoc comment fo
Hi all,
I would like to ask: once files are indexed, in what order are the
file paths displayed for a query search?
Is the file with the highest number of matches displayed before
the others?
regards
amit kumar
Thanks for the helpful tip, it makes sense now.
I had previously assumed (wrongly) that RAMDirectory.close() would
free up its memory buffers, but I guess I needed to RTFC...
RAMDirectory.close() is just an empty method.
On 7/5/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote:
My two bits...