Thanks, works like a charm.
-Original Message-
From: Paul Elschot [mailto:[EMAIL PROTECTED]
Sent: Monday, March 14, 2005 11:05 AM
To: java-user@lucene.apache.org
Subject: Re: Simple Search Question.
On Monday 14 March 2005 19:59, Kyong Kwak wrote:
>
> I looked and didn't find anything
I think what they do at Google is a fancy heuristic -- as David Spencer
mentioned, suburls of a given page, identical snippets, or titles... My
idea was more towards providing a 'realistic overview' of subjects in
pages. So you could pick, say, the first document from each cluster and
show them
Otis Gospodnetic wrote:
The problem with 2c is that scores are currently relative, and not
absolute. I am hoping Chuck's patch makes it into the source, as
making scores absolute would be helpful in situations like this one.
Good point.
If the orig MoreLikeThis query allows the source doc to be re
This will help:
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermEnum.html
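
A minimal sketch of the idea behind that pointer, not code from this thread. A term dictionary is kept in sorted order by field and then by text, and each distinct value appears in it once, however many documents contain it. Listing the unique values of a field therefore means seeking to the first term of that field and walking forward until the field changes. In Lucene of this era the corresponding calls are IndexReader.terms(Term), TermEnum.term() and TermEnum.next(). The sketch below only models that walk with a TreeSet so that it runs without Lucene; the names UniqueFieldValues, FieldTerm and uniqueValues are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of a sorted term dictionary. This is NOT the Lucene API;
// every name here is a hypothetical stand-in used for illustration.
final class UniqueFieldValues {

    // A (field, text) pair ordered by field first, then by text.
    static final class FieldTerm implements Comparable<FieldTerm> {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(FieldTerm other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    // Seek to the first term >= (field, "") and collect texts until the field changes.
    static List<String> uniqueValues(TreeSet<FieldTerm> dictionary, String field) {
        List<String> values = new ArrayList<>();
        for (FieldTerm term : dictionary.tailSet(new FieldTerm(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // walked past the last term of the requested field
            }
            values.add(term.text);
        }
        return values;
    }

    public static void main(String[] args) {
        TreeSet<FieldTerm> dictionary = new TreeSet<>();
        dictionary.add(new FieldTerm("category", "optics"));
        dictionary.add(new FieldTerm("category", "cameras"));
        dictionary.add(new FieldTerm("category", "optics")); // duplicate, stored once
        dictionary.add(new FieldTerm("title", "zoom"));
        System.out.println(uniqueValues(dictionary, "category"));
    }
}
```

With a real index the loop has the same shape: stop as soon as the enumerated term is null or belongs to a different field, and close the enumeration afterwards.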
Otis
--- Kyong Kwak <[EMAIL PROTECTED]> wrote:
>
> I looked and didn't find anything and wanted to know what the best
> way
> might be for getting a unique list of values in a given field?
> so if I hav
On Monday 14 March 2005 19:59, Kyong Kwak wrote:
>
> I looked and didn't find anything and wanted to know what the best way
> might be for getting a unique list of values in a given field?
> so if I have a field named "category" ( it's a keyword ) and I wanted to
> get all the unique values for th
Nutch is a full-blown search engine (fetcher/crawler, web links
databases, etc.). luceneweb.war is simply a web-app with a Lucene
demo. Lucene is only a toolkit, not a full-blown application.
Otis
--- Hasan Diwan <[EMAIL PROTECTED]> wrote:
> I just checked out a copy of the svn sources a
The problem with 2c is that scores are currently relative, and not
absolute. I am hoping Chuck's patch makes it into the source, as
making scores absolute would be helpful in situations like this one.
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
> Miles Barr wrote:
>
> > Has anyone tried
I looked and didn't find anything and wanted to know what the best way
might be for getting a unique list of values in a given field?
so if I have a field named "category" ( it's a keyword ) and I wanted to
get all the unique values for that, how would I go about it?
thanks!
I just checked out a copy of the svn sources and was wondering what
the difference is between luceneweb.war and nutch. I'm certain there
must be differences, else there wouldn't be two different projects.
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>
Miles Barr wrote:
Has anyone tried to remove similar documents from their search results?
It looks like Google does some on-the-fly filtering of the results,
hiding pages which it thinks are too similar, i.e. when you see:
"In order to show you the most relevant results, we have omitted some
entrie
Hi Dawid,
On Mon, 2005-03-14 at 18:55 +0100, Dawid Weiss wrote:
> I can imagine if you apply clustering to search results anyway then the
> information about clusters can help you determine 'similar' results and
> reorder the output list.
That's an interesting idea. How easy is it to 'tighten'
Hi Miles :)
I can imagine if you apply clustering to search results anyway then the
information about clusters can help you determine 'similar' results and
reorder the output list.
Just a thought.
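
One way this could look, as a rough sketch only: assume every hit in the ranked list already carries a cluster id assigned by some clustering step, then keep the first (best-ranked) hit of each cluster and drop the rest. The names ClusterDedup, Hit and firstPerCluster are hypothetical, and how the cluster ids are produced is outside the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: collapse a ranked result list to one hit per cluster.
final class ClusterDedup {

    // A search hit with a cluster id assumed to come from an earlier clustering step.
    static final class Hit {
        final String docId;
        final int clusterId;

        Hit(String docId, int clusterId) {
            this.docId = docId;
            this.clusterId = clusterId;
        }
    }

    // Keeps the first hit seen for each cluster, preserving the original rank order.
    static List<Hit> firstPerCluster(List<Hit> ranked) {
        Set<Integer> seenClusters = new HashSet<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : ranked) {
            if (seenClusters.add(hit.clusterId)) { // add() returns false for a repeated cluster
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

The dropped hits could instead be moved to the end of the list, which would give the reordering variant rather than outright removal.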
D.
Miles Barr wrote:
Has anyone tried to remove similar documents from their search results?
It loo
Has anyone tried to remove similar documents from their search results?
It looks like Google does some on-the-fly filtering of the results,
hiding pages which it thinks are too similar, i.e. when you see:
"In order to show you the most relevant results, we have omitted some
entries very similar to
Hi all.
I have a large index of documents (about 1.6 million).
One field (for example, called “number”) contains a string of digits.
I need to do a wildcard search on this field such as “*expression*” (i.e.
all documents that contain “expression” in this field).
When I run such search with very short e
Hi Guys
The process is correct,
but it is impossible to have the optional terms.
The documents we index number in the millions, with similar word trailers.
Any other ideas? Please advise.
Thx in advance
Karthik
-Original Message-
From: sergiu gordea [mailto:[EMAIL PROTECTED]
Sent: Monday
Karthik N S wrote:
Hi Guys
Is there a way for the query parser to produce something like
this:
(+digital +camera +optics) -(All other Default variables)
But at run time one cannot determine the default values.
I am stuck because of this :(
You can ask the u
Hi Guys
Is there a way for the query parser to produce something like
this:
(+digital +camera +optics) -(All other Default variables)
But at run time one cannot determine the default values.
I am stuck because of this :(
-Original Message-
From: ser
Well,
If I understand the workings of the TF/IDF model used by Lucene correctly, then
doc 6 should score lower than 3 because of the extra noise caused by 'CABEL
ACCESSORIES', and setting the threshold high enough for feedback of the highest
score should do the trick. Right?
Bram Kouwenberg
Karthik N S wrote:
Hi Guys
Apologies...
I have indexed documents successfully and they would be
Document 1 contains = ELECTRONICS DIGITAL CAMERA
Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
Document 3 contains = ELECTRONICS DIGITAL CAME
Hi Guys
Apologies...
I have indexed documents successfully and they would be
Document 1 contains = ELECTRONICS DIGITAL CAMERA
Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
Document 4 conta