hi list,
i'm trying to use Lucene (1.4.3) to replace an existing MySQL search system.
so far, this is working great, but i have a couple of questions.
firstly, when my index updater is (re)indexing a lot of documents at once, i
often get errors like
"FileNotFoundException: /usr/local/searchin
: ..but this means that the scores are not comparable across queries,
: because a hit with the score '0.7' from one query need not be as 'good' as
: a '0.7' from another query...and this is only the case if the original,
: unnormalized top score value was less than 1.0.
Scores are not compa
1) Yes. One location per document.
2) Using the SimpleAnalyzer (for now). I have city, state and country as
separate fields, so I could tokenize each as a single token if that
would work better. I think that avoids the need for a delimiter at index
time.
3) I am not making any assumptions now at
this code works on a couple of other boxes as is. that deleting code removes
the active index after this one builds in a different location. then the
searcher is told to make this newest one the current and the old one is
deleted. it affects directories and their entire contents. it would
: It's still not keeping the segments file around. Is that necessary?
You seem to have some code at the end that (I'm guessing) is supposed to
remove older copies of the index. Are you sure that code does what you
think it does? Have you tried commenting it out and seeing if that fixes
your pro
Dmitry Goldenberg wrote:
Hi,
I'm trying to figure out a way to locate tokens which include special characters. The actual text in the file being indexed is something like "function() { statement1; statement2; }"
The query I'm using is "function\()" since I want to locate precisely "function
Hi,
I'm trying to figure out a way to locate tokens which include special
characters. The actual text in the file being indexed is something like
"function() { statement1; statement2; }"
The query I'm using is "function\()" since I want to locate precisely
"function()" - the query succeeds
Few questions.
(1) Does each document contain only one geographical location?
(2) Given a document, how are you tokenizing it into city, state and
country? I am assuming "," as the delimiter here. Otherwise determining the
boundary for names like "St. Louis du Ha Ha" would be difficult.
(3) Are t
The reason I only want 2 hits is because [2] is more "specific" in my
domain -- I could also have Toronto, Ontario; Kingston, Ontario etc.
which would take the hits up to 5 now.
What I'm really after is finding a way to index and search that would
make [2] an invalid retrieval.
My latest attempt
Hi Colin,
Even assuming you came up with a good way of indexing, the
example query "Ontario, CA" should yield 3 hits. All 2, 3 and 4 are
valid retrievals. Could you please justify which 2 hits you want and why?
Thanks,
Rajesh Munavalli
Colin Young wrote:
I'm having some trouble comi
Hi Colin,
Even assuming you came up with a good way of indexing, the example
query "Ontario, CA" should yield 3 hits. All 2, 3 and 4 are valid
retrievals. Could you please justify which 2 hits you want and why?
Thanks,
Rajesh Munavalli
On 1/27/06, Colin Young <[EMAIL PROTECTED]> wrote:
>
Daniel Pfeifer wrote:
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find
a class called MapDirectory or MMapDirectory.
It is post-1.4.
You can download a nightly build of the current trunk at:
http://cvs.apache.org/dist/lucene/java/nightly/
Doug
---
I'm having some trouble coming up with a good search strategy for geographical
data. e.g., given:
[1] city: London, United Kingdom
[2] city: London, Ontario, Canada
[3] city: Ontario, California, United States
[4] state: Ontario, Canada
[5] city: Vancouver, Washington, United States
[6] city: Va
The lucene info is:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.6.1
Created-By: Apache Jakarta
Name: org/apache/lucene
Specification-Title: Lucene Search Engine
Specification-Version: 1.4.3
Specification-Vendor: Lucene
Implementation-Title: org.apache.lucene
Implementation-Version: build 2004-
On 1/27/06, Chris Lamprecht <[EMAIL PROTECTED]> wrote:
> Actually, I just looked at the code, and it actually does this by
> taking 1/maxScore and then multiplying this by each score (equivalent
> results in the end, maybe more efficient(?)).
Very much so... fdiv commonly takes 20 to 40 clock cycl
..but this means that the scores are not comparable across queries,
because a hit with the score '0.7' from one query need not be as 'good' as
a '0.7' from another query...and this is only the case if the original,
unnormalized top score value was less than 1.0.
Looks this really like a fea
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find
a class called MapDirectory or MMapDirectory.
/Daniel
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: den 27 januari 2006 11:43
To: java-user@lucene.apache.org
Subject: [SPAM] - Re: Performance
petite_abeille wrote:
I would love to see this. I presently have a somewhat unwieldy
conversion table [1] that I would love to get rid of :))
[snip]
[1] http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt
I've attached the perl script -- feed
http://www.unicode.org/Public/4.1.0/u
Daniel Pfeifer wrote:
We are sporting Solaris 10 on a Sun Fire-machine with four cores and
12GB of RAM and mirrored Ultra 320-disks. I guess I could try switching
to FSDirectory and hope for the best.
Or, since you're on a 64-bit platform, try MMapDirectory, which supports
greater parallelism
hi,
thank you for your help.
On 1/27/06, Chris Lamprecht <[EMAIL PROTECTED]> wrote:
>
> It takes the highest scoring document, if greater than 1.0, and
> divides every hit's score by this number, leaving them all <= 1.0.
> Actually, I just looked at the code, and it actually does this by
> takin
On Friday 27 January 2006 02:36, Chun Wei Ho wrote:
> Thanks for the info :) One last related question.
>
> If I delete documents using an IndexReader(), can I assume that the
> internal document numbers of other undeleted documents (obtained using
> the same IndexReader instance) will not change u
Well,
We are sporting Solaris 10 on a Sun Fire-machine with four cores and
12GB of RAM and mirrored Ultra 320-disks. I guess I could try switching
to FSDirectory and hope for the best.
-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: den 27 januari 2006 08:50
To:
It takes the highest scoring document, if greater than 1.0, and
divides every hit's score by this number, leaving them all <= 1.0.
Actually, I just looked at the code, and it actually does this by
taking 1/maxScore and then multiplying this by each score (equivalent
results in the end, maybe more
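
To make the normalization described above concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; the class name ScoreNormalization, the method normalize and the sample scores are all made up for illustration. It only mirrors the behaviour described in this message: if the top raw score is greater than 1.0, every score is multiplied by 1/maxScore, otherwise the scores are left alone.

```java
public class ScoreNormalization {

    // Hypothetical helper: scales raw scores so that the top hit is at most 1.0.
    // Scores are only rescaled when the highest raw score exceeds 1.0.
    static float[] normalize(float[] rawScores) {
        float maxScore = 0.0f;
        for (int i = 0; i < rawScores.length; i++) {
            maxScore = Math.max(maxScore, rawScores[i]);
        }

        // Compute the reciprocal once and multiply, instead of dividing per hit.
        float scoreNorm = 1.0f;
        if (maxScore > 1.0f) {
            scoreNorm = 1.0f / maxScore;
        }

        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scoreNorm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Made-up raw scores for two unrelated queries.
        float[] queryA = normalize(new float[] {4.0f, 2.0f});
        float[] queryB = normalize(new float[] {0.7f, 0.35f});

        System.out.println("query A: " + queryA[0] + ", " + queryA[1]);
        System.out.println("query B: " + queryB[0] + ", " + queryB[1]);
    }
}
```

With these sample numbers, query A is rescaled to 1.0 and 0.5, while query B stays at 0.7 and 0.35 because its top score is below 1.0. The second hit of query A (0.5) had a raw score of 2.0 and the top hit of query B (0.7) had a raw score of 0.7, so the displayed values are on different scales, which is why scores from different queries should not be compared directly.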