Hi Grant,
> but you should have a look at Searcher.explain()
I was half-expecting this answer. :(
The query is very basic and the scoring seems completely arbitrary.
Documents with the same number of occurrences and (seemingly) the same
distribution are being given widely different scores.
> Chris Host
: I'm returning the document plus hits.score(i) * 100 but when the
NOTE: the score returned by Hits is not a "percentage" ... it is an
arbitrary number less than 1. it might be the "raw score" of the document
or it might be the result of dividing the "raw score" by the "raw score"
of the high
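
A rough sketch of the normalization described above, in plain Java with no Lucene dependency. It assumes the raw scores are divided by the top raw score only when that top score is greater than 1; that condition, the class name ScoreNormSketch and the method name normalize are assumptions made for illustration, not something stated in this thread.

```java
// Hypothetical illustration, not Lucene code.
final class ScoreNormSketch {
    // Scales scores by 1/top when the top raw score exceeds 1 (assumed rule);
    // otherwise returns the raw scores unchanged.
    static float[] normalize(float[] rawScores) {
        float top = 0.0f;
        for (float s : rawScores) {
            top = Math.max(top, s);
        }
        float norm = top > 1.0f ? 1.0f / top : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

Under that assumption the result says how a document ranks relative to the best hit, which is why multiplying it by 100 does not give a meaningful "percentage".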
Hi
I have indexed this html document
=z1
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
=z2=
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo
Not sure what UI you are referring to, but you should have a look at
Searcher.explain() for giving you information about why a particular
document scored the way it does
-Grant
On Oct 31, 2007, at 2:14 PM, Tom Conlon wrote:
Hi All,
Query: systems AND 2000
Results:558 total matchi
Hi All,
Query: systems AND 2000
Results:558 total matching documents
I'm returning the document plus hits.score(i) * 100 but when the
relevance is examined in the User interface it doesn't seem to be
working.
E.g. 'rough' feedback in terms of occurrences
61.txt 18.356403 100%
On 31 Oct 2007, at 15:18, Cool Coder wrote:
Hi Group,
I need to display a list of the tokens (tags) on my site
that occur most often in my index. One way I can think
of is to keep track of all tokens during analysis and display
them accordingly. Is there any other way? e.g
On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote:
> My documents all have a field with a variable number of terms
> (but rather few):
> Doc1.field = "foo bar gro"
> Doc2.field = "foo bar gro mot slu"
> Now I would like to search using the terms "foo bar gro"
>
> Problem 1:
> I like to express
Thanks! I also noticed there is a mention of this in the documentation
of Document.getBoost():
"Note: This value is not stored directly with the document in the index.
Documents returned from IndexReader.document(int) and Hits.doc(int) may
thus not have the same value present as when this document
You have no stack trace? Come on, man...you must be able to get a stack
trace :) There is no way to tell what is causing that very generic
error. I would say though: certainly not the size of your index.
I suppose, out of curiosity, how big is it? I'll bet my two broken
wrists that's not the proble
Hi, Jan,
You really need to be more specific about your configuration and error log.
Lucene surely has been used on many large websites.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.c
Hello to you all!
I've programmed a portlet search solution using Lucene.
Now that our new website is shortly before release, the file volume
has increased fast.
My Lucene-based index program worked fine until the number of files
increased so much. Now it works for about 10 minutes and then gives m
Hi Group,
I need to display a list of the tokens (tags) on my site that occur
most often in my index. One way I can think of is to keep track of all
tokens during analysis and display them accordingly. Is there any other way?
e.g. if I want to display tokens in order of their
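
A minimal sketch of the approach mentioned above (keeping track of tokens as they are seen and listing the most frequent ones), written in plain Java without any Lucene calls. TokenCounter, add and top are hypothetical names; how the tokens get fed in from an analyzer is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts tokens as they are seen.
final class TokenCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Call once for every token produced during analysis.
    void add(String token) {
        counts.merge(token, 1, Integer::sum);
    }

    // Returns up to n tokens, most frequent first; ties are ordered alphabetically.
    List<String> top(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

This keeps every distinct token in memory, so it is only a starting point for a small vocabulary of tags.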
I am not an expert but I think you can solve problem 1 by overriding the
coord function in the similarity class:
1. coord(q,d) is a score factor based on how many of the query terms
are found in the specified document. Typically, a document that contains
more of the query's terms will rece
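
To make the coord idea concrete, here is a standalone sketch in plain Java. defaultCoord is the fraction of query terms matched, which is how I understand the default behaviour; thresholdCoord and its minMatch parameter are hypothetical and only illustrate what an overridden coord(int overlap, int maxOverlap) in a Similarity subclass might compute. Whether a zero score actually removes a document from the results depends on how hits are collected, so treat that as an assumption.

```java
// Standalone illustration; in Lucene this logic would sit in a Similarity subclass.
final class CoordSketch {
    // Fraction of the query terms that the document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical variant: score factor of 0 when fewer than minMatch terms matched.
    static float thresholdCoord(int overlap, int maxOverlap, int minMatch) {
        return overlap >= minMatch ? overlap / (float) maxOverlap : 0.0f;
    }
}
```

For the "foo bar gro" query, a document matching only "foo" would get thresholdCoord(1, 3, 2) = 0, while one matching two of the three terms keeps a factor of 2/3.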
My documents all have a field with a variable number of terms
(but rather few):
Doc1.field = "foo bar gro"
Doc2.field = "foo bar gro mot slu"
Now I would like to search using the terms "foo bar gro"
Problem 1:
I would like to express that at least two of the three terms
must match. Do I have to const
You can check out the file format of Lucene's term dictionary here:
http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary
That might give you some insight.
Lucene does not keep ids for terms as far as I can tell, though...just for
documents...and then the id is really just an offset
If you haven't seen it, a good source for this is here:
http://wiki.apache.org/lucene-java/Support
Though that's not as nice as having people contact you :)
Kiffin Gish wrote:
Hi there!
Currently I am looking for an expert developer/consultant who can assist
my development team with an impleme
I want to have IDs for the terms (words) not the documents!
Also, I need the same ID for a word if it appears in more than one document.
Example:
Doc1: The sea is blue
Doc2: Sky is blue
For these two docs the dictionary would be [the]->1 [sea]->2 [is]->3
[blue]->4 [sky]->5
So I want to represen
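
A small sketch of the dictionary described in the example above, kept outside Lucene in plain Java. It assumes terms are lowercased and split on whitespace, and that ids are handed out in order of first appearance starting at 1; TermDictionary, idFor and encode are hypothetical names, not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical application-side dictionary: term -> stable integer id.
final class TermDictionary {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the existing id for a term, or assigns the next free one.
    int idFor(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    // Encodes whitespace-separated text as a sequence of term ids.
    int[] encode(String text) {
        String[] tokens = text.trim().split("\\s+");
        int[] out = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = idFor(tokens[i]);
        }
        return out;
    }
}
```

Encoding "The sea is blue" and then "Sky is blue" with one TermDictionary instance gives [1, 2, 3, 4] and [5, 3, 4], matching the dictionary in the example. Because the ids live in your own map, they do not change when documents are deleted from the index.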
Hi there!
Currently I am looking for an expert developer/consultant who can assist
my development team with an implementation of Lucene for an exciting and
innovative project in Amsterdam, Holland.
This is for a scalable, robust and high-performing web-based system
running in a Java EE environme
Hi,
I understand optimizing takes longer when the index is bigger, so it
might take a while when the index is huge.
I think I remember seeing something on the Lucene list about optimizing not
to the optimum state but only to a less-than-optimum one, using less
time. Is that correct?
Does some
The id does change. You need to index your own "id" field with the document.
Ilias Flaounas wrote:
Dear experts,
I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?
I want a unique associati
Dear experts,
I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?
I want a unique association, term (in my case a word) -> Id. I know
that if I delete a document, the dictionary changes. Does t
Just to clarify here: yes, you really should have a single JVM with a
single instance of IndexWriter, but use multiple threads calling
IndexWriter.addDocument.
Under the hood, IndexWriter can make use of a lot of concurrency, so
you should see a substantial gain in indexing throughput if you use
m
Bruno Dery wrote:
Thanks for the help, you're right your example works. However looking in
Luke I also see only ones (1 1 1) as the document boost.
Then perhaps this value should be removed from Luke's display ...
because it will always read 1, and it's a correct value (see below).
I