date:20071031

RE: Hits.score mystery

2007-10-31 Thread Tom Conlon

Hi Grant, > but you should have a look at Searcher.explain() I was half-expecting this answer. :( The query is very basic and the scoring seems completely arbitrary. Documents with the same number of occurrences and (seemingly) distribution are being given widely different scores. > Chris Host

Re: Hits.score mystery

2007-10-31 Thread Chris Hostetter

: I'm returning the document plus hits.score(i) * 100 but when the NOTE: the score returned by Hits is not a "percentage" ... it is an arbitrary number less than 1. it might be the "raw score" of the document or it might be the result of dividing the "raw score" by the "raw score" of the high
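A minimal sketch of the normalization described in this reply, written in plain Java rather than against the Lucene API. HitsScoreSketch and normalize are hypothetical names, and the rule that scores are divided by the top raw score only when that top score exceeds 1.0 is an assumption about how Hits behaves, not something stated in the message.

```java
// Sketch only: NOT Lucene code. HitsScoreSketch and normalize are
// hypothetical names used for illustration.
class HitsScoreSketch {
    // Assumed rule: if the highest raw score is above 1.0, divide every raw
    // score by it; otherwise return the raw scores unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

Under this assumption the top hit of a high-scoring query always comes back as 1.0, so multiplying by 100 shows "100%" for it regardless of how well it matched, which is why the value should not be read as a percentage.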

problem understanding the hits.score

2007-10-31 Thread Jamal jamalator

Hi I have indexed this html document =z1 zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo =z2= zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo zo

Re: Hits.score mystery

2007-10-31 Thread Grant Ingersoll

Not sure what UI you are referring to, but you should have a look at Searcher.explain() for giving you information about why a particular document scored the way it does -Grant On Oct 31, 2007, at 2:14 PM, Tom Conlon wrote: Hi All, Query: systems AND 2000 Results:558 total matchi

Hits.score mystery

2007-10-31 Thread Tom Conlon

Hi All, Query: systems AND 2000 Results:558 total matching documents I'm returning the document plus hits.score(i) * 100 but when the relevance is examined in the User interface it doesn't seem to be working. E.g. 'rough' feedback in terms of occurrences 61.txt 18.356403 100%

Re: Best way to count tokens

2007-10-31 Thread Karl Wettin

On 31 Oct 2007 at 15:18, Cool Coder wrote: Hi Group, I need to display list of tokens (tags) in my side those have got maximum occurrences in my index. One way I can think of is to keep track of all tokens during analysis and accordingly display them. Is there any other way? e.g

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Paul Elschot

On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote: > My documents all have a field with variable number of terms > (but rather few): > Doc1.field = "foo bar gro" > Doc2.field = "foo bar gro mot slu" > Now I would like to search using the terms "foo bar gro" > > Problem 1: > I like to express

RE: Document boost, is it working?

2007-10-31 Thread Bruno Dery

Thanks! I also noticed there is a mention of this in the documentation of Document.getBoost(): "Note: This value is not stored directly with the document in the index. Documents returned from IndexReader.document(int) and Hits.doc(int) may thus not have the same value present as when this document

Re: Problems while indexing

2007-10-31 Thread Mark Miller

You have no stack trace? Come on, man...you must be able to get a stack trace :) There is no way to tell what is causing that very generic error. I would say though: certainly not the size of your index. I suppose, out of curiosity, how big is it? I'll bet my two broken wrists that's not the proble

Re: Problems while indexing

2007-10-31 Thread Chris Lu

Hi, Jan, You really need to be more specific about your configuration and error log. Lucene surely has been used on many large websites. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.c

Problems while indexing

2007-10-31 Thread Jan F.

Hello to you all! I've programmed a portlet search solution by using lucene. Now that our new Website is shortly before release the file volume increased fast. My lucene based index program works fine until the number of files increased so much. Now it works for about 10 minutes and then gives m

Best way to count tokens

2007-10-31 Thread Cool Coder

Hi Group, I need to display list of tokens (tags) in my side those have got maximum occurrences in my index. One way I can think of is to keep track of all tokens during analysis and accordingly display them. Is there any other way? e.g. if I want to display tokens in order of their
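The approach the poster mentions, keeping track of tokens while they are analyzed, could look roughly like the sketch below. It is plain Java with no Lucene dependency; TokenCounter, add and top are hypothetical names, and the tie-break by alphabetical order is an arbitrary choice made here so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: counts tokens as they are seen and reports the most frequent.
// TokenCounter is a hypothetical helper, not part of Lucene.
class TokenCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Call once per token produced during analysis.
    void add(String token) {
        counts.merge(token, 1, Integer::sum);
    }

    // Returns up to n tokens, most frequent first; ties sorted alphabetically.
    List<String> top(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

This keeps every distinct token in memory, so it suits a tag cloud over a modest vocabulary more than a very large index.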

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Donna L Gresh

I am not an expert but I think you can solve problem 1 by overriding the coord function in the similarity class: 1. coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will rece
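The coord suggestion can be sketched without Lucene on the classpath. The sketch assumes the default factor is the fraction of query terms matched (overlap / maxOverlap). CoordSketch and atLeastTwoCoord are hypothetical names, and zeroing the factor is only one way an override might penalize documents that match fewer than two terms. In a real index the method would live in a Similarity subclass.

```java
// Sketch only: plain Java stand-ins for a coord(overlap, maxOverlap) factor.
// CoordSketch and both method names are hypothetical.
class CoordSketch {
    // Assumed default: the fraction of query terms found in the document.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical override: documents matching fewer than two query terms
    // get a factor of zero; others keep the default fraction.
    static float atLeastTwoCoord(int overlap, int maxOverlap) {
        return overlap >= 2 ? overlap / (float) maxOverlap : 0f;
    }
}
```

Whether a zero factor actually removes a document from the result list depends on how results are collected, which this sketch does not model.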

2/3 of terms matched + coverage filter

2007-10-31 Thread Tobias Hill

My documents all have a field with variable number of terms (but rather few): Doc1.field = "foo bar gro" Doc2.field = "foo bar gro mot slu" Now I would like to search using the terms "foo bar gro" Problem 1: I like to express that at least any two of the three terms must match. Do I have to const

Re: Get term id from dictionary

2007-10-31 Thread Mark Miller

You can check out the file format of Lucene's term dictionary here: http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary That might give you some insight. Lucene does not keep id's for terms that I can tell though...just for documents...and then the id is really just an offset

Re: Looking for a keen Lucene developer/consultant ...

2007-10-31 Thread Mark Miller

If you haven't seen it, a good source for this is here: http://wiki.apache.org/lucene-java/Support Though that's not as nice as having people contact you :) Kiffin Gish wrote: Hi there! Currently I am looking for an expert developer/consultant who can assist my development team with an impleme

Re: Get term id from dictionary

2007-10-31 Thread Ilias Flaounas

I want to have IDs for the terms (words) not the documents! Also, I need the same ID for a word if it appears in more than one document. Example: Doc1: The sea is blue Doc2: Sky is blue For these two docs the dictionary would be [the]->1 [sea]->2 [is]->3 [blue]->4 [sky]->5 So I want to represen
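The term-to-ID mapping in this example can be kept outside the index. The sketch below is plain Java, not a Lucene feature. TermIdDictionary, idOf and encode are hypothetical names, and lowercasing plus whitespace splitting stand in for whatever analysis is really used.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: assigns each distinct word a stable ID in first-seen order.
// TermIdDictionary is a hypothetical helper, not part of Lucene.
class TermIdDictionary {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the existing ID for a term, or assigns the next free one.
    int idOf(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1; // IDs start at 1, as in the example
            ids.put(key, id);
        }
        return id;
    }

    // Converts a text into the sequence of IDs of its words.
    int[] encode(String text) {
        String[] words = text.trim().split("\\s+");
        int[] out = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = idOf(words[i]);
        }
        return out;
    }
}
```

Encoding "The sea is blue" and then "Sky is blue" with one dictionary instance gives 1 2 3 4 and 5 3 4, matching the dictionary in the message. Because IDs are never reused or removed, they stay the same when documents are deleted.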

Looking for a keen Lucene developer/consultant ...

2007-10-31 Thread Kiffin Gish

Hi there! Currently I am looking for an expert developer/consultant who can assist my development team with an implementation of Lucene for an exciting and innovative project in Amsterdam, Holland. This is for a scalable, robust and high-performing web-based system running in a Java EE environme

optimizing only during certain time

2007-10-31 Thread jm

Hi, I understand optimizing could take longer when index is bigger, so it might take a while when index is huge. I think I remember seeing something in the lucene list about optimizing but not to the optimum case, only to a less than optimum state, but using less time, is that correct? Does some

Re: Get term id from dictionary

2007-10-31 Thread Mark Miller

The id does change. You need to index your own "id" field with the document. Ilias Flaounas wrote: Dear experts, I need to store and index a string of text into Lucene, and later I want to get the Id of each term inside this string. Is it possible? How can I do that? I want a unique associati

Get term id from dictionary

2007-10-31 Thread Ilias Flaounas

Dear experts, I need to store and index a string of text into Lucene, and later I want to get the Id of each term inside this string. Is it possible? How can I do that? I want a unique association, term (in my case a word) -> Id. I know that if I delete a document, the dictionary changes. Does t

RE: Threading Indexing Processes : Can we write concurrently to Index?

2007-10-31 Thread Michael McCandless

Just to clarify here: yes, you really should have a single JVM with a single instance of IndexWriter, but use multiple threads calling IndexWriter.addDocument. Under the hood, IndexWriter can make use of a lot of concurrency, so you should see a substantial gain in indexing throughput if you use m
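The pattern described here, one shared writer fed by several threads, might be sketched as follows. This is plain Java with a stand-in for the writer: DocWriter and SharedWriterSketch are hypothetical, and the stand-in only counts documents. With Lucene, the shared object would be the single IndexWriter instance.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: many worker threads, one shared writer instance.
class SharedWriterSketch {
    // Hypothetical stand-in for a thread-safe writer.
    interface DocWriter {
        void addDocument(String doc);
    }

    // Adds every document through one shared writer using a fixed thread
    // pool, then returns how many documents the writer received.
    static int indexAll(List<String> docs, int threads) {
        AtomicInteger added = new AtomicInteger();
        DocWriter writer = doc -> added.incrementAndGet(); // single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.execute(() -> writer.addDocument(doc));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return added.get();
    }
}
```

The point of the design is that threads share the writer rather than each opening its own, so only one writer ever holds the index open for writing.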

Re: Document boost, is it working?

2007-10-31 Thread Andrzej Bialecki

Bruno Dery wrote: Thanks for the help, you're right your example works. However looking in Luke I also see only ones (1 1 1) as the document boost. Then perhaps this value should be removed from Luke's display ... because it will always read 1, and it's a correct value (see below). I


23 matches
