date:20061212

Advice on 3NF Data Structures and Lucene Please

2006-12-12 Thread Andrew Hughes

Hey All, I am very interested in indexing a 3NF data structure. Is there any advice that someone can provide on this? From what I have seen, Lucene is typically a flat "First Normal Form" data structure. The only way I can see to combine the relational links between multiple indexe

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Doron Cohen

"Karl Koch" <[EMAIL PROTECTED]> wrote:
> For the documents Lucene employs
> its norm_d_t which is explained as:
>
> norm_d_t : square root of number of tokens in d in the same field as t

Actually (by default) it is: 1 / sqrt(#tokens in d with same field as t)

> basically just the square root

Re: How to delete partial index

2006-12-12 Thread spinergywmy

Hi, I managed to delete the document based on a term, but that is just one part. I wonder whether Lucene supports pulling out the info that I have indexed and placing it into another index file. Is the only way to use IndexWriter to perform the indexing again with all the necessary fie

Re: How to delete partial index

2006-12-12 Thread Erick Erickson

You have to search against something known. You simply (as has been mentioned many times) cannot rely on the document IDs. So, I'd store the full path (untokenized) of the file. When you move a file, search for the path in the appropriate field in your index that the file was originally stored in

Re: How to delete partial index

2006-12-12 Thread spinergywmy

Hi, when I delete a document based on its Id, is the Id the unique key, and will deleting by Id delete all the related info as well? If so, how can I find out the document Id? Thanks. Regards, Wooi Meng

Re: How to delete partial index

2006-12-12 Thread spinergywmy

Hi, I'm just wondering, is there any unique key that I can use to delete a particular document? How can I check the position of a particular document inside the index file? Is there any example that I can refer to on how to delete documents by a term? For the second scenario, the reason why I'm doing

RE: de-boosting fields

2006-12-12 Thread Scott Smith

I've implemented the zero boost solution and it seems to be doing what I want. Thanks to everyone who had suggestions.

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 11:45 AM
To: java-user@lucene.apache.org
Subject: Re: de-boosting fiel

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe

Karl Koch wrote:
> Is there any other paper that actually shows the benefit of doing
> this particular normalisation with coord_q_d? I am not suggesting
> here that it is not useful, I am just looking for evidence how the
> idea developed.

I think it's a mischaracterization to call coordination a

Re: java requirements for lucene

2006-12-12 Thread Chris Hostetter

It appears that you may have multiple copies of the Lucene code base in your class path.

: $ java org.apache.lucene.demo.IndexFiles ../data/medline/docs/
: Indexing to directory 'index'...
: adding ../data/medline/docs/1.txt
: Exception in thread "main" java.lang.IncompatibleClassChangeError: fie

Re: How to delete partial index

2006-12-12 Thread Doron Cohen

spinergywmy <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have asked this question before, but maybe the question wasn't clear.
>
> How can I delete a particular index that I want to and keep the rest? For
> instance, I have indexed document Id, date, user Id and contents, my
> question is does t

Re: Indexing large files

2006-12-12 Thread Otis Gospodnetic

Hi,

Yes, you can't get to the "stored" field content in your Hits because you are using a (File)Reader.

Otis

- Original Message
From: abdul aleem <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, December 12, 2006 8:22:54 AM
Subject: Indexing large files

Hi There, I

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Karl Koch

Hello Steven, I looked up the paper and read the relevant part. The text quote you provided is from the introduction. I believe that quote refers to the basic purpose of an information retrieval system in general. At least to the purpose of a vector space model IR system. If this is the theore

Re: search by field, not field value

2006-12-12 Thread Erick Erickson

Try this. It returns

Found a term Austria in doc 7
Found a term Botswana in doc 6
Found a term New in doc 3
Found a term Tennessee, in doc 1
Found a term US in doc 0
Found a term US in doc 1
Found a term US in doc 3
Found a term US in doc 4
Found a term Utah, in doc 4
Found a term Virginia, in do

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe

Karl Koch wrote:
> The coord(q,d) normalisation is "a score factor based on how many of
> the query terms are found in the specified document." and described
> here:
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
>
> Does this have a theoretical

RE: search by field, not field value

2006-12-12 Thread Koji Sekiguchi

Erick,

Sorry for replying to a somewhat old topic.

> TermDocs.seek(new Term("specific_field", ""));
>
> Note that the "" as the value of the term gets all the terms. Then use
> TermDocs.next until it returns false. At each point, TermDocs.doc() will
> give you the Lucene ID of a document containing t

Re: lucene search

2006-12-12 Thread Erick Erickson

Well, searching documents for text is what Lucene is *made* for. So, yes, this would be a fine thing to use Lucene for. You'll have to deal with coordinating between when a document is added to the directory and when it's added to the Lucene index. Also, give some thought to what form your docum

lucene search

2006-12-12 Thread Bloem, E.J.W. van $Erik, Student CS$

Hi, I am building a portal where users are able to maintain a personal doc, placed in a database or dir on the server. I want the users to be able to search all other users' docs for keywords. Like give me a top ten of documents containing the word bike. Is Lucene useful for this? Or do you

Re: Lucene scoring: Term frequency normalisation

2006-12-12 Thread Marvin Humphrey

On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:

> However, what exactly is the advantage of using square root instead of log?

Speaking anecdotally, I wouldn't say there's an advantage. There's a predictable effect: very long documents are rewarded, since the damping factor is not as strong.
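To make the damping effect concrete, here is a minimal sketch that compares a square-root term frequency factor with a logarithmic one. The `1 + ln(freq)` form is only one common logarithmic variant and is an assumption of this example, as are the class and method names; neither is taken from the thread or from Lucene's source.

```java
// Illustrative only: TfDampingSketch is a hypothetical helper, not a Lucene class.
class TfDampingSketch {

    // Square-root damping of the raw term frequency.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // One common logarithmic damping (assumed here): 1 + ln(freq), or 0 when absent.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 4, 16, 100, 1000};
        for (int f : freqs) {
            // For large frequencies the square root keeps growing much faster than the log.
            System.out.printf("freq=%d sqrt=%.3f log=%.3f%n", f, sqrtTf(f), logTf(f));
        }
    }
}
```

For small frequencies the two factors are close, and the logarithmic one is even slightly larger; the gap in favour of the square root only opens up as the frequency grows, which is where long documents benefit.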

java requirements for lucene

2006-12-12 Thread Miles Efron

I have successfully compiled, installed, and run Lucene-based applications on several machines, but I am currently trying to get Lucene to run on a server that I do not administer and am having an odd problem... perhaps someone can decipher it? If I try, for instance, to run the basic lucene de

Indexing large files

2006-12-12 Thread abdul aleem

Hi There, I have been working with the Lucene API for the past day. We are in the process of building a log viewer tool; this is how the log file looks:

[2006-12-11 01:52:40.179] [lon0571xus] [DEBUG] [TIE heartbeat monitor (monitor.heartbeat.fxstreamrates)] [unknown] [] [] ActiveRateServerIdList -

Complex query filtering

2006-12-12 Thread Maxim Patramanskij

I need to apply a set of custom filters to my query. One of the filters, which optionally can be applied, is a filter by date range. For the moment I'm using a BooleanQuery approach for this. I know that it is not the best from either the score accuracy or the performance point of view, and I want to change th

Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Soeren Pekrul

Hello Karl, I’m very interested in the details of Lucene’s scoring as well. Karl Koch wrote: For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with norm_q : sqrt(sum_t((tf_q*idf_t)^2)) which is also called cosine normalisation. This is a technique that
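As a worked illustration of the quoted expression norm_q : sqrt(sum_t((tf_q*idf_t)^2)), the sketch below computes that value for a small query and divides each term weight by it, which is what cosine normalisation amounts to. The class name, method names, and sample numbers are invented for this example; it does not claim to reproduce Lucene 1.2's actual code.

```java
// Illustrative only: QueryNormSketch is a hypothetical helper, not a Lucene class.
class QueryNormSketch {

    // norm_q = sqrt( sum over query terms of (tf_q * idf_t)^2 ), as quoted in the thread.
    static double normQ(double[] tfQ, double[] idf) {
        if (tfQ.length != idf.length) {
            throw new IllegalArgumentException("tfQ and idf must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < tfQ.length; i++) {
            double weight = tfQ[i] * idf[i];
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dividing every term weight by norm_q yields a query vector of unit length.
    static double[] normalise(double[] tfQ, double[] idf) {
        double norm = normQ(tfQ, idf);
        double[] weights = new double[tfQ.length];
        for (int i = 0; i < tfQ.length; i++) {
            weights[i] = tfQ[i] * idf[i] / norm;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] tfQ = {1.0, 1.0};   // each term occurs once in the query (made-up data)
        double[] idf = {3.0, 4.0};   // made-up idf values
        System.out.println("norm_q = " + normQ(tfQ, idf));
        for (double w : normalise(tfQ, idf)) {
            System.out.println("normalised weight = " + w);
        }
    }
}
```

Because every document is scored against the same query, this factor scales all scores for that query by the same amount and leaves the ranking unchanged.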

Lucene scoring: coord_q_d factor

2006-12-12 Thread Karl Koch

Hello group, The coord(q,d) normalisation is "a score factor based on how many of the query terms are found in the specified document." and described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On what b
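For readers unfamiliar with the factor being asked about, here is a minimal sketch of a coordination factor computed as the fraction of query terms that occur in a document. The ratio overlap / maxOverlap is how Lucene's default similarity has conventionally defined it, but that is an assumption here since the message does not spell out the formula; the class, methods, and sample terms are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: CoordSketch is a hypothetical helper, not a Lucene class.
class CoordSketch {

    // Number of distinct query terms that also appear in the document.
    static int overlap(Set<String> queryTerms, Set<String> docTerms) {
        Set<String> shared = new HashSet<>(queryTerms);
        shared.retainAll(docTerms);
        return shared.size();
    }

    // Assumed form of the factor: matched query terms divided by total query terms.
    static double coord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0) {
            throw new IllegalArgumentException("maxOverlap must be positive");
        }
        return (double) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("lucene", "scoring", "formula"));
        Set<String> doc = new HashSet<>(Arrays.asList("lucene", "scoring", "index", "search"));
        int matched = overlap(query, doc);
        // A document matching more of the query terms gets a larger factor.
        System.out.println("coord = " + coord(matched, query.size()));
    }
}
```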

Lucene scoring: Term frequency normalisation

2006-12-12 Thread Karl Koch

Hi, I have a question about the current Lucene scoring algorithm. In this scoring algorithm, the term frequency is calculated by using the square root of the number of occurring terms as described in http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf Havi

Re: Lucene id generation

2006-12-12 Thread Waheed Mohammed

Thanks for the instant reply. I see that what Rajesh advises is something like what MultiReader does. That would be my last approach because of the complexities it will introduce in developing the business case I have. Anything other than that would be an appreciated pointer. On Monday 11 December 2

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Karl Koch

Hello Doron (and all the others who read here):), thank you for your effort and your time. I really appreciate it. :) I understand why normalisation is done in general. Mainly, to normalise the bias of oversized documents. In the literature I have read so far, there is usually a high effort on
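As a rough illustration of the document length normalisation discussed in this thread (stated by Doron Cohen as 1 / sqrt(#tokens in d with same field as t)), the sketch below computes that factor for a few field lengths. The class and method names are made up for this example and are not part of Lucene; field boosts and any precision lost when norms are stored in the index are ignored.

```java
// Illustrative only: LengthNormSketch is a hypothetical helper, not a Lucene class.
class LengthNormSketch {

    // Default length normalisation as stated in the thread:
    // 1 / sqrt(number of tokens in the document's field).
    static double lengthNorm(int numTokens) {
        if (numTokens <= 0) {
            throw new IllegalArgumentException("numTokens must be positive");
        }
        return 1.0 / Math.sqrt(numTokens);
    }

    public static void main(String[] args) {
        int[] fieldLengths = {1, 4, 100, 10000};
        for (int n : fieldLengths) {
            // Longer fields get a smaller norm, so each match counts for less.
            System.out.printf("%d tokens -> norm %.4f%n", n, lengthNorm(n));
        }
    }
}
```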

How to delete partial index

2006-12-12 Thread spinergywmy

Hi, I have asked this question before, but maybe the question wasn't clear. How can I delete a particular index that I want to and keep the rest? For instance, I have indexed document Id, date, user Id and contents; my question is, will those particular contents be deleted if I just sp

27 matches
