RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread Scott Smith
Thanks to everyone who commented. Clearly, I have a lot to think about, but thanks for the help. Scott -Original Message- From: Rob Staveley (Tom) [mailto:[EMAIL PROTECTED] Sent: Friday, July 07, 2006 2:53 PM To: java-user@lucene.apache.org Subject: RE: Managing a large archival (and co

Re: modify existing non-indexed field

2006-07-07 Thread Doron Cohen
> dan2000 <[EMAIL PROTECTED]> wrote on 07/07/2006 15:03:35: > but if you remove it and add it again, you'll need to re-index it again, > won't you? When you do re-index, you'll have to close the reader, which > would pause the search. Any better way of doing it? IMHO yes and no - There's no need

Re: modify existing non-indexed field

2006-07-07 Thread dan2000
But if you remove it and add it again, you'll need to re-index it again, won't you? When you do re-index, you'll have to close the reader, which would pause the search. Is there a better way of doing it?

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
We only display 10 hits at a time, so we don't need to iterate through all the hits. It feels like there should be a way to pull a document out of one index and stick it into another and bring all the unstored fields along with it. On Friday 07 July 2006 12:52, Erick Erickson wrote: > Did you

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread Rob Staveley (Tom)
Aha, OK that makes sense. Likewise James Pine's explanation. Thanks both of you. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 07 July 2006 20:40 To: java-user@lucene.apache.org Subject: RE: Managing a large archival (and constantly changing) database : How ca

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Erick Erickson
Did you use a Hits object to assemble your results? And is that what you're measuring when you say it's slow? In other words, were you measuring the time it took to execute the statement Hits hits = searcher.search(query, new Sort("fullname")); or the time it took to iterate over the Hits object

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread Chris Hostetter
: How can that be so? When the segments file is re-written it will surely : clobber the copy rather than creating a new INODE, because it has the same : name... wouldn't it? if you take a look at SegmentInfos.java you'll see that an existing segments file is never modified. a new segments file i
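To illustrate Chris's point, here is a minimal sketch in plain Java (not Lucene code) of the write-new-then-rename pattern. A file written under a temporary name and then renamed over `segments` gets a fresh inode, so a hard-linked snapshot of the old `segments` keeps the old contents. The `.new` suffix, class name, and file contents are illustrative assumptions; the sketch uses `java.nio.file` (Java 7+) and assumes a POSIX filesystem that supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameVsHardLink {

    /** Writes content under a temporary sibling name, then renames it over target. */
    static void replaceByRename(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".new");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // The rename points the name `target` at the new file's inode; any other
        // hard link to the old inode (e.g. in a snapshot directory) is untouched.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    /** Returns {current index copy, snapshot copy} after one simulated commit. */
    public static String[] demo() {
        try {
            Path index = Files.createTempDirectory("index");
            Path snapshot = Files.createTempDirectory("snapshot");
            Path segments = index.resolve("segments");
            Files.write(segments, "generation 1".getBytes(StandardCharsets.UTF_8));

            // Like `cp -l`: a second directory entry for the same inode.
            Path linked = snapshot.resolve("segments");
            Files.createLink(linked, segments);

            replaceByRename(segments, "generation 2");
            return new String[] {read(segments), read(linked)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("index:    " + r[0]);
        System.out.println("snapshot: " + r[1]);
    }
}
```

On Linux this should report `generation 2` for the index and `generation 1` for the snapshot. If the writer instead reopened `segments` and overwrote it in place, both names would share one inode and the snapshot would see the new bytes, which is the case Rob was worried about.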

Re: Modifying the standard analyzer

2006-07-07 Thread Mark Miller
Thank you so much. I apologize for my ignorance. Mark On 7/7/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : > But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw

Re: Modifying the standard analyzer

2006-07-07 Thread Chris Hostetter
: > But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw at least two other posts about this : problem on the web with no answer given... : This guy also found the same
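A minimal, self-contained sketch of the compile error under discussion, using hypothetical stand-in classes rather than Lucene's real ones. Java does not let an overriding method declare a checked exception that the overridden method's `throws` clause doesn't cover, so a `next()` that throws `ParseException` only compiles against a `throws IOException` contract if `ParseException` is a subclass of `IOException`. It assumes that a freshly regenerated `ParseException` extends `Exception` directly (javacc's usual template), which would explain why the build breaks only after re-running javacc.

```java
import java.io.IOException;

public class ParseExceptionDemo {

    /** Hypothetical stand-in for a token stream whose next() may only throw IOException. */
    public static abstract class BaseTokenStream {
        public abstract String next() throws IOException;
    }

    /**
     * Declared as a subclass of IOException. If it extended Exception instead,
     * GeneratedTokenizer.next() below would fail to compile, because an overriding
     * method may not declare checked exceptions its parent method does not allow.
     */
    public static class ParseException extends IOException {
        public ParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the javacc-generated tokenizer. */
    public static class GeneratedTokenizer extends BaseTokenStream {
        private final String[] tokens;
        private int pos = 0;

        public GeneratedTokenizer(String... tokens) {
            this.tokens = tokens;
        }

        @Override
        public String next() throws ParseException, IOException {
            if (pos >= tokens.length) {
                return null; // end of stream
            }
            String token = tokens[pos++];
            if (token.isEmpty()) {
                throw new ParseException("empty token at position " + (pos - 1));
            }
            return token;
        }
    }

    /** Counts tokens through the base-class contract; returns -1 on any IOException. */
    public static int countTokens(BaseTokenStream stream) {
        int n = 0;
        try {
            while (stream.next() != null) {
                n++;
            }
        } catch (IOException e) { // a ParseException is caught here too
            return -1;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new GeneratedTokenizer("a", "b", "c")));
        System.out.println(countTokens(new GeneratedTokenizer("a", "", "c")));
    }
}
```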

Re: Modifying the standard analyzer

2006-07-07 Thread Mark Miller
Daniel Naber wrote: On Friday 07 July 2006 16:20, Mark Miller wrote: the javacc generated StandardTokenizer next() method is declared to throw a ParseException final public org.apache.lucene.analysis.Token next() throws ParseException, IOException { unfortunately, org.apache.lucene.anal

Re: Modifying the standard analyzer

2006-07-07 Thread Daniel Naber
On Friday 07 July 2006 16:20, Mark Miller wrote: > the javacc generated StandardTokenizer next() method is declared to > throw a ParseException > > final public org.apache.lucene.analysis.Token next() throws > ParseException, IOException { > > unfortunately, org.apache.lucene.analysis.Token nex

RE: Nutch- Better than Lucene?

2006-07-07 Thread Wang, Jeff
Heh, you said it better than I. I was just about to reply with the witty "Nutch is Lucene, isn't it?" Jeff -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Friday, July 07, 2006 10:28 AM To: java-user@lucene.apache.org Subject: Re: Nutch- Better than Lucene? :

Re: Searcher performance

2006-07-07 Thread Chris Hostetter
There was a thread discussing the performance differences just recently: http://www.nabble.com/forum/Search.jtp?forum=44&local=y&query=MultiReader+MultiSearcher : Date: Fri, 7 Jul 2006 16:34:08 +0100 : From: Mike Streeton <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java

Re: Nutch- Better than Lucene?

2006-07-07 Thread Chris Hostetter
: Subject: Nutch- Better than Lucene? : > http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine Asking if Nutch is better than Lucene is like asking if a Truck is better than a Combustion Engine -- you can't compare them. A truck is a vehicle that does stuff, and it gets its power from a

Re: Berkeley DB JEDirectory Performance

2006-07-07 Thread Otis Gospodnetic
Thanks Jo. You may want to look for Andi Vajda's email with performance numbers, too. I think he did send them out when he first contributed DbDirectory, and I don't recall the numbers being this bad. Otis - Original Message From: Johannes Christen <[EMAIL PROTECTED]> To: java-user@l

Re: modify existing non-indexed field

2006-07-07 Thread Otis Gospodnetic
Yes, you can do something like that, but of course you have to delete the old Document, and add the new, modified one to the index, too. I do something like that on one of the Simpy.com indices and it works nicely. Otis - Original Message From: dan2000 <[EMAIL PROTECTED]> To: java-user

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
> When you say you keep your documents ordered alphabetically, it's confusing > to me. Are you saying that you pre-sort all your documents then insert them > one after another so that automatically-generated internal Lucene ID maps > exactly to the alphabetical ordering? That is, for any document I

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread James Pine
--- "Rob Staveley (Tom)" <[EMAIL PROTECTED]> wrote: > Doug says: > > > 1. On the index master, periodically checkpoint > the index. Every minute or > so the IndexWriter is closed and a 'cp -lr index > index.DATE' command is > executed from Java, where DATE is the current date > and time. This > e

Searcher performance

2006-07-07 Thread Mike Streeton
What performs best across multiple indexes: each index with an IndexReader with an IndexSearcher on top, and the searchers linked with a ParallelMultiSearcher; or each index with an IndexReader, the readers linked with a MultiReader, and an IndexSearcher on top? Many thanks Mike www.ardentia

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Erick Erickson
When you say you keep your documents ordered alphabetically, it's confusing to me. Are you saying that you pre-sort all your documents then insert them one after another so that automatically-generated internal Lucene ID maps exactly to the alphabetical ordering? That is, for any document IDs D1 a

RE: Lucene search formula

2006-07-07 Thread zheng
Hi, can somebody explain lengthNorm, queryNorm and coord in Lucene? Is lengthNorm (term freq)/(total number of terms), (term freq)/(max term freq), or something else? Is queryNorm (term squared weight)/(sumOfSquaredWeights)? Why do we still need queryNorm when it will not affect the score for
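For reference, a sketch of these factors as Lucene's `DefaultSimilarity` computes them, to the best of my knowledge of the 1.9/2.0 sources (check the `Similarity` and `DefaultSimilarity` javadocs for your version). lengthNorm is neither of the ratios guessed above but 1/sqrt(number of terms in the field); queryNorm is 1/sqrt(sumOfSquaredWeights); coord is the fraction of the query's terms the document matches. Because queryNorm is the same factor for every document a given query matches, it does not change the ranking within that query; its purpose is to make scores from different queries more comparable. The class name is hypothetical.

```java
public class DefaultSimilaritySketch {

    /** lengthNorm: 1 / sqrt(number of terms in the field). */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** queryNorm: 1 / sqrt(sum of the squared weights of the query's terms). */
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** coord: number of query terms matched divided by total query terms. */
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** tf: square root of the term's frequency in the document. */
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** idf: 1 + ln(numDocs / (docFreq + 1)). */
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // A 100-term field, and a document matching 2 of 4 query terms.
        System.out.println("lengthNorm(100) = " + lengthNorm(100));
        System.out.println("coord(2, 4)     = " + coord(2, 4));
        System.out.println("queryNorm(4)    = " + queryNorm(4f));
    }
}
```

Two caveats: at index time the lengthNorm value is multiplied by any field and document boosts and stored in a single byte, so the value read back at search time is a coarse approximation; and these are only defaults, which a custom Similarity can override.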

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
All, I sent this the other day, but didn't get any responses. I'm hoping that it was just missed, so I'm trying again. There has to be a better way to insert a document into an index than reindexing everything. --Jason On Wednesday 05 July 2006 5:06 pm, Jason Calabrese wrote: > All, > >

Modifying the standard analyzer

2006-07-07 Thread Mark Miller
I have added support for sent/para prox search by modifying the notspan query. In doing so I have changed the standard analyzer javacc .jj file. Here is my problem: the javacc generated StandardTokenizer next() method is declared to throw a ParseException final public org.apache.lucene.analysis

Re: modify existing non-indexed field

2006-07-07 Thread Erick Erickson
I don't think you've done anything to the index. This code is really equivalent to something like Field field = hits.doc(i).getField("address"); field.set("11 Diana Street"); You've changed the value of the field instance, but that is essentially a local variable (even though not explicit in you

modify existing non-indexed field

2006-07-07 Thread dan2000
Is it possible to modify a field that is stored but not indexed? For example, if I have a field like this: new Field("address", address, Field.Store.YES, Field.Index.NO) and I want to modify it like this: hits.doc(i).getField("address").set("11 Diana Street"); Is it possible?

Re: addIndexes getting slower and slower plus eating up Mem

2006-07-07 Thread Dominik Bruhn
Hi, On Friday 07 July 2006 12:23 mark harwood wrote: > Out of interest, why are you using a RAMDirectory here? An IndexWriter uses > one internally of size IndexWriter.setMaxBufferedDocs so you get the > benefits of buffering automatically when writing to a File-based directory. Really? I read the

Re: addIndexes getting slower and slower plus eating up Mem

2006-07-07 Thread mark harwood
The answer is because addIndexes() currently always does an optimize post-merge. If I recall correctly optimize() will create a complete copy of the existing index during the optimize process then delete the old one so this shouldn't be done too often. Out of interest, why are you using a RAMDi
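A back-of-the-envelope sketch of why repeated `addIndexes()` calls get slower and slower. If, as Mark describes, every call ends with an optimize that rewrites the entire index, then adding B documents per call for k calls rewrites B * (1 + 2 + ... + k) documents in total, which grows quadratically in k. The batch sizes are hypothetical, and the model ignores merge details; it only counts documents rewritten.

```java
public class AddIndexesCost {

    /**
     * Documents rewritten when each of `batches` addIndexes() calls adds `batchSize`
     * documents and then rewrites the whole index. After call k the index holds
     * k * batchSize documents, so the total is batchSize * (1 + 2 + ... + batches).
     */
    public static long rewrittenWithOptimizePerCall(long batchSize, long batches) {
        return batchSize * batches * (batches + 1) / 2;
    }

    /** Documents rewritten when the full collection is merged and optimized once. */
    public static long rewrittenWithSingleOptimize(long batchSize, long batches) {
        return batchSize * batches;
    }

    public static void main(String[] args) {
        long batchSize = 1_000; // hypothetical: 1,000 batches of 1,000 documents
        long batches = 1_000;
        System.out.println(rewrittenWithOptimizePerCall(batchSize, batches)); // 500500000
        System.out.println(rewrittenWithSingleOptimize(batchSize, batches));  // 1000000
    }
}
```

Under this model, adding documents directly through an IndexWriter on the file-based directory (relying on its internal buffering, as Mark suggests) or calling `addIndexes()` with fewer, larger batches cuts the total rewrite work substantially.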

addIndexes getting slower and slower plus eating up Mem

2006-07-07 Thread Dominik Bruhn
Hi, I use the following code to index about 1 million documents into an empty index: private static void do_searchindex(Connection target) throws SQLException, IOException { int i=1164; PostIndexer.createIndexDir(); //Creates Index-Director

Re: Lucene search formula

2006-07-07 Thread Aleksander M. Stensby
I have written a paper about Topic Detection and Tracking, where I also explain the TF-IDF scheme. If you like, I can send you the paper. Aleksander On Fri, 07 Jul 2006 04:46:52 +0200, Rajiv Roopan <[EMAIL PROTECTED]> wrote: Hello, I was recently looking thru the Lucene in Action book

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread Rob Staveley (Tom)
I should probably direct this to Doug Cutting, but following that thread I come to Doug's post at http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html . Doug says: > 1. On the index master, periodically checkpoint the index. Every minute or so the IndexWriter is closed and a
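Doug's checkpoint step runs `cp -lr index index.DATE` from Java. As a sketch, the same hard-link copy can be made without spawning a process by using `java.nio.file` (Java 7+, which postdates this thread). The class name and timestamp format are assumptions, and the caller is assumed to have closed the IndexWriter first, as in Doug's recipe. Because every file is a hard link, the snapshot costs almost no disk space, and, per Chris's explanation that the segments file is never modified in place, later commits should not alter it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.text.SimpleDateFormat;
import java.util.Date;

public class HardLinkCheckpoint {

    /** Mirrors `source` into `target`, hard-linking every regular file, like `cp -lr`. */
    static void linkCopy(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.createLink(target.resolve(source.relativize(file)), file);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /**
     * Creates index.DATE next to `index`. Assumes the IndexWriter has already been
     * closed. Two calls within the same second collide (FileAlreadyExistsException).
     */
    public static Path checkpoint(Path index) throws IOException {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        Path snapshot = index.resolveSibling(index.getFileName() + "." + stamp);
        linkCopy(index, snapshot);
        return snapshot;
    }

    /** Builds a throwaway one-file "index", checkpoints it, and checks the link. */
    public static boolean demo() {
        try {
            Path index = Files.createTempDirectory("ckpt").resolve("index");
            Files.createDirectories(index);
            Files.write(index.resolve("segments"), new byte[] {1, 2, 3});
            Path snapshot = checkpoint(index);
            // Same inode under two names: the snapshot used no extra data blocks.
            return Files.isSameFile(index.resolve("segments"), snapshot.resolve("segments"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("snapshot shares inodes: " + demo());
    }
}
```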

Nutch- Better than Lucene?

2006-07-07 Thread Sarvadnya Mutalik
> http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine