Re: Extracting data from Lucene index files

2006-12-20 Thread Doron Cohen
Using term vectors means passing over the terms too many times, i.e.:
- loop on terms
- - loop on docs of a term
- - - loop on terms of a doc
Would something like this be better:
do {
    System.out.println(tenum.term()+" appears in "+tenum.docFreq()+" docs!");
    TermDocs td = reader.termDo

Re: sorting by per doc hit count

2006-12-20 Thread Chris Hostetter
: this thread that Hoss's solution was perfect and I indeed was able to add a : new dynamically changeable Term frequency relevance scoring system. The Cool ... "PINE is my IDE." -Hoss - To unsubscribe, e-mail: [EMAIL PROTE

Re: Rebuilding index on a regular basis

2006-12-20 Thread Patrick Turcotte
Hi, How about this:
1) You copy the files that make up your index into a new folder
2) You update your index in that new folder (forcing if necessary; old locks will not be valid)
3) When the update is completed, close your readers, and open them on the new index.
4) Copy the fresh index files to the pre

Re: JAVA JVM Question

2006-12-20 Thread Simon Willnauer
OOM errors are not uncommon during redeployment on an application server, e.g. a servlet container. Redeploying on Tomcat servers very often causes OOM because the perm gen space does not get GCed (that should go away with 5.5). JBoss can usually deal with these issues, but just in case you could check you

Stefan Raspl/Germany/IBM is out of the office.

2006-12-20 Thread Stefan Raspl
I will be out of the office starting 12/21/2006 and will not return until 01/02/2007. I will respond to your message when I return.

Re: JAVA JVM Question

2006-12-20 Thread Otis Gospodnetic
Are you using 2.1-dev version of Lucene? Try the latest nightly build, it has a fix for a certain OOM bug (see LUCENE-754). Otis - Original Message From: Van Nguyen <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, December 20, 2006 6:39:58 PM Subject: JAVA JVM Questio

JAVA JVM Question

2006-12-20 Thread Van Nguyen
I have an index that's approximately 875MB. I'm using JBoss Application Server 4.04 w/ Apache HTTP Server 2.2. My min/max JVM size is: 128MB/512MB. On initial startup, everything works fine. I'm able to search (although it takes a while doing the first search because it's loading the index into

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-20 Thread Otis Gospodnetic
To populate FieldCache, the number of matches doesn't matter. There is no need to be stingy there - you don't really save anything by running a query that matches only a few docs. Just run something that looks like a common query. For warming up new indices, one can also use the `dd' trick und

RE: First search is slow after updating index .. subsequent searches very fast

2006-12-20 Thread Bryan Dotzour
One question about this, Otis... When "warming up" the new searcher, should the query return a lot of results, or does it matter? Can I just do like an ID = X query and get one document back? Is that sufficient or is it better to run a query that will get lots of hits? Thanks again, Bryan -

RE: First search is slow after updating index .. subsequent searches very fast

2006-12-20 Thread Bryan Dotzour
Sounds like a possibility Otis, I know we are indeed using sort other than the default. I'll try out your suggestion. Thanks! Bryan -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 20, 2006 3:28 PM To: java-user@lucene.apache.org Subject: Re

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-20 Thread Otis Gospodnetic
All sounds good. Opening a new IndexReader can take a bit of time. If you use sorting of any kind other than default sorting by relevance, this delay on the first search is also probably caused by the lazy FieldCache population. The cure for that is to open a new IndexReader/Searcher before y

Re: sorting by per doc hit count

2006-12-20 Thread Mark Miller
Mr. Hostetter, you are a Godsend. Just wanted to report to anyone following this thread that Hoss's solution was perfect and I indeed was able to add a new dynamically changeable Term frequency relevance scoring system. The value of such a thing may not be high, but man do I love Lucene for making

First search is slow after updating index .. subsequent searches very fast

2006-12-20 Thread Bryan Dotzour
I'm investigating some performance issues with the way we're using Lucene in our web app and am interested if anyone could shed some light on what might be going on. Hopefully I can provide enough information, please let me know if there's more I can give. We're using Lucene 2.0.0 and I'm curr

Re: Help with jump from 1.4.3 to 2.0.0

2006-12-20 Thread JT Kimbell
I figured it out. Gopi asked me some questions that got me searching and it turns out my JVM wasn't 1.5.06, it was 1.4.2. I grabbed the newest version and made it the default JVM and now I no longer have the problem. Thanks a bunch for your help Gopi. JT JT Kimbell wrote: > > I've sent the

RE: giving different boost to different terms in a same document

2006-12-20 Thread Michael Rusch
It's definitely my understanding that this is not possible. Maybe somebody can give you a hardcore way of doing it by subclassing one of the classes involved in indexing, but I'm too green for that :) One solution that may or may not work depending on how specific you want to get is that you can

Re: giving different boost to different terms in a same document

2006-12-20 Thread Eun Yong Kang
Yes, I want to boost at indexing time, but I want to boost terms instead of fields. I want to give different weights to different terms even if the two terms are in the same field. For example, doc A contains field1 : term1 (weight C) field1 : term2 (weight F) I want to give diffe

Re: Rebuilding index on a regular basis

2006-12-20 Thread Erick Erickson
Why not switch where the searchers look rather than copy the index and restart? That is, your searcher is pointing at index1, and you build the new one in a new dir (index2). On some signal, your server closes the searcher pointing to index1 and opens one pointing to index2 and uses that until t
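The directory switch described above can be sketched roughly as follows. This is a minimal sketch with no Lucene calls in it: the class IndexSwitcher and its method names are hypothetical, and only the directory names index1 and index2 come from the message. A real server would also close the searcher on the old directory and open one on the new directory at the moment of the switch.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: remembers which index directory searches should use
// and lets a rebuild job flip to a freshly built directory in one step.
public class IndexSwitcher {
    private final AtomicReference<String> activeDir;

    public IndexSwitcher(String initialDir) {
        this.activeDir = new AtomicReference<String>(initialDir);
    }

    // Directory that new searchers should be opened on.
    public String activeDir() {
        return activeDir.get();
    }

    // Atomically point at newDir and return the previous directory, which is
    // then free to be rebuilt for the next cycle.
    public String switchTo(String newDir) {
        return activeDir.getAndSet(newDir);
    }

    public static void main(String[] args) {
        IndexSwitcher switcher = new IndexSwitcher("index1");
        String previous = switcher.switchTo("index2");
        System.out.println("now searching " + switcher.activeDir()
                + ", free to rebuild " + previous);
    }
}
```

Alternating between two directories this way avoids copying index files at switch time; the cost is keeping disk space for two copies of the index.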

Re: to boost or not to boost

2006-12-20 Thread Daniel Naber
On Wednesday 20 December 2006 17:32, Martin Braun wrote: > so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should > get a boost of 1.1975 . The boost is stored with a limited resolution. Try boosting one doc by 10, the other one by 20 or something like that. Regards Daniel -
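A rough way to see why boosts of 1.1973 and 1.1975 can end up indistinguishable: the sketch below is not Lucene code. It assumes, purely for illustration, a lossy encoding that keeps only the top three mantissa bits of a float. The class and method names are made up, and the actual encoding Lucene uses is not stated in this thread.

```java
public class CoarseBoost {
    // Hypothetical lossy encoding (an assumption, not Lucene's API): keep the
    // sign, the exponent and only the top 3 of the 23 mantissa bits of a
    // float, zeroing the remaining 20 bits.
    static float coarse(float boost) {
        int bits = Float.floatToIntBits(boost);
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        float[] boosts = {1.1973f, 1.1975f, 10f, 20f};
        for (float b : boosts) {
            // Print each boost next to its coarsened value.
            System.out.println(b + " -> " + coarse(b));
        }
    }
}
```

Under that assumption the two year-based boosts collapse to the same stored value, while 10 and 20 stay distinct, which is in line with the advice to pick boosts that are further apart.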

Re: Rebuilding index on a regular basis

2006-12-20 Thread Scott Sellman
Note: I have changed the title of this thread to match its content. I am currently facing a similar issue. I am dealing with a large index that is constantly used and needs to be updated on a daily basis. For fear of corruption I would rather rebuild the index each time, performing tests against

to boost or not to boost

2006-12-20 Thread Martin Braun
Hello all, I am trying to boost more recent Docs, i.e. Docs with a greater year value, like this:
if (title.getEJ() != null) {
    titleDocument.setBoost(new Float("1." + title.getEJ()));
}
so a doc from 1973 should get a boost of 1.1973 and a do

RE: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-20 Thread Scott Sellman
>Please try using the MultiFieldQueryParser's constructor, not the static >>method. I think that might fix your problem. Yes, after I created a new MultiFieldQueryParser and called its parse(String query) method, my search executed as expected. Thanks for your help! Scott >> BooleanClause.O

Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-20 Thread Erick Erickson
My first question is how many documents would you be deleting on a pass for option 2? If it's 10 documents out of 10,000, I'd consider just deleting them and re-adding (see IndexModifier). Personally, if possible, I prefer your first option, building a completely new index and switching between th

RE: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-20 Thread Adam Fleming
Hello Gentlemen (+Ladies?), I'm integrating Lucene into a Spring web-app, and have found a plethora of great web + print resources to make the integration quick and seamless. One thing that I have been hard-pressed to find is a good solution for rebuilding the index on a regular basis. I'm

Re: Help with jump from 1.4.3 to 2.0.0

2006-12-20 Thread JT Kimbell
I've sent the code your way. I'm downloading Eclipse right now so I can step through with its debugger once I get it all set up. However, I don't think I am using the same index for each of them, as this is all actually on 3 different machines. Machine A has 1.4.3 and I wrote that code on tha

Re: giving different boost to different terms in a same document

2006-12-20 Thread Erick Erickson
I don't think you want to do this at index time, but rather at search time. Quoting from Hoss (?)... Index time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents". Query time boosts are a way to express "I care about matches on

Re: Help with jump from 1.4.3 to 2.0.0

2006-12-20 Thread Gopikrishnan Subramani
All I could suspect is perhaps you are trying to add documents to an index that was originally created using Lucene 1.4.3. If trying to create a fresh index doesn't work, you could send me your indexer code so I can take a look. -Gopi On 12/19/06, JT Kimbell <[EMAIL PROTECTED]> wrote: Hi,

giving different boost to different terms in a same document

2006-12-20 Thread Eun Yong Kang
Hi, I am trying to figure out how to give different weights to different terms in the same document. Does anybody know how to do this? For example, doc A contains field1 : term1 (weight C) field1 : term2 (weight F) If I use setBoost(float) function in the Field Object, I cannot give differ

Re: sorting by per doc hit count

2006-12-20 Thread Chris Hostetter
: : problem remains that I would like to be able to switch between the hits : per doc Similarity and the default Similarity on any given search. I : was hoping that I could index with DefaultSimilarity and store the norms : for normal relevancy searching. Then I would need to ignore or make : cons