Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread Kun Hong
karl wettin wrote: On 27 Apr 2007, at 14:11, Erik Hatcher wrote: On Apr 27, 2007, at 6:39 AM, karl wettin wrote: On 27 Apr 2007, at 12:36, Erik Hatcher wrote: Unless someone has some other tricks I'm not aware of, that is. I guess it would be possible to add start/stop-tokens such as ^ and $ to

Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-27 Thread Doron Cohen
Just in case norms info cannot be spared, note that since Lucene 2.1 norms are maintained in a single file, no matter how many fields there are. However, due to a bug in 2.1, this did not prevent the too many open files problem. This bug has already been fixed but not yet released. For more details on th

Re: Index sync up

2007-04-27 Thread Tony Qian
Erick, Thanks for your explanation. I thought about using HitCollector. The search interface we are facing now is actually pretty simple. One of the searches requires a maximum of 500 search results and a page size of 500 (basically, return the first 500). The second one requires a max of 250 and page size is 25

Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread karl wettin
On 27 Apr 2007, at 14:11, Erik Hatcher wrote: On Apr 27, 2007, at 6:39 AM, karl wettin wrote: On 27 Apr 2007, at 12:36, Erik Hatcher wrote: Unless someone has some other tricks I'm not aware of, that is. I guess it would be possible to add start/stop-tokens such as ^ and $ to the indexed text:

Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-27 Thread Chris Hostetter
: From what I read in the Lucene docs, these .f files store the
: normalization factor for the corresponding field. What exactly is this
: used for and more importantly, can this be disabled so that the files
: are not created in the first place?
Field norms are primarily used for length normali
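
For context on "length normalization": in Lucene 2.x, DefaultSimilarity.lengthNorm returns 1/sqrt(numTerms) for a field, and that value (combined with any boost) is stored lossily as one byte per document per indexed field, which is what the norms files hold. The plain-Java sketch below only reproduces the arithmetic and does not call Lucene; the class name is hypothetical.

```java
/**
 * Plain-Java illustration of the length norm computed by Lucene 2.x's
 * DefaultSimilarity: 1 / sqrt(numTerms). Does not use Lucene itself.
 */
public class LengthNormDemo {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Shorter fields get larger norms, so a match in a short field
        // contributes more to the score than the same match in a long one.
        for (int n : new int[] {1, 4, 100}) {
            System.out.println(n + " terms -> norm " + lengthNorm(n));
        }
    }
}
```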

Re: lucene indexes back up strategies

2007-04-27 Thread Chris Hostetter
: > Wow, I did not know Lucene 2.1 can do all of this. The problem is that I'm
: > currently using 2.0. Is there something similar to what you just mentioned
: > in dealing with 2.0 indexes--backing up piecewise? Thanks again.
:
: Hmm, OK. Pre-2.1 Lucene will overwrite at least the file "segmen

Re: filter caching

2007-04-27 Thread Chris Hostetter
: I have a question about filter caching. I have a lot of QueryFilters
: that I use when searching that filter on a single field. Sometimes
: I use them by themselves, but mostly I use them in some
: combination using ChainedFilter. Does the caching take advantage of
: only the final filte
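
One way to make filter combinations cheap, shown here independently of whatever the (truncated) reply recommends, is to cache each single-field filter's document bit set and build combinations from the cached sets at query time, so the cache stays useful whichever combination is requested. This plain-Java sketch assumes a hypothetical `FilterBitsCache` whose loader stands in for a filter's `bits(IndexReader)` result; in Lucene such bit sets are only valid for the IndexReader they were computed from, so a real cache would also be keyed by reader.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical cache of per-clause document bit sets (bit i = document id i).
 * Combinations work on clones, so cached sets are never modified.
 */
public class FilterBitsCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;

    FilterBitsCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Computes a clause's bits once, then serves them from the cache. */
    BitSet get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Documents matching every key (AND chain). */
    BitSet and(String... keys) {
        BitSet result = (BitSet) get(keys[0]).clone();
        for (int i = 1; i < keys.length; i++) {
            result.and(get(keys[i]));
        }
        return result;
    }

    /** Documents matching any key (OR chain). */
    BitSet or(String... keys) {
        BitSet result = new BitSet();
        for (String key : keys) {
            result.or(get(key));
        }
        return result;
    }

    int cachedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Hypothetical loader: fixed bit sets instead of real filter results.
        FilterBitsCache cache = new FilterBitsCache(key -> {
            BitSet bits = new BitSet();
            if (key.equals("type:book")) { bits.set(1); bits.set(3); }
            if (key.equals("lang:en")) { bits.set(3); bits.set(4); }
            return bits;
        });
        System.out.println("AND: " + cache.and("type:book", "lang:en"));
        System.out.println("OR:  " + cache.or("type:book", "lang:en"));
    }
}
```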

Re: Snowball and accents filter...?

2007-04-27 Thread Chris Hostetter
: In order to do this, we tried subclassing the SnowballAnalyzer... it
: doesn't work yet, though. Here is the code of our custom class:
At first glance, what you've got seems fine; can you elaborate on what you mean by "it doesn't work"? Perhaps the issue is that the SnowballStemmer can't hand
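
For reference, accent folding by itself can be sketched in plain Java with java.text.Normalizer: decompose to NFD and drop the combining marks. This is a swapped-in technique for illustration, not Lucene's accent filter or the poster's class, and `AccentFolding` is a hypothetical name. Where folding sits relative to stemming matters, because a Snowball stemmer's rules operate on exactly the characters it receives.

```java
import java.text.Normalizer;

/** Plain-Java accent folding, e.g. "caf\u00e9" becomes "cafe". Not a Lucene TokenFilter. */
public class AccentFolding {

    static String fold(String s) {
        // NFD splits an accented letter into its base letter plus a combining mark
        // (e.g. U+00E9 becomes "e" followed by U+0301).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches combining marks, which are removed.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        for (String word : new String[] {"caf\u00e9", "\u00e9l\u00e8ve", "ni\u00f1o"}) {
            System.out.println(word + " -> " + fold(word));
        }
    }
}
```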

Re: Customizing scoring

2007-04-27 Thread Chris Hostetter
: If a BooleanQuery is created as the addition of two TermQuery ...
: The score for this BooleanQuery is double (around 6) when the compared
: document has the field "pets" with these two values, but we want
: the score to be only 3, although there is more than one match.
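
The doubling described here is what you would expect if each matching TermQuery clause adds its own score to the BooleanQuery total. The simplified plain-Java sketch that follows ignores coord, queryNorm and the other factors Lucene really applies, and the per-clause score of 3 is illustrative. It contrasts summing clause scores with keeping only the best one, which is the idea behind Lucene's DisjunctionMaxQuery with a tie-breaker of 0. This is one alternative shown for illustration, not necessarily what the reply goes on to recommend.

```java
/** Simplified combination of per-clause scores; not Lucene's actual scoring code. */
public class CombineScores {

    /** BooleanQuery-like: every matching clause adds its score. */
    static float sum(float... clauseScores) {
        float total = 0f;
        for (float s : clauseScores) total += s;
        return total;
    }

    /** DisjunctionMaxQuery-like: best clause, plus tieBreaker times the rest (scores assumed >= 0). */
    static float disMax(float tieBreaker, float... clauseScores) {
        float max = 0f, total = 0f;
        for (float s : clauseScores) {
            max = Math.max(max, s);
            total += s;
        }
        return max + tieBreaker * (total - max);
    }

    public static void main(String[] args) {
        float[] pets = {3f, 3f}; // two matching values in "pets", about 3 each (illustrative)
        System.out.println("sum:    " + sum(pets));
        System.out.println("disMax: " + disMax(0f, pets));
    }
}
```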

How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-27 Thread pbm-rico
Hello, What would be the best strategy to support an index with thousands or even hundreds of thousands of individual field names? I have client applications that create a lot of key/value type data. I use the key as the document field name, so I end up with _a lot_ of .f files and eventually the my

Re: lucene indexes back up strategies

2007-04-27 Thread Michael McCandless
"larry hughes" <[EMAIL PROTECTED]> wrote:
> Wow, I did not know Lucene 2.1 can do all of this. The problem is that I'm
> currently using 2.0. Is there something similar to what you just mentioned
> in dealing with 2.0 indexes--backing up piecewise? Thanks again.
Hmm, OK. Pre-2.1 Lucene will

Re: Sorting with custom SortComparator

2007-04-27 Thread Theodan
Never mind. The sorting was working correctly. I was just misinterpreting the results I was seeing. -Theo
Theodan wrote:
> Hello.
> I am trying to sort my query results on a String field called "AssetType"
> and then on the relevancy score, but I need a particular ordering of the
> po
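
The ordering Theo describes (a custom order over "AssetType" values, then relevance) can be expressed as a plain java.util.Comparator. This sketch is not Lucene's SortComparator API; the `Hit` class, the AssetType values and their ranking are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AssetTypeOrder {

    /** A search hit reduced to the two values used for sorting. */
    static final class Hit {
        final String assetType;
        final float score;

        Hit(String assetType, float score) {
            this.assetType = assetType;
            this.score = score;
        }

        @Override
        public String toString() {
            return assetType + "(" + score + ")";
        }
    }

    // Hypothetical preferred order; types not listed sort after all listed ones.
    static final List<String> ORDER = Arrays.asList("Video", "Image", "Document");

    static int rank(String assetType) {
        int i = ORDER.indexOf(assetType);
        return i >= 0 ? i : ORDER.size();
    }

    // Primary key: custom AssetType rank; secondary key: score, highest first.
    static final Comparator<Hit> BY_TYPE_THEN_SCORE =
        Comparator.comparingInt((Hit h) -> rank(h.assetType))
                  .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_TYPE_THEN_SCORE);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("Document", 9f), new Hit("Video", 1f),
            new Hit("Video", 5f), new Hit("Other", 7f));
        System.out.println(sorted(hits));
    }
}
```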

Re: lucene indexes back up strategies

2007-04-27 Thread larry hughes
Thanks Mike,
Wow, I did not know Lucene 2.1 can do all of this. The problem is that I'm currently using 2.0. Is there something similar to what you just mentioned in dealing with 2.0 indexes--backing up piecewise? Thanks again.
LH
Michael McCandless-3 wrote:
> "larry hughes" <[EMAI

Re: lucene indexes back up strategies

2007-04-27 Thread Michael McCandless
"larry hughes" <[EMAIL PROTECTED]> wrote:
> I'm pondering on long term maintenance issues with Lucene indexes
> and would like to know of anyone's suggestions or recommendations for
> backing up these indexes. My goal is to have a weekly, or even
> daily, snapshot of the current index to make su

Re: Index sync up

2007-04-27 Thread Erick Erickson
<4> is also easy. From the javadoc: "*Caution:* Iterate only over the hits needed. Iterating over all hits is generally not desirable and may be the source of performance issues." So an iterator should be fine for all documents, even those > 100. But do be aware that the entire query gets r
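
Erick's point (iterate only over the hits you need) amounts to computing, for each requested page, which hit positions to read and stopping there. A minimal plain-Java sketch of that arithmetic with illustrative numbers; `PageWindow`, `start` and `end` are hypothetical names, and with Lucene 2.x the total would come from `Hits.length()` and each document from `Hits.doc(i)`.

```java
/**
 * Hypothetical helper: which hit positions to read for one page of results,
 * honouring a page size, a cap on total results, and the real hit count.
 */
public class PageWindow {

    /** First hit position of a page (pages numbered from 0). */
    static int start(int page, int pageSize) {
        return page * pageSize;
    }

    /** One past the last hit position to read; equals start() when the page is empty. */
    static int end(int page, int pageSize, int maxResults, int totalHits) {
        int from = start(page, pageSize);
        int limit = Math.min(maxResults, totalHits);
        return Math.max(from, Math.min(from + pageSize, limit));
    }

    public static void main(String[] args) {
        int pageSize = 20, maxResults = 500, totalHits = 45; // illustrative values
        for (int page = 0; page < 4; page++) {
            int from = start(page, pageSize);
            int to = end(page, pageSize, maxResults, totalHits);
            // Only positions from..to-1 would be fetched, e.g. with hits.doc(i)
            System.out.println("page " + page + ": " + (to - from) + " hits starting at " + from);
        }
    }
}
```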

lucene indexes back up strategies

2007-04-27 Thread larry hughes
I'm pondering on long term maintenance issues with Lucene indexes and would like to know of anyone's suggestions or recommendations for backing up these indexes. My goal is to have a weekly, or even daily, snapshot of the current index to make sure it is recoverable if the index gets corrupted. I
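
A weekly or daily snapshot can be sketched in plain Java as copying every file of the index directory into a new dated directory. The sketch assumes nothing is writing to the index while the copy runs (for example, indexing is paused during the backup); as the replies point out, pre-2.1 Lucene overwrites at least one index file in place, so copying a live index can capture an inconsistent state. `IndexSnapshot`, the paths and the dated-directory layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;

public class IndexSnapshot {

    /**
     * Copies every regular file in indexDir into backupRoot/yyyy-MM-dd.
     * Assumes no writer modifies indexDir while the copy is running.
     */
    static Path snapshot(Path indexDir, Path backupRoot, LocalDate date) throws IOException {
        Path target = backupRoot.resolve(date.toString());
        Files.createDirectories(target);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, target.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default paths; pass real ones on the command line.
        Path index = Paths.get(args.length > 0 ? args[0] : "index");
        Path backups = Paths.get(args.length > 1 ? args[1] : "backups");
        System.out.println("Snapshot written to " + snapshot(index, backups, LocalDate.now()));
    }
}
```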

Re: Index sync up

2007-04-27 Thread Phil Myers
Regarding your first question (the easy one), there is some information here: http://www.gossamer-threads.com/lists/lucene/java-user/44312
--- Tony Qian <[EMAIL PROTECTED]> wrote:
> All,
> After playing around with Lucene, we decided to
> replace our old full-text search
> engine with Lucene.

Index sync up

2007-04-27 Thread Tony Qian
All, After playing around with Lucene, we decided to replace our old full-text search engine with Lucene. I got "Lucene in Action" a week ago and finished reading most of the book. I have several questions. 1) Since the book was written two years ago and Lucene has made a lot of changes, is there

Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread Erik Hatcher
On Apr 27, 2007, at 6:39 AM, karl wettin wrote: On 27 Apr 2007, at 12:36, Erik Hatcher wrote: Unless someone has some other tricks I'm not aware of, that is. I guess it would be possible to add start/stop-tokens such as ^ and $ to the indexed text: "^ the $" and place a phrase query with 0 slo

Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread karl wettin
On 27 Apr 2007, at 12:36, Erik Hatcher wrote: Unless someone has some other tricks I'm not aware of, that is. I guess it would be possible to add start/stop-tokens such as ^ and $ to the indexed text: "^ the $" and place a phrase query with 0 slop. But that might screw up SpanFirstQuery etc.?
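
The start/stop-token idea can be simulated in plain Java: wrap both the field's tokens and the query's tokens in "^" ... "$" and require a contiguous, in-order (slop 0) match, so a field matches only when it consists of exactly the query words. Whitespace tokenization, lowercasing and the class and method names are assumptions for illustration; in Lucene the sentinels would be added at index time and the check would be a PhraseQuery with slop 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SentinelPhraseMatch {

    /** Lowercases, splits on whitespace, and wraps the tokens with "^" and "$". */
    static List<String> withSentinels(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add("^");
        for (String t : text.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        tokens.add("$");
        return tokens;
    }

    /** True if "^ query $" occurs contiguously (slop 0) in "^ field $". */
    static boolean matchesWholeField(String fieldText, String query) {
        return Collections.indexOfSubList(withSentinels(fieldText), withSentinels(query)) >= 0;
    }

    public static void main(String[] args) {
        for (String doc : Arrays.asList("the", "the cat", "of the")) {
            System.out.println("\"" + doc + "\" matches \"the\": " + matchesWholeField(doc, "the"));
        }
    }
}
```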

Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread Erik Hatcher
On Apr 27, 2007, at 6:08 AM, karl wettin wrote: On 27 Apr 2007, at 08:21, Kun Hong wrote: I just want that one document which contains no other words than "the". Is it possible using a Lucene query? Take a look at SpanFirstQuery. Perhaps you would need to implement a SpanLastQuery too. Perhaps

Re: Search for docs containing only a certain word in a specified field?

2007-04-27 Thread karl wettin
On 27 Apr 2007, at 08:21, Kun Hong wrote: I just want that one document which contains no other words than "the". Is it possible using a Lucene query? Take a look at SpanFirstQuery. Perhaps you would need to implement a SpanLastQuery too. Perhaps the easiest way to go about it would be a RegexQuery that