Re: Different Analyzer for different fields in the same document

2009-04-10 Thread John Seer
Thanks, this is a useful class for the future... Koji Sekiguchi-2 wrote: > > John Seer wrote: >> Hello, >> Is there any way that a single document can have different >> analyzers >> for different fields? >> >> I think one way of doing it is to create a custom analyzer which will do field >> specific a

Re: exponential boosts

2009-04-10 Thread Steven Bethard
On 4/10/2009 12:56 PM, Steven Bethard wrote: > I need to have a scoring model of the form: > > s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN > > where "d" is a document, "q" is a query, "sK" is a scoring function, and > "aK" is the exponential boost factor for that scoring function. As a > si

Re: Different Analyzer for different fields in the same document

2009-04-10 Thread Koji Sekiguchi
John Seer wrote: > Hello, > Is there any way that a single document can have different analyzers for different fields? > I think one way of doing it is to create a custom analyzer which will do field-specific analysis. > Any other suggestions? There is PerFieldAnalyzerWrapper http://hudson.z
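
As a rough illustration of the per-field idea behind PerFieldAnalyzerWrapper, the sketch below dispatches on the field name using only the Java standard library. It does not use Lucene's classes: `PerFieldDispatchSketch`, `lowercaseWhitespace`, and `keyword` are hypothetical names invented for this example, and the two toy analyzers are assumptions rather than anything stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a per-field analyzer wrapper (not Lucene API).
final class PerFieldDispatchSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatchSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register an analyzer that applies to one field only.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's own analyzer if one was registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Toy analyzer: lowercase the text and split it on whitespace.
    static List<String> lowercaseWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Toy analyzer: keep the whole value as a single token.
    static List<String> keyword(String text) {
        return Arrays.asList(text);
    }

    public static void main(String[] args) {
        PerFieldDispatchSketch wrapper =
                new PerFieldDispatchSketch(PerFieldDispatchSketch::lowercaseWhitespace);
        wrapper.addAnalyzer("id", PerFieldDispatchSketch::keyword);
        System.out.println(wrapper.analyze("title", "Hello World"));
        System.out.println(wrapper.analyze("id", "AB 12"));
    }
}
```

The design point is that the caller hands one object to the indexing code, and the choice of analysis is made per field name at analysis time.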

Different Analyzer for different fields in the same document

2009-04-10 Thread John Seer
Hello, Is there any way that a single document can have different analyzers for different fields? I think one way of doing it is to create a custom analyzer which will do field-specific analysis. Any other suggestions? -- View this message in context: http://www.nabble.com/Different-Anal

Sequential match query

2009-04-10 Thread John Seer
Hello, I have 3 terms and I want to match them in order. I tried to use a wildcard query, but I am not getting any results back. Terms: A C F Doc: name:A B C D E F query: name:A*C*F Please, any suggestions? Thanks for help in advance -- View this message in context
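
One likely reason the wildcard query returns nothing: a wildcard pattern is tested against individual indexed terms, not against the whole field text. The sketch below reproduces that with the Java standard library only, assuming the `name` field is tokenized on whitespace so that A, B, C, D, E and F are separate terms. The class and method names are hypothetical and this is not Lucene code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: per-term wildcard matching vs. ordered term matching.
final class OrderedTermMatchSketch {
    // Translate a simple wildcard ('*' = any run of characters, '?' = one character) to a regex.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if at least one single token matches the entire wildcard pattern.
    static boolean anyTokenMatches(List<String> tokens, String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        for (String token : tokens) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // True if the wanted terms occur among the tokens in the given order (gaps allowed).
    static boolean containsInOrder(List<String> tokens, List<String> wanted) {
        int next = 0;
        for (String token : tokens) {
            if (next < wanted.size() && token.equals(wanted.get(next))) {
                next++;
            }
        }
        return next == wanted.size();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("A B C D E F".split("\\s+"));
        System.out.println(anyTokenMatches(tokens, "A*C*F"));
        System.out.println(containsInOrder(tokens, Arrays.asList("A", "C", "F")));
    }
}
```

In Lucene itself, an ordered positional query such as SpanNearQuery with its in-order flag set is the kind of tool usually applied to this requirement; that is a suggestion here, not something stated in the thread.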

Re: exponential boosts

2009-04-10 Thread Steven Bethard
On 4/10/2009 1:08 PM, Jack Stahl wrote: > Perhaps you'd find it easier to implement the equivalent: > > log(s1(d, q))*a1 + ... + log(sN(d, q))*aN Yes, that's fine too - that's actually what I'd be optimizing anyway. But how would I do that? If I took the query boost route, how do I get a TermQue

RE: RangeFilter performance problem using MultiReader

2009-04-10 Thread Uwe Schindler
You got a lot of answers and questions about your index structure. Now another idea, maybe this helps you to speed up your RangeFilter: What type of range do you want to query? From your index statistics, it looks like a numeric/date field from which you filter very large ranges. If the values are

Re: exponential boosts

2009-04-10 Thread Jack Stahl
Perhaps you'd find it easier to implement the equivalent: log(s1(d, q))*a1 + ... + log(sN(d, q))*aN On Fri, Apr 10, 2009 at 12:56 PM, Steven Bethard wrote: > I need to have a scoring model of the form: > >s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN > > where "d" is a document, "q" is a que
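
The equivalence suggested above can be checked numerically. A minimal sketch, assuming every score is strictly positive (the logarithm is undefined otherwise); the class name, method names, and sample values are made up for illustration and are not Lucene API.

```java
// Hypothetical check that the product-of-powers score and the weighted log sum agree.
final class ExponentialBoostSketch {
    // Product form: s1^a1 * s2^a2 * ... * sN^aN
    static double productForm(double[] scores, double[] boosts) {
        double result = 1.0;
        for (int i = 0; i < scores.length; i++) {
            result *= Math.pow(scores[i], boosts[i]);
        }
        return result;
    }

    // Log form: a1*log(s1) + a2*log(s2) + ... + aN*log(sN)
    static double logForm(double[] scores, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            sum += boosts[i] * Math.log(scores[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] scores = {0.8, 2.5, 1.2}; // sample values, all > 0
        double[] boosts = {2.0, 0.5, 1.5};
        System.out.println(productForm(scores, boosts));
        System.out.println(Math.exp(logForm(scores, boosts)));
    }
}
```

Because exp is monotonically increasing, ranking documents by the log form gives the same order as ranking by the product form, up to floating-point rounding.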

exponential boosts

2009-04-10 Thread Steven Bethard
I need to have a scoring model of the form: s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN where "d" is a document, "q" is a query, "sK" is a scoring function, and "aK" is the exponential boost factor for that scoring function. As a simple example, I might have: s1 = TF-IDF score matching

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller wrote: > 24 segments is bound to be quite a bit slower than an optimized index for > most things I'd be curious just how true this really is (in general)... my guess is the "long tail of tiny segments" gets into the OS's IO cache (as long as the syste

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:11 PM, Mark Miller wrote: > Mark Miller wrote: >> >> Michael McCandless wrote: >>> >>> which is why I'm baffled that Raf didn't see a speedup on >>> upgrading. >>> >>> Mike >>> >> >> Another point is that he may not have such a nasty set of segments - Raf >> says he has 2

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:14 PM, Mark Miller wrote: > Raf wrote: >> >> We have more or less 3M documents in 24 indexes and we read all of them >> using a MultiReader. >> > > Is this a multireader containing multireaders? Let's hear Raf's answer, but I think likely "yes". But this shouldn't be a

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Raf wrote: We have more or less 3M documents in 24 indexes and we read all of them using a MultiReader. Is this a multireader containing multireaders? -- - Mark http://www.lucidimagination.com

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Mark Miller wrote: Michael McCandless wrote: which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike Another point is that he may not have such a nasty set of segments - Raf says he has 24 indexes, which sounds like he may not have the logarithmic sizing you normally see

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Michael McCandless wrote: which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike Another point is that he may not have such a nasty set of segments - Raf says he has 24 indexes, which sounds like he may not have the logarithmic sizing you normally see. If you have somewh

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Michael McCandless wrote: On Fri, Apr 10, 2009 at 2:32 PM, Mark Miller wrote: I had thought we would also see the advantage with multi-term queries - you rewrite against each segment and avoid extra seeks (though not nearly as many as when enumerating every term). As Mike pointed out to me

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 2:32 PM, Mark Miller wrote: > I had thought we would also see the advantage with multi-term queries - you > rewrite against each segment and avoid extra seeks (though not nearly as > many as when enumerating every term). As Mike pointed out to me back when > though : we st

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
When I did some profiling I saw that the slow down came from tons of extra seeks (single segment vs multisegment). What was happening was, the first couple segments would have thousands of terms for the field, but as the segments logarithmically shrank in size, the number of terms for the segme

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 1:20 PM, Raf wrote: > Hi Mike, > thank you for your answer. > > I have downloaded lucene-core-2.9-dev and I have executed my tests (both on > multireader and on consolidated index) using this new version, but the > performance is very similar to the previous ones. > The bi

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 11:03 AM, Yonik Seeley wrote: > On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless > wrote: >> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms >> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers. > > Do we know why this is, and

Lucene SnowBall unexpected behavior for some terms

2009-04-10 Thread AlexElba
Hello, I was working with Lucene Snowball 2.3.2 and I switched to 2.4.0. After the switch I came across a case where Lucene doesn't do lemmatization correctly. So far I have found only one case: spa - spas. "spas" is not getting lemmatized at all... BTW I saw the same behavior on Solr 1.3. Anybody have any

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Raf
Hi Mike, thank you for your answer. I have downloaded lucene-core-2.9-dev and I have executed my tests (both on multireader and on consolidated index) using this new version, but the performance is very similar to the previous ones. The big index is 7/8 times faster than the multireader version. Raf

Re: Help to determine why an optimized index is proportionaly too big.

2009-04-10 Thread Andrzej Bialecki
Chris Hostetter wrote: : The second stage index failed an optimization with a disk full exception : (I had to move it to another lucene machine with a larger disk partition : to complete the optimization. Is there a reason why a 22 day index would : be 10x the size of an 8 day index when the do

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Yonik Seeley
On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless wrote: > Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms > (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers. Do we know why this is, and if it's fixable (the MultiTermEnum, not the higher level query o

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers. I think the only workaround is to merge your indexes down to a single index. But, Lucene trunk (not yet released) has fixed this, so that searching through

RangeFilter performance problem using MultiReader

2009-04-10 Thread Raf
Hi, we are experiencing some problems using RangeFilters and we think there are some performance issues caused by MultiReader. We have more or less 3M documents in 24 indexes and we read all of them using a MultiReader. If we do a search using only terms, there are no problems, but if we add to

RE: Wordnet indexing error

2009-04-10 Thread Sudarsan, Sithu D.
Thanks Otis, Yes, we figured that out! Since we do not intend to migrate to 2.4 yet, we used the syns2index source code from svn. The problem is now taken care of. This part is for all: This brings us to the next question: 1. Is there some contrib code available for using hypernyms and such,

Re: Query any data

2009-04-10 Thread Tim Williams
2009/4/10 Matthew Hall : > I think I would tackle this in a slightly different manner. > > When you are creating this index, make sure that that field has a > default value. Make sure this value is something that could never appear > in the index otherwise. Then, when you go to place this field into

SpellChecker in use with composite query

2009-04-10 Thread Amin Mohammed-Coleman
Hi I have been playing around with the SpellChecker class and so far it looks really good. While developing a testcase to show it working I came across a couple of issues which I have resolved but I'm not certain if this is the correct approach. I would therefore be grateful if anyone could tell

Re: Query any data

2009-04-10 Thread Matthew Hall
I think I would tackle this in a slightly different manner. When you are creating this index, make sure that that field has a default value. Make sure this value is something that could never appear in the index otherwise. Then, when you go to place this field into the index, either write out your

Re: MultiSearcher query with Sort option

2009-04-10 Thread Preetham Kajekar
Hi, I found the API in another post on the net. new Sort(new SortField(null, SortField.DOC, true)) The trick is to set the field to null. Thanks for the help. Preetham Kajekar wrote: Hi Uwe, Thanks for your response. However, I could not find the API in SortField and Sort to achieve this.

Re: MultiSearcher query with Sort option

2009-04-10 Thread Preetham Kajekar
Hi Uwe, Thanks for your response. However, I could not find the API in SortField and Sort to achieve this. SortField can be wrapped inside a Sort, but you cannot specify to reverse the order. Thx, ~preetham Uwe Schindler wrote: It should, do not use Sort.INDEX_ORDER, create a SortField wit

Re: MultiSearcher query with Sort option

2009-04-10 Thread Michael McCandless
This (reversing a SortField.FIELD_DOC) should work... if it doesn't it's a bug. SortField.FIELD_DOC and SortField.FIELD_SCORE are "first class" SortField objects. Mike On Fri, Apr 10, 2009 at 5:31 AM, Uwe Schindler wrote: > It should, do not use Sort.INDEX_ORDER, create a SortField with indexor

Re: Exceptions in merge thread (while optimizing) causing problems with subsequent reopens

2009-04-10 Thread Michael McCandless
Actually it's perfectly fine for two threads to enter that code fragment (you obtain a write lock to protect the code so that "there can be only one"). Second off, even if you didn't have your write lock, the code should still be safe in that no index corruption is possible. Multiple threads may

RE: MultiSearcher query with Sort option

2009-04-10 Thread Uwe Schindler
It should. Do not use Sort.INDEX_ORDER; create a SortField with index order and the reverse parameter. The SortField can be wrapped inside a Sort instance and voila. I am not sure if it works, but it should. Same with score. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.theta

Re: MultiSearcher query with Sort option

2009-04-10 Thread Preetham Kajekar
Hi, I just realized it was a bug in my code. On a related note, is it possible to sort based on reverse index order? Thanks, ~preetham Uwe Schindler wrote: Hallo Preetham, never heard of this. What Lucene version do you use? To check, try the search in a different way: Combine the two ind

Re: SpellChecker AlreadyClosedException issue

2009-04-10 Thread John Cherouvim
dir is a local variable inside a method, so it's not getting reused. Should I synchronise the whole method? I think that would slow things down in a concurrent environment. Thanks for your response. Chris Hostetter wrote: : My code looks like this: : : Directory dir = null; : try { :di

RE: MultiSearcher query with Sort option

2009-04-10 Thread Uwe Schindler
Hallo Preetham, never heard of this. What Lucene version do you use? To check, try the search in a different way: do not combine the two indexes into a MultiSearcher; instead, open an IndexReader for each index and combine both readers into a MultiReader. This MultiReader can be used like a conven

MultiSearcher query with Sort option

2009-04-10 Thread Preetham Kajekar
Hi, I am using a MultiSearcher to search 2 indexes. As part of my query, I am sorting the results based on a field (which is NOT_ANALYZED). However, I seem to be getting hits only from one of the indexes. If I change to Sort.INDEX_ORDER, I seem to be getting results from both. Is this a known p