new sorting api and some perf numbers

2009-10-11 Thread John Wang
Hi guys: The new FieldComparator API looks really scary :) But after some perf testing, with numbers I'd like to share, I guess it is worth it: HW: Mac Pro with 16G memory; JVM: 1.6.0_13; JVM args: -Xms1g -Xmx1g -server; setup: index with 1M docs split evenly into 8 segments (to make sure the test

Re: Question about how to speed up custom scoring

2009-10-11 Thread scott w
On Sun, Oct 11, 2009 at 9:10 AM, Jake Mannix wrote: > What do you mean "not something I can plug in on top of my original query"? Do you mean that you can't do it like the more complex example in the class you posted earlier in the thread, where you take a linear combination of the Map

Re: Realtime & distributed

2009-10-11 Thread Jake Mannix
OK, never mind actually: the simultaneous indexing was something done in Zoie 1.3, and it was changed in 1.4 to call addIndexesNoOptimize() on the RAMDirectory indexes as soon as they are big enough. It's still true that you can throw away the RAMDirectory once the disk index is reopened, though. -jake
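A toy model may make the Zoie 1.4 flow Jake describes easier to follow: documents land in an in-memory buffer, the buffer is copied into the disk index once it is big enough, and the in-memory copy is dropped only after the disk index has been reopened. This is a minimal sketch under loud assumptions: lists of strings stand in for the RAMDirectory and the disk index, the copy step merely stands in for addIndexesNoOptimize(), and every class and method name is hypothetical. It is not Zoie's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Zoie code). Strings stand in for documents,
 * lists stand in for the RAMDirectory and the on-disk index.
 */
class RealtimeIndexSketch {
    private final int flushThreshold;
    private final List<String> ramBuffer = new ArrayList<>();     // still being filled
    private final List<String> flushed = new ArrayList<>();       // copied to disk, RAM copy kept until reopen
    private final List<String> diskCommitted = new ArrayList<>(); // everything written to disk
    private final List<String> diskVisible = new ArrayList<>();   // what the currently open disk reader sees

    RealtimeIndexSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void add(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= flushThreshold) {
            // Stand-in for addIndexesNoOptimize(): copy the RAM index into the disk index.
            diskCommitted.addAll(ramBuffer);
            flushed.addAll(ramBuffer);
            ramBuffer.clear();
        }
    }

    /** Reopening the disk reader makes flushed docs visible, so their RAM copy can be thrown away. */
    void reopenDisk() {
        diskVisible.clear();
        diskVisible.addAll(diskCommitted);
        flushed.clear();
    }

    /** Searches cover the open disk reader plus in-memory docs it cannot see yet (no duplicates). */
    List<String> searchable() {
        List<String> all = new ArrayList<>(diskVisible);
        all.addAll(flushed);
        all.addAll(ramBuffer);
        return all;
    }
}
```

The point of keeping `flushed` around until `reopenDisk()` is that, between the copy and the reopen, the only searchable copy of those documents is the in-memory one.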

Re: Realtime & distributed

2009-10-11 Thread Jake Mannix
Hey Eric, one clarification before letting the rest of this discussion sneak over to the Zoie list: On Sun, Oct 11, 2009 at 1:51 PM, Angel, Eric wrote: > * Am I wrong to assume that the RAMDir holds the entire index, just as the FSDir? Or does RAMDir only hold a portion of the index that ha

Re: Realtime & distributed

2009-10-11 Thread John Wang
Eric: For more specific Zoie questions, let's move them to the Zoie discussion group instead. Thanks -John On Sun, Oct 11, 2009 at 2:31 PM, John Wang wrote: > Hi Eric: I regret the direction the thread has taken and partly take responsibility for it... As to your question: We h

Re: Realtime & distributed

2009-10-11 Thread John Wang
Hi Eric: I regret the direction the thread has taken and partly take responsibility for it... As to your question: We have 2 nodes per commodity server, each holding 5 million docs (although given the numbers we are seeing, we think we were a bit too conservative and may increase that to 10 million). In ter

RE: Realtime & distributed

2009-10-11 Thread Angel, Eric
Man, this thread really went south. Anyhow, I have a few questions about Zoie:
* How many nodes are you using to support the speeds you desire at LI?
* Am I wrong to assume that the RAMDir holds the entire index, just as the FSDir? Or does RAMDir only hold a portion of the index that hasn't ye

Re: How do you properly use NumericField

2009-10-11 Thread Paul Taylor
Uwe Schindler wrote: As we told you before, the default QueryParser has no support for NumericField (as it doesn't know the schema). To get it running, subclass it and override the newRangeQuery method to create a NumericRangeQuery for field names that are indexed using NumericField. Hi, yes I di

RE: How do you properly use NumericField

2009-10-11 Thread Uwe Schindler
I forgot: the indexed format of numeric fields is also not plain text, so a simple TermQuery as generated by your query parser will not work either. If you want to hit numeric values without a NumericRangeQuery whose lower and upper bounds are equal, you have to use NumericUtils to translate the term
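To see why a plain-text term cannot match a numeric field, note that text sorts "10.0" before "9.99". Lucene's NumericUtils instead maps each double to a long whose signed order matches numeric order, and then prefix-codes that long into index terms. The sketch below reimplements only the first step with the JDK, modeled on what NumericUtils.doubleToSortableLong does. It is an illustration, not the Lucene class, and it omits the prefix coding entirely, so its output is not what ends up in the index.

```java
import java.util.Arrays;

/** Illustrative reimplementation (not Lucene's code) of a sortable double-to-long mapping. */
class SortableDoubleSketch {
    /** Maps a double to a long whose signed ordering matches the double's numeric ordering. */
    static long doubleToSortableLong(double val) {
        long bits = Double.doubleToLongBits(val);
        // Negative doubles have the sign bit set, and their raw bits grow as the value
        // gets more negative. Flipping every bit except the sign bit reverses that order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    /** Inverse of doubleToSortableLong. */
    static double sortableLongToDouble(long val) {
        if (val < 0) {
            val ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(val);
    }

    public static void main(String[] args) {
        // As plain text, "10.0" sorts before "9.99", which is wrong numerically.
        String[] asText = {"9.99", "10.0", "19.99"};
        Arrays.sort(asText);
        System.out.println(Arrays.toString(asText)); // [10.0, 19.99, 9.99]

        // Sorting the encoded longs yields the numeric order.
        double[] prices = {19.99, -3.5, 0.0, 9.99, -100.0};
        long[] encoded = new long[prices.length];
        for (int i = 0; i < prices.length; i++) {
            encoded[i] = doubleToSortableLong(prices[i]);
        }
        Arrays.sort(encoded);
        for (long e : encoded) {
            System.out.print(sortableLongToDouble(e) + " "); // -100.0 -3.5 0.0 9.99 19.99
        }
        System.out.println();
    }
}
```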

RE: How do you properly use NumericField

2009-10-11 Thread Uwe Schindler
As we told you before, the default QueryParser has no support for NumericField (as it doesn't know the schema). To get it running, subclass it and override the newRangeQuery method to create a NumericRangeQuery for field names that are indexed using NumericField. The recommended way is to instantiate

Re: How do you properly use NumericField

2009-10-11 Thread Paul Taylor
Michael McCandless wrote: On the indexing side you do this: doc.add(new NumericField("price").setDoubleValue(19.99)); The NumericField is not stored by default (there's also a ctor to specify Store.YES or Store.NO). If the numeric field is not being used in a range query, how is it being u

Re: Question about how to speed up custom scoring

2009-10-11 Thread Jake Mannix
What do you mean "not something I can plug in on top of my original query"? Do you mean that you can't do it like the more complex example in the class you posted earlier in the thread, where you take a linear combination of the Map-based score and the regular text score? Another option is to j
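Jake's suggestion of taking a linear combination of the Map-based score and the regular text score comes down to a small piece of arithmetic, sketched here in plain Java. The class name, the `textWeight` parameter, and scoring documents missing from the map as 0 are all assumptions for illustration. The thread does not show the class that was posted, so this is not tied to any particular Lucene scoring hook; in practice the same formula would run inside whatever custom scoring method that class overrides.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: combined = textWeight * textScore + (1 - textWeight) * mapScore.
 * Documents absent from the map get a map score of 0 (an assumption).
 */
class LinearCombinationScorer {
    private final Map<Integer, Float> docScores;
    private final float textWeight;

    LinearCombinationScorer(Map<Integer, Float> docScores, float textWeight) {
        if (textWeight < 0f || textWeight > 1f) {
            throw new IllegalArgumentException("textWeight must be in [0, 1]");
        }
        this.docScores = new HashMap<>(docScores); // defensive copy
        this.textWeight = textWeight;
    }

    /** Blends the query's text score for a document with that document's Map-based score. */
    float score(int docId, float textScore) {
        float mapScore = docScores.getOrDefault(docId, 0f);
        return textWeight * textScore + (1f - textWeight) * mapScore;
    }
}
```

Keeping the weight in [0, 1] means the combined score stays between the two inputs, so tuning `textWeight` shifts ranking between pure text relevance (1) and pure Map-based score (0).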