Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread prasenjit mukherjee
It seems that the field ( on which stats need to be computed ) should always remain in memory. This could be a killer. Why isn't it possible to put that stat-field information into the posting stream ( using payload ) which would facilitate fast computation of stats without requiring it to keep the conte

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Jason Rutherglen
> Although I still question whether this is a *good* use of Solr It's a great use of Lucene, which can be made into a superior horizontally scalable database when compared with open source relational database systems. My only concern, going back to *other* conversation(s) is whether or not the fi

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Erick Erickson
Hmmm, guess you're right, the stats component does return that data. It's been a long day... Although I still question whether this is a *good* use of Solr, I'd still re-examine my approach whenever I found myself trying to translate SQL queries into Solr. But if, after that examination, I stil

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Jason Rutherglen
> Short answer is that no, there isn't an aggregate > function. And you shouldn't even try If that is the case why does a 'stats' component exist for Solr with the SUM function built in? http://wiki.apache.org/solr/StatsComponent On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson wrote: > You will
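For readers following the thread, here is a minimal sketch of what a StatsComponent request for the SUM question might look like. It only builds the request URL. The `stats=true` and `stats.field` parameters are the ones described on the StatsComponent wiki page linked above; the base URL, the `/select` handler path, `rows=0`, and the field names `item` and `price` (borrowed from the SQL example elsewhere in this thread) are assumptions for illustration, not anything the posters confirmed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class StatsQueryExample {

    // Builds a Solr select URL that asks the StatsComponent for statistics
    // (sum, min, max, count, ...) on one field, restricted to the documents
    // matching the query. baseUrl and the field names are hypothetical.
    static String buildStatsUrl(String baseUrl, String query, String statsField) {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=0"          // assumption: only the stats are wanted, not the documents
                + "&stats=true"      // enable the StatsComponent
                + "&stats.field=" + URLEncoder.encode(statsField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Rough analogue of: SELECT SUM(price) WHERE item=fruits
        System.out.println(buildStatsUrl("http://localhost:8983/solr", "item:fruits", "price"));
    }
}
```

The sum would then be read from the stats section of the response for the `price` field. Whether this scales is a separate question: as noted elsewhere in the thread, the stats field appears to need to be held in memory.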

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Erick Erickson
You will encounter endless grief until you stop thinking of Solr/Lucene as a replacement for an RDBMS. It is a *text search engine*. Whenever you start asking "how do I implement a SQL statement in Solr", you have to stop and reconsider *why* you are trying to do that. Then recast the question in t

RE: Commit data to disk ...

2012-01-05 Thread Dragon Fly
I think I'll have to upgrade then, thanks. > Date: Thu, 5 Jan 2012 08:32:48 -0500 > Subject: Re: Commit data to disk ... > From: erickerick...@gmail.com > To: java-user@lucene.apache.org > > Lucene 2.0? I don't even know how to find the docs any more, I really > suggest you upgrade to something

Heads Up - Index File Format Change on Trunk

2012-01-05 Thread Simon Willnauer
Folks, I just committed LUCENE-3628 [1] which cuts over Norms to DocValues. This is an index file format change and if you are using trunk you need to reindex before updating. happy indexing :) simon [1] https://issues.apache.org/jira/browse/LUCENE-3628

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread prasenjit mukherjee
Thanks Erick for the response. Will lucene/solr provide me aggregations ( of field values ) satisfying a query criteria ? e.g. SELECT SUM(price) WHERE item=fruits Or do I need to use hitCollector to achieve that ? Any sample solr/lucene query to compute aggregates ( like SUM ) will be great. -Thanks,

Re: frequent keyword computation within a search ( and timeinterval )

2012-01-05 Thread Erick Erickson
The time interval is just a RangeQuery in the Lucene world. The rest is pretty standard search stuff. You probably want to have a look at the NRT (near real time) stuff in trunk. Your reads/writes are pretty high, so you'll need some experimentation to size your site correctly. Best Erick On We

Re: Commit data to disk ...

2012-01-05 Thread Erick Erickson
Lucene 2.0? I don't even know how to find the docs any more, I really suggest you upgrade to something more recent. In 2.9, both IndexReader and IndexWriter have commit() methods. Best Erick On Tue, Jan 3, 2012 at 8:35 AM, Dragon Fly wrote: > > Hi, I'm using Lucene 2.0 and was wondering how

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-05 Thread Peter K
Hi Simon, answers below. >> It does not seem to be an 'IO related issue' because using RAMDirectory >> results in the same times. >> And indexing via Luc4 with only one thread shouldn't be slower than 3.5 (?) > it could be since we use a different term dictionary impl which is > more expensive in

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-05 Thread Simon Willnauer
hey peter, On Wed, Jan 4, 2012 at 12:52 AM, Peter K wrote: > Thanks Simon for your answer! > >> as far as I can see you are comparing apples and pears. > > When excluding the waiting time I also get the slight but reproducible > difference**. The times for waitForGeneration are nearly the same > (