It seems that the field (on which stats need to be computed) must always remain in memory. This could be a killer. Why isn't it possible to put that stat-field information into the posting stream (using payloads), which would allow fast computation of stats without requiring the content to be kept in memory?
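To make the question concrete, here is a toy sketch of what I have in mind. It is plain Java and deliberately does not use Lucene's API: the class PayloadStatsSketch, its Posting type and the add/sum methods are all hypothetical names made up for illustration. Each posting carries the stat value as an 8-byte payload, and SUM is computed by streaming over one term's postings rather than loading the whole field into a cache. It does not address how this would work for a query that matches documents across several terms.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of this is Lucene's API. */
public class PayloadStatsSketch {

    /** One posting: a document id plus an opaque payload holding the stat value. */
    static final class Posting {
        final int docId;
        final byte[] payload;

        Posting(int docId, double statValue) {
            this.docId = docId;
            // Encode the numeric value as 8 bytes, as a payload might carry it.
            this.payload = ByteBuffer.allocate(Double.BYTES).putDouble(statValue).array();
        }
    }

    // term -> postings list (hypothetical in-memory stand-in for the posting stream)
    private final Map<String, List<Posting>> postings = new HashMap<>();

    public void add(String term, int docId, double statValue) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new Posting(docId, statValue));
    }

    /** Walks the postings of one term and sums the decoded payloads. */
    public double sum(String term) {
        double total = 0.0;
        for (Posting p : postings.getOrDefault(term, Collections.<Posting>emptyList())) {
            total += ByteBuffer.wrap(p.payload).getDouble();
        }
        return total;
    }

    public static void main(String[] args) {
        PayloadStatsSketch index = new PayloadStatsSketch();
        index.add("item:fruits", 1, 2.5);
        index.add("item:fruits", 2, 4.0);
        index.add("item:vegetables", 3, 1.0);
        // Roughly SELECT SUM(price) WHERE item=fruits
        System.out.println(index.sum("item:fruits")); // prints 6.5
    }
}
```

The point is only that the value travels with the posting, so nothing per-field has to stay resident; whether decoding payloads per hit would be fast enough in practice is something I have not measured.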
On 1/6/12, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> Although I still question whether this is a *good* use of Solr
>
> It's a great use of Lucene, which can be made into a superior
> horizontally scalable database when compared with open source
> relational database systems.
>
> My only concern, going back to *other* conversation(s) is whether or
> not the field cache used by stats component is operated on per-segment
> or not. If *true* then the stats part of Solr can be checked off as
> NRT / soft commit capable / efficient.
>
> I think the answer is *FALSE* based on these lines in StatsComponent
> which seem to be operating on the top-level reader (eg, NOT
> per-segment).
>
> si = FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(),
> fieldName);
>
> UnInvertedField uif = UnInvertedField.getUnInvertedField(f, searcher);
>
> On Thu, Jan 5, 2012 at 4:54 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> Hmmm, guess you're right, the stats component
>> does return that data. It's been a long day...
>>
>> Although I still question whether this is a *good*
>> use of Solr, I'd still re-examine my approach
>> whenever I found myself trying to translate
>> SQL queries into Solr....
>>
>> But if, after that examination I still required
>> SUM, stats would do it.
>>
>> Erick
>>
>> On Thu, Jan 5, 2012 at 7:23 PM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>>>> Short answer is that no, there isn't an aggregate
>>>> function. And you shouldn't even try
>>>
>>> If that is the case why does a 'stats' component exist for Solr with
>>> the SUM function built in?
>>>
>>> http://wiki.apache.org/solr/StatsComponent
>>>
>>> On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>> You will encounter endless grief until you stop
>>>> thinking of Solr/Lucene as a replacement for
>>>> an RDBMS. It is a *text search engine*.
>>>> Whenever you start asking "how do I implement
>>>> a SQL statement in Solr", you have to stop
>>>> and reconsider *why* you are trying to do that.
>>>> Then recast the question in terms of searching.
>>>>
>>>> Short answer is that no, there isn't an aggregate
>>>> function. And you shouldn't even try.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Thu, Jan 5, 2012 at 12:53 PM, prasenjit mukherjee
>>>> <prasen....@gmail.com> wrote:
>>>>> Thanks Eric for the response.
>>>>>
>>>>> Will lucene/solr provide me aggregations ( of field vaues ) satisying
>>>>> a query criteria ? e.g. SELECT SUM(price) WHERE item=fruits
>>>>>
>>>>> Or I need to use hitCollector to achieve that ?
>>>>>
>>>>> Any sample solr/lucene query to compte aggregates ( like SUM ) will be
>>>>> great.
>>>>>
>>>>> -Thanks,
>>>>> Prasenjit
>>>>>
>>>>> On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson
>>>>> <erickerick...@gmail.com> wrote:
>>>>>> the time interval is just a RangeQuery in the Lucene
>>>>>> world. The rest is pretty standard search stuff.
>>>>>>
>>>>>> You probably want to have a look at the NRT
>>>>>> (near real time) stuff in trunk.
>>>>>>
>>>>>> Your reads/writes are pretty high, so you'll need
>>>>>> some experimentation to size your site
>>>>>> correctly.
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
>>>>>> <prasen....@gmail.com> wrote:
>>>>>>> I have a requirement where reads and writes are quite high ( @ 100-500
>>>>>>> per-sec ). A document has the following fields : timestamp,
>>>>>>> unique-docid, content-text, keyword. Average content-text length is ~
>>>>>>> 20 bytes, there is only 1 keyword for a given docid.
>>>>>>>
>>>>>>> At runtime, given a query-term ( which could be null ) and a
>>>>>>> time-interval, I need to find out top-k frequent keywords which
>>>>>>> contains the query-term ( optional if its null ) in its context-text
>>>>>>> field within that time-interval. I can purge the data every day, hence
>>>>>>> no need for me to have more than a days data.
>>>>>>>
>>>>>>> I have quite a few options here : Starting with MySQL, NoSQLs (
>>>>>>> Cassandra, Mongo, Couch, Riak, Redis ) , Search-Engine based (
>>>>>>> lucene/solr ) each having its own pros/cons.
>>>>>>>
>>>>>>> In MySQL we can achieve this via : GROUP-BY/COUNT clause
>>>>>>> In NoSQL I can probably write a map/reduce task to query these
>>>>>>> numbers. Although I am not very sure about the query response time.
>>>>>>> Not sure of we can achieve it via lucene/solr OOB.
>>>>>>>
>>>>>>> Any suggestions on what would be a good choice for this use case ?
>>>>>>>
>>>>>>> -Thanks,
>>>>>>> prasenjit
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Sent from my mobile device