> Although I still question whether this is a *good* use of Solr

It's a great use of Lucene, which can be made into a superior horizontally scalable database when compared with open source relational database systems.

My only concern, going back to the *other* conversation(s), is whether the field cache used by the stats component operates per-segment. If *true*, then the stats part of Solr can be checked off as NRT / soft commit capable / efficient.

I think the answer is *FALSE*, based on these lines in StatsComponent, which seem to operate on the top-level reader (i.e., NOT per-segment):

    si = FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName);

    UnInvertedField uif = UnInvertedField.getUnInvertedField(f, searcher);

On Thu, Jan 5, 2012 at 4:54 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Hmmm, guess you're right, the stats component
> does return that data. It's been a long day...
>
> Although I still question whether this is a *good*
> use of Solr, I'd still re-examine my approach
> whenever I found myself trying to translate
> SQL queries into Solr....
>
> But if, after that examination I still required
> SUM, stats would do it.
>
> Erick
>
> On Thu, Jan 5, 2012 at 7:23 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>>> Short answer is that no, there isn't an aggregate
>>> function. And you shouldn't even try
>>
>> If that is the case why does a 'stats' component exist for Solr with
>> the SUM function built in?
>>
>> http://wiki.apache.org/solr/StatsComponent
>>
>> On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>> You will encounter endless grief until you stop
>>> thinking of Solr/Lucene as a replacement for
>>> an RDBMS. It is a *text search engine*.
>>> Whenever you start asking "how do I implement
>>> a SQL statement in Solr", you have to stop
>>> and reconsider *why* you are trying to do that.
>>> Then recast the question in terms of searching.
>>>
>>> Short answer is that no, there isn't an aggregate
>>> function. And you shouldn't even try.
>>>
>>> Best
>>> Erick
>>>
>>> On Thu, Jan 5, 2012 at 12:53 PM, prasenjit mukherjee
>>> <prasen....@gmail.com> wrote:
>>>> Thanks Erick for the response.
>>>>
>>>> Will lucene/solr provide me aggregations ( of field values ) satisfying
>>>> a query criteria ? e.g. SELECT SUM(price) WHERE item=fruits
>>>>
>>>> Or do I need to use hitCollector to achieve that ?
>>>>
>>>> Any sample solr/lucene query to compute aggregates ( like SUM ) will be
>>>> great.
>>>>
>>>> -Thanks,
>>>> Prasenjit
>>>>
>>>> On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>> the time interval is just a RangeQuery in the Lucene
>>>>> world. The rest is pretty standard search stuff.
>>>>>
>>>>> You probably want to have a look at the NRT
>>>>> (near real time) stuff in trunk.
>>>>>
>>>>> Your reads/writes are pretty high, so you'll need
>>>>> some experimentation to size your site
>>>>> correctly.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
>>>>> <prasen....@gmail.com> wrote:
>>>>>> I have a requirement where reads and writes are quite high ( @ 100-500
>>>>>> per-sec ). A document has the following fields : timestamp,
>>>>>> unique-docid, content-text, keyword. Average content-text length is ~
>>>>>> 20 bytes, there is only 1 keyword for a given docid.
>>>>>>
>>>>>> At runtime, given a query-term ( which could be null ) and a
>>>>>> time-interval, I need to find the top-k most frequent keywords among
>>>>>> documents that contain the query-term ( optional if it's null ) in their
>>>>>> content-text field within that time-interval. I can purge the data every
>>>>>> day, hence no need for me to have more than a day's data.
>>>>>>
>>>>>> I have quite a few options here : Starting with MySQL, NoSQLs (
>>>>>> Cassandra, Mongo, Couch, Riak, Redis ) , Search-Engine based (
>>>>>> lucene/solr ) each having its own pros/cons.
>>>>>>
>>>>>> In MySQL we can achieve this via : GROUP-BY/COUNT clause
>>>>>> In NoSQL I can probably write a map/reduce task to query these
>>>>>> numbers. Although I am not very sure about the query response time.
>>>>>> Not sure if we can achieve it via lucene/solr OOB.
>>>>>>
>>>>>> Any suggestions on what would be a good choice for this use case ?
>>>>>>
>>>>>> -Thanks,
>>>>>> prasenjit
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
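
To make the per-segment question from the top of this message concrete, here is a rough sketch in plain Java. It uses no Lucene or Solr classes, and every name in it (`PerSegmentSum`, `Segment`, `priceCache`, `cacheLoads`) is hypothetical, so treat it as an illustration of the idea and not as how StatsComponent or FieldCache actually work. It assumes segments are immutable and that a cache is keyed per segment, so that after a soft commit only the newly added segment has to be loaded, while a cache keyed on the top-level reader would have to be rebuilt in full.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java, no Lucene or Solr classes. All names are hypothetical.
public class PerSegmentSum {

    // Stand-in for one immutable index segment: parallel arrays indexed by doc id.
    static final class Segment {
        final String[] items;
        final double[] prices;

        Segment(String[] items, double[] prices) {
            this.items = items;
            this.prices = prices;
        }
    }

    // Stand-in for a field cache keyed per segment instead of per top-level reader.
    private final Map<Segment, double[]> priceCache = new IdentityHashMap<>();

    // Counts how many segments had their values loaded into the cache.
    int cacheLoads = 0;

    // Rough equivalent of SELECT SUM(price) WHERE item = <item>.
    double sumPrice(List<Segment> segments, String item) {
        double total = 0.0;
        for (Segment seg : segments) {
            double[] prices = priceCache.computeIfAbsent(seg, s -> {
                cacheLoads++;             // paid once per segment, then reused
                return s.prices.clone();  // simulates loading the field values
            });
            for (int doc = 0; doc < seg.items.length; doc++) {
                if (seg.items[doc].equals(item)) {
                    total += prices[doc];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        PerSegmentSum stats = new PerSegmentSum();
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(new String[] {"fruits", "dairy"}, new double[] {2.0, 5.0}));
        segments.add(new Segment(new String[] {"fruits"}, new double[] {3.0}));

        System.out.println(stats.sumPrice(segments, "fruits") + " after " + stats.cacheLoads + " loads");

        // Simulated soft commit: one new small segment appears, the old ones are unchanged.
        segments.add(new Segment(new String[] {"fruits"}, new double[] {1.5}));

        System.out.println(stats.sumPrice(segments, "fruits") + " after " + stats.cacheLoads + " loads");
    }
}
```

In this sketch the second query loads only the one new segment. With a single cache entry tied to the top-level reader, every reopen would invalidate that entry and force a full reload, which is the cost the concern above is about.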