Thanks Rob for pointing me to that link. I haven't gone through all the JIRAs, but I gather they cover the advantages and disadvantages of secondary indexes in Cassandra, which I understand by now. What they don't really explain is why the default secondary index implementation didn't take the DSE/Solr approach.
Hi Jack,

That's good to know, but any pointers on how this is any different from https://github.com/Stratio/stratio-cassandra or http://stargate-core.readthedocs.org/en/latest/intro.html ?

--Ram

On Tue, Sep 16, 2014 at 10:32 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> DSE/Solr is tightly integrated, so there is no “external” system to
> manage – insert data in CQL and within a few seconds it is available for
> query from Solr running in the same JVM as Cassandra. DSE/Solr indexes the
> data on each Cassandra node, and uses Cassandra’s cluster management for
> distributing queries across the cluster. And... Lucene (underneath Solr) is
> optimal for queries that span multiple fields. DSE/Solr supports CQL3 wide
> rows (clustering columns.)
>
> -- Jack Krupansky
>
> *From:* Ram N <yrami...@gmail.com>
> *Sent:* Monday, September 15, 2014 4:34 PM
> *To:* user <user@cassandra.apache.org>
> *Subject:* Re: C 2.1
>
> Jack,
>
> Using Solr or an external search/indexing service is an option but
> increases the complexity of managing different systems. I am curious to
> understand the impact of having wide-rows on a separate CF for inverted
> index purpose which if I understand correctly is what Rob's response,
> having a separate CF for index is better than using the default Secondary
> index option.
>
> Would be great to understand the design decision to go with present
> implementation on Secondary Index when the alternative is better? Looking
> at JIRAs is still confusing to come up with the why :)
>
> --R
>
> On Mon, Sep 15, 2014 at 11:17 AM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> If you’re indexing and querying on that many columns (dozens, or more
>> than a handful), consider DSE/Solr, especially if you need to query on
>> multiple columns in the same query.
>>
>> -- Jack Krupansky
>>
>> *From:* Robert Coli <rc...@eventbrite.com>
>> *Sent:* Monday, September 15, 2014 11:07 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: C 2.1
>>
>> On Sat, Sep 13, 2014 at 3:49 PM, Ram N <yrami...@gmail.com> wrote:
>>
>>> Is 2.1 a production ready release?
>>>
>>
>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>
>>
>>> Datastax Java driver - I get too confused with CQL and the
>>> underlying storage model. I am also not clear on the indexing structure of
>>> columns. Does CQL indexes create a separate CF for the index table? How is
>>> it different from maintaining inverted index? Internally both are the same?
>>> Does cql stmt to create index, creates a separate CF and has an atomic way
>>> of updating/managing them? Which one is better to scale? (something like
>>> stargate-core or the ones done by usergrid? or the CQL approach?)
>>>
>>
>> New projects should use CQL. Access to underlying storage via Thrift is
>> likely to eventually be removed from Cassandra.
>>
>>
>>> On a separate note just curious if I have 1000's of columns in a given
>>> row and a fixed set of indexed column (say 30 - 50 columns) which approach
>>> should I be taking? Will cassandra scale with these many indexed column?
>>> Are there any limits? How much of an impact do CQL indexes create on the
>>> system? I am also not sure if these use cases are the right choice for
>>> cassandra but would really appreciate any response on these. Thanks.
>>>
>>
>> Use of the "Secondary Indexes" feature is generally an anti-pattern in
>> Cassandra. 30-50 indexed columns in a row sounds insane to me. However
>> 30-50 column families into which one manually denormalized does not sound
>> too insane to me...
>>
>> =Rob
>> http://twitter.com/rcolidba
>>
>
>
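To make the "manually denormalized column family" idea from Rob's reply a bit more concrete, here is a rough in-memory sketch in Python. It is only an illustration under stated assumptions: plain dicts stand in for column families, and the names `UserStore`, `base_table`, `index_by_city`, and the `city` column are all hypothetical, not part of Cassandra or any driver. The point it tries to show is that the application, not the database, keeps the lookup table in step with the base table on every write.

```python
from collections import defaultdict


class UserStore:
    """Toy model of a base table plus one hand-maintained lookup table.

    Hypothetical stand-ins, not a Cassandra API:
      base_table:    user_id -> row (dict of column name -> value)
      index_by_city: city -> set of user_ids (one wide "row" per city)
    """

    def __init__(self):
        self.base_table = {}
        self.index_by_city = defaultdict(set)

    def upsert(self, user_id, row):
        # Read-before-write: find the old indexed value so the stale
        # index entry can be removed when the value changes.
        old = self.base_table.get(user_id)
        old_city = old.get("city") if old is not None else None
        new_city = row.get("city")
        if old_city is not None and old_city != new_city:
            self.index_by_city[old_city].discard(user_id)

        # Write the base row, then the denormalized index entry.
        self.base_table[user_id] = dict(row)
        if new_city is not None:
            self.index_by_city[new_city].add(user_id)

    def find_by_city(self, city):
        # One lookup in the index "row", then fetch each base row.
        ids = self.index_by_city.get(city, set())
        return [self.base_table[uid] for uid in sorted(ids)]


store = UserStore()
store.upsert(1, {"name": "a", "city": "Oslo"})
store.upsert(2, {"name": "b", "city": "Oslo"})
store.upsert(1, {"name": "a", "city": "Rome"})  # moves user 1 out of "Oslo"
print([r["name"] for r in store.find_by_city("Oslo")])
```

In this toy version the two writes in `upsert` happen in one process, so they cannot drift apart. In a real cluster they are separate writes to separate tables, so the application would have to decide how to keep them consistent (for example by grouping them in a batch), which is the "atomic way of updating/managing them" part of the original question.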