Re: C 2.1

James Briggs Mon, 15 Sep 2014 17:03:36 -0700

Ram,

Secondary indexes are not recommended because a query on one can't
use the partition key, so the values have to be fetched from
all nodes. That means higher latency, and likely timeouts.
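
A rough way to picture the difference, as a toy Python model rather than
real Cassandra or driver code: the 4-node "cluster", the crc32 stand-in for
the partitioner, and the user_id/city columns are all assumptions made up
for illustration. A read by partition key touches one node, while a read on
an indexed column has to ask every node.

```python
import zlib

NUM_NODES = 4
nodes = [{} for _ in range(NUM_NODES)]  # each dict: partition key -> row


def owner(partition_key):
    """Stand-in for the partitioner: hash the key to pick one node."""
    return zlib.crc32(partition_key.encode()) % NUM_NODES


def insert(user_id, city):
    nodes[owner(user_id)][user_id] = {"user_id": user_id, "city": city}


def get_by_key(user_id):
    """Partition-key read: exactly one node is contacted."""
    return nodes[owner(user_id)].get(user_id), 1


def get_by_indexed_column(city):
    """Secondary-index style read: every node must check its local rows."""
    hits, contacted = [], 0
    for node in nodes:
        contacted += 1
        hits.extend(row for row in node.values() if row["city"] == city)
    return hits, contacted


for i in range(20):
    insert(f"user{i}", "San Jose" if i % 5 == 0 else "Austin")

row, n_key = get_by_key("user5")
rows, n_index = get_by_indexed_column("San Jose")
print(n_key, n_index, len(rows))
```

In this model the number of nodes contacted by the indexed read grows with
the cluster size, which is where the extra latency comes from.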

The C* solutions are:

a) use a denormalized ("materialized") table

b) use a clustered index if all the data related to the row key is
in the same partition (read my blog link from this thread for more)
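
Both options can be sketched with plain Python containers. This is only a
toy model, not CQL or driver code, and the table and column names
(users_by_city, events_by_user, ts) are hypothetical. For (a) the
application writes a second table keyed by the column it wants to query;
for (b) rows inside one partition are kept sorted by a clustering column, so
a range read is one contiguous slice of a single partition.

```python
import bisect
from collections import defaultdict

# (a) Denormalized table: a second "table" keyed by the queried column.
users_by_id = {}
users_by_city = defaultdict(list)  # partition key: city


def add_user(user_id, city):
    # The application writes both tables itself; that is the cost.
    users_by_id[user_id] = {"user_id": user_id, "city": city}
    users_by_city[city].append(user_id)


# (b) Clustering: rows of one partition stay sorted by (ts, action).
events_by_user = defaultdict(list)  # partition key: user_id


def add_event(user_id, ts, action):
    bisect.insort(events_by_user[user_id], (ts, action))


def events_between(user_id, start_ts, end_ts):
    """Return events with start_ts <= ts < end_ts from one partition."""
    rows = events_by_user[user_id]
    lo = bisect.bisect_left(rows, (start_ts,))
    hi = bisect.bisect_left(rows, (end_ts,))
    return rows[lo:hi]


add_user("u1", "San Jose")
add_user("u2", "Austin")
add_user("u3", "San Jose")

add_event("u1", 30, "logout")
add_event("u1", 10, "login")
add_event("u1", 20, "view")

print(users_by_city["San Jose"])
print(events_between("u1", 10, 30))
```

Either way the read is keyed by a partition key, so it does not need to
fan out to all nodes.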

That's the price of using distributed systems.

Oh, and then there's the need to rewrite the data access layer
of your entire existing app. :)

AOL and Blizzard talked about porting a couple apps to Cassandra
at the conference last week, but they sounded like trivial user-db
("UDB") apps, and even then Patrick was usually credited with the
data modelling.

I haven't heard of anybody porting a 100+ table Oracle or MySQL
app to C* yet. I'm sure it's been done, but most of the
apps written for C* are greenfield or v2.0 rewrites.

Thanks, James Briggs
--
Cassandra/MySQL DBA. Available in San Jose area or remote.

________________________________
From: Ram N <yrami...@gmail.com>
To: user <user@cassandra.apache.org> 
Sent: Monday, September 15, 2014 1:34 PM
Subject: Re: C 2.1

Jack, 

Using Solr or an external search/indexing service is an option, but it
increases the complexity of managing different systems. I am curious about the
impact of having wide rows in a separate CF used as an inverted index, which,
if I understand correctly, is what Rob's response suggests: having a separate
CF for the index is better than using the default secondary index option.

It would be great to understand the design decision to go with the present
secondary index implementation when the alternative is better. Looking at
the JIRAs, it is still hard to work out the why :)

--R 

On Mon, Sep 15, 2014 at 11:17 AM, Jack Krupansky <j...@basetechnology.com> 
wrote:

If you’re indexing and querying on that many columns (dozens, or more than 
a handful), consider DSE/Solr, especially if you need to query on multiple 
columns in the same query.
> 
>-- Jack Krupansky
> 
>From: Robert Coli 
>Sent: Monday, September 15, 2014 11:07 AM
>To: user@cassandra.apache.org 
>Subject: Re: C 2.1
> 
>On Sat, Sep 13, 2014 at 3:49 PM, Ram N <yrami...@gmail.com> wrote:
>
>Is 2.1 a production ready release? 
> 
>https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
> 
>DataStax Java driver - I get too confused with CQL and the underlying
>storage model. I am also not clear on the indexing structure of columns.
>Do CQL indexes create a separate CF for the index table? How is that
>different from maintaining an inverted index? Are both the same internally?
>Does the CQL statement to create an index create a separate CF and have an
>atomic way of updating/managing it? Which one scales better? (something like
>stargate-core or the ones done by usergrid? or the CQL approach?)
> 
>New projects should use CQL. Access to underlying storage via Thrift is 
likely to eventually be removed from Cassandra.
> 
>On a separate note, just curious: if I have 1000's of columns in a given row
>and a fixed set of indexed columns (say 30 - 50 columns), which approach
>should I be taking? Will Cassandra scale with this many indexed columns? Are
>there any limits? How much of an impact do CQL indexes have on the system?
>I am also not sure if these use cases are the right choice for Cassandra but
>would really appreciate any response on these. Thanks.
> 
>Use of the "Secondary Indexes" feature is generally an anti-pattern in
Cassandra. 30-50 indexed columns in a row sounds insane to me. However, 30-50
column families into which one manually denormalizes does not sound too insane
to me...
> 
>=Rob
>http://twitter.com/rcolidba
