At this point I have no additional advice to offer. There seems to be intense resistance to following the modeling approach I have recommended, so there is nothing more I can offer on that front. The bottom line is that if the techniques referenced in the blog post are not sufficient, then nothing short of a clean re-model will get you to the performance that CQL is fully capable of delivering. Again, a relatively mechanical migration from Thrift to CQL will only get you so far and cannot be used as a full substitute for a clean re-model. If you feel that you cannot re-model, then you will simply have to accept any performance limitations. Those performance limitations are not limitations of CQL or non-COMPACT STORAGE but the limitations of doing a simple mechanical migration as proposed in that blog post.
That said, maybe somebody else might propose a more sophisticated migration model.

-- Jack Krupansky

On Mon, Apr 11, 2016 at 6:15 PM, Jim Ancona <j...@anconafamily.com> wrote:

> On Mon, Apr 11, 2016 at 4:19 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> Some of this may depend on exactly how you are using so-called COMPACT STORAGE. I mean, if your tables really are modeled as all but exactly one column in the primary key, then okay, COMPACT STORAGE may be a reasonable model, but that seems to be a very special, narrow use case, so for all other cases you really do need to re-model for CQL for Cassandra 4.0.
>
> There was no such restriction when modeling with Thrift. It's an artifact of how CQL chose to expose the Thrift data model.
>
>> I'm not sure why anybody is thinking otherwise. Sure, maybe it will be a lot of work, but that's life and people have been given plenty of notice.
>
> "That's life" minimizes the difficulty of doing this sort of migration for large, mission-critical systems. It would require large amounts of time, as well as temporarily doubling hardware resources amounting to dozens up to hundreds of nodes.
>
>> And if it takes hours to do a data migration, I think that you can consider yourself lucky relative to people who may require days.
>
> Or more.
>
>> Now, if there are particular Thrift use cases that don't have efficient models in CQL, that can be discussed. Start by expressing the Thrift data in a neutral, natural, logical, plain English data model, and then we can see how that maps to CQL.
>>
>> So, where are we? Is it just the complaint that migration is slow and re-modeling is difficult, or are there specific questions about how to do the re-modeling?
>
> My purpose is not to complain, but to educate :-). Telling someone "just remodel your data" is not helpful, especially after he's told you that he tried that and ran into performance issues.
> (Note that the link he posted shows an order of magnitude decrease in throughput when moving from COMPACT STORAGE to CQL3 native tables for analytics workloads, so it's not just his use case.) Do you have any suggestions of ways he might mitigate those issues? Is there information you need to make such a recommendation?
>
> Jim
>
>> -- Jack Krupansky
>>
>> On Mon, Apr 11, 2016 at 1:30 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>
>>> Thanks Jim. I think you understand the pain of migrating TBs of data to new tables. There is no command to change from compact to non-compact storage, and the fastest solution, migrating data using Spark, is too slow for production systems.
>>>
>>> And the pain gets bigger when your performance dips after moving to a non-compact storage table. That's because non-compact storage is quite an inefficient storage format until 3.x, and it incurs a heavy penalty on row scan performance in analytics workloads.
>>>
>>> Please go through the link to understand how the old compact storage gives much better performance than non-compact storage as far as row scans are concerned:
>>> https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
>>>
>>> The flexibility of CQL comes at a heavy cost until 3.x.
>>>
>>> Thanks
>>> Anuj
>>>
>>> On Mon, 11 Apr, 2016 at 10:35 PM, Jim Ancona <j...@anconafamily.com> wrote:
>>> Jack, the Datastax link he posted (http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column families with mixed dynamic and static columns: "The only solution to be able to access the column family fully is to remove the declared columns from the thrift schema altogether..." I think that page describes the problem and the potential solutions well.
>>> I haven't seen an answer to Anuj's question about why the native CQL solution using collections doesn't perform as well.
>>>
>>> Keep in mind that some of us understand CQL just fine but have working pre-CQL Thrift-based systems storing hundreds of terabytes of data and with requirements that mean that saying "bite the bullet and re-model your data" is not really helpful. Another quote from that Datastax link: "Thrift isn't going anywhere." Granted that that link is three-plus years old, but Thrift *is* now going away, so it's not unexpected that people will be trying to figure out how to deal with that. It's bad enough that we need to rewrite our clients to use CQL instead of Thrift. It's not helpful to say that we should also re-model and migrate all our data.
>>>
>>> Jim
>>>
>>> On Mon, Apr 11, 2016 at 11:29 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>
>>>> Sorry, but your message is too confusing - you say "reading dynamic columns in CQL" and "make the table schema less", but neither has any relevance to CQL! 1. CQL tables always have schemas. 2. All columns in CQL are statically declared (even maps/collections are statically declared columns). Granted, it is a challenge for Thrift users to get used to the terminology of CQL, but it is required. If necessary, review some of the free online training videos for data modeling.
>>>>
>>>> Unless your data model is very simple and does directly translate into CQL, you probably do need to bite the bullet and re-model your data to exploit the features of CQL rather than fight CQL trying to mimic Thrift per se.
>>>>
>>>> In any case, take another shot at framing the problem and then maybe people here can help you out.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Apr 11, 2016 at 10:39 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>>
>>>>> Any comments or suggestions on this one?
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> On Sun, 10 Apr, 2016 at 11:39 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>>> Hi
>>>>>
>>>>> We are on 2.0.14 and Thrift. We are planning to migrate to CQL soon but are facing some challenges.
>>>>>
>>>>> We have a cf with a mix of statically defined columns and dynamic columns (created at run time). For reading dynamic columns in CQL, we have two options:
>>>>>
>>>>> 1. Drop all columns and make the table schema less. This way, we will get a CQL row for each column defined for a row key, as mentioned here: http://www.datastax.com/dev/blog/thrift-to-cql3
>>>>>
>>>>> 2. Migrate the entire data to a new non-compact storage table and create collections for dynamic columns in the new table.
>>>>>
>>>>> In our case, we have observed that approach 2 causes 3 times slower performance in range scan queries used by Spark. This is not acceptable. Cassandra 3 has an optimized storage engine, but we are not comfortable moving to 3.x in production.
>>>>>
>>>>> Moreover, data migration to the new table using Spark takes hours.
>>>>>
>>>>> Any suggestions for the two issues?
>>>>>
>>>>> Thanks
>>>>> Anuj