Re: Migrating to CQL and Non Compact Storage

Jack Krupansky Mon, 11 Apr 2016 21:35:39 -0700

At this point I have no additional advice to offer. There seems to be
intense resistance to following the modeling approach I have recommended, so
there is nothing more I can offer on that front. The bottom line is that if
the techniques referenced in the blog post are not sufficient, then nothing
short of a clean re-model will get you to the performance that CQL is fully
capable of delivering. Again, a relatively mechanical migration from Thrift
to CQL will only get you so far and cannot be used as a full substitute for
a clean re-model. If you feel that you cannot remodel, then you will simply
have to accept any performance limitations. Those performance limitations
are not limitations of CQL or non-COMPACT STORAGE but the limitations of
doing a simple mechanical migration as proposed in that blog post.


That said, maybe somebody else can propose a more sophisticated
migration model.

-- Jack Krupansky

On Mon, Apr 11, 2016 at 6:15 PM, Jim Ancona <j...@anconafamily.com> wrote:

>
> On Mon, Apr 11, 2016 at 4:19 PM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> Some of this may depend on exactly how you are using so-called COMPACT
>> STORAGE. I mean, if your tables really are modeled as all but exactly one
>> column in the primary key, then okay, COMPACT STORAGE may be a reasonable
>> model, but that seems to be a very special, narrow use case, so for all
>> other cases you really do need to re-model for CQL for Cassandra 4.0.
>>
> There was no such restriction when modeling with Thrift. It's an artifact
> of how CQL chose to expose the Thrift data model.
>
>> I'm not sure why anybody is thinking otherwise. Sure, maybe it will be a lot
>> of work, but that's life and people have been given plenty of notice.
>>
> "That's life" minimizes the difficulty of doing this sort of migration for
> large, mission-critical systems. It would require large amounts of time, as
> well as temporarily doubling hardware resources amounting to dozens up to
> hundreds of nodes.
>
>> And if it takes hours to do a data migration, I think that you can
>> consider yourself lucky relative to people who may require days.
>>
> Or more.
>
>> Now, if there are particular Thrift use cases that don't have efficient
>> models in CQL, that can be discussed. Start by expressing the Thrift data
>> in a neutral, natural, logical, plain English data model, and then we can
>> see how that maps to CQL.
>>
>> So, where are we? Is it just the complaint that migration is slow and
>> re-modeling is difficult, or are there specific questions about how to do
>> the re-modeling?
>>
> My purpose is not to complain, but to educate :-). Telling someone "just
> remodel your data" is not helpful, especially after he's told you that he
> tried that and ran into performance issues. (Note that the link he posted
> shows an order of magnitude decrease in throughput when moving from COMPACT
> STORAGE to CQL3 native tables for analytics workloads, so it's not just his
> use case.) Do you have any suggestions of ways he might mitigate those
> issues? Is there information you need to make such a recommendation?
>
> Jim
>
>
>>
>>
>> -- Jack Krupansky
>>
>> On Mon, Apr 11, 2016 at 1:30 PM, Anuj Wadehra <anujw_2...@yahoo.co.in>
>> wrote:
>>
>>> Thanks Jim. I think you understand the pain of migrating TBs of data to
>>> new tables. There is no command to change from compact to non-compact
>>> storage, and the fastest way to migrate the data, using Spark, is too slow
>>> for production systems.
>>>
>>> And the pain gets bigger when your performance dips after moving to a
>>> non-compact storage table. That's because non-compact storage is quite an
>>> inefficient storage format until 3.x, and it incurs a heavy penalty on row
>>> scan performance in analytics workloads.
>>> Please go through the link to understand how the old compact storage gives
>>> much better performance than non-compact storage as far as row scans are
>>> concerned:
>>> https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
>>>
>>> The flexibility of CQL comes at a heavy cost until 3.x.
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> On Mon, 11 Apr, 2016 at 10:35 PM, Jim Ancona
>>> <j...@anconafamily.com> wrote:
>>> Jack, the Datastax link he posted (
>>> http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column
>>> families with mixed dynamic and static columns: "The only solution to be
>>> able to access the column family fully is to remove the declared columns
>>> from the thrift schema altogether..." I think that page describes the
>>> problem and the potential solutions well. I haven't seen an answer to
>>> Anuj's question about why the native CQL solution using collections doesn't
>>> perform as well.
>>>
>>> Keep in mind that some of us understand CQL just fine but have working
>>> pre-CQL Thrift-based systems storing hundreds of terabytes of data and with
>>> requirements that mean that saying "bite the bullet and re-model your
>>> data" is not really helpful. Another quote from that Datastax link:
>>> "Thrift isn't going anywhere." Granted, that link is three-plus years
>>> old, but Thrift *is* now going away, so it's not unexpected that people
>>> will be trying to figure out how to deal with that. It's bad enough that we
>>> need to rewrite our clients to use CQL instead of Thrift. It's not helpful
>>> to say that we should also re-model and migrate all our data.
>>>
>>> Jim
>>>
>>> On Mon, Apr 11, 2016 at 11:29 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
>>>> Sorry, but your message is too confusing - you say "reading dynamic
>>>> columns in CQL" and "make the table schema less", but neither has any
>>>> relevance to CQL! 1. CQL tables always have schemas. 2. All columns in CQL
>>>> are statically declared (even maps/collections are statically declared
>>>> columns.) Granted, it is a challenge for Thrift users to get used to the
>>>> terminology of CQL, but it is required. If necessary, review some of the
>>>> free online training videos for data modeling.
>>>>
>>>> Unless your data model is very simple and translates directly into
>>>> CQL, you probably do need to bite the bullet and re-model your data to
>>>> exploit the features of CQL rather than fight CQL trying to mimic Thrift
>>>> per se.
>>>>
>>>> In any case, take another shot at framing the problem and then maybe
>>>> people here can help you out.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Apr 11, 2016 at 10:39 AM, Anuj Wadehra <anujw_2...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Any comments or suggestions on this one?
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> On Sun, 10 Apr, 2016 at 11:39 PM, Anuj Wadehra
>>>>> <anujw_2...@yahoo.co.in> wrote:
>>>>> Hi
>>>>>
>>>>> We are on 2.0.14 and Thrift. We are planning to migrate to CQL soon
>>>>> but are facing some challenges.
>>>>>
>>>>> We have a CF with a mix of statically defined columns and dynamic
>>>>> columns (created at run time). For reading dynamic columns in CQL,
>>>>> we have two options:
>>>>>
>>>>> 1. Drop all columns and make the table schema less. This way, we will
>>>>> get a CQL row for each column defined for a row key, as described here:
>>>>> http://www.datastax.com/dev/blog/thrift-to-cql3
>>>>>
>>>>> 2. Migrate the entire data set to a new non-compact storage table and
>>>>> create collections for the dynamic columns in the new table.
>>>>>
>>>>> In our case, we have observed that approach 2 makes the range scan
>>>>> queries used by Spark 3 times slower. This is not acceptable.
>>>>> Cassandra 3 has an optimized storage engine, but we are not comfortable
>>>>> moving to 3.x in production.
>>>>>
>>>>> Moreover, migrating the data to the new table using Spark takes hours.
>>>>>
>>>>> Any suggestions for the two issues?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
