No problem.  Is there a JIRA ticket already for this?

On Mon, Aug 24, 2015 at 6:06 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Can you post your findings to JIRA as well?  Would be good to see some
> real numbers from production.
>
> The refactor of the storage engine (8099) may completely change this, but
> it's good to have it on the radar.
>
>
> On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Agreed.  We’re going to run a benchmark.  Just realized we grew to 144
>> columns.  Fun.  Kind of disappointing that Cassandra is so slow in this
>> regard.  Kind of defeats the whole point of flexible schema if actually
>> using that feature is slow as hell.
>>
>> On Sun, Aug 23, 2015 at 4:54 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> The key is to benchmark it with your real data. Modern cassandra-stress
>>> lets you get very close to your actual read/write behavior, and the real
>>> differentiator will depend on your use case (how often do you write the
>>> whole row vs updating just one column/field). My gist shows a ton of
>>> different examples, but they’re not scientific, and at this point they’re
>>> old versions (and performance varies version to version).
>>>
>>> - Jeff
>>>
>>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, August 23, 2015 at 2:58 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: Practical limitations of too many columns/cells ?
>>>
>>> Ah.. yes.  Great benchmarks. If I’m interpreting them correctly it was
>>> ~15x slower for 22 columns vs 2 columns?
>>>
>>> Guess we have to refactor again :-P
>>>
>>> Not the end of the world of course.
>>>
>>> On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> A few months back, a user in #cassandra on freenode mentioned that when
>>>> they transitioned from thrift to cql, their overall performance decreased
>>>> significantly. They had 66 columns per table, so I ran some benchmarks with
>>>> various versions of Cassandra and thrift/cql combinations.
>>>>
>>>> It shouldn’t really surprise you that more columns = more work = slower
>>>> operations. It’s not necessarily the size of the writes, but the amount of
>>>> work that needs to be done with the extra cells (2 large columns totaling
>>>> 2k perform better than 66 small columns totaling 0.66k, even though that’s
>>>> three times as much raw data being written to disk).
>>>>
>>>> https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c
>>>>
>>>> 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
>>>> bytes per):
>>>>   cassandra-stress --operation INSERT --num-keys 1000000 --columns 66
>>>>   --column-size=10 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 10720
>>>>
>>>> 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
>>>> bytes per):
>>>>   cassandra-stress --operation INSERT --num-keys 1000000 --columns 20
>>>>   --column-size=10 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 28667
>>>>
>>>> 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
>>>>   cassandra-stress --operation INSERT --num-keys 1000000 --columns 2
>>>>   --column-size=1024 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 23489
>>>>
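[Editor's sketch, not part of the original message.] To make the "amount of work per cell" point concrete, the three runs quoted above can be turned into cells written per second and raw payload per second. This is only rough arithmetic on the posted figures: the names `runs` and `summarize` are illustrative, and the payload is assumed to be columns times column size, ignoring row keys, replication, and per-cell overhead.

```python
# Figures copied from the three cassandra-stress runs quoted above
# (Cassandra 2.0.13, thrift). Each entry: (columns, bytes per column, ops/sec).
runs = {
    "66 x 10 B": (66, 10, 10720),
    "20 x 10 B": (20, 10, 28667),
    "2 x 1024 B": (2, 1024, 23489),
}


def summarize(columns, column_size, op_rate):
    """Derive per-second cell and payload rates from one benchmark run.

    Assumption: payload per row is simply columns * column_size; keys,
    replication and storage overhead are not modeled.
    """
    row_bytes = columns * column_size
    return {
        "row_bytes": row_bytes,
        "cells_per_sec": columns * op_rate,
        "payload_bytes_per_sec": row_bytes * op_rate,
    }


summary = {name: summarize(*run) for name, run in runs.items()}

for name, s in summary.items():
    print(
        f"{name:>10}: {s['row_bytes']:>5} B/row, "
        f"{s['cells_per_sec']:>8} cells/s, "
        f"{s['payload_bytes_per_sec'] / 1e6:.2f} MB/s"
    )
```

On these numbers the 66-column and 20-column runs write a broadly similar number of cells per second, while the 2-column run moves far more raw payload per second with far fewer cells, which is consistent with per-cell work rather than byte volume being the limiting factor. As noted in the thread, these are old, unscientific figures.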
>>>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Sunday, August 23, 2015 at 1:02 PM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: Practical limitations of too many columns/cells ?
>>>>
>>>> Is there any advantage to using say 40 columns per row vs using 2
>>>> columns (one for the pk and the other for data) and then shoving the data
>>>> into a BLOB as a JSON object?
>>>>
>>>> To date, we’ve been just adding new columns.  I profiled Cassandra and
>>>> about 50% of the CPU time is spent doing compactions.  Since Cassandra
>>>> is CPU-bottlenecked, maybe this is a way I can optimize it.
>>>>
>>>> Any thoughts?
>>>>
>>>> --
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>
>>>>
>>>
>>>
>>
>>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
