No problem. Is there a JIRA ticket already for this?

On Mon, Aug 24, 2015 at 6:06 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> Can you post your findings to JIRA as well? Would be good to see some
> real numbers from production.
>
> The refactor of the storage engine (8099) may completely change this, but
> it's good to have it on the radar.
>
>
> On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Agreed. We’re going to run a benchmark. Just realized we grew to 144
>> columns. Fun. Kind of disappointing that Cassandra is so slow in this
>> regard. Kind of defeats the whole point of a flexible schema if actually
>> using that feature is slow as hell.
>>
>> On Sun, Aug 23, 2015 at 4:54 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> The key is to benchmark it with your real data. Modern cassandra-stress
>>> lets you get very close to your actual read/write behavior, and the real
>>> differentiator will depend on your use case (how often do you write the
>>> whole row vs updating just one column/field). My gist shows a ton of
>>> different examples, but they’re not scientific, and at this point they’re
>>> old versions (and performance varies version to version).
>>>
>>> - Jeff
>>>
>>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, August 23, 2015 at 2:58 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: Practical limitations of too many columns/cells ?
>>>
>>> Ah.. yes. Great benchmarks. If I’m interpreting them correctly it was
>>> ~15x slower for 22 columns vs 2 columns?
>>>
>>> Guess we have to refactor again :-P
>>>
>>> Not the end of the world of course.
>>>
>>> On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> A few months back, a user in #cassandra on freenode mentioned that when
>>>> they transitioned from thrift to cql, their overall performance decreased
>>>> significantly. They had 66 columns per table, so I ran some benchmarks with
>>>> various versions of Cassandra and thrift/cql combinations.
>>>>
>>>> It shouldn’t really surprise you that more columns = more work = slower
>>>> operations. It’s not necessarily the size of the writes, but the amount of
>>>> work that needs to be done with the extra cells (2 large columns totaling
>>>> 2k perform better than 66 small columns totaling 0.66k even though it’s
>>>> three times as much raw data being written to disk).
>>>>
>>>> https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c
>>>>
>>>> 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
>>>> bytes per):
>>>> cassandra-stress --operation INSERT --num-keys 1000000 --columns 66
>>>> --column-size=10 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 10720
>>>>
>>>> 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
>>>> bytes per):
>>>> cassandra-stress --operation INSERT --num-keys 1000000 --columns 20
>>>> --column-size=10 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 28667
>>>>
>>>> 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
>>>> cassandra-stress --operation INSERT --num-keys 1000000 --columns 2
>>>> --column-size=1024 --replication-factor 2 --nodesfile=nodes
>>>> Averages from the middle 80% of values: interval_op_rate : 23489
>>>>
>>>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Sunday, August 23, 2015 at 1:02 PM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: Practical limitations of too many columns/cells ?
>>>>
>>>> Is there any advantage to using say 40 columns per row vs using 2
>>>> columns (one for the pk and the other for data) and then shoving the data
>>>> into a BLOB as a JSON object?
>>>>
>>>> To date, we’ve been just adding new columns. I profiled Cassandra and
>>>> about 50% of the CPU time is spent doing compactions.
>>>> Seeing that CS is being CPU bottlenecked, maybe this is a way I can
>>>> optimize it.
>>>>
>>>> Any thoughts?
>>>>
>>>> --
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
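
For reference, the three cassandra-stress results quoted in this thread can be put on a common footing with some quick arithmetic. The sketch below is not from the thread. It assumes that interval_op_rate is row inserts per second and that each insert writes columns x column-size bytes of payload; the names RUNS and summarize are made up for illustration.

```python
# Figures copied from the cassandra-stress runs quoted in this thread
# (Cassandra 2.0.13, thrift, INSERT workload).
# Assumption: interval_op_rate is row inserts per second.
RUNS = [
    # (label, columns per row, bytes per column, reported interval_op_rate)
    ("66 cols x 10 B", 66, 10, 10720),
    ("20 cols x 10 B", 20, 10, 28667),
    ("2 cols x 1024 B", 2, 1024, 23489),
]


def summarize(runs):
    """Derive cells/s and payload bytes/s from each run's reported op rate."""
    rows = []
    for label, columns, column_size, ops_per_sec in runs:
        rows.append({
            "label": label,
            "ops_per_sec": ops_per_sec,
            # One cell per column per inserted row.
            "cells_per_sec": columns * ops_per_sec,
            # Payload only; ignores keys, timestamps and other overhead.
            "bytes_per_sec": columns * column_size * ops_per_sec,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print("{label:>16}: {ops_per_sec:>6} ops/s, "
              "{cells_per_sec:>7} cells/s, "
              "{bytes_per_sec:>9} payload bytes/s".format(**row))
```

On these figures the 66-column run writes the most cells per second and the 2-column run the most payload bytes per second, which is consistent with Jeff's point that the per-cell work, not the raw size of the write, is what costs. As he says, the numbers are not scientific and come from an old version, so treat this as a way to read them, not as a prediction for your own data.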