Hey guys,

OK, I've experimented with this a bit, and at this point I think the problem with Cassandra crashing on the smaller instances is probably an issue with my data. What I've done is blow away my data directory to start fresh, and then start Cassandra back up on the 2GB instance.
In addition to that, I've fired up another Cassandra instance on a t2.micro at Amazon Web Services — again without my data — just to see whether it would run with less memory. Cassandra has now been running without interruption on both the 2GB RAM instance at Digital Ocean and the new t2.micro at AWS, well past the 5-hour window after which the 2GB instance used to crash. And on the 4GB instance at Digital Ocean, it just runs with my test data loaded and there's no issue.

All told I have 15MB of data in my keyspace for my test data. So I'm hoping that if I show you my schema, there might be some way of understanding why the smaller instances refuse to run for any great length of time once I introduce my data:

CREATE KEYSPACE IF NOT EXISTS joke_fire1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}
  AND durable_writes = true;

USE joke_fire1;

CREATE TABLE IF NOT EXISTS joke_fire1.jokes (
    joke_id        int PRIMARY KEY,
    fire_it        int,
    joke_title     text,
    joke_type      int,
    long_descr     text,
    long_descrlink text,
    long_descrtype int,
    posted_on      timestamp,
    short_descr    text,
    status         int,
    user_id        int,
    view_by        int
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';

CREATE TABLE IF NOT EXISTS joke_fire1.joke_details (
    cmt_id      int PRIMARY KEY,
    cmts        text,
    fireit_tag  boolean,
    joke_id     int,
    posted_by   int,
    posted_on   timestamp,
    rated_value int
)
WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX IF NOT EXISTS joke_details_joke_id
  ON joke_fire1.joke_details (joke_id);

CREATE TABLE IF NOT EXISTS joke_fire1.users (
    user_id       int,
    user_name     text PRIMARY KEY,
    email         text,
    first_name    text,
    last_name     text,
    password      varchar,
    city          text,
    state         text,
    b_date        timestamp,
    gender        text,
    about_u       text,
    profile_image varchar,
    status        text,
    added_date    timestamp,
    fb_uid        varchar,
    twt_uid       varchar,
    open_uid      varchar,
    lastactivity  timestamp
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';

Also, because this is a test environment, the data isn't seeing any kind of heavy use. I mention that only to rule out any factors that reads/writes may introduce to the equation.

Thanks in advance for any insights you may have to offer!
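One thing I haven't tried yet, in case the crashes turn out to be memory-related rather than schema-related: capping the JVM heap explicitly in conf/cassandra-env.sh instead of letting Cassandra auto-size it from system memory. The variables are the standard ones from cassandra-env.sh, but the specific values below are just my guesses for a 2GB box, not settings I've verified:

```shell
# conf/cassandra-env.sh -- override the automatic heap sizing.
# Uncommenting and setting both is required; Cassandra refuses to
# start if only one of the pair is set.
# (Values below are a sketch for a 2GB instance, not a recommendation.)
MAX_HEAP_SIZE="512M"
HEAP_NEWSIZE="128M"
```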
Tim

On Thu, Feb 19, 2015 at 2:25 PM, Kai Wang <dep...@gmail.com> wrote:

> One welcome change is that http://cassandra.apache.org/ actually starts
> displaying:
>
> "Latest release *2.1.3* (Changes
> <http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3>),
> Stable release *2.0.12* (Changes
> <http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.0.12>)"
>
> It's better than before, when "Latest release" was the only link available
> on the home page, and naturally that's what most people downloaded.
>
> On Thu, Feb 19, 2015 at 1:57 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Wed, Feb 18, 2015 at 5:26 PM, Andrew <redmu...@gmail.com> wrote:
>>
>>> Let me know if I'm off base about this — but I feel like I see a lot of
>>> posts that are like this (i.e., use this arbitrary version, not this
>>> other arbitrary version). Why are releases going out if they're
>>> "broken"? This seems like a very confusing way for new (and existing)
>>> users to approach versions...
>>
>> In my opinion, and in no way speaking for or representing Apache
>> Cassandra, Datastax, or anyone else:
>>
>> I think it's a problem of messaging, and a mismatch of expectations
>> between the development team and operators.
>>
>> I think the "stable" versions are stable by the dev team's standards,
>> and not by operators' standards. While testing has historically been,
>> IMO, insufficient for a data store (where correctness really matters),
>> there are also various issues which probably cannot realistically be
>> detected in testing. Of course, operators need to be willing to operate
>> (ideally in non-production) near the cutting edge in order to assist in
>> the detection and resolution of these bugs, but I think the project does
>> itself a disservice by encouraging noobs to run these versions. You only
>> get one chance to make a first impression, as the saying goes.
>> My ideal messaging would probably say something like: "versions near
>> the cutting edge should be treated cautiously; conservative operators
>> should run mature point releases in production and only upgrade to near
>> the cutting edge after extended burn-in in dev/QA/stage environments."
>>
>> A fair response to this critique is that operators should know better
>> than to trust that x.y.0-5 release versions of any open source software
>> are likely to be production-ready, even if the website says "stable"
>> next to the download. Trust, but verify?
>>
>> =Rob

--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B