Determining the issues of marking node down

2011-04-30 Thread Rauan Maemirov
I have a test cluster with 3 nodes, and I recently installed OpsCenter to
watch it. Every day the same node goes down (at a different time, but every
day). I then just run `service cassandra start` to fix the problem.
system.log doesn't show me anything strange. What steps should I take to
diagnose this? I didn't change the logging properties (and cassandra.yaml is
close to the defaults), so maybe some options need to be switched to debug?

Btw, the node that goes down is the most heavily loaded (in storage
capacity). Maybe the problem is with the OPP?
I once ran the loadbalance command and it changed the token for the first
node from 0 to one of the keys (without touching the other 2; I had
generated the tokens with tokens.py).


RE: best way to backup

2011-04-30 Thread Jeremiah Jordan
The files inside the keyspace folders are the SSTables.
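
For a concrete picture, the layout on a default package install looks roughly
like the sketch below (paths and file names are illustrative, and the exact
naming varies between Cassandra versions):

    /var/lib/cassandra/data/MyKeyspace/
        Users-1-Data.db      # the SSTable data itself
        Users-1-Index.db     # index of row keys into the data file
        Users-1-Filter.db    # bloom filter over the row keys

Each Data/Index/Filter set is one SSTable, and SSTables are immutable once
written, which is why incremental backups are a win: a file that already made
it into the backup never needs to be copied again.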




Re: best way to backup

2011-04-30 Thread William Oberman
Thanks, I think I'm getting some of the file layout/data structures now, so
that helps with the backup strategy.  I might still start simple, as it's
usually harder to screw up simple, but at least I'll know where I can go
with something more clever.

will
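
As a rough sketch of the hardlink-snapshot scheme Daniel describes in the
quoted thread below (paths, naming, and scheduling here are assumptions, not
his actual setup):

    # Hardlink-copy the newest snapshot, then rsync the live data over it.
    # Unchanged SSTables (they are immutable) stay as hardlinks and use no
    # extra space; rsync writes changed files as new inodes.
    SNAPDIR=/backup/snapshots
    PREV=$(ls -d "$SNAPDIR"/20* 2>/dev/null | tail -n 1)   # newest snapshot, if any
    NEW="$SNAPDIR/$(date +%Y%m%d%H)"
    [ -n "$PREV" ] && cp -al "$PREV" "$NEW"
    rsync -a --delete /var/lib/cassandra/data/ "$NEW/"

Archiving from the snapshot directory to S3 (or anywhere external) can then
run without touching the main data volume.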

On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

>  The files inside the keyspace folders are the SSTables.
>
>  --
> *From:* aaron morton [mailto:aa...@thelastpickle.com]
> *Sent:* Friday, April 29, 2011 4:49 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: best way to backup
>
> William,
> Some info on the sstables from me:
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>
> If you want to know more, check out the BigTable and original Facebook
> papers linked from the wiki.
>
>  Aaron
>
>  On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables and
> why is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "back up my keyspace(s) and system in a compressed tar" strategy? While
> it may be excessive, it's guaranteed to work, and work easily (which I like
> a great deal).
>
> will
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <
> daniel.double...@gmx.net> wrote:
>
>> What we are about to set up is a time-machine-like backup. This is more
>> like an add-on to the S3 backup.
>>
>> Our boxes have an additional larger drive for local backup. We create a
>> new backup snapshot every x hours which hardlinks the files in the previous
>> snapshot (a bit like Cassandra's incremental_backups feature) and then we
>> sync that snapshot dir with the cassandra data dir. We can do archiving /
>> backup to an external system from there without impacting the main data raid.
>>
>> But the main reason to do this is to have an 'omg we screwed up big time
>> and deleted / corrupted data' recovery.
>>
>>  On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>
>>   Even with N nodes for redundancy, I still want to have backups.  I'm an
>> amazon person, so naturally I'm thinking S3.  Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how cassandra uses hard links to avoid
>> excessive disk use).  When does that pattern break down?
>>
>> I'm basically debating if I can do an "rsync"-like backup, or if I should
>> do a compressed tar backup.  And I obviously want multiple points in time.
>> S3 does allow file versioning, if a file or file name is changed/reused
>> over time (only matters in the rsync case).  My only concerns with
>> compressed tars are that I'll have to have free space to create the archive
>> and that I get no "delta" space savings on the backup (the former is solved
>> by not allowing the disk space to get so low and/or adding more nodes to
>> bring down the space, the latter is solved by S3 being really cheap anyways).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue, First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
>>
>>
>>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue, First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>
>
>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Determining the issues of marking node down

2011-04-30 Thread aaron morton
If the node is crashing with OutOfMemory it will be in the cassandra logs.
Search them for "ERROR". Alternatively, if you've installed from a package,
stdout and stderr may be redirected to a file called something like output.log
in the same location as the log file.
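
For example (a sketch; the paths assume a Debian/Ubuntu package install):

    grep ERROR /var/log/cassandra/system.log
    tail -n 50 /var/log/cassandra/output.log   # stdout/stderr, if your package redirects it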

You can change the logging using the log4j-server.properties file, typically
found in the same location as cassandra.yaml. By default it will already be
logging errors and warnings, though.
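
For example, to switch the root logger to debug (a sketch; the path assumes a
package install, and it assumes the stock line log4j.rootLogger=INFO,stdout,R):

    sudo sed -i 's/rootLogger=INFO/rootLogger=DEBUG/' /etc/cassandra/log4j-server.properties
    sudo service cassandra restart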

What does "nodetool ring" say about the token distribution ? If you are using 
the OPP you need to make sure your app is evening distributing the keys to 
avoid hot spots. 
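
For example (host and port are assumptions; 8080 was the default JMX port for
0.7):

    nodetool -h localhost -p 8080 ring

A heavily skewed Load column for one node is the usual sign of an OPP hot spot.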

Hope that helps.
Aaron
 
On 30 Apr 2011, at 22:14, Rauan Maemirov wrote:

> I have a test cluster with 3 nodes, and I recently installed OpsCenter to
> watch it. Every day the same node goes down (at a different time, but every
> day). I then just run `service cassandra start` to fix the problem.
> system.log doesn't show me anything strange. What steps should I take to
> diagnose this? I didn't change the logging properties (and cassandra.yaml is
> close to the defaults), so maybe some options need to be switched to debug?
>
> Btw, the node that goes down is the most heavily loaded (in storage
> capacity). Maybe the problem is with the OPP?
> I once ran the loadbalance command and it changed the token for the first
> node from 0 to one of the keys (without touching the other 2; I had
> generated the tokens with tokens.py).



Re: 0.7.5 Debian packages - can't upgrade?

2011-04-30 Thread Dan Washusen
It looks like it's an issue with Ubuntu 9.10 (or my install of 9.10). I tried
it on a machine running 10.04 and it works fine...
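
For anyone hitting the same thing, a sketch of standard apt/aptitude commands
for digging into a held-back package (nothing Cassandra-specific assumed):

    apt-cache policy cassandra        # which versions/origins apt can see
    sudo aptitude install cassandra   # ask for the candidate version explicitly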

On 30 April 2011 12:35, Dan Washusen  wrote:

> Thanks for the response. :)
>
> I should have also mentioned that I'm running this on Ubuntu Karmic Koala
> (9.10).
>
> The output of `sudo aptitude full-upgrade` looks the same as safe-upgrade:
>
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> Reading extended state information
>> Initializing package states... Done
>> No packages will be installed, upgraded, or removed.
>> 0 packages upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
>> Need to get 0B of archives. After unpacking 0B will be used.
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> Reading extended state information
>> Initializing package states... Done
>
>
> Here is the output of 'apt-cache policy && apt-cache policy cassandra':
> http://pastebin.com/PqRiGmWi
>
>
> On 30 April 2011 11:18, Eric Evans  wrote:
>
>> On Sat, 2011-04-30 at 09:34 +1000, Dan Washusen wrote:
>> > > sudo aptitude update
>> >
>> > sudo aptitude safe-upgrade
>> >
>> >
>> > The upgrade shows this:
>> >
>> > > Reading package lists... Done
>> > > Building dependency tree
>> > > Reading state information... Done
>> > > Reading extended state information
>> > > Initializing package states... Done
>> > > No packages will be installed, upgraded, or removed.
>> > > 0 packages upgraded, 0 newly installed, 0 to remove and *1 not
>> > upgraded*.
>> > > Need to get 0B of archives. After unpacking 0B will be used.
>> > > Reading package lists... Done
>> > > Building dependency tree
>> > > Reading state information... Done
>> > > Reading extended state information
>> > > Initializing package states... Done
>> >
>> >
>> > The above mentions that 1 package wasn't upgraded (I assume this is
>> > 0.7.5).
>> >  Anyone have any ideas what I'm doing wrong?
>>
>> Usually this means that upgrading would install a new package (i.e. that
>> it picked up a new dependency), which shouldn't be the case.  You might
>> try an `aptitude full-upgrade' just to see what that might be.  You
>> could also try pasting the output of `apt-cache policy && apt-cache
>> policy cassandra' to the list.
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>
>