mixed cluster 0.6.9 and 0.6.12

2011-03-09 Thread Daniel Doubleday
Hi all

we are still on 0.6.9 and plan to upgrade to 0.6.12 but are a little concerned 
about:

https://issues.apache.org/jira/browse/CASSANDRA-2170

I thought of upgrading only one node (of 5) to .12 and monitoring it for a couple
of days.

Is this a bad idea?

Thanks,
Daniel
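
For reference, a minimal sketch of what a one-node canary upgrade might look
like, assuming a tarball install and that your 0.6.x nodetool has the drain
command (paths and init script are placeholders):

  nodetool -h node1 -p 8080 drain    # flush memtables, stop accepting writes
  /etc/init.d/cassandra stop
  tar xzf apache-cassandra-0.6.12-bin.tar.gz -C /opt
  ln -sfn /opt/apache-cassandra-0.6.12 /opt/cassandra   # keep old config/data paths
  /etc/init.d/cassandra start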

Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Sasha Dolgy
Hi Everyone,

Now that I'm past the problems of IP addresses changing ... I am onto the
idea of storage.  Initially I had thought that for each cassandra instance, I
should have an EBS volume to store all the cassandra data / information.
Now I'm starting to wonder if this is duplication and not necessary.  If an
instance dies, I lose anything that's not attached to EBS.  However, if the
cassandra cluster is healthy ... this shouldn't be an issue ... Is this a
correct assumption?

-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
I'm considering similar issues right now.  The problem with ephemeral
storage is that I don't know an easy way to back it up, while with EBS it's a
simple snapshot API call.

Otherwise, I believe the performance of ephemeral storage (certainly on
Large instances or greater, where you can RAID0 multiple disks) is way
better than EBS.
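
As a rough illustration, striping the ephemeral disks with RAID0 might look
like this; the device names (/dev/sdb, /dev/sdc), the XFS filesystem, and the
mount point are assumptions that vary by instance type:

  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
  mkfs.xfs /dev/md0
  mkdir -p /var/lib/cassandra     # mount the array where Cassandra keeps data
  mount /dev/md0 /var/lib/cassandra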

will

On Wed, Mar 9, 2011 at 10:27 AM, Sasha Dolgy  wrote:

> Hi Everyone,
>
> Now that I'm past the problems of IP addresses changing ... I am onto the
> idea of storage.  Initially I had though that for each cassandra instance, I
> should have an EBS volume to store all the cassandra data / information.
> Now I'm starting to wonder if this is duplication and not necessary.  If an
> instance dies, I loose anything that's not attached to EBS.  However, if the
> cassandra cluster is healthy ... this shouldn't be an issue ... Is this a
> correct assumption?
>
> -sd
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Sasha Dolgy
well, this is what i'm getting at.  why would you want to back it up if the
cluster is working properly?  backup is silly ; )

On Wed, Mar 9, 2011 at 4:54 PM, William Oberman wrote:

> I'm considering similar issues right now.  The problem with ephemeral
> storage is I don't know an easy way to back it up, while on an EBS it's a
> simple snapshot API call.
>
> Otherwise, I believe the performance of the ephemeral (certainly in the
> case of large or greater, where you can RAID0 multiple disks) is way better
> than EBS.
>
> will
>
>


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
For me, to transition production data into a development environment for
real world testing.  Also, backups are never a bad idea, though I agree
almost all risk is mitigated by cassandra's design.

will

On Wed, Mar 9, 2011 at 10:57 AM, Sasha Dolgy  wrote:

>
> well, this is what i'm getting at.  why would you want to back it up if the
> cluster is working properly?  backup is silly ; )
>
>
> On Wed, Mar 9, 2011 at 4:54 PM, William Oberman 
> wrote:
>
>> I'm considering similar issues right now.  The problem with ephemeral
>> storage is I don't know an easy way to back it up, while on an EBS it's a
>> simple snapshot API call.
>>
>> Otherwise, I believe the performance of the ephemeral (certainly in the
>> case of large or greater, where you can RAID0 multiple disks) is way better
>> than EBS.
>>
>> will
>>
>>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Jeremy Hanna
I've seen both sides, but Cassandra does handle replication, and bringing data 
back is a matter of bootstrapping a node to replace the downed node.  

One thing to consider is availability zones and regions though.  What happens 
if your entire cluster goes down in the case of a single datacenter going 
offline?  From what I understand, EC2 availability zones are equivalent to 
physical datacenters, so going across availability zones will handle an entire 
datacenter going down.  Regions are another level of safeguarding against this. 
Anyway, just some thoughts.

Some considerations are also found in the Cloud section of this page: 
http://wiki.apache.org/cassandra/CassandraHardware

On Mar 9, 2011, at 9:57 AM, Sasha Dolgy wrote:

> 
> well, this is what i'm getting at.  why would you want to back it up if the 
> cluster is working properly?  backup is silly ; )
> 
> On Wed, Mar 9, 2011 at 4:54 PM, William Oberman  
> wrote:
> I'm considering similar issues right now.  The problem with ephemeral storage 
> is I don't know an easy way to back it up, while on an EBS it's a simple 
> snapshot API call.
> 
> Otherwise, I believe the performance of the ephemeral (certainly in the case 
> of large or greater, where you can RAID0 multiple disks) is way better than 
> EBS.
> 
> will
> 



Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Sasha Dolgy
Could you not nodetool snapshot the data onto a mounted EBS volume or S3
bucket and satisfy your development requirement?
-sd

On Wed, Mar 9, 2011 at 5:23 PM, William Oberman wrote:

> For me, to transition production data into a development environment for
> real world testing.  Also, backups are never a bad idea, though I agree most
> all risk is mitigated due to cassandra's design.
>
> will
>
>


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
I thought nodetool snapshot writes the snapshot locally, requiring 2x of
expensive storage allocation 24x7 (vs. the cheap storage allocation of an EBS
snapshot).  By that I mean EBS allocation is billed per GB allocated per month
at one rate, while EBS snapshots are delta-compressed copies in S3.

Can you point the snapshot to an external filesystem?

will

On Wed, Mar 9, 2011 at 11:31 AM, Sasha Dolgy  wrote:

>
> Could you not nodetool snapshot the data into an mounted ebs/s3 bucket and
> satisfy your development requirement?
> -sd
>
>
> On Wed, Mar 9, 2011 at 5:23 PM, William Oberman 
> wrote:
>
>> For me, to transition production data into a development environment for
>> real world testing.  Also, backups are never a bad idea, though I agree most
>> all risk is mitigated due to cassandra's design.
>>
>> will
>>
>>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Sasha Dolgy
Hi Will,

http://wiki.apache.org/cassandra/Operations#Backing_up_data

If the snapshot
is written to the ephemeral storage ... there isn't a cost. (I need to
confirm that.)

You can then move this to an S3 bucket with RRS (Reduced Redundancy
Storage) if you want, or full redundancy, and have it available to developers.

This is what I had in my head.
-sd

On Wed, Mar 9, 2011 at 5:39 PM, William Oberman wrote:

> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
> one rate, and EBS snapshots are delta compressed copies to S3.
>
> Can you point the snapshot to an external filesystem?
>
> will
>
>


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
I haven't done backups yet, so I don't know where the data is written.  Is
it where nodetool is run from?  Or local to the instance running
cassandra (and there, local to the data directory)?  I assumed it was the
latter (not finding docs on that yet), and that would require 2x storage
allocated on that instance for 1x data (to have room for the snapshot).  If
it's the former, then yes, I'd totally run the command from an ephemeral
store, and back up to S3.

will

On Wed, Mar 9, 2011 at 11:48 AM, Sasha Dolgy  wrote:

> Hi Will,
>
> http://wiki.apache.org/cassandra/Operations#Backing_up_data
>
> If the
> snapshot is written to the ephemeral storage ... there isn't a cost. (i need
> to confirm that)
>
> You can then move this to an S3 bucket with RDS if you want or full
> 99.9% redundancy and have it available to developers
>
> This is what I had in my head
> -sd
>
>
> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman 
> wrote:
>
>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
>> one rate, and EBS snapshots are delta compressed copies to S3.
>>
>> Can you point the snapshot to an external filesystem?
>>
>> will
>>
>>


-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Dave Viner
Sasha,

You might also check out http://coreyhulen.org/category/cassandra/ for speed
tests done by Corey Hulen on different disk configurations (both inside EC2
and on real hardware).

If you write to the ephemeral storage on an EC2 instance, there is no
additional cost for the data written.  It's mostly similar with EBS.  In EBS
you pay for the disk size you allocate.  There's a tiny additional charge
for IO (currently $0.10 per 1M io requests).
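
As a rough worked example at that rate (the workload figure is illustrative):

  100 requests/s * 86,400 s/day * 30 days = 259,200,000 requests/month
  259.2M requests * $0.10 per 1M requests ≈ $25.92/month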

HTH,

Dave Viner


On Wed, Mar 9, 2011 at 8:48 AM, Sasha Dolgy  wrote:

> Hi Will,
>
> http://wiki.apache.org/cassandra/Operations#Backing_up_data
>
> If the
> snapshot is written to the ephemeral storage ... there isn't a cost. (i need
> to confirm that)
>
> You can then move this to an S3 bucket with RDS if you want or full
> 99.9% redundancy and have it available to developers
>
> This is what I had in my head
> -sd
>
>
> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman 
> wrote:
>
>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
>> one rate, and EBS snapshots are delta compressed copies to S3.
>>
>> Can you point the snapshot to an external filesystem?
>>
>> will
>>
>>


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Frank LoVecchio
>
> Now that I'm past the problems of IP addresses changing ... I am onto the
> idea of storage.  Initially I had though that for each cassandra instance, I
> should have an EBS volume to store all the cassandra data / information.
> Now I'm starting to wonder if this is duplication and not necessary.  If an
> instance dies, I loose anything that's not attached to EBS.  However, if the
> cassandra cluster is healthy ... this shouldn't be an issue ... Is this a
> correct assumption?


Correct.  Why not use EBS-backed instances?  The ability to reboot comes in
handy.  I have a cluster of 6 nodes, each with an EBS drive of data (EBS
drives can scale, if you need them to - not advised).  Bootstrapping has
always worked better for me than doing any sort of data snapshotting,
allowing nodes to come in and out with proper token management.  You can
attach S3 buckets as drives as well...
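
For the S3-as-a-drive idea, a hedged sketch using the third-party s3fs FUSE
tool (not part of Cassandra or stock EC2; bucket name and mount point are
placeholders):

  mkdir -p /mnt/s3
  s3fs my-backup-bucket /mnt/s3 -o allow_other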

On Wed, Mar 9, 2011 at 9:48 AM, Sasha Dolgy  wrote:

> Hi Will,
>
> http://wiki.apache.org/cassandra/Operations#Backing_up_data
>
> If the
> snapshot is written to the ephemeral storage ... there isn't a cost. (i need
> to confirm that)
>
> You can then move this to an S3 bucket with RDS if you want or full
> 99.9% redundancy and have it available to developers
>
> This is what I had in my head
> -sd
>
>
> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman 
> wrote:
>
>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
>> one rate, and EBS snapshots are delta compressed copies to S3.
>>
>> Can you point the snapshot to an external filesystem?
>>
>> will
>>
>>


-- 
Frank LoVecchio
isidorey.com | facebook.com/franklovecchio | franklovecchio.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Sasha Dolgy
Hi Will,

Quickly did a snapshot:

nodetool -h 10.0.0.2 -p 8080 snapshot 09032011

The snapshots end up in the data dir for cassandra.  The default is
/var/lib/cassandra/data/<keyspace>/snapshots/

In this directory I have:  1299689801925-09032011

-sd
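
To get that snapshot off the instance, a minimal sketch might be the
following, assuming the third-party s3cmd tool is configured; the bucket name
is a placeholder and "sdo" is the keyspace from this thread:

  cd /var/lib/cassandra/data/sdo/snapshots
  s3cmd sync 1299689801925-09032011 s3://my-backup-bucket/sdo-snapshots/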

On Wed, Mar 9, 2011 at 5:54 PM, William Oberman wrote:

> I haven't done backups yet, so I don't know where the data is written.  Is
> it where the nodetool is run from?  Or local to the instance running
> cassandra (and there, local to the data directory?).  I assumed it was the
> latter (not finding docs on that yet), and that would require 2x storage
> allocated on that instance for 1x data (to have room for the snapshot).  If
> its the former, then yes, I'd totally run the command from an ephemeral
> store, and backup to S3.
>
> will
>
>
> On Wed, Mar 9, 2011 at 11:48 AM, Sasha Dolgy  wrote:
>
>> Hi Will,
>>
>> http://wiki.apache.org/cassandra/Operations#Backing_up_data
>>
>> If the
>> snapshot is written to the ephemeral storage ... there isn't a cost. (i need
>> to confirm that)
>>
>> You can then move this to an S3 bucket with RDS if you want or full
>> 99.9% redundancy and have it available to developers
>>
>> This is what I had in my head
>> -sd
>>
>>
>> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman > > wrote:
>>
>>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>>> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
>>> one rate, and EBS snapshots are delta compressed copies to S3.
>>>
>>> Can you point the snapshot to an external filesystem?
>>>
>>> will
>>>
>>>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>



-- 
Sasha Dolgy
sasha.do...@gmail.com


removing a node

2011-03-09 Thread Sasha Dolgy
Hi there,

Wanted to clarify with anyone ... re:
http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely

You can take a node out of the cluster with nodetool decommission to a live
node, or nodetool removetoken (to any other machine) to remove a dead one.
This will assign the ranges the old node was responsible for to other nodes,
and replicate the appropriate data there. If decommission is used, the data
will stream from the decommissioned node. If removetoken is used, the data
will stream from the remaining replicas.


   - If the node is alive and functional, the command to be run from that
   node is:  nodetool decommission
   - If the node is dead, the command to be run from another node (or all
   other nodes) is:  nodetool removetoken 


-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Erik Onnen
I'd recommend not storing commit logs or data files on EBS volumes if
your machines are under any decent amount of load. I say that for
three reasons.

First, EBS volumes contend directly for network throughput with
standard packets, under what appears to be a peer QoS policy. In other
words, if you're saturating a network link, EBS throughput falls. The
same has not been true of ephemeral volumes in all of our testing;
ephemeral I/O speeds tend to only take a minor hit under network
pressure and are consistently faster in raw speed tests.

Second, at some point it's a given that you will encounter misbehaving
EBS volumes. They won't completely fail; worse, they will just get
really, really slow. Oftentimes this is worse than a total failure
because the system just piles up reads/writes but doesn't totally
fall over until the entire cluster becomes overwhelmed. We've never
had single-volume ephemeral problems.

Lastly, I think people have a tendency to bolt a large number of EBS
volumes to a host and think that because they have the disk capacity they
can serve more data from fewer hosts. If you push that too far, you'll
outstrip the ability of the system to keep effective buffer caches and
concurrently serve requests for all the data it is responsible for
managing. IME there is good enough parity between an EC2 XL and its
ephemeral disks, relative to how Cassandra uses disk and RAM, that
adding more storage puts you right at the breaking point of
overcommitting your hardware.

If you want protection from AZ failure, split your ring across AZs
(Cassandra is quite good at this) or copy snapshots to EBS volumes.

-erik

There are a lot of benefits to EBS volumes, I/O throughput and
reliability are not among those benefits.
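
A minimal sketch of the snapshot-to-EBS option, assuming the volume is
attached as /dev/sdf and "sdo" is a placeholder keyspace:

  mkfs.xfs /dev/sdf                      # one-time setup of the backup volume
  mkdir -p /mnt/ebs-backup
  mount /dev/sdf /mnt/ebs-backup
  rsync -a /var/lib/cassandra/data/sdo/snapshots/ /mnt/ebs-backup/snapshots/
  # an EBS snapshot of this volume can then be taken via the EC2 API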

On Wed, Mar 9, 2011 at 8:39 AM, William Oberman
 wrote:
> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
> one rate, and EBS snapshots are delta compressed copies to S3.
>
> Can you point the snapshot to an external filesystem?
>
> will
>
> On Wed, Mar 9, 2011 at 11:31 AM, Sasha Dolgy  wrote:
>>
>> Could you not nodetool snapshot the data into an mounted ebs/s3 bucket and
>> satisfy your development requirement?
>> -sd
>>
>> On Wed, Mar 9, 2011 at 5:23 PM, William Oberman 
>> wrote:
>>>
>>> For me, to transition production data into a development environment for
>>> real world testing.  Also, backups are never a bad idea, though I agree most
>>> all risk is mitigated due to cassandra's design.
>>>
>>> will
>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>


Re: Reducing memory footprint

2011-03-09 Thread Casey Deccio
On Sat, Mar 5, 2011 at 7:37 PM, aaron morton wrote:

> There is some additional memory usage in the JVM beyond that Heap size, in
> the permanent generation. 900mb sounds like too much for that, but you can
> check by connecting with JConsole and looking at the memory tab. You can
> also check the heap size there to see that it's under the value you've set.
>
>
Thanks for the tip!

From JConsole:
Heap memory usage: Current 46M; Max 902M
Non-Heap memory usage: Current 34M; Max 200MB

Both of these seem reasonable and don't reach the (current) 2.1 GB resident
usage I am seeing.

> Check you are using standard disk access (in conf/cassandra.yaml) rather
> than memory mapped access. However the memory mapped memory is reported as
> virtual memory, not resident. So I'm just mentioning it to be complete.
>
>
At the moment, it's set to "auto", but it's a 64-bit machine, so I believe
it's using memory mapped.  The virtual memory usage says that it is 54.6 GB.

> If you think you've configured things correctly and the JVM is not behaving
> (which is unlikely) please include some information on the JVM and OS
> versions and some hard numbers about what the process is using.
>
>
Debian 6.0 using openjdk-6-jre-lib-6b18-1.8.3-2+squeeze1

Thanks for your help.

Casey


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
This is excellent, specific feedback.  Thanks!

Given the relative costs, I was hoping an L was the optimal tradeoff vs. an
XL, but if XL is the best option, that's the best option.

will

On Wed, Mar 9, 2011 at 12:04 PM, Erik Onnen  wrote:

> I'd recommend not storing commit logs or data files on EBS volumes if
> your machines are under any decent amount of load. I say that for
> three reasons.
>
> First, both EBS volumes contend directly for network throughput with
> what appears to be a peer QoS policy to standard packets. In other
> words, if you're saturating a network link, EBS throughput falls. The
> same has not been true of ephemeral volumes in all of our testing,
> ephemeral I/O speeds tend to only take a minor hit under network
> pressure and are consistently faster in raw speed tests.
>
> Second, at some point it's a given that you will encounter misbehaving
> EBS volumes. They won't completely fail, worse they will just get
> really, really slow. Often times this is worse than a total failure
> because the system just back piles reads/writes but doesn't totally
> fall over until the entire cluster becomes overwhelmed. We've never
> had single volume ephemeral problems.
>
> Lastly, I think people have a tendency to bolt a large number of EBS
> volumes to a host and think that because they have disk capacity they
> serve more data from fewer hosts. If you push that too far, you'll
> outstrip the ability of the system to keep effective buffer caches and
> concurrently serve requests for all the data it is responsible for
> managing. IME there is pretty good parity between an EC2 XL and the
> ephemeral disks available relative to how Cassandra uses disk and RAM
> that adding more storage is right at the breaking point of over
> committing your hardware.
>
> If you want protection from AZ failure, split you ring across AZs
> (Cassandra is quite good at this) or copy snapshots to EBS volumes.
>
> -erik
>
> There are a lot of benefits to EBS volumes, I/O throughput and
> reliability are not among those benefits.
>
> On Wed, Mar 9, 2011 at 8:39 AM, William Oberman
>  wrote:
> > I thought nodetool snapshot writes the snapshot locally, requiring 2x of
> > expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
> > snapshot).  By that I mean EBS allocation is GB allocated per month costs
> at
> > one rate, and EBS snapshots are delta compressed copies to S3.
> >
> > Can you point the snapshot to an external filesystem?
> >
> > will
> >
> > On Wed, Mar 9, 2011 at 11:31 AM, Sasha Dolgy  wrote:
> >>
> >> Could you not nodetool snapshot the data into an mounted ebs/s3 bucket
> and
> >> satisfy your development requirement?
> >> -sd
> >>
> >> On Wed, Mar 9, 2011 at 5:23 PM, William Oberman <
> ober...@civicscience.com>
> >> wrote:
> >>>
> >>> For me, to transition production data into a development environment
> for
> >>> real world testing.  Also, backups are never a bad idea, though I agree
> most
> >>> all risk is mitigated due to cassandra's design.
> >>>
> >>> will
> >
> >
> >
> > --
> > Will Oberman
> > Civic Science, Inc.
> > 3030 Penn Avenue., First Floor
> > Pittsburgh, PA 15201
> > (M) 412-480-7835
> > (E) ober...@civicscience.com
> >
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Does the memtable replace the old version of a column with the new overwriting version, or is it just a simple append?

2011-03-09 Thread Jonathan Ellis
On Wed, Mar 9, 2011 at 1:56 AM, Aditya Narayan  wrote:
> so this means that in memtable only the most recent version of a
> column will reside?

The most recent version seen since the memtable was opened, which may
not be the most recent version ever.

> For this implementation, while writing "to
> memtable" Cassandra will see if there are other versions and will
> overwrite them (reconciliation while writing)!?

Only versions in the same memtable.  It doesn't look at versions
already flushed.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread William Oberman
Based on Erik's email, it sounds like EBS is a no-go from the start.  But
given your snapshot feedback, it seems like you have to plan on leaving
slack on every disk, and the % of slack depends on the size of a snapshot
relative to the data (given the snapshot shares the disk with the data, at
least temporarily).

will

On Wed, Mar 9, 2011 at 11:59 AM, Sasha Dolgy  wrote:

> Hi will,
>
> Quickly did a snapshot:
>
> nodetool -h 10.0.0.2 -p 8080 snapshot 09032011
>
> The snapshots end up in the data dir for cassandra.  The default is
> /var/lib/cassandra/data//snapshots/
>
> In this directory i have:  1299689801925-09032011
>
> -sd
>
>
> On Wed, Mar 9, 2011 at 5:54 PM, William Oberman 
> wrote:
>
>> I haven't done backups yet, so I don't know where the data is written.  Is
>> it where the nodetool is run from?  Or local to the instance running
>> cassandra (and there, local to the data directory?).  I assumed it was the
>> latter (not finding docs on that yet), and that would require 2x storage
>> allocated on that instance for 1x data (to have room for the snapshot).  If
>> its the former, then yes, I'd totally run the command from an ephemeral
>> store, and backup to S3.
>>
>> will
>>
>>
>> On Wed, Mar 9, 2011 at 11:48 AM, Sasha Dolgy  wrote:
>>
>>> Hi Will,
>>>
>>> http://wiki.apache.org/cassandra/Operations#Backing_up_data
>>>
>>> If the
>>> snapshot is written to the ephemeral storage ... there isn't a cost. (i need
>>> to confirm that)
>>>
>>> You can then move this to an S3 bucket with RDS if you want or full
>>> 99.9% redundancy and have it available to developers
>>>
>>> This is what I had in my head
>>> -sd
>>>
>>>
>>> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
>>>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>>>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>>>> snapshot).  By that I mean EBS allocation is GB allocated per month costs
>>>> at one rate, and EBS snapshots are delta compressed copies to S3.
>>>>
>>>> Can you point the snapshot to an external filesystem?
>>>>
>>>> will


>>
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
>>
>
>
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com


Re: mixed cluster 0.6.9 and 0.6.12

2011-03-09 Thread Jonathan Ellis
We haven't been able to reproduce 2170 and have several customers on .12.

Mixing .9 and .12 should be fine.

On Wed, Mar 9, 2011 at 4:29 AM, Daniel Doubleday
 wrote:
> Hi all
>
> we are still on 0.6.9 and plan to upgrade to 0.6.12 but are a little 
> concerned about:
>
> https://issues.apache.org/jira/browse/CASSANDRA-2170
>
> I thought of upgrading only one node (of 5) to .12 and monitor for a couple 
> of days.
>
> Is this a bad idea?
>
> Thanks,
> Daniel



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Amazon EC2 & Cassandra to EBS or not..

2011-03-09 Thread Jonathan Ellis
Right, local snapshot is no-cost both from an EC2 pricing standpoint
and from a disk usage standpoint (because it uses hard links).
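
A quick way to see the hard links for yourself (paths and the keyspace "sdo"
are placeholders):

  ls -li /var/lib/cassandra/data/sdo/*-Data.db \
         /var/lib/cassandra/data/sdo/snapshots/*/*-Data.db
  # matching inode numbers in the first column mean the snapshot files
  # share the same on-disk blocks as the live sstables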

On Wed, Mar 9, 2011 at 10:48 AM, Sasha Dolgy  wrote:
> Hi Will,
> http://wiki.apache.org/cassandra/Operations#Backing_up_data
> If the snapshot is written to the ephemeral storage ... there isn't a cost.
> (i need to confirm that)
> You can then move this to an S3 bucket with RDS if you want or full
> 99.9% redundancy and have it available to developers
> This is what I had in my head
> -sd
>
> On Wed, Mar 9, 2011 at 5:39 PM, William Oberman 
> wrote:
>>
>> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
>> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
>> snapshot).  By that I mean EBS allocation is GB allocated per month costs at
>> one rate, and EBS snapshots are delta compressed copies to S3.
>>
>> Can you point the snapshot to an external filesystem?
>>
>> will
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Reducing memory footprint

2011-03-09 Thread Jonathan Ellis
I edited Peter Schuller's reply last time this came up into a FAQ:
http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Mar 9, 2011 at 11:10 AM, Casey Deccio  wrote:
> On Sat, Mar 5, 2011 at 7:37 PM, aaron morton 
> wrote:
>>
>> There is some additional memory usage in the JVM beyond that Heap size, in
>> the permanent generation. 900mb sounds like too much for that, but you can
>> check by connecting with JConsole and looking at the memory tab. You can
>> also check the heap size there to see that it's under the value you've set.
>
> Thanks for the tip!
>
> From JConsole:
> Heap memory usage: Current 46M; Max 902M
> Non-Heap memory usage: Current 34M; Max 200MB
>
> Both of these seem reasonable and don't reach the (current) 2.1 GB resident
> usage I am seeing.
>
>> Check you are using standard disk access (in conf/cassandra.yaml) rather
>> than memory mapped access. However the memory mapped memory is reported as
>> virtual memory, not resident. So I'm just mentioning it to be complete.
>
> At the moment, it's set to "auto", but it's a 64-bit machine, so I believe
> it's using memory mapped.  The virtual memory usage says that it is 54.6 GB.
>
>> If you think you've configured things correctly and the JVM is not
>> behaving (which is unlikely) please include some information on the JVM and
>> OS versions and some hard numbers about what the process is using.
>
> Debian 6.0 using openjdk-6-jre-lib-6b18-1.8.3-2+squeeze1
>
> Thanks for your help.
>
> Casey
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Removing a node ...

2011-03-09 Thread Sasha Dolgy
My mails keep getting caught in spam filters.  Something about being a
loyal gmail user ...!

Let's try this again:

Further to this ... using Cassandra 0.7.0:

nodetool -h 10.0.0.1 -p 8080 decommission
 INFO [RMI TCP Connection(2)-10.0.0.1] 2011-03-09 16:52:22,226
StorageService.java (line 399) Leaving: sleeping 3 ms for pending range
setup
 INFO [RMI TCP Connection(2)-10.0.0.1] 2011-03-09 16:52:52,233
StorageService.java (line 399) Leaving: streaming data to other nodes
 INFO [StreamStage:1] 2011-03-09 16:52:52,237 StreamOut.java (line 77)
Beginning transfer to /10.0.0.2
 INFO [StreamStage:1] 2011-03-09 16:52:52,238 StreamOut.java (line 100)
Flushing memtables for sdo...
 INFO [StreamStage:1] 2011-03-09 16:52:52,239 StreamOut.java (line 173)
Stream context metadata [/mnt/cassandra/data/sdo/user-e-1-Data.db/(0,675)
 progress=0/675 - 0%,
/mnt/cassandra/data/sdo/app-e-1-Data.db/(0,502)
 progress=0/502 - 0%,
/mnt/cassandra/data/sdo/aut-e-1-Data.db/(0,1493)
 progress=0/1493 - 0%], 3 sstables.
 INFO [StreamStage:1] 2011-03-09 16:52:52,240 StreamOutSession.java (line
174) Streaming to /10.0.0.2

It's been about 40 minutes now, and when I go to another node and run
nodetool -h 10.0.0.2 -p 8080 ring I get the following:

Address Status State   LoadOwnsToken

116084175244813755374456454604099553584
10.0.0.3  Up Normal  206.17 KB   45.91%
 24053088190195663439419935163232881936
10.0.0.1Up Leaving 218.71 KB   21.76%
 61078635599166706937511052402724559481
10.0.0.2Up Normal  224.8 KB32.33%
 116084175244813755374456454604099553584

Leaving is taking a very long time for such a very, very small amount of
data.  How long does it take to "decommission"?

-sd
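
A hedged way to watch what the leaving node is doing is simply to follow its
log for streaming activity (the log path is the packaged default and an
assumption):

  tail -f /var/log/cassandra/system.log | grep -i stream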


On Wed, Mar 9, 2011 at 6:01 PM, Sasha Dolgy  wrote:

>
> Hi there,
>
> Wanted to clarify with anyone ... re:
> http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely
>
> You can take a node out of the cluster with nodetool decommission to a
> live node, or nodetool removetoken (to any other machine) to remove a dead
> one. This will assign the ranges the old node was responsible for to other
> nodes, and replicate the appropriate data there. If decommission is used,
> the data will stream from the decommissioned node. If removetoken is used,
> the data will stream from the remaining replicas.
>
>
>- If the node is alive and functional, the command to be run from that
>node is:  nodetool decommission
>- If the node is dead, the command to be run from another node (or all
>other nodes) is:  nodetool removetoken 
>
>
> -sd
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com


RE: build.xml issue with 0.7.3?

2011-03-09 Thread Paul Choi
Eric,
Thanks for your response.

This is all I see in /build:
[paulchoi@build02 build]$ find .
.
./lib
./maven-ant-tasks-2.1.1.jar
[paulchoi@build02 build]$

I just downloaded a new copy of 0.7.3-src and tried manually. I'm still running 
into the same problem.

I tried doing this with 0.7.0, and ant downloads Ivy and Ivy takes care of the 
dependencies. With 0.7.3, maybe Ant doesn't get to the point of downloading 
Ivy? I guess I need to wise up on Ant and Ivy myself.

-Paul


From: Eric Evans [eev...@rackspace.com]
Sent: Friday, March 04, 2011 7:36 PM
To: user@cassandra.apache.org
Subject: Re: build.xml issue with 0.7.3?

On Sat, 2011-03-05 at 01:23 +, Paul Choi wrote:
> We're running 0.7.0, and we want to upgrade to 0.7.3 ASAP.
>
> This worked in our RPM SPEC file with 0.7.0:
> %build
> export JAVA_HOME=/usr/java/latest
> ant clean jar -Drelease=true
>
> Now running "ant jar" throws some kind of build.xml error at line 155 - 
> typedef is undefined. I'm running ant 1.6.5-2jpp.2 that comes with CentOS 
> 5.5. Unfortunately, I'm no Ant expert, so I'm stumped. Does anyone have any 
> idea why this is happening?

0.7.0 pulled down dependencies using Ivy, 0.7.3 uses maven-ant-tasks.
This error occurred when trying to create the artifact typedef for
maven-ant-tasks, though I don't know why (it works here).

What do the contents of build/ look like after the error?

> Thanks for your help.
> BTW, I grabbed the tarball from http://apache.org/dist/cassandra/0.7.3/, 
> since the mirrors didn't have it. I hope it was ok to get this one.
>
> [paulchoi@build02 apache-cassandra-0.7.3-src]$ ant jar
> Buildfile: build.xml
>
> maven-ant-tasks-download:
>  [echo] Downloading Maven ANT Tasks...
> [mkdir] Created dir: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build
>   [get] Getting: 
> http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.1/maven-ant-tasks-2.1.1.jar
>   [get] To: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/maven-ant-tasks-2.1.1.jar
>
> maven-ant-tasks-init:
> [mkdir] Created dir: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/lib
>
> BUILD FAILED
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build.xml:155: name, 
> file or resource attribute of typedef is undefined
>
> Total time: 1 second
> [paulchoi@build02 apache-cassandra-0.7.3-src]$


--
Eric Evans
eev...@rackspace.com


cassandra and G1 gc

2011-03-09 Thread ruslan usifov
Hello

Does anybody use G1 gc in production? What are your impressions?


RE: build.xml issue with 0.7.3?

2011-03-09 Thread Paul Choi
Ah, the plot thickens...

I downloaded Ant 1.8.2 and tried building Cassandra 0.7.3. That works fine.

CentOS 5 comes with Ant 1.6.5, and that does not work with Cassandra 0.7.3.

I see in Cassandra's changelog that, in 0.7.1, Ivy was replaced with 
maven-ant-tasks. Now, maven-ant-tasks requires Ant 1.6.x and newer, so you'd 
think it would work.

I'll need to build a newer Ant RPM for my build system, then try building 
Cassandra 0.7.3.

Thanks for your help!
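
For anyone hitting the same thing, a minimal sketch of building with a newer
Ant unpacked from a tarball (the install location is a placeholder):

  export ANT_HOME=/opt/apache-ant-1.8.2
  export PATH=$ANT_HOME/bin:$PATH
  export JAVA_HOME=/usr/java/latest
  cd apache-cassandra-0.7.3-src
  ant clean jar -Drelease=true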



From: Paul Choi [paulc...@plaxo.com]
Sent: Wednesday, March 09, 2011 10:52 AM
To: user@cassandra.apache.org
Subject: RE: build.xml issue with 0.7.3?

Eric,
Thanks for your response.

This is all I see in /build:
[paulchoi@build02 build]$ find .
.
./lib
./maven-ant-tasks-2.1.1.jar
[paulchoi@build02 build]$

I just downloaded a new copy of 0.7.3-src and tried manually. I'm still running 
into the same problem.

I tried doing this with 0.7.0, and ant downloads Ivy and Ivy takes care of the 
dependencies. With 0.7.3, maybe Ant doesn't get to the point of downloading 
Ivy? I guess I need to wise up on Ant and Ivy myself.

-Paul


From: Eric Evans [eev...@rackspace.com]
Sent: Friday, March 04, 2011 7:36 PM
To: user@cassandra.apache.org
Subject: Re: build.xml issue with 0.7.3?

On Sat, 2011-03-05 at 01:23 +, Paul Choi wrote:
> We're running 0.7.0, and we want to upgrade to 0.7.3 ASAP.
>
> This worked in our RPM SPEC file with 0.7.0:
> %build
> export JAVA_HOME=/usr/java/latest
> ant clean jar -Drelease=true
>
> Now running "ant jar" throws some kind of build.xml error at line 155 - 
> typedef is undefined. I'm running ant 1.6.5-2jpp.2 that comes with CentOS 
> 5.5. Unfortunately, I'm no Ant expert, so I'm stumped. Does anyone have any 
> idea why this is happening?

0.7.0 pulled down dependencies using Ivy, 0.7.3 uses maven-ant-tasks.
This error occurred when trying to create the artifact typedef for
maven-ant-tasks, though I don't know why (it works here).

What do the contents of build/ look like after the error?

> Thanks for your help.
> BTW, I grabbed the tarball from http://apache.org/dist/cassandra/0.7.3/, 
> since the mirrors didn't have it. I hope it was ok to get this one.
>
> [paulchoi@build02 apache-cassandra-0.7.3-src]$ ant jar
> Buildfile: build.xml
>
> maven-ant-tasks-download:
>  [echo] Downloading Maven ANT Tasks...
> [mkdir] Created dir: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build
>   [get] Getting: 
> http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.1/maven-ant-tasks-2.1.1.jar
>   [get] To: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/maven-ant-tasks-2.1.1.jar
>
> maven-ant-tasks-init:
> [mkdir] Created dir: 
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/lib
>
> BUILD FAILED
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build.xml:155: name, 
> file or resource attribute of typedef is undefined
>
> Total time: 1 second
> [paulchoi@build02 apache-cassandra-0.7.3-src]$


--
Eric Evans
eev...@rackspace.com


Re: build.xml issue with 0.7.3?

2011-03-09 Thread Stephen Connolly
There is no Ivy any more.

Drop me a mail with details of your _exact_ Ant version and JDK and I'll see
if I can diagnose your issues.

-Stephen

On 9 March 2011 18:51, Paul Choi  wrote:

> Eric,
> Thanks for your response.
>
> This is all I see in /build:
> [paulchoi@build02 build]$ find .
> .
> ./lib
> ./maven-ant-tasks-2.1.1.jar
> [paulchoi@build02 build]$
>
> I just downloaded a new copy of 0.7.3-src and tried manually. I'm still
> running into the same problem.
>
> I tried doing this with 0.7.0, and ant downloads Ivy and Ivy takes care of
> the dependencies. With 0.7.3, maybe Ant doesn't get to the point of
> downloading Ivy? I guess I need to wise up on Ant and Ivy myself.
>
> -Paul
>
> 
> From: Eric Evans [eev...@rackspace.com]
> Sent: Friday, March 04, 2011 7:36 PM
> To: user@cassandra.apache.org
> Subject: Re: build.xml issue with 0.7.3?
>
> On Sat, 2011-03-05 at 01:23 +, Paul Choi wrote:
> > We're running 0.7.0, and we want to upgrade to 0.7.3 ASAP.
> >
> > This worked in our RPM SPEC file with 0.7.0:
> > %build
> > export JAVA_HOME=/usr/java/latest
> > ant clean jar -Drelease=true
> >
> > Now running "ant jar" throws some kind of build.xml error at line 155 -
> typedef is undefined. I'm running ant 1.6.5-2jpp.2 that comes with CentOS
> 5.5. Unfortunately, I'm no Ant expert, so I'm stumped. Does anyone have any
> idea why this is happening?
>
> 0.7.0 pulled down dependencies using Ivy, 0.7.3 uses maven-ant-tasks.
> This error occurred when trying to create the artifact typedef for
> maven-ant-tasks, though I don't know why (it works here).
>
> What do the contents of build/ look like after the error?
>
> > Thanks for your help.
> > BTW, I grabbed the tarball from http://apache.org/dist/cassandra/0.7.3/,
> since the mirrors didn't have it. I hope it was ok to get this one.
> >
> > [paulchoi@build02 apache-cassandra-0.7.3-src]$ ant jar
> > Buildfile: build.xml
> >
> > maven-ant-tasks-download:
> >  [echo] Downloading Maven ANT Tasks...
> > [mkdir] Created dir:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build
> >   [get] Getting:
> http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.1/maven-ant-tasks-2.1.1.jar
> >   [get] To:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/maven-ant-tasks-2.1.1.jar
> >
> > maven-ant-tasks-init:
> > [mkdir] Created dir:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/lib
> >
> > BUILD FAILED
> > /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build.xml:155:
> name, file or resource attribute of typedef is undefined
> >
> > Total time: 1 second
> > [paulchoi@build02 apache-cassandra-0.7.3-src]$
>
>
> --
> Eric Evans
> eev...@rackspace.com
>


Re: Removing a node ...

2011-03-09 Thread Jonathan Ellis
Did you check log for errors?

(ASF mailing lists prefer plain text; html has a higher spam score.)

On Wed, Mar 9, 2011 at 11:59 AM, Sasha Dolgy  wrote:
> mail servers keep catching up in spam filters.  Something about being a
> loyal gmail user ...!
> Let's try this again:
>
> further to this ...using cassandra 0.7.0
>
> nodetool -h 10.0.0.1 -p 8080 decommission
>  INFO [RMI TCP Connection(2)-10.0.0.1] 2011-03-09 16:52:22,226
> StorageService.java (line 399) Leaving: sleeping 3 ms for pending range
> setup
>  INFO [RMI TCP Connection(2)-10.0.0.1] 2011-03-09 16:52:52,233
> StorageService.java (line 399) Leaving: streaming data to other nodes
>  INFO [StreamStage:1] 2011-03-09 16:52:52,237 StreamOut.java (line 77)
> Beginning transfer to /10.0.0.2
>  INFO [StreamStage:1] 2011-03-09 16:52:52,238 StreamOut.java (line 100)
> Flushing memtables for sdo...
>  INFO [StreamStage:1] 2011-03-09 16:52:52,239 StreamOut.java (line 173)
> Stream context metadata [/mnt/cassandra/data/sdo/user-e-1-Data.db/(0,675)
>  progress=0/675 - 0%,
> /mnt/cassandra/data/sdo/app-e-1-Data.db/(0,502)
>  progress=0/502 - 0%,
> /mnt/cassandra/data/sdo/aut-e-1-Data.db/(0,1493)
>  progress=0/1493 - 0%], 3 sstables.
>  INFO [StreamStage:1] 2011-03-09 16:52:52,240 StreamOutSession.java (line
> 174) Streaming to /10.0.0.2
>
> It's been about 40 minutes now, and when I go to another node and run
> nodetool -h 10.0.0.2 -p 8080 ring I get the following:
>
> Address Status State   LoadOwnsToken
>
> 116084175244813755374456454604099553584
> 10.0.0.3  Up Normal  206.17 KB   45.91%
>  24053088190195663439419935163232881936
> 10.0.0.1Up Leaving 218.71 KB   21.76%
>  61078635599166706937511052402724559481
> 10.0.0.2Up Normal  224.8 KB32.33%
>  116084175244813755374456454604099553584
>
> Leaving is taking a very long time for such a very very small amount of
> data.  How long does it take to "decommission" ?
>
> -sd
>
>
> On Wed, Mar 9, 2011 at 6:01 PM, Sasha Dolgy  wrote:
>
>>
>> Hi there,
>>
>> Wanted to clarify with anyone ... re:
>> http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely
>>
>> You can take a node out of the cluster with nodetool decommission to a
>> live node, or nodetool removetoken (to any other machine) to remove a dead
>> one. This will assign the ranges the old node was responsible for to other
>> nodes, and replicate the appropriate data there. If decommission is used,
>> the data will stream from the decommissioned node. If removetoken is used,
>> the data will stream from the remaining replicas.
>>
>>
>>- If the node is alive and functional, the command to be run from that
>>node is:  nodetool decommission
>>- If the node is dead, the command to be run from another node (or all
>>other nodes) is:  nodetool removetoken 
>>
>>
>> -sd
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: cassandra and G1 gc

2011-03-09 Thread Jonathan Ellis
Our testing indicates G1 is not significantly better than CMS.

On Wed, Mar 9, 2011 at 1:00 PM, ruslan usifov  wrote:
> Hello
>
> Does anybody use G1 gc in production? What your impressions?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: cassandra and G1 gc

2011-03-09 Thread Peter Schuller
> Does anybody use G1 gc in production? What your impressions?

I don't know if anyone does, but I've tested it very briefly and at
least seen it run well for a while ;) (JDK 1.7 trunk builds)

I do have some comments about expected behavior though. But first, for
those not familiar with G1, the main motivations for using G1 for
Cassandra probably include:

(1) By design all collections are compacting, meaning that
fragmentation will never become an issue like with CMS old space

(2) It's a *lot* easier to tweak than CMS, making deployment easier
and configuration less of a hassle for users. Usually you'd specify
some pause time goals, and that's it (a sketch of such flags follows
this list). In the case of Cassandra one would probably still want to
force an aggressive trigger for concurrent marking, but I suspect
that's about it.

(3) As a result of (1) and other properties of G1, it has the
potential to completely eliminate even the occasional stop-the-world
full GC even after extended runtime. (Keyword being *potential*.)
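
A hedged sketch of what turning G1 on might look like in
conf/cassandra-env.sh; these are standard HotSpot flags, but availability
varies by JVM build, and on JDK 6-era VMs G1 also requires the
experimental-options flag:

  JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
  JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
  JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=200"    # pause time goal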

Now, first of all, G1 is still immature compared to CMS. But even if
you are in a position where you are willing to trust G1 in some
particular JVM version for your production use, and even if G1
actually does work well with Cassandra in terms of workload, there is
at least one reason why I would urge caution w.r.t. G1 and Cassandra:
the fact that Cassandra uses GC as a means of controlling external
resources - in this case, sstables. With CMS, it's kind of okay
because unreachable objects will be collected on each run of CMS. So
by triggering a full GC when discovering out-of-disk-space conditions,
Cassandra can mostly avoid the pitfalls it would otherwise entail
(though confusion/impracticality for the user remains in that sstables
linger for longer than they need to).

With G1, it doesn't do a concurrent mark+sweep like CMS. Instead it
divides the heap into regions that are individually collected. While
there is a concurrent marking process, it is only used to feed data to
the policy which decides which regions to collect. There is no
guarantee or even expectation that for one "cycle" of concurrent
marking, all regions are collected. Individual regions may remain
uncollected for extended periods of time or even perpetually.

So, while it's iffy of Cassandra to begin with to use the GC for
managing external resources (I believe the motivation is less
synchronization complexity and/or overhead involved in making the
determination as to when an sstable can be deleted), G1 brings it much
more into the light than CMS does, because one no longer even has the
*soft* guarantee that a CMS cycle will allow them to be freed.

Now in addition, I said G1 had the *potential* to eliminate full
GC pauses. I say potential because it's still very possible to have
workloads that cause it to effectively fail.

In particular, whenever I try to stress it I run into problems where
the tracking of inter-region pointers doesn't scale with lots of
inter-region writes. The remembered set scanning costs for regions
thus go *WAY* up to the point where regions are never collected.
Eventually as you rack up more such regions, you end up taking a full
GC anyway. Todd Lipcon seemed to have the very same problem when
trying to mitigate GC issues with HBase. For more details, there's the
"G1GC Full GCs" thread on hotspot-gc-dev/hotspot-gc-use. Unfortunately
I can't provide a link because I haven't found an ML archive that
properly reconstructs threads for that list...

I don't know whether this particular problem would in fact be an issue
for Cassandra. Extended long-term testing would probably be required
under real workloads of different kinds to determine whether G1 seems
suitable in its current condition.

-- 
/ Peter Schuller


Re: cassandra and G1 gc

2011-03-09 Thread Peter Schuller
> I don't know whether this particular problem would in fact be an issue
> for Cassandra. Extended long-term testing would probably be required
> under real workloads of different kinds to determine whether G1 seems
> suitable in its current condition.

But honestly, for any case where you have a large key cache and row
cache I fully expect there to be issues. A large LRU is exactly the
type of workload where I seem to consistently break it...

-- 
/ Peter Schuller


Re: Removing a node ...

2011-03-09 Thread Sasha Dolgy
Hi,

Checked logs on the node where the decommission command was performed and on
other nodes, and no error messages.  Just info messages.  Although the
behaviour and circumstances match exactly, I'm wondering if
https://issues.apache.org/jira/browse/CASSANDRA-2072 has something to
do with it.  Again, this is with 0.7.0 ... Some more details on what I
did:

$CASSANDRA_HOME/bin/nodetool -h 10.0.0.1 decommission

When I view the ring from another node:


10.0.0.1Down   Leaving 218.71 KB   21.76%
61078635599166706937511052402724559481

I see this message.  Great ... but after an hour of waiting, I give up
and try to force the removal of the token:

nodetool -h 10.0.0.2 removetoken 61078635599166706937511052402724559481
Exception in thread "main" java.lang.UnsupportedOperationException:
Node /10.0.0.1 is already being removed.

Ok then... this is interesting:

nodetool -h 10.0.0.2 removetoken status
RemovalStatus: No token removals in process.

I don't get it.  How do I gracefully remove a node?  Finally, I killed
the node on 10.0.0.1 and removed its data.  Ungraceful.  I then went
to the other nodes, still couldn't force its removal.  Started the
node back up on 10.0.0.1 and it rejoined the cluster ... with data
spread evenly around.  Not exactly what I wanted ... oh well

I'm sure I've missed a concept.  So, now that I have a 3 node cluster
working and balanced, I turn off cassandra on 10.0.0.1 and check the
ring from another node:

nodetool -h 10.0.0.2 ring

10.0.0.3  Up Normal  224.21 KB   40.78%
24053088190195663439419935163232881936
10.0.0.1Down   Normal  213.51 KB   36.78%
86624712919272143003828971968762407027
10.0.0.2Up Normal  244.42 KB   22.44%
124804735337540159479107746638263794797

Now, to try and remove that node by removing the token:

nodetool -h 10.0.0.1 removetoken 86624712919272143003828971968762407027

Job done, the node is gone...

nodetool -h 10.0.0.2 ring
10.0.0.3  Up Normal  224.21 KB   40.78%
24053088190195663439419935163232881936
10.0.0.2Up Normal  244.42 KB   59.22%
124804735337540159479107746638263794797

-sd


On Wed, Mar 9, 2011 at 8:43 PM, Jonathan Ellis  wrote:
>
> Did you check log for errors?
>
> (ASF mailing lists prefer plain text; html has a higher spam score.)
>
> On Wed, Mar 9, 2011 at 11:59 AM, Sasha Dolgy  wrote:
> > mail servers keep catching up in spam filters.  Something about being a
> > loyal gmail user ...!
> > Let's try this again:
> >
> > further to this ...using cassandra 0.7.0


Re: problem with bootstrap

2011-03-09 Thread aaron morton
The definition of "down" is important here. 

Down refers to a node that has joined the ring, so the other nodes know of its 
existence and the range it is storing, but which is not responding to gossip 
messages. While it is down it is still considered an endpoint. The error you 
and Patrik saw refers to the number of endpoints in the ring, not the number of 
Up nodes. When doing dev I have a 2-node cluster on my laptop with RF=2; it's 
fine to bring the nodes in the cluster up one at a time. 

The issue I think you and Patrik are seeing occurs when you *remove* nodes from 
the ring. The ring does not know if they are up or down. E.g. you have a ring 
of 3 nodes, and add a keyspace with RF 3. Then for whatever reason 2 nodes are 
removed from the ring. When bootstrapping a node into this ring it will fail 
because it detects the cluster does not have enough *endpoints* (different to 
up nodes) to support the keyspace. 

One thing I want to double check is that the node doing the bootstrap considers 
itself when calculating the number of endpoints. Some of the things you and 
Patrik said about bootstrapping node 3 into a ring of 3 with RF=3 made me want 
to check. 

IMHO bootstrapping is the process of pulling data the *new* node is responsible 
for from other nodes in the ring. This is different to joining the ring. 

Hope that helps.
Aaron


On 9/03/2011, at 10:54 AM, mcasandra wrote:

> I think this is not the right functionality and it is really odd that you can't
> successfully bring it online without turning off bootstrap BUT you can bring
> it online by turning auto_bootstrap off and then running nodetool repair
> afterwards.
> 
> Also, if that's the case then when one node goes down, say out of 3 one node
> goes down then should cassandra eject other nodes as well?? Why should
> cassandra exit on startup? That node could at least serve other keyspaces
> and alleviate load while returning errors to the client for those keyspaces
> where RF cannot be met. 
> 
> As noted in my other post regarding a similar issue that I reported, I have
> also seen weird behaviour where I had 2 nodes down out of 3 and I was able
> to bring up one of the nodes but not the remaining one. You would think that
> no nodes would come up, but I really think there is a problem here.
> 
> 
> 



Re: removing a node

2011-03-09 Thread aaron morton
Yes.
Dead normally means the machine cannot be started.

Aaron
 
On 10/03/2011, at 6:01 AM, Sasha Dolgy wrote:

> 
> Hi there,
> 
> Wanted to clarify with anyone ... re:  
> http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely
> 
> You can take a node out of the cluster with nodetool decommission to a live 
> node, or nodetool removetoken (to any other machine) to remove a dead one. 
> This will assign the ranges the old node was responsible for to other nodes, 
> and replicate the appropriate data there. If decommission is used, the data 
> will stream from the decommissioned node. If removetoken is used, the data 
> will stream from the remaining replicas.
> 
> If the node is alive and functional, the command to be run from that node is: 
>  nodetool decommission 
> If the node is dead, the command to be run from another node (or all other 
> nodes) is:  nodetool removetoken 
> 
> -sd
> 
> -- 
> Sasha Dolgy
> sasha.do...@gmail.com
> 
> 



Re: problem with bootstrap

2011-03-09 Thread mcasandra
Thanks!

aaron morton wrote:
> 
> 
> The issue I think you and Patrik are seeing occurs when you *remove* nodes
> from the ring. The ring does not know if they are up or down. E.g. you
> have a ring of 3 nodes, and add a keyspace with RF 3. Then for whatever
> reason 2 nodes are removed from the ring. When bootstrapping a node into
> this ring it will fail because it detects the cluster does not have enough
> *endpoints* (different to up nodes) to support the keyspace. 
> 
> 
What causes a node to be removed? All I did was kill -9 and then sudo cassandra
to start the node.



> IMHO bootstrapping is the process of pulling data the *new* node is
> responsible for from other nodes in the ring. This is different to joining
> the ring. 
> 

How is this different from joining the ring? It would be good to see an
example and the difference.



Re: Reducing memory footprint

2011-03-09 Thread aaron morton
Casey, 
 It sounds like the JVM is behaving. Perhaps turn off mmapped disk 
access (disk_access_mode in cassandra.yaml) to double check that the number 
you are seeing as resident does not include the mapped memory?

Aaron
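
A minimal sketch of that change, assuming 0.7's default config layout:

  # in conf/cassandra.yaml, change
  #   disk_access_mode: auto
  # to
  #   disk_access_mode: standard
  # and restart the node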

On 10/03/2011, at 6:36 AM, Jonathan Ellis wrote:

> I edited Peter Schuller's reply last time this came up into a FAQ:
> http://wiki.apache.org/cassandra/FAQ#mmap
> 
> On Wed, Mar 9, 2011 at 11:10 AM, Casey Deccio  wrote:
>> On Sat, Mar 5, 2011 at 7:37 PM, aaron morton 
>> wrote:
>>> 
>>> There is some additional memory usage in the JVM beyond that Heap size, in
>>> the permanent generation. 900mb sounds like too much for that, but you can
>>> check by connecting with JConsole and looking at the memory tab. You can
>>> also check the heap size there to see that it's under the value you've set.
>> 
>> Thanks for the tip!
>> 
>> From JConsole:
>> Heap memory usage: Current 46M; Max 902M
>> Non-Heap memory usage: Current 34M; Max 200MB
>> 
>> Both of these seem reasonable and don't reach the (current) 2.1 GB resident
>> usage I am seeing.
>> 
>>> Check you are using standard disk access (in conf/cassandra.yaml) rather
>>> than memory mapped access. However the memory mapped memory is reported as
>>> virtual memory, not resident. So I'm just mentioning it to be complete.
>> 
>> At the moment, it's set to "auto", but it's a 64-bit machine, so I believe
>> it's using memory-mapped I/O.  Virtual memory usage is 54.6 GB.
>> 
>>> If you think you've configured things correctly and the JVM is not
>>> behaving (which is unlikely) please include some information on the JVM and
>>> OS versions and some hard numbers about what the process is using.
>> 
>> Debian 6.0 using openjdk-6-jre-lib-6b18-1.8.3-2+squeeze1
>> 
>> Thanks for your help.
>> 
>> Casey
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

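For anyone wanting to try Aaron's suggestion, the switch lives in conf/cassandra.yaml. A minimal sketch, assuming the 0.7-era option name:

# force standard (non-mmapped) I/O so that mapped SSTables no longer
# inflate the resident size reported by top/ps
disk_access_mode: standard    # default is auto; mmap and mmap_index_only also exist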


Re: Removing a node ...

2011-03-09 Thread Jonathan Ellis
I think 2072 is something different but you should definitely upgrade
before troubleshooting further.

On Wed, Mar 9, 2011 at 2:00 PM, Sasha Dolgy  wrote:
> Hi,
>
> Checked the logs on the node where the decommission command was performed and
> on the other nodes: no error messages, just info messages.  Although the
> behaviour and circumstances are an exact match, I'm wondering if
> https://issues.apache.org/jira/browse/CASSANDRA-2072 has something to
> do with it.  Again, this is with 0.7.0 ... Some more details on what i
> did:
>
> $CASSANDRA_HOME/bin/nodetool -h 10.0.0.1 decommission
>
> When I view the ring from another node:
>
>
> 10.0.0.1    Down   Leaving 218.71 KB       21.76%
> 61078635599166706937511052402724559481
>
> I see this message.  Great ... but after an hour of waiting, I give up
> and try to force the removal of the token:
>
> nodetool -h 10.0.0.2 removetoken 61078635599166706937511052402724559481
> Exception in thread "main" java.lang.UnsupportedOperationException:
> Node /10.0.0.1 is already being removed.
>
> Ok then... this is interesting:
>
> nodetool -h 10.0.0.2 removetoken status
> RemovalStatus: No token removals in process.
>
> I don't get it.  How do I gracefully remove a node?  Finally, I killed
> the node on 10.0.0.1 and removed its data.  Ungraceful.  I then went
> to the other nodes, still couldn't force its removal.  Started the
> node back up on 10.0.0.1 and it's rejoined the cluster ... with data
> spread evenly around.  Not exactly what I wanted ... oh well
>
> I'm sure I've missed a concept.  So, now that I have a 3 node cluster
> working and balanced, I turn off cassandra on 10.0.0.1 and check the
> ring from another node:
>
> nodetool -h 10.0.0.2 ring
>
> 10.0.0.3  Up     Normal  224.21 KB       40.78%
> 24053088190195663439419935163232881936
> 10.0.0.1    Down   Normal  213.51 KB       36.78%
> 86624712919272143003828971968762407027
> 10.0.0.2    Up     Normal  244.42 KB       22.44%
> 124804735337540159479107746638263794797
>
> Now, to try and remove that node by removing the token:
>
> nodetool -h 10.0.0.1 removetoken 86624712919272143003828971968762407027
>
> Job done, the node is gone...
>
> nodetool -h 10.0.0.2 ring
> 10.0.0.3  Up     Normal  224.21 KB       40.78%
> 24053088190195663439419935163232881936
> 10.0.0.2    Up     Normal  244.42 KB       59.22%
> 124804735337540159479107746638263794797
>
> -sd
>
>
> On Wed, Mar 9, 2011 at 8:43 PM, Jonathan Ellis  wrote:
>>
>> Did you check log for errors?
>>
>> (ASF mailing lists prefer plain text; html has a higher spam score.)
>>
>> On Wed, Mar 9, 2011 at 11:59 AM, Sasha Dolgy  wrote:
> > mail servers keep getting caught in spam filters.  Something about being a
>> > loyal gmail user ...!
>> > Let's try this again:
>> >
> > further to this ... using cassandra 0.7.0
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Removing a node ...

2011-03-09 Thread Sasha Dolgy
Ok ... I have been very hesitant to upgrade from 0.7.0 because I
haven't really had many problems and the comments on 0.7.1 and 0.7.2
weren't that encouraging.

So, testing on 10.0.0.1, I set up 0.7.3 and have it auto-bootstrap:

10.0.0.3  Up     Normal  229.26 KB       40.78%
24053088190195663439419935163232881936
10.0.0.1    Up     Joining 211.32 KB       19.57%
57348442436860668951429860183380413967
10.0.0.2    Up     Normal  249.4 KB        39.65%
124804735337540159479107746638263794797

In the logs on 10.0.0.1 I get the following exceptions:

ERROR [Thread-14] 2011-03-09 21:04:41,021 AbstractCassandraDaemon.java
(line 114) Fatal exception in thread Thread[Thread-14,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.NegativeArraySizeException
at 
org.apache.cassandra.db.ColumnFamilyStore.buildSecondaryIndexes(ColumnFamilyStore.java:375)
at 
org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:159)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
Caused by: java.util.concurrent.ExecutionException:
java.lang.NegativeArraySizeException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.cassandra.db.ColumnFamilyStore.buildSecondaryIndexes(ColumnFamilyStore.java:365)
... 3 more
Caused by: java.lang.NegativeArraySizeException
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:72)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
at 
org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:459)
at org.apache.cassandra.db.Table.access$200(Table.java:56)
at org.apache.cassandra.db.Table$IndexBuilder.build(Table.java:573)
at 
org.apache.cassandra.db.CompactionManager$8.run(CompactionManager.java:892)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [CompactionExecutor:1] 2011-03-09 21:04:41,023
AbstractCassandraDaemon.java (line 114) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.lang.NegativeArraySizeException
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:72)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
at 
org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:459)
at org.apache.cassandra.db.Table.access$200(Table.java:56)
at org.apach

Re: Removing a node ...

2011-03-09 Thread Jonathan Ellis
On Wed, Mar 9, 2011 at 3:11 PM, Sasha Dolgy  wrote:
> Ok ... I have been very hesitant to upgrade from 0.7.0 because I
> haven't really had many problems and the comments on 0.7.1 and 0.7.2
> weren't that encouraging.

0.7.0 is worse.

> So, testing on 10.0.0.1, I set up 0.7.3 and have it auto-bootstrap:
>
> 10.0.0.3  Up     Normal  229.26 KB       40.78%
> 24053088190195663439419935163232881936
> 10.0.0.1    Up     Joining 211.32 KB       19.57%
> 57348442436860668951429860183380413967
> 10.0.0.2    Up     Normal  249.4 KB        39.65%
> 124804735337540159479107746638263794797
>
> In the logs on 10.0.0.1 I get the following exceptions:

https://issues.apache.org/jira/browse/CASSANDRA-2283

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Removing a node ...

2011-03-09 Thread Sasha Dolgy
Nice ... thanks, Jonathan, for the quick feedback.

I suppose then, for the time being, until the fix for 2283 is applied,
I'll live with no decommission and my dirty way of removing a node.

-sd

On Wed, Mar 9, 2011 at 10:24 PM, Jonathan Ellis  wrote:
> On Wed, Mar 9, 2011 at 3:11 PM, Sasha Dolgy  wrote:
>> Ok ... I have been very hesitant to upgrade from 0.7.0 because I
>> haven't really had many problems and the comments on 0.7.1 and 0.7.2
>> weren't that encouraging.
>
> 0.7.0 is worse.
>
>> So, testing on 10.0.0.1, I set up 0.7.3 and have it auto-bootstrap:
>>
>> 10.0.0.3  Up     Normal  229.26 KB       40.78%
>> 24053088190195663439419935163232881936
>> 10.0.0.1    Up     Joining 211.32 KB       19.57%
>> 57348442436860668951429860183380413967
>> 10.0.0.2    Up     Normal  249.4 KB        39.65%
>> 124804735337540159479107746638263794797
>>
>> In the logs on 10.0.0.1 I get the following exceptions:
>
> https://issues.apache.org/jira/browse/CASSANDRA-2283


understanding tombstones

2011-03-09 Thread Jeffrey Wang
Hey all,

I was wondering if this is the expected behavior of deletes (0.7.0). Let's say 
I have a 1-node cluster with a single CF which has gc_grace_seconds = 0. The 
following sequence of operations happens (in the given order):

insert row X with timestamp T
delete row X with timestamp T+1
force flush + compaction
insert row X with timestamp T

My understanding is that the tombstone created by the delete (and row X) will 
disappear with the flush + compaction which means the last insertion should 
show up. My experimentation, however, suggests otherwise (the last insertion 
does not show up).

I believe I have traced this to the fact that the markedForDeleteAt field on 
the ColumnFamily does not get reset after a compaction (after gc_grace_seconds 
has passed); is this desirable? I think it introduces an inconsistency in how 
tombstoned columns work versus tombstoned CFs. Thanks.

-Jeffrey

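Jeffrey's sequence is straightforward to reproduce against a single node. A rough sketch using the 0.7 Thrift API (untested; the keyspace and column family names are placeholders, and the flush/compaction in step 3 happens out of band via nodetool):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class TombstoneRepro {
    public static void main(String[] args) throws Exception {
        TTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
        tr.open();
        client.set_keyspace("Keyspace1");               // placeholder keyspace

        ByteBuffer key = ByteBuffer.wrap("X".getBytes("UTF-8"));
        ByteBuffer name = ByteBuffer.wrap("c".getBytes("UTF-8"));
        ByteBuffer value = ByteBuffer.wrap("v".getBytes("UTF-8"));
        long t = System.currentTimeMillis() * 1000;     // timestamp T (microseconds)

        // 1. insert row X with timestamp T
        client.insert(key, new ColumnParent("Standard1"),
                      new Column(name, value, t), ConsistencyLevel.ONE);

        // 2. delete row X with timestamp T+1 (row-level tombstone)
        client.remove(key, new ColumnPath("Standard1"), t + 1, ConsistencyLevel.ONE);

        // 3. out of band: nodetool flush + nodetool compact; with gc_grace_seconds = 0
        //    this should purge both row X and its tombstone

        // 4. re-insert row X with the *original* timestamp T; per the thread (and
        //    CASSANDRA-2305) the row surprisingly stays invisible to reads
        client.insert(key, new ColumnParent("Standard1"),
                      new Column(name, value, t), ConsistencyLevel.ONE);

        tr.close();
    }
}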


Understanding index builds

2011-03-09 Thread Matt Kennedy
I'm trying to gain some insight into what happens with a cluster when
indexes are being built, or when CFs with indexed columns are being written
to.

Over the past couple of days we've been doing some loads into a CF with 29
indexed columns.  Eventually, the nodes just got overwhelmed and the client
(Hector) started getting timeouts.  We were using a MapReduce job to
load an HDFS file into Cassandra, though we had limited the load job to one
task per node.  My confusion comes from how difficult it was to know that
the nodes were becoming overwhelmed.  The ring consistently reported that
all nodes were up and it did not appear that there were pending operations
under tpstats.  I also monitor this cluster with Ganglia, and at no point
did any of the machine loads appear very high at all, yet our job kept
failing with Hector reporting timeouts.

Today we decided to leave index creation until the end, and just load the
data using the same Hector code.  We bumped up the hadoop concurrency to two
concurrent tasks per node, and everything went fine, as expected; we've done
much larger loads than this using Hadoop, and as long as you don't shoot for
too much concurrency, Cassandra can deal with it.  So now we have the data
in the column family and I updated the column family metadata in the CLI to
enable the 29 indexes.  As soon as I do that, the ring starts reporting that
nodes are down intermittently, and HintedHandoffs are starting to accumulate
under tpstats. Ganglia is reporting very low overall load, so I'm wondering
why it's taking so long for cli and nodetool commands to return.

I'm just trying to get a better handle on what kind of actions have a
serious impact on cluster availability and to know the right places to look
to try to get ahead of those conditions.

Thanks for any insight you can provide,
Matt


Re: problem with bootstrap

2011-03-09 Thread aaron morton
Bootstrapping uses the same mechanism as a repair to stream data from other
nodes. This can be a heavyweight process and you may want to control when it
starts.

Joining the ring just tells the other nodes that you exist and what your token is.


A
On 10/03/2011, at 9:27 AM, mcasandra wrote:

> Thanks!
> 
> aaron morton wrote:
>> 
>> 
>> The issue I think you and Patrik are seeing occurs when you *remove* nodes
>> from the ring. The ring does not know if they are up or down. E.g. you
>> have a ring of 3 nodes, and add a keyspace with RF 3. Then for whatever
>> reason 2 nodes are removed from the ring. When bootstrapping a node into
>> this ring it will fail because it detects the cluster does not have enough
>> *endpoints* (different to up nodes) to support the keyspace. 
>> 
>> 
> What causes a node to be removed? All I did was kill -9 and then sudo cassandra
> to start the node.
> 
> 
> 
>> IMHO bootstrapping is the process of pulling data the *new* node is
>> responsible for from other nodes in the ring. This is different to joining
>> the ring. 
>> 
> 
> How is this different from joining the ring? It would be good to see an
> example illustrating the difference.
> 

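Aaron's distinction maps directly onto configuration. A minimal sketch of the relevant cassandra.yaml settings on a new node, assuming the 0.7-era option names:

auto_bootstrap: true    # join the ring AND stream in the data for the node's new range
initial_token:          # optionally pin the token rather than letting the node pick one

# with auto_bootstrap: false the node merely joins the ring (announces its token)
# and owns its range with no data until a repair pulls that data across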


Re: problem with bootstrap

2011-03-09 Thread mcasandra

aaron morton wrote:
> 
> 
> The issue I think you and Patrik are seeing occurs when you *remove* nodes
> from the ring. The ring does not know if they are up or down. E.g. you
> have a ring of 3 nodes, and add a keyspace with RF 3. Then for whatever
> reason 2 nodes are removed from the ring. When bootstrapping a node into
> this ring it will fail because it detects the cluster does not have enough
> *endpoints* (different to up nodes) to support the keyspace. 
> 
> 
In your previous post you mentioned that the node got removed. I am trying
to understand what that really means and what causes a node to be removed. All I
did was kill -9 and then sudo cassandra to start the node.





Bug in fix for #2296?

2011-03-09 Thread Jason Harvey
I applied the #2296 patch and retried a scrub. Now I'm getting thousands
of the following:

java.io.IOException: Keys must be written in ascending order.
at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:111)
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:128)
at 
org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:598)
at 
org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
at 
org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)


Re: Understanding index builds

2011-03-09 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2294
https://issues.apache.org/jira/browse/CASSANDRA-2295

On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy  wrote:
> I'm trying to gain some insight into what happens with a cluster when
> indexes are being built, or when CFs with indexed columns are being written
> to.
>
> Over the past couple of days we've been doing some loads into a CF with 29
> indexed columns.  Eventually, the nodes just got overwhelmed and the client
> (Hector) started getting timeouts.  We were using a MapReduce job to
> load an HDFS file into Cassandra, though we had limited the load job to one
> task per node.  My confusion comes from how difficult it was to know that
> the nodes were becoming overwhelmed.  The ring consistently reported that
> all nodes were up and it did not appear that there were pending operations
> under tpstats.  I also monitor this cluster with Ganglia, and at no point
> did any of the machine loads appear very high at all, yet our job kept
> failing with Hector reporting timeouts.
>
> Today we decided to leave index creation until the end, and just load the
> data using the same Hector code.  We bumped up the hadoop concurrency to two
> concurrent tasks per node, and everything went fine, as expected; we've done
> much larger loads than this using Hadoop, and as long as you don't shoot for
> too much concurrency, Cassandra can deal with it.  So now we have the data
> in the column family and I updated the column family metadata in the CLI to
> enable the 29 indexes.  As soon as I do that, the ring starts reporting that
> nodes are down intermittently, and HintedHandoffs are starting to accumulate
> under tpstats. Ganglia is reporting very low overall load, so I'm wondering
> why it's taking so long for cli and nodetool commands to return.
>
> I'm just trying to get a better handle on what kind of actions have a
> serious impact on cluster availability and to know the right places to look
> to try to get ahead of those conditions.
>
> Thanks for any insight you can provide,
> Matt
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Modeling Multi-Valued Fields

2011-03-09 Thread Cameron Leach
Is there a best-practice for modeling multi-valued fields (fields that are
repeated or collections of fields)? Our current data model allows for a User
to store multiple email addresses:

User {
  Integer id; //row key
  List<Email> emails;

  Email {
String type; //home, work, gmail, hotmail, etc...
String address;
  }
}

So if I set up a 'User' column family with an 'Email' super column, how would
one support multiple email addresses, storing values for the 'type' and
'address' column names? I've seen it suggested to use dynamic column names,
but this doesn't seem practical unless someone can make it clearer how
that strategy would work.

Thanks!

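One common answer is to make 'User' a super column family where each email gets its own super column (named by a type, an address, or a UUID), with 'type' and 'address' as subcolumns. A rough Hector sketch (0.7-era API, untested; the cluster, keyspace, and serializer choices are assumptions):

import java.util.Arrays;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HSuperColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class UserEmails {
    public static void main(String[] args) {
        StringSerializer ss = StringSerializer.get();
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace ksp = HFactory.createKeyspace("MyKeyspace", cluster);   // assumed keyspace
        Mutator<Integer> m = HFactory.createMutator(ksp, IntegerSerializer.get());

        // one super column per email; the subcolumns carry the fields
        HSuperColumn<String, String, String> email = HFactory.createSuperColumn(
                "home",                                  // super column name: type, address or UUID
                Arrays.asList(HFactory.createColumn("type", "home", ss, ss),
                              HFactory.createColumn("address", "user@example.com", ss, ss)),
                ss, ss, ss);
        m.insert(42, "User", email);                     // row key = user id, super CF "User"
    }
}

Reading it back is symmetric: fetch the row, iterate its super columns, and unpack the 'type' and 'address' subcolumns from each.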

Re: understanding tombstones

2011-03-09 Thread Jonathan Ellis
On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang  wrote:
> insert row X with timestamp T
> delete row X with timestamp T+1
> force flush + compaction
> insert row X with timestamp T
>
> My understanding is that the tombstone created by the delete (and row X)
> will disappear with the flush + compaction which means the last insertion
> should show up.

Right.

> I believe I have traced this to the fact that the markedForDeleteAt field on
> the ColumnFamily does not get reset after a compaction (after
> gc_grace_seconds has passed); is this desirable? I think it introduces an
> inconsistency in how tombstoned columns work versus tombstoned CFs. Thanks.

That does sound like a bug.  Can you create a ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: understanding tombstones

2011-03-09 Thread Jeffrey Wang
Yup. https://issues.apache.org/jira/browse/CASSANDRA-2305

-Jeffrey

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, March 09, 2011 6:19 PM
To: user@cassandra.apache.org
Subject: Re: understanding tombstones

On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang  wrote:
> insert row X with timestamp T
> delete row X with timestamp T+1
> force flush + compaction
> insert row X with timestamp T
>
> My understanding is that the tombstone created by the delete (and row X)
> will disappear with the flush + compaction which means the last insertion
> should show up.

Right.

> I believe I have traced this to the fact that the markedForDeleteAt field on
> the ColumnFamily does not get reset after a compaction (after
> gc_grace_seconds has passed); is this desirable? I think it introduces an
> inconsistency in how tombstoned columns work versus tombstoned CFs. Thanks.

That does sound like a bug.  Can you create a ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Exception when running a clean up

2011-03-09 Thread Stu King
I am seeing this exception when I am trying to run a cleanup. I want to
decommission the node after the cleanup.

java.util.concurrent.ExecutionException: java.io.IOError:
java.io.EOFException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at
org.apache.cassandra.db.CompactionManager.performCleanup(CompactionManager.java:180)
at
org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:909)
at
org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOError: java.io.EOFException
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
at
org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:418)
at
org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:54)
at
org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:171)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
... 3 more
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
at
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
... 8 more


Re: Exception when running a clean up

2011-03-09 Thread aaron morton
What version of cassandra are you using and what is the upgrade history for the 
cluster?
Aaron

On 10/03/2011, at 8:24 PM, Stu King wrote:

> I am seeing this exception when I am trying to run a cleanup. I want to 
> decommission the node after the cleanup.
> 
> java.util.concurrent.ExecutionException: java.io.IOError: java.io.EOFException
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>   at 
> org.apache.cassandra.db.CompactionManager.performCleanup(CompactionManager.java:180)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:909)
>   at 
> org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1127)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>   at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>   at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>   at sun.rmi.transport.Transport$1.run(Transport.java:177)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>   at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.IOError: java.io.EOFException
>   at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
>   at 
> org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:418)
>   at 
> org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:54)
>   at 
> org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:171)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   ... 3 more
> Caused by: java.io.EOFException
>   at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
>   at 
> org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
>   at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
>   at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
>   at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
>   at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
>   ... 8 more