Re: Cluster fragility

2010-11-12 Thread Dave Gardner
We never have to reboot our production cluster. However, we're not
running a beta version but a release version (0.6.6). If your aim is
to avoid fragility, running a release version would seem a sensible
starting point.

dave

On Friday, November 12, 2010, Reverend Chip  wrote:
> I've been running tests with a first four-node, then eight-node
> cluster.  I started with 0.7.0 beta3, but have since updated to a more
> recent Hudson build.  I've been happy with a lot of things, but I've had
> some really surprisingly unpleasant experiences with operational fragility.
>
> For example, when adding four nodes to a four-node cluster (at 2x
> replication), I had two nodes that insisted they were streaming data,
> but no progress was made in the stream for over a day (this was with
> beta3).  I had to reboot the cluster to clear that condition.  For the
> purpose of making progress on other tests I decided just to reload the
> data at eight-wide (with the more recent build), but if I had data I
> couldn't reload or the cluster were serving in production, that would
> have been a very inconvenient failure.
>
> I also had a node that refused to bootstrap immediately, but after I
> waited a day, it finally got its act together.
>
> I write this, not to complain per se, but to ask whether these failures
> are known & expected, and rebooting a cluster is just a Thing You Have
> To Do once in a while; or if not, what techniques can be used to clear
> such cluster topology and streaming/replication problems without rebooting.
>
>

-- 
*Dave Gardner*
Technical Architect

*Imagini Europe Limited*
7 Moor Street, London W1D 5NB

Phone: +44 20 7734 7033
Skype: daveg79
Email: dave.gard...@imagini.net
Web: http://www.visualdna.com

Imagini Europe Limited, Company number 5565112 (England
and Wales), Registered address: c/o Bird & Bird,
90 Fetter Lane, London, EC4A 1EQ, United Kingdom


RE: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns

2010-11-12 Thread Aditya Muralidharan
Thanks for the response. We're trying to get a general idea of the insert and 
retrieval performance, and we figured BinaryMemtable would be a great enabler 
for our bulk import scenarios. Normal thrift inserts are certainly fast, but it 
would be nice to get an idea of how BMT could improve our throughput.

Are you able to share some general performance numbers for thrift/avro/bmt?

Thanks.

AD


-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 11, 2010 8:23 PM
To: user
Subject: Re: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns

Before you dig into that, are you sure normal Thrift inserts are not
fast enough?

On Thu, Nov 11, 2010 at 4:41 PM, Aditya Muralidharan wrote:
> Pretty sure I could ask that better:
>
> Is it possible for me to perform RowMutations on BinaryMemtable for a
> ColumnFamily of type Super?
>
> The bmt_example seems to say that it's possible, but cassandra 0.7 b3 seems
> to disagree with the following:
>
> ERROR [MutationStage:38] 2010-11-11 13:47:37,383 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:54)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
>     at org.apache.cassandra.db.SuperColumn.value(SuperColumn.java:158)
>     at org.apache.cassandra.db.Table.load(Table.java:640)
>     at org.apache.cassandra.db.RowMutation.applyBinary(RowMutation.java:206)
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:44)
>
> The code in the bmt_example serializes the CF for the super columns and sets
> that as column data (made me scratch my head) on a different CF for the
> RowMutation. Attempting that causes the following exception:
>
> Caused by: java.io.IOException: Invalid localDeleteTime read: 0
>     at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:334)
>     at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:291)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>     at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:368)
>     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:378)
>     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:336)
>     at org.apache.cassandra.db.RowMutationMessageSerializer.deserialize(RowMutationMessage.java:84)
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:42)
>
> ... which is basically because the CFSerializer is (rightly) expecting to
> deserialize a super column though the bmt_example serialized a Standard CF.
>
> Any help on BMT with supercolumns would be appreciated.
>
> Thanks.
>
> AD
>
> From: Aditya Muralidharan [mailto:aditya.muralidha...@nisc.coop]
> Sent: Thursday, November 11, 2010 3:27 PM
> To: user@cassandra.apache.org
> Subject: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns
>
> Is it possible for BinaryMemtable RowMutations to a ColumnFamily with
> supercolumns?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns

2010-11-12 Thread Ryan King
On Fri, Nov 12, 2010 at 7:33 AM, Aditya Muralidharan wrote:
> Thanks for the response. We're trying to get a general idea of the insert and 
> retrieval performance, and we figured BinaryMemtable would be a great enabler 
> for our bulk import scenarios. Normal thrift inserts are certainly fast, but 
> it would be nice to get an idea of how BMT could improve our throughput.

Before you go trying to improve it, you should be sure that it needs
improvement.

> Are you able to share some general performance numbers for thrift/avro/bmt?

We've consistently found that BMT was unnecessary. We've always run
into some other limit first.

-ryan
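
Both replies make the same point: measure the plain insert path before reaching for BMT. A minimal sketch of such a measurement follows. It is not taken from the thread; the class name InsertThroughputSketch and the method opsPerSecond are hypothetical, the in-memory map merely stands in for a real client insert call, and the lambda syntax needs a Java version newer than the 2010-era JVMs discussed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

/** Hypothetical timing harness: runs an operation n times and reports operations per second. */
final class InsertThroughputSketch {

    /** Calls op for indices 0..n-1 and returns the measured rate in operations per second. */
    static double opsPerSecond(int n, IntConsumer op) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            op.accept(i);
        }
        // Guard against a zero elapsed time on very fast runs.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return n * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Stand-in workload: a real test would issue the client's insert call here.
        Map<Integer, String> fakeStore = new HashMap<>();
        double rate = opsPerSecond(100_000, i -> fakeStore.put(i, "value-" + i));
        System.out.printf("%.0f inserts/sec (in-memory stand-in)%n", rate);
    }
}
```

Running the same harness against the real insert call, from several client threads, would show whether the insert path or some other limit is reached first.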


Gossip yoyo under write load

2010-11-12 Thread Chip Salzenberg
After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100
write=200 ... they're beefy machines) and put the nodes under load again, I
find gossip reporting yoyo up-down-up-down status for the other nodes.
Anyone know what this is a symptom of, and/or how to avoid it?  I haven't
seen any particular symptoms other than the log messages; and I suppose I'm
also dropping replication MUTATEs, which had been happening already, anyway.

cas001   INFO [ScheduledTasks:1] 2010-11-12 13:00:02,891 Gossiper.java (line 195) InetAddress /X.20 is now dead.
cas001   INFO [GossipStage:1] 2010-11-12 13:00:07,567 Gossiper.java (line 569) InetAddress /X.20 is now UP
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:00:53,662 Gossiper.java (line 195) InetAddress /X.21 is now dead.
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:00:56,967 GCInspector.java (line 133) GC for ParNew: 255 ms, 135668944 reclaimed leaving 18375966648 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:01:01,135 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595661135.log
cas001   INFO [GossipStage:1] 2010-11-12 13:01:08,148 Gossiper.java (line 569) InetAddress /X.21 is now UP
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:01:56,753 GCInspector.java (line 133) GC for ParNew: 268 ms, 132609096 reclaimed leaving 20102566032 used; max is 34557919232
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:01:57,771 GCInspector.java (line 133) GC for ParNew: 274 ms, 115223104 reclaimed leaving 20214228560 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:02:14,746 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595734746.log
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:03:02,868 GCInspector.java (line 133) GC for ParNew: 297 ms, 62163960 reclaimed leaving 22200082216 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:03:29,123 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595809123.log
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:04:09,626 GCInspector.java (line 133) GC for ParNew: 321 ms, 125585880 reclaimed leaving 24138058936 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:04:44,852 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595884852.log
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:05:13,695 GCInspector.java (line 133) GC for ParNew: 242 ms, 126754312 reclaimed leaving 26019407576 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:06:01,941 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595961941.log
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:06:25,617 GCInspector.java (line 133) GC for ParNew: 307 ms, 134631824 reclaimed leaving 11283839952 used; max is 34557919232
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:06:37,032 Gossiper.java (line 195) InetAddress /X.18 is now dead.
cas001   INFO [GossipStage:1] 2010-11-12 13:06:38,666 Gossiper.java (line 569) InetAddress /X.18 is now UP
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:07:23,417 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289596043417.log
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:07:33,034 GCInspector.java (line 133) GC for ParNew: 231 ms, 108391848 reclaimed leaving 13146098816 used; max is 34557919232
cas001   INFO [MutationStage:169] 2010-11-12 13:08:12,548 ColumnFamilyStore.java (line 580) switching in a fresh Memtable for TestAttrs at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1289596043417.log', position=84947614)
cas001   INFO [MutationStage:169] 2010-11-12 13:08:12,549 ColumnFamilyStore.java (line 879) Enqueuing flush of memtable-testat...@1721243764(293461476 bytes, 8388614 operations)
cas001   INFO [FlushWriter:1] 2010-11-12 13:08:12,549 Memtable.java (line 155) Writing memtable-testat...@1721243764(293461476 bytes, 8388614 operations)
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:08:40,628 GCInspector.java (line 133) GC for ParNew: 278 ms, 135521080 reclaimed leaving 15121172544 used; max is 34557919232
cas001   INFO [COMMIT-LOG-WRITER] 2010-11-12 13:08:42,349 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289596122349.log
cas001   INFO [FlushWriter:1] 2010-11-12 13:09:20,586 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/Attrs/TestAttrs-e-305-Data.db (954244537 bytes)
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:09:31,873 GCInspector.java (line 133) GC for ParNew: 228 ms, 201874960 reclaimed leaving 16512318064 used; max is 34557919232
cas001   INFO [ScheduledTasks:1] 2010-11-12 13:09:47,127 GCInspector.java (line 133) GC for ParNew: 266 ms, 123236640 reclaimed leaving 17009227136 used; max is 34557919232
cas0

Re: Backup Strategy

2010-11-12 Thread Rob Coli

On 11/9/10 5:15 AM, Wayne wrote:

> We are trying to use snapshots etc. to back up the data
> but it is slow (hours) and slows down the entire node.


The snapshot process (as I understand it, and with the caveat that this 
is the code path without JNA available) first flushes all memtables 
(this can take a while, and can trigger minor compaction) and then does 
the following per SSTable :


a) flushes all memtables ()
b) fork process (this can take a while depending on heap size)
c) ln /path/to/SSTable-etc.db /path/to/snapshot

In general this process should not take "hours". Are you perhaps in a
case where you have a very large number of SSTable files in a dir and
are not using JNA? I have seen snapshots lag in those circumstances, but
those circumstances were usually pathological.


=Rob
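
The hard-link step (c) in the reply above can be illustrated in isolation. The sketch below is not Cassandra's snapshot code; SnapshotLinkSketch and linkAll are hypothetical names, and it uses java.nio.file.Files.createLink, which arrived in Java 7, after this thread. It only shows why the link step itself should be cheap: a hard link adds a directory entry and copies no data.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a snapshot built from hard links. */
final class SnapshotLinkSketch {

    /** Hard-links every regular file in dataDir into snapshotDir and returns the links created. */
    static List<Path> linkAll(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        List<Path> links = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories such as the snapshot directory itself
                }
                Path link = snapshotDir.resolve(file.getFileName());
                Files.createLink(link, file); // same effect as `ln file link`
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Made-up file name and contents, for demonstration only.
        Path dataDir = Files.createTempDirectory("data");
        Files.write(dataDir.resolve("Example-1-Data.db"), new byte[] {1, 2, 3});
        Path snapshotDir = dataDir.resolve("snapshots").resolve("demo");
        System.out.println(linkAll(dataDir, snapshotDir));
    }
}
```

Because each link is a metadata operation, the time goes into the flush and, without JNA, into forking a process per file, which matches the reply's point that a very large number of SSTable files is what makes snapshots lag.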


[RELEASE] 0.6.8

2010-11-12 Thread Eric Evans

Greetings,

I have some bad news, and some good news.

The Bad News is that a regression[1] made its way into our latest
release, 0.6.7.  Sorry about that, we try really hard to keep that from
happening, but every once in a while one sneaks through.

The Good News is that it's been fixed and we've expedited a new release.
Since this regression breaks read-repair, we recommend you upgrade to
0.6.8 right away.

As usual, links to binary and source archives are available from the
Downloads page[3], and packages for Debian-based systems are available
from our repo[4].

[1]: https://issues.apache.org/jira/browse/CASSANDRA-1727
[2]: http://goo.gl/iTJHD (CHANGES.txt)
[3]: http://cassandra.apache.org/download
[4]: http://wiki.apache.org/cassandra/DebianPackaging

-- 
Eric Evans
eev...@rackspace.com



using SimpleAuthenticator is not working

2010-11-12 Thread Alaa Zubaidi

SimpleAuthenticator is not working for me in beta 3.

I am doing the following:

· In cassandra.yaml, set
authenticator: org.apache.cassandra.auth.SimpleAuthenticator
· Add the username and password to passwd.properties
· Add the username to the keyspace and column family permissions in access.properties
· Add the paths for passwd.properties and access.properties to cassandra.bat:
set CASSANDRA_PARAMS=-Dcassandra -Dcassandra-foreground=yes
-Dpasswd.properties=E:\Cassandra\Cass07b3\apache-cassandra-0.7.0-beta3\conf\passwd.properties
-Daccess.properties=E:\Cassandra\Cass07b3\apache-cassandra-0.7.0-beta3\conf\access.properties

· Use login() to log in to Cassandra in the application:
Map<String, String> creds = new HashMap<String, String>();
creds.put("user1", "pwd1");
AuthenticationRequest auth = new AuthenticationRequest(creds);
_client.login(auth);

It's giving me an error with no message, and if I try to do anything
it gives me an "I am not logged in" error.

Is there anything I am missing?


Thanks,
--
Alaa Zubaidi
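
One thing worth checking in the login snippet above: it stores the user name as the map key and the password as the map value. The sketch below assumes instead that SimpleAuthenticator looks the credentials up under the fixed keys "username" and "password". That is an assumption to verify against the SimpleAuthenticator source of the build in use, and LoginCredentialsSketch is a hypothetical helper, not part of Cassandra.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds the credentials map for a login request. */
final class LoginCredentialsSketch {

    // Assumed key names; check them against SimpleAuthenticator in your Cassandra version.
    static final String USERNAME_KEY = "username";
    static final String PASSWORD_KEY = "password";

    /** Returns a map with the user name and password stored under the assumed fixed keys. */
    static Map<String, String> credentials(String user, String password) {
        Map<String, String> creds = new HashMap<>();
        creds.put(USERNAME_KEY, user);
        creds.put(PASSWORD_KEY, password);
        return creds;
    }

    public static void main(String[] args) {
        Map<String, String> creds = credentials("user1", "pwd1");
        // Print only the keys so the password value is not echoed.
        System.out.println(creds.keySet());
        // With the Thrift client from the post, the map would then be passed on:
        //   _client.login(new AuthenticationRequest(creds));
    }
}
```

If the assumption holds, a map keyed by the user name itself would contain neither expected key, which could explain a failed login followed by "not logged in" errors.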


Re: [RELEASE] 0.6.8

2010-11-12 Thread Schubert Zhang
Thanks.

The tag 0.6.8 is not available in SVN

On Sat, Nov 13, 2010 at 8:02 AM, Eric Evans  wrote:

>
> Greetings,
>
> I have some bad news, and some good news.
>
> The Bad News is that a regression[1] made its way into our latest
> release, 0.6.7.  Sorry about that, we try really hard to keep that from
> happening, but every once in a while one sneaks through.
>
> The Good News is that it's been fixed and we've expedited a new release.
> Since this regression breaks read-repair, we recommend you upgrade to
> 0.6.8 right away.
>
> As usual, links to binary and source archives are available from the
> Downloads page[3], and packages for Debian-based systems are available
> from our repo[4].
>
> [1]: https://issues.apache.org/jira/browse/CASSANDRA-1727
> [2]: http://goo.gl/iTJHD (CHANGES.txt)
> [3]: http://cassandra.apache.org/download
> [4]: http://wiki.apache.org/cassandra/DebianPackaging
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Cluster fragility

2010-11-12 Thread Jonathan Ellis
These are not expected.  In order of increasing usefulness for fixing
them, we could use:

 - INFO level logs from when something went wrong; when streaming,
both source and target
 - DEBUG level logs
 - instructions for how to reproduce

On Thu, Nov 11, 2010 at 7:46 PM, Reverend Chip  wrote:
> I've been running tests with a first four-node, then eight-node
> cluster.  I started with 0.7.0 beta3, but have since updated to a more
> recent Hudson build.  I've been happy with a lot of things, but I've had
> some really surprisingly unpleasant experiences with operational fragility.
>
> For example, when adding four nodes to a four-node cluster (at 2x
> replication), I had two nodes that insisted they were streaming data,
> but no progress was made in the stream for over a day (this was with
> beta3).  I had to reboot the cluster to clear that condition.  For the
> purpose of making progress on other tests I decided just to reload the
> data at eight-wide (with the more recent build), but if I had data I
> couldn't reload or the cluster were serving in production, that would
> have been a very inconvenient failure.
>
> I also had a node that refused to bootstrap immediately, but after I
> waited a day, it finally got its act together.
>
> I write this, not to complain per se, but to ask whether these failures
> are known & expected, and rebooting a cluster is just a Thing You Have
> To Do once in a while; or if not, what techniques can be used to clear
> such cluster topology and streaming/replication problems without rebooting.
>
>





Re: Gossip yoyo under write load

2010-11-12 Thread Jonathan Ellis
On Fri, Nov 12, 2010 at 3:19 PM, Chip Salzenberg  wrote:
> After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100
> write=200 ... they're beefy machines), and putting them under load again, I
> find gossip reporting yoyo up-down-up-down status for the other nodes.
>  Anyone know what this is a symptom of, and/or how to avoid it?

It means "the system is too overloaded to process gossip data in a
timely manner."  Usually this means GC storming, but that does not look
like the problem here.  Swapping is a less frequent offender.  Since you
are seeing this after bumping to extremely high thread counts, I would
guess context switching might be a factor.

What are tpstats?

>  I haven't
> seen any particular symptoms other than the log messages; and I suppose I'm
> also dropping replication MUTATEs which had been happening already, anyway.

I don't see any WARN lines about that, did you elide them?
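
The judgement that GC is not the culprit can be checked mechanically against the GCInspector lines in the posted log, where the ParNew pauses range from 228 ms to 321 ms. The sketch below is a hypothetical helper (GcLineSketch is not a Cassandra tool) that pulls the pause time and heap occupancy out of one such line. It assumes the mail-wrapped log lines have been rejoined into single lines and that the message format matches the one shown in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the GCInspector messages quoted in this thread. */
final class GcLineSketch {

    // Matches the message part of a GCInspector line as it appears in the posted log.
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\w+): (\\d+) ms, (\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    /** Returns {pauseMillis, usedBytes, maxBytes}, or null if the line is not a GC report. */
    static long[] parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(4)),
            Long.parseLong(m.group(5))
        };
    }

    /** Fraction of the maximum heap still in use after the collection. */
    static double heapUsedFraction(long[] parsed) {
        return (double) parsed[1] / parsed[2];
    }

    public static void main(String[] args) {
        // One line copied from the log in this thread, with the mail wrapping removed.
        String sample = "cas001   INFO [ScheduledTasks:1] 2010-11-12 13:00:56,967 "
                + "GCInspector.java (line 133) GC for ParNew: 255 ms, 135668944 reclaimed "
                + "leaving 18375966648 used; max is 34557919232";
        long[] gc = parse(sample);
        System.out.printf("pause=%d ms, heap used=%.0f%%%n", gc[0], 100 * heapUsedFraction(gc));
    }
}
```

Applied to a whole log, the same parse makes it easy to see whether long pauses line up with the "is now dead" messages.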



Re: [RELEASE] 0.6.8

2010-11-12 Thread Eric Evans
On Sat, 2010-11-13 at 10:21 +0800, Schubert Zhang wrote:
> Thanks.
> 
> The tag 0.6.8 is not available in SVN 

It's there now.  Thanks for letting me know.

-- 
Eric Evans
eev...@rackspace.com