Re: Cluster fragility
We never have to reboot our production cluster. However, we're not running a beta version but a release version (0.6.6). If your aim is to avoid fragility, running a release version would seem a sensible starting point. dave On Friday, November 12, 2010, Reverend Chip wrote: > I've been running tests with a first four-node, then eight-node > cluster. I started with 0.7.0 beta3, but have since updated to a more > recent Hudson build. I've been happy with a lot of things, but I've had > some really surprisingly unpleasant experiences with operational fragility. > > For example, when adding four nodes to a four-node cluster (at 2x > replication), I had two nodes that insisted they were streaming data, > but no progress was made in the stream for over a day (this was with > beta3). I had to reboot the cluster to clear that condition. For the > purpose of making progress on other tests I decided just to reload the > data at eight-wide (with the more recent build), but if I had data I > couldn't reload or the cluster were serving in production, that would > have been a very inconvenient failure. > > I also had a node that refused to bootstrap immediately, but after I > waited a day, it finally got its act together. > > I write this, not to complain per se, but to ask whether these failures > are known & expected, and rebooting a cluster is just a Thing You Have > To Do once in a while; or if not, what techniques can be used to clear > such cluster topology and streaming/replication problems without rebooting. 
> > -- *Dave Gardner* Technical Architect *Imagini Europe Limited* 7 Moor Street, London W1D 5NB +44 20 7734 7033 Skype: daveg79 dave.gard...@imagini.net http://www.visualdna.com Imagini Europe Limited, Company number 5565112 (England and Wales), Registered address: c/o Bird & Bird, 90 Fetter Lane, London, EC4A 1EQ, United Kingdom
RE: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns
Thanks for the response. We're trying to get a general idea of the insert and retrieval performance, and we figured BinaryMemtable would be a great enabler for our bulk import scenarios. Normal thrift inserts are certainly fast, but it would be nice to get an idea of how BMT could improve our throughput. Are you able to share some general performance numbers for thrift/avro/bmt?

Thanks.
AD

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 11, 2010 8:23 PM
To: user
Subject: Re: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns

Before you dig into that, are you sure normal Thrift inserts are not fast enough?

On Thu, Nov 11, 2010 at 4:41 PM, Aditya Muralidharan wrote:
> Pretty sure I could ask that better:
>
> Is it possible for me to perform RowMutations on BinaryMemtable for a
> ColumnFamily of type Super?
>
> The bmt_example seems to say that it's possible, but cassandra 0.7 b3 seems
> to disagree with the following:
>
> ERROR [MutationStage:38] 2010-11-11 13:47:37,383 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:54)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
>     at org.apache.cassandra.db.SuperColumn.value(SuperColumn.java:158)
>     at org.apache.cassandra.db.Table.load(Table.java:640)
>     at org.apache.cassandra.db.RowMutation.applyBinary(RowMutation.java:206)
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:44)
>
> The code in the bmt_example serializes the CF for the super columns and sets
> that as column data (made me scratch my head) on a different CF for the
> RowMutation. Attempting that causes the following exception:
>
> Caused by: java.io.IOException: Invalid localDeleteTime read: 0
>     at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:334)
>     at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:291)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>     at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:368)
>     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:378)
>     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:336)
>     at org.apache.cassandra.db.RowMutationMessageSerializer.deserialize(RowMutationMessage.java:84)
>     at org.apache.cassandra.db.BinaryVerbHandler.doVerb(BinaryVerbHandler.java:42)
>
> ... which is basically because the CFSerializer is (rightly) expecting to
> deserialize a super column though the bmt_example serialized a Standard CF.
>
> Any help on BMT with supercolumns would be appreciated.
>
> Thanks.
>
> AD
>
> From: Aditya Muralidharan [mailto:aditya.muralidha...@nisc.coop]
> Sent: Thursday, November 11, 2010 3:27 PM
> To: user@cassandra.apache.org
> Subject: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns
>
> Is it possible for BinaryMemtable RowMutations to a ColumnFamily with
> supercolumns?
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Cassandra 0.7 beta3 BinaryMemtable and Supercolumns
On Fri, Nov 12, 2010 at 7:33 AM, Aditya Muralidharan wrote: > Thanks for the response. We're trying to get a general idea of the insert and > retrieval performance, and we figured BinaryMemtable would be a great enabler > for our bulk import scenarios. Normal thrift inserts are certainly fast, but > it would be nice to get an idea of how BMT could improve our throughput. Before you go trying to improve it, you should be sure that it needs improvement. > Are you able to share some general performance numbers for thrift/avro/bmt? We've consistently found that BMT was unnecessary. We've always run into some other limit first. -ryan
Gossip yoyo under write load
After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100 write=200 ... they're beefy machines) and put it under load again, I find gossip reporting yoyo up-down-up-down status for the other nodes. Does anyone know what this is a symptom of, and/or how to avoid it? I haven't seen any particular symptoms other than the log messages; and I suppose I'm also dropping replication MUTATEs, which had been happening already anyway.

cas001 INFO [ScheduledTasks:1] 2010-11-12 13:00:02,891 Gossiper.java (line 195) InetAddress /X.20 is now dead.
cas001 INFO [GossipStage:1] 2010-11-12 13:00:07,567 Gossiper.java (line 569) InetAddress /X.20 is now UP
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:00:53,662 Gossiper.java (line 195) InetAddress /X.21 is now dead.
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:00:56,967 GCInspector.java (line 133) GC for ParNew: 255 ms, 135668944 reclaimed leaving 18375966648 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:01:01,135 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595661135.log
cas001 INFO [GossipStage:1] 2010-11-12 13:01:08,148 Gossiper.java (line 569) InetAddress /X.21 is now UP
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:01:56,753 GCInspector.java (line 133) GC for ParNew: 268 ms, 132609096 reclaimed leaving 20102566032 used; max is 34557919232
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:01:57,771 GCInspector.java (line 133) GC for ParNew: 274 ms, 115223104 reclaimed leaving 20214228560 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:02:14,746 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595734746.log
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:03:02,868 GCInspector.java (line 133) GC for ParNew: 297 ms, 62163960 reclaimed leaving 22200082216 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:03:29,123 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595809123.log
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:04:09,626 GCInspector.java (line 133) GC for ParNew: 321 ms, 125585880 reclaimed leaving 24138058936 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:04:44,852 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595884852.log
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:05:13,695 GCInspector.java (line 133) GC for ParNew: 242 ms, 126754312 reclaimed leaving 26019407576 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:06:01,941 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289595961941.log
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:06:25,617 GCInspector.java (line 133) GC for ParNew: 307 ms, 134631824 reclaimed leaving 11283839952 used; max is 34557919232
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:06:37,032 Gossiper.java (line 195) InetAddress /X.18 is now dead.
cas001 INFO [GossipStage:1] 2010-11-12 13:06:38,666 Gossiper.java (line 569) InetAddress /X.18 is now UP
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:07:23,417 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289596043417.log
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:07:33,034 GCInspector.java (line 133) GC for ParNew: 231 ms, 108391848 reclaimed leaving 13146098816 used; max is 34557919232
cas001 INFO [MutationStage:169] 2010-11-12 13:08:12,548 ColumnFamilyStore.java (line 580) switching in a fresh Memtable for TestAttrs at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1289596043417.log', position=84947614)
cas001 INFO [MutationStage:169] 2010-11-12 13:08:12,549 ColumnFamilyStore.java (line 879) Enqueuing flush of memtable-testat...@1721243764(293461476 bytes, 8388614 operations)
cas001 INFO [FlushWriter:1] 2010-11-12 13:08:12,549 Memtable.java (line 155) Writing memtable-testat...@1721243764(293461476 bytes, 8388614 operations)
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:08:40,628 GCInspector.java (line 133) GC for ParNew: 278 ms, 135521080 reclaimed leaving 15121172544 used; max is 34557919232
cas001 INFO [COMMIT-LOG-WRITER] 2010-11-12 13:08:42,349 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1289596122349.log
cas001 INFO [FlushWriter:1] 2010-11-12 13:09:20,586 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/Attrs/TestAttrs-e-305-Data.db (954244537 bytes)
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:09:31,873 GCInspector.java (line 133) GC for ParNew: 228 ms, 201874960 reclaimed leaving 16512318064 used; max is 34557919232
cas001 INFO [ScheduledTasks:1] 2010-11-12 13:09:47,127 GCInspector.java (line 133) GC for ParNew: 266 ms, 123236640 reclaimed leaving 17009227136 used; max is 34557919232
cas0
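To line the gossip flaps up against GC activity in a log like the one above, a small parser can help. The sketch below is written against the 0.7-style INFO lines quoted here only; the class and method names are made up for illustration, and other versions may format these messages differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: summarize ParNew GC pauses and gossip "dead" transitions from
// INFO lines shaped like the sample log above. Hypothetical helper, not
// part of Cassandra.
final class GossipGcLogSummary {

    // e.g. "GC for ParNew: 255 ms, 135668944 reclaimed leaving 18375966648 used; max is 34557919232"
    private static final Pattern GC = Pattern.compile(
            "GC for (\\w+): (\\d+) ms, (\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    // e.g. "InetAddress /X.20 is now dead." or "InetAddress /X.20 is now UP"
    private static final Pattern GOSSIP = Pattern.compile(
            "InetAddress (\\S+) is now (dead|UP)");

    /** Pause times in ms for every GC line in the input, in order. */
    static List<Long> gcPausesMs(List<String> lines) {
        List<Long> pauses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = GC.matcher(line);
            if (m.find()) {
                pauses.add(Long.parseLong(m.group(2)));
            }
        }
        return pauses;
    }

    /** Heap used after the last GC line, as a fraction of max heap (-1 if none). */
    static double lastHeapFraction(List<String> lines) {
        double fraction = -1;
        for (String line : lines) {
            Matcher m = GC.matcher(line);
            if (m.find()) {
                fraction = Double.parseDouble(m.group(4)) / Double.parseDouble(m.group(5));
            }
        }
        return fraction;
    }

    /** How many times each peer was reported dead. */
    static Map<String, Integer> deadCounts(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = GOSSIP.matcher(line);
            if (m.find() && m.group(2).equals("dead")) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Run over a full day of logs, the per-peer dead counts and the pause list make it easier to see whether flaps cluster around particular GC pauses or flushes.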
Re: Backup Strategy
On 11/9/10 5:15 AM, Wayne wrote:
> We are trying to use snapshots etc. to back up the data but it is slow (hours) and slows down the entire node.

The snapshot process (as I understand it, and with the caveat that this is the code path without JNA available) first flushes all memtables (this can take a while, and can trigger minor compaction) and then does the following per SSTable:

a) flushes all memtables ()
b) forks a process (this can take a while depending on heap size)
c) ln /path/to/SSTable-etc.db /path/to/snapshot

In general this process should not take "hours". Are you perhaps in a case where you have a very large number of SSTable files in a dir and are not using JNA? I have seen snapshots lag in those circumstances, but those circumstances were usually pathological.

=Rob
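The per-SSTable "ln" step creates a hard link, so no SSTable data is copied; the cost in the non-JNA path is the fork per file. As an illustration only (not Cassandra's snapshot code), the same link can be made in-process with java.nio.file.Files.createLink, available from Java 7, which avoids forking a child process per file. The class name, the "*.db" glob, and the directory layout are assumptions for this sketch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: hard-link each *.db file in a data directory into a snapshot
// directory without spawning "ln". Hypothetical helper, not Cassandra code.
final class HardLinkSnapshot {

    /** Hard-links each regular *.db file in dataDir into snapshotDir; returns how many were linked. */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, "*.db")) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip directories that happen to match the glob
                }
                // Same effect as "ln <file> <snapshotDir>/<name>": a new directory
                // entry for the existing file, no data copied, no process forked.
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                linked++;
            }
        }
        return linked;
    }
}
```

Because each link is a single system call here, the number of SSTables mainly affects directory-listing time rather than process-creation time.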
[RELEASE] 0.6.8
Greetings, I have some bad news, and some good news. The Bad News is that a regression[1] made its way into our latest release, 0.6.7. Sorry about that, we try really hard to keep that from happening, but every once in a while one sneaks through. The Good News is that it's been fixed and we've expedited a new release. Since this regression breaks read-repair, we recommend you upgrade to 0.6.8 right away. As usual, links to binary and source archives are available from the Downloads page[3], and packages for Debian-based systems are available from our repo[4]. [1]: https://issues.apache.org/jira/browse/CASSANDRA-1727 [2]: http://goo.gl/iTJHD (CHANGES.txt) [3]: http://cassandra.apache.org/download [4]: http://wiki.apache.org/cassandra/DebianPackaging -- Eric Evans eev...@rackspace.com
using SimpleAuthenticator is not working
SimpleAuthenticator is not working for me in beta 3. I am doing the following:

· In cassandra.yaml, set authenticator: org.apache.cassandra.auth.SimpleAuthenticator
· Add the username and password to passwd.properties
· Add the username to the keyspace and column family permissions in access.properties
· Add the paths for passwd.properties and access.properties to cassandra.bat:

    set CASSANDRA_PARAMS=-Dcassandra -Dcassandra-foreground=yes -Dpasswd.properties=E:\Cassandra\Cass07b3\apache-cassandra-0.7.0-beta3\conf\passwd.properties -Daccess.properties=E:\Cassandra\Cass07b3\apache-cassandra-0.7.0-beta3\conf\access.properties

· Use login() to log in to Cassandra in the application:

    Map<String, String> creds = new HashMap<String, String>();
    creds.put("user1", "pwd1");
    AuthenticationRequest auth = new AuthenticationRequest(creds);
    _client.login(auth);

It's giving me an error with no message, and if I try to do anything it gives me an "I am not logged in" error. Is there anything I am missing?

Thanks,
-- Alaa Zubaidi
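One thing worth checking is the shape of the credentials map. My understanding (an assumption, not confirmed in this thread) is that SimpleAuthenticator in 0.7 looks up the fixed keys "username" and "password" rather than using the user name itself as the key. The sketch below is a simplified, hypothetical model of that lookup against passwd.properties-style text; it is not Cassandra's code, and it returns false where the real authenticator would report an error.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Simplified model of a passwd.properties check, for illustration only.
// The key names below are an assumption about SimpleAuthenticator in 0.7.
final class CredentialCheck {
    static final String USERNAME_KEY = "username";
    static final String PASSWORD_KEY = "password";

    /** Builds a credentials map keyed by "username"/"password". */
    static Map<String, String> credentials(String user, String password) {
        Map<String, String> creds = new HashMap<>();
        creds.put(USERNAME_KEY, user);
        creds.put(PASSWORD_KEY, password);
        return creds;
    }

    /** True only if both keys are present and the password matches the passwd entry. */
    static boolean authenticate(Map<String, String> creds, Properties passwd) {
        String user = creds.get(USERNAME_KEY);
        String password = creds.get(PASSWORD_KEY);
        if (user == null || password == null) {
            return false; // a map like {"user1" -> "pwd1"} ends up here
        }
        return password.equals(passwd.getProperty(user));
    }

    /** Parses passwd.properties-style text ("user=password" per line). */
    static Properties loadPasswd(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }
}
```

With the Thrift client, a map built this way would then be passed to new AuthenticationRequest(creds) and _client.login(...), as in the message.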
Re: [RELEASE] 0.6.8
Thanks. The tag 0.6.8 is not available in SVN On Sat, Nov 13, 2010 at 8:02 AM, Eric Evans wrote: > > Greetings, > > I have some bad news, and some good news. > > The Bad News is that a regression[1] made its way into our latest > release, 0.6.7. Sorry about that, we try really hard to keep that from > happening, but every once in a while one sneaks through. > > The Good News is that it's been fixed and we've expedited a new release. > Since this regression breaks read-repair, we recommend you upgrade to > 0.6.8 right away. > > As usual, links to binary and source archives are available from the > Downloads page[3], and packages for Debian-based systems are available > from our repo[4]. > > [1]: https://issues.apache.org/jira/browse/CASSANDRA-1727 > [2]: http://goo.gl/iTJHD (CHANGES.txt) > [3]: http://cassandra.apache.org/download > [4]: http://wiki.apache.org/cassandra/DebianPackaging > > -- > Eric Evans > eev...@rackspace.com > >
Re: Cluster fragility
These are not expected. In order of increasing usefulness for fixing it, we could use:
- INFO level logs from when something went wrong; when streaming, from both source and target
- DEBUG level logs
- instructions for how to reproduce

On Thu, Nov 11, 2010 at 7:46 PM, Reverend Chip wrote: > I've been running tests with a first four-node, then eight-node > cluster. I started with 0.7.0 beta3, but have since updated to a more > recent Hudson build. I've been happy with a lot of things, but I've had > some really surprisingly unpleasant experiences with operational fragility. > > For example, when adding four nodes to a four-node cluster (at 2x > replication), I had two nodes that insisted they were streaming data, > but no progress was made in the stream for over a day (this was with > beta3). I had to reboot the cluster to clear that condition. For the > purpose of making progress on other tests I decided just to reload the > data at eight-wide (with the more recent build), but if I had data I > couldn't reload or the cluster were serving in production, that would > have been a very inconvenient failure. > > I also had a node that refused to bootstrap immediately, but after I > waited a day, it finally got its act together. > > I write this, not to complain per se, but to ask whether these failures > are known & expected, and rebooting a cluster is just a Thing You Have > To Do once in a while; or if not, what techniques can be used to clear > such cluster topology and streaming/replication problems without rebooting. > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Gossip yoyo under write load
On Fri, Nov 12, 2010 at 3:19 PM, Chip Salzenberg wrote: > After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100 > write=200 ... they're beefy machines), and putting them under load again, I > find gossip reporting yoyo up-down-up-down status for the other nodes. > Anyone know what this is a symptom of, and/or how to avoid it? It means "the system is too overloaded to process gossip data in a timely manner." Usually this means GC storming, but that does not look like the problem here. Swapping is a less frequent offender. Since you are seeing this after bumping to extremely high thread counts, I would guess context switching might be a factor. What do your tpstats look like? > I haven't > seen any particular symptoms other than the log messages; and I suppose I'm > also dropping replication MUTATEs which had been happening already, anyway. I don't see any WARN lines about that; did you elide them? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
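For intuition on why overload shows up as up/down flapping: peers are marked dead by a failure detector based on the phi accrual idea, where suspicion grows with the time since the last heartbeat was processed, and a peer flips back to UP as soon as its next heartbeat gets through. The sketch below is a simplified model that assumes exponentially distributed heartbeat intervals, so phi = t / (mean * ln 10); the class name and the threshold of 8 used in the example are illustrative, not taken from Cassandra's code or configuration.

```java
// Simplified phi-accrual suspicion level, assuming exponentially distributed
// heartbeat inter-arrival times (an assumption of this sketch).
// P(no heartbeat yet after t) = exp(-t / mean), and
// phi = -log10(exp(-t / mean)) = t / (mean * ln 10).
final class PhiSketch {

    static double phi(double msSinceLastHeartbeat, double meanIntervalMs) {
        return msSinceLastHeartbeat / (meanIntervalMs * Math.log(10.0));
    }

    /** A peer is convicted (reported dead) once phi exceeds the threshold. */
    static boolean convict(double msSinceLastHeartbeat, double meanIntervalMs, double threshold) {
        return phi(msSinceLastHeartbeat, meanIntervalMs) > threshold;
    }
}
```

For example, assuming one heartbeat per second (mean 1000 ms) and a threshold of 8, a peer would be convicted after roughly 8 * 1000 * ln 10, about 18,421 ms, without a processed heartbeat. Anything that delays processing on the receiving node, such as a backed-up gossip stage, stretches that gap even if the peer itself is healthy.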
Re: [RELEASE] 0.6.8
On Sat, 2010-11-13 at 10:21 +0800, Schubert Zhang wrote: > Thanks. > > The tag 0.6.8 is not available in SVN It's there now. Thanks for letting me know. -- Eric Evans eev...@rackspace.com