I am running bin/cassandra with the -f option and the process does seem to fully die rather than just stall.
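For reference, a quick way to confirm that the JVM is really gone (rather than hung), and whether the kernel killed it, is roughly the following. This is only a sketch; the dmesg check is just a guess that the bare "Killed" message mentioned further down the thread came from a kernel SIGKILL:

    # is a Cassandra JVM still running at all?
    pgrep -f CassandraDaemon || echo "no Cassandra process found"

    # a bare "Killed" on the console means the process received SIGKILL;
    # on a RAM-starved EC2 micro that is often the kernel OOM killer
    dmesg | grep -iE 'out of memory|killed process' | tail -n 5

    # current memory headroom on the instance
    free -m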
I have also tried using cassandra-cli to create a keyspace; it works for a little while and then dies shortly after accepting the request. The vmstat output after it dies is as follows:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 311240    424  23356    0    0    14     4   13    2  0  0 99  0

I also tried creating the keyspace with cassandra-cli after deleting all the contents of cassandra/data and cassandra/commitlog, and it still dies almost immediately after the keyspace creation. I am not sure why this is the case. Is there a way to fully remove Cassandra and start off with a fully fresh copy?

Thanks,
Alex

On Fri, Dec 24, 2010 at 1:42 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> Hum, very strange.
>
> More what I was trying to get at was: did the process truly die, or was it just non-responsive and looking like it was dead? It would be very strange if the actual process was dying without any warnings in the logs. Presumably you are running bin/cassandra *without* the -f option? What is the output of top/vmstat on the dead node after Cassandra has 'died'? Sorry I was not clear on this initially.
>
> I have no experience with pycassa, but you might want to try using the Cassandra CLI to create keyspaces and column families to rule out some sort of client weirdness. Also, you haven't made any changes to cassandra-env.sh, have you? EC2 micros have a very limited amount of RAM. I have also seen their CPU bursting cause problems, but that does not seem to be the issue here. I might also suggest you try an m1.small instead, just to be safe; they are still pretty cheap when you run them as spot instances.
>
> As a last-ditch effort (given that this is a test cluster), you can delete the contents of /var/lib/cassandra/data/* and /var/lib/cassandra/commitlog/* to effectively reset your nodes.
>
> On Fri, Dec 24, 2010 at 12:48 PM, Alex Quan <alex.q...@tinkur.com> wrote:
>
>> Sorry, but I am not sure how to answer all the questions you have posed, since a lot of the stuff I am working with is quite new to me and I haven't used many of the tools mentioned, but I will try my best to answer to the best of my knowledge. I am trying to get Cassandra to run across 2 nodes, both Amazon EC2 micro instances; I believe they are 64-bit Linux, Ubuntu 10.01, using Java version 1.6.0_23. When I said "killed", that was what was printed to the console when the process died, so I am not sure what that exactly means.
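(Coming back to the question above about a fully fresh copy: the reset Dan suggests amounts to stopping the node and wiping its on-disk state, roughly as follows. This is only a sketch and assumes the default /var/lib/cassandra locations; adjust the paths if data_file_directories, commitlog_directory, or saved_caches_directory in cassandra.yaml point elsewhere.

    # stop the node first (or kill the JVM if it is already wedged)
    pkill -f CassandraDaemon

    # wipe all on-disk state so the node starts completely fresh
    rm -rf /var/lib/cassandra/data/*
    rm -rf /var/lib/cassandra/commitlog/*
    rm -rf /var/lib/cassandra/saved_caches/*   # if present in your layout

    # then start it again
    bin/cassandra -f

Reinstalling the package should not be necessary; the schema and token state live under these directories.)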
>> Here is some of the info before Cassandra went down:
>>
>> ring:
>>
>> Address         Status State   Load      Owns    Token
>>                                                  111232248257764777335763873822010980488
>> 10.127.155.205  Up     Normal  85.17 KB  59.06%  41570168072350555868554892080805525145
>> 10.122.123.210  Up     Normal  91.1 KB   40.94%  111232248257764777335763873822010980488
>>
>> vmstat before Cassandra is up:
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
>>
>> vmstat after Cassandra is up:
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
>>
>> Then, after I run a line like sys.create_keyspace('testing', 1) in pycassa, with the connection set up to point to my machine, I get the following error:
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py", line 365, in drop_keyspace
>>     schema_version = self._conn.system_drop_keyspace(keyspace)
>>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1255, in system_drop_keyspace
>>     return self.recv_system_drop_keyspace()
>>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1266, in recv_system_drop_keyspace
>>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
>>     sz = self.readI32()
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
>>     buff = self.trans.readAll(4)
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
>>     chunk = self.read(sz-have)
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 272, in read
>>     self.readFrame()
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 276, in readFrame
>>     buff = self.__trans.readAll(4)
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
>>     chunk = self.read(sz-have)
>>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py", line 108, in read
>>     raise TTransportException(type=TTransportException.END_OF_FILE, message='TSocket read 0 bytes')
>> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>>
>> Then Cassandra on that machine dies. Here is some of the log of the machine that died:
>>
>> INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db (301 bytes)
>> INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load MX4J, mx4j-tools.jar is not in the classpath
>> INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding thrift service to /0.0.0.0:9160
>> INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
>> INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119) Listening for thrift clients...
>> INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Migrations at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log', position=10873)
>> INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-migrati...@948345082(5902 bytes, 1 operations)
>> INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155) Writing memtable-migrati...@948345082(5902 bytes, 1 operations)
>> INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log', position=10873)
>> INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-sch...@212165140(2194 bytes, 3 operations)
>> INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db (6035 bytes)
>> INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155) Writing memtable-sch...@212165140(2194 bytes, 3 operations)
>>
>> and the log on the machine that stays up:
>>
>> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
>> org.apache.avro.AvroTypeException: Found {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]}, expecting
{"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]} >> at >> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212) >> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) >> at >> org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121) >> at >> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138) >> at >> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114) >> at >> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142) >> at >> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114) >> at >> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118) >> at >> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142) >> at >> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114) >> at >> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105) >> at >> org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98) >> at >> org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274) >> at >> org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56) >> at >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> at java.lang.Thread.run(Thread.java:662) >> INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) >> Node /10.127.155.205 has restarted, now UP again >> INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line >> 670) Node /10.127.155.205 state jump to normal >> 
>> INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /10.127.155.205
>> INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>> INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789 OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>> INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195) InetAddress /10.127.155.205 is now dead.
>>
>> The ring output on my node that stays up:
>>
>> Address         Status State   Load      Owns    Token
>>                                                  111232248257764777335763873822010980488
>> 10.127.155.205  Down   Normal  85.17 KB  59.06%  41570168072350555868554892080805525145
>> 10.122.123.210  Up     Normal  91.1 KB   40.94%  111232248257764777335763873822010980488
>>
>> I am not sure how to use the JMX tools to connect to these machines, so I can't really answer that, but hopefully this is enough information to diagnose my problem. Thanks,
>>
>> Alex
>>
>> On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>>
>>> Your details are rather vague; what do you mean by killed? Is the Cassandra Java process still running? Any other warning or error log messages (from either node)? Could you provide the last few Cassandra log lines from each machine? Can you connect to the node via JMX? What is the output of nodetool ring from the second node (which is presumably still alive)? Is there any unusual system activity: high CPU usage, low CPU usage, problems with disk IO (can be checked with vmstat)?
>>>
>>> Can you provide any further system information? Linux/Windows, Java version, 32/64 bit, amount of RAM?
>>>
>>> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <alex.q...@tinkur.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had Cassandra working on one node and was able to create keyspaces and column families and populate the database fine. I tried adding a second node by changing the seed to point to another node and setting listen_address and rpc_address to blank. I then started up the second node and it seemed to have connected fine according to nodetool, but after that I couldn't get it to accept any commands, and whenever I tried to make a new keyspace or column family it would kill my initial node after a message like this:
>>>>
>>>> INFO 18:19:49,335 switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log', position=9143)
>>>> INFO 18:19:49,335 Enqueuing flush of memtable-sch...@1358138608(2410 bytes, 5 operations)
>>>> Killed
>>>>
>>>> The next few times I start up the server, a similar message pops up until, I am guessing, all the pending data is flushed out; then it starts fine until I try to add anything to it. I tried changing the yaml file back to the original setup and this still happens. I don't know what to try to get it to work properly; if you can help I would be really grateful.
>>>>
>>>> Alex
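On the JMX question raised above: nodetool is the simplest JMX client to start with, since it only needs a host and the JMX port from cassandra-env.sh (8080 by default in the 0.7 series). A rough sketch, using the surviving node's address from the ring output above and assuming JMX_PORT has not been changed:

    # ask the surviving node for its view of the ring, its basic stats, and its thread pools
    bin/nodetool -h 10.122.123.210 -p 8080 ring
    bin/nodetool -h 10.122.123.210 -p 8080 info
    bin/nodetool -h 10.122.123.210 -p 8080 tpstats

    # jconsole ships with the JDK and can browse the same MBeans interactively
    jconsole 10.122.123.210:8080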