Re: Having trouble getting cassandra to stay up

Alex Quan Mon, 27 Dec 2010 15:16:58 -0800

I started over on an m1-type instance and everything seems to be
working fine now. Thanks for all the help.


Alex

On Mon, Dec 27, 2010 at 7:18 AM, Gary Dusbabek <[email protected]> wrote:

> You might want to try starting over.  Configure your initial keyspaces
> in conf/cassandra.yaml and load them into your cluster with
> bin/schematool.
>
> That nasty stack trace indicates the server is getting data that is
> not formatted the way it expects.  Please verify that your cassandra
> servers are both running the same version.
>
> Your earlier error when adding a keyspace through pycassa was
> confusing.  You stated that you tried to create a keyspace, but the
> error you pasted appeared to error in a drop_keyspace call.  Something
> doesn't add up.
>
> Gary.
>
>
> On Fri, Dec 24, 2010 at 11:48, Alex Quan <[email protected]> wrote:
> > Sorry, but I am not sure how to answer all the questions you have posed,
> > since a lot of what I am working with is quite new to me and I haven't
> > used many of the tools mentioned, but I will answer to the best of my
> > knowledge. I am trying to get Cassandra to run across 2 nodes that are
> > both Amazon EC2 micro instances. I believe they are running 64-bit
> > Ubuntu Linux 10.01 with Java version 1.6.0_23. When I said "killed", that
> > was what was printed to the console when the process died, so I am not
> > sure exactly what it means. Here is some of the info from before
> > Cassandra went down:
> >
> > ring:
> >
> > Address         Status State   Load            Owns    Token
> >                                                        111232248257764777335763873822010980488
> > 10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> > 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
> >
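The Owns column in the ring output above can be reproduced from the two tokens. The following is a minimal sketch, assuming the cluster uses RandomPartitioner with a token space of 2**127 (the thread does not state the partitioner) and that each node owns the range from the previous token up to its own. The names `TOKEN_SPACE` and `ownership` are made up for this illustration.

```python
# Assumption: RandomPartitioner, whose tokens fall in [0, 2**127).
TOKEN_SPACE = 2 ** 127


def ownership(tokens_by_node):
    """Return the fraction of the ring owned by each node.

    Each node is assumed to own the range (previous token, own token],
    wrapping around the ring for the node with the lowest token.
    """
    ring = sorted(tokens_by_node.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ring):
        prev_token = ring[i - 1][1]  # index -1 wraps to the highest token
        # A width of 0 only happens with a single node, which owns everything.
        width = (token - prev_token) % TOKEN_SPACE or TOKEN_SPACE
        shares[node] = width / TOKEN_SPACE
    return shares


# Tokens copied from the ring output in this message.
tokens = {
    "10.127.155.205": 41570168072350555868554892080805525145,
    "10.122.123.210": 111232248257764777335763873822010980488,
}

percentages = {
    node: round(share * 100, 2) for node, share in ownership(tokens).items()
}
print(percentages)
```

Under that assumption the computed shares match the 59.06% and 40.94% reported by the ring output, so the uneven split comes from the token placement rather than from the crash.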
> > vmstat before cassandra is up:
> >
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
> >
> > vmstat after cassandra is up:
> >
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
> >
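The two vmstat samples above can be compared column by column. This is a small sketch that parses the header and value rows and reports how far free memory fell once Cassandra was running. The helper name `parse_vmstat` is hypothetical, and the values are assumed to be in kilobytes, which is vmstat's default unit.

```python
def parse_vmstat(header, values):
    """Pair each vmstat column name with its integer value."""
    return dict(zip(header.split(), (int(v) for v in values.split())))


# Rows copied from the vmstat output in this message.
HEADER = "r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa"
BEFORE = "0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0"
AFTER = "0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0"

before = parse_vmstat(HEADER, BEFORE)
after = parse_vmstat(HEADER, AFTER)

free_drop_kb = before["free"] - after["free"]
print("free before: %d KB, free after: %d KB" % (before["free"], after["free"]))
print("drop: %d KB, swap in use: %d KB" % (free_drop_kb, after["swpd"]))
```

With only about 5.5 MB free and no swap in use, one plausible reading (an inference, not something confirmed in the thread) is that the bare "Killed" on the console came from the kernel ending the process when memory ran out. That would fit the problem going away on a larger instance type.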
> > Then, after I run a line like sys.create_keyspace('testing', 1) in pycassa
> > with the connection set up to point to my machine, I get the following
> > error:
> >
> >
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py", line 365, in drop_keyspace
> >     schema_version = self._conn.system_drop_keyspace(keyspace)
> >   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1255, in system_drop_keyspace
> >     return self.recv_system_drop_keyspace()
> >   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1266, in recv_system_drop_keyspace
> >     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
> >     sz = self.readI32()
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
> >     buff = self.trans.readAll(4)
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
> >     chunk = self.read(sz-have)
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 272, in read
> >     self.readFrame()
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 276, in readFrame
> >     buff = self.__trans.readAll(4)
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
> >     chunk = self.read(sz-have)
> >   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py", line 108, in read
> >     raise TTransportException(type=TTransportException.END_OF_FILE, message='TSocket read 0 bytes')
> > thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
> >
> > and then Cassandra on the machine dies. Here is some of the log from
> > the machine that died:
> >
> >  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db (301 bytes)
> >  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load MX4J, mx4j-tools.jar is not in the classpath
> >  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding thrift service to /0.0.0.0:9160
> >  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
> >  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119) Listening for thrift clients...
> >  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Migrations at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log', position=10873)
> >  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-migrati...@948345082(5902 bytes, 1 operations)
> >  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155) Writing memtable-migrati...@948345082(5902 bytes, 1 operations)
> >  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log', position=10873)
> >  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-sch...@212165140(2194 bytes, 3 operations)
> >  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db (6035 bytes)
> >  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155) Writing memtable-sch...@212165140(2194 bytes, 3 operations)
> >
> > and the log on the machine that stays up:
> >
> > ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
> > org.apache.avro.AvroTypeException: Found
> >
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
> > expecting
> >
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
> >     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
> >     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >     at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
> >     at org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
> >     at org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
> >     at org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> >  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node /10.127.155.205 has restarted, now UP again
> >  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line 670) Node /10.127.155.205 state jump to normal
> >  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /10.127.155.205
> >  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
> >  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789 OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
> >  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195) InetAddress /10.127.155.205 is now dead.
> >
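The AvroTypeException above prints two CfDef schemas, and the difference between them is hard to spot by eye. Here is a minimal sketch that lists the field names present in one record schema but not the other. The two schema strings are abbreviated stand-ins keeping only a few fields from the log, not the full schemas, and the function names are invented for this example.

```python
import json


def record_field_names(schema_json):
    """Return the top-level field names of an Avro record schema."""
    schema = json.loads(schema_json)
    return [field["name"] for field in schema["fields"]]


def diff_record_fields(found_json, expected_json):
    """List field names that appear in only one of the two schemas."""
    found = record_field_names(found_json)
    expected = record_field_names(expected_json)
    return {
        "only_in_found": [n for n in found if n not in expected],
        "only_in_expected": [n for n in expected if n not in found],
    }


# Abbreviated stand-ins for the "Found" and "expecting" schemas in the log.
FOUND = json.dumps({
    "type": "record", "name": "CfDef", "fields": [
        {"name": "read_repair_chance", "type": ["double", "null"]},
        {"name": "gc_grace_seconds", "type": ["int", "null"]},
    ],
})
EXPECTED = json.dumps({
    "type": "record", "name": "CfDef", "fields": [
        {"name": "read_repair_chance", "type": ["double", "null"]},
        {"name": "replicate_on_write", "type": ["boolean", "null"]},
        {"name": "gc_grace_seconds", "type": ["int", "null"]},
    ],
})

result = diff_record_fields(FOUND, EXPECTED)
print(result)
```

In the full log output the same field, replicate_on_write, is the one present in the "expecting" schema and absent from the "Found" one. That is consistent with the two nodes running different Cassandra builds.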
> > The ring output on my node that stays up:
> >
> > Address         Status State   Load            Owns    Token
> >                                                        111232248257764777335763873822010980488
> > 10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> > 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
> >
> > I am not sure how to use the JMX tools to connect to these machines, so I
> > can't really answer that, but hopefully this is enough information to
> > diagnose my problem. Thanks.
> >
> > Alex
> >
> >
> > On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <[email protected]> wrote:
> >>
> >> Your details are rather vague. What do you mean by killed? Is the
> >> Cassandra java process still running? Any other warning or error log
> >> messages (from either node)? Could you provide the last few Cassandra
> >> log lines from each machine? Can you connect to the node via JMX? What
> >> is the output of nodetool ring from the second node (which is presumably
> >> still alive)? Is there any unusual system activity: high cpu usage, low
> >> cpu usage, problems with disk IO (can be checked with vmstat)?
> >> Can you provide any further system information? Linux/windows, java
> >> version, 32/64 bit, amount of ram?
> >>
> >> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
> >>> Cassandra working on one node and was able to create keyspaces and
> >>> column families and populate the database fine. I tried adding a second
> >>> node by changing the seed to point to another node and setting
> >>> listen_address and rpc_address to blank. I then started up the second
> >>> node, and it seems to have connected fine according to nodetool, but
> >>> after that I couldn't get it to accept any commands, and whenever I
> >>> tried to make a new keyspace or column family it would kill my initial
> >>> node after a message like this:
> >>>
> >>>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log', position=9143)
> >>>  INFO 18:19:49,335 Enqueuing flush of memtable-sch...@1358138608(2410 bytes, 5 operations)
> >>> Killed
> >>>
> >>> The next few times I start up the server, a similar message pops up
> >>> until, I am guessing, everything is flushed out; then it starts fine
> >>> until I try to add anything to it. I tried changing the yaml file back
> >>> to the original setup and this still happens. I don't know what to try
> >>> to get it to work properly. If you guys can help I would be really
> >>> grateful.
> >>>
> >>> Alex
> >>
> >
> >
>
