Re: Memtable tuning in 1.0 and higher

2012-07-01 Thread Jonathan Ellis
On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
 wrote:
> the currentThroughput is increased even before the data is merged into the
> memtable so it is actually measuring the throughput afaik.

You're right.  I've attached a patch to
https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: High CPU usage as of 8pm eastern time

2012-07-01 Thread Hontvári József Levente
Thank you for the mail. Same here, but I restarted the affected server 
before I noticed your mail.


It affected both OpenJDK Java 6  (packaged with Ubuntu 10.04) and Oracle 
Java 7 processes. Ubuntu 32 bit servers had no issues, only a 64 bit 
machine.


Likely it is related to the leap second introduced today.

On 2012.07.01. 5:11, Mina Naguib wrote:

Hi folks

Our cassandra (and other java-based apps) started experiencing extremely high 
CPU usage as of 8pm eastern time (midnight UTC).

The issue appears to be related to specific versions of java + linux + ntpd

There are many solutions floating around on IRC, twitter, stackexchange, LKML.

The simplest one that worked for us is simply to run this command on each 
affected machine:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

CPU drop was instantaneous - there was no need to restart the server, ntpd, or 
any of the affected JVMs.
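
(A quick note on why this one-liner works, as a rough sketch: the inner
date +"%m%d%H%M%C%y.%S" prints the current time in the MMDDhhmmCCYY.ss form
that date accepts as a set-time argument, so the middle invocation re-sets the
system clock to the value it already has; that clock-set call appears to be
what clears the stuck kernel timer state left by the leap-second insertion,
while the first and last calls just print the time before and after. It needs
root; an equivalent long form of the same fix:)

# capture "now" in date's set-time syntax, then set the clock back to it
now=`date +"%m%d%H%M%C%y.%S"`
date "$now"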


Re: Memtable tuning in 1.0 and higher

2012-07-01 Thread Joost Van De Wijgerd
Hi Jonathan,

Looks good, any chance of porting this fix to the 1.0 branch?

Kind regards

Joost

Sent from my iPhone


On 1 jul. 2012, at 09:25, Jonathan Ellis  wrote:

> On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
>  wrote:
>> the currentThroughput is increased even before the data is merged into the
>> memtable so it is actually measuring the throughput afaik.
> 
> You're right.  I've attached a patch to
> https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com


Re: High CPU usage as of 8pm eastern time

2012-07-01 Thread David Daeschler
More information for others that were affected.

Our installation of java:

[root@inv4 conf]# java -version
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)

[root@inv4 conf]# uname -a
Linux inv4 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue Feb 14 04:00:16 GMT
2012 x86_64 x86_64 x86_64 GNU/Linux

Jonathan pointed out a Linux bug that may be related:
https://issues.apache.org/jira/browse/CASSANDRA-4066

In my case only the Java process went nuts, as seems to be the case in
many other reports:
https://bugzilla.mozilla.org/show_bug.cgi?id=769972
http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/

I hope everyone got enough sleep!
- David


On Sun, Jul 1, 2012 at 4:49 AM, Hontvári József Levente
 wrote:
> Thank you for the mail. Same here, but I restarted the affected server
> before I noticed your mail.
>
> It affected both OpenJDK Java 6  (packaged with Ubuntu 10.04) and Oracle
> Java 7 processes. Ubuntu 32 bit servers had no issues, only a 64 bit
> machine.
>
> Likely it is related to the leap second introduced today.
>
>
> On 2012.07.01. 5:11, Mina Naguib wrote:
>>
>> Hi folks
>>
>> Our cassandra (and other java-based apps) started experiencing extremely
>> high CPU usage as of 8pm eastern time (midnight UTC).
>>
>> The issue appears to be related to specific versions of java + linux +
>> ntpd
>>
>> There are many solutions floating around on IRC, twitter, stackexchange,
>> LKML.
>>
>> The simplest one that worked for us is simply to run this command on each
>> affected machine:
>>
>> date; date `date +"%m%d%H%M%C%y.%S"`; date;
>>
>> CPU drop was instantaneous - there was no need to restart the server,
>> ntpd, or any of the affected JVMs.
>>
>>
>>
>>
>
>


SnappyCompressor and Cassandra 1.1.1

2012-07-01 Thread Andy Cobley
I'm running Cassandra on a Raspberry Pi (for educational reasons) and have been 
successfully running 1.1.0 for some time.  However, there is no native build of 
SnappyCompressor for the platform (I'm currently working on rectifying that if I 
can), so compression is unavailable.  When I try to start 1.1.1 on the platform 
I get the following error, which looks to me like 1.1.1 is trying to load the 
Snappy compressor at startup and falls over when it can't find it.  That's not 
been the case with 1.1.0:

INFO 14:22:07,600 Global memtable threshold is enabled at 35MB
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.(Snappy.java:44)
at 
org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at 
org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at 
org.apache.cassandra.io.compress.SnappyCompressor.(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:76)
at 
org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:118)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1681)
at java.lang.Runtime.loadLibrary0(Runtime.java:840)
at java.lang.System.loadLibrary(System.java:1047)
at 
org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 17 more
ERROR 14:22:09,934 Exception encountered during startup
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.(Snappy.java:44)
at 
org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at 
org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at 
org.apache.cassandra.io.compress.SnappyCompressor.(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:76)
at 
org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:118)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.(Snappy.java:44)
at 
org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at 
org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at 
org.apache.cassandra.io.compress.SnappyCompressor.(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.(CFMetaData.java:76)
at 
org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:118)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Exception encountered during startup: [FAILED_TO_LOAD_NATIVE_LIBRARY] null

Andy


The University of Dundee is a Scottish Registered Charity, No. SC015096.




Bootstrap code path

2012-07-01 Thread Bill Hastings
Could someone please tell me where I should start looking in the code to
understand how the Cassandra bootstrap process works? I am sure it is
complicated but I have time. Also, is my understanding correct that newly
added nodes do not join the ring until the bootstrap process is complete,
i.e. they do not receive any read or write requests from outside?


Re: Failed to solve Digest mismatch

2012-07-01 Thread Jason Tang
For the create/update/deleteColumn/deleteRow test case, with Quorum
consistency level, 6 nodes, and replication factor 3, running one client
thread I can reproduce this roughly once in 100 rounds.

And if I have 20 client threads running the test client, the ratio is higher.

And each test group is executed by one thread, and the client timestamps are
unique and sequential, guaranteed by Hector.

And the client only accesses the data through the local Cassandra node.

And the query uses only the row key, which is unique. The column names are not
unique, in my case, e.g. "status".

And each row has around 7 columns, which are all small, e.g. "status:true",
"userName:Jason" ...

BRs
//Ares

2012/7/1 Jonathan Ellis 

> Is this Cassandra 1.1.1?
>
> How often do you observe this?  How many columns are in the row?  Can
> you reproduce when querying by column name, or only when "slicing" the
> row?
>
> On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang  wrote:
> > Hi
> >
> >First I delete one column, then I delete one row. Then try to read all
> > columns from the same row, all operations from same client app.
> >
> >The consistency level is read/write quorum.
> >
> >Check the Cassandra log, the local node don't perform the delete
> > operation but send the mutation to other nodes (192.168.0.6, 192.168.0.1)
> >
> >After delete, I try to read all columns from the row, I found the node
> > found "Digest mismatch" due to Quorum consistency configuration, but the
> > result is not correct.
> >
> >From the log, I can see the delete mutation already accepted
> > by 192.168.0.6, 192.168.0.1,  but when 192.168.0.5 read response from 0.6
> > and 0.1, and then it merge the data, but finally 0.5 shows the result
> which
> > is the dirty data.
> >
> >Following logs shows the change of column "737461747573" , 192.168.0.5
> > try to read from 0.1 and 0.6, it should be deleted, but finally it shows
> it
> > has the data.
> >
> > log:
> > 192.168.0.5
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
> > Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
> > key=7878323239537570657254616e67307878,
> > columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
> > columnName='null')',
> >
> columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
> > Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
> > reading data from /192.168.0.6
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
> > reading digest from /192.168.0.1
> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6556@/192.168.0.6
> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
> > AbstractRowResolver.java (line 66) Preprocessed data response
> > DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6557@/192.168.0.1
> > DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
> > AbstractRowResolver.java (line 66) Preprocessed digest response
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line
> 65)
> > resolving 2 responses
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
> > Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
> > Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
> > 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
> > d41d8cd98f00b204e9800998ecf8427e)
> > DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6558@/192.168.0.6
> > DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6559@/192.168.0.1
> > DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
> > AbstractRowResolver.java (line 66) Preprocessed data response
> > DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
> > AbstractRowResolver.java (line 66) Preprocessed data response
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line
> 63)
> > resolving 2 responses
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
> 123)
> > collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
> 123)
> > collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
> 123)
> > collecting 2 of 2147483647:
> 696e517565756554696d6

cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
I have a three-node cluster running 1.0.2. Today there was a very strange
problem: suddenly two of the Cassandra nodes (let's say B and C) were costing
a lot of CPU, and it turned out that for some reason the "java" binary just
wouldn't run. I am using OpenJDK 1.6.0_18, so I switched to the Sun JDK,
which works okay.

After that node A stopped working... same problem, so I installed the Sun JDK
there too, and then it was okay. But minutes later B stopped working again:
about 5-10 minutes after Cassandra started it stopped responding to
connections, and I can't access 9160 and nodetool doesn't return either.

I have turned on DEBUG and don't see much useful information; the last rows
on node B are as below:
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line
694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
116) Version is now 3


This problem is really driving me crazy since I just don't know what
happened or how to debug it. I tried to kill node A and restart it, then
node B halted; after I restarted B, node C went down...


One thing that may be related is that the log time on node B is not the same
as the system time (A and C are okay).

The date command on node B shows:
Sun Jul  1 23:10:57 CST 2012 (system time)

But you may have noticed that the time is "2012-07-01 07:45:XX" in the log
messages above.  The system time is right; I'm just not sure why Cassandra's
log file shows the wrong time. I don't recall Cassandra having timezone
settings.


Re: cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
I adjusted the timezone of Java with -Duser.timezone, and now the timezone of
Cassandra is the same as the system's (Debian 6.0).
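
(For reference, a minimal sketch of how that can be set, assuming the standard
conf/cassandra-env.sh layout; the zone name here is only an example:)

# appended to conf/cassandra-env.sh so the JVM logs in the expected timezone
JVM_OPTS="$JVM_OPTS -Duser.timezone=Asia/Shanghai"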

After restarting Cassandra I found the following error message in the log file
of node B. About 2 minutes later, node C stopped responding.

the error log of node B:

Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
 at
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
 at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)



the log info in node C:


DEBUG [MutationStage:25] 2012-07-01 23:29:42,909
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark',
key='3937343836623538363837363135353264313339333463343532623634373131656462306139',
modifications=[ColumnFamily(permacache
[76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to
79529@/192.168.1.129
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java
(line 523) insert
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark',
key='636f6d6d656e74735f706172656e74735f32373232343938',
modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843
,])])]/QUORUM
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
/192.168.1.40
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
/192.168.1.129
DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913
ResponseVerbHandler.java (line 44) Processing response on a callback from
50050@/192.168.1.129
DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914
ResponseVerbHandler.java (line 44) Processing response on a callback from
50051@/192.168.1.40
DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line
116) Version is now 3



On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu  wrote:

> I have a three node cluster running 1.0.2, today there's a very strange
> problem that suddenly two of cassandra  node(let's say B and C) was costing
> a lot of cpu, turned out for some reason the "java" binary just dont
> run I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works
> okay.
>
> after that node A stop working... same problem, I install "sun jdk", then
> it's okay. but minutes later, B stop working again, about 5-10 minutes
> later after the cassandra started, it stop responding connections, I can't
> access 9160 and nodetool dont return either.
>
> I have turned on DEBUG and dont see much useful information, the last rows
> on node B are as belows:
> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
> (line 65) resolving 2 responses
> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
> (line 106) digests verified
> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
> (line 110) resolve: 0 ms.
> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line
> 694) Read: 5 ms.
> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
> 116) Version is now 3
> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
> 116) Version is now 3
>
>
> this problem is really driving me crazy since I just dont know what
> happened, and how to debug it, I tried to kill node A and restart it, then
> node B halt, after I restart B, then node C goes down..
>
>
> one thing may related is that the log time on node B is not the same with
> the system time(A and C are okay).
>
> while date on node B shows:
> Sun Jul  1 23:10:57 CST 2012 (system time)
>
> but you may noticed that the time is "2012-07-01 07:45:XX" in those above
> log message.  the system time is right, just not sur

Re: cassandra halt after started minutes later

2012-07-01 Thread David Daeschler
This looks like the problem a bunch of us were having yesterday that
isn't cleared without a reboot or a date command. It seems to be
related to the leap second that was added between the 30th June and
the 1st of July.

See the mailing list thread with subject "High CPU usage as of 8pm eastern time"

If you are seeing high CPU usage and a stall after restarting
cassandra still, and you are on Linux, try:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

In a terminal and see if everything starts working again.

I hope this helps.
-- 
David Daeschler



On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu  wrote:
> adjust the timezone of java by  -Duser.timezone   and the timezone of
> cassandra is the same with system(Debian 6.0).
>
> after restart cassandra I found the following error message in the log file
> of node B. after about 2 minutes later, node C stop responding
>
> the error log of node B:
>
> Thrift transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
>
>
> the log info in node C:
>
>
> DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java
> (line 60) RowMutation(keyspace='spark',
> key='3937343836623538363837363135353264313339333463343532623634373131656462306139',
> modifications=[ColumnFamily(permacache
> [76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to
> 79529@/192.168.1.129
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line
> 523) insert
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark',
> key='636f6d6d656e74735f706172656e74735f32373232343938',
> modifications=[ColumnFamily(permacache
> [76616c7565:false:6@1341156582953843,])])]/QUORUM
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
> /192.168.1.40
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
> /192.168.1.129
> DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line
> 116) Version is now 3
> DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913
> ResponseVerbHandler.java (line 44) Processing response on a callback from
> 50050@/192.168.1.129
> DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line
> 116) Version is now 3
> DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914
> ResponseVerbHandler.java (line 44) Processing response on a callback from
> 50051@/192.168.1.40
> DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line
> 116) Version is now 3
>
>
>
> On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu  wrote:
>>
>> I have a three node cluster running 1.0.2, today there's a very strange
>> problem that suddenly two of cassandra  node(let's say B and C) was costing
>> a lot of cpu, turned out for some reason the "java" binary just dont run
>> I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works okay.
>>
>> after that node A stop working... same problem, I install "sun jdk", then
>> it's okay. but minutes later, B stop working again, about 5-10 minutes later
>> after the cassandra started, it stop responding connections, I can't access
>> 9160 and nodetool dont return either.
>>
>> I have turned on DEBUG and dont see much useful information, the last rows
>> on node B are as belows:
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 65) resolving 2 responses
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 106) digests verified
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 110) resolve: 0 ms.
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line
>>

Re: cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
Huge thanks, it is the leap second problem!

Finally I can go to bed.

On Mon, Jul 2, 2012 at 12:11 AM, David Daeschler
wrote:

> This looks like the problem a bunch of us were having yesterday that
> isn't cleared without a reboot or a date command. It seems to be
> related to the leap second that was added between the 30th June and
> the 1st of July.
>
> See the mailing list thread with subject "High CPU usage as of 8pm eastern
> time"
>
> If you are seeing high CPU usage and a stall after restarting
> cassandra still, and you are on Linux, try:
>
> date; date `date +"%m%d%H%M%C%y.%S"`; date;
>
> In a terminal and see if everything starts working again.
>
> I hope this helps.
> --
> David Daeschler
>
>
>
> On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu  wrote:
> > adjust the timezone of java by  -Duser.timezone   and the timezone of
> > cassandra is the same with system(Debian 6.0).
> >
> > after restart cassandra I found the following error message in the log
> file
> > of node B. after about 2 minutes later, node C stop responding
> >
> > the error log of node B:
> >
> > Thrift transport error occurred during processing of message.
> > org.apache.thrift.transport.TTransportException
> > at
> >
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at
> >
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> > at
> >
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > at
> >
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
> > at
> >
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> >
> >
> > the log info in node C:
> >
> >
> > DEBUG [MutationStage:25] 2012-07-01 23:29:42,909
> RowMutationVerbHandler.java
> > (line 60) RowMutation(keyspace='spark',
> >
> key='3937343836623538363837363135353264313339333463343532623634373131656462306139',
> > modifications=[ColumnFamily(permacache
> > [76616c7565:false:67906@1341156582948365,])]) applied.  Sending
> response to
> > 79529@/192.168.1.129
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java
> (line
> > 523) insert
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> > 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark',
> > key='636f6d6d656e74735f706172656e74735f32373232343938',
> > modifications=[ColumnFamily(permacache
> > [76616c7565:false:6@1341156582953843,])])]/QUORUM
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> > 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938
> to
> > /192.168.1.40
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> > 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938
> to
> > /192.168.1.129
> > DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line
> > 116) Version is now 3
> > DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 50050@/192.168.1.129
> > DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java
> (line
> > 116) Version is now 3
> > DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 50051@/192.168.1.40
> > DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java
> (line
> > 116) Version is now 3
> >
> >
> >
> > On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu 
> wrote:
> >>
> >> I have a three node cluster running 1.0.2, today there's a very strange
> >> problem that suddenly two of cassandra  node(let's say B and C) was
> costing
> >> a lot of cpu, turned out for some reason the "java" binary just dont
> run
> >> I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works
> okay.
> >>
> >> after that node A stop working... same problem, I install "sun jdk",
> then
> >> it's okay. but minutes later, B stop working again, about 5-10 minutes
> later
> >> after the cassandra started, it stop responding connections, I can't
> access
> >> 9160 and nodetool dont return either.
> >>
> >> I have turned on DEBUG and dont see much useful information, the last
> rows
> >> on node B are as be

Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high

2012-07-01 Thread ruslan usifov
Hello

We were under a ddos attack, and as a result we got high ksoftirqd activity
- and as a result cassandra began to answer very slowly. But after the ddos
was gone the high ksoftirqd activity still existed; it disappears when I stop
the cassandra daemon and repeats again when I start the cassandra daemon, and
the only full resolution of the problem is a full reboot of the server. What
can this be (why does ksoftirqd work so intensively when cassandra is running
- we disabled all working traffic to the cluster but this doesn't help, so it
can't be due to heavy load)? And how to solve this?

PS:
 OS ubuntu 10.0.4 (2.6.32.41)
 cassandra 1.0.10
 java 1.6.32 (from oracle)


Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high

2012-07-01 Thread Sergey Kondratyev
Hello,
It is not related to cassandra or the ddos.
It is a kernel problem due to the leap second. See
http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second

On Sun, Jul 1, 2012 at 1:05 PM, ruslan usifov  wrote:
> Hello
>
> We was under ddos attack, and as result we got high ksoftirqd activity
> - as result cassandra begin answer very slow. But when ddos was gone
> high ksoftirqd activity still exists, and dissaper when i stop
> cassandra daemon, and repeat again when i start cassadra daemon, the
> fully resolution of problem is full reboot of server. What this can be
> (why ksoftirqd begin work very intensive when cassandra runing - we
> disable all working traffic to cluster but this doesn't help so this
> is can't be due heavy load )? And how to solve this?
>
> PS:
>  OS ubuntu 10.0.4 (2.6.32.41)
>  cassandra 1.0.10
>  java 1.6.32 (from oracle)


Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high

2012-07-01 Thread David Daeschler
Good afternoon,

This again looks like it could be the leap second issue:

This looks like the problem a bunch of us were having yesterday that
isn't cleared without a reboot or a date command. It seems to be
related to the leap second that was added between the 30th June and
the 1st of July.

See the mailing list thread with subject "High CPU usage as of 8pm eastern time"

If you are seeing high CPU usage and a stall after restarting
cassandra still, and you are on Linux, try:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

In a terminal and see if everything starts working again.

I hope this helps. Please spread the word if you see others having
issues with unresponsive kernels/high CPU.

-- 
David Daeschler



On Sun, Jul 1, 2012 at 1:05 PM, ruslan usifov  wrote:
> Hello
>
> We was under ddos attack, and as result we got high ksoftirqd activity
> - as result cassandra begin answer very slow. But when ddos was gone
> high ksoftirqd activity still exists, and dissaper when i stop
> cassandra daemon, and repeat again when i start cassadra daemon, the
> fully resolution of problem is full reboot of server. What this can be
> (why ksoftirqd begin work very intensive when cassandra runing - we
> disable all working traffic to cluster but this doesn't help so this
> is can't be due heavy load )? And how to solve this?
>
> PS:
>  OS ubuntu 10.0.4 (2.6.32.41)
>  cassandra 1.0.10
>  java 1.6.32 (from oracle)


Re: Cassandra consistency issue on cluster system

2012-07-01 Thread aaron morton
If you are reading at QUORUM there is no problem; this is how eventual 
consistency works in Cassandra.

The coordinator will resolve the differences between the replicas, and the 
column with the higher timestamp will win. 

If the delete was applied to fewer than CL nodes the client should have received 
a TimedOutException.
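
As a quick sketch of the arithmetic (assuming RF=3, which the original post
does not state): QUORUM = floor(RF/2) + 1 = 2, so a QUORUM write touches at
least 2 replicas and a QUORUM read waits for 2 replicas, and since 2 + 2 > 3
the read set must overlap the write set on at least one replica that holds
the newer (deleted) version of the column.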

 
Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 7:41 PM, 黄荣桢 wrote:

> Background: My application is running on a cluster system(which have 4 
> nodes), and system time of these four nodes are synchronizing by NTP. I use 
> Write.QUORUM and Read.QUORUM strategy. The probability of this problem is not 
> very high. Cassandra version is 1.0.3, I have tried Cassandra 1.1.1, this 
> problem is still exist.
> 
> Problem: I deleted a column, but after 6 seconds, Cassandra can still get the 
> old record which "isMarkedForDelete" is still false.
> 
> Is anybody meet the same problem? And how to solve it?
> 
> Detail: See the log below:
> 
> Node 3(Local node):
> [pool-2-thread-42] 2012-06-27 14:49:23,732 SliceQueryFilter.java (line 123) 
> collecting 0 of 2147483647: SuperColumn(667072 
> [..7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
> 
> [pool-2-thread-44] 2012-06-27 14:51:21,367 StorageProxy.java (line 172) 
> Mutations/ConsistencyLevel are [RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])]/QUORUM
> 
> -- I delete this record at 14:51:21,367
> 
> [pool-2-thread-37] 2012-06-27 14:51:27,400 SliceQueryFilter.java (line 123) 
> collecting 0 of 2147483647: SuperColumn(667072 
> [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
> 
> -- But I can still get the old record at 14:51:27,400
> 
> Node2:
> [MutationStage:118] 2012-06-27 14:51:21,373 RowMutationVerbHandler.java (line 
> 48) Applying RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
> 
> [MutationStage:118] 2012-06-27 14:51:21,374 RowMutationVerbHandler.java (line 
> 60) RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])]) 
> applied. Sending response to 6692098@/192.168.0.3
> 
> [MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 
> 48) Applying RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..])
> 
> [MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 
> 60) RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,...]),])])
>  applied. Sending response to 6698516@/192.168.0.3
> 
> Node1:
> [MutationStage:98] 2012-06-27 14:51:24,661 RowMutationVerbHandler.java (line 
> 48) Applying RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
> 
> [MutationStage:98] 2012-06-27 14:51:24,675 RowMutationVerbHandler.java (line 
> 60) RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a: true :4@1340779881338000,]),])]) 
> applied. Sending response to 6692099@/192.168.0.3
> 
> [MutationStage:93] 2012-06-27 14:51:40,932 RowMutationVerbHandler.java (line 
> 48) Applying RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a:true:4@1340779900915004,]),])])
> 
> DEBUG [MutationStage:93] 2012-06-27 14:51:40,933 RowMutationVerbHandler.java 
> (line 60) RowMutation(keyspace='drc', key='3332', 
> modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
> [7fff01382ca96c8b636b698a: true :4@1340779900915004,]),])]) 
> applied. Sending response to 6706555@/192.168.0.3
> 
> [ReadStage:55] 2012-06-27 14:51:43,074 SliceQueryFilter.java (line 123) 
> collecting 0 of 
> 5000:7fff01382ca96c8b636b698a:true:4@1340779900915004
> 
> Node 4:
> 
> There is no log about this record on Node 4.
> 



Re: No indexed columns present in by-columns clause with "equals" operator

2012-07-01 Thread aaron morton
Like the exception says:

> Bad Request: No indexed columns present in by-columns clause with "equals" 
> operator
> Same with other relational operators(<,>=,<=)
You must include an equality operator in the where clause:

That is why
> SELECT * FROM STEST WHERE VALUE1 = 10; 

Works but 
> SELECT * FROM STEST WHERE VALUE1 > 10; 
does not. 
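
For example, keeping the columns from the original message (a sketch; the
indexed equality on VALUE2 is what makes the statement legal, and the extra
range condition is then applied as a filter on top of it, so exact behaviour
may vary by version):

SELECT * FROM STEST WHERE VALUE2 = 'AB' AND VALUE1 >= '10';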

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 8:55 PM, Abhijit Chanda wrote:

> Hi All,
> I have got a strange exception while using cassandra cql. Relational 
> operators like (<, >, >=, <=) are not working.
> my columnfamily looks like this.
> CREATE COLUMNFAMILY STEST (
>   ROW_KEY text PRIMARY KEY,
>   VALUE1 text,
>   VALUE2 text
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write=True;
> 
> CREATE INDEX VALUE1_IDX ON STEST (VALUE1);
> 
> CREATE INDEX VALUE2_IDX ON STEST (VALUE2);
> 
> 
> Now in this columnfamily if i query this 
> SELECT * FROM STEST WHERE VALUE1 = 10; it returns ->
>  ROW_KEY | VALUE1 | VALUE2
>  -+-+
> 2 | 10 | AB
> 
> But if i query like this 
> SELECT * FROM STEST WHERE VALUE1 > 10; 
> It is showing this exception
> Bad Request: No indexed columns present in by-columns clause with "equals" 
> operator
> Same with other relational operators(<,>=,<=)
> 
> these are  the datas available in my columnfamily 
> ROW_KEY | VALUE1 | VALUE2
> +--+
>   3 | 100 |ABC
>   5 |9 |  ABCDE
>   2 |  10 | AB
>   1 |1 |  A
>   4 |  19 |   ABCD
> 
> Looks like some configuration problem. Please help me. Thanks in Advance
> 
> 
> 
> 
> Regards,
> -- 
> Abhijit Chanda
> Analyst
> VeHere Interactive Pvt. Ltd.
> +91-974395
> 



Re: BulkLoading SSTables and compression

2012-07-01 Thread aaron morton
When the data is streamed into the cluster by the bulk loader it is compressed 
on the receiving end (if the target CF has compression enabled).

If you are able to reproduce this  can you create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 10:00 PM, Andy Cobley wrote:

> My (limited) experience of moving from 0.8 to 1.0 is that you do have to use 
> rebuildsstables.  I'm guessing BulkLoading is bypassing the compression?
> 
> Andy
> 
> On 28 Jun 2012, at 10:53, jmodha wrote:
> 
>> Hi,
>> 
>> We are migrating our Cassandra cluster from v1.0.3 to v1.1.1, the data is
>> migrated using SSTableLoader to an empty Cassandra cluster.
>> 
>> The data in the source cluster (v1.0.3) is uncompressed and the target
>> cluster (1.1.1) has the column family created with compression turned on.
>> 
>> What we are seeing is that once the data has been loaded into the target
>> cluster, the size is similar to the data in the source cluster. Our
>> expectation is that since we have turned on compression in the target
>> cluster, the amount of data would be reduced.
>> 
>> We have tried running the "rebuildsstables" nodetool command on a node after
>> data has been loaded and we do indeed see a huge reduction in size e.g. from
>> 30GB to 10GB for a given column family. We were hoping to see this at the
>> point of loading the data in via the SSTableLoader.
>> 
>> Is this behaviour expected? 
>> 
>> Do we need to run the rebuildsstables command on all nodes to actually
>> compress the data after it has been streamed in?
>> 
>> Thanks.
>> 
>> --
>> View this message in context: 
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BulkLoading-SSTables-and-compression-tp7580849.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
>> Nabble.com.
> 
> 
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> 
> 



Re: Amazingly bad compaction performance

2012-07-01 Thread aaron morton
>>> Can compression be changed or disabled on-the-fly with cassandra?
Yes. Disable it in the schema and then run nodetool upgradesstables.
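
A minimal sketch of that sequence, using the keyspace and column family names
from the compaction log below (the schema change itself is done first, via the
CLI or CQL, with syntax depending on your version):

# rewrite the existing SSTables so the schema change takes effect on disk
nodetool -h localhost upgradesstables main basic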

As Tyler said, JDK7 is not officially supported yet and you may be running into 
issues others have not found. Any chance you could downgrade one node to JDK6 
and check the performance ? If it looks like a JDK issue could you post your 
findings to https://issues.apache.org/jira/browse/CASSANDRA and include the 
schema details ? 

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/06/2012, at 2:36 AM, Dustin Wenz wrote:

> My maximum and initial heap sizes are set to 6GB. Actual memory usage for the 
> VM is around 11-12GB. The machine has 24GB of physical memory, so there isn't 
> any paging going in.
> 
> I don't see any GC events logged that are longer than a few hundred 
> milliseconds. Is it possible that GC is taking significant time without it 
> being reported?
> 
>   - .Dustin
> 
> On Jun 27, 2012, at 1:31 AM, Igor wrote:
> 
>> Hello
>> 
>> Too much GC? Check JVM heap settings and real usage.
>> 
>> On 06/27/2012 01:37 AM, Dustin Wenz wrote:
>>> We occasionally see fairly poor compaction performance on random nodes in 
>>> our 7-node cluster, and I have no idea why. This is one example from the 
>>> log:
>>> 
>>> [CompactionExecutor:45] 2012-06-26 13:40:18,721 CompactionTask.java 
>>> (line 221) Compacted to 
>>> [/raid00/cassandra_data/main/basic/main-basic.basic_id_index-hd-160-Data.db,].
>>>   26,632,210 to 26,679,667 (~100% of original) bytes for 2 keys at 
>>> 0.006250MB/s.  Time: 4,071,163ms.
>>> 
>>> That particular event took over an hour to compact only 25 megabytes. 
>>> During that time, there was very little disk IO, and the java process 
>>> (OpenJDK 7) was pegged at 200% CPU. The node was also completely 
>>> unresponsive to network requests until the compaction was finished. Most 
>>> compactions run just over 7MB/s. This is an extreme outlier, but users 
>>> definitely notice the hit when it occurs.
>>> 
>>> I grabbed a sample of the process using jstack, and this was the only 
>>> thread in CompactionExecutor:
>>> 
>>> "CompactionExecutor:54" daemon prio=1 tid=41247522816 nid=0x99a5ff740 
>>> runnable [140737253617664]
>>>java.lang.Thread.State: RUNNABLE
>>> at org.xerial.snappy.SnappyNative.rawCompress(Native Method)
>>> at org.xerial.snappy.Snappy.rawCompress(Snappy.java:358)
>>> at 
>>> org.apache.cassandra.io.compress.SnappyCompressor.compress(SnappyCompressor.java:80)
>>> at 
>>> org.apache.cassandra.io.compress.CompressedSequentialWriter.flushData(CompressedSequentialWriter.java:89)
>>> at 
>>> org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:196)
>>> at 
>>> org.apache.cassandra.io.util.SequentialWriter.reBuffer(SequentialWriter.java:260)
>>> at 
>>> org.apache.cassandra.io.util.SequentialWriter.writeAtMost(SequentialWriter.java:128)
>>> at 
>>> org.apache.cassandra.io.util.SequentialWriter.write(SequentialWriter.java:112)
>>> at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>> - locked <36527862064> (a java.io.DataOutputStream)
>>> at 
>>> org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:142)
>>> at 
>>> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
>>> at 
>>> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
>>> at 
>>> org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
>>> at 
>>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> at 
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> at 
>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>> 
>>> Is it possible that there is an issue with snappy compression? Based on the 
>>> lousy compression ratio, I think we could get by without it just fine. Can 
>>> compression be changed or disabled on-the-fly with cassandra?
>>> 
>>> - .Dustin
>> 
>> 
> 



Re: hector timeouts

2012-07-01 Thread aaron morton
Using Cassandra as a queue is generally thought of as a bad idea, owing to the 
high delete workload. Levelled compaction handles it better but it is still not 
the best approach. 

Depending on your needs consider running http://incubator.apache.org/kafka/ 

> could you share some details on this?  we're using hector and we see random 
> timeout warns in the logs and not sure how to address them.
First determine if they are server side or client side timeouts. Then determine 
what the query was. 
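
One low-tech way to start (a sketch; the log path and the exact exception
class names are assumptions that depend on your Hector version and logging
setup):

# server-side timeouts surface as Cassandra's TimedOutException, which Hector
# wraps as HTimedOutException; plain socket timeouts point at the client or
# network side instead
grep -c "HTimedOutException" /var/log/myapp/application.log
grep -c "SocketTimeoutException" /var/log/myapp/application.log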

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/06/2012, at 7:02 AM, Deno Vichas wrote:

> On 6/28/2012 9:37 AM, David Leimbach wrote:
>> 
>> That coupled with Hector timeout issues became a real problem for us.
> 
> could you share some details on this?  we're using hector and we see random 
> timeout warns in the logs and not sure how to address them.
> 
> 
> thanks,
> deno



Re: BulkLoading SSTables and compression

2012-07-01 Thread jmodha
Sure, before I create a ticket, is there a way I can confirm that the
sstables are indeed not compressed other than running the "rebuildsstables"
nodetool command (and observing the live size go down)?

Thanks.
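
(For anyone checking the same thing - a sketch, assuming the default data
directory layout, with placeholder keyspace/column family names: SSTables
written with compression carry a separate CompressionInfo component file, so
its absence suggests the data was streamed in uncompressed.)

# any output here means at least some SSTables were written with compression
ls /var/lib/cassandra/data/MyKeyspace/MyColumnFamily/ | grep CompressionInfo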

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BulkLoading-SSTables-and-compression-tp7580849p7580922.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high

2012-07-01 Thread ruslan usifov
2012/7/1 David Daeschler :
> Good afternoon,
>
> This again looks like it could be the leap second issue:
>
> This looks like the problem a bunch of us were having yesterday that
> isn't cleared without a reboot or a date command. It seems to be
> related to the leap second that was added between the 30th June and
> the 1st of July.
>
> See the mailing list thread with subject "High CPU usage as of 8pm eastern 
> time"
>
> If you are seeing high CPU usage and a stall after restarting
> cassandra still, and you are on Linux, try:
>
> date; date `date +"%m%d%H%M%C%y.%S"`; date;
>
> In a terminal and see if everything starts working again.
>
> I hope this helps. Please spread the word if you see others having
> issues with unresponsive kernels/high CPU.

 Hello, this really helps. In our case two problems crossed each other :-((
and we had not assumed that it might be a kernel problem. On one
data cluster we simply rebooted it, and on the second we applied the date
solution, and everything is fine. Thanks


Re: No indexed columns present in by-columns clause with "equals" operator

2012-07-01 Thread Abhijit Chanda
Hey Aaron,

I was able to sort out the problem. Thanks anyway.

Regards,
Abhijit