Re: Unit Testing Cassandra

2013-06-19 Thread Stephen Connolly
Unit testing means testing in isolation the smallest part.

Unit tests should not take more than a few milliseconds to set up and
verify their assertions.

As such, if your code is not factored well for testing, you would typically
use mocking (either by hand, or with mocking libraries) to mock out the
bits not under test.

Extensive use of mocks is usually a smell of code that is not well designed
*for testing*
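
(To make the mocking point concrete - a minimal hand-rolled mock in Java;
UserStore and UserGreeter are hypothetical names, not from any particular
codebase:)

    // The code under test depends on a small interface, not on Cassandra itself.
    interface UserStore {
        String findName(String id);
    }

    class UserGreeter {
        private final UserStore store;
        UserGreeter(UserStore store) { this.store = store; }
        String greet(String id) { return "Hello, " + store.findName(id); }
    }

    class UserGreeterTest {
        @org.junit.Test
        public void greetsByName() {
            // Hand-rolled mock: no database involved, runs in microseconds.
            UserStore mock = new UserStore() {
                public String findName(String id) { return "Tom"; }
            };
            org.junit.Assert.assertEquals("Hello, Tom",
                    new UserGreeter(mock).greet("42"));
        }
    }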

If you intend to test components integrated together... That is integration
testing.

If you intend to test performance of the whole or significant parts of the
whole... That is performance testing.

When searching for the above, you will not get much luck if you are looking
for them in the context of "unit testing", as those things are *outside the
scope of unit testing*.

On Wednesday, 19 June 2013, Shahab Yunus wrote:

> Hello,
>
> Can anyone suggest a good/popular Unit Test tools/frameworks/utilities out
> there for unit testing Cassandra stores? I am looking for testing from
> performance/load and monitoring perspective. I am using 1.2.
>
> Thanks a lot.
>
> Regards,
> Shahab
>


-- 
Sent from my phone


Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
My Cassandra ps info:

root 26791 1  0 07:14 ?00:00:00 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon
103  26792 26791 99 07:14 ?854015-22:02:22 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon

Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-19 Thread Polytron Feng
Hi,

We are trying a rolling upgrade from 1.0.12 to 1.2.5, but we found that the
1.2.5 node cannot see the other old nodes.
Therefore, we tried upgrading to 1.1.12 first, and that works.
However, we still saw the same issue when rolling upgrade from 1.1.12 to
1.2.5.
This seems to be the issue fixed in
https://issues.apache.org/jira/browse/CASSANDRA-5332, but we still saw it in
1.2.5.

Environment:
   OS: CentOS 6
   JDK: 6u31
   Cluster: 3 nodes for testing, in EC2
   Snitch: Ec2MultiRegionSnitch
   NetworkTopologyStrategy: strategy_options = { ap-southeast:3 }

We have 3 nodes and we upgraded 122.248.xxx.xxx to 1.2.5 first; the other 2
nodes are still on 1.1.12.
When we restarted the upgraded node, it initially saw the other 2 old nodes
as UP in the log.
However, after a few seconds, these 2 nodes were marked as DOWN.
This is the ring info from the 1.2.5 node - 122.248.xxx.xxx:

Note: Ownership information does not include topology; for complete
information, specify a keyspace

Datacenter: ap-southeast
========================
Address          Rack  Status  State   Load      Owns    Token
                                                         113427455640312821154458202477256070486
122.248.xxx.xxx  1b    Up      Normal  69.74 GB  33.33%  1
54.251.xxx.xxx   1b    Down    Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
54.254.xxx.xxx   1b    Down    Normal  70.28 GB  33.33%  113427455640312821154458202477256070486


but the old 1.1.12 nodes can see the new node:

Note: Ownership information does not include topology, please specify a
keyspace.
Address          DC            Rack  Status  State   Load      Owns    Token
                                                                       113427455640312821154458202477256070486
122.248.xxx.xxx  ap-southeast  1b    Up      Normal  69.74 GB  33.33%  1
54.251.xxx.xxx   ap-southeast  1b    Up      Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
54.254.xxx.xxx   ap-southeast  1b    Up      Normal  70.28 GB  33.33%  113427455640312821154458202477256070486


We enabled trace log level to check gossip-related logs. The log below from
the 1.2.5 node shows that the other 2 nodes are UP in the beginning. They
seem to complete the SYN/ACK/ACK2 handshake cycle.

TRACE [GossipStage:1] 2013-06-19 07:44:43,047
GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage
from /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,047
GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are :
/54.254.xxx.xxx:1371617084:10967 /54.251.xxx.xxx:1371625851:2055
TRACE [GossipStage:1] 2013-06-19 07:44:43,048 Gossiper.java (line 945)
requestAll for /54.254.xxx.xxx
.

TRACE [GossipStage:1] 2013-06-19 07:44:43,080
GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage
to /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,080 MessagingService.java
(line 601) /122.248.216.142 sending GOSSIP_DIGEST_ACK to 19@/54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,080
GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage
from /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,080
GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are :
/54.254.xxx.xxx:1371617084:10978 /54.251.xxx.xxx:1371625851:2066

.

TRACE [GossipStage:1] 2013-06-19 07:44:43,083
GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage
to /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,083 MessagingService.java
(line 601) /122.248.216.142 sending GOSSIP_DIGEST_ACK to 22@/54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,084
GossipDigestAck2VerbHandler.java (line 38) Received a
GossipDigestAck2Message from /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,084 FailureDetector.java
(line 168) reporting /54.254.xxx.xxx
 INFO [GossipStage:1] 2013-06-19 07:44:43,093 Gossiper.java (line 805)
Node /54.254.xxx.xxx is now part of the cluster
TRACE [GossipStage:1] 2013-06-19 07:44:43,094 Gossiper.java (line 808)
Adding endpoint state for /54.254.xxx.xxx
DEBUG [GossipStage:1] 2013-06-19 07:44:43,094 MessagingService.java
(line 377) Resetting pool for /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,094 Gossiper.java (line 764)
marking as alive /54.254.xxx.xxx
DEBUG [GossipStage:1] 2013-06-19 07:44:43,095 Gossiper.java (line 770)
removing expire time for endpoint : /54.254.xxx.xxx
 INFO [GossipStage:1] 2013-06-19 07:44:43,095 Gossiper.java (line 771)
InetAddress /54.254.xxx.xxx is now UP



After a few seconds, the log shows that the old nodes did not respond with
ACK2 and are marked as DOWN:

TRACE [GossipStage:1] 2013-06-19 07:44:52,121
GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage
from /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:52,123
GossipDigestSynVerbHandler.java (line 71) Gossip syn diges

Re: Reduce Cassandra GC

2013-06-19 Thread Takenori Sato
The GC logging options are not set. You should see the following:

 -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
-Xloggc:/var/log/cassandra/gc-1371603607.log
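
(These are typically switched on in conf/cassandra-env.sh - a sketch, with the
log path picked as an example rather than taken from your install:)

    # Append GC logging flags to the JVM options in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"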

> Is it normal to have two processes like this?

No. You are running two processes.


On Wed, Jun 19, 2013 at 4:16 PM, Joel Samuelsson
wrote:

> My Cassandra ps info:
> [full ps output quoted above; snipped]

Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
Right, after getting the GC logging information I tested upgrading to 1.2.
It didn't help, but I forgot to re-enable the GC options.

> No. You are running two processes.
Ok, that's weird. I am using an unmodified version of the startup script in
/etc/init.d/cassandra from the Debian package. Here's some output:

joel@dev:~$ sudo /etc/init.d/cassandra stop
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
joel@dev:~$ ps -ef | grep cassandra
joel  5038  9763  0 08:58 pts/200:00:00 grep --color=auto cassandra
joel@dev:~$ sudo /etc/init.d/cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
joel@dev:~$ ps -ef | grep cassandra
root  5116 1  0 08:59 ?00:00:00 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371632342.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371632342.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon
103   5117  5116 99 08:59 ?640511-22:40:28 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cass

RE: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread James Lee
The test tool I am using catches any exceptions on the original writes and 
resubmits the write request until it's successful (bailing out after 5 
failures).  So for each key Cassandra has reported a successful write.


Nodetool says the following - I'm guessing the pending hinted handoff is the 
interesting bit?

comet-mvs01:/dsc-cassandra-1.2.2# ./bin/nodetool tpstats
Pool Name                 Active   Pending   Completed   Blocked   All time blocked
ReadStage                      0         0       35445         0                  0
RequestResponseStage           0         0     1535171         0                  0
MutationStage                  0         0     3038941         0                  0
ReadRepairStage                0         0        2695         0                  0
ReplicateOnWriteStage          0         0           0         0                  0
GossipStage                    0         0        2898         0                  0
AntiEntropyStage               0         0           0         0                  0
MigrationStage                 0         0         245         0                  0
MemtablePostFlusher            0         0        1260         0                  0
FlushWriter                    0         0         633         0                212
MiscStage                      0         0           0         0                  0
commitlog_archiver             0         0           0         0                  0
InternalResponseStage          0         0           0         0                  0
HintedHandoff                  1         1           0         0                  0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 60427
_TRACE   0
REQUEST_RESPONSE 0


Looking at the hints column family in the system keyspace, I see one row with a
large number of columns.  Presumably that, along with the nodetool output above,
suggests there are hinted handoffs pending?  How long should I expect these to
remain?
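
(For reference, a quick way to check from cqlsh - a sketch, assuming the 1.2
system keyspace exposes the hints table:)

    cqlsh> SELECT count(*) FROM system.hints;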

Ah, actually now that I re-run the command it seems that nodetool now reports 
that hint as completed and there are no hints left in the system keyspace on 
either node.  I'm still seeing failures to read the data I'm expecting though, 
as before.  Note that I've run this with a smaller data set (2M rows, 1GB data 
total) for this latest test.

Thanks,
James


-Original Message-
From: Robert Coli [mailto:rc...@eventbrite.com] 
Sent: 18 June 2013 19:45
To: user@cassandra.apache.org
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2

On Tue, Jun 18, 2013 at 11:36 AM, Wei Zhu  wrote:
> Cassandra doesn't do async replication like HBase does.You can run 
> nodetool repair to insure the consistency.

While this answer is true, it is somewhat non-responsive to the OP.

If the OP didn't see a timeout exception, the theoretical worst case is that he
should have hints stored for writes that initially failed to replicate. His
nodes should not be failing GC with a total data size of 5 GB on an 8 GB heap,
so those hints should deliver quite quickly. After 30 minutes those hints
should certainly be delivered.

@OP: do you see hints being stored? Does nodetool tpstats indicate dropped
messages?

=Rob


Re: Reduce Cassandra GC

2013-06-19 Thread Fabrice Facorat
2013/6/19 Takenori Sato :
> The GC logging options are not set. You should see the following:
>
>  -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
> -Xloggc:/var/log/cassandra/gc-1371603607.log
>
>> Is it normal to have two processes like this?
>
> No. You are running two processes.

It's "normal" as this is jsvc behavior.
It start as root, then fork to start cassandra as cassandra user, you
en up with 2 process with the second one running as cassandra having
the one started as root for parent

http://commons.apache.org/proper/commons-daemon/jsvc.html
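
(A quick way to see the parent/child relationship - a sketch:)

    # List the jsvc processes with their parent PIDs and owners
    ps -o pid,ppid,user,args -C jsvc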

--
Close the World, Open the Net
http://www.linux-wizard.net


TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Hi,

Using Thrift, we are allowed to specify a different TTL value for each column
in a row.

But CQL3 doesn't provide a way for this.

For instance, this is allowed:
INSERT INTO users (user_name, password, gender, state) VALUES ('xamry',
'aa', 'm', 'UP') USING TTL 5;

But something like this is not achievable:
INSERT INTO users (user_name, password, gender, state)  VALUES ('xamry' using 
TTL 3, 'aa' using TTL , 'm' using TTL , 'UP' using TTL );


Why is there such an inconsistency? Should we not be able to achieve this using
CQL, given that CQL usage is encouraged as a replacement for Thrift?

Sincerely,
Amresh











Re: TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Sylvain Lebresne
Hi,

> But CQL3 doesn't provide a way for this.

That's not true. But the syntax is probably a bit more verbose than what you
were hoping for. Your example (where I assume user_name is your partition
key) can be achieved with:
  BEGIN BATCH
    UPDATE users USING TTL 4 SET password = 'aa' WHERE user_name = 'xamry';
    UPDATE users USING TTL 1 SET gender = 'm' WHERE user_name = 'xamry';
    UPDATE users USING TTL 6 SET state = 'UP' WHERE user_name = 'xamry';
  APPLY BATCH;

Granted that is a tad verbose, but in terms of the actual queries performed,
this is *absolutely* equivalent to what you would do in Thrift.

So should we provide a shorter syntax to achieve this? It's worth discussing,
and nobody said CQL3 is not meant to evolve. Though my initial opinion on this
is that setting different TTLs on different columns in the same CQL3 row and
the same query is probably not all that common overall, so I'm not totally
convinced it's worth adding complexity to the syntax for such a shortcut (yes,
a shorter syntax would mean fewer bytes to transfer to the server for the query
and less to parse, but if you care about performance you should be using
prepared statements, which makes that issue moot).
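
(To make the prepared-statement point concrete - a minimal sketch, assuming the
DataStax Java driver; the keyspace name "ks" is a placeholder and the TTL is
baked into the statement string:)

    import com.datastax.driver.core.*;

    public class TtlUpdateExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");
            // Parsed once server-side; only the bound values travel per execution.
            PreparedStatement ps = session.prepare(
                "UPDATE users USING TTL 4 SET password = ? WHERE user_name = ?");
            session.execute(ps.bind("aa", "xamry"));
            cluster.shutdown();
        }
    }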

--
Sylvain



On Wed, Jun 19, 2013 at 11:40 AM, Amresh Kumar Singh <
amresh.si...@impetus.co.in> wrote:

> [original message quoted in full above; snipped]


RE: TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Thanks Sylvain,

I am working on a high-level client (Kundera) which, if users want it, should
be able to achieve this, even if it's uncommon.

Writing an UPDATE batch in CQL is an approach that works, and as you say,
performance is not impacted.

In my opinion, an *optional* "USING TTL" with column values (in addition to the
existing syntax) won't hurt. In most cases people won't need this, but if they
do, it will be there.

In the meantime, I am going to try the approach you suggested. Thanks again!

Sincerely,
Amresh

From: Sylvain Lebresne [sylv...@datastax.com]
Sent: Wednesday, June 19, 2013 3:45 PM
To: user@cassandra.apache.org
Subject: Re: TTL can't be specified at column level using CQL 3 in Cassandra
1.2.x

[quoted message snipped]


Real Use Cases in Cassandra !!!

2013-06-19 Thread varadarajan . v

Team,

Can anyone share real use cases in Cassandra?


Thanks & Regards,
Varada
Solution Architect/Business Information Management Services Practice
Polaris Financial Technology Limited
6th Floor, West Wing, Nxt lvl, Navalur
W:044-33418000*8613  M:9791700984 : VOIP:90-8613
E:varadaraja...@polarisft.com

"Delivering Retail Banking Services and DATA Services - Thought to
Experience"

Theme 2013 - 14 : "Year of Converging Thought Leadership"






RE: Real Use Cases in Cassandra !!!

2013-06-19 Thread Romain HARDOUIN
Hi,

Have a look at DataStax's customers: http://www.datastax.com/customers


varadaraja...@polarisft.com wrote on 19/06/2013 12:48:50:

> [original message snipped]


Re: Real Use Cases in Cassandra !!!

2013-06-19 Thread Elliot Thompson
Visit the Planet Cassandra website, hosted by DataStax.
On 19 Jun 2013, at 13:21, Romain HARDOUIN  wrote:

> [quoted messages snipped]



Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Rodrigo Felix
Thanks Eric. Is there a way to start compaction operations manually?
I'm thinking about doing so after loading data and before starting the run
phase of the benchmark.
Thanks.

Att.

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP


On Mon, Jun 17, 2013 at 12:41 PM, Eric Stevens  wrote:

> Load is the size of the storage on disk, as I understand it.  This can
> fluctuate during normal usage even if records are not being added or
> removed; a node's load may be reduced during compaction, for example.
> During compaction, especially if you use the Size Tiered Compaction strategy
> (the default), load may temporarily double for a column family.
>
>
> On Mon, Jun 17, 2013 at 11:33 AM, Rodrigo Felix <
> rodrigofelixdealme...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been running a benchmark on Cassandra and I'm facing a problem
>> regarding the size of the database.
>> I performed a load phase and then, when running nodetool ring, I got
>> the following output:
>>
>> ubuntu@domU-12-31-39-0E-11-F1:~/cassandra$ bin/nodetool ring
>> Address        DC           Rack   Status  State   Load     Effective-Ownership  Token
>>                                                                                  85070591730234615865843651857942052864
>> 10.192.18.3    datacenter1  rack1  Up      Normal  2.07 GB  50.00%               0
>> 10.85.135.169  datacenter1  rack1  Up      Normal  2.09 GB  50.00%               85070591730234615865843651857942052864
>>
>> After that I executed, for about one hour, a workload with scan and
>> insert queries. Then, after finishing the workload execution, I ran
>> nodetool ring again and got the following:
>>
>> ubuntu@domU-12-31-39-0E-11-F1:~/cassandra$ bin/nodetool ring
>> Address        DC           Rack   Status  State   Load     Effective-Ownership  Token
>>                                                                                  85070591730234615865843651857942052864
>> 10.192.18.3    datacenter1  rack1  Up      Normal  1.07 GB  50.00%               0
>> 10.85.135.169  datacenter1  rack1  Up      Normal  2.15 GB  50.00%               85070591730234615865843651857942052864
>>
>> Any idea why a node had its size reduced if no record was removed? No
>> machine was added or removed during this workload.
>> Is this related to any kind of compression? If yes, is there a command
>> to confirm that?
>> I also faced a problem where a node had its size increased from about
>> 2 GB to about 4 GB. In that scenario, I both added and removed nodes
>> during the workload depending on the load (CPU).
>> Thanks in advance for any help.
>>
>>
>> Att.
>>
>> *Rodrigo Felix de Almeida*
>> LSBD - Universidade Federal do Ceará
>> Project Manager
>> MBA, CSM, CSPO, SCJP
>>
>
>


Re: Dropped mutation messages

2013-06-19 Thread Shahab Yunus
Hello Arthur,

What do you mean by "The queries need to be lightened"?

Thanks,
Shahab


On Tue, Jun 18, 2013 at 8:47 PM, Arthur Zubarev wrote:

>   Cem hi,
>
> as per http://wiki.apache.org/cassandra/FAQ#dropped_messages
>
>
> Internode messages which are received by a node, but do not get processed
> within rpc_timeout, are dropped rather than processed, as the coordinator
> node will no longer be waiting for a response. If the coordinator node does
> not receive Consistency Level responses before the rpc_timeout, it will
> return a TimedOutException to the client. If the coordinator receives
> Consistency Level responses, it will return success to the client.
>
> For MUTATION messages this means that the mutation was not applied to all
> replicas it was sent to. The inconsistency will be repaired by Read Repair
> or Anti Entropy Repair.
>
> For READ messages this means a read request may not have completed.
>
> Load shedding is part of the Cassandra architecture; if this is a
> persistent issue it is generally a sign of an overloaded node or cluster.
>
> By the way, I am on C* 1.2.4 too in dev mode. After having my node filled
> with 400 GB I started getting RPC timeouts on large data retrievals, so in
> short, you may need to revise how you query.
>
> The queries need to be lightened
>
> /Arthur
>
>  *From:* cem 
> *Sent:* Tuesday, June 18, 2013 1:12 PM
> *To:* user@cassandra.apache.org
> *Subject:* Dropped mutation messages
>
>  Hi All,
>
> I have a cluster of 5 nodes with C* 1.2.4.
>
> Each node has 4 disks 1 TB each.
>
> I see  a lot of dropped messages after it stores 400 GB  per disk. (1.6 TB
> per node).
>
> The recommendation was 500 GB max per node before 1.2.  Datastax says that
> we can store terabytes of data per node with 1.2.
> http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning
>
> Do I need to enable anything to leverage from 1.2? Do you have any other
> advice?
>
> What should be the path to investigate this?
>
> Thanks in advance!
>
> Best Regards,
> Cem.
>
>
>


Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks Stephen for your reply and explanation. My bad that I mixed those up
and wasn't clear enough. Yes, I have 2 different requests/questions.

1) One is for the unit testing.

2) Second (in which I am more interested in) is for performance
(stress/load) testing. Let us keep integration aside for now.

I do see some stuff out there but wanted to know recommendations from the
community given their experience.

Regards,
Shahab


On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> Unit testing means testing in isolation the smallest part.
>
> Unit tests should not take more than a few milliseconds to set up and
> verify their assertions.
>
> As such, if your code is not factored well for testing, you would
> typically use mocking (either by hand, or with mocking libraries) to mock
> out the bits not under test.
>
> Extensive use of mocks is usually a smell of code that is not well
> designed *for testing*
>
> If you intend to test components integrated together... That is
> integration testing.
>
> If you intend to test performance of the whole or significant parts of the
> whole... That is performance testing.
>
> When searching for the above, you will not get much luck if you are
> looking for them in the context of "unit testing", as those things are
> *outside the scope of unit testing*.
>
>
> On Wednesday, 19 June 2013, Shahab Yunus wrote:
>
>> Hello,
>>
>> Can anyone suggest a good/popular Unit Test tools/frameworks/utilities out
>> there for unit testing Cassandra stores? I am looking for testing from
>> performance/load and monitoring perspective. I am using 1.2.
>>
>> Thanks a lot.
>>
>> Regards,
>> Shahab
>>
>
>
> --
> Sent from my phone
>


token() function in CQL3 (1.2.5)

2013-06-19 Thread Ben Boule
Can anyone explain this to me?  I have been looking through the source code but 
can't seem to find the answer.

The documentation mentions using the token() function to change a value into
its token for use in queries.  It always mentions it as taking a single
parameter:


SELECT * FROM posts WHERE token(userid) > token('tom') AND token(userid) < 
token('bob')

However on my 1.2.5 node I am getting the following error:

e.g.

create table foo (
organization text,
type text,
time timestamp,
id uuid,
primary key ((organization, type, time), id));

select * from foo where organization = 'companyA' and type = 'typeB' and 
token(time) < token('somevalue') and token(time) > token('othervalue')

Bad Request: Invalid number of arguments in call to function token: 3 required 
but 1 provided

What are the other two parameters?  We don't currently use the token function 
but I was experimenting seeing if I could move the time into the partition key 
for a table like this to better distribute the rows.  But I can't seem to 
figure out how to get token() working.
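
(For what it's worth, a sketch based on the error message: the partition key
here is composite with three components, so presumably token() has to be given
all three, e.g.:)

    select * from foo
    where token(organization, type, time) > token('companyA', 'typeB', '2013-06-18 16:23:00')
      and token(organization, type, time) < token('companyA', 'typeB', '2013-06-18 16:24:00');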

Ben


Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Michal Michalski
You can start compaction via JMX if you need it and you know what you're
doing: find the org.apache.cassandra.db:type=CompactionManager MBean and the
forceUserDefinedCompaction operation on it.
The first argument is the keyspace name; the second one is a comma-separated
list of SSTables to compact (filenames).

You can also perform a major compaction via nodetool compact (for
SizeTieredCompaction), but - again - you really should not do it unless
you're really sure about what you're doing, as it compacts all the SSTables
together, which is not something you want to achieve in most cases.
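
(A sketch of the nodetool form; keyspace and column family names are
placeholders:)

    # Major compaction of a single column family -- use with care
    nodetool -h localhost compact MyKeyspace MyColumnFamily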


M.

On 19.06.2013 14:31, Rodrigo Felix wrote:

[quoted thread history snipped]

Re: Unit Testing Cassandra

2013-06-19 Thread Hiller, Dean
For unit testing, we actually use PlayOrm, which has an in-memory version of
NoSQL, so we just write unit tests against our code using the in-memory
version - but that only works if you are in Java.
Later,
Dean

From: Shahab Yunus
Subject: Re: Unit Testing Cassandra

[quoted thread history snipped]



RE: Unit Testing Cassandra

2013-06-19 Thread Ben Boule
Hi Shahab,

Cassandra-Unit has been helpful for us for running unit tests without requiring
a real Cassandra instance to be running.  We only use it to test our "DAO"
code, which interacts with the Cassandra client.  It basically starts up an
embedded instance of Cassandra and fools your client/driver into using it.  It
uses a non-standard port, so you just need to make sure you can pass the port
as a parameter into your client code.

https://github.com/jsevellec/cassandra-unit

One important thing is to either clear out the keyspace in between tests or 
carefully separate your data so different tests don't collide with each other 
in the embedded database.

Setup/tear down time is pretty reasonable.
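
(A minimal sketch of the setup/teardown, using class and method names from the
cassandra-unit project; keyspace loading and cleanup are left to your test
code:)

    import org.cassandraunit.utils.EmbeddedCassandraServerHelper;

    public class DaoTestBase {
        @org.junit.BeforeClass
        public static void startCassandra() throws Exception {
            // Boots an in-process Cassandra on a non-standard port
            EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        }

        @org.junit.After
        public void cleanUp() {
            // Drop non-system keyspaces so tests don't collide
            EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
        }
    }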

Ben

From: Shahab Yunus [shahab.yu...@gmail.com]
Sent: Wednesday, June 19, 2013 8:46 AM
To: user@cassandra.apache.org
Subject: Re: Unit Testing Cassandra

[quoted thread history snipped]



timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm experimenting with a data model that will need to ingest a lot of data
that must be queryable by time.  In the example below, I want to be able to
run a query like "select * from count3 where counter = 'test' and ts >
minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18
16:24:00');".  However, in certain cases this query fails with the error "Bad
Request: Start key must sort before (or equal to) finish key in your
partitioner!".  It's not clear to me why this happens or what the issue is,
as it seems like a bug.

Here's the table:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, ts))
);

It has data like so:

cqlsh:Statistics> select counter,dateof(ts),key1,value from count3;

 counter | dateof(ts)   | key1 | value
-+--+--+---
test | 2013-06-18 16:23:25-0400 |1 | 1
test | 2013-06-18 16:23:28-0400 |1 | 1
test | 2013-06-18 16:23:28-0400 |1 | 1
test | 2013-06-18 16:23:28-0400 |1 | 1
test | 2013-06-18 16:23:29-0400 |1 | 1
test | 2013-06-18 16:23:29-0400 |1 | 1
test | 2013-06-18 16:23:29-0400 |1 | 1
test | 2013-06-18 16:23:30-0400 |1 | 1
test | 2013-06-18 16:23:30-0400 |1 | 1
test | 2013-06-18 16:23:31-0400 |1 | 1
test | 2013-06-18 16:23:31-0400 |1 | 1
test | 2013-06-18 16:23:31-0400 |1 | 1
test | 2013-06-18 16:23:32-0400 |1 | 1
test | 2013-06-18 16:23:32-0400 |1 | 1


DOESN'T WORK:
cqlsh:Statistics> select * from count3 where counter = 'test' and ts > 
minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18 16:24:00');
Bad Request: Start key must sort before (or equal to) finish key in your 
partitioner!

WORKS FINE:
cqlsh:Statistics> select * from count3 where counter = 'test' and ts > 
minTimeuuid('2013-06-18 16:23:25') and ts < minTimeuuid('2013-06-18 16:23:31');

 counter | ts   | key1 | value
-+--+--+---
test | edee0df0-d854-11e2-ac46-cba9e55f995d |1 | 1
test | ef9a5e60-d854-11e2-ac46-cba9e55f995d |1 | 1
test | efccb900-d854-11e2-ac46-cba9e55f995d |1 | 1
test | effb1c00-d854-11e2-ac46-cba9e55f995d |1 | 1
test | f0284680-d854-11e2-ac46-cba9e55f995d |1 | 1
test | f05b8b80-d854-11e2-ac46-cba9e55f995d |1 | 1
test | f08c5f80-d854-11e2-ac46-cba9e55f995d |1 | 1
test | f0c6f780-d854-11e2-ac46-cba9e55f995d |1 | 1
test | f1018f80-d854-11e2-ac46-cba9e55f995d |1 | 1


Thanks,
Brent



Re: Unit Testing Cassandra

2013-06-19 Thread Edward Capriolo
You really do not need much. In Java you can use the embedded server; Hector
wraps a simple class around this called EmbeddedServerHelper.
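
(A sketch of how that is typically used from a test; the setup/teardown method
names here are assumptions about Hector's test utility, so check them against
your Hector version:)

    public class EmbeddedCassandraTest {
        private static me.prettyprint.cassandra.testutils.EmbeddedServerHelper embedded;

        @org.junit.BeforeClass
        public static void setup() throws Exception {
            embedded = new me.prettyprint.cassandra.testutils.EmbeddedServerHelper();
            embedded.setup();    // starts an in-process Cassandra (method name assumed)
        }

        @org.junit.AfterClass
        public static void stop() throws Exception {
            embedded.teardown(); // stops it and cleans up (method name assumed)
        }
    }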

On Wednesday, June 19, 2013, Ben Boule  wrote:
> Hi Shabab,
>
> Cassandra-Unit has been helpful for us for running unit tests without
requiring a real cassandra instance to be running.   We only use this to
test our "DAO" code which interacts with the Cassandra client.  It
basically starts up an embedded instance of cassandra and fools your
client/driver into using it.  It uses a non-standard port and you just need
to make sure you can set the port as a parameter into your client code.
>
> https://github.com/jsevellec/cassandra-unit
>
> One important thing is to either clear out the keyspace in between tests
or carefully separate your data so different tests don't collide with each
other in the embedded database.
>
> Setup/tear down time is pretty reasonable.
>
> Ben
> 
> From: Shahab Yunus [shahab.yu...@gmail.com]
> Sent: Wednesday, June 19, 2013 8:46 AM
> To: user@cassandra.apache.org
> Subject: Re: Unit Testing Cassandra
>
> Thanks Stephen for you reply and explanation. My bad that I mixed those
up and wasn't clear enough. Yes, I have different 2 requests/questions.
> 1) One is for the unit testing.
> 2) Second (in which I am more interested in) is for performance
(stress/load) testing. Let us keep integration aside for now.
> I do see some stuff out there but wanted to know recommendations from the
community given their experience.
> Regards,
> Shahab
>
> On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:
>>
>> Unit testing means testing in isolation the smallest part.
>> Unit tests should not take more than a few milliseconds to set up and
verify their assertions.
>> As such, if your code is not factored well for testing, you would
typically use mocking (either by hand, or with mocking libraries) to mock
out the bits not under test.
>> Extensive use of mocks is usually a smell of code that is not well
designed *for testing*
>> If you intend to test components integrated together... That is
integration testing.
>> If you intend to test performance of the whole or significant parts of
the whole... That is performance testing.
>> When searching for the above, you will not get much luck if you are
looking for them in the context of "unit testing" as those things are
*outside the scope of unit testing"
>>
>> On Wednesday, 19 June 2013, Shahab Yunus wrote:
>>>
>>> Hello,
>>>
>>> Can anyone suggest a good/popular Unit Test tools/frameworks/utilities
out
>>> there for unit testing Cassandra stores? I am looking for testing from
performance/load and monitoring perspective. I am using 1.2.
>>>
>>> Thanks a lot.
>>>
>>> Regards,
>>> Shahab
>>
>>
>> --
>> Sent from my phone


Re: vnodes ready for production ?

2013-06-19 Thread Jim Ancona
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton  wrote:
>> Even more so if we could automate some up-scaling thanks to AWS alarms; it
>> would be awesome.
>
> I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at
> netflix in March, not sure if it's public yet.
>
>> Are the vnodes feature and the tokens =>vnodes transition safe enough to
>> go live with vnodes ?
>
> There have been some issues, search the user list for shuffle and as always
> test.
>
>> Any advice about vnodes ?
>
> They are in use out there. It's a sizable change so it would be a good idea to
> build a test system for running shuffle and testing your application. There
> have been some issues with repair and range scans (including hadoop
> integration.)

Also, in his presentation at last week's Summit, Eric Evans suggested
not using shuffle. As an alternative he suggested removing and
replacing nodes one-by-one.

Jim

>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/06/2013, at 7:04 PM, Alain RODRIGUEZ  wrote:
>
> Any insights on vnodes, one month after my original post ?
>
>
> 2013/5/16 Alain RODRIGUEZ 
>>
>> Hi,
>>
>> Adding vnodes is a big improvement to Cassandra, specifically because we
>> have a fluctuating load on our Cassandra depending on the week, and it is
>> quite annoying to add some nodes for one week or two, move tokens and then
>> having to remove them and then move tokens again. Even more so if we could
>> automate some up-scaling thanks to AWS alarms; it would be awesome.
>>
>> We don't use vnodes yet because Opscenter did not support this feature and
>> because we need to have a reliable production. Now Opscenter handles vnodes.
>>
>> Are the vnodes feature and the tokens =>vnodes transition safe enough to
>> go live with vnodes ?
>>
>> What would be the transition process ?
>>
>> Does someone auto-scale his Cassandra cluster ?
>>
>> Any advice about vnodes ?
>
>
>


DC dedicated to Hadoop jobs

2013-06-19 Thread cscetbon.ext
Hi,

Our Hadoop jobs will only do READs and we want to restrict those reads to this 
dedicated DC, even if performance there is bad. 

What can we do to achieve this goal? 
- Set dynamic_snitch_badness_threshold to 0.98 on this DC's nodes? Can we 
have different dynamic_snitch_badness_threshold values on nodes in different 
DCs?
- Set the consistency to LOCAL_QUORUM? However, if we have more than one replica 
we'll do more reads

thanks
-- 
Cyril SCETBON


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.



Re: timeuuid and cql3 query

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent  wrote:

>
>  CREATE TABLE count3 (
>   counter text,
>   ts timeuuid,
>   key1 text,
>   value int,
>   PRIMARY KEY ((counter, ts))
> )
>

Instead of doing a composite partition key, remove a set of parens and let
ts be your clustering key.  That will cause cql rows to be stored in sorted
order by the ts column (for a given value of "counter") and allow you to do
the kind of query you're looking for.
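In other words, something like this (untested sketch; drop the inner
parentheses so that counter is the partition key and ts the clustering
column):

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (counter, ts)  -- counter: partition key; ts: clustering key
);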


-- 
Tyler Hobbs
DataStax 


Re: token() function in CQL3 (1.2.5)

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule  wrote:

>  Can anyone explain this to me?  I have been looking through the source
> code but can't seem to find the answer.
>
> The documentation mentions using the token() function to change a value
> into its token for use in queries.   It always mentions it as taking a
> single parameter:
>
> SELECT * FROM posts WHERE token(userid) > token('tom') AND token(userid) < 
> token('bob')
>
>
> However on my 1.2.5 node I am getting the following error:
>
> e.x.
>
> create table foo (
> organization text,
> type text,
> time timestamp,
> id uuid,
> primary key ((organization, type, time), id))
>
> select * from foo where organization = 'companyA' and type = 'typeB' and
> token(time) < token('somevalue') and token(time) > token('othervalue')
>
> Bad Request: Invalid number of arguments in call to function token: 3
> required but 1 provided
>
> What are the other two parameters?  We don't currently use the token
> function but I was experimenting seeing if I could move the time into the
> partition key for a table like this to better distribute the rows.  But I
> can't seem to figure out how to get token() working.
>

token() acts on the entire partition key, which for you is (organization,
type, time), hence the 3 required values.
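
For example, with your schema the call would look something like this (a
sketch; the literal values are just placeholders):

select * from foo
where token(organization, type, time) > token('companyA', 'typeB', '2013-01-01 00:00:00')
  and token(organization, type, time) < token('companyA', 'typeB', '2013-06-01 00:00:00');

Note that with the default partitioner those token ranges don't correspond to
time ranges, so this form mostly helps with paging over all rows rather than
time-slicing.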

In order to better distribute the rows, I suggest using a time bucket as
part of the partition key.  For example, you might use only the date
portion of the timestamp as the time bucket.
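
A sketch of what that might look like (the day column and its format are just
an example, not something your schema has today):

CREATE TABLE foo (
  organization text,
  type text,
  day text,  -- time bucket, e.g. '2013-06-19'
  time timestamp,
  id uuid,
  PRIMARY KEY ((organization, type, day), time, id)
);

SELECT * FROM foo
WHERE organization = 'companyA' AND type = 'typeB' AND day = '2013-06-19'
  AND time >= '2013-06-19 00:00:00' AND time < '2013-06-19 12:00:00';

Each bucket then becomes its own partition, and queries within a bucket are
ordered slices.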

These posts talk about doing something similar with the Thrift API, but
they will probably still be helpful:
- http://rubyscale.com/2011/basic-time-series-with-cassandra/
- http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

-- 
Tyler Hobbs
DataStax 


Re: timeuuid and cql3 query

2013-06-19 Thread Davide Anastasia
Hi Tyler,
I am interested in this scenario as well: could you please elaborate
further your answer?

Thanks a lot,
Davide
On 19 Jun 2013 16:01, "Tyler Hobbs"  wrote:

>
> On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent  wrote:
>
>>
>>  CREATE TABLE count3 (
>>   counter text,
>>   ts timeuuid,
>>   key1 text,
>>   value int,
>>   PRIMARY KEY ((counter, ts))
>> )
>>
>
> Instead of doing a composite partition key, remove a set of parens and let
> ts be your clustering key.  That will cause cql rows to be stored in sorted
> order by the ts column (for a given value of "counter") and allow you to do
> the kind of query you're looking for.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
You're using the ordered partitioner, right?


On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia <
davide.anasta...@gmail.com> wrote:

> Hi Tyler,
> I am interested in this scenario as well: could you please elaborate
> further your answer?
>
> Thanks a lot,
> Davide
> On 19 Jun 2013 16:01, "Tyler Hobbs"  wrote:
>
>>
>> On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent  wrote:
>>
>>>
>>>  CREATE TABLE count3 (
>>>   counter text,
>>>   ts timeuuid,
>>>   key1 text,
>>>   value int,
>>>   PRIMARY KEY ((counter, ts))
>>> )
>>>
>>
>> Instead of doing a composite partition key, remove a set of parens and
>> let ts be your clustering key.  That will cause cql rows to be stored in
>> sorted order by the ts column (for a given value of "counter") and allow
>> you to do the kind of query you're looking for.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>


Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm using the byte ordered partitioner.

Sent from my iPhone

On Jun 19, 2013, at 11:26 AM, "Sylvain Lebresne" 
mailto:sylv...@datastax.com>> wrote:

You're using the ordered partitioner, right?


On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia 
mailto:davide.anasta...@gmail.com>> wrote:

Hi Tyler,
I am interested in this scenario as well: could you please elaborate further 
your answer?

Thanks a lot,
Davide

On 19 Jun 2013 16:01, "Tyler Hobbs" 
mailto:ty...@datastax.com>> wrote:

On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent 
mailto:br...@cvent.com>> wrote:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, ts))
)

Instead of doing a composite partition key, remove a set of parens and let ts 
be your clustering key.  That will cause cql rows to be stored in sorted order 
by the ts column (for a given value of "counter") and allow you to do the kind 
of query you're looking for.


--
Tyler Hobbs
DataStax



Re: Reduce Cassandra GC

2013-06-19 Thread Mohit Anchlia
How much data do you have per node?
How much RAM per node?
How much CPU per node?
What is the avg CPU and memory usage?

On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson  wrote:

>  My Cassandra ps info:
>
> root 26791 1  0 07:14 ?00:00:00 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
> 103  26792 26791 99 07:14 ?854015-22:02:22 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRat

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Tyler,

You're recommending this schema instead, correct?

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (ts, counter)
)

I believe I tried this as well and ran into similar problems, but I'll try it 
again.  I'm using the "ByteOrderedPartitioner", if that helps, with the latest 
version of DSE community edition, which I believe is Cassandra 1.2.3.


Thanks,
Brent


From: Tyler Hobbs mailto:ty...@datastax.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, June 19, 2013 11:00 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: timeuuid and cql3 query


On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent 
mailto:br...@cvent.com>> wrote:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, ts))
)

Instead of doing a composite partition key, remove a set of parens and let ts 
be your clustering key.  That will cause cql rows to be stored in sorted order 
by the ts column (for a given value of "counter") and allow you to do the kind 
of query you're looking for.


--
Tyler Hobbs
DataStax


Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Here's an example of that not working:

cqlsh:Test> desc table count4;

CREATE TABLE count4 (
  ts timeuuid,
  counter text,
  key1 text,
  value int,
  PRIMARY KEY (ts, counter)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Test> select counter,dateof(ts),key1,value from count4;

 counter | dateof(ts)               | key1 | value
---------+--------------------------+------+-------
    test | 2013-06-18 22:36:16-0400 |    1 |     1
    test | 2013-06-18 22:36:18-0400 |    1 |     1
    test | 2013-06-18 22:36:18-0400 |    1 |     1
    test | 2013-06-18 22:36:18-0400 |    1 |     1
    test | 2013-06-18 22:36:19-0400 |    1 |     1
    test | 2013-06-18 22:36:19-0400 |    1 |     1
    test | 2013-06-18 22:36:20-0400 |    1 |     1
    test | 2013-06-18 22:36:20-0400 |    1 |     1
    test | 2013-06-18 22:36:21-0400 |    1 |     1
    test | 2013-06-18 22:36:21-0400 |    1 |     1
    test | 2013-06-18 22:36:22-0400 |    1 |     1
    test | 2013-06-18 22:36:22-0400 |    1 |     1
    test | 2013-06-18 22:36:23-0400 |    1 |     1
    test | 2013-06-18 22:36:23-0400 |    1 |     1
    test | 2013-06-18 22:36:25-0400 |    1 |     1
    test | 2013-06-18 22:36:27-0400 |    1 |     1
    test | 2013-06-18 22:36:28-0400 |    1 |     1

cqlsh:Statistics> select counter,dateof(ts),key1,value from count4 where ts > 
minTimeuuid('2013-06-17 22:36:16') and ts < minTimeuuid('2013-06-19 22:36:20');
Bad Request: 2 Start key must sort before (or equal to) finish key in your 
partitioner!



Any ideas?  Seems like a bug to me, right?

Brent

From: , Brent Ryan mailto:br...@cvent.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, June 19, 2013 12:47 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: timeuuid and cql3 query

Tyler,

You're recommending this schema instead, correct?

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (ts, counter)
)

I believe I tried this as well and ran into similar problems but I'll try it 
again.  I'm using the "ByteOrderedPartitioner" if that helps with the latest 
version of DSE community edition which I believe is Cassandra 1.2.3.


Thanks,
Brent


From: Tyler Hobbs mailto:ty...@datastax.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, June 19, 2013 11:00 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: timeuuid and cql3 query


On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent 
mailto:br...@cvent.com>> wrote:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY ((counter, ts))
)

Instead of doing a composite partition key, remove a set of parens and let ts 
be your clustering key.  That will cause cql rows to be stored in sorted order 
by the ts column (for a given value of "counter") and allow you to do the kind 
of query you're looking for.


--
Tyler Hobbs
DataStax


Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Note that it seems to work when you structure your schema as in the example 
below, BUT this is a problem: all of my data will wind up hitting a single 
node in my cassandra cluster, because the partitioning key is "counter" and 
that isn't unique enough.  I was hoping that I wasn't going to need to build 
up my own "sharding" scheme as this blog talks about 
(http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra), 
because that becomes much harder for other clients to integrate with; they 
would then need to know how my data is structured in order to get it out.

CREATE TABLE count5 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (counter, ts)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Test> select counter,dateof(ts),key1,value from count5 where counter = 
'test' and ts > minTimeuuid('2013-06-17 22:36:16') and ts < 
minTimeuuid('2013-06-18 22:44:02');

 counter | dateof(ts)               | key1 | value
---------+--------------------------+------+-------
    test | 2013-06-18 22:43:53-0400 |    1 |     1
    test | 2013-06-18 22:43:54-0400 |    1 |     1
    test | 2013-06-18 22:43:55-0400 |    1 |     1
    test | 2013-06-18 22:43:56-0400 |    1 |     1
    test | 2013-06-18 22:43:58-0400 |    1 |     1
    test | 2013-06-18 22:43:58-0400 |    1 |     1
    test | 2013-06-18 22:43:59-0400 |    1 |     1
    test | 2013-06-18 22:44:00-0400 |    1 |     1
    test | 2013-06-18 22:44:01-0400 |    1 |     1

cqlsh:Test> select counter,dateof(ts),key1,value from count5 where counter = 
'test' and ts > minTimeuuid('2013-06-17 22:36:16') and ts < 
minTimeuuid('2013-06-20 22:44:02');

 counter | dateof(ts)               | key1 | value
---------+--------------------------+------+-------
    test | 2013-06-18 22:43:53-0400 |    1 |     1
    test | 2013-06-18 22:43:54-0400 |    1 |     1
    test | 2013-06-18 22:43:55-0400 |    1 |     1
    test | 2013-06-18 22:43:56-0400 |    1 |     1
    test | 2013-06-18 22:43:58-0400 |    1 |     1
    test | 2013-06-18 22:43:58-0400 |    1 |     1
    test | 2013-06-18 22:43:59-0400 |    1 |     1
    test | 2013-06-18 22:44:00-0400 |    1 |     1
    test | 2013-06-18 22:44:01-0400 |    1 |     1
    test | 2013-06-18 22:44:02-0400 |    1 |     1
    test | 2013-06-18 22:44:02-0400 |    1 |     1
    test | 2013-06-18 22:44:03-0400 |    1 |     1
    test | 2013-06-18 22:44:04-0400 |    1 |     1
    test | 2013-06-18 22:44:05-0400 |    1 |     1
    test | 2013-06-18 22:44:06-0400 |    1 |     1


From: , Brent Ryan mailto:br...@cvent.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, June 19, 2013 12:56 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: timeuuid and cql3 query

Here's an example of that not working:

cqlsh:Test> desc table count4;

CREATE TABLE count4 (
  ts timeuuid,
  counter text,
  key1 text,
  value int,
  PRIMARY KEY (ts, counter)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Test> select counter,dateof(ts),key1,value from count4;

 counter | dateof(ts)   | key1 | value
-+--+--+---
test | 2013-06-18 22:36:16-0400 |1 | 1
test | 2013-06-18 22:36:18-0400 |1 | 1
test | 2013-06-18 22:36:18-0400 |1 | 1
test | 2013-06-18 22:36:18-0400 |1 | 1
test | 2013-06-18 22:36:19-0400 |1 | 1
test | 2013-06-18 22:36:19-0400 |1 | 1
test | 2013-06-18 22:36:20-0400 |1 | 1
test | 2013-06-18 22:36:20-0400 |1 | 1
test | 2013-06-18 22:36:21-0400 |1 | 1
test | 2013-06-18 22:36:21-0400 |1 | 1
test | 2013-06-18 22:36:22-0400 |1 | 1
test | 2013-06-18 22:36:22-0400 |1 | 1
test | 2013-06-18 22:36:23-0400 |1 | 1
test | 2013-06-18 22:36:23-0400 |1 | 1
test | 2013-06-18 22:36:25-0400 |1 | 1
test | 2013-06-18 22:36:27-0400 |1 | 1
test | 2013-06-18 22:36:28-0400 |1 | 1

cqlsh:Statistics> select counter,dateof(ts),key1,value from count4 where ts > 
minTimeuuid('2013-06-17 22:36:16') and ts < minTimeuuid('2013-06-19 22:36:20');
Bad Request: 2 Start key must sort before 

Date range queries

2013-06-19 Thread Christopher J. Bottaro
Hello,

We are considering using Cassandra and I want to make sure our use case
fits Cassandra's strengths.  We have a table like:

answers
---
user_id | question_id | result | created_at

Where our most common query will be something like:

SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012' AND
created_at < '01/01/2013'

Sometimes we will also limit by a question_id or a list of question_ids.

Secondary indexes will be created on user_id and question_id.  We expect
the upper bound of number of answers for a given user to be around 10,000.

Now my understanding of how Cassandra will run the aforementioned query is
that it will load all the answers for a given user into memory using the
secondary index, then scan over that set filtering based on the dates.

Considering that that will be our most used query and it will happen very
often, is this a bad use case for Cassandra?

Thanks for the help.


Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski  wrote:
> You can also perform a major compaction via nodetool compact (for
> SizeTieredCompaction), but - again - you really should not do it unless
> you're really sure what you do, as it compacts all the SSTables together,
> which is not something you might want to achieve in most of the cases.

If you do that and discover you did not want to:

https://github.com/pcmanus/cassandra/tree/sstable_split

will enable you to split your monolithic sstable back into smaller sstables.

=Rob
PS - @pcmanus, here's that reminder we discussed @ summit to merge
this tool into upstream! :D


Re: Date range queries

2013-06-19 Thread David McNelis
I think you'd be better served with just a slightly different primary
key.

If your primary key was (user_id, created_at)  or (user_id, created_at,
question_id), then you'd be able to run the above query without a problem.

This will mean that the entire set of rows for a specific user_id will be
stored as a 'row' (in the old-style C* vernacular), and then the
information would be ordered by the 2nd piece of the primary key (or 2nd,
then 3rd if you included question_id).

You would certainly want to include any field that makes a record unique in
the primary key.  Another thing to note is that if a field is part of the
primary key you can not create a secondary index on that field.  You can
work around that by storing the field twice, but you might want to rethink
your structure if you find yourself doing that often.
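
For example, a rough sketch of the first option (I'm guessing at the column
types, since your post didn't give them):

CREATE TABLE answers (
  user_id bigint,
  created_at timestamp,
  question_id bigint,
  result text,
  PRIMARY KEY (user_id, created_at, question_id)
);

-- The common query then becomes a single-partition slice:
SELECT * FROM answers
WHERE user_id = 123
  AND created_at > '2012-01-01' AND created_at < '2013-01-01';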


On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro <
cjbott...@academicworks.com> wrote:

> Hello,
>
> We are considering using Cassandra and I want to make sure our use case
> fits Cassandra's strengths.  We have the table like:
>
> answers
> ---
> user_id | question_id | result | created_at
>
> Where our most common query will be something like:
>
> SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012'
> AND created_at < '01/01/2013'
>
> Sometimes we will also limit by a question_id or a list of question_ids.
>
> Secondary indexes will be created on user_id and question_id.  We expect
> the upper bound of number of answers for a given user to be around 10,000.
>
> Now my understanding of how Cassandra will run the aforementioned query is
> that it will load all the answers for a given user into memory using the
> secondary index, then scan over that set filtering based on the dates.
>
> Considering that that will be our most used query and it will happen very
> often, is this a bad use case for Cassandra?
>
> Thanks for the help.
>


Joining distinct clusters with the same schema together

2013-06-19 Thread Faraaz Sareshwala
My company is planning on deploying cassandra to three separate datacenters.
Each datacenter will have a cassandra cluster with a separate set of seeds
specific to that datacenter. However, the cluster name will be the same.

Question 1: is this enough to guarantee that the three datacenters will have
distinct cassandra clusters as well? Or will one node in datacenter A still
somehow be able to join datacenter B's ring?

Cassandra has cross datacenter replication and we plan to use that in the
future. For now, we are planning on using our own relay mechanism to transfer
data changes from one datacenter to another. Each cassandra cluster in each
datacenter will have the same keyspaces and column families with the same
schema. Datacenter A will send mutations over this relay to datacenter B which
will replay the mutation in cassandra.  Therefore, datacenter A's cassandra
cluster will look identical to datacenter B's cassandra cluster, but not through
the cross datacenter replication that cassandra offers.

Question 2: is this a sane strategy? We're trying to make the smallest possible
change when deploying cassandra. Our plan is to slowly move our infrastructure
over to relying more on cassandra once we can assess how it behaves with our
workload.

Question 3: eventually, we want to turn all these cassandra clusters into one
large multi-datacenter cluster. What's the best practice to do this? Should I
just add nodes from all datacenters to the list of seeds and let cassandra
resolve differences? Is there another way I don't know about?

Thank you,
Faraaz


Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
So part of it is a bug, namely
https://issues.apache.org/jira/browse/CASSANDRA-5666. In summary CQL3
should not accept: ts > minTimeuuid('2013-06-17 22:36:16') and ts <
minTimeuuid('2013-06-20 22:44:02'), because it does not know how to handle
it properly. What it should support is token(ts) >
token(minTimeuuid('2013-06-17 22:36:16')) and token(ts) <
token(minTimeuuid('2013-06-20 22:44:02')). And that is different because
the token always sorts by bytes, and comparing timeuuids by bytes does not
yield a time-based ordering.
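
Concretely, against your count4 table that form would be (a sketch; whether
your exact version accepts it depends on the fix for CASSANDRA-5666):

SELECT counter, dateof(ts), key1, value FROM count4
WHERE token(ts) > token(minTimeuuid('2013-06-17 22:36:16'))
  AND token(ts) < token(minTimeuuid('2013-06-20 22:44:02'));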

Long story short, using a non-equality condition on the partition key (i.e. the
first part of your primary key) is generally not advised. Or to put it
another way, the use of the byte ordered partitioner is discouraged. But
if you still want to use the ordered partitioner and do range queries on
the partition key, do not use a timeuuid, because the ordering that the
partitioner enforces will not be one that is meaningful (due to the timeuuid
layout).

--
Sylvain



On Wed, Jun 19, 2013 at 7:04 PM, Ryan, Brent  wrote:

>  Note that it seems to work when you structure your schema in this
> example below, BUT this is a problem because all of my data will wind up
> hitting a single node in my cassandra cluster because the partitioning key
> is "counter" and that isn't unique enough.  I was hoping that I wasn't
> going to need to build up my own "sharding" scheme as this blog talks about
> (http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra)
> because this becomes much harder for other clients to integrate with
> because they now need to know how my data is structured in order to get it
> out.
>
>  CREATE TABLE count5 (
>   counter text,
>   ts timeuuid,
>   key1 text,
>   value int,
>   PRIMARY KEY (counter, ts)
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
>
>  cqlsh:Test> select counter,dateof(ts),key1,value from count5 where
> counter = 'test' and ts > minTimeuuid('2013-06-17 22:36:16') and ts <
> minTimeuuid('2013-06-18 22:44:02');
>
>   counter | dateof(ts)   | key1 | value
> -+--+--+---
> test | 2013-06-18 22:43:53-0400 |1 | 1
> test | 2013-06-18 22:43:54-0400 |1 | 1
> test | 2013-06-18 22:43:55-0400 |1 | 1
> test | 2013-06-18 22:43:56-0400 |1 | 1
> test | 2013-06-18 22:43:58-0400 |1 | 1
> test | 2013-06-18 22:43:58-0400 |1 | 1
> test | 2013-06-18 22:43:59-0400 |1 | 1
> test | 2013-06-18 22:44:00-0400 |1 | 1
> test | 2013-06-18 22:44:01-0400 |1 | 1
>
>  cqlsh:Test> select counter,dateof(ts),key1,value from count5 where
> counter = 'test' and ts > minTimeuuid('2013-06-17 22:36:16') and ts <
> minTimeuuid('2013-06-20 22:44:02');
>
>   counter | dateof(ts)   | key1 | value
> -+--+--+---
> test | 2013-06-18 22:43:53-0400 |1 | 1
> test | 2013-06-18 22:43:54-0400 |1 | 1
> test | 2013-06-18 22:43:55-0400 |1 | 1
> test | 2013-06-18 22:43:56-0400 |1 | 1
> test | 2013-06-18 22:43:58-0400 |1 | 1
> test | 2013-06-18 22:43:58-0400 |1 | 1
> test | 2013-06-18 22:43:59-0400 |1 | 1
> test | 2013-06-18 22:44:00-0400 |1 | 1
> test | 2013-06-18 22:44:01-0400 |1 | 1
> test | 2013-06-18 22:44:02-0400 |1 | 1
> test | 2013-06-18 22:44:02-0400 |1 | 1
> test | 2013-06-18 22:44:03-0400 |1 | 1
> test | 2013-06-18 22:44:04-0400 |1 | 1
> test | 2013-06-18 22:44:05-0400 |1 | 1
> test | 2013-06-18 22:44:06-0400 |1 | 1
>
>
>   From: , Brent Ryan 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, June 19, 2013 12:56 PM
>
> To: "user@cassandra.apache.org" 
> Subject: Re: timeuuid and cql3 query
>
>   Here's an example of that not working:
>
>  cqlsh:Test> desc table count4;
>
>  CREATE TABLE count4 (
>   ts timeuuid,
>   counter text,
>   key1 text,
>   value int,
>   PRIMARY KEY (ts, counter)
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
>
>  cqlsh:Test> select counter,dateof(ts),key1,value from count4;
>
>   counter | dateof(ts)   | key1 | value
> -+--+--+

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
 wrote:
> Each datacenter will have a cassandra cluster with a separate set of seeds
> specific to that datacenter. However, the cluster name will be the same.
>
> Question 1: is this enough to guarentee that the three datacenters will have
> distinct cassandra clusters as well? Or will one node in datacenter A still
> somehow be able to join datacenter B's ring.

If they have network connectivity and the same cluster name, they are
the same logical cluster. However if your nodes share tokens and you
have auto_bootstrap=yes (the implicit default) the second node you
attempt to start will refuse to start because you are trying to
bootstrap it into the range of a live node.

> For now, we are planning on using our own relay mechanism to transfer
> data changes from one datacenter to another.

Are you planning to use the streaming commitlog functionality for
this? Not sure how you would capture all changes otherwise, except
having your app just write the same thing to multiple places? Unless
data timestamps are identical between clusters, otherwise-identical
data will not merge properly, as cassandra uses data timestamps to
merge.

> Question 2: is this a sane strategy?

On its face my answer is "not... really". What do you view yourself as
getting with this technique versus using built-in replication? As an
example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
consistency level operations.

> Question 3: eventually, we want to turn all these cassandra clusters into one
> large multi-datacenter cluster. What's the best practice to do this? Should I
> just add nodes from all datacenters to the list of seeds and let cassandra
> resolve differences? Is there another way I don't know about?

If you are using NetworkTopologyStrategy and have the same cluster
name for your isolated clusters, all you need to do is:

1) configure NTS to store replicas on a per-datacenter basis (see the
sketch after this list)
2) ensure that your nodes are in different logical data centers (by
default, all nodes are in DC1/rack1)
3) ensure that clusters are able to reach each other
4) ensure that tokens do not overlap between clusters (the common
technique with manual token assignment is that each node gets a range
which is off-by-one)
5) ensure that all nodes seed lists contain (recommended) 3 seeds from each DC
6) rolling restart (so the new seed list is picked up)
7) repair ("should" only be required if writes have not replicated via
your out of band mechanism)
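
For step 1, a minimal sketch (the keyspace name and replica counts are
placeholders, not recommendations):

ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};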

Vnodes change the picture slightly because the chance of your clusters
having conflicting tokens increases with the number of token ranges
you have.

=Rob


Re: Heap is not released and streaming hangs at 0%

2013-06-19 Thread Wei Zhu
If you want, you can try to force a GC through JConsole (Memory -> Perform GC). 

It theoretically triggers a full GC, but when it actually happens depends on the JVM 

-Wei 

- Original Message -

From: "Robert Coli"  
To: user@cassandra.apache.org 
Sent: Tuesday, June 18, 2013 10:43:13 AM 
Subject: Re: Heap is not released and streaming hangs at 0% 

On Tue, Jun 18, 2013 at 10:33 AM, srmore  wrote: 
> But then shouldn't the JVM GC it eventually? I can still see Cassandra alive 
> and kicking but looks like the heap is locked up even after the traffic is 
> long stopped. 

No, when GC system fails this hard it is often a permanent failure 
which requires a restart of the JVM. 

> nodetool -h localhost flush didn't do much good. 

This adds support to the idea that your heap is too full, and not full 
of memtables. 

You could try nodetool -h localhost invalidatekeycache, but that 
probably will not free enough memory to help you. 

=Rob 



Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Eric Stevens
>
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency level operations?


Doing replication manually sounds like a recipe for the DCs eventually
getting subtly out of sync with each other.  If a connection goes down
between DCs, and you are taking data at both, how will you catch each
other up?  C* already offers that resolution for you, and you'd have to
work pretty hard to reproduce it for no obvious benefit that I can see.

For minimum effort, definitely rely on Cassandra's well-tested codebase for
this.




On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli  wrote:

> On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
>  wrote:
> > Each datacenter will have a cassandra cluster with a separate set of
> seeds
> > specific to that datacenter. However, the cluster name will be the same.
> >
> > Question 1: is this enough to guarentee that the three datacenters will
> have
> > distinct cassandra clusters as well? Or will one node in datacenter A
> still
> > somehow be able to join datacenter B's ring.
>
> If they have network connectivity and the same cluster name, they are
> the same logical cluster. However if your nodes share tokens and you
> have auto_bootstrap=yes (the implicit default) the second node you
> attempt to start will refuse to start because you are trying to
> bootstrap it into the range of a live node.
>
> > For now, we are planning on using our own relay mechanism to transfer
> > data changes from one datacenter to another.
>
> Are you planning to use the streaming commitlog functionality for
> this? Not sure how you would capture all changes otherwise, except
> having your app just write the same thing to multiple places? Unless
> data timestamps are identical between clusters, otherwise identical
> data will not merge properly, as cassandra uses data timestamps to
> merge.
>
> > Question 2: is this a sane strategy?
>
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency level operations?
>
> > Question 3: eventually, we want to turn all these cassandra clusters
> into one
> > large multi-datacenter cluster. What's the best practice to do this?
> Should I
> > just add nodes from all datacenters to the list of seeds and let
> cassandra
> > resolve differences? Is there another way I don't know about?
>
> If you are using NetworkTopologyStrategy and have the same cluster
> name for your isolated clusters, all you need to do is :
>
> 1) configure NTS to store replicas on a per-datacenter basis
> 2) ensure that your nodes are in different logical data centers (by
> default, all nodes are in DC1/rack1)
> 3) ensure that clusters are able to reach each other
> 4) ensure that tokens do not overlap between clusters (the common
> technique with manual token assignment is that each node gets a range
> which is off-by-one)
> 5) ensure that all nodes seed lists contain (recommended) 3 seeds from
> each DC
> 6) rolling restart (so the new seed list is picked up)
> 7) repair ("should" only be required if writes have not replicated via
> your out of band mechanism)
>
> Vnodes change the picture slightly because the chance of your clusters
> having conflicting tokens increases with the number of token ranges
> you have.
>
> =Rob
>


Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
You have a lot of Dropped Mutations, which means those writes might not go 
through. Since you have CL.ONE as the write consistency, your client doesn't see 
an exception if the write fails on only one node. 
I think hints are only stored when the other node is down, not for dropped 
mutations. (Correct me if I am wrong; actually, it's not a bad idea to store 
hints for dropped mutations and replay them later?) 

To solve your issue, as I mentioned, either run nodetool repair or increase 
your consistency level. By the way, you are probably writing faster than your 
cluster can handle if you see that many dropped mutations. 

-Wei 

- Original Message -

From: "James Lee"  
To: user@cassandra.apache.org 
Sent: Wednesday, June 19, 2013 2:22:39 AM 
Subject: RE: Data not fully replicated with 2 nodes and replication factor 2 

The test tool I am using catches any exceptions on the original writes and 
resubmits the write request until it's successful (bailing out after 5 
failures). So for each key Cassandra has reported a successful write. 


Nodetool says the following - I'm guessing the pending hinted handoff is the 
interesting bit? 

comet-mvs01:/dsc-cassandra-1.2.2# ./bin/nodetool tpstats 
Pool Name                    Active   Pending   Completed   Blocked   All time blocked 
ReadStage                         0         0       35445         0                  0 
RequestResponseStage              0         0     1535171         0                  0 
MutationStage                     0         0     3038941         0                  0 
ReadRepairStage                   0         0        2695         0                  0 
ReplicateOnWriteStage             0         0           0         0                  0 
GossipStage                       0         0        2898         0                  0 
AntiEntropyStage                  0         0           0         0                  0 
MigrationStage                    0         0         245         0                  0 
MemtablePostFlusher               0         0        1260         0                  0 
FlushWriter                       0         0         633         0                212 
MiscStage                         0         0           0         0                  0 
commitlog_archiver                0         0           0         0                  0 
InternalResponseStage             0         0           0         0                  0 
HintedHandoff                     1         1           0         0                  0 

Message type       Dropped 
RANGE_SLICE              0 
READ_REPAIR              0 
BINARY                   0 
READ                     0 
MUTATION             60427 
_TRACE                   0 
REQUEST_RESPONSE         0 


Looking at the hints column family in the system keyspace, I see one row with a 
large number of columns. Presumably that along with the nodetool output above 
suggests there are hinted handoffs pending? How long should I expect these to 
remain for? 

Ah, actually now that I re-run the command it seems that nodetool now reports 
that hint as completed and there are no hints left in the system keyspace on 
either node. I'm still seeing failures to read the data I'm expecting though, 
as before. Note that I've run this with a smaller data set (2M rows, 1GB data 
total) for this latest test. 

Thanks, 
James 


-Original Message- 
From: Robert Coli [mailto:rc...@eventbrite.com] 
Sent: 18 June 2013 19:45 
To: user@cassandra.apache.org 
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 

On Tue, Jun 18, 2013 at 11:36 AM, Wei Zhu  wrote: 
> Cassandra doesn't do async replication like HBase does.You can run 
> nodetool repair to insure the consistency. 

While this answer is true, it is somewhat non-responsive to the OP. 

If the OP didn't see a timeout exception, the theoretical worst case is that he 
should have hints stored for writes that initially failed to replicate. His nodes 
should not be failing GC with a total data size of 5gb on an 8gb heap, so those 
hints should deliver quite quickly. After 
30 minutes those hints should certainly be delivered. 

@OP : do you see hints being stored? does nodetool tpstats indicate dropped 
messages? 

=Rob 



Re: timeuuid and cql3 query

2013-06-19 Thread Francisco Andrades Grassi
Hi,

I believe what he's recommending is:

CREATE TABLE count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (counter, ts)
)

That way counter will be your partitioning key, and all the rows that have the 
same counter value will be clustered (stored as a single wide row sorted by the 
ts value). In this scenario the query:

 where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts < 
minTimeuuid('2013-06-18 16:24:00');

would actually be a sequential read on a wide row on a single node.

--
Francisco Andrades Grassi
www.bigjocker.com
@bigjocker

On Jun 19, 2013, at 12:17 PM, "Ryan, Brent"  wrote:

> Tyler,
> 
> You're recommending this schema instead, correct?
> 
> CREATE TABLE count3 (
>   counter text,
>   ts timeuuid,
>   key1 text,
>   value int,
>   PRIMARY KEY (ts, counter)
> )
> 
> I believe I tried this as well and ran into similar problems but I'll try it 
> again.  I'm using the "ByteOrderedPartitioner" if that helps with the latest 
> version of DSE community edition which I believe is Cassandra 1.2.3.
> 
> 
> Thanks,
> Brent
> 
> 
> From: Tyler Hobbs 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, June 19, 2013 11:00 AM
> To: "user@cassandra.apache.org" 
> Subject: Re: timeuuid and cql3 query
> 
> 
> On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent  wrote:
> 
> CREATE TABLE count3 (
>   counter text,
>   ts timeuuid,
>   key1 text,
>   value int,
>   PRIMARY KEY ((counter, ts))
> )
> 
> Instead of doing a composite partition key, remove a set of parens and let ts 
> be your clustering key.  That will cause cql rows to be stored in sorted 
> order by the ts column (for a given value of "counter") and allow you to do 
> the kind of query you're looking for.
> 
> 
> -- 
> Tyler Hobbs
> DataStax



Re: Date range queries

2013-06-19 Thread Christopher J. Bottaro
Interesting, thank you for the reply.

Two questions though...

Why should created_at come before question_id in the primary key?  In other
words, why (user_id, created_at, question_id) instead of (user_id,
question_id, created_at)?

Given this setup, all a user's answers (all 10k) will be stored in a single
C* (internal, not cql) row?  I thought having "fat" or "big" rows was bad.
 I worked with Cassandra 0.6 at my previous job and given the nature of our
work, we would sometimes generate these "fat" rows... at which point
Cassandra would basically shit the bed.

Thanks for the help.


On Wed, Jun 19, 2013 at 12:26 PM, David McNelis  wrote:

> I think you'd just be better served with just a little different primary
> key.
>
> If your primary key was (user_id, created_at)  or (user_id, created_at,
> question_id), then you'd be able to run the above query without a problem.
>
> This will mean that the entire pantheon of a specific user_id will be
> stored as a 'row' (in the old style C* vernacular), and then the
> information would be ordered by the 2nd piece of the primary key (or 2nd,
> then 3rd if you included question_id).
>
> You would certainly want to include any field that makes a record unique
> in the primary key.  Another thing to note is that if a field is part of
> the primary key you can not create a secondary index on that field.  You
> can work around that by storing the field twice, but you might want to
> rethink your structure if you find yourself doing that often.
>
>
> On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro <
> cjbott...@academicworks.com> wrote:
>
>> Hello,
>>
>> We are considering using Cassandra and I want to make sure our use case
>> fits Cassandra's strengths.  We have the table like:
>>
>> answers
>> ---
>> user_id | question_id | result | created_at
>>
>> Where our most common query will be something like:
>>
>> SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012'
>> AND created_at < '01/01/2013'
>>
>> Sometimes we will also limit by a question_id or a list of question_ids.
>>
>> Secondary indexes will be created on user_id and question_id.  We expect
>> the upper bound of number of answers for a given user to be around 10,000.
>>
>> Now my understanding of how Cassandra will run the aforementioned query
>> is that it will load all the answers for a given user into memory using the
>> secondary index, then scan over that set filtering based on the dates.
>>
>> Considering that that will be our most used query and it will happen very
>> often, is this a bad use case for Cassandra?
>>
>> Thanks for the help.
>>
>
>


Re: Date range queries

2013-06-19 Thread David McNelis
So, if you want to grab by the created_at and occasionally limit by
question id, that is why you'd use created_at.

The way primary keys work is that the first part of the primary key is the
partitioner key; that field essentially determines the single cassandra row.
 The second key is the order-preserving key, so you can sort by that key.
 If you have a third piece, then that is the secondary order-preserving key.

The reason you'd want to do (user_id, created_at, question_id) is that
when you query on the keys, you MUST use the preceding pieces of
the primary key.  So in your case, you could not do a query with just
user_id and question_id with the user-created-question key.  Alternatively,
if you went with (user_id, question_id, created_at), you would not be able
to include a range of created_at unless you were also filtering on the
question_id.
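
To make that concrete, a sketch assuming PRIMARY KEY (user_id, created_at,
question_id) and the table from earlier in the thread:

-- Works: key parts are restricted left to right.
SELECT * FROM answers
WHERE user_id = 123
  AND created_at > '2012-01-01' AND created_at < '2013-01-01';

-- Rejected (without ALLOW FILTERING): question_id is restricted while
-- created_at is skipped.
SELECT * FROM answers WHERE user_id = 123 AND question_id = 456;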

Does that make sense?

As for the large rows, 10k is unlikely to cause you too many issues (unless
each answer is potentially a big blob of text).  Newer versions of cassandra
deal with a lot of things in far, far superior ways compared to < 1.0.

For a really good primer on keys in CQL and how to potentially avoid hot
rows, read this article:
http://thelastpickle.com/2013/01/11/primary-keys-in-cql/  Aaron did a great
job of laying out the subtleties of primary keys in CQL.


On Wed, Jun 19, 2013 at 2:21 PM, Christopher J. Bottaro <
cjbott...@academicworks.com> wrote:

> Interesting, thank you for the reply.
>
> Two questions though...
>
> Why should created_at come before question_id in the primary key?  In
> other words, why (user_id, created_at, question_id) instead of (user_id,
> question_id, created_at)?
>
> Given this setup, all a user's answers (all 10k) will be stored in a
> single C* (internal, not cql) row?  I thought having "fat" or "big" rows
> was bad.  I worked with Cassandra 0.6 at my previous job and given the
> nature of our work, we would sometimes generate these "fat" rows... at
> which point Cassandra would basically shit the bed.
>
> Thanks for the help.
>
>
> On Wed, Jun 19, 2013 at 12:26 PM, David McNelis wrote:
>
>> I think you'd just be better served with just a little different primary
>> key.
>>
>> If your primary key was (user_id, created_at)  or (user_id, created_at,
>> question_id), then you'd be able to run the above query without a problem.
>>
>> This will mean that the entire pantheon of a specific user_id will be
>> stored as a 'row' (in the old style C* vernacular), and then the
>> information would be ordered by the 2nd piece of the primary key (or 2nd,
>> then 3rd if you included question_id).
>>
>> You would certainly want to include any field that makes a record unique
>> in the primary key.  Another thing to note is that if a field is part of
>> the primary key you can not create a secondary index on that field.  You
>> can work around that by storing the field twice, but you might want to
>> rethink your structure if you find yourself doing that often.
>>
>>
>> On Wed, Jun 19, 2013 at 12:05 PM, Christopher J. Bottaro <
>> cjbott...@academicworks.com> wrote:
>>
>>> Hello,
>>>
>>> We are considering using Cassandra and I want to make sure our use case
>>> fits Cassandra's strengths.  We have the table like:
>>>
>>> answers
>>> ---
>>> user_id | question_id | result | created_at
>>>
>>> Where our most common query will be something like:
>>>
>>> SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012'
>>> AND created_at < '01/01/2013'
>>>
>>> Sometimes we will also limit by a question_id or a list of question_ids.
>>>
>>> Secondary indexes will be created on user_id and question_id.  We expect
>>> the upper bound of number of answers for a given user to be around 10,000.
>>>
>>> Now my understanding of how Cassandra will run the aforementioned query
>>> is that it will load all the answers for a given user into memory using the
>>> secondary index, then scan over that set filtering based on the dates.
>>>
>>> Considering that that will be our most used query and it will happen
>>> very often, is this a bad use case for Cassandra?
>>>
>>> Thanks for the help.
>>>
>>
>>
>


Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu  wrote:
> I think hints are only stored when the other node is down, not on the
> dropped mutations. (Correct me if I am wrong, actually it's not a bad idea
> to store hints for dropped mutations and replay them later?)

This used to be the way it worked pre-1.0...

https://issues.apache.org/jira/browse/CASSANDRA-2034

In modern cassandra, anything but a successful ack from a coordinated
write results in a hint on the coordinator.

> To solve your issue, as I mentioned, either do nodetool repair, or increase
> your consistency level.  By the way, you probably write faster than your
> cluster can handle if you see that many dropped mutations.

If his hints are ultimately delivered, OP should not "need" repair to
be consistent.

=Rob


error on startup: unable to find sufficient sources for streaming range

2013-06-19 Thread Faraaz Sareshwala
Hi,

I couldn't find any information on the following error so I apologize if it has
already been discussed.

On some of my nodes, I'm getting the following exception when cassandra starts
up:

2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to 
find sufficient sources for streaming range 
(-4250921392403750427,-4250887922781325324]
2013-06-19 22:17:39.482733500 ERROR Exception in thread 
Thread[StorageServiceShutdownHook,5,main] 
(CassandraDaemon.java:org.apache.cassandra.service.CassandraDaemon$1:175)
2013-06-19 22:17:39.482735500 java.lang.NullPointerException
2013-06-19 22:17:39.482735500   at 
org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
2013-06-19 22:17:39.482736500   at 
org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:362)
2013-06-19 22:17:39.482736500   at 
org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)
2013-06-19 22:17:39.482751500   at 
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:513)
2013-06-19 22:17:39.482752500   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
2013-06-19 22:17:39.482752500   at java.lang.Thread.run(Thread.java:662)

Can someone point me to more information about what could cause this error?

Faraaz


Performance Difference between Cassandra version

2013-06-19 Thread Raihan Jamal
I am trying to see whether there will be any performance difference between
Cassandra 1.0.8 and Cassandra 1.2.2, mainly for reading data.

Has anyone seen any major performance difference?


Re: Performance Difference between Cassandra version

2013-06-19 Thread Franc Carter
On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal  wrote:

> I am trying to see whether there will be any performance difference
> between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?
>
> Has anyone seen any major performance difference?
>

We are part way through a performance comparison between 1.0.9 with Size
Tiered Compaction and 1.2.4 with Leveled Compaction - for our use case it
looks like a significant performance improvement on the read side.  We are
finding compaction lags when we do very large bulk loads, but for us this
is an initialisation task and that's a reasonable trade-off
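
(For reference, the switch itself is just an ALTER TABLE; the table name below
is a placeholder:

ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy'};

followed by the usual recompaction cost as existing sstables are levelled.)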

cheers

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
Rob, 
Thanks. 
I was not aware of that. So we can avoid repair if there is no hardware 
failure...I found a blog: 

http://www.datastax.com/dev/blog/modern-hinted-handoff 

-Wei 

- Original Message -

From: "Robert Coli"  
To: user@cassandra.apache.org, "Wei Zhu"  
Sent: Wednesday, June 19, 2013 12:58:45 PM 
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 

On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu  wrote: 
> I think hints are only stored when the other node is down, not on the 
> dropped mutations. (Correct me if I am wrong, actually it's not a bad idea 
> to store hints for dropped mutations and replay them later?) 

This used to be the way it worked pre-1.0... 

https://issues.apache.org/jira/browse/CASSANDRA-2034 

In modern cassandra, anything but a successful ack from a coordinated 
write results in a hint on the coordinator. 

> To solve your issue, as I mentioned, either do nodetool repair, or increase 
> your consistency level. By the way, you probably write faster than your 
> cluster can handle if you see that many dropped mutations. 

If his hints are ultimately delivered, OP should not "need" repair to 
be consistent. 

=Rob 



Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and
these sound promising for unit testing, at least.

Regards,
Shahab


On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo wrote:

> You really do not need much in Java; you can use the embedded server.
> Hector wraps a simple class around this called EmbeddedServerHelper.
>
>
> On Wednesday, June 19, 2013, Ben Boule  wrote:
> > Hi Shabab,
> >
> > Cassandra-Unit has been helpful for us for running unit tests without
> requiring a real cassandra instance to be running.   We only use this to
> test our "DAO" code which interacts with the Cassandra client.  It
> basically starts up an embedded instance of cassandra and fools your
> client/driver into using it.  It uses a non-standard port and you just need
> to make sure you can set the port as a parameter into your client code.
> >
> > https://github.com/jsevellec/cassandra-unit
> >
> > One important thing is to either clear out the keyspace in between tests
> or carefully separate your data so different tests don't collide with each
> other in the embedded database.
> >
> > Setup/tear down time is pretty reasonable.
> >
> > Ben
> > 
> > From: Shahab Yunus [shahab.yu...@gmail.com]
> > Sent: Wednesday, June 19, 2013 8:46 AM
> > To: user@cassandra.apache.org
> > Subject: Re: Unit Testing Cassandra
> >
> > Thanks Stephen for your reply and explanation. My bad that I mixed those
> up and wasn't clear enough. Yes, I have 2 different requests/questions.
> > 1) One is for unit testing.
> > 2) The second (which I am more interested in) is for performance
> (stress/load) testing. Let us keep integration aside for now.
> > I do see some stuff out there but wanted to know recommendations from
> the community given their experience.
> > Regards,
> > Shahab
> >
> > On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
> >>
> >> Unit testing means testing in isolation the smallest part.
> >> Unit tests should not take more than a few milliseconds to set up and
> verify their assertions.
> >> As such, if your code is not factored well for testing, you would
> typically use mocking (either by hand, or with mocking libraries) to mock
> out the bits not under test.
> >> Extensive use of mocks is usually a smell of code that is not well
> designed *for testing*
> >> If you intend to test components integrated together... That is
> integration testing.
> >> If you intend to test performance of the whole or significant parts of
> the whole... That is performance testing.
> >> When searching for the above, you will not get much luck if you are
> looking for them in the context of "unit testing" as those things are
> *outside the scope of unit testing"
> >>
> >> On Wednesday, 19 June 2013, Shahab Yunus wrote:
> >>>
> >>> Hello,
> >>>
> >>> Can anyone suggest a good/popular Unit Test tools/frameworks/utilities
> out
> >>> there for unit testing Cassandra stores? I am looking for testing from
> performance/load and monitoring perspective. I am using 1.2.
> >>>
> >>> Thanks a lot.
> >>>
> >>> Regards,
> >>> Shahab
> >>
> >>
> >> --
> >> Sent from my phone
> >
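
For readers looking for a starting point, here is a minimal JUnit sketch along
the lines Ben and Edward describe, using cassandra-unit's embedded server.
Class and method names are taken from the cassandra-unit project; the exact
API and default ports may differ between versions, so treat this as a sketch:

import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class UserDaoTest {

    @BeforeClass
    public static void startEmbeddedCassandra() throws Exception {
        // Boots an in-process Cassandra on non-standard ports so it does
        // not collide with a locally installed instance.
        EmbeddedCassandraServerHelper.startEmbeddedCassandra();
    }

    @Before
    public void cleanData() throws Exception {
        // Clear data between tests so they don't collide with each other
        // in the embedded database, as Ben suggests above.
        EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
    }

    @Test
    public void daoRoundTrip() {
        // Point your client at the embedded instance here (set the port as
        // a parameter into your client code) and exercise the DAO.
    }
}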