[RELEASE] Apache Cassandra 2.0.3 released

2013-11-25 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.3.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/epkcOu (CHANGES.txt)
[2]: http://goo.gl/V66rGy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 1.2.12 released

2013-11-25 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.12.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/iN7sl1 (CHANGES.txt)
[2]: http://goo.gl/qgO6NI (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Nodetool cleanup

2013-11-25 Thread Julien Campan
Hi,

I'm working with Cassandra 1.2.2 and I have a question about nodetool
cleanup.
In the documentation, it's written: "Wait for cleanup to complete on one
node before doing the next."

I would like to know why we can't run cleanup on several nodes at the same
time.


Thanks


Multiple writers writing to a cassandra node...

2013-11-25 Thread Krishna Chaitanya
Hello,
   I am a newbie to the Cassandra world. I would like to know if it's
possible for two different clients to write to a single Cassandra node. I
have packet collector software which runs on two different systems. I
would like both of them to write the packets to a single node (same keyspace
and column family). Currently using Cassandra 2.0.0 with the libQtCassandra
library.
 Currently, the moment I try to store from the second system, the first
system gets an IllegalRequestException (what(): Default TException), while
the second system works fine. When I restart the program on the first
system, the second system gets the exception and the first one works fine.
Occasionally, I also hit a "frame size has negative value" Thrift exception
when the traffic is high and packets are being stored very fast.
  Can someone please point out what I am doing wrong?  Thanks in advance.


Re: Nodetool cleanup

2013-11-25 Thread Artur Kronenberg

Hi Julien,

I hope I get this right :)

a repair will trigger a major compaction on your node, which will take up 
a lot of CPU and I/O. It needs to do this to build up the 
data structure that is used for the repair. After the compaction this is 
streamed to the different nodes in order to repair them.


If you trigger this on every node simultaneously you basically take the 
performance away from your cluster. I would expect Cassandra to still 
function, just way slower than before. Triggering it node after node 
will leave your cluster with more resources to handle incoming requests.
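
For what it's worth, running it one node at a time is easy to script. A
minimal sketch, assuming SSH access and illustrative hostnames:

# run cleanup serially so only one node pays the compaction / IO cost at a time;
# nodetool cleanup blocks until the cleanup on that node has finished
for host in node1 node2 node3; do
    ssh "$host" nodetool cleanup
done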



Cheers,

Artur
On 25/11/13 15:12, Julien Campan wrote:

Hi,

I'm working with Cassandra 1.2.2 and I have a question about nodetool 
cleanup.
In the documentation, it's written: "Wait for cleanup to complete on 
one node before doing the next."


I would like to know why we can't run cleanup on several nodes at the same 
time.



Thanks






Data loss when swapping out cluster

2013-11-25 Thread Christopher J. Bottaro
Hello,

We recently experienced (pretty severe) data loss after moving our 4 node
Cassandra cluster from one EC2 availability zone to another.  Our strategy
for doing so was as follows:

   - One at a time, bring up new nodes in the new availability zone and
   have them join the cluster.
   - One at a time, decommission the old nodes in the old availability zone
   and turn them off (stop the Cassandra process).

Everything seemed to work as expected.  As we decommissioned each node, we
checked the logs for messages indicating "yes, this node is done
decommissioning" before turning the node off.

Pretty quickly after the old nodes left the cluster, we started getting
client calls about data missing.

We immediately turned the old nodes back on and when they rejoined the
cluster *most* of the reported missing data returned.  For the rest of the
missing data, we had to spin up a new cluster from EBS snapshots and copy
it over.

What did we do wrong?

In hindsight, we noticed a few things which may be clues...

   - The new nodes had much lower load after joining the cluster than the
   old ones (3-4 gb as opposed to 10 gb).
   - We have EC2Snitch turned on, although we're using SimpleStrategy for
   replication.
   - The new nodes showed even ownership (via nodetool status) after
   joining the cluster.

Here's more info about our cluster...

   - Cassandra 1.2.10
   - Replication factor of 3
   - Vnodes with 256 tokens
   - All tables made via CQL
   - Data dirs on EBS (yes, we are aware of the performance implications)


Thanks for the help.


Re: Cassandra high heap utilization under heavy reads and writes.

2013-11-25 Thread Christopher J. Bottaro
Yes, we saw this same behavior.

A couple of months ago, we moved a large portion of our data out of
Postgres and into Cassandra.  The initial migration was done in a
"distributed" manner:  we had 600 (or 800, can't remember) processes
reading from Postgres and writing to Cassandra in tight loops.  This caused
the exact behavior you described.  We also did a read before a write.

After we got through the initial data migration, our normal workload involves
*far* fewer writes (and reads, for that matter), such that our cluster can
easily handle it, so we didn't investigate further.

-- C


On Sat, Nov 23, 2013 at 10:55 PM, srmore  wrote:

> Hello,
> We moved to cassandra 1.2.9 from 1.0.11 to take advantage of the off-heap
> bloom filters and other improvements.
>
> We see a lot of messages dropped under high load conditions. We noticed
> that when we do heavy reads AND writes simultaneously (we read first to
> check whether the key exists; if not, we write it) the Cassandra heap increases
> dramatically and then gossip marks the node down (as a result of the high load
> on the node).
>
>
> Under heavy 'reads only' we don't see this behavior.  Has anyone seen this
> behavior? Any suggestions?
>
> Thanks !
>
>
>


Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread Christopher J. Bottaro
We have the same setup:  one keyspace per client, and currently about 300
keyspaces.  nodetool repair takes a long time, 4 hours with -pr on a single
node.  We have a 4 node cluster with about 10 gb per node.  Unfortunately,
we haven't been keeping track of the running time as keyspaces, or load,
increases.
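
In case it helps to compare numbers, a rough way to time the per-keyspace cost
(a sketch; keyspace names are illustrative):

# run a primary-range repair keyspace by keyspace and record how long each takes
for ks in client_001 client_002 client_003; do
    echo "=== $ks ==="
    time nodetool repair -pr "$ks"
done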

-- C


On Wed, Nov 20, 2013 at 6:53 AM, John Pyeatt wrote:

> We have an application that has been designed to use potentially 100s of
> keyspaces (one for each company).
>
> One thing we are noticing is that nodetool repair across all of the
> keyspaces seems to increase linearly based on the number of keyspaces. For
> example, if we have a 6 node ec2 (m1.large) cluster across 3 Availability
> Zones and create 20 keyspaces a nodetool repair -pr on one node takes 3
> hours even with no data in any of the keyspaces. If I bump that up to 40
> keyspaces it takes 6 hours.
>
> Is this the behaviour you would expect?
>
> Is there anything you can think of (short of redesigning the cluster to
> limit keyspaces) to increase the performance of the nodetool repairs?
>
> My obvious concern is that as this application grows and we get more
> companies using it we will eventually have too many keyspaces to
> perform repairs on the cluster.
>
> --
> John Pyeatt
> Singlewire Software, LLC
> www.singlewire.com
> --
> 608.661.1184
> john.pye...@singlewire.com
>


Prepare is 100 to 200 times slower after migration to 1.2.11

2013-11-25 Thread Shahryar Sedghi
Hi

I have migrated my DEV environment from 1.2.8 to 1.2.11 to finally move to
2.0.2, and prepare is 100 to 200 times slower: something that was
sub-millisecond now takes 150 ms. Other CQL operations are normal.

I am not planning to move to 2.0.2 until I fix this. I do not see any warning
or error in the log; the only thing I saw was a live ratio around 6.

Please help.

Thanks

Shahryar

--


Re: How to set Cassandra config directory path

2013-11-25 Thread Aaron Morton
> I noticed when I gave the path directly to cassandra.yaml, it works fine. 
> Can't I give the directory path here, as mentioned in the doc?
The documentation is wrong; the -Dcassandra.config param is used for the path of 
the yaml file, not the config directory. 

I’ve emailed d...@datastax.com to let them know. 

> What I really want to do is to give the cassandra-topology.properties path to 
> Cassandra.
Set the CASSANDRA_CONF env var in cassandra.in.sh
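
For example, something like the following (a minimal sketch; the paths are
illustrative and the exact mechanics depend on how your startup scripts
assemble JVM_OPTS and CASSANDRA_CONF):

# point cassandra.config at the yaml file itself; the file:// URL form is the
# safest, since a bare path is otherwise looked up on the classpath
export JVM_OPTS="$JVM_OPTS -Dcassandra.config=file:///home/bhathiya/cassandra/conf/cassandra.yaml"

# to have cassandra-topology.properties picked up from a custom directory,
# set CASSANDRA_CONF (normally defined in bin/cassandra.in.sh) before starting
export CASSANDRA_CONF=/home/bhathiya/cassandra/conf
bin/cassandra -f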


Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 6:15 am, Bhathiya Jayasekara  wrote:

> Hi all,
> 
> I'm trying to set conf directory path to Cassandra. According to [1], I can 
> set it using a system variable as cassandra.config= 
> 
> But it doesn't seem to work for me when I give conf directory path. I get 
> following exception.
> 
> [2013-11-20 22:24:38,273] ERROR 
> {org.apache.cassandra.config.DatabaseDescriptor} -  Fatal configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot locate 
> /home/bhathiya/cassandra/conf/etc
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.getStorageConfigURL(DatabaseDescriptor.java:117)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:134)
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:126)
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:216)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:446)
>   at 
> org.wso2.carbon.cassandra.server.CassandraServerController$1.run(CassandraServerController.java:48)
>   at java.lang.Thread.run(Thread.java:662)
> Cannot locate /home/bhathiya/cassandra/conf/etc
> Fatal configuration error; unable to start server.  See log for stacktrace.
> 
> I noticed when I gave the path directly to cassandra.yaml, it works fine. 
> Can't I give the directory path here, as mentioned in the doc?
> 
> What I really want to do is to give the cassandra-topology.properties path to 
> Cassandra.
> 
> [1] 
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/tools/toolsCUtility_t.html
> 
> 
> Thanks,
> Bhathiya
> 
> 



Re: Cannot TRUNCATE

2013-11-25 Thread Aaron Morton
If it’s just a test system, nuke it and try again :)

Was there more than one node at any time? Does nodetool status show only one 
node? 
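
For example (a sketch):

# list ring members as this node sees them; TRUNCATE needs every known node to be up
nodetool status
# also shows unreachable endpoints and schema versions
nodetool describecluster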

Cheers
 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 7:45 am, Robert Wille  wrote:

> I've got a single node with all empty tables, and truncate fails with the 
> following error: Unable to complete request: one or more nodes were 
> unavailable.
> 
> Everything else seems fine. I can insert, update, delete, etc.
> 
> The only thing in the logs that looks relevant is this:
> 
> INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:36:59,064 
> OutboundTcpConnection.java (line 386) Handshaking version with /192.168.98.121
> INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:37:04,064 
> OutboundTcpConnection.java (line 395) Cannot handshake version with 
> /192.168.98.121
> 
> I'm running Cassandra 2.0.2. I get the same error in cqlsh as I do with the 
> java driver.
> 
> Thanks
> 
> Robert



Re: Config changes to leverage new hardware

2013-11-25 Thread Aaron Morton
> However, for both writes and reads there was virtually no difference in the 
> latencies.
What sort of latency were you getting? 

> I’m still not very sure where the current *write* bottleneck is though. 
What numbers are you getting? 
Could the bottleneck be the client? Can it send writes fast enough to 
saturate the nodes?

As a rule of thumb you should get 3,000 to 4,000 (non-counter) writes per 
second per core. 
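
One rough way to check whether the client is the limit is to sweep the stress
thread count and see whether throughput keeps scaling (a sketch; hostnames and
counts are illustrative, and -t is the thread option of the older stress tool):

# if throughput plateaus while the nodes' CPUs stay mostly idle,
# the client (or a single client) is probably the bottleneck
for t in 50 100 200 400; do
    cassandra-stress -d node1,node2 -o insert -n 1000000 -t $t
done

With 32 cores, the rule of thumb above would suggest roughly 96,000 to 128,000
writes/s per node, so the client side needs to be able to push at least that much.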

> Sample iostat data (captured every 10s) for the dedicated disk where commit 
> logs are written is below. Does this seem like a bottle neck?
Does not look too bad. 

> Another interesting thing is that the linux disk cache doesn’t seem to be 
> growing in spite of a lot of free memory available. 
Things will only get paged in when they are accessed. 

Cheers


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 12:42 pm, Arindam Barua  wrote:

>  
> Thanks for the suggestions Aaron.
>  
> As a follow up, we ran a bunch of tests with different combinations of these 
> changes on a 2-node ring. The load was generated using cassandra-stress, run 
> with default values to write 30 million rows, and read them back.
> However, for both writes and reads there was virtually no difference in the 
> latencies.
>  
> The different combinations attempted:
> 1.   Baseline test with none of the below changes.
> 2.   Grabbing the TLAB setting from 1.2
> 3.   Moving the commit logs too to the 7 disk RAID 0.
> 4.   Increasing the concurrent_read to 32, and concurrent_write to 64
> 5.   (3) + (4), i.e. moving commit logs to the RAID + increasing 
> concurrent_read and concurrent_write config to 32 and 64.
>  
> The write latencies were very similar, except them being ~3x worse for the 
> 99.9th percentile and above for scenario (5) above.
> The read latencies were also similar, with (3) and (5) being a little worse 
> for the 99.99th percentile.
>  
> Overall, not making any changes, i.e. (1) performed as well or slightly 
> better than any of the other changes.
>  
> Running cassandra-stress on both the old and new hardware without making any 
> config changes, the write performance was very similar, but the new hardware 
> did show ~10x improvement in the read for the 99.9th percentile and higher. 
> After thinking about this, the reason why we were not seeing any difference 
> with our test framework was perhaps the nature of the test where we write the 
> rows, and then do a bunch of reads to read the rows that were just written 
> immediately following. The data is read back from the memtables, and never 
> from the disk/sstables. Hence the new hardware’s increased RAM and size of 
> the disk cache or higher number of disks never helps.
>  
> I’m still not very sure where the current *write* bottleneck is though. The 
> new hardware has 32 cores vs 8 cores of the old hardware. Moving the commit 
> log from a dedicated disk to a 7 RAID-0 disk system (where it would be shared 
> by other data though) didn’t make a difference too. (unless the extra 
> contention on the RAID nullified the positive effects of the RAID).
>  
> Sample iostat data (captured every 10s) for the dedicated disk where commit 
> logs are written is below. Does this seem like a bottle neck? When the commit 
> logs are written the await/svctm ratio is high.
>  
> Device:  rrqm/s   wrqm/s    r/s     w/s      rMB/s   wMB/s   avgrq-sz   avgqu-sz   await   svctm   %util
>          0.00     8.09      0.04    8.85     0.00    0.07    15.74      0.00       0.12    0.03    0.02
>          0.00     768.03    0.00    9.49     0.00    3.04    655.41     0.04       4.52    0.33    0.31
>          0.00     8.10      0.04    8.85     0.00    0.07    15.75      0.00       0.12    0.03    0.02
>          0.00     752.65    0.00    10.09    0.00    2.98    604.75     0.03       3.00    0.26    0.26
>  
> Another interesting thing is that the linux disk cache doesn’t seem to be 
> growing in spite of a lot of free memory available. The total disk cache used 
> reported by ‘free’ is less than the size of the sstables written with over 
> 100 GB unused RAM.
> Even in production, where we have the older hardware running with 32 GB RAM 
> for a long time now, looking at 5 hosts in 1 DC, only 2.5 GB to 8 GB was used 
> for the disk cache. The Cassandra java process uses the 8 GB allocated to it, 
> and at least 10-15 GB on all the hosts is not used at all.
>  
> Thanks,
> Arindam
>  
> From: Aaron Morton [mailto:aa...@thelastpickle.com] 
> Sent: Wednesday, November 06, 2013 8:34 PM
> To: Cassandra User
> Subject: Re: Config changes to leverage new hardware
>  
> Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon.
> You will make more use of the extra memory moving to 1.2 as it moves bloom 
> filters and compression data off heap. 
>  
> Also grab the TLAB setting from cassandra-env.sh in v1.2
>  
> As

Re: Is there any open source software for automatized deploy C* in PRD?

2013-11-25 Thread Aaron Morton
> Thanks, but I suppose it’s just for Debian? Am I right?
There are debian and rpm packages, and people deploy them, or the binary 
packages, with chef and similar tools. 

It may be easier to answer your question if you describe the specific platform 
/ needs. 

cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 10:35 pm, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 
 wrote:

> Thanks, But I suppose it’s just for Debian? Am I right?
> Any others?
>  
> Best Regards,
> Boole Guo
> Software Engineer, NESC-SH.MIS
> +86-021-51530666*41442
> Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)
>  
> From: Mike Adamson [mailto:mikeat...@gmail.com] 
> Sent: 21 November 2013 17:16
> To: user@cassandra.apache.org
> Subject: Re: Is there any open source software for automatized deploy C* in PRD?
>  
> Hi Boole,
> 
> Have you tried chef? There is this cookbook for deploying cassandra:
> 
> http://community.opscode.com/cookbooks/cassandra
> 
> MikeA
>  
> 
> On 21 November 2013 01:33, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 
>  wrote:
> Hi all,
> Is there any open source software for automatized deploy C* in PRD?
>  
> Best Regards,
> Boole Guo
> Software Engineer, NESC-SH.MIS
> +86-021-51530666*41442
> Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)
> ONCE YOU KNOW, YOU NEWEGG.
> 
> CONFIDENTIALITY NOTICE: This email and any files transmitted with it may 
> contain privileged or otherwise confidential information. It is intended only 
> for the person or persons to whom it is addressed. If you received this 
> message in error, you are not authorized to read, print, retain, copy, 
> disclose, disseminate, distribute, or use this message any part thereof or 
> any information contained therein. Please notify the sender immediately and 
> delete all copies of this message. Thank you in advance for your cooperation.



Re: Migration Cassandra 2.0 to Cassandra 2.0.2

2013-11-25 Thread Aaron Morton
> Mr Coli What's the difference between deploy binaries and the binary package ?
> I upload the binary package on the Apache Cassandra Homepage, Am I wrong ?
Yes, you can use the instructions here for the binary package:
http://wiki.apache.org/cassandra/DebianPackaging

When you use the binary package it creates the directory locations, installs 
the init scripts, and makes it a lot easier to start and stop Cassandra. I 
recommend using them. 
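
A rough sketch of the package route (see the wiki page above for the
authoritative steps and the repository key import):

# add the Apache Cassandra 2.0.x repository
echo "deb http://www.apache.org/dist/cassandra/debian 20x main" | \
    sudo tee -a /etc/apt/sources.list
# import the repository GPG keys as described on the wiki page, then:
sudo apt-get update
sudo apt-get install cassandra

# the package lays out /etc/cassandra, /var/lib/cassandra and /var/log/cassandra
# and installs an init script, so starting and stopping is just:
sudo service cassandra stop
sudo service cassandra start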

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 11:06 pm, Bonnet Jonathan. 
 wrote:

> Thanks Mr Coli and Mr Wee for your answers,
> 
> Mr Coli What's the difference between deploy binaries and the binary package ?
> I upload the binary package on the Apache Cassandra Homepage, Am I wrong ?
> 
> Mr Wee, I think you are on the right track, because the lib directories in my
> CASSANDRA_HOME differ between the two versions. In the home for the
> old version, /produits/cassandra/install_cassandra/apache-cassandra-2.0.0/lib,
> I have:
> 
> [cassandra@s00vl9925761 lib]$ ls -ltr
> total 14564
> -rw-r- 1 cassandra cassandra  123898 Aug 28 15:07 thrift-server-0.3.0.jar
> -rw-r- 1 cassandra cassandra   42854 Aug 28 15:07
> thrift-python-internal-only-0.7.0.zip
> -rw-r- 1 cassandra cassandra   55066 Aug 28 15:07 snaptree-0.1.jar
> -rw-r- 1 cassandra cassandra 1251514 Aug 28 15:07 snappy-java-1.0.5.jar
> -rw-r- 1 cassandra cassandra  270552 Aug 28 15:07 snakeyaml-1.11.jar
> -rw-r- 1 cassandra cassandra8819 Aug 28 15:07 slf4j-log4j12-1.7.2.jar
> -rw-r- 1 cassandra cassandra   26083 Aug 28 15:07 slf4j-api-1.7.2.jar
> -rw-r- 1 cassandra cassandra  134133 Aug 28 15:07
> servlet-api-2.5-20081211.jar
> -rw-r- 1 cassandra cassandra 1128961 Aug 28 15:07 netty-3.5.9.Final.jar
> -rw-r- 1 cassandra cassandra   80800 Aug 28 15:07 metrics-core-2.0.3.jar
> -rw-r- 1 cassandra cassandra  134748 Aug 28 15:07 lz4-1.1.0.jar
> -rw-r- 1 cassandra cassandra  481534 Aug 28 15:07 log4j-1.2.16.jar
> -rw-r- 1 cassandra cassandra  347531 Aug 28 15:07 libthrift-0.9.0.jar
> -rw-r- 1 cassandra cassandra   16046 Aug 28 15:07 json-simple-1.1.jar
> -rw-r- 1 cassandra cassandra   91183 Aug 28 15:07 jline-1.0.jar
> -rw-r- 1 cassandra cassandra   17750 Aug 28 15:07 jbcrypt-0.3m.jar
> -rw-r- 1 cassandra cassandra5792 Aug 28 15:07 jamm-0.2.5.jar
> -rw-r- 1 cassandra cassandra  765648 Aug 28 15:07
> jackson-mapper-asl-1.9.2.jar
> -rw-r- 1 cassandra cassandra  228286 Aug 28 15:07 
> jackson-core-asl-1.9.2.jar
> -rw-r- 1 cassandra cassandra   96046 Aug 28 15:07 high-scale-lib-1.1.2.jar
> -rw-r- 1 cassandra cassandra 1891110 Aug 28 15:07 guava-13.0.1.jar
> -rw-r- 1 cassandra cassandra   66843 Aug 28 15:07 disruptor-3.0.1.jar
> -rw-r- 1 cassandra cassandra   91982 Aug 28 15:07
> cql-internal-only-1.4.0.zip
> -rw-r- 1 cassandra cassandra   54345 Aug 28 15:07
> concurrentlinkedhashmap-lru-1.3.jar
> -rw-r- 1 cassandra cassandra   25490 Aug 28 15:07 compress-lzf-0.8.4.jar
> -rw-r- 1 cassandra cassandra  284220 Aug 28 15:07 commons-lang-2.6.jar
> -rw-r- 1 cassandra cassandra   30085 Aug 28 15:07 commons-codec-1.2.jar
> -rw-r- 1 cassandra cassandra   36174 Aug 28 15:07 commons-cli-1.1.jar
> -rw-r- 1 cassandra cassandra 1695790 Aug 28 15:07
> apache-cassandra-thrift-2.0.0.jar
> -rw-r- 1 cassandra cassandra   71117 Aug 28 15:07
> apache-cassandra-clientutil-2.0.0.jar
> -rw-r- 1 cassandra cassandra 3265185 Aug 28 15:07 
> apache-cassandra-2.0.0.jar
> -rw-r- 1 cassandra cassandra 1928009 Aug 28 15:07 antlr-3.2.jar
> drwxr-x--- 2 cassandra cassandra4096 Oct  1 14:16 licenses
> 
> In my new home i have
> /produits/cassandra/install_cassandra/apache-cassandra-2.0.2/lib:
> 
> [cassandra@s00vl9925761 lib]$ ls -ltr
> total 9956
> -rw-r- 1 cassandra cassandra  123920 Oct 24 09:21 thrift-server-0.3.2.jar
> -rw-r- 1 cassandra cassandra   52477 Oct 24 09:21
> thrift-python-internal-only-0.9.1.zip
> -rw-r- 1 cassandra cassandra   55066 Oct 24 09:21 snaptree-0.1.jar
> -rw-r- 1 cassandra cassandra 1251514 Oct 24 09:21 snappy-java-1.0.5.jar
> -rw-r- 1 cassandra cassandra  270552 Oct 24 09:21 snakeyaml-1.11.jar
> -rw-r- 1 cassandra cassandra   26083 Oct 24 09:21 slf4j-api-1.7.2.jar
> -rw-r- 1 cassandra cassandra   22291 Oct 24 09:21 
> reporter-config-2.1.0.jar
> -rw-r- 1 cassandra cassandra 1206119 Oct 24 09:21 netty-3.6.6.Final.jar
> -rw-r- 1 cassandra cassandra   82123 Oct 24 09:21 metrics-core-2.2.0.jar
> -rw-r- 1 cassandra cassandra  165505 Oct 24 09:21 lz4-1.2.0.jar
> -rw-r- 1 cassandra cassandra  217054 Oct 24 09:21 libthrift-0.9.1.jar
> -rw-r- 1 cassandra cassandra   16046 Oct 24 09:21 json-simple-1.1.jar
> -rw-r- 1 cassandra cassandra   91183 Oct 24 09:21 jline-1.0.jar
> -rw-r- 1 cassandra cassandra   17750 Oct 24 09:21 jbcr

Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread John Pyeatt
Mr. Bottaro,

About how many column families are in your keyspaces? We have 28 per
keyspace.

Are you using vnodes? We are, and they are set to 256.

What version of Cassandra are you running? We are running 1.2.9.


On Mon, Nov 25, 2013 at 11:36 AM, Christopher J. Bottaro <
cjbott...@academicworks.com> wrote:

> We have the same setup:  one keyspace per client, and currently about 300
> keyspaces.  nodetool repair takes a long time, 4 hours with -pr on a single
> node.  We have a 4 node cluster with about 10 gb per node.  Unfortunately,
> we haven't been keeping track of the running time as keyspaces, or load,
> increases.
>
> -- C
>
>
> On Wed, Nov 20, 2013 at 6:53 AM, John Pyeatt 
> wrote:
>
>> We have an application that has been designed to use potentially 100s of
>> keyspaces (one for each company).
>>
>> One thing we are noticing is that nodetool repair across all of the
>> keyspaces seems to increase linearly based on the number of keyspaces. For
>> example, if we have a 6 node ec2 (m1.large) cluster across 3 Availability
>> Zones and create 20 keyspaces a nodetool repair -pr on one node takes 3
>> hours even with no data in any of the keyspaces. If I bump that up to 40
>> keyspaces it takes 6 hours.
>>
>> Is this the behaviour you would expect?
>>
>> Is there anything you can think of (short of redesigning the cluster to
>> limit keyspaces) to increase the performance of the nodetool repairs?
>>
>> My obvious concern is that as this application grows and we get more
>> companies using it we will eventually have too many keyspaces to
>> perform repairs on the cluster.
>>
>> --
>> John Pyeatt
>> Singlewire Software, LLC
>> www.singlewire.com
>> --
>> 608.661.1184
>> john.pye...@singlewire.com
>>
>
>


-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
--
608.661.1184
john.pye...@singlewire.com


Inefficiency with large set of small documents?

2013-11-25 Thread onlinespending
I’m trying to decide which NoSQL database to use, and I’ve certainly decided 
against mongodb due to its use of mmap. I’m wondering if Cassandra would also 
suffer from a similar inefficiency with small documents. In mongodb, if you 
have a large set of small documents (each much less than the 4KB page size) you 
will require far more RAM to fit your working set into memory, since a large 
percentage of a 4KB chunk could very easily include infrequently accessed data 
outside of your working set. Cassandra doesn’t use mmap, but it would still 
have to intelligently discard the excess data that does not pertain to a small 
document that exists in the same allocation unit on the hard disk when reading 
it into RAM.

As an example, let’s say your cluster size is 4KB as well, and you have 1,000 
small 256-byte documents scattered on the disk that you want to fetch in a 
given query (the total number of documents is over 1 billion). I want to make 
sure it only consumes roughly 256,000 bytes for those 1,000 documents and not 
4,096,000 bytes. When it first fetches a cluster from disk it may consume 4KB 
of cache, but ideally it should ultimately only consume the relevant bytes in 
RAM. If Cassandra just indiscriminately uses RAM in 4KB blocks then that is 
unacceptable to me, because if my working set at any given time is just 20% of 
my huge collection of small documents, I don’t want to have to use servers with 
5X as much RAM. That’s a huge expense.

Thanks,
Ben

P.S. Here’s a detailed post I made this morning in the mongodb user group about 
this topic.

People have often complained that because mongodb memory maps everything and 
leaves memory management to the OS's virtual memory system, the swapping 
algorithm isn't optimized for database usage. I disagree with this. For the 
most part, the swapping or paging algorithm itself can't be much better than 
the sophisticated algorithms (such as LRU based ones) that OSes have refined 
over many years. Why reinvent the wheel? Yes, you could potentially ensure that 
certain data (such as the indexes) never get swapped out to disk, because even 
if they haven't been accessed recently the cost of reading them back into 
memory will be too costly when they are in fact needed. But that's not the 
bigger issue.

It breaks down with small documents much smaller than the page size

This is where using virtual memory for everything really becomes an issue. 
Suppose you've got a bunch of really tiny documents (e.g. ~256 bytes) that are 
much smaller than the virtual memory page size (e.g. 4KB). Now let's say that 
you've determined that your working set (e.g. those documents in your 
collection that constitute say 99% of those accessed in a given hour) to be 
20GB. But your entire collection size is actually 100GB (it's just that 20% of 
your documents are much more likely to be accessed in a given time period; it's 
not uncommon that a small minority of documents will be accessed a large 
majority of the time). If your collection is randomly distributed (such as 
would happen if you simply inserted new documents into your collection) then in 
this example only about 20% of the documents that fit onto a 4KB page will be 
part of the working set (i.e. the data that you need frequent access to at the 
moment). The rest of the data will be made up of much less frequently accessed 
documents, that should ideally be sitting on disk. So there's a huge 
inefficiency here. 80% of the data that is in RAM is not even something I need 
to frequently access. In this example, I would need 5X the amount of RAM to 
accommodate my working set.

Now, as a solution to this problem, you could separate your documents into two 
(or even a few) collections with the grouping done by access frequency. The 
problem with this is that your working set can often change as a function of 
time of day and day of week. If your application is global, your working set 
will be far different at 12pm local in NY vs 12pm local in Tokyo. Even more 
likely is that the working set is constantly changing as new data is 
inserted into the database. Popularity of a document is often viral. As an 
example, a photo that's posted on a social network may start off infrequently 
accessed but then quickly after hundreds of "likes" could become very 
frequently accessed and part of your working set. You'd need to actively 
monitor your documents and manually move a document from one collection to the 
other, which is very inefficient.

Quite frankly, this is not a burden that should be placed on the user anyway. 
By punting the problem of memory management to the OS, mongodb requires the 
user to essentially do its job and group data in a way that patches the 
inefficiencies in its memory management. As far as I'm concerned, not until 
mongodb steps up and takes control of memory management can it be taken 
seriously for very large datasets that often require many small documents with 
ever changing w

Re: Prepare is 100 to 200 times slower after migration to 1.2.11

2013-11-25 Thread Shahryar Sedghi
I did some tests and apparently the prepared statement is not cached at all:
in a loop (native protocol, DataStax Java driver, both 1.3 and 4.0) I
prepared the same statement 20 times and the elapsed times were almost
identical. I think it has something to do with CASSANDRA-6107, which was
implemented in 1.2.11 and 2.0.2; I did the same test with 2.0.2 and got the
same result. Is there a setting for the CQL3 prepared statement cache that I
have missed in 1.2.11?

Thanks


On Mon, Nov 25, 2013 at 1:02 PM, Shahryar Sedghi  wrote:

>
> Hi
>
> I have migrated my DEV environment from 1.2.8 to 1.2.11 to finally move to
> 2.0.2, and prepare is 100 to 200 times slower: something that was
> sub-millisecond now takes 150 ms. Other CQL operations are normal.
>
> I am not planning to move to 2.0.2 until I fix this. I do not see any
> warning or error in the log; the only thing I saw was a live ratio around 6.
>
> Please help.
>
> Thanks
>
> Shahryar
>
> --
>
>


-- 
Life has no meaning a priori … It is up to you to give it a meaning, and
value is nothing but the meaning that you choose ~ Jean-Paul Sartre


Re: Cannot TRUNCATE

2013-11-25 Thread Robert Wille
Blowing away the database does indeed seem to fix the problem, but it
doesn't exactly make me feel warm and cozy. I have no idea how the database
got screwed up, so I don't know what to avoid doing so that I don't have
this happen again on a production server. I never had any other nodes, so it
has nothing to do with adding or removing nodes. I guess I just cross my
fingers and hope it doesn't happen again.

Thanks

Robert

From:  Aaron Morton 
Reply-To:  
Date:  Monday, November 25, 2013 12:46 PM
To:  Cassandra User 
Subject:  Re: Cannot TRUNCATE

If it's just a test system, nuke it and try again :)

Was there more than one node at any time ? Does nodetool status show only
one node ? 

Cheers
 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 7:45 am, Robert Wille  wrote:

> I've got a single node with all empty tables, and truncate fails with the
> following error: Unable to complete request: one or more nodes were
> unavailable.
> 
> Everything else seems fine. I can insert, update, delete, etc.
> 
> The only thing in the logs that looks relevant is this:
> 
> INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:36:59,064
> OutboundTcpConnection.java (line 386) Handshaking version with /192.168.98.121
> INFO [HANDSHAKE-/192.168.98.121] 2013-11-20 11:37:04,064
> OutboundTcpConnection.java (line 395) Cannot handshake version with
> /192.168.98.121
> 
> I'm running Cassandra 2.0.2. I get the same error in cqlsh as I do with the
> java driver.
> 
> Thanks
> 
> Robert





RE: Config changes to leverage new hardware

2013-11-25 Thread Arindam Barua

Here are some calculated 'latency' results reported by cassandra-stress when 
asked to write 10M rows, i.e.
cassandra-stress -d , -n 1000
(we actually had cassandra-stress running in daemon mode for the below tests)


avg_latency (percentile)                     90            99            99.9          99.99
Write: 8 cores, 32 GB, 3-disk RAID 0         0.002982182   0.003963931   0.004692996   0.004792326
Write: 32 cores, 128 GB, 7-disk RAID 0       0.003157515   0.003763181   0.005184429   0.005441946
Read: 8 cores, 32 GB, 3-disk RAID 0          0.002289879   0.057178021   0.173753058   0.24386912
Read: 32 cores, 128 GB, 7-disk RAID 0        0.002317525   0.010937648   0.013205977   0.014270511




The client was another node on the same network with the 8 core, 32 GB RAM 
specs. I wouldn't expect it to bottleneck, but I can monitor it while 
generating the load. In general, what would you expect it to bottleneck at?



>> Another interesting thing is that the linux disk cache doesn't seem to be 
>> growing in spite of a lot of free memory available.

>Things will only get paged in when they are accessed.

Hmm, interesting. I did a test where I just wrote large files to disk, eg.

dd if=/dev/zero of=bigfile18 bs=1M count=1

and checked the disk cache, and it increased by exactly the size of the
file written (no reads were done in this case).
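
The check was roughly the following (a sketch; the file name and size are
illustrative):

free -m                                        # note the 'cached' column
dd if=/dev/zero of=bigfile bs=1M count=10000
free -m                                        # 'cached' grows by roughly the amount written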



-Original Message-
From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Monday, November 25, 2013 11:55 AM
To: Cassandra User
Subject: Re: Config changes to leverage new hardware



> However, for both writes and reads there was virtually no difference in the 
> latencies.

What sort of latency were you getting ?



> I'm still not very sure where the current *write* bottleneck is though.

What numbers are you getting ?

Could the bottle neck be the client ? Can it send writes fast enough to 
saturate the nodes ?



As a rule of thumb you should get 3,000 to 4,000 (non counter) writes per 
second per core.



> Sample iostat data (captured every 10s) for the dedicated disk where commit 
> logs are written is below. Does this seem like a bottle neck?

Does not look too bad.



> Another interesting thing is that the linux disk cache doesn't seem to be 
> growing in spite of a lot of free memory available.

Things will only get paged in when they are accessed.



Cheers





-

Aaron Morton

New Zealand

@aaronmorton



Co-Founder & Principal Consultant

Apache Cassandra Consulting

http://www.thelastpickle.com



On 21/11/2013, at 12:42 pm, Arindam Barua 
mailto:aba...@247-inc.com>> wrote:



>

> Thanks for the suggestions Aaron.

>

> As a follow up, we ran a bunch of tests with different combinations of these 
> changes on a 2-node ring. The load was generated using cassandra-stress, run 
> with default values to write 30 million rows, and read them back.

> However, for both writes and reads there was virtually no difference in the 
> latencies.

>

> The different combinations attempted:

> 1.   Baseline test with none of the below changes.

> 2.   Grabbing the TLAB setting from 1.2

> 3.   Moving the commit logs too to the 7 disk RAID 0.

> 4.   Increasing the concurrent_read to 32, and concurrent_write to 64

> 5.   (3) + (4), i.e. moving commit logs to the RAID + increasing 
> concurrent_read and concurrent_write config to 32 and 64.

>

> The write latencies were very similar, except them being ~3x worse for the 
> 99.9th percentile and above for scenario (5) above.

> The read latencies were also similar, with (3) and (5) being a little worse 
> for the 99.99th percentile.

>

> Overall, not making any changes, i.e. (1) performed as well or slightly 
> better than any of the other changes.

>

> Running cassandra-stress on both the old and new hardware without making any 
> config changes, the write performance was very similar, but the new hardware 
> did show ~10x improvement in the read for the 99.9th percentile and higher. 
> After thinking about this, the reason why we were not seeing any difference 
> with our test framework was perhaps the nature of the test where we write the 
> rows, and then do a bunch of reads to read the rows that were just written 
> immediately following. The data is read back from the memtables, and never 
> from the disk/sstables. Hence the new hardware's increased RAM and size of 
> the disk cache or higher number of disks never helps.

>

> I'm still not very sure where the current *write* bottleneck is though. The 
> new hardware has 32 cores vs 8 cores of the old hardware. Moving the commit 
> log from a dedicated disk to a 7 RAID-0 disk system (where it would be shared 
> by other data though) didn't make a difference too. (unless the extra 
> contention on the RAID nullified the positive effects of the RAID).

>

> Sample iostat data (captured every 10s) for the dedicated disk where commit 
> logs are written is below. Does this seem like a bottle neck? When the commit 
> logs are written the await/svctm ratio is high.

>

> Device

Re: Cannot TRUNCATE

2013-11-25 Thread Robert Coli
On Mon, Nov 25, 2013 at 3:35 PM, Robert Wille  wrote:

> Blowing away the database does indeed seem to fix the problem, but it
> doesn't exactly make me feel warm and cozy. I have no idea how the database
> got screwed up, so I don't know what to avoid doing so that I don't have
> this happen again on a production server. I never had any other nodes, so
> it has nothing to do with adding or removing nodes. I guess I just cross my
> fingers and hope it doesn't happen again.
>

[... snip ...]

> I'm running Cassandra 2.0.2. I get the same error in cqlsh as I do with
> the java driver.
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

I know this is a slightly glib response, but hopefully whatever bug you
triggered will be resolved by the time the 2.0.x series is ready to be run in
production. :D

=Rob


Re: nodetool repair seems to increase linearly with number of keyspaces

2013-11-25 Thread Robert Coli
On Mon, Nov 25, 2013 at 12:28 PM, John Pyeatt wrote:

> Are you using Vnodes? We are and they are set to 256
> What version of cassandra are you running. We are running 1.2.9
>

Vnode performance vis-à-vis repair is tracked in this JIRA issue:

https://issues.apache.org/jira/browse/CASSANDRA-5220

Unfortunately, in Cassandra 2.0 repair has also been changed to be serial
per replica in the replica set by default, which is (unless I've
misunderstood something...) likely to make it even slower in a direct
relationship to RF.

https://issues.apache.org/jira/browse/CASSANDRA-5950

This will probably necessitate re-visiting of this :

https://issues.apache.org/jira/browse/CASSANDRA-5850

=Rob


Schema disagreement under normal conditions, ALTER TABLE hangs

2013-11-25 Thread Josh Dzielak
Recently we had a strange thing happen. Altering the schema (gc_grace_seconds) for 
a column family resulted in a schema disagreement: 3/4 of the nodes got it, 1/4 
didn't. There was no partition at the time, nor were multiple schema 
updates issued. Going to the nodes with stale schema and trying to run the ALTER 
TABLE there resulted in it hanging. We were eventually able to get schema 
agreement by restarting nodes, but both the initial disagreement under normal 
conditions and the hanging ALTER TABLE seem pretty weird. Any ideas here? Does 
this sound like a bug?

We're on 1.2.8.
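
If it happens again, one quick way to see which nodes disagree before
restarting anything (a sketch; hostnames are illustrative):

# shows the schema versions in the cluster and which endpoints hold each one
nodetool describecluster
# querying each node directly can reveal whether gossip state differs between them
for host in node1 node2 node3 node4; do
    echo "=== $host ==="
    nodetool -h "$host" describecluster
done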

Thanks,
Josh

--
Josh Dzielak • Keen IO • @dzello (https://twitter.com/dzello)



Re: Prepare is 100 to 200 times slower after migration to 1.2.11

2013-11-25 Thread Mikhail Stepura
It could be https://issues.apache.org/jira/browse/CASSANDRA-6369, which is fixed in 
1.2.12/2.0.3.

-M


"Shahryar Sedghi"  wrote in message 
news:cajuqix7_jvwbj7sx5p8hvmwy5od5ze7pbtv1y5ttga2aws6...@mail.gmail.com...
I did some tests and apparently the prepared statement is not cached at all: in 
a loop (native protocol, DataStax Java driver, both 1.3 and 4.0) I prepared 
the same statement 20 times and the elapsed times were almost identical. I 
think it has something to do with CASSANDRA-6107, which was implemented in 1.2.11 
and 2.0.2; I did the same test with 2.0.2 and got the same result. Is there a 
setting for the CQL3 prepared statement cache that I have missed in 1.2.11?


Thanks




On Mon, Nov 25, 2013 at 1:02 PM, Shahryar Sedghi  wrote:


  Hi


  I have migrated my DEV environment from 1.2.8 to 1.2.11 to finally move to 
2.0.2, and prepare is 100 to 200 times slower: something that was 
sub-millisecond now takes 150 ms. Other CQL operations are normal.


  I am not planning to move to 2.0.2 until I fix this. I do not see any warning 
or error in the log; the only thing I saw was a live ratio around 6.


  Please help.


  Thanks 


  Shahryar


  -- 






-- 

Life has no meaning a priori … It is up to you to give it a meaning, and value 
is nothing but the meaning that you choose ~ Jean-Paul Sartre