RE: Cassandra DevCenter

2018-03-12 Thread Jacques-Henri Berthemet
Hi, There is no DevCenter 2.x, latest is 1.6. It would help if you provide jar names and exceptions you encounter. Make sure you’re not mixing Guava versions from other dependencies. DevCenter uses Datastax driver to connect to Cassandra, double check the versions of the jars you need here: htt

Re: Adding new DC?

2018-03-12 Thread Oleksandr Shulgin
On Sun, Mar 11, 2018 at 10:31 PM, Kunal Gangakhedkar < kgangakhed...@gmail.com> wrote: > Hi all, > > We currently have a cluster in GCE for one of the customers. > They want it to be migrated to AWS. > > I have setup one node in AWS to join into the cluster by following: > https://docs.datastax.co

Re: Adding new DC?

2018-03-12 Thread Rahul Singh
How did you distribute your seed nodes across whole cluster? -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 12, 2018, 5:12 AM -0400, Oleksandr Shulgin , wrote: > > On Sun, Mar 11, 2018 at 10:31 PM, Kunal Gangakhedkar > > wrote: > > > Hi all, > > > > > > We currently have a clust

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
Running two instances of Apache Cassandra on the same server, each having its own commit log disk, did not help. The sum of CPU/RAM usage for both instances would be less than half of all available resources. Disk usage is less than 20% and network is still less than 300Mb in Rx. Sent using Zoho Mai

Re: vnodes: high availability

2018-03-12 Thread Hannu Kröger
If this is a universal recommendation, then should that actually be default in Cassandra? Hannu > On 18 Jan 2018, at 00:49, Jon Haddad wrote: > > I *strongly* recommend disabling dynamic snitch. I’ve seen it make latency > jump 10x. > > dynamic_snitch: false is your friend. > > > >> O

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What’s your disk latency? What kind of disk is it? -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 10:48 AM To: user Subject: Re: yet another benchmark bottleneck Running two instance of Apache Cassandra on same server, each havin

Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Anyone? > On 4 Mar 2018, at 20:45, Hannu Kröger wrote: > > Hello, > > I am trying to verify and understand fully the functionality of row cache in > Cassandra. > > I have been using mainly two different sources for information: > https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
1.2 TB 15K. Latency reported by the stress tool is 7.6 ms; disk latency is 2.6 ms. Sent using Zoho Mail On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet wrote What’s your disk latency? What kind of disk is it? -- Jacques-Henri B

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
Any errors/warning in Cassandra logs? What’s your RF? Using 300MB/s of network bandwidth for only 130 op/s looks very high. -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 11:38 AM To: user Subject: RE: yet another benchmark bottle

Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
What’s the goal? How big are your partitions , size in MB and in rows? -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger , wrote: > Anyone? > > > On 4 Mar 2018, at 20:45, Hannu Kröger wrote: > > > > Hello, > > > > I am trying to verify and unders

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
RF=1 No errors or warnings. Actually it's 300 Mbit/s and 130K op/s. I missed a 'K' in the first mail, but anyway, the point is: more than half of the node's resources (CPU, memory, disk, network) are unused and I can't increase write throughput. Sent using Zoho Mail On Mon, 12 Mar 201
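The correction above changes the picture considerably, and the arithmetic is easy to check. The following is a minimal sketch, not part of the original message; the function name is hypothetical, and it only divides the bandwidth and operation rates quoted in the thread.

```python
def bytes_per_operation(bandwidth_bits_per_s: float, ops_per_s: float) -> float:
    """Average number of network bytes received per operation."""
    return bandwidth_bits_per_s / 8 / ops_per_s


# Corrected figures from the thread: 300 Mbit/s at 130K op/s.
corrected = bytes_per_operation(300e6, 130e3)

# Figures as first read: 300 MB/s (2400 Mbit/s) at only 130 op/s.
misread = bytes_per_operation(300e6 * 8, 130)

print(f"corrected: about {corrected:.0f} bytes per operation")
print(f"misread:   about {misread / 1e6:.1f} MB per operation")
```

With the corrected numbers each write costs a few hundred bytes on the wire, which is why 130K op/s "makes more sense" than the megabytes per operation implied by the first reading.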

Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Hi, My goal is to make sure that I understand the functionality correctly and that the documentation is accurate. The question in other words: Is the documentation or the comment in the code wrong (or inaccurate)? Hannu > On 12 Mar 2018, at 13:00, Rahul Singh wrote: > > What’s the goal? How bi

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
It makes more sense now, 130K is not that bad. According to cassandra.yaml you should be able to increase your number of write threads in Cassandra: # On the other hand, since writes are almost never IO bound, the ideal # number of "concurrent_writes" is dependent on the number of cores in # your
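As a rough illustration of the sizing advice quoted from cassandra.yaml: the stock comment in that file suggests a rule of thumb of 8 write threads per core. The sketch below assumes that rule of thumb and uses the 20-core figure mentioned elsewhere in this thread; the function name is hypothetical and this is not an official tuning formula.

```python
def suggested_concurrent_writes(cores: int, per_core: int = 8) -> int:
    """Rule-of-thumb starting value for concurrent_writes (assumed: 8 per core)."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return per_core * cores


# 20 cores is the machine size mentioned in this thread.
print("concurrent_writes:", suggested_concurrent_writes(20))
```

This is only a starting point; as the follow-up in the thread shows, raising the thread count does not help when the bottleneck is elsewhere.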

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
No luck, even with 320 write threads. Sent using Zoho Mail On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet wrote It makes more sense now, 130K is not that bad. According to cassandra.yaml you should be able to increase yo

Archive cassandra old data into Hadoop

2018-03-12 Thread Javier Pareja
Hi, I understand that a well designed cassandra system will allow querying ANY data within it at an incredible speed as well as ingesting data at a very fast pace. However this data is going to grow until it is archived. As I see it, data has two stages, HOT DATA when data is accessible to be que

Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
I may be wrong, but what I’ve read and used in the past assumes that the “first” N rows are cached and the clustering key design is how I change what N rows are put into memory. Looking at the code, it seems that’s the case. The language of the comment basically says that it holds in cache what

Re: Archive cassandra old data into Hadoop

2018-03-12 Thread Rahul Singh
HDFS / S3 is a great place to dump this data. You can also consider other types of compaction strategies for “COLD DATA” in not so powerful C* clusters for which the purpose is write only. C* is still better in my opinion for data management than S3/HDFS.  It depends on how easy you want the ret

Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
> On 12 Mar 2018, at 14:45, Rahul Singh wrote: > > I may be wrong, but what I’ve read and used in the past assumes that the > “first” N rows are cached and the clustering key design is how I change what > N rows are put into memory. Looking at the code, it seems that’s the case. So we agree

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What happens if you increase number of client threads? Can you add another instance of cassandra-stress on another host? -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 12:50 PM To: user Subject: RE: yet another benchmark bottlenec

RE: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
I mentioned that I already tested increasing client threads, many stress-client instances on one node, and two stress clients on two separate nodes; in all of them the sum of throughputs is less than 130K. I've been tuning all aspects of OS and Cassandra (whatever I've seen in config files!) for two

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
If throughput decreases as you add more load then it’s probably due to disk latency, can you test SSDs? Are you using VMware ESXi? -- Jacques-Henri Berthemet From: onmstester onmstester [mailto:onmstes...@zoho.com] Sent: Monday, March 12, 2018 2:15 PM To: user Subject: RE: yet another benchmark

What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
I'm unclear on which versions are most popular right now. What version are you running? What version should still be supported in the documentation? For example, I'm turning my attention back to writing a section on adding a data center. What versions should I support in that information? I'm

Re: What versions should the documentation support now?

2018-03-12 Thread Hannu Kröger
In my opinion, good documentation should somehow include version-specific pieces of information, whether it is a nodetool command that came in a certain version, a parameter for something, or something else. That would be very useful. It’s confusing if I see documentation talking about 4.0 specifics

Re: yet another benchmark bottleneck

2018-03-12 Thread Michael Burman
Although it's a low amount of updates, it's possible that you hit a contention bug. A simple test would be to add multiple Cassandra nodes on the same physical node (like split your 20 cores to 5 instances of Cassandra). If you get much higher throughput, then you have an answer. I don't think a sin

RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
If we use DataStax’s example, we would have instructions for v3.0 and v2.1. How’s that? We should have instructions for the cloud platforms like AWS, but how do you do that and stay vendor neutral? Kenneth Brotman From: Hannu Kröger [mailto:hkro...@gmail.com] Sent: Monday, Ma

Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
The docs are in tree, meaning they are versioned, and should be written for the version they correspond to. Trunk docs should reflect the current state of trunk, and shouldn’t have caveats for other versions. On Mon, Mar 12, 2018 at 8:15 AM Kenneth Brotman wrote: > If we use DataStax’s example, w

RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
I see how that makes sense Jon but how does a user then select the documentation for the version they are running on the Apache Cassandra web site? Kenneth Brotman From: Jonathan Haddad [mailto:j...@jonhaddad.com] Sent: Monday, March 12, 2018 8:40 AM To: user@cassandra.apache.org Subject:

Re: What versions should the documentation support now?

2018-03-12 Thread Jonathan Haddad
Right now they can’t. On Mon, Mar 12, 2018 at 9:03 AM Kenneth Brotman wrote: > I see how that makes sense Jon but how does a user then select the > documentation for the version they are running on the Apache Cassandra web > site? > > > > Kenneth Brotman > > > > *From:* Jonathan Haddad [mailto:j.

RE: What versions should the documentation support now?

2018-03-12 Thread Kenneth Brotman
It seems like the documentation that should be in the trunk for version 3.0 should include information for users of version 3.0 and 2.1; the documentation that should be in 4.0 (when it's released) should include information for users of 4.0 and at least one previous version, etc. How about i

Anomaly detection

2018-03-12 Thread D. Salvatore
Hello everyone, Do you know if there is a Cassandra tool that performs anomaly detection? Thank you in advance Salvatore

Re: Anomaly detection

2018-03-12 Thread Rahul Singh
Anomaly detection of what? The data inside Cassandra or Cassandra metrics? -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 12, 2018, 12:44 PM -0400, D. Salvatore , wrote: > Hello everyone, > Do you know if there is a Cassandra tool that performs anomaly detection? > > Thank you in advan

Re: Anomaly detection

2018-03-12 Thread D. Salvatore
Hi Rahul, I was mainly thinking about performance anomaly detection but I am also interested in other types such as fault detection, data or queries anomalies. Thanks 2018-03-12 16:52 GMT+00:00 Rahul Singh : > Anomaly detection of what? The data inside Cassandra or Cassandra metrics? > > -- > Rah

Re: What versions should the documentation support now?

2018-03-12 Thread Jon Haddad
Docs for 3.0 go in the 3.0 branch. I’ve never heard of anyone shipping docs for multiple versions, I don’t know why we’d do that. You can get the docs for any version you need by downloading C*, the docs are included. I’m a firm -1 on changing that process. Jon > On Mar 12, 2018, at 9:19 AM,

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Durity, Sean R
You cannot migrate and upgrade at the same time across major versions. Streaming is (usually) not compatible between versions. As to the migration question, I would expect that you may need to put the external-facing ip addresses in several places in the cassandra.yaml file. And, yes, it would

Cassandra vs MySQL

2018-03-12 Thread Oliver Ruebenacker
Hello, We have a project currently using MySQL single-node with 5-6TB of data and some performance issues, and we plan to add data up to a total size of maybe 25-30TB. We are thinking of migrating to Cassandra. I have been trying to find benchmarks or other guidelines to compare MySQL an

Re: Cassandra vs MySQL

2018-03-12 Thread Gábor Auth
Hi, On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker wrote: > We have a project currently using MySQL single-node with 5-6TB of data and > some performance issues, and we plan to add data up to a total size of > maybe 25-30TB. > There is no 'silver bullet'; Cassandra is not a 'drop in' re

Re: Cassandra vs MySQL

2018-03-12 Thread Matija Gobec
Hi Oliver, A few years back I had a similar problem where there was a lot of data in MySQL and it was starting to choke. I migrated data to Cassandra, ran benchmarks and blew MySQL out of the water with a small 3 node C* cluster. If you have a use case for Cassandra the answer is yes, but keep in mi

Re: TWCS enabling tombstone compaction

2018-03-12 Thread Lerh Chuan Low
Dear Lucas, Those properties that result in the log message you are seeing are properties common to all compaction strategies. See http://cassandra.apache.org/doc/latest/operating/compaction.html#common-options. They are *tombstone_compaction_interval* and *tombstone_threshold*. If you didn't def

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
You can’t migrate and upgrade at the same time perhaps but you could do one and then the other so as to end up on new version. I’m guessing it’s an error in the yaml file or a port not open. Is there any good reason for a production cluster to still be on version 2.1x? Kenneth Brotman

Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-12 Thread Carl Mueller
Again, I'd really like to get a feel for scylla vs rocksandra vs cassandra. Isn't the driver binary protocol the easiest / least redesign level of storage engine swapping? Scylla and Cassandra and Rocksandra are currently three options. Rocksandra can expand out its non-java footprint without rea

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 00:06, Durity, Sean R wrote: > You cannot migrate and upgrade at the same time across major versions. > Streaming is (usually) not compatible between versions. > I'm not trying to upgrade as of now - first priority is the migration. We can look at version upgrade later on.

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal, Please provide the following setting from the yaml files you are using: seeds: listen_address: broadcast_address: rpc_address: endpoint_snitch: auto_bootstrap: Kenneth Brotman From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] Sent: Monday, March 12, 2018

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 03:28, Kenneth Brotman wrote: > You can’t migrate and upgrade at the same time perhaps but you could do > one and then the other so as to end up on new version. I’m guessing it’s > an error in the yaml file or a port not open. Is there any good reason for > a production clus

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
I didn’t understand something. Are you saying you are using one data center on Google and one on Amazon? Kenneth Brotman From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] Sent: Monday, March 12, 2018 4:24 PM To: user@cassandra.apache.org Cc: Nikhil Soman Subject: Re: [EXTERNAL] R

What snitch to use with AWS and Google

2018-03-12 Thread Kenneth Brotman
Quick question: If you have one cluster made of nodes of a datacenter in AWS and a datacenter in Google, what snitch do you use? Kenneth Brotman

Re: What snitch to use with AWS and Google

2018-03-12 Thread Madhu-Nosql
Kenneth, For AWS: Ec2Snitch (if the DC is in a single region). For Google: better go with GossipingPropertyFileSnitch. Thanks, Madhu On Mon, Mar 12, 2018 at 6:31 PM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > Quick question: If you have one cluster made of nodes of a datacenter in > AWS and

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
On 13 March 2018 at 04:54, Kenneth Brotman wrote: > Kunal, > > > > Please provide the following setting from the yaml files you are using: > > > > seeds: > In GCE: seeds: "10.142.14.27" In AWS (new node being added): seeds: "35.196.96.247,35.227.127.245,35.196.241.232" (these are the public IP

Re: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kunal Gangakhedkar
Yes, that's correct. The customer wants us to migrate the cassandra setup in their AWS account. Thanks, Kunal On 13 March 2018 at 04:56, Kenneth Brotman wrote: > I didn’t understand something. Are you saying you are using one data > center on Google and one on Amazon? > > > > Kenneth Brotman >

Re: What snitch to use with AWS and Google

2018-03-12 Thread Lerh Chuan Low
I would just go with GossipingPropertyFileSnitch, it will work across both data centers (I once had a test cluster with 1 DC in Azure, 1 DC in AWS and 1 DC in GCP using GPFS). Even if it's just solely AWS, I think GPFS is superior because you can configure virtual racks if you ever need it while EC

Re: What snitch to use with AWS and Google

2018-03-12 Thread Jeff Jirsa
GPFS -- Jeff Jirsa > On Mar 12, 2018, at 4:31 PM, Kenneth Brotman > wrote: > > Quick question: If you have one cluster made of nodes of a datacenter in AWS > and a datacenter in Google, what snitch do you use? > > Kenneth Brotman

Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-12 Thread Jeff Jirsa
On Mon, Mar 12, 2018 at 3:58 PM, Carl Mueller wrote: > Rocksandra can expand out its non-java footprint without rearchitecting > the java codebase. Or are there serious concerns with Datastax and the > binary protocols? > > Rocksandra should eventually become part of Cassandra. The pluggable s

RE: system.size_estimates - safe to remove sstables?

2018-03-12 Thread Kenneth Brotman
Kunal, Is this the GCE cluster you are speaking of in the “Adding new DC?” thread? Kenneth Brotman From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] Sent: Sunday, March 11, 2018 2:18 PM To: user@cassandra.apache.org Subject: Re: system.size_estimates - safe to remove sstables?

Is node restart required to update yaml changes in 2.1x

2018-03-12 Thread Kenneth Brotman
Can you update changes to cassandra.yaml in version 2.1x without restarting the node? Kenneth Brotman

Re: Is node restart required to update yaml changes in 2.1x

2018-03-12 Thread Lerh Chuan Low
To my knowledge, for any version, updates to cassandra.yaml will only be applied after you restart the node. On 13 March 2018 at 12:24, Kenneth Brotman wrote: > Can you update changes to cassandra.yaml in version 2.1x without restarting > the node? > > > > Kenneth Brotman >

command to view yaml file setting in use on console

2018-03-12 Thread Kenneth Brotman
Is there a command, perhaps a nodetool command to view the actual yaml settings a node is using so you can confirm it is using the changes to a yaml file you made? Kenneth Brotman

Re: Is node restart required to update yaml changes in 2.1x

2018-03-12 Thread Jeff Jirsa
There’s a bit of nuance in that there are some undocumented situations in some versions where we may reload seeds from yaml without notice - notably when instances come online and we decide whether or not to gossip with them. That’s not really intended, and fixed in recent versions -- Jeff J

Re: command to view yaml file setting in use on console

2018-03-12 Thread Jeff Jirsa
Cassandra-7622 went patch available today -- Jeff Jirsa > On Mar 12, 2018, at 6:40 PM, Kenneth Brotman > wrote: > > Is there a command, perhaps a nodetool command to view the actual yaml > settings a node is using so you can confirm it is using the changes to a yaml > file you made? > >

RE: command to view yaml file setting in use on console

2018-03-12 Thread Kenneth Brotman
You say the nicest things! From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Monday, March 12, 2018 6:43 PM To: user@cassandra.apache.org Subject: Re: command to view yaml file setting in use on console Cassandra-7622 went patch available today -- Jeff Jirsa On Mar 12, 2018, at 6:4

Re: Anomaly detection

2018-03-12 Thread Fernando Ipar
Hello Salvatore, On Mon, Mar 12, 2018 at 2:12 PM, D. Salvatore wrote: > Hi Rahul, > I was mainly thinking about performance anomaly detection but I am also > interested in other types such as fault detection, data or queries > anomalies. > I know VividCortex (http://vividcortex.com) supports Ca

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal, Sorry for asking you things you already answered. You provided a lot of good information and you know what you’re doing. It’s going to be something really simple to figure out. While I read through the thread more closely, I’m guessing we are right on top of it so could I ask y

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal, While we are looking into all this I feel compelled to ask you to check your security configurations now that you are using public addresses to communicate inter-node across data centers. Are you sure you are using best practices? Kenneth Brotman From: Kenneth Brotman [mailt

Re: command to view yaml file setting in use on console

2018-03-12 Thread Anthony Grasso
Hi Kenneth, In addition to CASSANDRA-7622, it may help to inspect the Cassandra *system.log* and look for the following entry: INFO [main] ... - Node configuration:[...] The content of "Node configuration" will have the settings the node is using. Regards, Anthony On Tue, 13 Mar 2018 at 12:
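The log entry described above can also be checked programmatically. The following is a minimal sketch that assumes the entry has the form "Node configuration:[key=value; key=value; ...]"; the sample line, its timestamp, and its values are made up for illustration, and the exact layout may differ between Cassandra versions.

```python
import re


def parse_node_configuration(log_line: str) -> dict:
    """Extract key=value settings from a 'Node configuration:[...]' log entry.

    Assumes settings are separated by '; '. Values that themselves contain
    '; ' (for example nested provider parameters) are not handled.
    """
    match = re.search(r"Node configuration:\[(.*)\]", log_line)
    if match is None:
        return {}
    settings = {}
    for pair in match.group(1).split("; "):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical sample line; real entries contain many more settings.
sample = (
    "INFO  [main] 2018-03-13 10:00:00,000 Config.java - Node configuration:"
    "[concurrent_writes=32; endpoint_snitch=GossipingPropertyFileSnitch; "
    "listen_address=10.0.0.1]"
)

for name, value in sorted(parse_node_configuration(sample).items()):
    print(f"{name} = {value}")
```

Comparing the parsed values against the edited cassandra.yaml shows whether the node actually picked up a change after its last restart.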

RE: system.size_estimates - safe to remove sstables?

2018-03-12 Thread Kunal Gangakhedkar
No, this is a different cluster. Kunal On 13-Mar-2018 6:27 AM, "Kenneth Brotman" wrote: Kunal, Is this the GCE cluster you are speaking of in the “Adding new DC?” thread? Kenneth Brotman *From:* Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] *Sent:* Sunday, March 11, 2018 2:18 PM

Re: Cassandra vs MySQL

2018-03-12 Thread Satendra
Cassandra is going to die in the near future (from what I see). Cassandra is not serving its purpose; rather, people are sometimes facing a few issues in virtual environments. We have tried a crdb database cluster and migrated a few clusters over to the CockroachDB environment; it seems workin

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal, Also to check: You should use the same list of seeds, probably two in each data center if you will have five nodes in each, in all the yaml files. All the seed node addresses from all the data centers should be listed in each yaml file where it says “-seeds:”. I’m not sure from your prev
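The advice above (one identical seed list, with about two seeds per data center, used in every yaml file) can be sketched as follows. The data center names and addresses are hypothetical documentation-range values, not the addresses from this thread, and the helper name is an assumption for illustration.

```python
def build_seed_list(nodes_by_dc: dict, seeds_per_dc: int = 2) -> str:
    """Return one comma-separated seed string covering every data center.

    The same string is meant to go into the seeds setting of every node's
    cassandra.yaml, regardless of which data center the node is in.
    """
    seeds = []
    for dc in sorted(nodes_by_dc):  # sorted so every node gets the same order
        seeds.extend(nodes_by_dc[dc][:seeds_per_dc])
    return ",".join(seeds)


# Hypothetical addresses (reserved documentation ranges).
nodes = {
    "aws": ["203.0.113.1", "203.0.113.2", "203.0.113.3"],
    "gce": ["198.51.100.1", "198.51.100.2", "198.51.100.3"],
}

print(f'- seeds: "{build_seed_list(nodes)}"')
```

Generating the string once and reusing it avoids the situation described earlier in the thread, where the two data centers were configured with different seed lists.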

Re: yet another benchmark bottleneck

2018-03-12 Thread onmstester onmstester
I already ran two instances of Cassandra on one node; the sum of throughput is less than 130K ops. Currently I'm suspecting the network packets per second, which seemingly can't get higher than 10K pps, which actually would be the iperf limit for packets of the same size. I'm looking for how to tune