Re: Reg:- Multi DC Configuration

2017-06-08 Thread Justin Cameron
Hi Nandan, Take a look at the GossipingPropertyFileSnitch: http://cassandra.apache.org/doc/latest/operating/snitch.html#snitch-classes You'll also need to configure the cassandra-rackdc.properties file on each node: https://github.com/apache/cassandra/blob/trunk/conf/cassandra-rackdc.properties

Cassandra & Spark

2017-06-08 Thread 한 승호
Hello, I am Seung-ho and I work as a Data Engineer in Korea. I need some advice. My company is considering replacing an RDBMS-based system with Cassandra and Hadoop. The purpose of this system is to analyze Cassandra and HDFS data with Spark. It seems many use cases put emphasis on data local

Huge Batches

2017-06-08 Thread techpyaasa .
Hi, Recently we have been seeing huge batches and log entries like the one below in the C* logs: *Batch of prepared statements for [ks1.cf1] is of size 413350, exceeding specified threshold of 5120 by 362150* Along with the Column Family name (as found in the above log entry), we would like to know the partition key , cl

Re: Cassandra & Spark

2017-06-08 Thread Kant Kodali
If you use containers like Docker, Plan A can work provided you do the resource and capacity planning. I tend to think that Plan B is more standard and easier, although you can wait to hear from others for a second opinion. Caution: Data Locality will make sense if the Disk throughput is significant

Re: Cassandra & Spark

2017-06-08 Thread Tobias Eriksson
Hi, Something to consider before moving to Apache Spark and Cassandra: I have a background where we have tons of data in Cassandra, and we wanted to use Apache Spark to run various jobs. We loved what we could do with Spark, BUT… We soon realized that we wanted to run multiple jobs in parallel. Some

Re: Cassandra & Spark

2017-06-08 Thread DuyHai Doan
Interesting, Tobias. When you said "Instead we transferred the data to Apache Kudu", did you transfer all Cassandra data into Kudu with a single migration and then tap into Kudu for aggregation, or did you run a data import every day/week/month from Cassandra into Kudu? From my point of view,

Re: Cassandra & Spark

2017-06-08 Thread Tobias Eriksson
Hi, What I wanted was a dashboard with graphs/diagrams, and it should not take minutes for the page to load. Thus, it was a problem to have Spark with Cassandra without solving the parallelization to such an extent that I could have the diagrams rendered in seconds. Now with Kudu we get some decen

RE: Convert single node C* to cluster (rebalancing problem)

2017-06-08 Thread ZAIDI, ASAD A
Did you make sure the auto_bootstrap property is indeed set to [true] when you added the node? From: Junaid Nasir [mailto:jna...@an10.io] Sent: Monday, June 05, 2017 6:29 AM To: Akhil Mehra Cc: Vladimir Yudovin ; user@cassandra.apache.org Subject: Re: Convert single node C* to cluster (rebalancing p

RE: Local_serial >> Adding nodes

2017-06-08 Thread ZAIDI, ASAD A
Please share the exact timeout error message text so we can get an idea of what type of timeout you're facing. From: Nitan Kainth [mailto:ni...@bamlabs.com] Sent: Wednesday, June 07, 2017 7:24 AM To: vasu gunja Cc: user@cassandra.apache.org Subject: Re: Local_serial >> Adding nodes What is in system log? Does

RE: Data in multi disks is not evenly distributed

2017-06-08 Thread ZAIDI, ASAD A
Check the load with the nodetool status command. Make sure there isn't a huge number of pending compactions for your tables. Ideally, data distribution should be even across your nodes. You should have reserved an extra 15% of free space relative to the maximum size of your table i.e

Re: Huge Batches

2017-06-08 Thread Justin Cameron
I don't believe the keys within a large batch are logged by Cassandra. A large batch could potentially contain tens of thousands of primary keys, so this could quickly fill up the logs. Here are a couple of suggestions: - Large batches should also be slow, so you could try setting up slow q

Reg:- Data Modelling For Hierarchy Data

2017-06-08 Thread @Nandan@
Hi, I am working on a music database where our portal has multiple levels of users. Different categories of users share some common attributes but have some different attributes based on their registration. This becomes a hierarchy pattern. I am attaching one sample hierarchy pattern of User Modu

Definition of QUORUM consistency level

2017-06-08 Thread Dikang Gu
Hello there, We have some use cases that do consistent read/write requests, and we have 4 replicas in that cluster, according to our setup. What's interesting to me is that both read and write quorum requests block for 4/2+1 = 3 replicas, so we are accessing 3 (for write) + 3 (

Re: Definition of QUORUM consistency level

2017-06-08 Thread Justin Cameron
2/4 for write and 2/4 for read would not be sufficient to achieve strong consistency, as there is no overlap. In your particular case you could potentially use QUORUM for write and TWO for read (or vice-versa) and still achieve strong consistency. If you add additional nodes in the future this wou

Re: Definition of QUORUM consistency level

2017-06-08 Thread Dikang Gu
Justin, what I suggest is that for the QUORUM consistency level, the block for write should be (num_replica/2)+1, which is the same as today, but for a read request, we just need to access (num_replica/2) nodes, which should provide enough strong consistency. Dikang. On Thu, Jun 8, 2017 at 7:38 PM, Justin Ca

Re: Definition of QUORUM consistency level

2017-06-08 Thread Jonathan Haddad
It would be a little weird to change the definition of QUORUM, which means majority, to mean something other than majority for a single use case. Sounds like you want to introduce a new CL, HALF. On Thu, Jun 8, 2017 at 7:43 PM Dikang Gu wrote: > Justin, what I suggest is that for QUORUM consisten

Re: Definition of QUORUM consistency level

2017-06-08 Thread Dikang Gu
So, for the quorum, what we really want is that there is one overlap among the nodes in write path and read path. It actually was my assumption for a long time that we need (N/2 + 1) for write and just need (N/2) for read, because it's enough to provide the strong consistency. On Thu, Jun 8, 2017

Re: Definition of QUORUM consistency level

2017-06-08 Thread Nate McCall
> So, for the quorum, what we really want is that there is one overlap among > the nodes in write path and read path. It actually was my assumption for a > long time that we need (N/2 + 1) for write and just need (N/2) for read, > because it's enough to provide the strong consistency. > You are wr

Re: Definition of QUORUM consistency level

2017-06-08 Thread Nate McCall
> > > So, for the quorum, what we really want is that there is one overlap among >> the nodes in write path and read path. It actually was my assumption for a >> long time that we need (N/2 + 1) for write and just need (N/2) for read, >> because it's enough to provide the strong consistency. >> > >

Re: Definition of QUORUM consistency level

2017-06-08 Thread Brandon Williams
We have CL.TWO. On Thu, Jun 8, 2017 at 10:03 PM, Dikang Gu wrote: > So, for the quorum, what we really want is that there is one overlap among > the nodes in write path and read path. It actually was my assumption for a > long time that we need (N/2 + 1) for write and just need (N/2) for read, >

Re: Definition of QUORUM consistency level

2017-06-08 Thread Nate McCall
> We have CL.TWO. > > > This was actually the original motivation for CL.TWO and CL.THREE if memory serves: https://issues.apache.org/jira/browse/CASSANDRA-2013

Re: Definition of QUORUM consistency level

2017-06-08 Thread Dikang Gu
To me, CL.TWO and CL.THREE are more like workarounds for the problem. For example, they do not work if the number of replicas goes to 8, which is possible in our environment (2 replicas in each of 4 DCs). What people want from quorum is a strong consistency guarantee: as long as R+W > N, there are th
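
The R+W > N condition discussed in this thread can be checked with simple arithmetic. The following is a minimal Python sketch, not Cassandra code: the helper names `quorum` and `overlaps` are hypothetical, and `half` stands in for the proposed N/2 read level, which is not an existing consistency level.

```python
def quorum(n: int) -> int:
    """Majority of n replicas, as QUORUM is described in the thread: n/2 + 1."""
    return n // 2 + 1


def overlaps(n: int, w: int, r: int) -> bool:
    """True when any r-replica read must share a replica with any w-replica write."""
    return r + w > n


# Compare a QUORUM write against two read levels: the proposed N/2, and TWO.
rows = []
for n in (4, 8):
    w = quorum(n)
    half = n // 2  # hypothetical "HALF" read level from the discussion
    rows.append((n, w, overlaps(n, w, half), overlaps(n, w, 2)))

for n, w, half_ok, two_ok in rows:
    print(f"N={n}: QUORUM write={w}, read N/2 overlaps={half_ok}, read TWO overlaps={two_ok}")
```

With 4 replicas, a QUORUM write (3) plus a read at TWO already satisfies R+W > N, while 2 + 2 does not. With 8 replicas, a QUORUM write (5) plus a read at TWO no longer overlaps, whereas a read of N/2 = 4 replicas would. For an odd N the N/2 read does not overlap (for N=3, 2 + 1 is not greater than 3), so the question only arises with even replica counts.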

Re: Definition of QUORUM consistency level

2017-06-08 Thread Brandon Williams
I don't disagree with you there and have never liked TWO/THREE. This is somewhat relevant: https://issues.apache.org/jira/browse/CASSANDRA-2338 I don't think going to CL.FOUR, etc, is a good long-term solution, but I'm also not sure what is. On Thu, Jun 8, 2017 at 11:20 PM, Dikang Gu wrote: >

Re: Definition of QUORUM consistency level

2017-06-08 Thread Jeff Jirsa
Would love to see real pluggable consistency levels. Sorta sad it got wont-fixed - may be time to revisit that, perhaps it's more feasible now. https://issues.apache.org/jira/browse/CASSANDRA-8119 is also semi-related, but a different approach (CL-as-UDF) On Thu, Jun 8, 2017 at 9:26 PM, Brandon W

Re: Definition of QUORUM consistency level

2017-06-08 Thread Justin Cameron
Firstly, this situation only occurs if you need strong consistency and are using an even replication factor (RF4, RF6, etc). Secondly, either the read or the write still needs to be performed at a minimum level of QUORUM. This means there are no extra availability benefits from your proposal (i.e. a min

Re: Definition of QUORUM consistency level

2017-06-08 Thread Jeff Jirsa
Short of actually making ConsistencyLevel pluggable or adding/changing one of the existing levels, an alternative approach would be to divide up the cluster into either real or pseudo-datacenters (with RF=2 in each DC), and then write with QUORUM (which would be 3 nodes, across any combination of d