Re: AMI to use to launch a cluster with OpsCenter on AWS

2015-02-20 Thread Clint Kelly
BTW I was able to use this script: https://github.com/joaquincasares/cassandralauncher to get a cluster up and running pretty easily on AWS. Cheers to the author for this. Still curious for answers to my questions above, but not as urgent. Best regards, Clint On Fri, Feb 20, 2015 at 5:36

Re: run cassandra on a small instance

2015-02-20 Thread Tim Dunphy
> > The most important things to note: > - don't include JNA (it needs to lock pages larger than what will be > available) > - turn down threadpools for transports > - turn compaction throughput way down > - make concurrent reads and writes very small > I have used the above to run a healthy 5 node cl
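As a rough illustration of the settings listed above, the following Python sketch renders a cassandra.yaml fragment with reduced values. The option names are standard cassandra.yaml keys from the 2.x line; the numbers are assumptions chosen for illustration and do not come from this thread. The JNA point concerns the classpath rather than cassandra.yaml, so it is not covered here.

```python
# Illustrative low-memory overrides for cassandra.yaml (Cassandra 2.x key names).
# All numeric values are assumptions for a small test node, not values
# recommended anywhere in the thread.
SMALL_INSTANCE_OVERRIDES = {
    "concurrent_reads": 2,                  # "make concurrent reads ... very small"
    "concurrent_writes": 2,                 # "... and writes very small"
    "compaction_throughput_mb_per_sec": 1,  # "turn compaction throughput way down"
    "rpc_min_threads": 1,                   # Thrift transport thread pool
    "rpc_max_threads": 4,
    "native_transport_max_threads": 4,      # CQL native transport thread pool
}


def render_yaml_fragment(overrides):
    """Render a flat dict of scalar settings as sorted 'key: value' YAML lines."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    print(render_yaml_fragment(SMALL_INSTANCE_OVERRIDES))
```

The printed lines could be merged by hand into an existing cassandra.yaml; the sketch deliberately avoids rewriting the file itself.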

Re: run cassandra on a small instance

2015-02-20 Thread Nate McCall
I frequently test with multi-node vagrant-based clusters locally. The following chef attributes should give you an idea of what to turn down in cassandra.yaml and cassandra-env.sh to build a decent testing cluster: :cassandra => {'cluster_name' => 'VerifyCluster',

AMI to use to launch a cluster with OpsCenter on AWS

2015-02-20 Thread Clint Kelly
Hi all, I am trying to follow the instructions here for installing DSE 4.6 on AWS: http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMIOpsc.html I was successful creating a single-node instance running OpsCenter, which I intended to bootstrap creat

Re: Storing bi-temporal data in Cassandra

2015-02-20 Thread Peter Lin
I think I get the basics of what you want to achieve. Side note, the sample insert seems to have a typo for the transaction time. For the first query, I would store the data using weatherstation_id as the key. The create table statement might look like this. CREATE TABLE weatherstation ( weathers
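To make the bi-temporal idea concrete, here is a minimal in-memory Python sketch of an "as of" lookup. It is not the table definition from the message above, which is cut off in this archive. The field names (station_id, event_time, tx_time, temperature) are hypothetical, with event_time standing for valid time and tx_time for transaction time.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    station_id: str       # hypothetical key, one weather station
    event_time: datetime  # valid time: when the measurement happened
    tx_time: datetime     # transaction time: when this version was recorded
    temperature: float


def as_of(readings, station_id, event_time, known_at) -> Optional[Reading]:
    """Return the version of a reading that was current at `known_at`.

    Among all versions for the given station and event time that were
    recorded no later than `known_at`, pick the most recently recorded one.
    """
    candidates = [
        r for r in readings
        if r.station_id == station_id
        and r.event_time == event_time
        and r.tx_time <= known_at
    ]
    return max(candidates, key=lambda r: r.tx_time, default=None)
```

In a Cassandra table the same lookup would typically depend on the clustering order of the time columns; that mapping is left out here because the original schema is not visible.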

Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-20 Thread DuyHai Doan
"Cassandra would take care of keeping the data synced between the two sets of five nodes. Is that correct?" Correct "But doing so means that we need 2x as many nodes as we need for the real-time cluster alone" Not necessarily. With multi DC you can configure the replication factor value per DC,
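The per-DC replication factor mentioned above is set on the keyspace through NetworkTopologyStrategy. The Python sketch below only builds the CQL string. The keyspace name, the data center names ("realtime", "analytics") and the replica counts are assumptions for illustration.

```python
def create_keyspace_cql(keyspace, replication_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per DC."""
    options = {"class": "NetworkTopologyStrategy", **replication_per_dc}
    # repr() quotes the strategy name and leaves integer replica counts bare.
    body = ", ".join(f"'{name}': {value!r}" for name, value in options.items())
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical layout: three replicas in the real-time DC, one in analytics.
    print(create_keyspace_cql("demo", {"realtime": 3, "analytics": 1}))
```

The data center names must match the names the cluster's snitch reports, otherwise the replicas will not be placed as intended.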

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-20 Thread Clint Kelly
BTW are the performance concerns with vnodes a big deal for Spark? Or were those more important for MapReduce? Some of the DataStax videos that I watched discussed how the Cassandra Spark connector has optimizations to deal with vnodes. I would imagine that Spark's ability to cache RDDs would me

Running Cassandra + Spark on AWS - architecture questions

2015-02-20 Thread Clint Kelly
Hi all, I read the DSE 4.6 documentation and I'm still not 100% sure what a mixed workload Cassandra + Spark installation would look like, especially on AWS. What I gather is that you use OpsCenter to set up the following: - One "virtual data center" for real-time processing (e.g., ingestion

Re: Moving from 2.1.x to 2.0.x

2015-02-20 Thread Tobias Hauth
Thanks, Tobias On Fri, Feb 20, 2015 at 11:28 AM, Robert Coli wrote: > On Fri, Feb 20, 2015 at 9:25 AM, Tobias Hauth > wrote: > >> Is there a recommended way of moving data from a 2.1.x cluster to a >> 2.0.x cluster? We would like to downgrade to a more stable version of C* >> and just adding n

Re: Logging client ID for YCSB workloads on Cassandra?

2015-02-20 Thread Jatin Ganhotra
Never mind, got it working. Thanks :) — Jatin Ganhotra Graduate Student, Computer Science University of Illinois at Urbana Champaign http://jatinganhotra.com http://linkedin.com/in/jatinganhotra On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra wrote: > Hi, > > I'd like to log the client ID for

Re: Moving from 2.1.x to 2.0.x

2015-02-20 Thread Robert Coli
On Fri, Feb 20, 2015 at 9:25 AM, Tobias Hauth wrote: > Is there a recommended way of moving data from a 2.1.x cluster to a 2.0.x > cluster? We would like to downgrade to a more stable version of C* and just > adding nodes with C* 2.0.12 results in schema mismatches and 'nodetool > describeclus

Re: PySpark and Cassandra integration

2015-02-20 Thread Jonathan Haddad
Awesome! On Fri Feb 20 2015 at 10:23:54 AM Marcelo Valle (BLOOMBERG/ LONDON) < mvallemil...@bloomberg.net> wrote: > I will try it for sure Frens, very nice! > Thanks for sharing! > > From: user@cassandra.apache.org > Subject: Re:PySpark and Cassandra integration > > Hi all, > > Wanted to let you

Re:PySpark and Cassandra integration

2015-02-20 Thread Marcelo Valle (BLOOMBERG/ LONDON)
I will try it for sure Frens, very nice! Thanks for sharing! From: user@cassandra.apache.org Subject: Re:PySpark and Cassandra integration Hi all, Wanted to let you know I've forked PySpark Cassandra on https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the original code didn't

PySpark and Cassandra integration

2015-02-20 Thread Rumph, Frens Jan
Hi all, Wanted to let you know I've forked PySpark Cassandra on https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the original code didn't work for me and I couldn't figure out how it could work. But it inspired! so I rewrote the majority of the project. The rewrite implements ful

Re: run cassandra on a small instance

2015-02-20 Thread Tim Dunphy
Hey guys, OK well I've experimented with this a bit, and I think at this point the problem with Cassandra crashing on the smaller instances is probably an issue with my data. Because what I've done is blown away my data directory to start fresh. And then started up Cassandra on the 2GB instance.

Moving from 2.1.x to 2.0.x

2015-02-20 Thread Tobias Hauth
Hi, Is there a recommended way of moving data from a 2.1.x cluster to a 2.0.x cluster? We would like to downgrade to a more stable version of C*, and just adding nodes with C* 2.0.12 results in schema mismatches and 'nodetool describecluster' reports different schema versions. Thanks, Tobias

Re: Node joining take a long time

2015-02-20 Thread 曹志富
My listen_address is the same as rpc_address, and rpc_interface is not configured. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/ 2015-02-20 23:16 GMT+08:00 Jan Kesten : > Hi, > > a short hint for those upgrading: If you upgrade to 2.1.3 - there

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-20 Thread Clint Kelly
Hi Mark, Thanks for your reply. That makes sense. I recall looking at this back when we were going to run Hadoop against data in Cassandra tables at my previous company. Disabling virtual nodes seems unfortunate as it would make (as I understand it) scaling the cluster a lot trickier. I assume

Re: Storing bi-temporal data in Cassandra

2015-02-20 Thread Raj N
Thanks for the response Peter. I used the temperature table because it's the most common example on CQL timeseries and I thought I would reuse it. From some of the responses, looks like I was wrong. event_time is the time the event happened. So yes it is valid time. I was trying to see if I can get

Re: Node joining take a long time

2015-02-20 Thread Jan Kesten
Hi, a short hint for those upgrading: If you upgrade to 2.1.3 - there is a bug in the config builder when rpc_interface is used. If you use rpc_address in your cassandra.yaml you will be fine - I ran into it this morning and filed an issue for it. https://issues.apache.org/jira/browse/CASSA
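Before upgrading, it can help to check which of the two options a node's configuration actually sets. The Python sketch below scans the text of a cassandra.yaml for an active rpc_interface entry. It is a plain line scan, not a YAML parser, and the function name is hypothetical.

```python
def uses_rpc_interface(yaml_text):
    """Return True if an uncommented, non-empty rpc_interface setting is present."""
    for line in yaml_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        # The trailing colon keeps rpc_interface_prefer_ipv6 from matching.
        if stripped.startswith("rpc_interface:"):
            if stripped.split(":", 1)[1].strip():
                return True
    return False


if __name__ == "__main__":
    sample = "rpc_address: 10.0.0.5\n# rpc_interface: eth1\n"
    print(uses_rpc_interface(sample))
```

A result of True means the node relies on rpc_interface and may be affected by the issue described above; a node that only sets rpc_address yields False.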

Re: Node joining take a long time

2015-02-20 Thread Michael Dykman
I believe the consensus is: upgrade to 2.1.3 On Fri, 20 Feb 2015 01:17 曹志富 wrote: > So, what can I do? Wait for 2.1.4 or upgrade to 2.1.3? > > -- > 曹志富 > Mobile: 18611121927 > Email: caozf.zh...@gmail.com > Weibo: http://weibo.com/boliza/ > > 2015-02-20 3:16 GMT+08:00

Re:designing table

2015-02-20 Thread Marcelo Valle (BLOOMBERG/ LONDON)
My two cents: You could partition your data per date and the second query would be easy. If you need to query ALL data for a client id, it would be hard though, but querying last 10 days for a client id could be easy, for instance. If you need to query ALL, it would probably be better to create another
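One way to read the date-partitioning suggestion is a partition key made of the client id and a calendar day, so that "the last 10 days for a client" becomes ten single-partition reads. The Python sketch below lists those keys. The composite key layout and the function name are assumptions, since the table under discussion is not shown in this archive.

```python
from datetime import date, timedelta


def recent_partitions(client_id, today, days=10):
    """List (client_id, day) partition keys for the last `days` days, newest first."""
    return [(client_id, today - timedelta(days=offset)) for offset in range(days)]


if __name__ == "__main__":
    for key in recent_partitions("client-42", date(2015, 2, 20)):
        print(key)
```

Each key would drive one query against the date-bucketed table. Reading all data for a client would need one query per day ever written, which matches the caveat in the message above.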

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-20 Thread Mark Reddy
Hey Clint, Someone from DataStax can correct me here, but I'm assuming that they have disabled vnodes because the AMI is built to make it easy to set up a pre-configured mixed workload cluster. A mixture of Real-Time/Transactional (Cassandra), Analytics (Hadoop), or Search (Solr). If you take a loo