BTW I was able to use this script:
https://github.com/joaquincasares/cassandralauncher
to get a cluster up and running pretty easily on AWS. Cheers to the author
for this.
Still curious to hear answers to my questions above, but it's not as urgent.
Best regards,
Clint
On Fri, Feb 20, 2015 at 5:36
>
> The most important things to note:
> - don't include JNA (it needs to lock pages larger than what will be
> available)
> - turn down threadpools for transports
> - turn compaction throughput way down
> - make concurrent reads and writes very small
> I have used the above to run a healthy 5 node cluster
I frequently test with multi-node vagrant-based clusters locally. The
following chef attributes should give you an idea of what to turn down in
cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
:cassandra => {'cluster_name' => 'VerifyCluster',
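The attribute hash is cut off above. As a rough sketch (not the author's actual attributes), it might continue along these lines, mapping the advice in the quoted list onto cassandra.yaml option names; the values are illustrative and the 'setup_jna' attribute name is hypothetical:

:cassandra => {'cluster_name' => 'VerifyCluster',
               'setup_jna' => false,                     # don't include JNA
               'rpc_min_threads' => 1,                   # turn down thread pools
               'rpc_max_threads' => 1,                   #   for transports
               'native_transport_max_threads' => 2,
               'compaction_throughput_mb_per_sec' => 1,  # compaction way down
               'concurrent_reads' => 2,                  # very small
               'concurrent_writes' => 2}                 # very small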
Hi all,
I am trying to follow the instructions here for installing DSE 4.6 on AWS:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMIOpsc.html
I was successful creating a single-node instance running OpsCenter, which I
intended to bootstrap creat
I think I get the basics of what you want to achieve. Side note: the sample
insert seems to have a typo for the transaction time.
For the first query, I would store the data using weatherstation_id as the
key. The CREATE TABLE statement might look like this:
CREATE TABLE weatherstation (
weathers
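The statement is cut off above. A minimal sketch of how the full statement might look; the column names and types are assumptions pieced together from the thread (event_time and temperature are mentioned in later replies):

CREATE TABLE weatherstation (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);

Clustering by event_time keeps each station's readings stored in time order, so range scans over a time window hit one partition.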
"Cassandra would take care of keeping the data synced between the two sets
of five nodes. Is that correct?"
Correct
"But doing so means that we need 2x as many nodes as we need for the
real-time cluster alone"
Not necessarily. With multi-DC you can configure the replication factor
value per DC,
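For example (keyspace and data center names assumed), a keyspace could keep a higher replication factor in the real-time DC and a lower one in the analytics DC:

CREATE KEYSPACE realtime WITH replication =
  {'class': 'NetworkTopologyStrategy', 'Cassandra': 3, 'Analytics': 2};

Here 'Cassandra' holds 3 replicas and 'Analytics' only 2, so the analytics DC does not need the full 2x node count.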
BTW are the performance concerns with vnodes a big deal for Spark? Or were
those more important for MapReduce? Some of the DataStax videos that I
watched discussed how the Cassandra Spark connector has optimizations to
deal with vnodes.
I would imagine that Spark's ability to cache RDDs would me
Hi all,
I read the DSE 4.6 documentation and I'm still not 100% sure what a mixed
workload Cassandra + Spark installation would look like, especially on
AWS. What I gather is that you use OpsCenter to set up the following:
- One "virtual data center" for real-time processing (e.g., ingestion
Thanks,
Tobias
On Fri, Feb 20, 2015 at 11:28 AM, Robert Coli wrote:
> On Fri, Feb 20, 2015 at 9:25 AM, Tobias Hauth
> wrote:
>
>> Is there a recommended way of moving data from a 2.1.x cluster to a
>> 2.0.x cluster? We would like to downgrade to a more stable version of C*
>> and just adding nodes with C* 2.0.12 results in schema mismatches and
>> 'nodetool describecluster' reports different schema versions.
Never mind, got it working.
Thanks :)
—
Jatin Ganhotra
Graduate Student, Computer Science
University of Illinois at Urbana Champaign
http://jatinganhotra.com
http://linkedin.com/in/jatinganhotra
On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra
wrote:
> Hi,
>
> I'd like to log the client ID for
On Fri, Feb 20, 2015 at 9:25 AM, Tobias Hauth
wrote:
> Is there a recommended way of moving data from a 2.1.x cluster to a 2.0.x
> cluster? We would like to downgrade to a more stable version of C* and just
> adding nodes with C* 2.0.12 results in schema mismatches and 'nodetool
> describecluster' reports different schema versions.
Awesome!
On Fri Feb 20 2015 at 10:23:54 AM Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:
> I will try it for sure Frens, very nice!
> Thanks for sharing!
>
> From: user@cassandra.apache.org
> Subject: Re: PySpark and Cassandra integration
>
> Hi all,
>
> Wanted to let you
I will try it for sure Frens, very nice!
Thanks for sharing!
From: user@cassandra.apache.org
Subject: Re: PySpark and Cassandra integration
Hi all,
Wanted to let you know I've forked PySpark Cassandra on
https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the original
code didn't
Hi all,
Wanted to let you know I've forked PySpark Cassandra on
https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the
original code didn't work for me and I couldn't figure out how it could
work. But it inspired me, so I rewrote the majority of the project.
The rewrite implements ful
Hey guys,
OK well I've experimented with this a bit, and I think at this point the
problem with Cassandra crashing on the smaller instances is probably an
issue with my data, because what I've done is blow away my data directory
to start fresh and then start up Cassandra on the 2GB instance.
Hi,
Is there a recommended way of moving data from a 2.1.x cluster to a 2.0.x
cluster? We would like to downgrade to a more stable version of C*, and just
adding nodes with C* 2.0.12 results in schema mismatches and 'nodetool
describecluster' reports different schema versions.
Thanks,
Tobias
My listen_address is the same as rpc_address, and rpc_interface is not configured.
--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/
2015-02-20 23:16 GMT+08:00 Jan Kesten :
> Hi,
>
> a short hint for those upgrading: If you upgrade to 2.1.3 - there
Hi Mark,
Thanks for your reply. That makes sense. I recall looking at this
back when we were going to run Hadoop against data in Cassandra tables
at my previous company.
Disabling virtual nodes seems unfortunate as it would make (as I
understand it) scaling the cluster a lot trickier. I assume
Thanks for the response, Peter. I used the temperature table because it's the
most common example of CQL time series and I thought I would reuse it. From
some of the responses, it looks like I was wrong.
event_time is the time the event happened, so yes, it is valid time. I was
trying to see if I can get
Hi,
a short hint for those upgrading: if you upgrade to 2.1.3, there is a
bug in the config builder when rpc_interface is used. If you use
rpc_address in your cassandra.yaml you will be fine. I ran into it this
morning and filed an issue for it:
https://issues.apache.org/jira/browse/CASSA
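In cassandra.yaml terms, the workaround looks roughly like this (the interface name and address value are examples, not from the original report):

# rpc_interface: eth0    <- triggers the 2.1.3 config builder bug
rpc_address: 10.0.0.5    # binding an explicit address works fine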
I believe the consensus is: upgrade to 2.1.3
On Fri, 20 Feb 2015 01:17 曹志富 wrote:
> So, what can I do? Wait for 2.1.4, or upgrade to 2.1.3?
>
> --
> 曹志富
> Mobile: 18611121927
> Email: caozf.zh...@gmail.com
> Weibo: http://weibo.com/boliza/
>
> 2015-02-20 3:16 GMT+08:00
My two cents:
You could partition your data per date, and the second query would be easy.
Querying ALL data for a client id would be hard, but querying the last 10
days for a client id, for instance, could be easy (see the sketch below).
If you need to query ALL, it would probably be better to create another
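A sketch of the per-date layout being described; all table and column names here are assumed for illustration:

CREATE TABLE events_by_client_day (
    client_id text,
    day text,              -- e.g. '2015-02-20'
    event_time timestamp,
    payload text,
    PRIMARY KEY ((client_id, day), event_time)
);

-- the last 10 days for a client = at most 10 single-partition queries:
SELECT * FROM events_by_client_day
 WHERE client_id = 'client1' AND day = '2015-02-20';

Putting the day in the partition key keeps partitions bounded, at the cost of making full-history queries fan out across many partitions.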
Hey Clint,
Someone from DataStax can correct me here, but I'm assuming that they have
disabled vnodes because the AMI is built to make it easy to set up a
pre-configured mixed workload cluster: a mixture of Real-Time/Transactional
(Cassandra), Analytics (Hadoop), or Search (Solr). If you take a loo