I am trying to follow the example given at
http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
to connect R with Cassandra. Here is my code:
library(RJDBC)
# Load in the Cassandra-JDBC driver (the jar directory below is a placeholder)
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
                list.files("/path/to/cassandra-jdbc/lib", pattern = "jar$", full.names = TRUE))
My intended Cassandra cluster will have 15 nodes per DC, with 2 DCs.
I am considering using all the nodes as seed nodes.
It looks like having all the nodes as seeds should actually reduce the Gossip
overhead (See "Gossiper implementation" in
http://wiki.apache.org/cassandra/ArchitectureGossip)
Is
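For reference, the seed list being discussed is set per node in cassandra.yaml. A minimal sketch with placeholder addresses (not the poster's actual config):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.1,10.0.1.2,10.0.2.1"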
Hi,
I just saw this:
http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/
Since the problem with EBS was the network, there is no chance that this
new hardware might be useful alongside Cassandra, right?
Alain
This document says:
- Provisioned IOPS (SSD) - Volumes of this type are ideal for the most
demanding I/O intensive, transactional workloads and large relational or
NoSQL databases. This volume type provides the most consistent performance
and allows you to provision the exac
Hi,
pretty sure we started out like that and hadn't seen any problems doing
that. On a side note, that config may become inconsistent anyway after
adding new nodes, because I think you'll need to restart all your
nodes if you add new seeds to the yaml file. (Though that's just an assumption.)
While they guarantee IOPS, they don't really make any guarantees about
latency. Since EBS goes over the network, there are so many things in the
path of getting at your data that I would be concerned about random latency
spikes, unless proven otherwise.
Thanks,
Daniel
On Wed, Jun 18, 2014 at 1:58 AM, A
Hi guys
Using 1.2.11, when I do a rolling restart of the cluster, any node I restart
makes the whole cluster's CPU load increase, reaching a "red" state in
OpsCenter (load from 3-4 to 20+). This happens once the node is back online.
The restarted node uses 100% CPU for 5-10 min and sometimes d
Hi,
I was wondering if there are any problems we may face if we use
completely fabricated values as the TIMESTAMP when doing INSERTs and
UPDATEs, because I can imagine a couple of cases where exploiting
column timestamps could simplify things.
Because Cassandra is LWW (last write win
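To illustrate the feature in question: CQL lets the client supply the write timestamp explicitly. A sketch against a made-up table (names and values are hypothetical):

cqlsh> INSERT INTO events (id, payload) VALUES (1, 'first') USING TIMESTAMP 1403083200000000;
cqlsh> UPDATE events USING TIMESTAMP 1403083200000001 SET payload = 'second' WHERE id = 1;

Under last-write-wins resolution, the cell with the higher timestamp survives regardless of arrival order.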
Hello
Have you checked the log file to see what's happening during startup?
What caused the rolling restart? Did you perform an upgrade or
change a config?
> On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ wrote:
>
> Hi guys
>
> Using 1.2.11, when I try to rolling restart the cluster, any nod
Hello,
What Artur is alluding to is that seed nodes do not bootstrap. Replacing
a seed node requires a slightly different approach than replacing a
non-seed node. See here for more details:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_see
Well then you'd better provide your schema and query, as I select ranges like
this all the time using CQL, and I (at least) must be misunderstanding your
problem from the description so far.
On Wed, Jun 18, 2014 at 2:54 AM, DuyHai Doan wrote:
> Hello Jason
>
> If you want to check for presence / absenc
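A typical range select of the kind referred to here, over a hypothetical schema where user_id is the partition key and event_time a clustering column:

cqlsh> SELECT * FROM timeline
       WHERE user_id = 'jason'
         AND event_time >= '2014-06-01'
         AND event_time < '2014-06-18';

Range predicates are allowed on a clustering column once the partition key is fixed by equality.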
There are several long ParNew pauses that were recorded during startup.
The young gen size looks large too, if I am reading that line correctly.
Did you happen to override the default settings for MAX_HEAP and/or NEW
size in cassandra-env.sh? The large young gen size, set via the env.sh
file,
Thanks a lot for taking the time to check the log.
We just switched from 400M to 1600M NEW size in cassandra-env.sh. It
reduced our latency and the ParNew GC time per second significantly...
(described here
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
)
E
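The settings being discussed live in cassandra-env.sh; the 1600M figure is from this thread, while the heap size below is only a placeholder (the script expects both values to be set together if either is overridden):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1600M"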
This last command was considered a best practice a few years ago; I hope
it is still the case. I just added the recent "nodetool disablebinary"
part...
2014-06-18 14:36 GMT+02:00 Alain RODRIGUEZ :
> Thanks a lot for taking the time to check the log.
>
> We just switched from 400M to 1600M NEW size
Wouldn't it be better to use "nodetool clearsnapshot"?
[]s
2014-06-14 17:38 GMT-03:00 S C :
> I am thinking of "rm " once the backup is complete. Any special
> cases to be careful about?
>
> -Kumar
> --
> Date: Sat, 14 Jun 2014 13:13:10 -0700
> Subject: Re: incremental b
I have a 10 node cluster with cassandra 2.0.8.
I am getting these exceptions in the log when I run my code. My code
just reads data from a CF and in some cases writes new data.
WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
BatchStatement.java (line 228) Batch of pr
I'm running on AWS m2.2xlarge instances using the ~800 gig
ephemeral/attached disk for my data directory. My data size per node is
nearing 400 gig.
Sometimes during maintenance operations (repairs mostly I think) I run out
of disk space as my understanding is that some of these operations require
One option is to add new nodes, and do a node repair/cleanup on everything.
That will at least reduce your per-node data size.
On Wed, Jun 18, 2014 at 11:01 AM, Brian Tarbox
wrote:
> I'm running on AWS m2.2xlarge instances using the ~800 gig
> ephemeral/attached disk for my data directory. My
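The suggested sequence, sketched; run on each pre-existing node after the new nodes have finished bootstrapping:

nodetool repair -pr
nodetool cleanup

cleanup rewrites SSTables to drop data for token ranges the node no longer owns, which is what actually frees the space.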
We do a repair -pr on each node once a week on a rolling basis.
Should we be running cleanup as well? My understanding is that it's only
used after adding/removing nodes?
We'd like to avoid adding nodes if possible (which might not be). Still
curious if we can get C* to do the maintenance task on a
AFAIK, when you run a repair a snapshot is created.
After the repair, I run "nodetool clearsnapshot" to save disk space.
Not sure whether that's your case or not.
[]s
2014-06-18 13:10 GMT-03:00 Brian Tarbox :
> We do a repair -pr on each node once a week on a rolling basis.
> Should we be running cleanup a
For snapshots, yes. For incremental backups you need to delete the files
yourself.
On Wed, Jun 18, 2014 at 6:28 AM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:
> Wouldn't be better to use "nodetool clearsnapshot"?
> []s
>
>
> 2014-06-14 17:38 GMT-03:00 S C :
>
> I am thinking of "
repair only creates snapshots if you use the “-snapshot” option.
On June 18, 2014 at 12:28:58 PM, Marcelo Elias Del Valle
(marc...@s1mbi0se.com.br) wrote:
AFAIK, when you run a repair a snapshot is created.
After the repair, I run "nodetool clearsnapshot" to save disk space.
Not sure it's you
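Putting the two points in this thread together, a sketch: a repair only produces snapshots when the -snapshot option is used (per the correction above), and clearsnapshot reclaims the space afterwards:

nodetool repair -pr -snapshot
nodetool clearsnapshot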
I have a column family that only stores the last 5 days worth of some
data...and yet I have files in the data directory for this CF that are 3
weeks old. They take the form:
keyspace-CFName-ic-<generation>-Filter.db
keyspace-CFName-ic-<generation>-Index.db
keyspace-CFName-ic-<generation>-Data.db
keyspace-CFName-ic-<generation>-
On Wed, Jun 18, 2014 at 9:10 AM, Brian Tarbox
wrote:
> We do a repair -pr on each node once a week on a rolling basis.
>
https://issues.apache.org/jira/browse/CASSANDRA-5850?focusedCommentId=14036057&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14036057
> Shoul
Another nice resource...
http://www.ecyrd.com/cassandracalculator/
On Tue, Jun 17, 2014 at 11:08 PM, Prabath Abeysekara <
prabathabeysek...@gmail.com> wrote:
> First of all, apologies if the $subject was discussed on this list
> before. I've already gone through quite a few email trails on this but
> still couldn't find a convincing answer which really
On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox
wrote:
> I have a column family that only stores the last 5 days worth of some
> data...and yet I have files in the data directory for this CF that are 3
> weeks old.
>
Are you using TTL? If so:
https://issues.apache.org/jira/browse/CASSANDRA-6654
On Wed, Jun 18, 2014 at 4:56 AM, Jonathan Lacefield wrote:
> What Artur is alluding to is that seed nodes do not bootstrap.
> Replacing seed nodes requires a slightly different approach for node
> replacement compared to non seed nodes. See here for more details:
> http://www.datastax.com/doc
Rob,
Thank you! We are not using TTL; we're manually deleting data more than 5
days old for this CF. We're running 1.2.13 and are using size-tiered
compaction (this CF is append-only, i.e. zero updates).
Sounds like we can get away with doing a (stop, delete old-data-file,
restart) process on a r
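A sketch of the process being contemplated, assuming default 1.2 data paths; <gen> stands for the generation number in the file names, and this presumes you have verified the SSTable holds only expired data:

nodetool drain && service cassandra stop
rm /var/lib/cassandra/data/keyspace/CFName/keyspace-CFName-ic-<gen>-*.db
service cassandra start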
On Wed, Jun 18, 2014 at 5:36 AM, Alain RODRIGUEZ wrote:
> We stop the node using : nodetool disablegossip && nodetool disablethrift
> && nodetool disablebinary && sleep 10 && nodetool drain && sleep 30 &&
> service cassandra stop
>
The stuff before "nodetool drain" here is redundant and doesn't
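Under that advice, the stop sequence reduces to something like:

nodetool drain && service cassandra stop

since drain already flushes memtables and stops the node from accepting new client writes.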
On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox
wrote:
> Thank you! We are not using TTL; we're manually deleting data more than
> 5 days old for this CF. We're running 1.2.13 and are using size-tiered
> compaction (this CF is append-only, i.e. zero updates).
>
> Sounds like we can get away with
I don't think I have the space to run a major compaction right now (I'm
already above 50% disk space used), and I think compaction can require
extra space?
On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli wrote:
> On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox
> wrote:
>
>> Thank you! We are not usi
Amen. I believe the whole seed node/bootstrapping confusion goes against
the "Why Cassandra", quoted from
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra
*Operational simplicity* – with all nodes in a cluster being the same,
there is no complex configu
hey all,
I know that something pretty basic must be wrong here. But what is the
mistake I'm making in creating this keyspace?
cqlsh> create keyspace animals with replication = { 'class':
'NetworkTopologyStrategy', 'replication_factor' : 3};
Bad Request: Error constructing replication strategy cla
Is "replication_factor" your DC name?
Here is what I would use:
CREATE KEYSPACE IF NOT EXISTS animals
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
'DC1' : 3 };
But in my case, I am using GossipingPropertyFileSnitch and DC1 is
configured there, so Cassandra knows which nodes are i
Hey that helped! Just to satisfy your curiosity, here's my
snitch: endpoint_snitch: SimpleSnitch
thanks!
On Wed, Jun 18, 2014 at 11:03 PM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:
>
> Is "replication_factor" your DC name?
>
> Here is what I would use:
>
>
> CREATE KEYSPACE IF NO
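For the archive: 'replication_factor' is only a valid option for SimpleStrategy; NetworkTopologyStrategy expects datacenter names as keys. With SimpleSnitch, which reports every node as being in 'datacenter1', either of these should work:

cqlsh> CREATE KEYSPACE animals WITH replication =
       { 'class': 'SimpleStrategy', 'replication_factor': 3 };
cqlsh> CREATE KEYSPACE animals WITH replication =
       { 'class': 'NetworkTopologyStrategy', 'datacenter1': 3 };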
+Cassandra DL
We have Cassandra nodes in three datacenters (dc1, dc2, and dc3) and the
cluster name is DataCluster. Our application code runs in the same three
datacenters and accesses Cassandra.
Now I want to make sure that if an application call is coming from `dc1`
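The usual approach (not from a reply in this thread): pin each application instance to its local datacenter with a DC-aware load-balancing policy in the driver (e.g. the Java driver's DCAwareRoundRobinPolicy) and use LOCAL_* consistency levels so requests are served by the local DC, for example:

cqlsh> CONSISTENCY LOCAL_QUORUM;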
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html
From the link:
EBS volumes are not recommended for Cassandra data volumes for the following
reasons:
• EBS volumes contend directly for network throughput with standard
packets. Th