unable to connect R with Cassandra using JDBC, need help

2014-06-18 Thread Osman Khalid
I am trying to follow the example at http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive to connect R with Cassandra. Following is my code: library(RJDBC) # Load the Cassandra JDBC driver cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", list.
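
For reference, a minimal sketch of the plain-JDBC connection that the RJDBC call above wraps, written against the same org.apache.cassandra.cql.jdbc.CassandraDriver. The host, port, keyspace and table names are placeholders, and the jdbc:cassandra://host:port/keyspace URL format should be checked against the driver version in use:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CassandraJdbcExample {
        public static void main(String[] args) throws Exception {
            // Register the Cassandra JDBC driver (same class the RJDBC call loads).
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            // Hypothetical host/port/keyspace; 9160 is the old Thrift port this driver talks to.
            Connection conn = DriverManager.getConnection(
                "jdbc:cassandra://localhost:9160/mykeyspace");
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM mytable LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
            conn.close();
        }
    }

If a connection like this works outside R, the remaining problem is usually getting the driver jar and its dependencies onto RJDBC's classpath.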

Configuring all nodes as seeds

2014-06-18 Thread Peer, Oded
My intended Cassandra cluster will have 15 nodes per DC, with 2 DCs. I am considering using all the nodes as seed nodes. It looks like having all the nodes as seeds should actually reduce the Gossip overhead (See "Gossiper implementation" in http://wiki.apache.org/cassandra/ArchitectureGossip) Is

EBS SSD <-> Cassandra?

2014-06-18 Thread Alain RODRIGUEZ
Hi, I just saw this: http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/ Since the problem with EBS was the network, there is no chance that this hardware architecture might be useful alongside Cassandra, right? Alain

Re: EBS SSD <-> Cassandra?

2014-06-18 Thread Alain RODRIGUEZ
In this document it is said : - Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding I/O intensive, transactional workloads and large relational or NoSQL databases. This volume type provides the most consistent performance and allows you to provision the exac

Re: Configuring all nodes as seeds

2014-06-18 Thread Artur Kronenberg
Hi, pretty sure we started out like that and haven't seen any problems doing it. On a side note, that config may become inconsistent anyway after adding new nodes, because I think you'll need a restart of all your nodes if you add new seeds to the yaml file. (Though that's just an assumption.)

Re: EBS SSD <-> Cassandra?

2014-06-18 Thread Daniel Chia
While they guarantee IOPS, they don't really make any guarantees about latency. Since EBS goes over the network, there are so many things in the path of getting at your data that I would be concerned about random latency spikes, unless proven otherwise. Thanks, Daniel On Wed, Jun 18, 2014 at 1:58 AM, A

restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Alain RODRIGUEZ
Hi guys, Using 1.2.11, when I try to do a rolling restart of the cluster, any node I restart makes the whole cluster's CPU load increase, reaching a "red" state in OpsCenter (load going from 3-4 to 20+). This happens once the node is back online. The restarted node uses 100% CPU for 5-10 min and sometimes d

Using fabricated values as timestamps in inserts and updates

2014-06-18 Thread Ondřej Nešpor
Hi, I was wondering if there are any possible problems we may face if we use completely fabricated values as the TIMESTAMP when doing INSERTs and UPDATEs. I can imagine a couple of examples where exploiting column timestamps could simplify things. Because Cassandra is LWW (last write win
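
As background for the question: CQL lets a client supply an explicit write timestamp with USING TIMESTAMP, which is exactly the "fabricated value" being described. A minimal sketch with the DataStax Java driver, against a hypothetical keyspace and table, showing how last-write-wins resolves on that value:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ExplicitTimestampExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace");   // hypothetical keyspace

            // Write with a fabricated timestamp (by convention microseconds since epoch).
            session.execute(
                "INSERT INTO events (id, payload) VALUES (1, 'v1') USING TIMESTAMP 1000");
            // A later write with a *smaller* timestamp loses under last-write-wins...
            session.execute(
                "UPDATE events USING TIMESTAMP 500 SET payload = 'v2' WHERE id = 1");
            // ...so SELECT payload FROM events WHERE id = 1 still returns 'v1'.

            cluster.close();
        }
    }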

Re: restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Jonathan Lacefield
Hello Have you checked the log file to see what's happening during startup? What caused the rolling restart? Did you perform an upgrade or change a config? > On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ wrote: > > Hi guys > > Using 1.2.11, when I try to rolling restart the cluster, any nod

Re: Configuring all nodes as seeds

2014-06-18 Thread Jonathan Lacefield
Hello, What Artur is alluding to is that seed nodes do not bootstrap. Replacing seed nodes requires a slightly different approach for node replacement compared to non seed nodes. See here for more details: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_see

Re: Summarizing Timestamp datatype

2014-06-18 Thread Laing, Michael
Well, then you'd better provide your schema and query, as I select ranges like this all the time using CQL, so I (at least) must be misunderstanding your problem from the description so far. On Wed, Jun 18, 2014 at 2:54 AM, DuyHai Doan wrote: > Hello Jason > > If you want to check for presence / absenc
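
To make the kind of range query Michael is referring to concrete, here is a small sketch against a hypothetical table with a timestamp clustering column (keyspace, table and values are made up for illustration), using the DataStax Java driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TimestampRangeExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace");   // hypothetical keyspace

            // IF NOT EXISTS needs Cassandra 2.0+; drop it on older versions.
            session.execute("CREATE TABLE IF NOT EXISTS readings ("
                + " sensor_id int, ts timestamp, value double,"
                + " PRIMARY KEY (sensor_id, ts))");

            // Slice one partition by its clustering timestamp with >= / <.
            ResultSet rs = session.execute(
                "SELECT ts, value FROM readings"
                + " WHERE sensor_id = 42"
                + " AND ts >= '2014-06-01' AND ts < '2014-06-18'");
            for (Row row : rs) {
                // Driver 2.x maps CQL timestamp to java.util.Date via getDate().
                System.out.println(row.getDate("ts") + " " + row.getDouble("value"));
            }
            cluster.close();
        }
    }

Within a single partition, slicing on the clustering timestamp like this is an ordinary, efficient CQL query.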

Re: restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Jonathan Lacefield
There are several long ParNew pauses recorded during startup. The young gen size looks large too, if I am reading that line correctly. Did you happen to override the default settings for MAX_HEAP and/or NEW size in cassandra-env.sh? The large young gen size, set via the env.sh file,

Re: restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Alain RODRIGUEZ
Thanks a lot for taking the time to check the log. We just switched the NEW size from 400M to 1600M in cassandra-env.sh. It reduced our latency and the ParNew GC time per second significantly... (described here http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads ) E

Re: restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Alain RODRIGUEZ
This last command was considered a best practice a few years ago; I hope it is still the case. I just added the recent "nodetool disablebinary" part... 2014-06-18 14:36 GMT+02:00 Alain RODRIGUEZ : > Thanks a lot for taking time to check the log. > > We just switched from 400M to 1600M NEW size

Re: incremental backups

2014-06-18 Thread Marcelo Elias Del Valle
Wouldn't it be better to use "nodetool clearsnapshot"? []s 2014-06-14 17:38 GMT-03:00 S C : > I am thinking of "rm " once the backup is complete. Any special > cases to be careful about? > > -Kumar > -- > Date: Sat, 14 Jun 2014 13:13:10 -0700 > Subject: Re: incremental b

Batch of prepared statements exceeding specified threshold

2014-06-18 Thread Marcelo Elias Del Valle
I have a 10-node cluster with Cassandra 2.0.8. I am getting these exceptions in the log when I run my code. My code just reads data from a CF and in some cases writes new data. WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of pr
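
For context, this WARN (as far as I know controlled by batch_size_warn_threshold_in_kb in cassandra.yaml) usually means a single batch carries a large payload of unrelated rows. One common way to avoid it is to prepare the statement once and execute each row individually, as in this sketch (keyspace, table and columns are placeholders):

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class UnbatchedWrites {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace");   // hypothetical keyspace

            // Prepare once, then bind and execute per row instead of one huge batch.
            PreparedStatement insert =
                session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)");

            for (int i = 0; i < 1000; i++) {
                BoundStatement bound = insert.bind(i, "payload-" + i);
                // Fire-and-forget here for brevity; real code should collect and check the futures.
                session.executeAsync(bound);
            }
            cluster.close();
        }
    }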

running out of diskspace during maintenance tasks

2014-06-18 Thread Brian Tarbox
I'm running on AWS m2.2xlarge instances using the ~800 gig ephemeral/attached disk for my data directory. My data size per node is nearing 400 gig. Sometimes during maintenance operations (mostly repairs, I think) I run out of disk space, as my understanding is that some of these operations require

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Jeremy Jongsma
One option is to add new nodes, and do a node repair/cleanup on everything. That will at least reduce your per-node data size. On Wed, Jun 18, 2014 at 11:01 AM, Brian Tarbox wrote: > I'm running on AWS m2.2xlarge instances using the ~800 gig > ephemeral/attached disk for my data directory. My

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Brian Tarbox
We do a repair -pr on each node once a week on a rolling basis. Should we be running cleanup as well? My understanding is that cleanup is only needed after adding/removing nodes? We'd like to avoid adding nodes if possible (which might not be possible). Still curious whether we can get C* to do the maintenance task on a

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Marcelo Elias Del Valle
AFAIK, when you run a repair a snapshot is created. After the repair, I run "nodetool clearsnapshot" to save disk space. Not sure if that's your case or not. []s 2014-06-18 13:10 GMT-03:00 Brian Tarbox : > We do a repair -pr on each node once a week on a rolling basis. > Should we be running cleanup a

Re: incremental backups

2014-06-18 Thread Peter Sanford
For snapshots, yes. For incremental backups you need to delete the files yourself. On Wed, Jun 18, 2014 at 6:28 AM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > Wouldn't be better to use "nodetool clearsnapshot"? > []s > > > 2014-06-14 17:38 GMT-03:00 S C : > > I am thinking of "

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Russell Bradberry
repair only creates snapshots if you use the “-snapshot” option. On June 18, 2014 at 12:28:58 PM, Marcelo Elias Del Valle (marc...@s1mbi0se.com.br) wrote: AFAIK, when you run a repair a snapshot is created. After the repair, I run "nodetool clearsnapshot" to save disk space. Not sure it's you

can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
I have a column family that only stores the last 5 days' worth of some data... and yet I have files in the data directory for this CF that are 3 weeks old. They take the form: keyspace-CFName-ic--Filter.db keyspace-CFName-ic--Index.db keyspace-CFName-ic--Data.db keyspace-CFName-ic--

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 9:10 AM, Brian Tarbox wrote: > We do a repair -pr on each node once a week on a rolling basis. > https://issues.apache.org/jira/browse/CASSANDRA-5850?focusedCommentId=14036057&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14036057 > Shoul

Re: Minimum Cluster size to accommodate a single node failure

2014-06-18 Thread Ken Hancock
Another nice resource... http://www.ecyrd.com/cassandracalculator/

Re: Minimum Cluster size to accommodate a single node failure

2014-06-18 Thread Robert Coli
On Tue, Jun 17, 2014 at 11:08 PM, Prabath Abeysekara < prabathabeysek...@gmail.com> wrote: > First of all, apologies if the $subject was discussed previously in this > list before. I've already gone through quite a few email trails on this but > still couldn't find a convincing answer which really

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox wrote: > I have a column family that only stores the last 5 days worth of some > data...and yet I have files in the data directory for this CF that are 3 > weeks old. > Are you using TTL? If so : https://issues.apache.org/jira/browse/CASSANDRA-6654

Re: Configuring all nodes as seeds

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 4:56 AM, Jonathan Lacefield wrote: > What Artur is alluding to is that seed nodes do not bootstrap. > Replacing seed nodes requires a slightly different approach for node > replacement compared to non seed nodes. See here for more details: > http://www.datastax.com/doc

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
Rob, Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size-tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a r

Re: restarting node makes cpu load of the entire cluster rise

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 5:36 AM, Alain RODRIGUEZ wrote: > We stop the node using : nodetool disablegossip && nodetool disablethrift > && nodetool disablebinary && sleep 10 && nodetool drain && sleep 30 && > service cassandra stop > The stuff before "nodetool drain" here is redundant and doesn't

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox wrote: > Thank you! We are not using TTL, we're manually deleting data more than > 5 days old for this CF. We're running 1.2.13 and are using size tiered > compaction (this cf is append-only i.e.zero updates). > > Sounds like we can get away with

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
I don't think I have the space to run a major compaction right now (I'm above 50% disk space used already) and compaction can take extra space I think? On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli wrote: > On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox > wrote: > >> Thank you! We are not usi

Re: Configuring all nodes as seeds

2014-06-18 Thread Ken Hancock
Amen. I believe the whole seed node/bootstrapping confusion goes against the "Why Cassandra", quoted from http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra *Operational simplicity* – with all nodes in a cluster being the same, there is no complex configu

error creating keyspace in cqlsh

2014-06-18 Thread Tim Dunphy
hey all, I know that something pretty basic must be wrong here. But what is the mistake I'm making in creating this keyspace? cqlsh> create keyspace animals with replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3}; Bad Request: Error constructing replication strategy cla

Re: error creating keyspace in cqlsh

2014-06-18 Thread Marcelo Elias Del Valle
Is "replication_factor" your DC name? Here is what I would using: CREATE KEYSPACE IF NOT EXISTS animals WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 }; But in my case, I am using GossipPropertyFileSnitch and DC1 is configured there, so Cassandra knows which nodes are i

Re: error creating keyspace in cqlsh

2014-06-18 Thread Tim Dunphy
Hey that helped! Just to quell your curiosity here's my snitch: endpoint_snitch: SimpleSnitch thanks! On Wed, Jun 18, 2014 at 11:03 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > > Is "replication_factor" your DC name? > > Here is what I would using: > > > CREATE KEYSPACE IF NO

Re: Exception with java driver

2014-06-18 Thread Shaheen Afroz
+Cassandra DL We have Cassandra nodes in three datacenters - dc1, dc2 and dc3 - and the cluster name is DataCluster. Our application code is also deployed in the same three datacenters and accesses Cassandra. Now I want to make sure that if an application call is coming from `dc1`
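
One likely-relevant piece (offered as a sketch, not necessarily the poster's exact setup) is the driver's datacenter-aware load balancing: the DataStax Java driver can be pinned to a local DC so the application tier in dc1 prefers Cassandra nodes in dc1. Contact points and the DC name below are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class Dc1PinnedClient {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.1.10")                 // a node in dc1 (placeholder address)
                .withLoadBalancingPolicy(
                    new TokenAwarePolicy(new DCAwareRoundRobinPolicy("dc1")))
                .build();
            Session session = cluster.connect();
            // ... queries issued on this session are routed to dc1 replicas first ...
            cluster.close();
        }
    }

Combined with a LOCAL_* consistency level (e.g. LOCAL_QUORUM), this keeps both routing and consistency decisions within the local datacenter.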

Re: EBS SSD <-> Cassandra?

2014-06-18 Thread Ben Bromhead
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html From the link: EBS volumes are not recommended for Cassandra data volumes for the following reasons: • EBS volumes contend directly for network throughput with standard packets. Th