Cassandra unit tests broke after upgrading from Cassandra 2.0.9 to Cassandra 2.2.4

2016-05-16 Thread Agrawal, Pratik
Hello all, In Cassandra 2.0.9 we used to unit test Cassandra Client side code by starting a local Cassandra server through code. We used to do the following: 1.) Initialize Yaml file by providing YAML file path: System.setProperty("cassandra.config", "file://" + TEMP_YAML_LOCATION); 2.) Initial

FW: Cassandra unit tests broke after upgrading from Cassandra 2.0.9 to Cassandra 2.2.4

2016-05-16 Thread Agrawal, Pratik
Hello all, In Cassandra 2.0.9 we used to unit test Cassandra Client side code by starting a local Cassandra server through code. We used to do the following: 1.) Initialize Yaml file by providing YAML file path: System.setProperty("cassandra.config", "file://" + TEMP_YAML_LOCATION); 2.) Initial

Cassandra Debian repos (Apache vs DataStax)

2016-05-16 Thread Drew Kutcharian
Hi, What’s the difference between the two “Community” repositories Apache (http://www.apache.org/dist/cassandra/debian) and DataStax (http://debian.datastax.com/community/)? If they are just mirrors, then it seems like the DataStax one is a bit behind (version 3.0.6 is available on Apache but

(Full) compaction does not delete (all) old files

2016-05-16 Thread Dongfeng Lu
Looks to me that Cassandra knows that the first 115 files are old and are not really used to create snapshot. Here is the newly created snapshot. -bash-4.1$ ls -ltr snapshots/20160516-1800/*-Data.db -rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18 snapshots/20160516-1800/keyspace-event_inde

Repair schedules for new clusters

2016-05-16 Thread Ashic Mahtab
Hi All, My previous Cassandra clusters had moderate loads, and I'd simply schedule full repairs at different times in the week (but on the same day). That seemed to work ok, but was redundant. In my current project, I'm going to need to care about repair times a lot more, and was wondering what w

RE: Data platform support

2016-05-16 Thread Ashic Mahtab
If Spark workers are installed on the same nodes as Cassandra nodes, then they can take advantage of data locality, greatly reducing the amount of network IO in Spark jobs. If you use a separate / Cloudera / Hortonworks / EMR cluster, you won't be able to benefit from this. Other than the locali