Cassandra counters

2015-07-08 Thread Ajay
Hi,

What is the accuracy improvement of counters in 2.1 over 2.0?

The post below mentions the 2.0.x issues fixed in 2.1 and the performance
improvement.
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

But how accurate are counters in 2.1.x, and are there any known issues in 2.1
when using an UNLOGGED batch for counter updates?

Thanks
Ajay


Re: Cassandra counters

2015-07-10 Thread Ajay
Any pointers on this?

In 2.1, is updating a counter in an UNLOGGED batch with a client-supplied
timestamp as safe as other column updates at a given consistency level (i.e.,
can a counter update be made idempotent with a timestamp?)?

Thanks
Ajay
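
For reference, a minimal sketch of a counter update issued in a batch with the
DataStax Java driver 2.x. The keyspace and the page_views table are hypothetical;
the sketch only illustrates the mechanics: counter mutations go into a COUNTER
batch, and since counter increments are not idempotent, a client-supplied
timestamp does not make retries safe.

// Sketch only: hypothetical keyspace "test" and table
// page_views(page text PRIMARY KEY, views counter).
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class CounterBatchSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // All statements in a COUNTER batch must be counter updates.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.COUNTER);
        batch.add(new SimpleStatement(
                "UPDATE page_views SET views = views + 1 WHERE page = 'home'"));
        batch.add(new SimpleStatement(
                "UPDATE page_views SET views = views + 1 WHERE page = 'search'"));
        session.execute(batch);

        cluster.close();
    }
}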

On 09-Jul-2015 11:47 am, "Ajay"  wrote:
>
> Hi,
>
> What is the accuracy improvement of counters in 2.1 over 2.0?
>
> The post below mentions the 2.0.x issues fixed in 2.1 and the performance
improvement.
>
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> But how accurate are counters in 2.1.x, and are there any known issues in 2.1 using
an UNLOGGED batch for counter updates with a timestamp?
>
> Thanks
> Ajay


Re: Can't connect to Cassandra server

2015-07-19 Thread Ajay
Try with the correct IP address as below:

cqlsh 192.248.15.219 -u sinmin -p xx

CQL documentation -
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/cqlsh.html
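
One thing worth double-checking in the cassandra.yaml quoted below: YAML needs a
space after the colon, so an entry written as listen_address:192.248.15.219 may
not take effect, leaving the node on its defaults. A minimal sketch of the
relevant settings, using the address from the message below and assuming
defaults otherwise:

listen_address: 192.248.15.219
rpc_address: 192.248.15.219
# cqlsh and the drivers use the native transport on port 9042
start_native_transport: true
native_transport_port: 9042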

On Sun, Jul 19, 2015 at 2:00 PM, Chamila Wijayarathna <
cdwijayarat...@gmail.com> wrote:

> Hello all,
>
> After starting cassandra, I tried to connect to cassandra from cqlsh and
> java, but it fails to do so.
>
> Following is the error I get while trying to connect to cqlsh.
>
> cqlsh -u sinmin -p xx
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
> Connection refused")})
>
> I have set listen_address and rpc_address in cassandra.yaml to the ip
> address of server address like follows.
>
> listen_address:192.248.15.219
> rpc_address:192.248.15.219
>
> Following is what I found from cassandra system.log.
>
> https://gist.githubusercontent.com/cdwijayarathna/a14586a9e39a943f89a0/raw/system%20log
>
> Following is the netstat result I got.
>
> maduranga@ubuntu:/var/log/cassandra$ netstat
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address   Foreign Address State
>
> tcp0  0 ubuntu:ssh  103.21.166.35:54417
> ESTABLISHED
> tcp0  0 ubuntu:1522 ubuntu:30820
>  ESTABLISHED
> tcp0  0 ubuntu:30820ubuntu:1522
> ESTABLISHED
> tcp0256 ubuntu:ssh  175.157.41.209:42435
>  ESTABLISHED
> Active UNIX domain sockets (w/o servers)
> Proto RefCnt Flags   Type   State I-Node   Path
> unix  9  [ ] DGRAM7936 /dev/log
> unix  3  [ ] STREAM CONNECTED 11737
> unix  3  [ ] STREAM CONNECTED 11736
> unix  3  [ ] STREAM CONNECTED 10949
>  /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 10948
> unix  2  [ ] DGRAM10947
> unix  2  [ ] STREAM CONNECTED 10801
> unix  3  [ ] STREAM CONNECTED 10641
> unix  3  [ ] STREAM CONNECTED 10640
> unix  3  [ ] STREAM CONNECTED 10444
>  /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 10443
> unix  3  [ ] STREAM CONNECTED 10437
>  /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 10436
> unix  3  [ ] STREAM CONNECTED 10430
>  /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 10429
> unix  2  [ ] DGRAM10424
> unix  3  [ ] STREAM CONNECTED 10422
>  /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 10421
> unix  2  [ ] DGRAM10420
> unix  2  [ ] STREAM CONNECTED 10215
> unix  2  [ ] STREAM CONNECTED 10296
> unix  2  [ ] STREAM CONNECTED 9988
> unix  2  [ ] DGRAM9520
> unix  3  [ ] STREAM CONNECTED 8769
> /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 8768
> unix  2  [ ] DGRAM8753
> unix  2  [ ] DGRAM9422
> unix  3  [ ] STREAM CONNECTED 7000
> @/com/ubuntu/upstart
> unix  3  [ ] STREAM CONNECTED 8485
> unix  2  [ ] DGRAM7947
> unix  3  [ ] STREAM CONNECTED 6712
> /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 6711
> unix  3  [ ] STREAM CONNECTED 7760
> /var/run/dbus/system_bus_socket
> unix  3  [ ] STREAM CONNECTED 7759
> unix  3  [ ] STREAM CONNECTED 7754
> unix  3  [ ] STREAM CONNECTED 7753
> unix  3  [ ] DGRAM7661
> unix  3  [ ] DGRAM7660
> unix  3  [ ] STREAM CONNECTED 6490
> @/com/ubuntu/upstart
> unix  3  [ ] STREAM CONNECTED 6475
>
> What is the issue here? Why can't I connect to the Cassandra server? How can I
> fix this?
>
> Thank You!
>
> --
> *Chamila Dilshan Wijayarathna,*
> Software Engineer
> Mobile:(+94)788193620
> WSO2 Inc., http://wso2.com/
>
>


Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Ajay
Hi,

To test Spark SQL Vs CQL performance on Cassandra, I did the following:

1) Cassandra standalone server (1 server in a cluster)
2) Spark Master and 1 Worker
Both running in a Thinkpad laptop with 4 cores and 8GB RAM.
3) Written Spark SQL code using Cassandra-Spark Driver from Cassandra
(JavaApiDemo.java. Run with spark://127.0.0.1:7077 127.0.0.1)
4) Written CQL code using the Java driver from Cassandra
(CassandraJavaApiDemo.java)
In both cases, I create 1 million rows and query for one row.

Observation:
1) It takes less than 10 milliseconds using CQL (SELECT * FROM users WHERE
name='Anna')
2) It takes around 0.6 seconds using Spark (either SELECT * FROM users WHERE
name='Anna' or javaFunctions(sc).cassandraTable("test", "people",
mapRowTo(Person.class)).where("name=?", "Anna"))

Please let me know if I am missing something in the Spark configuration or the
Cassandra-Spark driver.

Thanks
Ajay Garga
package com.datastax.demo;

import java.text.SimpleDateFormat;
import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class CassandraJavaApiDemo {
    private static SimpleDateFormat format = new SimpleDateFormat("HH:mm:ss.SSS");

    public static void main(String[] args) {
        Cluster cluster = null;
        Session session = null;

        try {
            cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            session = cluster.connect();

            // Create the schema and a secondary index on the queried column.
            session.execute("DROP KEYSPACE IF EXISTS test2");
            session.execute("CREATE KEYSPACE test2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE test2.users (id INT, name TEXT, birth_date TIMESTAMP, PRIMARY KEY (id))");
            session.execute("CREATE INDEX people_name_idx2 ON test2.users(name)");

            // Insert sample rows.
            session = cluster.connect("test2");
            Statement insert = null;
            for (int i = 0; i < 100; i++) {
                insert = QueryBuilder.insertInto("users").value("id", i)
                        .value("name", "Anna" + i)
                        .value("birth_date", new Date());
                session.execute(insert);
            }

            // Time a SELECT on the indexed column and print its query trace.
            long start = System.currentTimeMillis();
            Statement scan = new SimpleStatement("SELECT * FROM users WHERE name='Anna0';");
            scan.enableTracing();
            ResultSet results = session.execute(scan);
            for (Row row : results) {
                System.out.format("%d %s\n", row.getInt("id"), row.getString("name"));
            }
            long end = System.currentTimeMillis();
            System.out.println(" Time Taken " + (end - start));

            ExecutionInfo executionInfo = results.getExecutionInfo();
            QueryTrace queryTrace = executionInfo.getQueryTrace();
            System.out.printf("%-38s | %-12s | %-10s | %-12s\n", "activity",
                    "timestamp", "source", "source_elapsed");
            System.out.println("---+--++--");
            for (QueryTrace.Event event : queryTrace.getEvents()) {
                System.out.printf("%38s | %12s | %10s | %12s\n",
                        event.getDescription(),
                        millis2Date(event.getTimestamp()), event.getSource(),
                        event.getSourceElapsedMicros());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (session != null) {
                session.close();
            }
            if (cluster != null) {
                cluster.close();
            }
        }
    }

    private static Object millis2Date(long timestamp) {
        return format.format(timestamp);
    }
}
package com.datastax.spark.connector.demo;

import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;
import com.datastax.spark.connector.japi.CassandraRow;
import com.google.common.base.Objects;

import org.apache.hadoop.util.StringUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.SchemaRDD;
import org.apache.spark.sql.cassandra.CassandraSQLContext;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

/**
 * This Spark application demonstrates how to use Spark Cassandra Connector with
 * Java.
 *
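
The Spark-side file above is truncated. For context, a minimal sketch of the
query path the message describes, using only the connector calls quoted in it
(javaFunctions / cassandraTable / mapRowTo); the Person bean and the addresses
are placeholders.

// Sketch only: assumes the spark-cassandra-connector Java API and the
// test.people table from the message; Person is a placeholder bean.
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

import java.io.Serializable;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkCqlQuerySketch {
    public static class Person implements Serializable {
        private Integer id;
        private String name;
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SparkCqlQuerySketch")
                .setMaster("spark://127.0.0.1:7077")
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);

        long start = System.currentTimeMillis();
        // Server-side filter pushed down via where(); this is the call quoted above.
        List<Person> rows = javaFunctions(sc)
                .cassandraTable("test", "people", mapRowTo(Person.class))
                .where("name=?", "Anna")
                .collect();
        System.out.println(rows.size() + " rows in "
                + (System.currentTimeMillis() - start) + " ms");

        sc.stop();
    }
}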

Cassandra for Analytics?

2014-12-17 Thread Ajay
Hi,

Can Cassandra be used for, or is it a good fit for, real-time analytics? I went
through a couple of benchmarks comparing Cassandra and HBase (most of them done
about three years ago); they mentioned that Cassandra is designed for
write-intensive workloads and has higher read latency than HBase. In our case
we will have both writes and reads, with reads dominating (roughly 40% writes
and 60% reads). We are planning to use Spark as the in-memory computation
engine.

Thanks
Ajay


Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Thanks Ryan and Peter for the suggestions.

Our requirement (we are an e-commerce company), at a high level, is to build a
data warehouse as a platform or service (for different product teams to
consume), layered as below:

Datawarehouse as a platform/service
 |
Spark SQL
 |
Spark in-memory computation engine (We were considering Drill/Flink, but
Spark is more mature and in production)
 |
Cassandra/HBase (Yet to be decided. Aggregated views + data
directly written to this. So 40%-50% writes, 50%-60% reads)
 |
Stream processing (Spark Streaming or Storm. Yet to be decided.
Spark Streaming is relatively new)
 |
MySQL/Mongo/real-time data

Since we are planning to build it as a service, we cannot assume a particular
data access pattern.

Thanks
Ajay


On Thu, Dec 18, 2014 at 7:00 PM, Peter Lin  wrote:
>
>
> for the record I think spark is good and I'm glad we have options.
>
> my point wasn't to bad mouth spark. I'm not comparing spark to storm at
> all, so I think there's some confusion here. I'm thinking of espers,
> streambase, and other stream processing products. My point is to think
> about the problems that needs to be solved before picking a solution. Like
> everyone else, I've been guilty of this in the past, so it's not propaganda
> for or against any specific product.
>
> I've seen customers use IBM InfoSphere Streams when something like storm
> or spark would work, but I've also seen cases where open source doesn't
> provide equivalent functionality. If spark meets the needs, then either
> hbase or cassandra will probably work fine. The bigger question is what
> patterns do you use in the architecture? Do you store the data first before
> doing analysis? Is the data noisy and needs filtering before persistence?
> What kinds of patterns/queries and operations are needed?
>
> having worked on trading systems and other real-time use cases, not all
> stream processing is the same.
>
> On Thu, Dec 18, 2014 at 8:18 AM, Ryan Svihla  wrote:
>>
>> I'll decline to continue the commentary on spark, as again this probably
>> belongs on another list, other than to say, microbatches is an intentional
>> design tradeoff that has notable benefits for the same use cases you're
>> referring too, and that while you may disagree with those tradeoffs, it's a
>> bit harsh to dismiss as "basic" something that was chosen and provides some
>> improvements over say..the Storm model.
>>
>> On Thu, Dec 18, 2014 at 7:13 AM, Peter Lin  wrote:
>>>
>>>
>>> some of the most common types of use cases in stream processing is
>>> sliding windows based on time or count. Based on my understanding of spark
>>> architecture and spark streaming, it does not provide the same
>>> functionality. One can fake it by setting spark streaming to really small
>>> micro-batches, but that's not the same.
>>>
>>> if the use case fits that model, then using spark is fine. For other
>>> kinds of use cases, spark may not be a good fit. Some people store all
>>> events before analyzing it, which works for some use cases. While other
>>> uses cases like trading systems, store before analysis isn't feasible or
>>> practical. Other use cases like command control also don't fit store before
>>> analysis model.
>>>
>>> Try to avoid putting the cart in front of the horse. Picking a tool
>>> before you have a clear understanding of the problem is a good recipe for
>>> disaster
>>>
>>> On Thu, Dec 18, 2014 at 8:04 AM, Ryan Svihla 
>>> wrote:
>>>>
>>>> Since Ajay is already using spark the Spark Cassandra Connector really
>>>> gets them where they want to be pretty easily
>>>> https://github.com/datastax/spark-cassandra-connector (joins, etc).
>>>>
>>>> As far as spark streaming having "basic support" I'd challenge that
>>>> assertion (namely Storm has a number of problems with delivery guarantees
>>>> that Spark basically solves), however, this isn't a Spark mailing list, and
>>>> perhaps this conversation is better had there.
>>>>
>>>> If the question "Is Cassandra used in real time analytics cases with
>>>> Spark?" the answer is absolutely yes (and Storm for that matter). If the
>>>> question is "Can you do your analytics queries on Cassandra while you have
>>>> Spark sitting there doing nothing?" then of course the answer is no, but
>&g

Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Hi Peter,

You are right. The idea is to query the data directly from the NoSQL store, in
our case via Spark SQL on Spark (since Spark broadly supports
Mongo/Cassandra/HBase/Hadoop). As you said, the business users would still need
to query using Spark SQL. We are already using NoSQL BI tools like Pentaho
(which also plans to support Spark SQL soon). The idea is to abstract the
business users from the storage solutions (more than one: Cassandra/HBase and
Mongo).

Thanks
Ajay

On Thu, Dec 18, 2014 at 8:01 PM, Peter Lin  wrote:
>
>
> by data warehouse, what kind do you mean?
>
> is it the traditional warehouse where people create multi-dimensional
> cubes?
> or is it the newer class of UI tools that makes it easier for users to
> explore data and the warehouse is "mostly" a denormalized (ie flattened)
> format of the OLTP?
> or is it a combination of both?
>
> from my experience, the biggest challenge of data warehousing isn't
> storing the data. It's making it easy to explore for adhoc mdx-like
> queries. In the old days, the DBA's would define the cubes, write the ETL
> routines and let the data load for days/weeks. In the new nosql model, you
> can avoid the cube + ETL phase, but discovering the data and understanding
> the format still requires a developer.
>
> getting the data into an "user friendly" format like a cube with Spark
> still requires a developer. I find that business users hate to go to the
> developer, because we tend to ask "what's the functional specs?" Most of
> the time business users don't know, they just want to explore. At that
> point, the storage engine largely doesn't matter to the end user. It
> matters to the developers, but business users don't care.
>
> based on the description, I would watch out for how many aggregated views
> the platform creates. search the mailing list to see past discussions on
> the maximum recommended number of column families.
>
> where classic data warehouse caused lots of pain is creating cubes. Any
> general solution attempting to replace/supplement existing products needs
> to make it easy and trivial to define adhoc cubes and then query against
> it. There are existing products that already connect to a few nosql
> databases for data exploration. hope that helps
>
> peter
>
>
>
> On Thu, Dec 18, 2014 at 9:01 AM, Ajay  wrote:
>>
>> Thanks Ryan and Peter for the suggestions.
>>
>> Our requirement(an ecommerce company) at a higher level is to build a
>> Datawarehouse as a platform or service(for different product teams to
>> consume) as below:
>>
>> Datawarehouse as a platform/service
>>  |
>> Spark SQL
>>  |
>> Spark in memory computation engine (We were considering Drill/Flink but
>> Spark is better mature and in production)
>>  |
>> Cassandra/HBase (Yet to be decided. Aggregated views + data
>> directly written to this. So 40%-50% writes, 50-60% reads)
>>  |
>> Streaming processing (Spark Streaming or Storm. Yet to be
>> decided. Spark streaming is relatively new)
>> |
>>  My SQL/Mongo/Real Time data
>>
>> Since we are planning to build it as a service, we cannot consider a
>> particular data access pattern.
>>
>> Thanks
>> Ajay
>>
>>
>> On Thu, Dec 18, 2014 at 7:00 PM, Peter Lin  wrote:
>>>
>>>
>>> for the record I think spark is good and I'm glad we have options.
>>>
>>> my point wasn't to bad mouth spark. I'm not comparing spark to storm at
>>> all, so I think there's some confusion here. I'm thinking of espers,
>>> streambase, and other stream processing products. My point is to think
>>> about the problems that needs to be solved before picking a solution. Like
>>> everyone else, I've been guilty of this in the past, so it's not propaganda
>>> for or against any specific product.
>>>
>>> I've seen customers use IBM InfoSphere Streams when something like
>>> storm or spark would work, but I've also seen cases where open source
>>> doesn't provide equivalent functionality. If spark meets the needs, then
>>> either hbase or cassandra will probably work fine. The bigger question is
>>> what patterns do you use in the architecture? Do you store the data first
>>> before doing analysis? Is the data noisy and needs filtering before
>>> persistence? What kinds of patterns/queries and operations are needed?
>>>
>>> having worked on trading systems

Throughput Vs Latency

2014-12-25 Thread Ajay
Hi,

I am new to NoSQL (and Cassandra). As I go through articles on Cassandra, they
say Cassandra achieves the highest throughput among the various NoSQL
solutions, but at the cost of high read and write latency. I have a basic
question here: (if my understanding is right) latency means the time taken to
accept input, process it, and respond. If latency is high, how can throughput
also be high?

Thanks
Ajay


Re: Throughput Vs Latency

2014-12-25 Thread Ajay
Thanks Thomas for the clarification.

If I use a consistency level of QUORUM for both reads and writes, the latency
would affect the throughput, right?

Thanks
Ajay

On Fri, Dec 26, 2014 at 11:15 AM, Job Thomas  wrote:

>  Hi,
>
> First of all, the write latency of Cassandra is not high (read latency is higher).
>
> The high throughput is achieved through distributed reads and writes.
>
> Your doubt (if latency is high, how come throughput is high) is somewhat
> right if you use a high consistency level for both reads and writes.
>
> You get distributed abilities since it is not a master/slave
> architecture (like HBase).
>
> If your consistency level is lower, then some of the replica nodes
> are free and can be used for other reads/writes. [Think of a multithreaded
> application]
>
>  Thanks & Regards
> Job M Thomas
> Platform & Technology
> Mob : 7560885748
>
> --
> *From:* Ajay [mailto:ajay.ga...@gmail.com]
> *Sent:* Fri 12/26/2014 10:46 AM
> *To:* user@cassandra.apache.org
> *Subject:* Throughput Vs Latency
>
>   Hi,
>
> I am new to No SQL (and Cassandra). As I am going through few articles on
> Cassandra, it says Cassandra achieves highest throughput among various No
> SQL solutions but at the cost of high  read and write latency. I have a
> basic question here - (If my understanding is right) Latency means the time
> taken to accept input, process and respond back. If Latency is more how
> come the Throughput is high?
>
> Thanks
> Ajay
>


Re: Throughput Vs Latency

2014-12-25 Thread Ajay
Hi Thomas,

I am a little confused when you say "multithreaded client". We don't explicitly
invoke reads on multiple servers (for the replicated data) from the client
code, so how does a multithreaded client fix this?

Thanks
Ajay
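
A minimal sketch (not from the thread) of what a multithreaded/asynchronous
client looks like with the Java driver: many requests are kept in flight at
once with executeAsync, so individual request latencies overlap across the
cluster instead of adding up. The keyspace and users table are hypothetical.

// Sketch only: hypothetical keyspace "test" and table users(id int PRIMARY KEY, ...).
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

public class AsyncThroughputSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        long start = System.currentTimeMillis();
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        // Keep many reads in flight at once; each still pays its own latency,
        // but different replicas serve them in parallel, so throughput stays high.
        for (int i = 0; i < 1000; i++) {
            futures.add(session.executeAsync("SELECT * FROM users WHERE id = " + i));
        }
        for (ResultSetFuture f : futures) {
            f.getUninterruptibly(); // wait for each read to complete
        }
        System.out.println("1000 reads in "
                + (System.currentTimeMillis() - start) + " ms");

        cluster.close();
    }
}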


On Fri, Dec 26, 2014 at 12:08 PM, Job Thomas  wrote:

> Hi Ajay,
>
> My understanding is this: if you have a cluster of 3 nodes with a replication
> factor of 3, then latency plays a bigger role in throughput.
>
> If the cluster size is 6 with a replication factor of 3 and you are
> using a multithreaded client, then the latency remains the same but you get
> better throughput (not because of the 6 nodes alone, but because of 6 nodes and
> multiple threads).
>
> Thanks & Regards
> Job M Thomas
> Platform & Technology
> Mob : 7560885748
>
> 
>
> From: Ajay [mailto:ajay.ga...@gmail.com]
> Sent: Fri 12/26/2014 11:57 AM
> To: user@cassandra.apache.org
> Subject: Re: Throughput Vs Latency
>
>
> Thanks Thomas for the clarification.
>
>
> If I use the Consistency level of QUORUM for Read and Write, the Latency
> would affect the Throughput right?
>
>
> Thanks
>
> Ajay
>
>
> On Fri, Dec 26, 2014 at 11:15 AM, Job Thomas  wrote:
>
>
> Hi,
>
> First of all,the write latency of cassandra is not high(Read is
> high).
>
> The high throughput is achieved through distributes read and write.
>
> Your doubt ( If Latency is more how come the Throughput is high )
> is some what right if you put high consistency to both read and write.
>
> You will get distributed abilities since it is not Master/Slave
> architecture(Like HBase).
>
>  If  your consistency is lesser,then some nodes out of all replica
> nodes are free and will be used for another read/write . [ Think you are
> using multithreaded
> application ]
>
> Thanks & Regards
> Job M Thomas
> Platform & Technology
> Mob : 7560885748
>
> 
>
> From: Ajay [mailto:ajay.ga...@gmail.com]
> Sent: Fri 12/26/2014 10:46 AM
> To: user@cassandra.apache.org
> Subject: Throughput Vs Latency
>
>
> Hi,
>
>
> I am new to No SQL (and Cassandra). As I am going through few
> articles on Cassandra, it says Cassandra achieves highest throughput among
> various No SQL solutions but at the cost of high  read and write latency. I
> have a basic question here - (If my understanding is right) Latency means
> the time taken to accept input, process and respond back. If Latency is
> more how come the Throughput is high?
>
>
> Thanks
>
> Ajay
>
>
>
>


Counter Column

2014-12-26 Thread Ajay
Hi,

If the nodes of a Cassandra ring are in different time zones, could that affect
counter columns, since they depend on timestamps?

Thanks
Ajay


Re: Counter Column

2014-12-27 Thread Ajay
Thanks.

I went through some articles which mentioned that the client passes the
timestamp for inserts and updates. Is there any way we can avoid that and have
Cassandra assume the current server time?

Thanks
Ajay
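
For what it's worth, a client-supplied timestamp is optional in CQL: in this
era (Cassandra 2.0/2.1 with driver 2.x defaults) the coordinator assigns the
write timestamp when the statement does not specify USING TIMESTAMP. A minimal
sketch with a hypothetical events table:

// Sketch only: hypothetical keyspace "test" and table events(id int PRIMARY KEY, payload text).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class WriteTimestampSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // No USING TIMESTAMP: the coordinator assigns the write timestamp itself.
        session.execute("INSERT INTO events (id, payload) VALUES (1, 'a')");

        // Explicit client-supplied timestamp (microseconds since the epoch).
        session.execute("INSERT INTO events (id, payload) VALUES (2, 'b') USING TIMESTAMP "
                + (System.currentTimeMillis() * 1000));

        cluster.close();
    }
}
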
On Dec 26, 2014 10:50 PM, "Eric Stevens"  wrote:

> Timestamps are timezone independent.  This is a property of timestamps,
> not a property of Cassandra. A given moment is the same timestamp
> everywhere in the world.  To display this in a human readable form, you
> then need to know what timezone you're attempting to represent the
> timestamp as, this is the information necessary to convert it to local time.
>
> On Fri, Dec 26, 2014 at 2:05 AM, Ajay  wrote:
>>
>> Hi,
>>
>> If the nodes of Cassandra ring are in different timezone, could it affect
>> the counter column as it depends on the timestamp?
>>
>> Thanks
>> Ajay
>>
>


User click count

2014-12-29 Thread Ajay
Hi,

Is it better to use a counter for the user click count than to create a new row
per click (keyed as user id : timestamp) and count the rows?

Basically we want to track user clicks and use the same data for
hourly/daily/monthly reports.

Thanks
Ajay


Re: User click count

2014-12-29 Thread Ajay
Hi,

So you mean to say counters are not accurate? (It is highly likely that
multiple parallel threads will try to increment the counter as users click
the links.)

Thanks
Ajay


On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen 
wrote:

>
> Hi!
>
> It’s really a tradeoff between accurate and fast and your read access
> patterns; if you need it to be fairly fast, use counters by all means, but
> accept the fact that they will (especially in older versions of cassandra
> or adverse network conditions) drift off from the true click count.  If you
> need accurate, use a timeuuid and count the rows (this is fairly safe for
> replays too).  However, if using timeuuids your storage will need lots of
> space; and your reads will be slow if the click counts are huge (because
> Cassandra will need to read every item).  Using counters makes it easy to
> just grab a slice of the time series data and shove it to a client for
> visualization.
>
> You could of course do a hybrid system; use timeuuids and then
> periodically count and add the result to a regular column, and then remove
> the columns.  Note that you might want to optimize this so that you don’t
> end up with a lot of tombstones, e.g. by bucketing the writes so that you
> can delete everything with just a single partition delete.
>
> At Thinglink some of the more important counters that we use are backed up
> by the actual data. So for speed purposes we use always counters for reads,
> but there’s a repair process that fixes the counter value if we suspect it
> starts drifting off the real data too much.  (You might be able to tell
> that we’ve been using counters for quite some time :-P)
>
> /Janne
>
> On 29 Dec 2014, at 13:00, Ajay  wrote:
>
> > Hi,
> >
> > Is it better to use Counter to User click count than maintaining
> creating new row as user id : timestamp and count it.
> >
> > Basically we want to track the user clicks and use the same for
> hourly/daily/monthly report.
> >
> > Thanks
> > Ajay
>
>


Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification.

In my case, Cassandra is the only storage. If the counters become incorrect,
they couldn't be corrected. If we store the raw data for that purpose, we might
as well go with that approach entirely. But the granularity has to be at the
seconds level, since more than one user can click the same link. So the data
will be huge, with more writes and more rows to count for reads, right?

Thanks
Ajay


On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ  wrote:

> Hi Ajay,
>
> Here is a good explanation you might want to read.
>
>
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> Though we use counters for 3 years now, we used them from start C* 0.8 and
> we are happy with them. Limits I can see in both ways are:
>
> Counters:
>
> - accuracy indeed (Tend to be small in our use case < 5% - when the
> business allow 10%, so fair enough for us) + we recount them through a
> batch processing tool (spark / hadoop - Kind of lambda architecture). So
> our real-time stats are inaccurate and after a few minutes or hours we have
> the real value.
> - Read-Before-Write model, which is an anti-pattern. Makes you use more
> machine due to the pressure involved, affordable for us too.
>
> Raw data (counted)
>
> - Space used (can become quite impressive very fast, depending on your
> business) !
> - Time to answer a request (we expose the data to customer, they don't
> want to wait 10 sec for Cassandra to read 1 000 000 + columns)
> - Performances in o(n) (linear) instead of o(1) (constant). Customer won't
> always understand that for you it is harder to read 1 than 1 000 000, since
> it should be reading 1 number in both case, and your interface will have
> very unstable read time.
>
> Pick the best solution (or combination) for your use case. Those
> disadvantages lists are not exhaustive, just things that came to my mind
> right now.
>
> C*heers
>
> Alain
>
> 2014-12-29 13:33 GMT+01:00 Ajay :
>
>> Hi,
>>
>> So you mean to say counters are not accurate? (It is highly likely that
>> multiple parallel threads trying to increment the counter as users click
>> the links).
>>
>> Thanks
>> Ajay
>>
>>
>> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen > > wrote:
>>
>>>
>>> Hi!
>>>
>>> It’s really a tradeoff between accurate and fast and your read access
>>> patterns; if you need it to be fairly fast, use counters by all means, but
>>> accept the fact that they will (especially in older versions of cassandra
>>> or adverse network conditions) drift off from the true click count.  If you
>>> need accurate, use a timeuuid and count the rows (this is fairly safe for
>>> replays too).  However, if using timeuuids your storage will need lots of
>>> space; and your reads will be slow if the click counts are huge (because
>>> Cassandra will need to read every item).  Using counters makes it easy to
>>> just grab a slice of the time series data and shove it to a client for
>>> visualization.
>>>
>>> You could of course do a hybrid system; use timeuuids and then
>>> periodically count and add the result to a regular column, and then remove
>>> the columns.  Note that you might want to optimize this so that you don’t
>>> end up with a lot of tombstones, e.g. by bucketing the writes so that you
>>> can delete everything with just a single partition delete.
>>>
>>> At Thinglink some of the more important counters that we use are backed
>>> up by the actual data. So for speed purposes we use always counters for
>>> reads, but there’s a repair process that fixes the counter value if we
>>> suspect it starts drifting off the real data too much.  (You might be able
>>> to tell that we’ve been using counters for quite some time :-P)
>>>
>>> /Janne
>>>
>>> On 29 Dec 2014, at 13:00, Ajay  wrote:
>>>
>>> > Hi,
>>> >
>>> > Is it better to use Counter to User click count than maintaining
>>> creating new row as user id : timestamp and count it.
>>> >
>>> > Basically we want to track the user clicks and use the same for
>>> hourly/daily/monthly report.
>>> >
>>> > Thanks
>>> > Ajay
>>>
>>>
>>
>


Re: User click count

2014-12-29 Thread Ajay
Thanks Janne, Alain and Eric.

Now say I go with counters (hourly, daily, monthly) and also store UUIDs as
below:

user id : yyyy/mm/dd as the row key and dynamic columns for each click, with
the column key as a timestamp and an empty value. Periodically count the
columns and rows and correct the counters. In this case there will be one row
per day, but as many columns as there are user clicks.

The other way is to store a row per hour:
user id : yyyy/mm/dd/hh as the row key and dynamic columns for each click, with
the column key as a timestamp and an empty value.

Is there any difference (in performance, or any known issues) between more rows
vs. more columns, given that Cassandra deletes them through tombstones (say, by
default, after 20 days)?

Thanks
Ajay
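
Translated into CQL 3 terms, a rough sketch of the two designs under discussion
is below; the table and column names are illustrative, and (per the advice later
in the thread) a timeuuid clustering column is used so that two clicks in the
same millisecond do not overwrite each other.

// Sketch only: raw click events partitioned by (user, day) plus a counter table
// for fast rollups; names and bucket formats are illustrative.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ClickSchemaSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // Raw events: one partition per (user, day), one column per click.
        session.execute("CREATE TABLE IF NOT EXISTS clicks_raw ("
                + " user_id text, day text, click_time timeuuid, link text,"
                + " PRIMARY KEY ((user_id, day), click_time))");

        // Fast, approximate counts, trued up periodically from clicks_raw.
        session.execute("CREATE TABLE IF NOT EXISTS clicks_by_hour ("
                + " user_id text, hour text, clicks counter,"
                + " PRIMARY KEY (user_id, hour))");

        // Record one click in both places.
        session.execute("INSERT INTO clicks_raw (user_id, day, click_time, link)"
                + " VALUES ('u1', '2014/12/29', now(), '/ad/42')");
        session.execute("UPDATE clicks_by_hour SET clicks = clicks + 1"
                + " WHERE user_id = 'u1' AND hour = '2014/12/29 13'");

        cluster.close();
    }
}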

On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens  wrote:

> > If the counters get incorrect, it could't be corrected
>
> You'd have to store something that allowed you to correct it.  For
> example, the TimeUUID approach to keep true counts, which are slow to read
> but accurate, and a background process that trues up your counter columns
> periodically.
>
> On Mon, Dec 29, 2014 at 7:05 AM, Ajay  wrote:
>
>> Thanks for the clarification.
>>
>> In my case, Cassandra is the only storage. If the counters get incorrect,
>> it could't be corrected. For that if we store raw data, we can as well go
>> that approach. But the granularity has to be as seconds level as more than
>> one user can click the same link. So the data will be huge with more writes
>> and more rows to count for reads right?
>>
>> Thanks
>> Ajay
>>
>>
>> On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi Ajay,
>>>
>>> Here is a good explanation you might want to read.
>>>
>>>
>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>>
>>> Though we use counters for 3 years now, we used them from start C* 0.8
>>> and we are happy with them. Limits I can see in both ways are:
>>>
>>> Counters:
>>>
>>> - accuracy indeed (Tend to be small in our use case < 5% - when the
>>> business allow 10%, so fair enough for us) + we recount them through a
>>> batch processing tool (spark / hadoop - Kind of lambda architecture). So
>>> our real-time stats are inaccurate and after a few minutes or hours we have
>>> the real value.
>>> - Read-Before-Write model, which is an anti-pattern. Makes you use more
>>> machine due to the pressure involved, affordable for us too.
>>>
>>> Raw data (counted)
>>>
>>> - Space used (can become quite impressive very fast, depending on your
>>> business) !
>>> - Time to answer a request (we expose the data to customer, they don't
>>> want to wait 10 sec for Cassandra to read 1 000 000 + columns)
>>> - Performances in o(n) (linear) instead of o(1) (constant). Customer
>>> won't always understand that for you it is harder to read 1 than 1 000 000,
>>> since it should be reading 1 number in both case, and your interface will
>>> have very unstable read time.
>>>
>>> Pick the best solution (or combination) for your use case. Those
>>> disadvantages lists are not exhaustive, just things that came to my mind
>>> right now.
>>>
>>> C*heers
>>>
>>> Alain
>>>
>>> 2014-12-29 13:33 GMT+01:00 Ajay :
>>>
>>>> Hi,
>>>>
>>>> So you mean to say counters are not accurate? (It is highly likely that
>>>> multiple parallel threads trying to increment the counter as users click
>>>> the links).
>>>>
>>>> Thanks
>>>> Ajay
>>>>
>>>>
>>>> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen <
>>>> janne.jalka...@ecyrd.com> wrote:
>>>>
>>>>>
>>>>> Hi!
>>>>>
>>>>> It’s really a tradeoff between accurate and fast and your read access
>>>>> patterns; if you need it to be fairly fast, use counters by all means, but
>>>>> accept the fact that they will (especially in older versions of cassandra
>>>>> or adverse network conditions) drift off from the true click count.  If 
>>>>> you
>>>>> need accurate, use a timeuuid and count the rows (this is fairly safe for
>>>>> replays too).  However, if using timeuuids your storage will need lots of
>>>>> space; and your reads will be slow if the click counts are huge (because
>>>>> Cassandra will need to read every item

Re: User click count

2014-12-30 Thread Ajay
Thanks Janne and Rob.

The idea is like this: store the user clicks in Cassandra, and have a scheduler
count/aggregate the clicks per link or ad hourly/daily/monthly and store the
results in MySQL (or maybe in Cassandra itself). Since tombstones are deleted
only after some days (as per configuration), could the subsequent queries that
count the rows be affected (I mean, would thousands of tombstones affect the
performance of the query)?

Secondly, as I understand from this mail thread, a counter is not exact for
this use case; is there any technical reason behind that (just out of
curiosity)?

Thanks
Ajay

On Tue, Dec 30, 2014 at 10:37 PM, Janne Jalkanen 
wrote:

>
> Hi!
>
> Yes, since all the writes for a partition (or row if you speak Thrift)
> always go to the same replicas, you will need to design to avoid hotspots -
> a pure day row will cause all the writes for a single day to go to the same
> replicas, so those nodes will have to work really hard for a day, and then
> the next day it’s again hard work for some other nodes.  If you have an
> user id there in front, then it would distribute better.
>
> For tombstone purposes think of your access patterns; if you have a
> date-based system, it probably does not matter since you will scan those
> UUIDs once, and then they will be tombstoned away.  It’s cleaner if you can
> delete the entire row with a single command, but as long as you never read
> it again, I don’t think this matters much.
>
> The real problems with wide rows come with compaction, and you shouldn’t
> have much problems with compaction because this is an append-only row, so
> it should be fine as a fairly wide row.  Make some back-of-the-envelope
> calculations and if it looks like you’re going to be hitting tens of
> millions of columns per day, then store per hour.
>
> One important thing: in order not to lose clicks, always use timeuuids
> instead of timestamps (or else two clicks coming in for the same id would
> overwrite itself and count as one).
>
> /Janne
>
> On 30 Dec 2014, at 06:28, Ajay  wrote:
>
> Thanks Janne, Alain and Eric.
>
> Now say I go with counters (hourly, daily, monthly) and also store UUID as
> below:
>
> user Id : /mm/dd as row key and dynamic columns for each click with
> column key as timestamp and value as empty. Periodically count the columns
> and rows and correct the counters. Now in this case, there will be one row
> per day but as many columns as user click.
>
> Other way is to store row per hour
> user id : /mm/dd/hh as row key and dynamic columns for each click with
> column key as timestamp and value as empty.
>
> Is there any difference (in performance or any known issues) between more
> rows Vs more columns as Cassandra deletes them through tombstones (say by
> default 20 days).
>
> Thanks
> Ajay
>
> On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens  wrote:
>
>> > If the counters get incorrect, it could't be corrected
>>
>> You'd have to store something that allowed you to correct it.  For
>> example, the TimeUUID approach to keep true counts, which are slow to read
>> but accurate, and a background process that trues up your counter columns
>> periodically.
>>
>> On Mon, Dec 29, 2014 at 7:05 AM, Ajay  wrote:
>>
>>> Thanks for the clarification.
>>>
>>> In my case, Cassandra is the only storage. If the counters get
>>> incorrect, it could't be corrected. For that if we store raw data, we can
>>> as well go that approach. But the granularity has to be as seconds level as
>>> more than one user can click the same link. So the data will be huge with
>>> more writes and more rows to count for reads right?
>>>
>>> Thanks
>>> Ajay
>>>
>>>
>>> On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ 
>>> wrote:
>>>
>>>> Hi Ajay,
>>>>
>>>> Here is a good explanation you might want to read.
>>>>
>>>>
>>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>>>
>>>> Though we use counters for 3 years now, we used them from start C* 0.8
>>>> and we are happy with them. Limits I can see in both ways are:
>>>>
>>>> Counters:
>>>>
>>>> - accuracy indeed (Tend to be small in our use case < 5% - when the
>>>> business allow 10%, so fair enough for us) + we recount them through a
>>>> batch processing tool (spark / hadoop - Kind of lambda architecture). So
>>>> our real-time stats are inaccurate and after a few minutes or hours we have
>>>> the real value.
>&g

Re: User click count

2014-12-31 Thread Ajay
Thanks Eric.

Happy new year 2015 to all Cassandra developers and users :). This group
seems the most active of the Apache big data projects.

Will come back with more questions :)

Thanks
Ajay
On Dec 31, 2014 8:02 PM, "Eric Stevens"  wrote:

> You can totally avoid the impact of tombstones by rotating your partition
> key in the exact counts table, and only deleting whole partitions once
> you've counted them.  Once you've counted them you never have cause to read
> that partition key again.
>
> You can totally store the final counts in Cassandra as a standard
> (non-counter) column, and you can even use counters to keep track of the
> time slices which haven't been formally counted yet so that you can get
> reasonably accurate information about time slices that haven't been trued
> up yet.
>
> This is basically what's called a Lambda architecture - use efficient real
> time processing to get pretty close to accurate values when real time
> performance matters, then use a cleanup process to get perfectly accurate
> values when you can afford non-real-time processing times, and store that
> final computation so that you can continue to access it quickly.
>
> > is there any technical reason behind it (just out of curiosity)?
>
> Distributed counting is a fundamentally hard problem if you wish to do so
> in a manner that avoids bottlenecks (i.e. not distributed) and also
> provides for perfect accuracy.  There's plenty of research in this area,
> and there isn't a single algorithm that provides for all the properties we
> would hope for.  Instead there are different algorithms that make different
> tradeoffs.
>
> The way that Cassandra's counters can fail is that most operations in
> Cassandra are idempotent - if we're not sure whether an update has been
> applied correctly or not, we can simply apply it again, because it's safe
> to do twice.  Counters are not idempotent.  If you try to increment a
> counter, and you're not certain whether the increment was successful or
> not, it is *not* safe to try again (if it was successful the previous
> time, you've now incremented twice when it should have been once).
>
> Most of the time counters are reasonable and accurate, but in failure
> scenarios you may get some changes applied more than once, or not at all.
> With that in mind, you might find that being perfectly accurate most of the
> time, and being within a fraction of a percent the other times is
> acceptable.  If so, counters are your friend, and if not, a more complex
> lambda style approach as we've been advocating here is best.
>
> On Tue, Dec 30, 2014 at 10:54 PM, Ajay  wrote:
>
>> Thanks Janne and Rob.
>>
>> The idea is like this : To store the User clicks on Cassandra and a
>> scheduler to count/aggregate the  clicks per link or ad
>> hourly/daily/monthly and store in My SQL (or may be in Cassandra itself).
>> Since tombstones will be deleted only after some days (as per
>> configuration), could the subsequent queries to count the rows get affected
>> (I mean say thousands of tombstones will affect the performance of the
>> query) ?
>>
>> Secondly as I understand from this mail thread, the counter is not
>> correct for this use case, is there any technical reason behind it (just
>> out of curiosity)?
>>
>> Thanks
>> Ajay
>>
>> On Tue, Dec 30, 2014 at 10:37 PM, Janne Jalkanen <
>> janne.jalka...@ecyrd.com> wrote:
>>
>>>
>>> Hi!
>>>
>>> Yes, since all the writes for a partition (or row if you speak Thrift)
>>> always go to the same replicas, you will need to design to avoid hotspots -
>>> a pure day row will cause all the writes for a single day to go to the same
>>> replicas, so those nodes will have to work really hard for a day, and then
>>> the next day it’s again hard work for some other nodes.  If you have an
>>> user id there in front, then it would distribute better.
>>>
>>> For tombstone purposes think of your access patterns; if you have a
>>> date-based system, it probably does not matter since you will scan those
>>> UUIDs once, and then they will be tombstoned away.  It’s cleaner if you can
>>> delete the entire row with a single command, but as long as you never read
>>> it again, I don’t think this matters much.
>>>
>>> The real problems with wide rows come with compaction, and you shouldn’t
>>> have much problems with compaction because this is an append-only row, so
>>> it should be fine as a fairly wide row.  Make some back-of-the-envelope
>>> calculations and if it looks like yo

Stable cassandra build for production usage

2014-12-31 Thread Ajay
Hi All,

For my research and learning I am using Cassandra 2.1.2. But I see a couple of
mail threads about issues in 2.1.2. So what is the stable or popular build for
production in the Cassandra 2.x series?

Thanks
Ajay


Cassandra nodes in VirtualBox

2015-01-05 Thread Ajay
Hi,

I did the Cassandra cluster set up as below:

Node 1 : Seed Node
Node 2
Node 3
Node 4

All 4 nodes are VirtualBox VMs with Ubuntu 14.10. I have set the
listen_address and rpc_address to each VM's inet address, with SimpleSnitch.

When I start Node2 after Node1 is started, I get
"java.lang.RuntimeException: Unable to gossip with any seeds".

What could be the reason?

Thanks
Ajay


Re: Cassandra nodes in VirtualBox

2015-01-05 Thread Ajay
Neha,

This is just for a trial setup. Anyway, thanks for the suggestion (more
than one seed node).

I figured out the problem: Node2 had an incorrect cluster name (cluster_name in
cassandra.yaml). The error seems misleading, though.

Thanks
Ajay Garga
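
For anyone who hits the same error, a sketch of the cassandra.yaml entries
involved is below; cluster_name must be identical on every node, and each
node's seed list should point at the seed node's address (the addresses here
are placeholders):

cluster_name: 'MyCluster'          # must match on every node in the ring
listen_address: 192.168.56.101     # this node's own address
rpc_address: 192.168.56.101
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.101"   # Node 1; add a second seed if available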



On Mon, Jan 5, 2015 at 4:21 PM, Neha Trivedi  wrote:

> Hi Ajay,
> 1. you should have at least 2 Seed nodes as it will help, Node1 (only one
> seed node) is down.
> 2. Check you should be using internal ip address in listen_address and
> rpc_address.
>
>
>
>
> On Mon, Jan 5, 2015 at 2:07 PM, Ajay  wrote:
>
>> Hi,
>>
>> I did the Cassandra cluster set up as below:
>>
>> Node 1 : Seed Node
>> Node 2
>> Node 3
>> Node 4
>>
>> All 4 nodes are Virtual Box VMs with Ubuntu 14.10. I have set the
>> listen_address, rpc_address as the inet address with SimpleSnitch.
>>
>> When I start Node2 after Node1 is started, I get the
>> "java.lang.RuntimeException: Unable to news with any seeds".
>>
>> What could be the reason?
>>
>> Thanks
>> Ajay
>>
>
>


Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Hi,

I have a column family as below:

(Wide row design)
CREATE TABLE clicks (hour text, adId int, itemId int, time timeuuid,
PRIMARY KEY ((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);

Now, to query for a given ad id and a specific range of 3 hours, say 2015-01-07
11 to 2015-01-07 14, how do I use the token function in CQL?

Thanks
Ajay


Re: Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Thanks.

Basically there are two access patterns:
1) For the last hour (or more, if the last batch failed for some reason), get
the click data for all ads. But that seems impossible, since the ad id is part
of the partition key.
2) For the last hour (or more, if the last batch failed for some reason), get
the click data for one or more specific ad ids.

How do we support both 1 and 2 with the same data model? (I used ad id + hour
as the partition key to avoid hotspots.)

Thanks
Ajay
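
A minimal sketch of access pattern 2 along the lines Sylvain suggests below: one
query per (adId, hour) partition, issued asynchronously against the clicks table
defined above; the ad id and hour values are placeholders.

// Sketch only: uses the clicks table from the original message; values are placeholders.
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ClicksByHourSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        PreparedStatement ps = session.prepare(
                "SELECT * FROM clicks WHERE adId = ? AND hour = ?");

        int adId = 42;
        String[] hours = { "2015-01-07 11", "2015-01-07 12", "2015-01-07 13" };

        // One asynchronous query per hour bucket, all in flight at once.
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (String hour : hours) {
            futures.add(session.executeAsync(ps.bind(adId, hour)));
        }
        for (ResultSetFuture f : futures) {
            for (Row row : f.getUninterruptibly()) {
                System.out.println(row.getInt("adId") + " " + row.getString("hour"));
            }
        }
        cluster.close();
    }
}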


On Wed, Jan 7, 2015 at 6:34 PM, Sylvain Lebresne 
wrote:

> On Wed, Jan 7, 2015 at 10:18 AM, Ajay  wrote:
>
>> Hi,
>>
>> I have a column family as below:
>>
>> (Wide row design)
>> CREATE TABLE clicks (hour text,adId int,itemId int,time timeuuid,PRIMARY
>> KEY((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);
>>
>> Now to query for a given Ad Id and specific 3 hours say 2015-01-07 11 to
>> 2015-01-07 14, how do I use the token function in the CQL.
>>
>
> From that description, it doesn't appear to me that you need the token
> function. Just do 3 queries for each hour, each queries being something
> along the lines of
>   SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...
>
> For completness sake, I should note that you could do that with a single
> query by using an IN on the hour column, but it's actually not a better
> solution (provided you submit the 3 queries in an asynchronous fashion at
> least) in that case because of reason explained here:
> https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7
> .
>
> --
> Sylvain
>
>>
>


Re: Keyspace uppercase name issues

2015-01-07 Thread Ajay
We noticed the same issue. cassandra-cli allows upper-case or mixed-case
keyspace names, but cqlsh automatically converts unquoted names to lower case.

Thanks
Ajay
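
For reference, this follows CQL's identifier rules: unquoted names are
case-insensitive and folded to lower case, while double-quoted names keep their
case (and must then be quoted everywhere). A small sketch; the keyspace names
are illustrative.

// Sketch only: illustrative keyspace names against a local test cluster.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class KeyspaceCaseSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Unquoted: stored as "myks".
        session.execute("CREATE KEYSPACE MyKS WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // Quoted: stored as "MyKS2" and must always be referenced with quotes.
        session.execute("CREATE KEYSPACE \"MyKS2\" WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("USE \"MyKS2\"");

        cluster.close();
    }
}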

On Wed, Jan 7, 2015 at 9:44 PM, Harel Gliksman  wrote:

> Hi,
>
> We have a Cassandra cluster with Keyspaces that were created using the
> thrift api and their names contain upper case letters.
> We are trying to use the new Datastax driver (version 2.1.4, maven's
> latest ) but encountering some problems due to upper case handling.
>
> Datastax provide this guidance on how to handle lower-upper cases:
>
> http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html
>
> However, there seems to be something confusing in the API.
>
> Attached a small java code that reproduces the problem.
>
> Many thanks,
> Harel.
>


User audit in Cassandra

2015-01-08 Thread Ajay
Hi,

Is there a way to enable user auditing or tracing if we have enabled
PasswordAuthenticator in cassandra.yaml and set up the users as well? I noticed
there are keyspaces system_auth and system_traces, but there is no way to find
out which user initiated which session. Is there any way to find out? Also, is
it recommended to enable tracing in production, or to track how many sessions
were started by a user?

Thanks
Ajay


Re: User audit in Cassandra

2015-01-09 Thread Ajay
Thanks Tyler Hobbs.


We need to capture which queries a user ran in a session and the time each one
took (we don't need the query plan or anything like that). Is that possible?
With the Authenticator we can capture only session creation, right?

Thanks
Ajay
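
A very rough sketch of the subclassing approach Tyler describes below. The
IAuthenticator API differs between Cassandra versions, so the authenticate(Map)
signature used here (from the 2.0-era interface) is an assumption to verify
against the version you run; it also only captures logins, not individual
queries.

// Sketch only: package name is hypothetical, and the overridden method signature
// is an assumption based on the 2.0-era IAuthenticator interface.
package com.example.audit;

import java.util.Map;

import org.apache.cassandra.auth.AuthenticatedUser;
import org.apache.cassandra.auth.PasswordAuthenticator;
import org.apache.cassandra.exceptions.AuthenticationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuditingPasswordAuthenticator extends PasswordAuthenticator {
    private static final Logger logger =
            LoggerFactory.getLogger(AuditingPasswordAuthenticator.class);

    @Override
    public AuthenticatedUser authenticate(Map<String, String> credentials)
            throws AuthenticationException {
        AuthenticatedUser user = super.authenticate(credentials);
        // Log the session start; individual queries are not visible at this layer.
        logger.info("Login by user {}", user.getName());
        return user;
    }
}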


On Sat, Jan 10, 2015 at 6:07 AM, Tyler Hobbs  wrote:

> system_traces is for query tracing, which is for diagnosing performance
> problems, not logging activity.
>
> Cassandra is designed to allow you to write your own Authenticator pretty
> easily.  You can just subclass PasswordAuthenticator and add logging where
> desired.  Compile that into a jar, put it in the lib/ directory for
> Cassandra, and change cassandra.yaml to use that class.
>
> On Thu, Jan 8, 2015 at 6:34 AM, Ajay  wrote:
>
>> Hi,
>>
>> Is there a way to enable user audit or trace if we have enabled
>> PasswordAuthenticator in cassandra.yaml and set up the users as well. I
>> noticed there are keyspaces system_auth and system_trace. But there is no
>> way to find out which user initiated which session. Is there anyway to find
>> out?. Also is it recommended to enable system_trace in production or to
>> know how many sessions started by a user?
>>
>> Thanks
>> Ajay
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Cassandra primary key design to cater range query

2015-01-09 Thread Ajay
Hi,

I read somewhere that the order of the columns in the clustering key matters.
Please correct me if I am wrong.

For example, with

PRIMARY KEY ((prodgroup), status, prodid)

the query below cannot run:

select * from product where prodgroup='xyz' and prodid > 0

But this query can:

select * from product where prodgroup='xyz' and prodid > 0 and status = 0

It means every preceding part of the clustering key has to be restricted in the
query. So, if you want to query "get details of a specific product" (either
active or inactive), you might need to reorder the columns, as in
PRIMARY KEY ((prodgroup), prodid, status).

Thanks
Ajay
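
For concreteness, a sketch of the two layouts discussed in this thread and the
range query each one supports, using the column names from Nagesh's message
below; the keyspace and the second table name are illustrative.

// Sketch only: columns taken from the thread; "product_by_status" is an illustrative name.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ProductKeySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // Existing layout: a range on prodid works, filtering by status alone does not.
        session.execute("CREATE TABLE IF NOT EXISTS product (prodgroup text, prodid int,"
                + " status int, PRIMARY KEY ((prodgroup), prodid, status))");
        session.execute("SELECT * FROM product WHERE prodgroup='xyz' AND prodid > 0");

        // Alternative layout: status first, so "all active products" becomes a valid query.
        session.execute("CREATE TABLE IF NOT EXISTS product_by_status (prodgroup text,"
                + " prodid int, status int, PRIMARY KEY ((prodgroup), status, prodid))");
        session.execute("SELECT * FROM product_by_status WHERE prodgroup='xyz'"
                + " AND status = 0 AND prodid > 0");

        cluster.close();
    }
}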


On Sat, Jan 10, 2015 at 6:03 AM, Tyler Hobbs  wrote:

> Your proposed model for the table to handle the last query looks good, so
> I would stick with that.
>
> On Mon, Jan 5, 2015 at 5:45 AM, Nagesh  wrote:
>
>> Hi All,
>>
>> I have designed a column family
>>
>> prodgroup text, prodid int, status int, , PRIMARY KEY ((prodgroup),
>> prodid, status)
>>
>> The data model is to cater
>>
>>- Get list of products from the product group
>>- get list of products for a given range of ids
>>- Get details of a specific product
>>- Update status of the product acive/inactive
>>- Get list of products that are active or inactive (select * from
>>product where prodgroup='xyz' and prodid > 0 and status = 0)
>>
>> The design works fine, except for the last query . Cassandra not allowing
>> to query on status unless I fix the product id. I think defining a super
>> column family which has the key "PRIMARY KEY((prodgroup), staus,
>> productid)" should work. Would like to get expert advice on other
>> alternatives.
>> --
>> Thanks,
>> Nageswara Rao.V
>>
>> *"The LORD reigns"*
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Performance difference between Regular Statement Vs PreparedStatement

2015-01-28 Thread Ajay
Hi All,

I tried both insert and select queries (using QueryBuilder), as regular
statements and as PreparedStatements, in multithreaded code that runs each
query 10k to 50k times. But I don't see any visible improvement using
PreparedStatement. What could be the reason?

Note: I am using the same Session object in multiple threads.

Cassandra version : 2.0.11
Driver version : 2.1.4

Thanks
Ajay


Re: Performance difference between Regular Statement Vs PreparedStatement

2015-01-29 Thread Ajay
Thanks Eric. I didn't know the point about token-aware routing.

But even with points 2 and 3, I didn't notice much improvement with prepared
statements. I have 2 Cassandra nodes running in VirtualBox VMs on the same
machine, with the test client running on that machine as well.

Thanks
Ajay
Prepared statements can take advantage of token aware routing which IIRC
non-prepared statements cannot in the DS Java Driver, so as your cluster
grows you reduce the overhead of statement coordination (assuming you use
token aware routing).  There should also be less data to transfer for
shipping the query (the CQL portion is shipped once during the prepare
stage, and only the data is shipped on subsequent executions).  You'll also
save the cluster the overhead of repeatedly parsing your CQL statements.
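
For comparison, a minimal sketch of the prepare-once / bind-many pattern
described above; the keyspace and users table are illustrative. On a two-node
cluster colocated with the client, the difference may well be small, since
local network and coordination costs dominate the savings.

// Sketch only: prepare once outside the hot loop, bind and execute many times.
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PreparedInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // The CQL string is parsed by the cluster once, at prepare time.
        PreparedStatement insert = session.prepare(
                "INSERT INTO users (id, name) VALUES (?, ?)");

        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            // Each execution ships only the bound values.
            BoundStatement bound = insert.bind(i, "name" + i);
            session.execute(bound);
        }
        System.out.println("10000 inserts in "
                + (System.currentTimeMillis() - start) + " ms");

        cluster.close();
    }
}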

On Wed, Jan 28, 2015 at 11:50 PM, Ajay  wrote:

> Hi All,
>
> I tried both insert and select query (using QueryBuilder) in Regular
> statement and PreparedStatement in a multithreaded code to do the query say
> 10k to 50k times. But I don't see any visible improvement using the
> PreparedStatement. What could be the reason?
>
> Note : I am using the same Session object in multiple threads.
>
> Cassandra version : 2.0.11
> Driver version : 2.1.4
>
> Thanks
> Ajay
>


Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Hi,

I am working on exposing the Cassandra query APIs (Java driver) as REST APIs
for our internal project.

To support pagination, I looked at the Cassandra documentation, the source code
and other forums.
What I mean by pagination support is the following:

1) The client fires a query to the REST server.
2) The server prepares the statement, caches the query, and returns a query id
(unique id).
3) Given the query id, an offset and a limit, the server returns the set of
rows according to the offset and limit, and also returns the offset of the last
returned row.
4) The client makes subsequent calls to the server with the offset returned by
the server until all rows are returned. In case one call fails or times out,
the client makes the call again.

Below are the details I found:

1) The Java driver implicitly supports pagination in the ResultSet (via its
Iterator), which can be controlled through the fetch size. But it is limited in
that we cannot skip ahead or go back, and the fetch state is not exposed.

2) Using the token() function on the keys of the last returned row, we can skip
the already-returned rows, and using the LIMIT keyword we can limit the number
of rows. But the problem I see is that the token() function cannot be used if
the query contains an ORDER BY clause.

Is there any other way to achieve pagination support?

Thanks
Ajay
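
For reference, a sketch of what the driver's built-in paging (point 1 above)
looks like in code: setFetchSize bounds the rows per round trip, and iterating
the ResultSet fetches subsequent pages transparently. The events table here is
hypothetical.

// Sketch only: hypothetical table events(day text, id timeuuid, ..., PRIMARY KEY (day, id)).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class FetchSizeSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        Statement stmt = new SimpleStatement("SELECT * FROM events WHERE day = '2015-02-10'");
        stmt.setFetchSize(100);   // rows per page, not a total limit

        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {      // further pages are fetched behind the scenes
            System.out.println(row.getUUID("id"));
        }
        cluster.close();
    }
}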


Re: Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Thanks Alex.

But is there any possible workaround? I can't believe that everyone reads and
processes all rows at once (without pagination).

Thanks
Ajay
On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:

>
> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>
>> 1) Java driver implicitly support Pagination in the ResultSet (using
>> Iterator) which can be controlled through FetchSize. But it is limited in a
>> way that we cannot skip or go previous. The FetchState is not exposed.
>
>
> Cassandra doesn't support skipping so this is not really a limitation of
> the driver.
>
>
> --
>
> [:>-a)
>
> Alex Popescu
> Sen. Product Manager @ DataStax
> @al3xandru
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to java-driver-user+unsubscr...@lists.datastax.com.
>


Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Hi Eric,

Thanks for your reply.

I am using Cassandra 2.0.11, and there I cannot append a condition like "last
clustering key column > value from the last row of the previous batch". It
fails with "Preceding column is either not restricted or by a non-EQ relation",
which means I need to specify an equality condition on all preceding clustering
key columns. With that, I cannot get the pagination right.

Thanks
Ajay
> I can't believe that everyone read & process all rows at once (without
pagination).

Probably not too many people try to read all rows in a table as a single
rolling operation with a standard client driver.  But those who do would
use token() to keep track of where they are and be able to resume with that
as well.

But it sounds like you're talking about paginating a subset of data -
larger than you want to process as a unit, but prefiltered by some other
criteria which prevents you from being able to rely on token().  For this
there is no general purpose solution, but it typically involves you
maintaining your own paging state, typically keeping track of the last
partitioning and clustering key seen, and using that to construct your next
query.

For example, we have client queries which can span several partitioning
keys.  We make sure that the List of partition keys generated by a given
client query List(Pq) is deterministic, then our paging state is the index
offset of the final Pq in the response, plus the value of the final
clustering column.  A query coming in with a paging state attached to it
starts the next set of queries from the provided Pq offset where
clusteringKey > the provided value.
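(A sketch of that paging state as a value object; the field types are
illustrative, not from the original post:)

// Page state for queries spanning several partitions: an index into the
// deterministic List(Pq) plus the last clustering value seen in that partition.
class PageState {
    final int partitionIndex;
    final Object lastClusteringValue;
    PageState(int partitionIndex, Object lastClusteringValue) {
        this.partitionIndex = partitionIndex;
        this.lastClusteringValue = lastClusteringValue;
    }
}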

So if you can just track partition key offset (if spanning multiple
partitions), and clustering key offset, you can construct your next query
from those instead.

On Tue, Feb 10, 2015 at 6:58 PM, Ajay  wrote:

> Thanks Alex.
>
> But is there any workaround possible?. I can't believe that everyone read
> & process all rows at once (without pagination).
>
> Thanks
> Ajay
> On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:
>
>>
>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>>
>>> 1) Java driver implicitly support Pagination in the ResultSet (using
>>> Iterator) which can be controlled through FetchSize. But it is limited in a
>>> way that we cannot skip or go previous. The FetchState is not exposed.
>>
>>
>> Cassandra doesn't support skipping so this is not really a limitation of
>> the driver.
>>
>>
>> --
>>
>> [:>-a)
>>
>> Alex Popescu
>> Sen. Product Manager @ DataStax
>> @al3xandru
>>
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to java-driver-user+unsubscr...@lists.datastax.com.
>>
>


Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Basically I am trying different queries with your approach.

One such query is like

Select * from mycf where <condition on partition key> order by ck1 asc, ck2
desc, where ck1 and ck2 are clustering keys in that order.

Here, how do we achieve pagination support?

Thanks
Ajay
On Feb 11, 2015 11:16 PM, "Ajay"  wrote:

>
> Hi Eric,
>
> Thanks for your reply.
>
> I am using Cassandra 2.0.11 and in that I cannot append condition like
> last clustering key column > value of the last row in the previous batch.
> It fails Preceding column is either not restricted or by a non-EQ relation.
> It means I need to specify equal  condition for all preceding clustering
> key columns. With this I cannot get the pagination correct.
>
> Thanks
> Ajay
> > I can't believe that everyone read & process all rows at once (without
> pagination).
>
> Probably not too many people try to read all rows in a table as a single
> rolling operation with a standard client driver.  But those who do would
> use token() to keep track of where they are and be able to resume with that
> as well.
>
> But it sounds like you're talking about paginating a subset of data -
> larger than you want to process as a unit, but prefiltered by some other
> criteria which prevents you from being able to rely on token().  For this
> there is no general purpose solution, but it typically involves you
> maintaining your own paging state, typically keeping track of the last
> partitioning and clustering key seen, and using that to construct your next
> query.
>
> For example, we have client queries which can span several partitioning
> keys.  We make sure that the List of partition keys generated by a given
> client query List(Pq) is deterministic, then our paging state is the
> index offset of the final Pq in the response, plus the value of the final
> clustering column.  A query coming in with a paging state attached to it
> starts the next set of queries from the provided Pq offset where
> clusteringKey > the provided value.
>
> So if you can just track partition key offset (if spanning multiple
> partitions), and clustering key offset, you can construct your next query
> from those instead.
>
> On Tue, Feb 10, 2015 at 6:58 PM, Ajay  wrote:
>
>> Thanks Alex.
>>
>> But is there any workaround possible?. I can't believe that everyone read
>> & process all rows at once (without pagination).
>>
>> Thanks
>> Ajay
>> On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:
>>
>>>
>>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>>>
>>>> 1) Java driver implicitly support Pagination in the ResultSet (using
>>>> Iterator) which can be controlled through FetchSize. But it is limited in a
>>>> way that we cannot skip or go previous. The FetchState is not exposed.
>>>
>>>
>>> Cassandra doesn't support skipping so this is not really a limitation of
>>> the driver.
>>>
>>>
>>> --
>>>
>>> [:>-a)
>>>
>>> Alex Popescu
>>> Sen. Product Manager @ DataStax
>>> @al3xandru
>>>
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to java-driver-user+unsubscr...@lists.datastax.com.
>>>
>>
>


Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ajay
Thanks Eric. I figured out the same but didn't get time to put it on the
mail.

But it is highly tied to how data is stored internally in Cassandra:
basically, how partition keys are used to distribute data (less likely to
change, and we do not depend directly on the partitioning algorithm) and how
clustering keys are used to sort the data within a partition (multi-level
sorting, hence the restrictions on the ORDER BY clause), which I think could
change down the line in Cassandra 3.x or 4.x for better storage or retrieval.

That said, I am hesitant to implement this client-side pagination logic
because a) pages 2+ might need more than one query to Cassandra, b) the
implementation is tied to Cassandra's internal storage details, which can
change (though not often), and c) in our case we are building REST APIs which
will be deployed on Tomcat clusters, so whatever we cache to support
pagination needs to be cached in a distributed way for failover support.

Pagination support is best done at the server side (like ROWNUM in SQL), or
better still in the Java driver, which can hide the internal details and
optimize better, since the server sends the paging state to the driver.
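For what it's worth, the driver (since 2.0.10/2.1) does expose that paging
state, so a stateless REST layer can hand it to the client and apply it on
the next request. A minimal sketch, assuming the same prepared statement and
fetch size are reused on the follow-up call (ps, pkValue and pageSize are
illustrative):

// First page
Statement first = ps.bind(pkValue).setFetchSize(pageSize);
ResultSet rs = session.execute(first);
// read at most pageSize rows here, without iterating past the current page
PagingState state = rs.getExecutionInfo().getPagingState();
String token = (state == null) ? null : state.toString();   // return this to the REST client

// Next page: the client sends the token back with its request
Statement next = ps.bind(pkValue).setFetchSize(pageSize);
if (token != null) {
    next.setPagingState(PagingState.fromString(token));
}
ResultSet nextPage = session.execute(next);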

Thanks
Ajay
On Feb 12, 2015 8:22 PM, "Eric Stevens"  wrote:

> Your page state then needs to track the last ck1 and last ck2 you saw.
> Pages 2+ will end up needing to be up to two queries if the first query
> doesn't fill the page size.
>
> CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
> If you're pulling the whole of partition 1 and your page size is 2, your
> first page looks like:
>
> *PAGE 1*
>
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   3 |3 |3
> 1 |   1 |   2 |2 |2
>
> You got enough rows to satisfy the page, Your page state is taken from the
> last row: (ck1=1, ck2=2)
>
>
> *PAGE 2*
> Notice that you have a page state, and add some limiting clauses on the
> statement:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   1 |1 |1
>
> Oops, we didn't get enough rows to satisfy the page limit, so we need to
> continue on, we just need one more:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   3 |6 |6
>
> We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
> 3).
>
>
> *PAGE 3*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   2 |5 |5
> 1 |   2 |   1 |4 |4
>
> Great, we satisfied this page with only one query, page state: (ck1 = 2,
> ck2 = 1).
>
>
> *PAGE 4*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
> (0 rows)
>
> Oops, our initial query was on the boundary of ck1, but this looks like
> any other time that the initial query returns < pageSize rows, we just move
> on to the next page:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
> (0 rows)
>
> Aha, we've exhausted ck1 as well, so there are no more pages, page 3
> actually pulled the last possible value; page 4 is empty, and we're all
> done.  Generally speaking you know you're done when your first clustering
> key is the only non-equality operator in the statement, and you got no rows
> back.
>
>
>
>
>
>
> On Wed, Feb 11, 2015 at 10:55 AM, Ajay  wrote:
>
>> Basically I am trying different queries with your approach.
>>
>> One such query is like
>>
>> Select * from mycf where condition on partition key order by ck1 asc, ck2
>> desc where ck1 and ck2 are clustering keys in that order.
>>
>> Here how do we achieve pagination support?
>>
>&

Re: Pagination support on Java Driver Query API

2015-02-13 Thread Ajay
The syntax suggested by Ondrej is not working in some cases in 2.0.11, and I
have logged an issue for it:

https://issues.apache.org/jira/browse/CASSANDRA-8797

Thanks
Ajay
On Feb 12, 2015 11:01 PM, "Bulat Shakirzyanov" <
bulat.shakirzya...@datastax.com> wrote:

> Fixed my Mail.app settings so you can see my actual name, sorry.
>
> On Feb 12, 2015, at 8:55 AM, DataStax 
> wrote:
>
> Hello,
>
> As was mentioned earlier, the Java driver doesn’t actually perform
> pagination.
>
> Instead, it uses cassandra native protocol to set page size of the result
> set. (
> https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730
> )
> When Cassandra sends the result back to the Java driver, it includes a
> binary token.
> This token represents paging state. To fetch the next page, the driver
> re-executes the same
> statement with original page size and paging state attached. If there is
> another page available,
> Cassandra responds with a new paging state that can be used to fetch it.
>
> You could also try reporting this issue on the Cassandra user mailing list.
>
> On Feb 12, 2015, at 8:35 AM, Eric Stevens  wrote:
>
> I don't know what the shape of the page state data is deep inside the
> JavaDriver, I've actually tried to dig into that in the past and understand
> it to see if I could reproduce it as a general purpose any-query kind of
> thing.  I gave up before I fully understood it, but I think it's actually a
> handle to an in-memory state maintained by the coordinator, which is only
> maintained for the lifetime of the statement (i.e. it's not stateless
> paging). That would make it a bad candidate for stateless paging scenarios
> such as REST requests where a typical setup would load balance across HTTP
> hosts, never mind across coordinators.
>
> It shouldn't be too much work to abstract this basic idea for manual
> paging into a general purpose class that takes List[ClusteringKeyDef[T,
> O<:Ordering]], and can produce a connection agnostic PageState from a
> ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.
>
>
>
> Also RE: possibly multiple queries to satisfy a page - yes, that's
> unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.
>
> On Thu, Feb 12, 2015 at 8:13 AM, Ajay  wrote:
>
>> Thanks Eric. I figured out the same but didn't get time to put it on the
>> mail. Thanks.
>>
>> But it is highly tied up to how data is stored internally in Cassandra.
>> Basically how partition keys are used to distribute (less likely to change.
>> We are not directly dependence on the partition algo) and clustering keys
>> are used to sort the data with in a partition( multi level sorting and
>> henceforth the restrictions on the ORDER BY clause) which I think can
>> change likely down the lane in Cassandra 3.x or 4.x in an different way for
>> some better storage or retrieval.
>>
>> Thats said I am hesitant to implement this client side logic for
>> pagination for a) 2+ queries might need more than one query to Cassandra.
>> b)  tied up implementation to Cassandra internal storage details which can
>> change(though not often). c) in our case, we are building REST Apis which
>> will be deployed Tomcat clusters. Hence whatever we cache to support
>> pagination, need to be cached in a distributed way for failover support.
>>
>> It (pagination support) is best done at the server side like ROWNUM in
>> SQL or better done in Java driver to hide the internal details and can be
>> optimized better as server sends the paging state with the driver.
>>
>> Thanks
>> Ajay
>> On Feb 12, 2015 8:22 PM, "Eric Stevens"  wrote:
>>
>>> Your page state then needs to track the last ck1 and last ck2 you saw.
>>> Pages 2+ will end up needing to be up to two queries if the first query
>>> doesn't fill the page size.
>>>
>>> CREATE TABLE foo (
>>>   partitionkey int,
>>>   ck1 int,
>>>   ck2 int,
>>>   col1 int,
>>>   col2 int,
>>>   PRIMARY KEY ((partitionkey), ck1, ck2)
>>> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>>>
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>>> INSERT INTO foo (partition

Caching the PreparedStatement (Java driver)

2015-02-27 Thread Ajay
Hi,

We are building REST APIs for Cassandra using the Cassandra Java Driver.

So, as per the guidelines below from the documentation, we are caching the
Cluster instance (per cluster) and the Session instance (per keyspace), as
they are thread-safe.
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html

As the Cluster and Session instance(s) are already cached in the application,
and as PreparedStatements provide better performance, we thought to build the
PreparedStatement for each REST query implicitly (as REST calls are stateless)
and cache the PreparedStatement. Whenever a REST query is invoked, we look for
a PreparedStatement in the cache, and create and put it in the cache if it
doesn't exist. (The cache is an in-memory, fixed-size, LRU-based cache.)
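For illustration, a minimal sketch of such a cache (the MAX_ENTRIES bound and
the plain synchronized map are assumptions for the example, not a production
recipe):

// LRU cache of PreparedStatements keyed by CQL string, using java.util.LinkedHashMap
// in access order. A benign race may prepare the same query twice, which is harmless.
private static final int MAX_ENTRIES = 1000;
private final Map<String, PreparedStatement> psCache =
    Collections.synchronizedMap(new LinkedHashMap<String, PreparedStatement>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, PreparedStatement> eldest) {
            return size() > MAX_ENTRIES;
        }
    });

PreparedStatement getOrPrepare(Session session, String cql) {
    PreparedStatement ps = psCache.get(cql);
    if (ps == null) {
        ps = session.prepare(cql);   // one round trip to the cluster
        psCache.put(cql, ps);
    }
    return ps;
}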

Is it a safe approach to cache PreparedStatements on the client side? Looking
at the Java driver code, the Cluster class stores the PreparedStatements as
weak references (to rebuild them when a node goes down or a new node is added).

Thanks
Ajay


Re: Caching the PreparedStatement (Java driver)

2015-02-28 Thread Ajay
Hi,

My earlier question was whether it is safe to cache PreparedStatements (using
the Java driver) on the client side, which Olivier confirmed.

Now the question is: do we really need to cache the PreparedStatement on the
client side?

Lets take a scenario as below:

1) Client fires a REST query "SELECT * from Test where Pk = val1";
2) REST service prepares a statement "SELECT * from Test where Pk = ?"
3) Executes the PreparedStatement by setting the values.
4) Assume we don't cache the PreparedStatement
5) Client fires another REST query "SELECT * from Test where Pk = val2";
6) REST service prepares a statement "SELECT * from Test where Pk = ?"
7) Executes the PreparedStatement by setting the values.

In this case, is there any benefit of using the PreparedStatement?
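For comparison, the pattern that client-side caching would enable is to
prepare once and only bind per request, roughly like this (a sketch; session
is an existing Session, and "val1"/"val2" are the values from the example
above):

PreparedStatement ps = session.prepare("SELECT * from Test where Pk = ?");   // one round trip to prepare
Row r1 = session.execute(ps.bind("val1")).one();   // later executions only ship the bound values
Row r2 = session.execute(ps.bind("val2")).one();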

From the Java driver code, Session.prepare(query) doesn't check whether a
similar query was prepared earlier or not; it directly calls the server,
passing the query. The return from the server is a PreparedId. Does the
server maintain a cache of prepared queries, or does it still perform all the
steps to prepare a query if the client prepares the same query more than once
(using the same Session and Cluster instance, which I think doesn't matter)?

Thanks
Ajay


On Sat, Feb 28, 2015 at 9:17 AM, Ajay  wrote:

> Thanks Olivier.
>
> Most of the REST query calls would come from other applications to
> write/read to/from Cassandra which means most queries from an application
> would be same (same column families but different  values).
>
> Thanks
> Ajay
> On 28-Feb-2015 6:05 am, "Olivier Michallat" <
> olivier.michal...@datastax.com> wrote:
>
>> Hi Ajay,
>>
>> Yes, it is safe to hold a reference to PreparedStatement instances in
>> your client code. If you always run the same pre-defined statements, you
>> can store them as fields in your resource classes.
>>
>> If your statements are dynamically generated (for example, inserting
>> different subsets of the columns depending on what was provided in the REST
>> payload), your caching approach is valid. When you evict a
>> PreparedStatement from your cache, the driver will also remove the
>> corresponding id from its internal cache. If you re-prepare it later it
>> might still be in the Cassandra-side cache, but that is not a problem.
>>
>> One caveat: you should be reasonably confident that your prepared
>> statements will be reused. If your query strings are always different,
>> preparing will bring no advantage.
>>
>> --
>>
>> Olivier Michallat
>>
>> Driver & tools engineer, DataStax
>>
>> On Fri, Feb 27, 2015 at 7:04 PM, Ajay  wrote:
>>
>>> Hi,
>>>
>>> We are building REST APIs for Cassandra using the Cassandra Java Driver.
>>>
>>> So as per the below guidlines from the documentation, we are caching the
>>> Cluster instance (per cluster) and the Session instance (per keyspace) as
>>> they are multi thread safe.
>>>
>>> http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html
>>>
>>> As the Cluster and Session instance(s) are cached in the application
>>> already and also as the PreparedStatement provide better performance, we
>>> thought to build the PreparedStatement for REST query implicitly (as REST
>>> calls are stateless) and cache the PreparedStatemen. Whenever a REST query
>>> is invoked, we look for a PreparedStatement in the cache and create and put
>>> it in the cache if it doesn't exists. (The cache is a in-memory fixed size
>>> LRU based).
>>>
>>> Is a safe approach to cache PreparedStatement in the client side?.
>>> Looking at the Java driver code, the Cluster class stores the
>>> PreparedStatements as a weak reference (to rebuild when a node is down or
>>> a  new node added).
>>>
>>> Thanks
>>> Ajay
>>>
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to java-driver-user+unsubscr...@lists.datastax.com.
>>>
>>
>>  To unsubscribe from this group and stop receiving emails from it, send
>> an email to java-driver-user+unsubscr...@lists.datastax.com.
>>
>


Optimal Batch size (Unlogged) for Java driver

2015-03-01 Thread Ajay
Hi,

I am looking at a way to compute the optimal batch size on the client side,
similar to the server-side issue mentioned below (it needs to be generic, as
we are exposing REST APIs for Cassandra and the column family and data are
different for each request).

https://issues.apache.org/jira/browse/CASSANDRA-6487

How do we compute (approximately, using ColumnDefinitions or ColumnMetadata)
the size of a row of a column family from the client side using the Cassandra
Java driver?
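One rough client-side approach (a sketch only: it assumes the 2.1 driver's
DataType.serialize and ignores per-cell overhead such as timestamps and TTLs,
so it is an approximation at best):

// Approximate serialized size of one bound row, using the statement's variable metadata.
long approximateRowSize(PreparedStatement ps, Object... values) {
    ColumnDefinitions vars = ps.getVariables();
    long size = 0;
    for (int i = 0; i < vars.size(); i++) {
        size += vars.getName(i).length();                                    // column name
        ByteBuffer bytes = vars.getType(i).serialize(values[i], ProtocolVersion.V2);
        size += (bytes == null) ? 0 : bytes.remaining();                     // value bytes
    }
    return size;
}
// Keep adding statements to the unlogged batch until the running total approaches
// the server's batch_size_warn_threshold_in_kb.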

Thanks
Ajay


Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
I have a column family with 15 columns: a timestamp, a timeuuid, a few text
fields and the rest int fields. If I calculate the size of each column name
and its value, and divide 5 KB (the recommended max size for a batch) by that
total, I get 12 as the result. Is that correct? Am I missing something?

Thanks
Ajay
On 02-Mar-2015 12:13 pm, "Ankush Goyal"  wrote:

> Hi Ajay,
>
> I would suggest, looking at the approximate size of individual elements in
> the batch, and based on that compute max size (chunk size).
>
> Its not really a straightforward calculation, so I would further suggest
> making that chunk size a runtime parameter that you can tweak and play
> around with until you reach stable state.
>
> On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:
>>
>> Hi,
>>
>> I am looking at a way to compute the optimal batch size in the client
>> side similar to the below mentioned bug in the server side (generic as we
>> are exposing REST APIs for Cassandra, the column family and the data are
>> different each request).
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>> <https://www.google.com/url?q=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCASSANDRA-6487&sa=D&sntz=1&usg=AFQjCNGOSliZnS1idXqTHXIr7aNfEN3mMg>
>>
>> How do we compute(approximately using ColumnDefintions or ColumnMetadata)
>> the size of a row of a column family from the client side using Cassandra
>> Java driver?
>>
>> Thanks
>> Ajay
>>
>  To unsubscribe from this group and stop receiving emails from it, send an
> email to java-driver-user+unsubscr...@lists.datastax.com.
>


Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
Hi Ankush,

We are already using PreparedStatements, and our case is time-series data as
well.

Thanks
Ajay
On 02-Mar-2015 10:00 pm, "Ankush Goyal"  wrote:

> Ajay,
>
> First of all, I would recommend using PreparedStatements, so you only
> would be sending the variable bound arguments over the wire. Second, I
> think that 5kb limit for WARN is too restrictive, and you could tune that
> on cassandra server side. I think if all you have is 15 columns (as long as
> their values are sanitized and do not go over certain limits), it should be
> fine to send all of them over at the same time. Chunking is necessary, when
> you have time-series type data (for writes) OR you might be reading a lot
> of data via IN query.
>
> On Monday, March 2, 2015 at 7:55:18 AM UTC-8, Ajay Garga wrote:
>>
>> I have a column family with 15 columns where there are timestamp,
>> timeuuid,  few text fields and rest int  fields.  If I calculate the size
>> of its column name  and it's value and divide 5kb (recommended max size for
>> batch) with the value,  I get result as 12. Is it correct?. Am I missing
>> something?
>>
>> Thanks
>> Ajay
>> On 02-Mar-2015 12:13 pm, "Ankush Goyal"  wrote:
>>
>>> Hi Ajay,
>>>
>>> I would suggest, looking at the approximate size of individual elements
>>> in the batch, and based on that compute max size (chunk size).
>>>
>>> Its not really a straightforward calculation, so I would further suggest
>>> making that chunk size a runtime parameter that you can tweak and play
>>> around with until you reach stable state.
>>>
>>> On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am looking at a way to compute the optimal batch size in the client
>>>> side similar to the below mentioned bug in the server side (generic as we
>>>> are exposing REST APIs for Cassandra, the column family and the data are
>>>> different each request).
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>>>> <https://www.google.com/url?q=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCASSANDRA-6487&sa=D&sntz=1&usg=AFQjCNGOSliZnS1idXqTHXIr7aNfEN3mMg>
>>>>
>>>> How do we compute(approximately using ColumnDefintions or
>>>> ColumnMetadata) the size of a row of a column family from the client side
>>>> using Cassandra Java driver?
>>>>
>>>> Thanks
>>>> Ajay
>>>>
>>>  To unsubscribe from this group and stop receiving emails from it, send
>>> an email to java-driver-us...@lists.datastax.com.
>>>
>>  To unsubscribe from this group and stop receiving emails from it, send
> an email to java-driver-user+unsubscr...@lists.datastax.com.
>


Adding a Cassandra node using OpsCenter

2015-03-11 Thread Ajay
Hi,

While adding a Cassandra node using OpsCenter (which is recommended), the
versions of Cassandra (DataStax Community edition) offered only go up to 2.0.9
and not later versions in 2.0.x. Is there a reason behind this? Is 2.0.9
recommended over 2.0.11?

Thanks
Ajay


Steps to do after schema changes

2015-03-11 Thread Ajay
Hi,

Are there any steps to perform (like running nodetool or restarting the node),
or any precautions to take, after schema changes are made to a column family,
say adding a new column or modifying any table properties?

Thanks
Ajay


Re: Adding a Cassandra node using OpsCenter

2015-03-12 Thread Ajay
Is there a separate forum for Opscenter?

Thanks
Ajay
On 11-Mar-2015 4:16 pm, "Ajay"  wrote:

> Hi,
>
> While adding a Cassandra node using OpsCenter (which is recommended), the
> versions of Cassandra (Datastax community edition) shows only 2.0.9 and not
> later versions in 2.0.x. Is there a reason behind it? 2.0.9 is recommended
> than 2.0.11?
>
> Thanks
> Ajay
>


Re: Stable cassandra build for production usage

2015-03-12 Thread Ajay
Hi,

We did our research using version 2.0.11. While preparing for the production
deployment, we found the following issues:

1) 2.0.12 has nodetool cleanup issue -
https://issues.apache.org/jira/browse/CASSANDRA-8718
2) 2.0.11 has nodetool issue -
https://issues.apache.org/jira/browse/CASSANDRA-8548
3) OpsCenter 5.1.0 supports only 2.0.9 and not later 2.0.x -
https://issues.apache.org/jira/browse/CASSANDRA-8072
4) 2.0.9 has schema refresh issue -
https://issues.apache.org/jira/browse/CASSANDRA-7734

Please suggest the best option for production deployment in EC2, given that
we are deploying a Cassandra cluster for the first time (so it is likely that
we will add more data centers/nodes and make schema changes in the initial
few months).

Thanks
Ajay

On Thu, Jan 1, 2015 at 9:49 PM, Neha Trivedi  wrote:

> Use 2.0.11 for production
>
> On Wed, Dec 31, 2014 at 11:50 PM, Robert Coli 
> wrote:
>
>> On Wed, Dec 31, 2014 at 8:38 AM, Ajay  wrote:
>>
>>> For my research and learning I am using Cassandra 2.1.2. But I see
>>> couple of mail threads going on issues in 2.1.2. So what is the stable or
>>> popular build for production in Cassandra 2.x series.
>>>
>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>
>> =Rob
>>
>
>


Re: Steps to do after schema changes

2015-03-12 Thread Ajay
Thanks Mark.

-
Ajay
On 12-Mar-2015 11:08 pm, "Mark Reddy"  wrote:

> It's always good to run "nodetool describecluster" after a schema change,
> this will show you all the nodes in your cluster and what schema version
> they have. If they have different versions you have a schema disagreement
> and should follow this guide to resolution:
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_handle_schema_disagree_t.html
>
> Regards,
> Mark
>
> On 12 March 2015 at 05:47, Phil Yang  wrote:
>
>> Usually, you have nothing to do. Changes will be synced to every nodes
>> automatically.
>>
>> 2015-03-12 13:21 GMT+08:00 Ajay :
>>
>>> Hi,
>>>
>>> Are there any steps to do (like nodetool or restart node) or any
>>> precautions after schema changes are done in a column family say adding a
>>> new column or modifying any table properties?
>>>
>>> Thanks
>>> Ajay
>>>
>>
>>
>>
>> --
>> Thanks,
>> Phil Yang
>>
>>
>


Re: Adding a Cassandra node using OpsCenter

2015-03-12 Thread Ajay
Thanks Nick.

Does it mean that only adding a new node with 2.0.10 or later is a problem?
Can a node that was added manually still be monitored from OpsCenter?

Thanks
Ajay
On 12-Mar-2015 10:19 pm, "Nick Bailey"  wrote:

> There isn't an OpsCenter specific mailing list no.
>
> To answer your question, the reason OpsCenter provisioning doesn't support
> 2.0.10 and 2.0.11 is due to
> https://issues.apache.org/jira/browse/CASSANDRA-8072.
>
> That bug unfortunately prevents OpsCenter provisioning from working
> correctly, but isn't serious outside of provisioning. OpsCenter may be able
> to come up with a workaround but at the moment those versions are
> unsupported. Sorry for inconvenience.
>
> -Nick
>
> On Thu, Mar 12, 2015 at 9:18 AM, Ajay  wrote:
>
>> Is there a separate forum for Opscenter?
>>
>> Thanks
>> Ajay
>> On 11-Mar-2015 4:16 pm, "Ajay"  wrote:
>>
>>> Hi,
>>>
>>> While adding a Cassandra node using OpsCenter (which is recommended),
>>> the versions of Cassandra (Datastax community edition) shows only 2.0.9 and
>>> not later versions in 2.0.x. Is there a reason behind it? 2.0.9 is
>>> recommended than 2.0.11?
>>>
>>> Thanks
>>> Ajay
>>>
>>
>


Re: Stable cassandra build for production usage

2015-03-17 Thread Ajay
Hi,

Now that 2.0.13 is out, I don't see that the nodetool cleanup issue
(https://issues.apache.org/jira/browse/CASSANDRA-8718) has been fixed yet.
The bug shows priority Minor. Is anybody else facing this issue?

Thanks
Ajay

On Thu, Mar 12, 2015 at 11:41 PM, Robert Coli  wrote:

> On Thu, Mar 12, 2015 at 10:50 AM, Ajay  wrote:
>
>> Please suggest what is the best option in this for production deployment
>> in EC2 given that we are deploying Cassandra cluster for the 1st time (so
>> likely that we add more data centers/nodes and schema changes in the
>> initial few months)
>>
>
> Voting for 2.0.13 is in process. I'd wait for that. But I don't need
> OpsCenter.
>
> =Rob
>
>


Re: Stable cassandra build for production usage

2015-03-17 Thread Ajay
Yes we see https://issues.apache.org/jira/browse/CASSANDRA-8716 in our
testing

Thanks
Ajay

On Tue, Mar 17, 2015 at 3:20 PM, Marcus Eriksson  wrote:

> Do you see the segfault or do you see
> https://issues.apache.org/jira/browse/CASSANDRA-8716 ?
>
> On Tue, Mar 17, 2015 at 10:34 AM, Ajay  wrote:
>
>> Hi,
>>
>> Now that 2.0.13 is out, I don't see nodetool cleanup issue(
>> https://issues.apache.org/jira/browse/CASSANDRA-8718) been fixed yet.
>> The bug show priority Minor. Anybody facing this issue?.
>>
>> Thanks
>> Ajay
>>
>> On Thu, Mar 12, 2015 at 11:41 PM, Robert Coli 
>> wrote:
>>
>>> On Thu, Mar 12, 2015 at 10:50 AM, Ajay  wrote:
>>>
>>>> Please suggest what is the best option in this for production
>>>> deployment in EC2 given that we are deploying Cassandra cluster for the 1st
>>>> time (so likely that we add more data centers/nodes and schema changes in
>>>> the initial few months)
>>>>
>>>
>>> Voting for 2.0.13 is in process. I'd wait for that. But I don't need
>>> OpsCenter.
>>>
>>> =Rob
>>>
>>>
>>
>>
>


When to use STCS/DTCS/LCS

2015-04-08 Thread Ajay
Hi,

What are the guidelines on when to use STCS/DTCS/LCS? The most reliable way
is to test with each of them and find the best fit, but are there any
guidelines or best practices (from experience) on which one to use when?

Thanks
Ajay


Re: Availability testing of Cassandra nodes

2015-04-08 Thread Ajay
Adding the Java driver forum.

We would also like to know more about this.

-
Ajay

On Wed, Apr 8, 2015 at 8:15 PM, Jack Krupansky 
wrote:

> Just a couple of quick comments:
>
> 1. The driver is supposed to be doing availability and load balancing
> already.
> 2. If your cluster is lightly loaded, it isn't necessary to be so precise
> with load balancing.
> 3. If your cluster is heavily loaded, it won't help. Solution is to expand
> your cluster so that precise balancing of requests (beyond what the driver
> does) is not required.
>
> Is there anything special about your use case that you feel is worth the
> extra treatment?
>
> If you are having problems with the driver balancing requests and properly
> detecting available nodes or see some room for improvement, make sure to
> the issues so that they can be fixed.
>
>
> -- Jack Krupansky
>
> On Wed, Apr 8, 2015 at 10:31 AM, Jiri Horky  wrote:
>
>> Hi all,
>>
>> we are thinking of how to best proceed with availability testing of
>> Cassandra nodes. It is becoming more and more apparent that it is rather
>> complex task. We thought that we should try to read and write to each
>> cassandra node to "monitoring" keyspace with a unique value with low
>> TTL. This helps to find an issue but it also triggers flapping of
>> unaffected hosts, as the key of the value which is beining inserted
>> sometimes belongs to an affected host and sometimes not. Now, we could
>> calculate the right value to insert so we can be sure it will hit the
>> host we are connecting to, but then, you have replication factor and
>> consistency level, so you can not be really sure that it actually tests
>> ability of the given host to write values.
>>
>> So we ended up thinking that the best approach is to connect to each
>> individual host, read some system keyspace (which might be on a
>> different disk drive...), which should be local, and then check several
>> JMX values that could indicate an error + JVM statitics (full heap, gc
>> overhead). Moreover, we will more monitor our applications that are
>> using cassandra (with mostly datastax driver) and try to get fail node
>> information from them.
>>
>> How others do the testing?
>>
>> Jirka H.
>>
>
>


Hive support on Cassandra

2015-05-05 Thread Ajay
Hi,

Does Apache Cassandra (not DSE) support Hive integration?

I found a couple of open-source efforts, but nothing is available currently.

Thanks
Ajay


Re: Hive support on Cassandra

2015-05-07 Thread Ajay
Thanks everyone.

Basically, we are looking at Hive because it supports more advanced queries
(CQL is limited by the data model).

Does Stratio support something similar to Hive?

Thanks
Ajay


On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña 
wrote:

> You may also find interesting https://github.com/Stratio/crossdata. This
> project provides batch and streaming capabilities for Cassandra and others
> databases though a SQL-like language.
>
> Disclaimer: I am an employee of Stratio
>
> 2015-05-07 17:29 GMT+02:00 :
>
>> You might also look at Apache Drill, which has support (I think alpha)
>> for ANSI SQL queries against Cassandra if that would suit your needs.
>>
>>
>> > On May 6, 2015, at 12:57 AM, Ajay  wrote:
>> >
>> > Hi,
>> >
>> > Does Apache Cassandra (not DSE) support Hive Integration?
>> >
>> > I found couple of open source efforts but nothing is available
>> currently.
>> >
>> > Thanks
>> > Ajay
>>
>>
>>
>
>
> --
>
> Andrés de la Peña
>
>
> <http://www.stratio.com/>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>


Re: Caching the PreparedStatement (Java driver)

2015-05-15 Thread Ajay
Hi Joseph,

The Java driver currently caches prepared statements, but using weak
references, i.e. the cache holds a statement only as long as the client code
holds a reference to it. That in turn means we need to cache them ourselves.

But I am also not sure what happens when a cached prepared statement is
executed after the Cassandra nodes restart. Is the server's prepared-statement
cache persisted, or is it in memory only? If it is in memory, how do we handle
stale prepared statements in our cache?

Thanks
Ajay


On Fri, May 15, 2015 at 6:28 PM, ja  wrote:

> Hi,
>
> Isn't it a good to have feature for the java driver to maintain a cache of
> PreparedStatements (PS) . Any reason why it's left to the application to do
> the same? . I am currently implementing a cache of PS that is loaded at app
> startup, but how do i ensure this cache is always good to use? . Say,
> there's a restart on the Cassandra server side, this cache would be stale
> and I assume the next use of a PS from cache would fail. Any way to recover
> from this.
>
> Thanks,
> Joseph
>
> On Sunday, March 1, 2015 at 12:46:14 AM UTC+5:30, Vishy Kasar wrote:
>>
>>
>> On Feb 28, 2015, at 4:25 AM, Ajay  wrote:
>>
>> Hi,
>>
>> My earlier question was whether it is safe to cache PreparedStatement
>> (using Java driver) in the client side for which I got it confirmed by
>> Olivier.
>>
>> Now the question is do we really need to cache the PreparedStatement in
>> the client side?.
>>
>> Lets take a scenario as below:
>>
>> 1) Client fires a REST query "SELECT * from Test where Pk = val1";
>> 2) REST service prepares a statement "SELECT * from Test where Pk = ?"
>> 3) Executes the PreparedStatement by setting the values.
>> 4) Assume we don't cache the PreparedStatement
>> 5) Client fires another REST query "SELECT * from Test where Pk = val2";
>> 6) REST service prepares a statement "SELECT * from Test where Pk = ?"
>> 7) Executes the PreparedStatement by setting the values.
>>
>>
>> You should avoid re-preparing the statement (step 6 above). When you
>> create a prepared statement, a round trip to server is involved. So you
>> should create it once and reuse it. You can bind it with different values
>> and execute the bound statement each time.
>>
>> In this case, is there any benefit of using the PreparedStatement?
>>
>> From the Java driver code, the Session.prepare(query) doesn't check
>> whether a similar query was prepared earlier or not. It directly call the
>> server passing the query. The return from the server is a PreparedId. Do
>> the server maintains a cache of Prepared queries or it still perform the
>> all the steps to prepare a query if the client calls to prepare the same
>> query more than once (using the same Session and Cluster instance which I
>> think doesn't matter)?.
>>
>> Thanks
>> Ajay
>>
>>
>> On Sat, Feb 28, 2015 at 9:17 AM, Ajay  wrote:
>>
>>> Thanks Olivier.
>>>
>>> Most of the REST query calls would come from other applications to
>>> write/read to/from Cassandra which means most queries from an application
>>> would be same (same column families but different  values).
>>>
>>> Thanks
>>> Ajay
>>> On 28-Feb-2015 6:05 am, "Olivier Michallat" 
>>> wrote:
>>>
>>>> Hi Ajay,
>>>>
>>>> Yes, it is safe to hold a reference to PreparedStatement instances in
>>>> your client code. If you always run the same pre-defined statements, you
>>>> can store them as fields in your resource classes.
>>>>
>>>> If your statements are dynamically generated (for example, inserting
>>>> different subsets of the columns depending on what was provided in the REST
>>>> payload), your caching approach is valid. When you evict a
>>>> PreparedStatement from your cache, the driver will also remove the
>>>> corresponding id from its internal cache. If you re-prepare it later it
>>>> might still be in the Cassandra-side cache, but that is not a problem.
>>>>
>>>> One caveat: you should be reasonably confident that your prepared
>>>> statements will be reused. If your query strings are always different,
>>>> preparing will bring no advantage.
>>>>
>>>> --
>>>> Olivier Michallat
>>>> Driver & tools engineer, DataStax
>>>>
>>>> On Fri, Feb 27, 2015 at 7:04 PM, Ajay  wrote:
>>>>
>>>>> Hi,
>>>>>
>>>&

Hbase vs Cassandra

2015-05-29 Thread Ajay
Hi,

I need some info on HBase vs Cassandra as a data store (in general, plus
specifically for time-series data).

A comparison along the following lines would help:
1: features
2: deployment and monitoring
3: performance
4: anything else

Thanks
Ajay


Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
any supporting (and promoting) it.
* Getting started is easier with Cassandra. For HBase you need to run HDFS
and Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small
clusters (< 50 nodes) and quickly degrading above that.
* HBase does not have a query language (but you can use Phoenix for full
SQL support)
* HBase does not have secondary indexes (having an eventually consistent
index, similar to what Cassandra has, is easy in HBase, but making it as
consistent as the rest of HBase is hard)

Thanks
Ajay


>
> On May 29, 2015, at 12:09 PM, Ajay  wrote:
>
> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>
>


Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi Jens,

The points listed weren't all from me. I posted the HBase vs Cassandra
question in both forums and consolidated the responses here for discussion.


On Mon, Jun 8, 2015 at 2:27 PM, Jens Rantil  wrote:

> Hi,
>
> Some minor comments:
>
> > 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
> for Cassandra but it doesn't support vnodes.
>
> Not entirely sure what you mean here, but we ran Cloudera for a while and
> Cloudera Manager was buggy and hard to debug. Overall, our experience
> wasn't very good. This was definitely also due to us not knowing how all
> the Cloudera packages were configured.
>

*>>> This is one of the responses I got from the HBase forum. DataStax
OpsCenter is there, but it seems it doesn't support the latest Cassandra
versions (we tried it a couple of times and there were bugs too).*

>
> > HBase is always consistent. Machine outages lead to inability to read
> or write data on that machine. With Cassandra you can always write.
>
> Sort of true. You can decide write consistency and throw an exception if
> write didn't go through consistently. However, do note that Cassandra will
> never rollback failed writes which means writes aren't atomic (as in ACID).
>
> *>>> If I understand correctly, you mean that when we write with QUORUM,
and Cassandra writes to some machines but fails to write to others, it throws
an exception if it can't satisfy QUORUM, leaving the data inconsistent without
rolling back?*


> We chose Cassandra over HBase mostly due to ease of managability. We are a
> small team, and my feeling is that you will want dedicated people taking
> care of a Hadoop cluster if you are going down the HBase path. A Cassandra
> cluster can be handled by a single engineer and is, in my opinion, easier
> to maintain.
>

*>>> This is the most popular reason for Cassandra over HBase. But this
alone is not a sufficient driver. *


> Cheers,
> Jens
>
> On Mon, Jun 8, 2015 at 9:59 AM, Ajay  wrote:
>
>> Hi All,
>>
>> Thanks for all the input. I posted the same question in HBase forum and
>> got more response.
>>
>> Posting the consolidated list here.
>>
>> Our case is that a central team builds and maintains the platform
>> (Cassandra as a service). We have a couple of use cases which fit Cassandra,
>> like time-series data. But as a platform team, we need to know more features
>> and use cases which fit, or are best handled by, Cassandra, and also to
>> understand the use cases where HBase performs better (we might need to have
>> it as a service too).
>>
>> *Cassandra:*
>>
>> 1) From 2013 both can still be relevant:
>> http://www.pythian.com/blog/watch-hbase-vs-cassandra/
>>
>> 2) Here are some use cases from PlanetCassandra.org of companies who
>> chose Cassandra over HBase after evaluation, or migrated to Cassandra from
>> HBase.
>> The eComNext interview cited on the page touches on time-series data;
>> http://planetcassandra.org/hbase-to-cassandra-migration/
>>
>> 3) From googling, the most popular advantages of Cassandra over HBase are
>> ease of deployment, maintenance & monitoring, and no single point of failure.
>>
>> 4) From our six months of research and POC experience with Cassandra, CQL is
>> pretty limited. Though CQL is targeted at real-time reads and writes, there
>> are cases where we need to pull out data differently and are OK with a little
>> more latency. But Cassandra doesn't support that; we need MapReduce or Spark
>> for those. Then the debate starts: why Cassandra and not HBase, if we need
>> Hadoop/Spark for MapReduce anyway?
>>
>> I expected a few more technical features/use cases that are best handled by
>> Cassandra (and how they work).
>>
>> *HBase:*
>>
>> 1) As for the #4 you might be interested in reading
>> https://aphyr.com/posts/294-call-me-maybe-cassandra
>> Not sure if there is comparable article about HBase (anybody knows?) but
>> it can give you another perspective about what else to keep an eye on
>> regarding these systems.
>>
>> 2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe
>>
>> 3) http://blog.parsely.com/post/1928/cass/
>> *Anyone have any comments on this?*
>>
>> 4) 1. No killer features comparing to hbase
>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>> for Cassandra but it doesn't support vnodes.
>> 3. Rumors say it fast when it works;) the reason- it can silently drop
>> data you try to write.
>> 4. Timeseries is a nightmare. The easiest approach is just replicate data
>> to hdfs, partition it by hour/day and 

Re: Basic query in setting up secure inter-dc cluster

2016-04-25 Thread Ajay Garg
Hi Everyone.

Kindly reply in "yes" or "no", as to whether it is possible to setup
encryption only between particular pair of nodes?
Or is it an "all" or "none" feature, where encryption is present between
EVERY PAIR of nodes, or in NO PAIR of nodes.


Thanks and Regards,
Ajay

On Mon, Apr 18, 2016 at 9:55 AM, Ajay Garg  wrote:

> Also, wondering what is the difference between "all" and "dc" in
> "internode_encryption".
> Perhaps my answer lies in this?
>
> On Mon, Apr 18, 2016 at 9:51 AM, Ajay Garg  wrote:
>
>> Ok, trying to wake up this thread again.
>>
>> I went through the following links ::
>>
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
>>
>>
>> and I am wondering *if it is possible to setup secure
>> inter-communication only between some nodes*.
>>
>> In particular, if I have a 2*2 cluster, is it possible to setup secure
>> communication ONLY between the nodes of DC2?
>> Once it works well, we would then setup secure-communication everywhere.
>>
>> We are wanting this, because DC2 is the backup centre, while DC1 is the
>> primary-centre connected directly to the application-server. We don't want
>> to screw things if something goes bad in DC1.
>>
>>
>> Will be grateful for pointers.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg 
>> wrote:
>>
>>> Hi All.
>>>
>>> A gentle query-reminder.
>>>
>>> I will be grateful if I could be given a brief technical overview, as to
>>> how secure-communication occurs between two nodes in a cluster.
>>>
>>> Please note that I wish for some information on the "how it works below
>>> the hood", and NOT "how to set it up".
>>>
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg 
>>> wrote:
>>>
>>>> Thanks everyone for the reply.
>>>>
>>>> I actually have a fair bit of questions, but it will be nice if someone
>>>> could please tell me the flow (implementation-wise), as to how node-to-node
>>>> encryption works in a cluster.
>>>>
>>>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>>>> (with *"require_client_auth: false*").
>>>> I presume it would be like below (please correct me if am wrong) ::
>>>>
>>>> a)
>>>> node1 tries to connect to node2, using the certificate *as defined on
>>>> node1* in cassandra.yaml.
>>>>
>>>> b)
>>>> node2 will confirm if the certificate being offered by node1 is in the
>>>> truststore *as defined on node2* in cassandra.yaml.
>>>> if it is, secure-communication is allowed.
>>>>
>>>>
>>>> Is my thinking right?
>>>> I
>>>>
>>>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave 
>>>> wrote:
>>>>
>>>>> Hi Ajay,
>>>>> Have a look here :
>>>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>>>
>>>>> You can configure for DC level Security:
>>>>>
>>>>> Procedure
>>>>>
>>>>> On each node under sever_encryption_options:
>>>>>
>>>>>- Enable internode_encryption.
>>>>>The available options are:
>>>>>   - all
>>>>>   - none
>>>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>>>
>>>>> regards
>>>>>
>>>>> Neha
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>>>> absi...@informatica.com> wrote:
>>>>>
>>>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All.
>>>>>>
>>>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>>>
>>>>>> As a first stage, we wish to implement inter-dc security.
>>>>>>
>>>>>> Is it possible to enable security one machine at a time?
>>>>>>
>>>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>>>
>>>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>>>> AFTER the changes are made in all the 4 machines?
>>>>>>
>>>>>> Asking here, because I don't want to screw up a live cluster due to
>>>>>> my lack of experience.
>>>>>>
>>>>>> Looking forward to some pointers.
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Regards,
>>>>>> Ajay
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Ajay Garg
Hi All.

Facing a very weird issue, wherein the command

*/etc/init.d/cassandra start*

causes cassandra to start when the command is run from command-line.


However, if I put the above as a cron job



** * * * * /etc/init.d/cassandra start*
cassandra never starts.


I have checked, and "cron" service is running.


Any ideas what might be wrong?
I am pasting the cassandra script for brevity.


Thanks and Regards,
Ajay



#! /bin/sh
### BEGIN INIT INFO
# Provides:  cassandra
# Required-Start:$remote_fs $network $named $time
# Required-Stop: $remote_fs $network $named $time
# Should-Start:  ntp mdadm
# Should-Stop:   ntp mdadm
# Default-Start: 2 3 4 5
# Default-Stop:  0 1 6
# Short-Description: distributed storage system for structured data
# Description:   Cassandra is a distributed (peer-to-peer) system for
#the management and storage of structured data.
### END INIT INFO

# Author: Eric Evans 

DESC="Cassandra"
NAME=cassandra
PIDFILE=/var/run/$NAME/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
CONFDIR=/etc/cassandra
WAIT_FOR_START=10
CASSANDRA_HOME=/usr/share/cassandra
FD_LIMIT=10

[ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
[ -e /etc/cassandra/cassandra.yaml ] || exit 0
[ -e /etc/cassandra/cassandra-env.sh ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Read Cassandra environment file.
. /etc/cassandra/cassandra-env.sh

if [ -z "$JVM_OPTS" ]; then
echo "Initialization failed; \$JVM_OPTS not set!" >&2
exit 3
fi

export JVM_OPTS

# Export JAVA_HOME, if set.
[ -n "$JAVA_HOME" ] && export JAVA_HOME

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that returns 0 if process is running, or nonzero if not.
#
# The nonzero value is 3 if the process is simply not running, and 1 if the
# process is not running but the pidfile exists (to match the exit codes for
# the "status" command; see LSB core spec 3.1, section 20.2)
#
CMD_PATT="cassandra.+CassandraDaemon"
is_running()
{
if [ -f $PIDFILE ]; then
pid=`cat $PIDFILE`
grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
return 1
fi
return 3
}
#
# Function that starts the daemon/service
#
do_start()
{
# Return
#   0 if daemon has been started
#   1 if daemon was already running
#   2 if daemon could not be started

ulimit -l unlimited
ulimit -n "$FD_LIMIT"

cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
error_log_f="$cassandra_home/hs_err_`date +%s`.log"

[ -e `dirname "$PIDFILE"` ] || \
install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`



start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q -p
"$PIDFILE" -t >/dev/null || return 1

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p
"$PIDFILE" -- \
-p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null ||
return 2

}

#
# Function that stops the daemon/service
#
do_stop()
{
# Return
#   0 if daemon has been stopped
#   1 if daemon was already stopped
#   2 if daemon could not be stopped
#   other if a failure occurred
start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
RET=$?
rm -f "$PIDFILE"
return $RET
}

case "$1" in
  start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  restart|force-reload)
log_daemon_msg "Restarting $DESC" "$NAME"
do_stop
case "$?" in
  0|1)
do_start
case "$?" in
  0|1)
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log

Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
Tried everything.
Every other cron job/script I try works; just the cassandra service does not.

On Wed, Jan 11, 2017 at 8:51 AM, Edward Capriolo 
wrote:

>
>
> On Tuesday, January 10, 2017, Jonathan Haddad  wrote:
>
>> Last I checked, cron doesn't load the same, full environment you see when
>> you log in. Also, why put Cassandra on a cron?
>> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal  wrote:
>>
>>> Hi Ajay,
>>>
>>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>>
>>> Thanks & Regards,
>>>
>>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg 
>>> wrote:
>>>
>>>> Hi All.
>>>>
>>>> Facing a very weird issue, wherein the command
>>>>
>>>> */etc/init.d/cassandra start*
>>>>
>>>> causes cassandra to start when the command is run from command-line.
>>>>
>>>>
>>>> However, if I put the above as a cron job
>>>>
>>>>
>>>>
>>>> ** * * * * /etc/init.d/cassandra start*
>>>> cassandra never starts.
>>>>
>>>>
>>>> I have checked, and "cron" service is running.
>>>>
>>>>
>>>> Any ideas what might be wrong?
>>>> I am pasting the cassandra script for brevity.
>>>>
>>>>
>>>> Thanks and Regards,
>>>> Ajay
>>>>
>>>>
>>>> 
>>>> 
>>>> #! /bin/sh
>>>> ### BEGIN INIT INFO
>>>> # Provides:  cassandra
>>>> # Required-Start:$remote_fs $network $named $time
>>>> # Required-Stop: $remote_fs $network $named $time
>>>> # Should-Start:  ntp mdadm
>>>> # Should-Stop:   ntp mdadm
>>>> # Default-Start: 2 3 4 5
>>>> # Default-Stop:  0 1 6
>>>> # Short-Description: distributed storage system for structured data
>>>> # Description:   Cassandra is a distributed (peer-to-peer) system
>>>> for
>>>> #the management and storage of structured data.
>>>> ### END INIT INFO
>>>>
>>>> # Author: Eric Evans 
>>>>
>>>> DESC="Cassandra"
>>>> NAME=cassandra
>>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>>> SCRIPTNAME=/etc/init.d/$NAME
>>>> CONFDIR=/etc/cassandra
>>>> WAIT_FOR_START=10
>>>> CASSANDRA_HOME=/usr/share/cassandra
>>>> FD_LIMIT=10
>>>>
>>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>>
>>>> # Read configuration variable file if it is present
>>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>>
>>>> # Read Cassandra environment file.
>>>> . /etc/cassandra/cassandra-env.sh
>>>>
>>>> if [ -z "$JVM_OPTS" ]; then
>>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>>> exit 3
>>>> fi
>>>>
>>>> export JVM_OPTS
>>>>
>>>> # Export JAVA_HOME, if set.
>>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>>
>>>> # Load the VERBOSE setting and other rcS variables
>>>> . /lib/init/vars.sh
>>>>
>>>> # Define LSB log_* functions.
>>>> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
>>>> . /lib/lsb/init-functions
>>>>
>>>> #
>>>> # Function that returns 0 if process is running, or nonzero if not.
>>>> #
>>>> # The nonzero value is 3 if the process is simply not running, and 1 if
>>>> the
>>>> # process is not running but the pidfile exists (to match the exit
>>>> codes for
>>>> # the "status" command; see LSB core spec 3.1, section 20.2)
>>>> #
>>>> CMD_PATT="cassandra.+CassandraDaemon"
>>>> is_running()
>>>> {
>>>> if [ -f $PIDFILE ]; then
>>>> pid=`cat $PIDFILE`
>>>> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return
>>>> 0
>>>> return 1
>>>> fi
>

Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
On Wed, Jan 11, 2017 at 8:29 PM, Martin Schröder  wrote:

> 2017-01-11 15:42 GMT+01:00 Ajay Garg :
> > Tried everything.
>
> Then try
>service cassandra start
> or
>systemctl start cassandra
>
> You still haven't explained to us why you want to start cassandra every
> minute.
>

Hi Martin.

Sometimes, the Cassandra process gets killed (reason unknown as of now).
Doing a manual "service cassandra start" works then.

Adding this in cron would at least ensure that the maximum downtime is 59
seconds (till the root cause of the Cassandra crashes is known).
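
For what it's worth, a minimal sketch of such a watchdog entry in root's crontab (it assumes the Debian init script at /etc/init.d/cassandra and default paths; the log file location is an arbitrary choice). Setting PATH explicitly works around cron's stripped-down environment, and the pgrep guard only attempts "start" when no CassandraDaemon process is running:

# edit with: sudo crontab -e
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# every minute: start Cassandra only if it is not already running
* * * * * pgrep -f CassandraDaemon > /dev/null || /etc/init.d/cassandra start >> /tmp/cassandra-cron.log 2>&1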



>
> Best
>Martin
>



-- 
Regards,
Ajay


Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
Hi Hannu.

On Wed, Jan 11, 2017 at 8:31 PM, Hannu Kröger  wrote:

> One possible reason is that the cassandra process gets a different user when run
> differently. Check who owns the data files, and check also what gets written
> into /var/log/cassandra/system.log (or whatever that was).
>

Absolutely nothing gets written to /var/log/cassandra/system.log (when
trying to invoke cassandra via cron).
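
A couple of quick checks along the lines Hannu suggests (a sketch, assuming a stock Ubuntu package install; paths may differ). The first confirms ownership, the second confirms cron actually fired the job:

# who owns the data, log and pid directories? (the init script runs Cassandra as the "cassandra" user)
ls -ld /var/lib/cassandra /var/log/cassandra /var/run/cassandra
# did cron run the job at all, and when?
sudo grep CRON /var/log/syslog | grep cassandra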


>
> Hannu
>
>
> On 11 Jan 2017, at 16.42, Ajay Garg  wrote:
>
> Tried everything.
> Every other cron job/script I try works, just the cassandra-service does
> not.
>
> On Wed, Jan 11, 2017 at 8:51 AM, Edward Capriolo 
> wrote:
>
>>
>>
>> On Tuesday, January 10, 2017, Jonathan Haddad  wrote:
>>
>>> Last I checked, cron doesn't load the same, full environment you see
>>> when you log in. Also, why put Cassandra on a cron?
>>> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal  wrote:
>>>
>>>> Hi Ajay,
>>>>
>>>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>>>
>>>> Thanks & Regards,
>>>>
>>>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg 
>>>> wrote:
>>>>
>>>>> Hi All.
>>>>>
>>>>> Facing a very weird issue, wherein the command
>>>>>
>>>>> /etc/init.d/cassandra start
>>>>>
>>>>> causes cassandra to start when the command is run from command-line.
>>>>>
>>>>>
>>>>> However, if I put the above as a cron job
>>>>>
>>>>>
>>>>>
>>>>> * * * * * /etc/init.d/cassandra start
>>>>> cassandra never starts.
>>>>>
>>>>>
>>>>> I have checked, and "cron" service is running.
>>>>>
>>>>>
>>>>> Any ideas what might be wrong?
>>>>> I am pasting the cassandra script for brevity.
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>> Ajay
>>>>>
>>>>>
>>>>> 
>>>>> 
>>>>> #! /bin/sh
>>>>> ### BEGIN INIT INFO
>>>>> # Provides:  cassandra
>>>>> # Required-Start:$remote_fs $network $named $time
>>>>> # Required-Stop: $remote_fs $network $named $time
>>>>> # Should-Start:  ntp mdadm
>>>>> # Should-Stop:   ntp mdadm
>>>>> # Default-Start: 2 3 4 5
>>>>> # Default-Stop:  0 1 6
>>>>> # Short-Description: distributed storage system for structured data
>>>>> # Description:   Cassandra is a distributed (peer-to-peer) system
>>>>> for
>>>>> #the management and storage of structured data.
>>>>> ### END INIT INFO
>>>>>
>>>>> # Author: Eric Evans 
>>>>>
>>>>> DESC="Cassandra"
>>>>> NAME=cassandra
>>>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>>>> SCRIPTNAME=/etc/init.d/$NAME
>>>>> CONFDIR=/etc/cassandra
>>>>> WAIT_FOR_START=10
>>>>> CASSANDRA_HOME=/usr/share/cassandra
>>>>> FD_LIMIT=10
>>>>>
>>>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>>>
>>>>> # Read configuration variable file if it is present
>>>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>>>
>>>>> # Read Cassandra environment file.
>>>>> . /etc/cassandra/cassandra-env.sh
>>>>>
>>>>> if [ -z "$JVM_OPTS" ]; then
>>>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>>>> exit 3
>>>>> fi
>>>>>
>>>>> export JVM_OPTS
>>>>>
>>>>> # Export JAVA_HOME, if set.
>>>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>>>
>>>>> # Load the VERBOSE setting and other rcS variables
>>>>> . /lib/init/vars.sh
>>>>>
>>>>> # Define LSB log_* functions.
>>>>> # Depend on lsb-base (>= 3.0-6) to ensure that this file is prese

Test Subject

2015-09-14 Thread Ajay Garg
Testing simple content, as my previous email bounced :(

-- 
Regards,
Ajay


Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

We have set up an Ubuntu 14.04 server, and followed the steps exactly as
per http://wiki.apache.org/cassandra/DebianPackaging

Installation completes fine and Cassandra starts fine; however, cqlsh does not work.
We get the error ::

###
ajay@comp:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(None, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
None")})
###



Version-Info ::

###
ajay@comp:~$ dpkg -l | grep cassandra
ii  cassandra   2.1.9
 all  distributed storage system for structured data
###



The port "seems" to be opened fine.

###
ajay@comp:~$ netstat -an | grep 9042
tcp6   0  0 127.0.0.1:9042  :::*LISTEN
###



Firewall-filters ::

###
ajay@comp:~$ sudo iptables -L
[sudo] password for ajay:
Chain INPUT (policy ACCEPT)
target prot opt source   destination
ACCEPT all  --  anywhere anywhere state
RELATED,ESTABLISHED
ACCEPT tcp  --  anywhere anywhere tcp dpt:ssh
DROP   all  --  anywhere anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source   destination

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination
###



Even telnet fails :(

###
ajay@comp:~$ telnet localhost 9042
Trying 127.0.0.1...
###



Any ideas please?? We have been stuck on this for a good 3 hours now :(



Thanks and Regards,
Ajay


Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

Thanks for your replies.

a)
cqlsh  does not work either :(


b)
Following are the parameters as asked ::

listen_address: localhost
rpc_address: localhost

broadcast_rpc_address is not set.
According to the yaml file ::

# RPC address to broadcast to drivers and other Cassandra nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4


c)
Following is the netstat-output, with process information ::

###
ajay@comp:~$ sudo netstat -apn | grep 9042
[sudo] password for admin:
tcp6   0  0 127.0.0.1:9042  :::*
LISTEN  10169/java
###


Kindly let me know what else we can try .. it is really driving us nuttsss :(

On Mon, Sep 14, 2015 at 9:40 PM, Jared Biel
 wrote:
> Whoops, I accidentally pressed a hotkey and sent my message prematurely.
> Here's what netstat should look like with those settings:
>
> sudo netstat -apn | grep 9042
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
> 21248/java
>
> -Jared
>
> On 14 September 2015 at 16:09, Jared Biel 
> wrote:
>>
>> I assume "@ Of node" is ethX's IP address? Has cassandra been restarted
>> since changes were made to cassandra.yaml? The netstat output that you
>> posted doesn't look right; we use settings similar to what you've posted.
>> Here's what it looks like on one of our nodes.
>>
>>
>> -Jared
>>
>> On 14 September 2015 at 10:34, Ahmed Eljami 
>> wrote:
>>>
>>> In cassandra.yaml:
>>> listen_address:@ Of node
>>> rpc_address:0.0.0.0
>>>
>>> broadcast_rpc_address:@ Of node
>>>
>>> 2015-09-14 11:31 GMT+01:00 Neha Dave :
>>>>
>>>> Try
>>>> >cqlsh 
>>>>
>>>> regards
>>>> Neha
>>>>
>>>> On Mon, Sep 14, 2015 at 3:53 PM, Ajay Garg 
>>>> wrote:
>>>>>
>>>>> Hi All.
>>>>>
>>>>> We have setup a Ubuntu-14.04 server, and followed the steps exactly as
>>>>> per http://wiki.apache.org/cassandra/DebianPackaging
>>>>>
>>>>> Installation completes fine, Cassandra starts fine, however cqlsh does
>>>>> not work.
>>>>> We get the error ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ cqlsh
>>>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>>>> error(None, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>>>>> None")})
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> Version-Info ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ dpkg -l | grep cassandra
>>>>> ii  cassandra   2.1.9
>>>>>  all  distributed storage system for structured data
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> The port "seems" to be opened fine.
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ netstat -an | grep 9042
>>>>> tcp6   0  0 127.0.0.1:9042  :::*
>>>>> LISTEN
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> Firewall-filters ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ sudo iptables -L
>>>>> [sudo] password for ajay:
>>>>>

Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi Jared.

Thanks for your help.

I made the config-changes.
Also, I changed the seed (right now, we are just trying to get one
instance up and running) ::


seed_provider:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring.  You must change this if you are running
# multiple nodes!
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  # seeds is actually a comma-delimited list of addresses.
  # Ex: ",,"
  - seeds: "our.ip.address.here"





Following is the netstat output ::

####
ajay@comp:~$ sudo netstat -apn | grep 9042
tcp6   0  0 0.0.0.0:9042:::*
LISTEN  22469/java




Still, when I try, we get ::

####
ajay@comp:~$ cqlsh our.ip.address.here
Connection error: ('Unable to connect to any servers',
{'our.ip.address.here': error(None, "Tried connecting to
[('our.ip.address.here', 9042)]. Last error: None")})



:( :(

On Mon, Sep 14, 2015 at 11:00 PM, Jared Biel
 wrote:
> Is there a reason that you're setting listen_address and rpc_address to
> localhost?
>
> listen_address doc: "the Right Thing is to use the address associated with
> the hostname". So, set the IP address of this to eth0 for example. I believe
> if it is set to localhost then you won't be able to form a cluster with
> other nodes.
>
> rpc_address: this is the address to which clients will connect. I recommend
> 0.0.0.0 here so clients can connect to IP address of the server as well as
> localhost if they happen to reside on the same instance.
>
>
> Here are all of the address settings from our config file. 192.168.1.10 is
> the IP address of eth0 and broadcast_address is commented out.
>
> listen_address: 192.168.1.10
> # broadcast_address: 1.2.3.4
> rpc_address: 0.0.0.0
> broadcast_rpc_address: 192.168.1.10
>
> Follow these directions to get up and running with the first node
> (destructive process):
>
> 1. Stop cassandra
> 2. Remove data from cassandra var directory (rm -rf /var/lib/cassandra/*)
> 3. Make above changes to config file. Also set seeds to the eth0 IP address
> 4. Start cassandra
> 5. Set seeds in config file back to "" after cassandra is up and running.
>
> After following that process, you'll be able to connect to the node from any
> host that can reach Cassandra's ports on that node ("cqlsh" command will
> work.) To join more nodes to the cluster, follow the same steps as
> above, except set the seeds value to the IP address of an already running node.
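
A rough shell transcription of the steps above, as a sketch only (it assumes a Debian/Ubuntu package install, the default data directory, and that the yaml edits in step 3 are done by hand as described):

sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/*        # destructive: wipes all local data
# edit /etc/cassandra/cassandra.yaml: listen_address, rpc_address,
# broadcast_rpc_address and seeds, as described above
sudo service cassandra start
nodetool status                         # wait until the node shows "UN"
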
>
> Regarding the empty "seeds" config entry: our configs are automated with
> configuration management. During the node bootstrap process a script
> performs the above. The reason that we set seeds back to empty is that we
> don't want nodes coming up/down to cause the config file to change and thus
> cassandra to restart needlessly. So far we haven't had any issues with seeds
> being set to empty after a node has joined the cluster, but this may not be
> the recommended way of doing things.
>
> -Jared
>
> On 14 September 2015 at 16:46, Ajay Garg  wrote:
>>
>> Hi All.
>>
>> Thanks for your replies.
>>
>> a)
>> cqlsh  does not work either :(
>>
>>
>> b)
>> Following are the parameters as asked ::
>>
>> listen_address: localhost
>> rpc_address: localhost
>>
>> broadcast_rpc_address is not set.
>> According to the yaml file ::
>>
>> # RPC address to broadcast to drivers and other Cassandra nodes. This
>> cannot
>> # be set to 0.0.0.0. If left blank, this will be set to the value of
>> # rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address
>> must
>> # be set.
>> # broadcast_rpc_address: 1.2.3.4
>>
>>
>> c)
>> Following is the netstat-output, with process information ::
>>
>>
>> ###
>> ajay@comp:~$ sudo netstat -apn | grep 9042
>> [sudo] password for admin:
>> tcp6   0  0 127.0.0.1:9042  :::*
>> LISTEN  10169/java
>>
>> 

Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

I re-established my server from scratch, and installed the 21x server.
Now, cqlsh works right out of the box.

When I had last set up the server, I had (accidentally) installed the
20x server on the first attempt, removed it, and then installed the 21x
series server. That seems to have caused some hidden problem.


I am heartfully grateful to everyone for bearing with me.


Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 10:16 AM, Ajay Garg  wrote:
> Hi Jared.
>
> Thanks for your help.
>
> I made the config-changes.
> Also, I changed the seed (right now, we are just trying to get one
> instance up and running) ::
>
> 
> seed_provider:
> # Addresses of hosts that are deemed contact points.
> # Cassandra nodes use this list of hosts to find each other and learn
> # the topology of the ring.  You must change this if you are running
> # multiple nodes!
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   # seeds is actually a comma-delimited list of addresses.
>   # Ex: ",,"
>   - seeds: "our.ip.address.here"
> 
>
>
>
>
> Following is the netstat output ::
>
> 
> ajay@comp:~$ sudo netstat -apn | grep 9042
> tcp6   0  0 0.0.0.0:9042:::*
> LISTEN  22469/java
> 
>
>
>
> Still, when I try, we get ::
>
> 
> ajay@comp:~$ cqlsh our.ip.address.here
> Connection error: ('Unable to connect to any servers',
> {'our.ip.address.here': error(None, "Tried connecting to
> [('our.ip.address.here', 9042)]. Last error: None")})
> 
>
>
> :( :(
>
> On Mon, Sep 14, 2015 at 11:00 PM, Jared Biel
>  wrote:
>> Is there a reason that you're setting listen_address and rpc_address to
>> localhost?
>>
>> listen_address doc: "the Right Thing is to use the address associated with
>> the hostname". So, set the IP address of this to eth0 for example. I believe
>> if it is set to localhost then you won't be able to form a cluster with
>> other nodes.
>>
>> rpc_address: this is the address to which clients will connect. I recommend
>> 0.0.0.0 here so clients can connect to IP address of the server as well as
>> localhost if they happen to reside on the same instance.
>>
>>
>> Here are all of the address settings from our config file. 192.168.1.10 is
>> the IP address of eth0 and broadcast_address is commented out.
>>
>> listen_address: 192.168.1.10
>> # broadcast_address: 1.2.3.4
>> rpc_address: 0.0.0.0
>> broadcast_rpc_address: 192.168.1.10
>>
>> Follow these directions to get up and running with the first node
>> (destructive process):
>>
>> 1. Stop cassandra
>> 2. Remove data from cassandra var directory (rm -rf /var/lib/cassandra/*)
>> 3. Make above changes to config file. Also set seeds to the eth0 IP address
>> 4. Start cassandra
>> 5. Set seeds in config file back to "" after cassandra is up and running.
>>
>> After following that process, you'll be able to connect to the node from any
>> host that can reach Cassandra's ports on that node ("cqlsh" command will
>> work.) To join more nodes to the cluster, follow the same steps as
>> above, except set the seeds value to the IP address of an already running node.
>>
>> Regarding the empty "seeds" config entry: our configs are automated with
>> configuration management. During the node bootstrap process a script
>> performs the above. The reason that we set seeds back to empty is that we
>> don't want nodes coming up/down to cause the config file to change and thus
>> cassandra to restart needlessly. So far we haven't had any issues with seeds
>> being set to empty after a node has joined the cluster, but this may not be
>> the recommended way of doing things.
>>
>> -Jared
>>
>> On 14 September 2015 at 16:46, Ajay Garg  wrote:
>>>
>>> Hi All.
>>>
>>> Thanks for your replies.
>>>
>>> a)
>>> cqlsh  does not work either :(
>>>
>>>
>>> b)
>>> Following are the parameters as asked ::
>>>
>>> listen_address: localhost
>>> rpc_address: localhost
>>&

Possible to restore ENTIRE data from Cassandra-Schema in one go?

2015-09-14 Thread Ajay Garg
Hi All.

We have a schema on one Cassandra-node, and wish to duplicate the
entire schema on another server.
Think of this as 2 clusters, each containing one node.

We have found the way to dump/restore schema-metainfo at ::

https://dzone.com/articles/dumpingloading-schema


And dumping/restoring data at ::

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html


For the restoring data step, it seems that restoring every "table"
requires a dedicated step.
So, if the schema has 100 "tables", we would need 100 steps.


Is it so? If yes, can the entire data be dumped/restored in one go?
Just asking, to save time, if it could :)




Thanks and Regards,
Ajay


Getting intermittent errors while taking snapshot

2015-09-14 Thread Ajay Garg
Hi All.

Taking snapshots sometimes works, sometimes doesn't.
Following is the stacktrace whenever the process fails ::


##
ajay@ajay-HP-15-Notebook-PC:/var/lib/cassandra/data/instamsg$ nodetool -h localhost -p 7199 snapshot instamsg
Requested creating snapshot(s) for [instamsg] with snapshot name [1442298538121]
error: 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
-> 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
Operation not permitted
-- StackTrace --
java.nio.file.FileSystemException:
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
-> 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
Operation not permitted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
at java.nio.file.Files.createLink(Files.java:1086)
at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:94)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1842)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2279)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2361)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2355)
at org.apache.cassandra.db.Keyspace.snapshot(Keyspace.java:207)
at 
org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2388)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$251(TCPTransport.java:683)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$1/13812661.run(Unknown
Source)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1

Re: Possible to restore ENTIRE data from Cassandra-Schema in one go?

2015-09-14 Thread Ajay Garg
Thanks Mam for the reply.

I guess there is manual work needed to bring all the SSTable files
into one directory, so it doesn't really solve the purpose. So,
going the "vanilla" way might be simpler :)

Thanks anyways for the help !!!

Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 11:34 AM, Neha Dave  wrote:
> Haven't used it, but you can try the SSTable Bulk Loader:
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html
>
> regards
> Neha
>
> On Tue, Sep 15, 2015 at 11:21 AM, Ajay Garg  wrote:
>>
>> Hi All.
>>
>> We have a schema on one Cassandra-node, and wish to duplicate the
>> entire schema on another server.
>> Think of this as 2 clusters, each containing one node.
>>
>> We have found the way to dump/restore schema-metainfo at ::
>>
>> https://dzone.com/articles/dumpingloading-schema
>>
>>
>> And dumping/restoring data at ::
>>
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
>>
>>
>> For the restoring data step, it seems that restoring every "table"
>> requires a dedicated step.
>> So, if the schema has 100 "tables", we would need 100 steps.
>>
>>
>> Is it so? If yes, can the entire data be dumped/restored in one go?
>> Just asking, to save time, if it could :)
>>
>>
>>
>>
>> Thanks and Regards,
>> Ajay
>
>



-- 
Regards,
Ajay


Re: Getting intermittent errors while taking snapshot

2015-09-14 Thread Ajay Garg
Hi All.

Granting complete-permissions to the keyspace-folder
(/var/lib/cassandra/data/instamsg) fixed the issue.
Now, multiple, successive snapshot-commands run to completion fine.


sudo chmod -R 777 /var/lib/cassandra/data/instamsg
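
A narrower alternative, assuming the offending files simply ended up owned by another user (for example because nodetool or Cassandra was once run as root), would be to hand ownership back to the cassandra user instead of opening the directory to everyone:

sudo chown -R cassandra:cassandra /var/lib/cassandra/data/instamsg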



Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 12:04 PM, Ajay Garg  wrote:
> Hi All.
>
> Taking snapshots sometimes works, sometimes doesn't.
> Following is the stacktrace whenever the process fails ::
>
>
> ######
> ajay@ajay-HP-15-Notebook-PC:/var/lib/cassandra/data/instamsg$ nodetool -h localhost -p 7199 snapshot instamsg
> Requested creating snapshot(s) for [instamsg] with snapshot name [1442298538121]
> error: 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
> -> 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
> Operation not permitted
> -- StackTrace --
> java.nio.file.FileSystemException:
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
> -> 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
> Operation not permitted
> at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
> at java.nio.file.Files.createLink(Files.java:1086)
> at 
> org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:94)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1842)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2279)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2361)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2355)
> at org.apache.cassandra.db.Keyspace.snapshot(Keyspace.java:207)
> at 
> org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2388)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
> at sun.r

Is replication possible with already existing data?

2015-10-07 Thread Ajay Garg
Hi All.

We have a scenario, where till now we had been using a plain, simple
single node, with the keyspace created using ::

CREATE KEYSPACE our_db WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}  AND durable_writes = true;


We now plan to introduce replication (in the true sense) in our scheme
of things, but cannot afford to lose any data.
We can, however, take a bit of downtime, and do any data-migration if
required (we have already done data-migration once in the past, when
we moved our plain, simple single node from one physical machine to
another).


So,

a)
Is it possible at all to introduce replication in our scenario?
If yes, what needs to be done to NOT LOSE our current existing data?

b)
Also, will "NetworkTopologyStrategy" work in our scenario (since
NetworkTopologyStrategy seems to be more robust)?


Brief pointers to above will give huge confidence-boosts in our endeavours.


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-07 Thread Ajay Garg
Hi Sean.

Thanks for the reply.

On Wed, Oct 7, 2015 at 10:13 PM,   wrote:
> How many nodes are you planning to add?

I guess 2 more.

> How many replicas do you want?

1 (original) + 2 (replicas).
That makes it a total of 3 copies of every row of data.



> In general, there shouldn't be a problem adding nodes and then altering the 
> keyspace to change replication.

Great !!
I guess 
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/alter_keyspace_r.html
will do the trick for changing schema-replication-details !!


> You will want to run repairs to stream the data to the new replicas.

Hmm.. we'll be really grateful if you could point us to a suitable
link for the above step.
If there is a nice-utility, we would be perfectly set up to start our
fun-exercise, consisting of following steps ::

a)
(As advised by you) Changing the schema, to allow a replication_factor of 3.

b)
(As advised by you) Duplicating the already-existing-data on the other 2 nodes.

c)
Thereafter, let Cassandra create a total of 3 copies for every row of
new-incoming-data.
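
A rough command-level sketch of steps (a) and (b), assuming the keyspace from the first mail (our_db), SimpleStrategy, and that the two new nodes have already joined the ring:

# (a) raise the replication factor to 3
cqlsh -e "ALTER KEYSPACE our_db WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"

# (b) stream the pre-existing data to the new replicas (run on each node)
nodetool repair our_db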


Once again, thanks a ton for the help !!


Thanks and Regards,
Ajay


> You shouldn't need downtime or data migration -- this is the beauty of
> Cassandra.




>
>
> Sean Durity – Lead Cassandra Admin
>

> 
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-07 Thread Ajay Garg
Thanks Eric for the reply.


On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens  wrote:
> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
> ought to be able to increase RF to 3 before bootstrapping your new nodes,
> with no downtime and no loss of data (even temporary).  Effective RF is
> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
>
> If you're starting at N > RF and you want to increase RF, things get
> hairier
> if you can't afford temporary consistency issues.
>

We are ok with temporary consistency issues.

Also, I was going through the following articles
https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/

and following doubts came up in my mind ::


a)
Let's say at site-1, Application-Server (APP1) uses the two
Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for
all its needs (of course, whatever happens on CAS11, the same is replicated
to CAS12 at Cassandra-level).

Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
this and pick up CAS12 for its needs?
Or some automatic Cassandra-magic will happen?


b)
In the same above scenario, let's say before CAS11 goes down, the amount of
data in both CAS11 and CAS12 was "x".

After CAS11 goes down, the data is being put in CAS12 only.
After some time, CAS11 comes back up.

Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously, "y" > "x").

Now, will the additional ("y" - "x") data be automatically
put/replicated/whatever back in CAS11 through Cassandra?
Or it has to be done manually?


If there are easy recommended solutions to above, I am beginning to think
that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
(allowing failures of entire site, or a few nodes on the same site).

I am sorry for asking such newbie questions, and I will be grateful if
these silly questions could be answered by the experts :)


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-08 Thread Ajay Garg
On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg  wrote:
> Thanks Eric for the reply.
>
>
> On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens  wrote:
>> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
>> ought to be able to increase RF to 3 before bootstrapping your new nodes,
>> with no downtime and no loss of data (even temporary).  Effective RF is
>> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
>>
>> If you're starting at N > RF and you want to increase RF, things get
>> hairier
>> if you can't afford temporary consistency issues.
>>
>
> We are ok with temporary consistency issues.
>
> Also, I was going through the following articles
> https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/
>
> and following doubts came up in my mind ::
>
>
> a)
> Let's say at site-1, Application-Server (APP1) uses the two
> Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for all
> its needs (of course, whatever happens on CAS11, the same is replicated to
> CAS12 at Cassandra-level).
>
> Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
> this and pick up CAS12 for its needs?
> Or some automatic Cassandra-magic will happen?
>
>
> b)
> In the same above scenario, let's say before CAS11 goes down, the amount of
> data in both CAS11 and CAS12 was "x".
>
> After CAS11 goes down, the data is being put in CAS12 only.
> After some time, CAS11 comes back up.
>
> Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously, "y" > "x").
>
> Now, will the additional ("y" - "x") data be automatically
> put/replicated/whatever back in CAS11 through Cassandra?
> Or it has to be done manually?
>

Any pointers, please ???

>
> If there are easy recommended solutions to above, I am beginning to think
> that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
> (allowing failures of entire site, or a few nodes on the same site).
>
> I am sorry for asking such newbie questions, and I will be grateful if these
> silly questions could be answered by the experts :)
>
>
> Thanks and Regards,
> Ajay



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-10 Thread Ajay Garg
Thanks a ton Anuja for the help !!!

On Fri, Oct 9, 2015 at 12:38 PM, anuja jain  wrote:
> Hi Ajay,
>
>
> On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg  wrote:
>>
> In this case, it will be the responsibility of APP1 to start connection to
> CAS12. On the other hand if your APP1 is connecting to cassandra using Java
> driver, you can add multiple contact points(CAS11 and CAS12 here) so that if
> CAS11 is down it will directly connect to CAS12.

Great .. Java-driver it will be :)




>>
> In such a case, CAS12 will store hints for the data to be stored on CAS11
> (the tokens of which lies within the range of tokens CAS11 holds)  and
> whenever CAS11 is up again, the hints will be transferred to it and the data
> will be distributed evenly.
>
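
For reference, a rough way to watch that hint mechanism in action on CAS12 while CAS11 is down (a sketch, assuming Cassandra 2.1, where hints live in the system.hints table; CAS12-address is a placeholder):

# hints queued for the downed node accumulate here and drain once it returns
cqlsh CAS12-address -e "SELECT count(*) FROM system.hints;"
# hint delivery also shows up as activity in this thread pool
nodetool -h CAS12-address tpstats | grep -i hinted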

Evenly?

Should not the data be """EXACTLY""" equal after CAS11 comes back up
and the sync/transfer/whatever happens?
After all, before CAS11 went down, CAS11 and CAS12 were replicating all data.


Once again, thanks for your help.
I will be even more grateful if you would help me clear the lingering
doubt to second point.


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Carlos.


I setup a following setup ::

CAS11 and CAS12 in DC1
CAS21 and CAS22 in DC2

a)
Brought all the 4 up, replication worked perfect !!!

b)
Thereafter, downed CAS11 via "sudo service cassandra stop".
Replication continued to work fine on CAS12, CAS21 and CAS22.

c)
Thereafter, upped CAS11 via "sudo service cassandra start".


However, CAS11 refuses to come up now.
Following is the error in /var/log/cassandra/system.log ::



ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the
number of tokens from 1 to 256
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:966)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:734)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
[apache-cassandra-2.1.10.jar:2.1.10]
INFO  [StorageServiceShutdownHook] 2015-10-23 03:07:34,271
Gossiper.java:1442 - Announcing shutdown
INFO  [GossipStage:1] 2015-10-23 03:07:34,282 OutboundTcpConnection.java:97
- OutboundTcpConnection using coalescing strategy DISABLED
ERROR [StorageServiceShutdownHook] 2015-10-23 03:07:34,305
CassandraDaemon.java:227 - Exception in thread
Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
at
org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1624)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1632)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1686)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1510)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:1412)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1427)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1417)
~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1443)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:678)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
~[apache-cassandra-2.1.10.jar:2.1.10]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]



Ideas?


Thanks and Regards,
Ajay



On Mon, Oct 12, 2015 at 3:46 PM, Carlos Alonso  wrote:

> Yes Ajay, in your particular scenario, after all hints are delivered, both
> CAS11 and CAS12 will have the exact same data.
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 11 October 2015 at 05:21, Ajay Garg  wrote:
>
>> Thanks a ton Anuja for the help !!!
>>
>> On Fri, Oct 9, 2015 at 12:38 PM, anuja jain  wrote:
>> > Hi Ajay,
>> >
>> >
>> > On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg 
>> wrote:
>> >>
>> > In this case, it will be the responsibility of APP1 to start connection
>> to
>> > CAS12. On the other hand if your APP1 is connecting to cassandra using
>> Java
>> > driver, you can add multiple contact points(CAS11 and CAS12 here) so
>> that if
>> > CAS11 is down it will directly connect to CAS12.
>>
>> Great .. Java-driver it will be :)
>>
>>
>>
>>
>> >>
>> > In such a case, CAS12 will store hints for the data to be stored on
>> CAS11
>> > (the tokens of which lies within the range of tokens CAS11 holds)  and
>> > whenever CAS11 is up again, the hints will be transferred to it and the
>> data
>> > will be distributed evenly.
>> >
>>
>> Evenly?
>>
>> Should not the data be "

Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Michael.

Please find below the contents of cassandra.yaml for CAS11 (the files on
the rest of the three nodes are also exactly the same, except the
"initial_token" and "listen_address" fields) ::

CAS11 ::


cluster_name: 'InstaMsg Cluster'
num_tokens: 256
initial_token: -9223372036854775808
hinted_handoff_enabled: true
max_hint_window_in_ms: 1080 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: "104.239.200.33,119.9.92.77"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

memtable_allocation_type: heap_buffers

index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 104.239.200.33
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true

rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10

column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5

compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 1

write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 1
cross_node_timeout: false
endpoint_snitch: PropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra

client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra

internode_compression: all
inter_dc_tcp_nodelay: false



What changes need to be made, so that whenever a downed server comes back
up, the missing data comes back over to it?

Thanks and Regards,
Ajay



On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
wrote:

> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>
>> However, CAS11 refuses to come up now.
>> Following is the error in /var/log/cassandra/system.log ::
>>
>>
>> 
>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>> the number of tokens from 1 to 256
>>
>
> Check your cassandra.yaml - this node has vnodes enabled in the
> configuration when it did not, previously. Check all nodes. Something
> changed. Mixed vnode/non-vnode clusters is bad juju.
>
> --
> Kind regards,
> Michael
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Any ideas, please?
To repeat, we are using the exact same cassandra-version on all 4 nodes
(2.1.10).

On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg  wrote:

> Hi Michael.
>
> Please find below the contents of cassandra.yaml for CAS11 (the files on
> the rest of the three nodes are also exactly the same, except the
> "initial_token" and "listen_address" fields) ::
>
> CAS11 ::
>
> 
> cluster_name: 'InstaMsg Cluster'
> num_tokens: 256
> initial_token: -9223372036854775808
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 1080 # 3 hours
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> batchlog_replay_throttle_in_kb: 1024
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
> - /var/lib/cassandra/data
>
> commitlog_directory: /var/lib/cassandra/commitlog
>
> disk_failure_policy: stop
> commit_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> counter_cache_size_in_mb:
> counter_cache_save_period: 7200
> saved_caches_directory: /var/lib/cassandra/saved_caches
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 1
> commitlog_segment_size_in_mb: 32
> seed_provider:
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   - seeds: "104.239.200.33,119.9.92.77"
>
> concurrent_reads: 32
> concurrent_writes: 32
> concurrent_counter_writes: 32
>
> memtable_allocation_type: heap_buffers
>
> index_summary_capacity_in_mb:
> index_summary_resize_interval_in_minutes: 60
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: 104.239.200.33
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: localhost
> rpc_port: 9160
> rpc_keepalive: true
>
> rpc_server_type: sync
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
>
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 10
>
> column_index_size_in_kb: 64
> batch_size_warn_threshold_in_kb: 5
>
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>
> sstable_preemptive_open_interval_in_mb: 50
>
> read_request_timeout_in_ms: 5000
> range_request_timeout_in_ms: 1
>
> write_request_timeout_in_ms: 2000
> counter_write_request_timeout_in_ms: 5000
> cas_contention_timeout_in_ms: 1000
> truncate_request_timeout_in_ms: 6
> request_timeout_in_ms: 1
> cross_node_timeout: false
> endpoint_snitch: PropertyFileSnitch
>
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 60
> dynamic_snitch_badness_threshold: 0.1
>
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>
> server_encryption_options:
> internode_encryption: none
> keystore: conf/.keystore
> keystore_password: cassandra
> truststore: conf/.truststore
> truststore_password: cassandra
>
> client_encryption_options:
> enabled: false
> keystore: conf/.keystore
> keystore_password: cassandra
>
> internode_compression: all
> inter_dc_tcp_nodelay: false
> 
>
>
> What changes need to be made, so that whenever a downed server comes back
> up, the missing data comes back over to it?
>
> Thanks and Regards,
> Ajay
>
>
>
> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
> wrote:
>
>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>
>>> However, CAS11 refuses to come up now.
>>> Following is the error in /var/log/cassandra/system.log ::
>>>
>>>
>>> 
>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>> the number of tokens from 1 to 256
>>>
>>
>> Check your cassandra.yaml - this node has vnodes enabled in the
>> configuration when it did not, previously. Check all nodes. Something
>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>
>> --
>> Kind regards,
>> Michael
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Thanks Steve and Michael.

Simply uncommenting "initial_token" did the trick !!!

Right now, I was evaluating replication, for the case when everything is a
clean install.
Will now try my hands on integrating/starting replication, with
pre-existing data.


Once again, thanks a ton for all the help guys !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 2:06 AM, Steve Robenalt 
wrote:

> Hi Ajay,
>
> Please take a look at the cassandra.yaml configuration reference regarding
> intial_token and num_tokens:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token
>
> This is basically what Michael was referring to in his earlier message.
> Setting an initial token overrode your num_tokens setting on initial
> startup, but after initial startup, the initial token setting is ignored,
> so num_tokens comes into play, attempting to start up with 256 vnodes.
> That's where your error comes from.
>
> It's likely that all of your nodes started up like this since you have the
> same config on all of them (hopefully, you at least changed initial_token
> for each node).
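
As a quick sanity check of which situation each node is actually in, the token counts can be read straight off the cluster (a sketch; 104.239.200.33 is just one of the node addresses quoted earlier in the thread):

nodetool status                           # the "Tokens" column shows the count per node
nodetool ring | grep -c 104.239.200.33    # token count for one specific node
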
>
> After reviewing the doc on the two sections above, you'll need to decide
> which path to take to recover. You can likely bring the downed node up by
> setting num_tokens to 1 (which you'd need to do on all nodes), in which
> case you're not really running vnodes. Alternately, you can migrate the
> cluster to vnodes:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html
>
> BTW, I recommend carefully reviewing the cassandra.yaml configuration
> reference for ANY change you make from the default. As you've experienced
> here, not all settings are intended to work together.
>
> HTH,
> Steve
>
>
>
> On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg 
> wrote:
>
>> Any ideas, please?
>> To repeat, we are using the exact same cassandra-version on all 4 nodes
>> (2.1.10).
>>
>> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg 
>> wrote:
>>
>>> Hi Michael.
>>>
>>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>>> the rest of the three nodes are also exactly the same, except the
>>> "initial_token" and "listen_address" fields) ::
>>>
>>> CAS11 ::
>>>
>>>
>>>
>>> What changes need to be made, so that whenever a downed server comes
>>> back up, the missing data comes back over to it?
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler 
>>> wrote:
>>>
>>>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>>>
>>>>> However, CAS11 refuses to come up now.
>>>>> Following is the error in /var/log/cassandra/system.log ::
>>>>>
>>>>>
>>>>> 
>>>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>>>> configuration error
>>>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>>>> the number of tokens from 1 to 256
>>>>>
>>>>
>>>> Check your cassandra.yaml - this node has vnodes enabled in the
>>>> configuration when it did not, previously. Check all nodes. Something
>>>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org 
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>



-- 
Regards,
Ajay


Some questions about setting public/private IP-Addresses in Cassandra Cluster

2015-10-23 Thread Ajay Garg
Hi All.

We have a scenario, where the Application-Server (APP), Node-1 (CAS11), and
Node-2 (CAS12) are hosted in DC1.
Node-3 (CAS21) and Node-4 (CAS22) are in DC2.

The intention is to provide 4-way redundancy to APP, by specifying
CAS11, CAS12, CAS21 and CAS22 as the contact points via the Java Cassandra connector.
That means, as long as at least one of the 4 nodes is up, the APP should
work.

We are using NetworkTopologyStrategy, with the Murmur3Partitioner.
Each Cassandra-Node has two IPs :: one public, and one
private-within-the-same-data-center.


Following are our IP-Addresses configuration ::

a)
Everywhere in "cassandra-topology.properties", we have specified
Public-IP-Addresses of all 4 nodes.

b)
In each of "listen_address" in /etc/cassandra/cassandra.yaml, we have
specified the corresponding Public-IP-Address of the node.

c)
For CAS11 and CAS12, we have specified the corresponding private-IP-Address
for "rpc_address" in /etc/cassandra/cassandra.yaml (since APP is hosted in
the same data-center).
For CAS21 and CAS22, we have specified the corresponding public-IP-Address
for "rpc_address" in /etc/cassandra/cassandra.yaml (since APP can only
communicate over public IP-Addresses with these nodes).


Are any further optimizations possible, in the sense of specifying
private IP addresses wherever they would work?
I ask because we need to minimize network latency, and using
private IP addresses would help in this regard.


Thanks and Regards,
Ajay


Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-23 Thread Ajay Garg
If a node in the cluster goes down and comes up, the data gets synced up on
this downed node.
Is there a limit on the interval for which the node can remain down? Or will
the data be synced up even if the node remains down for weeks/months/years?



-- 
Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Thanks Vasileios for the reply !!!
That makes sense !!!

I will be grateful if you could point me to the node-repair command for
Cassandra-2.1.10.
I don't want to get stuck in wrong-versioned documentation (already
bitten hard once when setting up replication).

Thanks again...


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> Hello Ajay,
>
> Have a look in the *max_hint_window_in_ms* :
>
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>
> My understanding is that if a node remains down for more than
> *max_hint_window_in_ms*, then you will need to repair that node.
>
> Thanks,
> Vasilis
>
> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg  wrote:
>
>> If a node in the cluster goes down and comes up, the data gets synced up
>> on this downed node.
>> Is there a limit on the interval for which the node can remain down? Or
>> the data will be synced up even if the node remains down for
>> weeks/months/years?
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-24 Thread Ajay Garg
Hi All.

I have been doing extensive testing, and replication works fine, even if
any permuatation of CAS11, CAS12, CAS21, CAS22 are downed and brought up.
Syncing always takes place (obviously, as long as continuous-downtime-value
does not exceed *max_hint_window_in_ms*).


However, things behave weirdly when I try connecting via the DataStax Java driver.
I always add the nodes to the cluster in the order ::

 CAS11, CAS12, CAS21, CAS22

when calling the "cluster.connect" method.


Now, following happens ::

a)
If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
later replicated to CAS21 and CAS22).

b)
If CAS11 and CAS12 go down, data is NOT persisted.
Instead the following exceptions are observed in the Java-Driver ::

##
Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (no host was tried)
at
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
at
com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
at com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (no host was tried)
at
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at
com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
at
com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
at
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
at
com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
... 3 more
###


I have already tried ::

1)
Increasing driver-read-timeout from 12 seconds to 30 seconds.

2)
Increasing driver-connect-timeout from 5 seconds to 30 seconds.

3)
I have also confirmed that each of the 4 nodes is telnet-able over ports
9042 and 9160.


Definitely seems to be some driver-issue, since
data-persistence/replication works perfectly (with any permutation) if
data-persistence is done via "cqlsh".


Kindly provide some pointers.
Ultimately, it is the Java-driver that will be used in production, so it is
imperative that data-persistence/replication happens even when any
permutation of node(s) is down.


Thanks and Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Thanks a ton Vasileios !!

Just one last question ::
Does running "nodetool repair" affect the functionality of the cluster for
current-live data?

It's ok if the insertions/deletions of current-live data become a little
slow during the process, but data-consistency must be maintained. If that
is the case, I think we are good.


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 6:03 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> Hello Ajay,
>
> Here is a good link:
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesManualRepair.html
>
> Generally, I find the DataStax docs to be OK. You could consult them for
> all usual operations etc. Ofc there are occasions where a given concept is
> not as clear, but you can always ask this list for clarification.
>
> If you find that something is wrong in the docs just email them (more info
> and contact email here: http://docs.datastax.com/en/ ).
>
> Regards,
> Vasilis
>
> On Sat, Oct 24, 2015 at 1:04 PM, Ajay Garg  wrote:
>
>> Thanks Vasileios for the reply !!!
>> That makes sense !!!
>>
>> I will be grateful if you could point me to the node-repair command for
>> Cassandra-2.1.10.
>> I don't want to get stuck in a wrong-versioned documentation (already
>> bitten once hard when setting up replication).
>>
>> Thanks again...
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
>> vasileiosvlac...@gmail.com> wrote:
>>
>>> Hello Ajay,
>>>
>>> Have a look in the *max_hint_window_in_ms* :
>>>
>>>
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>>
>>> My understanding is that if a node remains down for more than
>>> *max_hint_window_in_ms*, then you will need to repair that node.
>>>
>>> Thanks,
>>> Vasilis
>>>
>>> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg 
>>> wrote:
>>>
>>>> If a node in the cluster goes down and comes up, the data gets synced
>>>> up on this downed node.
>>>> Is there a limit on the interval for which the node can remain down? Or
>>>> the data will be synced up even if the node remains down for
>>>> weeks/months/years?
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-24 Thread Ajay Garg
Ideas please, on what I may be doing wrong?

On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg  wrote:

> Hi All.
>
> I have been doing extensive testing, and replication works fine, even if
> any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
> Syncing always takes place (obviously, as long as continuous-downtime-value
> does not exceed *max_hint_window_in_ms*).
>
>
> However, things behave weird when I try connecting via DataStax
> Java-Driver.
> I always add the nodes to the cluster in the order ::
>
>  CAS11, CAS12, CAS21, CAS22
>
> during "cluster.connect" method.
>
>
> Now, following happens ::
>
> a)
> If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
> later replicated to CAS21 and CAS22).
>
> b)
> If CAS11 and CAS12 go down, data is NOT persisted.
> Instead the following exceptions are observed in the Java-Driver ::
>
>
> ##
> Exception in thread "main"
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (no host was tried)
> at
> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
> at
> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
> at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
> at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
> at
> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
> at
> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
> All host(s) tried for query failed (no host was tried)
> at
> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
> at
> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
> at
> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
> at
> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
> at
> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
> at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
> ... 3 more
>
> ###
>
>
> I have already tried ::
>
> 1)
> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>
> 2)
> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>
> 3)
> I have also confirmed that each of the 4 nodes are telnet-able over ports
> 9042 and 9160 each.
>
>
> Definitely seems to be some driver-issue, since
> data-persistence/replication works perfect (with any permutation) if
> data-persistence is done via "cqlsh".
>
>
> Kindly provide some pointers.
> Ultimately, it is the Java-driver that will be used in production, so it
> is imperative that data-persistence/replication happens for any downing of
> any permutation of node(s).
>
>
> Thanks and Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Never mind Vasileios, you have been a great help !!
Thanks a ton again !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 10:17 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> I am not sure I fully understand the question, because nodetool repair is
> one of the three ways for Cassandra to ensure consistency. If by "affect"
> you mean "make your data consistent and ensure all replicas are
> up-to-date", then yes, that's what I think it does.
>
> And yes, I would expect nodetool repair (especially depending on the
> options appended to it) to have a performance impact, but how big that
> impact is going to be depends on many things.
>
> We currently perform no scheduled repairs because of our workload and the
> consistency level that we use. So, as you can understand I am certainly not
> the best person to analyse that bit...
>
> Regards,
> Vasilis
>
> On Sat, Oct 24, 2015 at 5:09 PM, Ajay Garg  wrote:
>
>> Thanks a ton Vasileios !!
>>
>> Just one last question ::
>> Does running "nodetool repair" affect the functionality of cluster for
>> current-live data?
>>
>> It's ok if the insertions/deletions of current-live data become a little
>> slow during the process, but data-consistency must be maintained. If that
>> is the case, I think we are good.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sat, Oct 24, 2015 at 6:03 PM, Vasileios Vlachos <
>> vasileiosvlac...@gmail.com> wrote:
>>
>>> Hello Ajay,
>>>
>>> Here is a good link:
>>>
>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesManualRepair.html
>>>
>>> Generally, I find the DataStax docs to be OK. You could consult them for
>>> all usual operations etc. Ofc there are occasions where a given concept is
>>> not as clear, but you can always ask this list for clarification.
>>>
>>> If you find that something is wrong in the docs just email them (more
>>> info and contact email here: http://docs.datastax.com/en/ ).
>>>
>>> Regards,
>>> Vasilis
>>>
>>> On Sat, Oct 24, 2015 at 1:04 PM, Ajay Garg 
>>> wrote:
>>>
>>>> Thanks Vasileios for the reply !!!
>>>> That makes sense !!!
>>>>
>>>> I will be grateful if you could point me to the node-repair command for
>>>> Cassandra-2.1.10.
>>>> I don't want to get stuck in a wrong-versioned documentation (already
>>>> bitten once hard when setting up replication).
>>>>
>>>> Thanks again...
>>>>
>>>>
>>>> Thanks and Regards,
>>>> Ajay
>>>>
>>>> On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
>>>> vasileiosvlac...@gmail.com> wrote:
>>>>
>>>>> Hello Ajay,
>>>>>
>>>>> Have a look in the *max_hint_window_in_ms* :
>>>>>
>>>>>
>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>>>>
>>>>> My understanding is that if a node remains down for more than
>>>>> *max_hint_window_in_ms*, then you will need to repair that node.
>>>>>
>>>>> Thanks,
>>>>> Vasilis
>>>>>
>>>>> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg 
>>>>> wrote:
>>>>>
>>>>>> If a node in the cluster goes down and comes up, the data gets synced
>>>>>> up on this downed node.
>>>>>> Is there a limit on the interval for which the node can remain down?
>>>>>> Or the data will be synced up even if the node remains down for
>>>>>> weeks/months/years?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Ajay
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-25 Thread Ajay Garg
Some more observations ::

a)
CAS11 and CAS12 are down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, even then the exception occurs.

b)
CAS11 down, CAS12 up, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.

c)
CAS11 up, CAS12 down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.


It seems the java-driver kinda always requires either one of CAS11 or
CAS12 to be up (although the expectation is that the driver must work fine
if ANY of the 4 nodes is up).


Thoughts, experts !? :)



On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg  wrote:

> Ideas please, on what I may be doing wrong?
>
> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg  wrote:
>
>> Hi All.
>>
>> I have been doing extensive testing, and replication works fine, even if
>> any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
>> Syncing always takes place (obviously, as long as continuous-downtime-value
>> does not exceed *max_hint_window_in_ms*).
>>
>>
>> However, things behave weird when I try connecting via DataStax
>> Java-Driver.
>> I always add the nodes to the cluster in the order ::
>>
>>  CAS11, CAS12, CAS21, CAS22
>>
>> during "cluster.connect" method.
>>
>>
>> Now, following happens ::
>>
>> a)
>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>> and later replicated to CAS21 and CAS22).
>>
>> b)
>> If CAS11 and CAS12 go down, data is NOT persisted.
>> Instead the following exceptions are observed in the Java-Driver ::
>>
>>
>> ##
>> Exception in thread "main"
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>> tried for query failed (no host was tried)
>> at
>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>> at
>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>> at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>> at
>> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>> at
>> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
>> All host(s) tried for query failed (no host was tried)
>> at
>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>> at
>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>> at
>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>> at
>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>> at
>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>> ... 3 more
>>
>> ###
>>
>>
>> I have already tried ::
>>
>> 1)
>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>
>> 2)
>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>
>> 3)
>> I have also confirmed that each of the 4 nodes are telnet-able over ports
>> 9042 and 9160 each.
>>
>>
>> Definitely seems to be some driver-issue, since
>> data-persistence/replication works perfect (with any permutation) if
>> data-persistence is done via "cqlsh".
>>
>>
>> Kindly provide some pointers.
>> Ultimately, it is the Java-driver that will be used in production, so it
>> is imperative that data-persistence/replication happens for any downing of
>> any permutation of node(s).
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-25 Thread Ajay Garg
Bingo !!!

Using "LoadBalancingPolicy" did the trick.
Exactly what was needed !!!
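
For anyone hitting the same issue, here is a minimal sketch of the kind of
change that did the trick (assuming DataStax Java-driver 2.x; the local
data-center name "DC1" below is a placeholder) ::

######################################################################
// Minimal sketch: by default the driver uses a DC-aware policy pinned to the
// first data-center it reaches, with zero usable remote hosts, so nodes of
// the other DC are never tried. Configuring the policy explicitly changes that.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LoadBalancingSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoints("CAS11", "CAS12", "CAS21", "CAS22")
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy(
                        "DC1",   // local data-center (placeholder name)
                        2,       // remote hosts usable per remote DC
                        true))   // also allow them for LOCAL_* consistency levels
                .build();
        cluster.connect();
    }
}
######################################################################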


Thanks and Regards,
Ajay

On Sun, Oct 25, 2015 at 5:52 PM, Ryan Svihla  wrote:

> Ajay,
>
> So It's the default driver behavior to pin requests to the first data
> center it connects to (DCAwareRoundRobin strategy). but let me explain why
> this is.
>
> I think you're thinking about data centers in Cassandra as a unit of
> failure, and while you can have say a rack fail, as you scale up and use
> rack awareness, it's rare you lose a whole "data center" in the sense
> you're thinking about, so lets reset a bit:
>
>1. If I'm designing a multidc architecture, usually the nature of
>latency I will not want my app servers connecting _across_ data centers.
>2. So since the common desire is not to magically have very high
>latency requests  bleed out to remote data centers, the default behavior of
>the driver is to pin to the first data center it connects too, you can
>change this with a different Load Balancing Policy (
>
> http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/LoadBalancingPolicy.html
>)
>3. However, I generally do NOT advise users connecting to an app
>server from another data center, since Cassandra is a masterless
>architecture you typically have issues that affect nodes, and not an entire
>data center and if they affect an entire data center (say the intra DC link
>is down) then it's going to affect your app server as well!
>
> So for new users, I typically just recommend pinning an app server to a DC
> and do your data center level switching further up. You can get more
> advanced and handle bleed out later, but you have to think of latencies.
>
> Final point, rely on repairs for your data consistency, hints are great
> and all but repair is how you make sure you're in sync.
>
> On Sun, Oct 25, 2015 at 3:10 AM, Ajay Garg  wrote:
>
>> Some more observations ::
>>
>> a)
>> CAS11 and CAS12 are down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, even then the exception occurs.
>>
>> b)
>> CAS11 down, CAS12 up, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>> c)
>> CAS11 up, CAS12 down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>>
>> Seems the java-driver is kinda always requiring either one of CAS11 or
>> CAS12 to be up (although the expectation is that the driver must work fine
>> if ANY of the 4 nodes is up).
>>
>>
>> Thoughts, experts !? :)
>>
>>
>>
>> On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg 
>> wrote:
>>
>>> Ideas please, on what I may be doing wrong?
>>>
>>> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg 
>>> wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have been doing extensive testing, and replication works fine, even
>>>> if any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought
>>>> up. Syncing always takes place (obviously, as long as
>>>> continuous-downtime-value does not exceed *max_hint_window_in_ms*).
>>>>
>>>>
>>>> However, things behave weird when I try connecting via DataStax
>>>> Java-Driver.
>>>> I always add the nodes to the cluster in the order ::
>>>>
>>>>  CAS11, CAS12, CAS21, CAS22
>>>>
>>>> during "cluster.connect" method.
>>>>
>>>>
>>>> Now, following happens ::
>>>>
>>>> a)
>>>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>>>> and later replicated to CAS21 and CAS22).
>>>>
>>>> b)
>>>> If CAS11 and CAS12 go down, data is NOT persisted.
>>>> Instead the following exceptions are observed in the Java-Driver ::
>>>>
>>>>
>>>> ##
>>>> Exception in thread "main"
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>> tried for query failed (no host was tried)
>>>> at
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailable

Can consistency-levels be different for "read" and "write" in Datastax Java-Driver?

2015-10-26 Thread Ajay Garg
Right now, I have set up "LOCAL_QUORUM" as the consistency level in the
driver, but it seems that "SERIAL" is being used during writes, and I
consistently get errors of this type ::

*Cassandra timeout during write query at consistency SERIAL (3 replica were
required but only 0 acknowledged the write)*


Am I missing something?
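
For context, this is roughly how the consistency-level is being set on the
driver side (a minimal sketch, not the actual application code; the
contact-point, keyspace and query below are placeholders) ::

######################################################################
// Minimal sketch, assuming DataStax Java-driver 2.x.
import com.datastax.driver.core.*;

public class ConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)) // default for all queries
                .build();
        Session session = cluster.connect("my_keyspace");

        // per-statement override
        Statement stmt = new SimpleStatement("INSERT INTO t (k, v) VALUES (1, 'x')")
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        session.execute(stmt);
    }
}
######################################################################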


-- 
Regards,
Ajay


Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi All.

I have a 2*2 Network-Topology Replication setup, and I run my application
via DataStax-driver.

I frequently get errors of this type ::
*Cassandra timeout during write query at consistency SERIAL (3 replica were
required but only 0 acknowledged the write)*

I have already tried passing a "write-options with LOCAL_QUORUM
consistency-level" in all create/save statements, but I still get this
error.

Does something else need to be changed in /etc/cassandra/cassandra.yaml too?
Or maybe some other place?

-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi Eric,

I am sorry, but I don't understand.

If there had been some issue in the configuration, then the
consistency-issue would be seen every time (I guess).
As of now, the error is seen only sometimes (probably 30% of the time).

On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens  wrote:

> Serial consistency gets invoked at the protocol level when doing
> lightweight transactions such as CAS operations.  If you're expecting that
> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
> there aren't enough nodes available to satisfy serial consistency.
>
> See
> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>
> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg  wrote:
>
>> Hi All.
>>
>> I have a 2*2 Network-Topology Replication setup, and I run my application
>> via DataStax-driver.
>>
>> I frequently get the errors of type ::
>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>> were required but only 0 acknowledged the write)*
>>
>> I have already tried passing a "write-options with LOCAL_QUORUM
>> consistency-level" in all create/save statements, but I still get this
>> error.
>>
>> Does something else need to be changed in /etc/cassandra/cassandra.yaml
>> too?
>> Or may be some another place?
>>
>>
>> --
>> Regards,
>> Ajay
>>
>


-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-03 Thread Ajay Garg
Hmm... ok.

Ideally, we require ::

a)
The intra-DC-node-syncing takes place at the statement/query level.

b)
The inter-DC-node-syncing takes place at cassandra level.


That way, we don't incur too much delay at the statement/query level.


For the so-called CAS/lightweight transactions, is the above impossible
then?

On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng  wrote:

> What Eric means is that SERIAL consistency is a special type of
> consistency that is only invoked for a subset of operations: those that use
> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>
> The differences between CAS operations and standard operations are
> significant and there are large repercussions for tunable consistency. The
> amount of time such an operation takes is greatly increased as well; you
> may need to increase your internal node-to-node timeouts .
>
> On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg  wrote:
>
>> Hi Eric,
>>
>> I am sorry, but I don't understand.
>>
>> If there had been some issue in the configuration, then the
>> consistency-issue would be seen everytime (I guess).
>> As of now, the error is seen sometimes (probably 30% of times).
>>
>> On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens  wrote:
>>
>>> Serial consistency gets invoked at the protocol level when doing
>>> lightweight transactions such as CAS operations.  If you're expecting that
>>> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
>>> there aren't enough nodes available to satisfy serial consistency.
>>>
>>> See
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>>>
>>> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg  wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have a 2*2 Network-Topology Replication setup, and I run my
>>>> application via DataStax-driver.
>>>>
>>>> I frequently get the errors of type ::
>>>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>>>> were required but only 0 acknowledged the write)*
>>>>
>>>> I have already tried passing a "write-options with LOCAL_QUORUM
>>>> consistency-level" in all create/save statements, but I still get this
>>>> error.
>>>>
>>>> Does something else need to be changed in /etc/cassandra/cassandra.yaml
>>>> too?
>>>> Or may be some another place?
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-04 Thread Ajay Garg
Hi All.

I think we got the root-cause.

One of the fields in one of the classes was marked with the "@Version"
annotation, which was causing the Cassandra-Java-Driver to add "IF NOT
EXISTS" to the insert query, thus invoking the SERIAL consistency-level.

We removed the annotation (we didn't really need it), and we have not
observed the error for about an hour or so.
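
For reference, a minimal sketch of the difference (using the Java-driver's
QueryBuilder; the keyspace / table / column names are placeholders) ::

######################################################################
// Minimal sketch: a plain INSERT honours the regular consistency-level
// (e.g. LOCAL_QUORUM), whereas an "IF NOT EXISTS" insert is a lightweight
// transaction and additionally goes through the serial (Paxos) path,
// which is what produced the SERIAL timeouts above.
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class InsertSketch {
    static Statement plainInsert() {
        return QueryBuilder.insertInto("my_keyspace", "my_table")
                .value("pojo_key", 1)
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);   // normal write path
    }

    static Statement conditionalInsert() {
        return QueryBuilder.insertInto("my_keyspace", "my_table")
                .value("pojo_key", 1)
                .ifNotExists()                                         // lightweight transaction
                .setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL); // keep Paxos local
    }
}
######################################################################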


Thanks Eric and Bryan for the help !!!


Thanks and Regards,
Ajay

On Wed, Nov 4, 2015 at 8:51 AM, Ajay Garg  wrote:

> Hmm... ok.
>
> Ideally, we require ::
>
> a)
> The intra-DC-node-syncing takes place at the statement/query level.
>
> b)
> The inter-DC-node-syncing takes place at cassandra level.
>
>
> That way, we don't spend too much delay at the statement/query level.
>
>
> For the so-called CAS/lightweight transactions, the above are impossible
> then?
>
> On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng  wrote:
>
>> What Eric means is that SERIAL consistency is a special type of
>> consistency that is only invoked for a subset of operations: those that use
>> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>>
>> The differences between CAS operations and standard operations are
>> significant and there are large repercussions for tunable consistency. The
>> amount of time such an operation takes is greatly increased as well; you
>> may need to increase your internal node-to-node timeouts .
>>
>> On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg  wrote:
>>
>>> Hi Eric,
>>>
>>> I am sorry, but I don't understand.
>>>
>>> If there had been some issue in the configuration, then the
>>> consistency-issue would be seen everytime (I guess).
>>> As of now, the error is seen sometimes (probably 30% of times).
>>>
>>> On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens  wrote:
>>>
>>>> Serial consistency gets invoked at the protocol level when doing
>>>> lightweight transactions such as CAS operations.  If you're expecting that
>>>> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
>>>> there aren't enough nodes available to satisfy serial consistency.
>>>>
>>>> See
>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>>>>
>>>> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg 
>>>> wrote:
>>>>
>>>>> Hi All.
>>>>>
>>>>> I have a 2*2 Network-Topology Replication setup, and I run my
>>>>> application via DataStax-driver.
>>>>>
>>>>> I frequently get the errors of type ::
>>>>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>>>>> were required but only 0 acknowledged the write)*
>>>>>
>>>>> I have already tried passing a "write-options with LOCAL_QUORUM
>>>>> consistency-level" in all create/save statements, but I still get this
>>>>> error.
>>>>>
>>>>> Does something else need to be changed in
>>>>> /etc/cassandra/cassandra.yaml too?
>>>>> Or may be some another place?
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Ajay
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Basic query in setting up secure inter-dc cluster

2016-01-05 Thread Ajay Garg
Hi All.

We have a 2*2 cluster deployed, but no security as of now.
As a first stage, we wish to implement inter-dc security.

Is it possible to enable security one machine at a time?

For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
If I make the changes JUST IN DC2M2 and restart it, will the traffic
between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
AFTER the changes are made in all the 4 machines?

Asking here, because I don't want to screw up a live cluster due to my lack
of experience.

Looking forward to some pointers.

-- 
Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-01-06 Thread Ajay Garg
Thanks everyone for the reply.

I actually have a fair number of questions, but it would be nice if someone
could please walk me through the flow (implementation-wise) of how node-to-node
encryption works in a cluster.

Let's say node1 from DC1 wishes to talk securely to node2 from DC2
(with *"require_client_auth: false"*).
I presume it would be like below (please correct me if I am wrong) ::

a)
node1 tries to connect to node2, using the certificate *as defined on node1*
in cassandra.yaml.

b)
node2 will confirm if the certificate being offered by node1 is in the
truststore *as defined on node2* in cassandra.yaml.
If it is, secure-communication is allowed.


Is my thinking right?
I
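
To make my mental model concrete, here is a minimal JSSE-style sketch of what
I understand the keystore / truststore options to boil down to (this is only
an illustration, not Cassandra's actual implementation; the file paths and
passwords are placeholders) ::

######################################################################
// Minimal sketch: the keystore holds this node's own certificate + private key
// (what it offers to peers during the TLS handshake), while the truststore
// holds the certificates of peers this node is willing to accept.
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class NodeSslContextSketch {
    public static SSLContext build(String keystorePath, char[] keystorePass,
                                   String truststorePath, char[] truststorePass)
            throws Exception {
        KeyStore keystore = KeyStore.getInstance("JKS");
        keystore.load(new FileInputStream(keystorePath), keystorePass);
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keystore, keystorePass);

        KeyStore truststore = KeyStore.getInstance("JKS");
        truststore.load(new FileInputStream(truststorePath), truststorePass);
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(truststore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }
}
######################################################################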

On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave  wrote:

> Hi Ajay,
> Have a look here :
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>
> You can configure for DC level Security:
>
> Procedure
>
> On each node under sever_encryption_options:
>
>- Enable internode_encryption.
>The available options are:
>   - all
>   - none
>   - dc: Cassandra encrypts the traffic between the data centers.
>   - rack: Cassandra encrypts the traffic between the racks.
>
> regards
>
> Neha
>
>
>
> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet 
> wrote:
>
>> Security is a very wide concept. What exactly do you want to achieve ?
>>
>>
>>
>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Basic query in setting up secure inter-dc cluster
>>
>>
>>
>> Hi All.
>>
>> We have a 2*2 cluster deployed, but no security as of now.
>>
>> As a first stage, we wish to implement inter-dc security.
>>
>> Is it possible to enable security one machine at a time?
>>
>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>
>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>> AFTER the changes are made in all the 4 machines?
>>
>> Asking here, because I don't want to screw up a live cluster due to my
>> lack of experience.
>>
>> Looking forward to some pointers.
>>
>>
>> --
>>
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-01-17 Thread Ajay Garg
Hi All.

A gentle query-reminder.

I will be grateful if I could be given a brief technical overview, as to
how secure-communication occurs between two nodes in a cluster.

Please note that I wish for some information on the "how it works below the
hood", and NOT "how to set it up".



Thanks and Regards,
Ajay

On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg  wrote:

> Thanks everyone for the reply.
>
> I actually have a fair bit of questions, but it will be nice if someone
> could please tell me the flow (implementation-wise), as to how node-to-node
> encryption works in a cluster.
>
> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2 (with 
> *"require_client_auth:
> false*").
> I presume it would be like below (please correct me if am wrong) ::
>
> a)
> node1 tries to connect to node2, using the certificate *as defined on
> node1* in cassandra.yaml.
>
> b)
> node2 will confirm if the certificate being offered by node1 is in the
> truststore *as defined on node2* in cassandra.yaml.
> if it is, secure-communication is allowed.
>
>
> Is my thinking right?
> I
>
> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave  wrote:
>
>> Hi Ajay,
>> Have a look here :
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>
>> You can configure for DC level Security:
>>
>> Procedure
>>
>> On each node under sever_encryption_options:
>>
>>- Enable internode_encryption.
>>The available options are:
>>   - all
>>   - none
>>   - dc: Cassandra encrypts the traffic between the data centers.
>>   - rack: Cassandra encrypts the traffic between the racks.
>>
>> regards
>>
>> Neha
>>
>>
>>
>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet > > wrote:
>>
>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>
>>>
>>>
>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>
>>>
>>>
>>> Hi All.
>>>
>>> We have a 2*2 cluster deployed, but no security as of now.
>>>
>>> As a first stage, we wish to implement inter-dc security.
>>>
>>> Is it possible to enable security one machine at a time?
>>>
>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>
>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>> AFTER the changes are made in all the 4 machines?
>>>
>>> Asking here, because I don't want to screw up a live cluster due to my
>>> lack of experience.
>>>
>>> Looking forward to some pointers.
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Ajay
>>>
>>
>>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Can we set TTL on individual fields (columns) using the Datastax java-driver

2016-02-08 Thread Ajay Garg
Something like ::


##
class A {

  @Id
  @Column (name = "pojo_key")
  int key;

  @Ttl(10)
  @Column (name = "pojo_temporary_guest")
  String guest;

}
##


When I persist, let's say, the value "ajay" in the guest field
(pojo_temporary_guest column), it stays forever and does not become "null"
after 10 seconds.

Kindly point out what I am doing wrong.
I will be grateful.
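
For comparison, this is the kind of per-write TTL I would otherwise set
explicitly (a minimal sketch with the driver's QueryBuilder; the keyspace and
table names are placeholders) ::

######################################################################
// Minimal sketch: in CQL a TTL is applied per write ("USING TTL"), so every
// value written by this statement expires roughly 10 seconds later.
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import static com.datastax.driver.core.querybuilder.QueryBuilder.ttl;

public class TtlSketch {
    static Statement insertWithTtl() {
        return QueryBuilder.insertInto("my_keyspace", "my_table")
                .value("pojo_key", 1)
                .value("pojo_temporary_guest", "ajay")
                .using(ttl(10));   // equivalent to "... USING TTL 10"
    }
}
######################################################################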


Thanks and Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-04-17 Thread Ajay Garg
Ok, trying to wake up this thread again.

I went through the following links ::

https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html


and I am wondering *if it is possible to set up secure inter-communication
only between some nodes*.

In particular, if I have a 2*2 cluster, is it possible to set up secure
communication ONLY between the nodes of DC2?
Once it works well, we would then set up secure-communication everywhere.

We want this because DC2 is the backup centre, while DC1 is the
primary centre connected directly to the application-server. We don't want
to screw things up if something goes bad in DC1.


Will be grateful for pointers.


Thanks and Regards,
Ajay

On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg  wrote:

> Hi All.
>
> A gentle query-reminder.
>
> I will be grateful if I could be given a brief technical overview, as to
> how secure-communication occurs between two nodes in a cluster.
>
> Please note that I wish for some information on the "how it works below
> the hood", and NOT "how to set it up".
>
>
>
> Thanks and Regards,
> Ajay
>
> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg  wrote:
>
>> Thanks everyone for the reply.
>>
>> I actually have a fair bit of questions, but it will be nice if someone
>> could please tell me the flow (implementation-wise), as to how node-to-node
>> encryption works in a cluster.
>>
>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>> (with *"require_client_auth: false*").
>> I presume it would be like below (please correct me if am wrong) ::
>>
>> a)
>> node1 tries to connect to node2, using the certificate *as defined on
>> node1* in cassandra.yaml.
>>
>> b)
>> node2 will confirm if the certificate being offered by node1 is in the
>> truststore *as defined on node2* in cassandra.yaml.
>> if it is, secure-communication is allowed.
>>
>>
>> Is my thinking right?
>> I
>>
>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave  wrote:
>>
>>> Hi Ajay,
>>> Have a look here :
>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>
>>> You can configure for DC level Security:
>>>
>>> Procedure
>>>
>>> On each node under sever_encryption_options:
>>>
>>>- Enable internode_encryption.
>>>The available options are:
>>>   - all
>>>   - none
>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>
>>> regards
>>>
>>> Neha
>>>
>>>
>>>
>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>> absi...@informatica.com> wrote:
>>>
>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>
>>>>
>>>>
>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>
>>>>
>>>>
>>>> Hi All.
>>>>
>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>
>>>> As a first stage, we wish to implement inter-dc security.
>>>>
>>>> Is it possible to enable security one machine at a time?
>>>>
>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>
>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>> AFTER the changes are made in all the 4 machines?
>>>>
>>>> Asking here, because I don't want to screw up a live cluster due to my
>>>> lack of experience.
>>>>
>>>> Looking forward to some pointers.
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-04-17 Thread Ajay Garg
Also, I am wondering what the difference is between "all" and "dc" in
"internode_encryption".
Perhaps my answer lies in this?

On Mon, Apr 18, 2016 at 9:51 AM, Ajay Garg  wrote:

> Ok, trying to wake up this thread again.
>
> I went through the following links ::
>
>
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
>
>
> and I am wondering *if it is possible to setup secure inter-communication
> only between some nodes*.
>
> In particular, if I have a 2*2 cluster, is it possible to setup secure
> communication ONLY between the nodes of DC2?
> Once it works well, we would then setup secure-communication everywhere.
>
> We are wanting this, because DC2 is the backup centre, while DC1 is the
> primary-centre connected directly to the application-server. We don't want
> to screw things if something goes bad in DC1.
>
>
> Will be grateful for pointers.
>
>
> Thanks and Regards,
> Ajay
>
> On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg  wrote:
>
>> Hi All.
>>
>> A gentle query-reminder.
>>
>> I will be grateful if I could be given a brief technical overview, as to
>> how secure-communication occurs between two nodes in a cluster.
>>
>> Please note that I wish for some information on the "how it works below
>> the hood", and NOT "how to set it up".
>>
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg  wrote:
>>
>>> Thanks everyone for the reply.
>>>
>>> I actually have a fair bit of questions, but it will be nice if someone
>>> could please tell me the flow (implementation-wise), as to how node-to-node
>>> encryption works in a cluster.
>>>
>>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>>> (with *"require_client_auth: false*").
>>> I presume it would be like below (please correct me if am wrong) ::
>>>
>>> a)
>>> node1 tries to connect to node2, using the certificate *as defined on
>>> node1* in cassandra.yaml.
>>>
>>> b)
>>> node2 will confirm if the certificate being offered by node1 is in the
>>> truststore *as defined on node2* in cassandra.yaml.
>>> if it is, secure-communication is allowed.
>>>
>>>
>>> Is my thinking right?
>>> I
>>>
>>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave 
>>> wrote:
>>>
>>>> Hi Ajay,
>>>> Have a look here :
>>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>>
>>>> You can configure for DC level Security:
>>>>
>>>> Procedure
>>>>
>>>> On each node under sever_encryption_options:
>>>>
>>>>- Enable internode_encryption.
>>>>The available options are:
>>>>   - all
>>>>   - none
>>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>>
>>>> regards
>>>>
>>>> Neha
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>>> absi...@informatica.com> wrote:
>>>>
>>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>>
>>>>>
>>>>>
>>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>>
>>>>>
>>>>>
>>>>> Hi All.
>>>>>
>>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>>
>>>>> As a first stage, we wish to implement inter-dc security.
>>>>>
>>>>> Is it possible to enable security one machine at a time?
>>>>>
>>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>>
>>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>>> AFTER the changes are made in all the 4 machines?
>>>>>
>>>>> Asking here, because I don't want to screw up a live cluster due to my
>>>>> lack of experience.
>>>>>
>>>>> Looking forward to some pointers.
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ajay
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


New to cassandra

2010-06-21 Thread Ajay Singh
Hi

I am a PHP developer and I am new to Cassandra. Is there any starting guide or
tutorial from which I can begin?

Thanks
Ajay

