Re: Should maintenance repairs be run on system related keyspaces?

2015-08-11 Thread Prem Yadav
Hi Ken,
the system_auth keyspace should be repaired. However, the system keyspace
uses a local replication strategy and there is no point in repairing it.
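For example, a cron-driven repair could hit it explicitly (a minimal sketch;
run on each node on a rolling schedule, options per your version):

nodetool repair -pr system_auth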

Thanks,
Prem

On Tue, Aug 11, 2015 at 3:01 PM, K F  wrote:

> Hi,
>
> I have a question in general with regards to repairs on system related
> keyspaces. Is it necessary in maintenance repair kicked of via cron should
> also repair system related keyspaces?
>
> Regards,
> Ken
>
>
>


Re: Change from single region EC2 to multi-region

2015-08-11 Thread Prem Yadav
1) There are ways to connect two VPCs using VPN.
2) About connectivity using public IPs: can you ping one public IP
from another one in a different region?
If ping works, please check port connectivity using telnet. You can start a
temp server on a port using netcat. If connectivity fails, you need to
look into your routing tables to allow connectivity on the public IP
addresses.
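A minimal sketch of that check (port 7000, the default storage_port, is just
an example; some netcat builds want "nc -l -p 7000" instead):

# on the destination node, start a temporary listener
nc -l 7000

# from a node in the other region, test the path to the public IP
telnet <public-ip-of-destination> 7000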

On Tue, Aug 11, 2015 at 7:51 PM, Asher Newcomer  wrote:

> X-post w/ SO: link
> 
>
> I have (had) a working 4 node Cassandra cluster setup in an EC2 VPC. Setup
> was as follows:
>
> 172.18.100.110 - seed - DC1 / RAC1
>
> 172.18.100.111 - DC1 / RAC1
>
> 172.18.100.112 - seed - DC1 / RAC2
>
> 172.18.100.113 - DC1 / RAC2
>
> All of the above nodes are in East-1D, and I have configured it using the
> GossipingPropertyFileSnitch (I would rather not use the EC2 specific
> snitches).
>
> listen_address & broadcast_address were both set to the node's private IP.
>
> I then wanted to expand the cluster into a new region (us-west). Because
> cross-region private IP communication is not supported in EC2, I attempted
> to change the settings to have the nodes communicate through their public
> IPs.
>
> listen_address remained set to private IP
> broadcast_address was changed to the public IP
> seeds_list IPs were changed to the appropriate public IPs
>
> I restarted the nodes one by one expecting them to simply 'work', but now
> they only see themselves and not the other nodes.
>
> nodetool status consistently returns:
>
> Datacenter: DC1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> DN 172.18.100.112 ? 256 ? 968aaa8a-32b7-4493-9747-3df1c3784164 r1
> DN 172.18.100.113 ? 256 ? 8e03643c-9db8-4906-aabc-0a8f4f5c087d r1
> UN [public IP of local node] 75.91 GB 256 ?
> 6fdcc85d-6c78-46f2-b41f-abfe1c86ac69 RAC1
> DN 172.18.100.110 ? 256 ? fb7b78a8-d1cc-46fe-ab18-f0d3075cb426 r1
>
> On each individual node, the other nodes seem 'stuck' using the private IP
> addresses.
>
> *How do I force the nodes to look for each other at their public
> addresses?*
>
> I have fully opened the EC2 security group/firewall as a test to rule out
> any problems there - and it hasn't helped.
>
> Any ideas most appreciated.
>


Re: sstableloader in version 2.0.14 doesn't honor thrift_framed_transport_size_in_mb set on server side

2015-08-14 Thread Prem Yadav
We had this issue when using Hive on Cassandra.
We had to replace the thrift jar with our own patches.
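One thing worth checking (an assumption on my part, we did not verify this
with sstableloader itself): the cassandra.yaml read on the machine running
sstableloader also needs the larger frame size, since the client builds its
transport from its own yaml, not the server's:

thrift_framed_transport_size_in_mb: 16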

On Fri, Aug 14, 2015 at 5:27 PM, K F  wrote:

> While using sstableloader in 2.0.14 we have discovered that setting
> the thrift_framed_transport_size_in_mb to 16 in cassandra.yaml doesn't
> honor it. Did anybody see similar issue?
>
> So, this is the exception seen,
>
> org.apache.thrift.transport.TTransportException: Frame size (16165888)
> larger than max length (15728640)!
> java.lang.RuntimeException: Could not retrieve endpoint ranges:
> at
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:282)
> at
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)
> Caused by: org.apache.thrift.transport.TTransportException: Frame size
> (16165888) larger than max length (15728640)!
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
> at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1251)
> at
> org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1238)
> at
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:258)
> ... 2 more
>
> On Server side, it's the following.
>
> 2015-08-14 15:10:10,637 [main] INFO ThriftServer Using TFramedTransport
> with a max frame size of 16777216 bytes.
>
>


Re: Need advice for multi DC C* setup

2015-08-14 Thread Prem Yadav
The EC2 nodes must be in the default VPC.

Create a ring in a VPC in region B. Use VPC peering to connect the
default VPC and the region B VPC.
The new ring should join the existing one as a second DC. Alter the
replication strategy to NetworkTopologyStrategy so that the data is
replicated to the new ring, then repair the keyspaces.
Once that is done, you can decommission the existing ring.
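A sketch of the keyspace change (keyspace and DC names here are placeholders,
adjust the RFs to your topology), followed by "nodetool rebuild -- <existing-dc>"
on each new node before the repairs:

ALTER KEYSPACE mykeyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC_A': 3, 'DC_B': 3};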

For Spark, if you are using the DataStax version, it comes with Spark. You just
need to change a config and Spark starts along with Cassandra. A separate
ring is advised for analytics workloads.


On Sat, Aug 15, 2015 at 1:10 AM, Aiman Parvaiz  wrote:

> Hi all
> We are planning to move C* from EC2 (region A) to a VPC in region B. I will
> enumerate our goals so that you guys can advise me, keeping in mind the
> bigger picture.
>
> Goals:
> - Move to a VPC in another region.
> - Enable Vnodes.
> - Bump up RF to 3.
> - Ability to have a spark cluster.
>
> I know this is a LOT of work and I know this all might not be possible in
> one go.
>
> Existing cluster in EC2 is using RF=2, simple snitch and simple
> replication.
>
> I am not sure what would be the best way to approach this task. So if
> anyone who has done this would like to share anything, I would really
> appreciate the effort.
>
> Thanks
>


Re: Need advice for multi DC C* setup

2015-08-16 Thread Prem Yadav
I meant the existing nodes must be in the default VPC if you did not create
one.
In any case, you can use VPC peering.

On Sun, Aug 16, 2015 at 5:34 AM, John Wong  wrote:

> > The EC2 nodes must be in the default VPC.
> Did you really mean the default VPC created by AWS or just a VPC? Because
> I would be very surprised if the default VPC must be used.
>
> On Sat, Aug 15, 2015 at 2:50 AM, Prem Yadav  wrote:
>
>>
>> The EC2 nodes must be in the default VPC.
>>
>> Create a ring in a VPC in region B. Use VPC peering to connect the
>> default VPC and the region B VPC.
>> The new ring should join the existing one as a second DC. Alter the
>> replication strategy to NetworkTopologyStrategy so that the data is
>> replicated to the new ring, then repair the keyspaces.
>> Once that is done, you can decommission the existing ring.
>>
>> For Spark, if you are using the DataStax version, it comes with Spark. You
>> just need to change a config and Spark starts along with Cassandra. A
>> separate ring is advised for analytics workloads.
>>
>>
>> On Sat, Aug 15, 2015 at 1:10 AM, Aiman Parvaiz 
>> wrote:
>>
>>> Hi all
>>> We are planning to move C* from EC2 (region A) to a VPC in region B. I
>>> will enumerate our goals so that you guys can advise me, keeping in mind
>>> the bigger picture.
>>>
>>> Goals:
>>> - Move to a VPC in another region.
>>> - Enable Vnodes.
>>> - Bump up RF to 3.
>>> - Ability to have a spark cluster.
>>>
>>> I know this is a LOT of work and I know this all might not be possible
>>> in one go.
>>>
>>> Existing cluster in EC2 is using RF=2, simple snitch and simple
>>> replication.
>>>
>>> I am not sure what would be the best way to approach this task. So if
>>> anyone who has done this would like to share anything, I would really
>>> appreciate the effort.
>>>
>>> Thanks
>>>
>>
>>
>


Re: How to run any application on Cassandra cluster in high availability mode

2015-08-16 Thread Prem Yadav
MySQL is there just to save the state of things. I suppose it is very
lightweight. Why not just install MySQL on one of the nodes or on a VM
somewhere?


On Sun, Aug 16, 2015 at 3:39 PM, John Wong  wrote:

> Sorry, I meant integration with Cassandra (based on the docs, by default it
> suggests MySQL)
>
>
> On Sunday, August 16, 2015, John Wong  wrote:
>
>> There is no leader in cassandra. I suggest you ask the Azkaban community
>> about integration with Azkaban and Azkaban HA.
>>
>> On Sunday, August 16, 2015, Vikram Kone  wrote:
>>
>>> Can't we use ZooKeeper for leader election in Cassandra and, based on
>>> who is leader, run Azkaban or any app instance for that matter on that
>>> Cassandra server? I'm thinking that I can copy the application folder to
>>> all nodes and then determine which one to run using ZooKeeper. Is that
>>> possible?
>>>
>>> Sent from Outlook 
>>>
>>>
>>>
>>>
>>> On Sun, Aug 16, 2015 at 6:47 AM -0700, "John Wong" <
>>> gokoproj...@gmail.com> wrote:
>>>
>>> Hi

 I am not familiar with Azkaban and probably a better question to the
 Azkaban community IMO. But there seems to be two modes (
 http://azkaban.github.io/azkaban/docs/2.5/) one is solo and one is
 two-server mode, but either way I think still SPOF? If there is no
 election, just based on process, my 2 cents would be monitor, alert, and
 start the process somewhere else. Better yet, don't install the process on
 Cassandra node. Keep your instance for one purpose only. If you run cloud
 like AWS you will be able to autoscale min1 max1 easily.


 Note: In peer-to-peer architecture, there is simply no concept of
 master. You can start with some seed nodes for discovery. It depends how
 you design discovery.

 On Sat, Aug 15, 2015 at 11:49 AM, Vikram Kone 
 wrote:

> Hi,
> We are planning to install Azkaban in solo server mode on a 24
> node cassandra cluster to be able to schedule spark jobs with intricate
> dependency chain. The problem, is since Cassandra has a no-SPOF
> architecture ie any node can become the master for the cluster, it creates
> the problem for Azkaban master since it's not a peer-peer architecture
> where any node can become the master. Only a single node has to be master
> at any given time.
>
> What are our options here? Are there any frameworks or tools out there
> that would allow any application to run on a cluster of machines with high
> availability?
> Should I be looking at something like zookeeper for this ? Or Mesos
> may be?



>>
>> --
>> Sent from Jeff Dean's printf() mobile console
>>
>
>
> --
> Sent from Jeff Dean's printf() mobile console
>


Spark on cassandra

2015-11-12 Thread Prem Yadav
Hi,
Is it better to use the Spark API to do joins on Cassandra tables, or should we
use Spark SQL?
We have been struggling with Spark SQL as we need to do multiple large
table joins and there are always failures.

I tried to do joins using the API like this:
val join1 =
sc.cassandraTable("Keyspace1","table1").joinWithCassandraTable("keyspace1","table2").on(SomeColumns("column1"))

However, we need to join multiple tables from multiple keyspaces. How can we
do that?


Thanks,
Prem


Re: Triggering Deletion/Updation

2015-11-22 Thread Prem Yadav
If it is Cassandra 2.0+,
you can implement a trigger. Please check the following link:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support
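The CQL side of it is just one statement (trigger name, table and class here
are placeholders; the jar containing the trigger class goes into the triggers
directory on every node):

CREATE TRIGGER my_trigger ON mytable USING 'com.example.MyTrigger';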

Thanks,
Prem

On Sun, Nov 22, 2015 at 4:48 PM, Harikrishnan A  wrote:

> Trying for second time to get some insights to my below query ...
> Thanks
> Hari
>
> --
> *From:* Harikrishnan A 
> *To:* "user@cassandra.apache.org" 
> *Sent:* Friday, November 20, 2015 2:27 PM
> *Subject:* Triggering Deletion/Updation
>
> Hello,
>
> I have a generic question. How can I initiate a triggered deletion in
> cassandra?
> I mean I need to delete a few rows (or a partition itself) from other tables
> based on a status change in another table.
>
> Thanks & Regards,
> Hari
>
>
>


I am a Datastax certified Cassandra architect now :)

2015-11-22 Thread Prem Yadav
Just letting the community know that I just passed the Cassandra architect
certification with flying colors :).
Have to say I learnt a lot from this forum.

Thanks,
Prem


Re: No query results while expecting results

2015-11-23 Thread Prem Yadav
Can you run the trace again for the query "select *" without any
conditions and see if you are getting results for tnt_id=5?


On Mon, Nov 23, 2015 at 1:23 PM, Ramon Rockx  wrote:

> Hello Oded and Carlos,
>
> Many thanks for your tips. I modified the consistency level in cqlsh, but
> with no success:
>
> cqlsh> consistency ALL
> Consistency level set to ALL.
> cqlsh> select * from mls.te where period=62013120356 and tnt_id=5;
>
> Tracing session: ef0e6590-91e3-11e5-8c24-6783eab735d4
>
> activity | timestamp | source | source_elapsed
> ---------+-----------+--------+---------------
> execute_cql3_query | 14:13:05,898 | 192.168.0.210 | 0
> Parsing select * from mls.te where period=62013120356 and tnt_id=5 LIMIT 1; | 14:13:05,898 | 192.168.0.210 | 40
> Message received from /192.168.0.210 | 14:13:05,898 | 192.168.0.211 | 13
> Preparing statement | 14:13:05,898 | 192.168.0.210 | 276
> Executing single-partition query on te | 14:13:05,898 | 192.168.0.211 | 259
> Enqueuing data request to /192.168.0.211 | 14:13:05,898 | 192.168.0.210 | 526
> Acquiring sstable references | 14:13:05,898 | 192.168.0.211 | 272
> Enqueuing digest request to /192.168.0.212 | 14:13:05,898 | 192.168.0.210 | 565
> Merging memtable tombstones | 14:13:05,898 | 192.168.0.211 | 301
> Sending message to /192.168.0.211 | 14:13:05,898 | 192.168.0.210 | 792
> Bloom filter allows skipping sstable 1638 | 14:13:05,898 | 192.168.0.211 | 322
> Bloom filter allows skipping sstable 10 | 14:13:05,898 | 192.168.0.211 | 330
> Merging data from memtables and 0 sstables | 14:13:05,898 | 192.168.0.211 | 336
> Read 0 live and 0 tombstoned cells | 14:13:05,898 | 192.168.0.211 | 355
> Sending message to /192.168.0.212 | 14:13:05,899 | 192.168.0.210 | 1011
> Enqueuing response to /192.168.0.210 | 14:13:05,899 | 192.168.0.211 | 417
> Sending message to /192.168.0.210 | 14:13:05,899 | 192.168.0.211 | 1314
> Message received from /192.168.0.212 | 14:13:05,901 | 192.168.0.210 | 2950
> Processing response from /192.168.0.212 | 14:13:05,901 | 192.168.0.210 | 3014
> Message received from /192.168.0.211 | 14:13:05,901 | 192.168.0.210 | 3598
> Processing response from /192.168.0.211 | 14:13:05,901 | 192.168.0.210 | 3638
> Message received from /192.168.0.210 | 14:13:05,909 | 192.168.0.212 | 28
> Executing single-partition query on te | 14:13:05,909 | 192.168.0.212 | 276
> Acquiring sstable references | 14:13:05,909 | 192.168.0.212 | 297
> Merging memtable tombstones | 14:13:05,909 | 192.168.0.212 | 317
> Bloom filter allows skipping sstable 1648 | 14:13:05,909 | 192.168.0.212 | 335
> Bloom filter allows skipping sstable 5 | 14:13:05,909 | 192.168.0.212 | 347
> Merging data from memtables and 0 sstables | 14:13:05,909 | 192.168.0.212 | 352
> Read 0 live and 0 tombstoned cells | 14:13:05,909 | 192.168.0.212 | 367
> Enqueuing response to /192.168.0.210 | 14:13:05,909 | 192.168.0.212 | 413
> Sending message to /192.168.0.210 | 14:13:05,909 | 192.168.0.212 | 695
> Request complete | 14:13:05,901 | 192.168.0.210 | 3785
>
> As you can see all the hosts of this cluster are accessed. I also repaired
> the *mls* keyspace o

Re: No query results while expecting results

2015-11-23 Thread Prem Yadav
Agree with Carlos. It might be the cql shell doing tricks.
Please check the JSON.
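Something along these lines on the node owning the token (the path is a
placeholder, point it at the relevant -Data.db file):

sstable2json /var/lib/cassandra/data/mls/te/<sstable>-Data.db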

On Mon, Nov 23, 2015 at 2:31 PM, Carlos Alonso  wrote:

> Well, this makes me wonder how varints are compared in java vs python
> because the problem may be there.
>
> I'd suggest getting the token, to know which server contains the missing
> data. Go there and convert sstables to json, find the record and see what's
> there as the tnt_id. You could also use the thrift client to list it and
> see how it looks on disk and see if there's something wrong.
>
> If the data is there and looks fine, probably there's a problem managing
> varints somewhere in the read path.
>
> Regards
>
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 23 November 2015 at 13:55, Ramon Rockx  wrote:
>
>> Hello Prem,
>>
>> On Mon, Nov 23, 2015 at 2:36 PM, Prem Yadav  wrote:
>>
>>> Can you run the trace again for  the query "select * " without any
>>> conditions and see if you are getting results for tnt_id=5?
>>
>>
>> Of course, here are the results, with trace:
>>
>> cqlsh> tracing on
>> Now tracing requests.
>> cqlsh> select * from mls.te limit 330;
>>
>> period | tnt_id | evt_id | evt_type | data
>> ------------+------------+--------------------------------------+----------+------
>> ...8<...
>> 62013120356 | 5 | c16bb2a0-5c1b-11e3-bf53-402d20524153 | 0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610247,"u":"http://www.example.com"}
>> 62013120356 | 5 | 9f6cb1e0-5c1b-11e3-bf53-402d20524153 | 0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610246,"u":"http://www.example.com"}
>> 62013120356 | 5 | 3f33f950-5c1b-11e3-bf53-402d20524153 | 0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610245,"u":"http://www.example.com"}
>> 62013120356 | 5 | ec15e5d0-5c1a-11e3-bf53-402d20524153 | 0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610243,"u":"http://www.example.com"}
>> 62015032164 | 2063819251 | 63d5c920-cfdb-11e4-85e9-000c2981ebb4 | 0 | {"v":1451223,"s":2130306,"r":104667,"u":"http://www.example.com"}
>> 62015032164 | 2063819251 | 111ce010-cfdb-11e4-85e9-000c2981ebb4 | 0 | {"v":1451222,"s":2130305,"r":104769,"u":"http://www.example.com"}
>> 62015032164 | 2063819251 | 105e7210-cfdb-11e4-85e9-000c2981ebb4 | 0 | {"v":1451221,"s":2130304,"r":104769,"u":"http://www.example.com"}
>> 62015061055 | 2147429759 | 35b97470-0f68-11e5-8cc3-000c2981ebb4 | 1 | {"v":1453821,"s":2134354,"r":105462,"q":"13082ede-0843-47ee-8126-ba3767eae547"}
>> 62015061055 | 2147429759 | 35a0bc50-0f68-11e5-8cc3-000c2981ebb4 | 0 | {"v":1453821,"s":2134354,"r":105462,"u":"http://www.example.com"}
>>
>>
>> Tracing session: 5bf98eb0-91e8-11e5-9612-6783eab735d4
>>
>> activity | timestamp | source | source_elapsed
>> ---------+-----------+--------+---------------
>> execute_cql3_query | 14:44:46,620 | 192.168.0.210 | 0
>> Parsing select * from mls.te limit 330; | 14:44:46,621 | 192.168.0.210 | 526
>> Preparing statement | 14:44:46,621 | 192.168.0.210 | 990
>> Determining replicas to query | 14:44:46,621 | 192.168.0.210 | 1174
>> Enqueuing request to /192.168.0.212 | 14:44:46,622 | 192.168.0.210 | 1661
>> Sending message to /192

UUID question

2015-11-23 Thread Prem Yadav
Hi,

I am trying to understand different use cases related to using a UUID as the
partition key. I am sure I am missing something trivial and will be
grateful if you can help me understand this.

When do you use the UUID as the primary key? What can be a use case?
Since it is unique, how do you query it?

Let's take a user table with UUID as primary key.

create table user (id uuid primary key, name varchar, company varchar,
country varchar);

Now I can write to the table using the uuid() function to generate the uid.
But how do you query it?
The only use case I see is to create a secondary index and use that for
querying.

Am I missing something here?

Thanks,
Prem


Re: UUID question

2015-11-23 Thread Prem Yadav
Thanks Jay. Now this is great while creating the user. How does the user
change their details, let's say email id or password? How do you look up the
user table?


On Mon, Nov 23, 2015 at 4:14 PM, Jay Reddy  wrote:

> Here is one use case ..
>
> We are designing a web application using Cassandra.
> When a user signs on we create user info in the user table with userid (uuid)
> as the primary key, and it is returned to the UI.
>
> The UI uses this UUID for any future communications. The UI can also get the
> user id when searching for a user's details in "search" (achieved by Solr).
>
> Thanks,
> Jay
>
> On Mon, Nov 23, 2015 at 11:08 AM, Prem Yadav  wrote:
>
>> Hi,
>>
>> I am trying to understand different use cases related to using UUID as
>> the partition key. I am sure I am missing something trivial and will be
>> grateful and you can help me understand this.
>>
>> When do you use the UUID as the primary key? What can be a use case?
>> Since it is unique, how do you query it?
>>
>> Let's take a user table with UUID as primary key.
>>
>> create table user (id uuid primary key, name varchar,company
>> varchar,country varchar);
>>
>> Now I can write to the table using the uuid() function to generate the
>> uid. But how do you query it?
>> The only use case I see is create a secondary index and use that for
>> querying.
>>
>> Am I missing something here?
>>
>> Thanks,
>> Prem
>>
>
>


Re: UUID question

2015-11-23 Thread Prem Yadav
OK.
My question is more about the use cases of using a UUID as the partition key
for any table.
Will appreciate inputs from others.
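For reference, a minimal sketch of the lookup-table pattern Jay describes
below (table and column names are just illustrative):

create table user_by_email (email varchar primary key, id uuid);

-- at login, resolve the uuid by email, then hit the user table by id
select id from user_by_email where email='someone@example.com';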

Thanks,
Prem

On Mon, Nov 23, 2015 at 4:25 PM, Jay Reddy  wrote:

> Hi Prem,
>
> We have two tables, one with email id as partition key and other with
> userid(uuid).
> Please refer to www.killrvideo.com website. It is a great place to
> understand how a web application is built on Cassandra.
>
> Thanks,
> Jay
>
> On Mon, Nov 23, 2015 at 11:18 AM, Prem Yadav  wrote:
>
>> Thanks Jay. Now this is great while creating the user. How does the user
>> change the details? let's say email id or password? How do you lookup the
>> user table?
>>
>>
>> On Mon, Nov 23, 2015 at 4:14 PM, Jay Reddy  wrote:
>>
>>> Here is one use case ..
>>>
>>> We are designing  a web application using Cassandra.
>>> When a user signs on we create user info in user table with userid
>>> (uuid) is primary and is responded back to UI.
>>>
>>> UI uses this UUID for any future communications.  UI can also get user
>>> id when searched for an user detail  in "search" (achieved by Solr).
>>>
>>> Thanks,
>>> Jay
>>>
>>> On Mon, Nov 23, 2015 at 11:08 AM, Prem Yadav 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to understand different use cases related to using UUID as
>>>> the partition key. I am sure I am missing something trivial and will be
>>>> grateful and you can help me understand this.
>>>>
>>>> When do you use the UUID as the primary key? What can be a use case?
>>>> Since it is unique, how do you query it?
>>>>
>>>> Let's take a user table with UUID as primary key.
>>>>
>>>> create table user (id uuid primary key, name varchar,company
>>>> varchar,country varchar);
>>>>
>>>> Now I can write to the table using the uuid() function to generate the
>>>> uid. But how do you query it?
>>>> The only use case I see is create a secondary index and use that for
>>>> querying.
>>>>
>>>> Am I missing something here?
>>>>
>>>> Thanks,
>>>> Prem
>>>>
>>>
>>>
>>
>


Re: Upgrade instructions don't make sense

2015-11-23 Thread Prem Yadav
*If your cluster does not use vnodes*

Are you using vnodes now?

On Mon, Nov 23, 2015 at 10:55 PM, Robert Wille  wrote:

> I’m wanting to upgrade from 2.0 to 2.1. The upgrade instructions at
> http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html
>  has
> the following, which leaves me with more questions than it answers:
>
> If your cluster does not use vnodes, disable vnodes in each new
> cassandra.yaml before doing the rolling restart.
> In Cassandra 2.0.x, virtual nodes (vnodes) are enabled by default. Disable
> vnodes in the 2.0.x version before upgrading.
>
>    1. In the cassandra.yaml file, set num_tokens to 1.
>    2. Uncomment the initial_token property and set it to 1 or to the
>    value of a generated token for a multi-node cluster.
>
>
> It seems strange that vnodes have to be disabled to upgrade, but whatever.
> If I use an initial token generator to set the initial_token property of
> each node, then I assume that my token ranges are all going to change, and
> that there’s going to be a whole bunch of streaming as the data is shuffled
> around. The docs don’t mention that. Should I wait until the streaming is
> done before proceeding with the upgrade?
>
> The docs don’t talk about vnodes and initial_tokens post-upgrade. Can I
> turn vnodes back on? Am I forever after stuck with having to have manually
> generated initial tokens (and needing to have a unique cassandra.yaml for
> every node)? Can I just set num_tokens = 256 and comment out initial_token
> and do a rolling restart?
>
> Thanks in advance
>
> Robert
>
>


Re: Compaction And Write performance

2015-11-25 Thread Prem Yadav
Compaction is done to improve reads. The compaction process is very CPU
intensive and it can make writes slow, since writes are also CPU-bound.
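The throttle can be changed on a live node to observe the effect (16 is the
default in MB/s, used here just as an example; 0 removes the throttle
entirely, which on a CPU-bound box can hurt writes rather than help):

nodetool setcompactionthroughput 16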



On Wed, Nov 25, 2015 at 11:12 AM,  wrote:

> Hi all,
>
>
>
> Does compaction throughput impact write performance ?
>
>
>
> Increasing the value of *compaction_throughput_mb_per_sec* can improve
> the insert data ? If yes, is it possible to explain to me the concept ?
>
>
>
> Thanks.
>
>
>


Re:

2017-09-28 Thread Prem Yadav
Dan,
As part of upgrade, did you upgrade the sstables?
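i.e., something along these lines on each node after the binary upgrade:

nodetool upgradesstables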

Sent from mobile. Please excuse typos

On 28 Sep 2017 17:45, "Dan Kinder"  wrote:

> I should also note, I also see nodes become locked up without seeing that
> Exception. But the GossipStage buildup does seem correlated with gossip
> activity, e.g. me restarting a different node.
>
> On Thu, Sep 28, 2017 at 9:20 AM, Dan Kinder  wrote:
>
>> Hi,
>>
>> I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the
>> following. The cluster does function, for a while, but then some stages
>> begin to back up and the node does not recover and does not drain the
>> tasks, even under no load. This happens both to MutationStage and
>> GossipStage.
>>
>> I do see the following exception happen in the logs:
>>
>>
>> ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440
>> CassandraDaemon.java:228 - Exception in thread
>> Thread[ReadRepairStage:2328,5,main]
>>
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
>> out - received only 1 responses.
>>
>> at 
>> org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterat
>> ors$2.close(UnfilteredPartitionIterators.java:182)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at 
>> org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at 
>> org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThr
>> ow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> ~[na:1.8.0_91]
>>
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ~[na:1.8.0_91]
>>
>> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$
>> threadLocalDeallocator$0(NamedThreadFactory.java:81)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
>>
>>
>> But it's hard to correlate precisely with things going bad. It is also
>> very strange to me since I have both read_repair_chance and
>> dclocal_read_repair_chance set to 0.0 for ALL of my tables. So it is
>> confusing why ReadRepairStage would err.
>>
>> Anyone have thoughts on this? It's pretty muddling, and causes nodes to
>> lock up. Once it happens Cassandra can't even shut down, I have to kill -9.
>> If I can't find a resolution I'm going to need to downgrade and restore to
>> backup...
>>
>> The only issue I found that looked similar is https://issues.apache.org/j
>> ira/browse/CASSANDRA-12689 but that appears to be fixed by 3.10.
>>
>>
>> $ nodetool tpstats
>>
>> Pool Name Active   Pending  Completed   Blocked  All time blocked
>> ReadStage  0 0 582103   0 0
>> MiscStage  0 0  0   0 0
>> CompactionExecutor1111   2868   0 0
>> MutationStage 32   4593678   55057393   0 0
>> GossipStage1  2818 371487   0 0
>> RequestResponseStage   0 04345522   0 0
>> ReadRepairStage0 0 151473   0 0
>> CounterMutationStage   0 0  0   0 0
>> MemtableFlushWriter181 76   0 0
>> MemtablePostFlush  1   382139   0 0
>> ValidationExecutor 0 0  0   0 0
>> ViewMutationStage  0 0  0   0 0
>> CacheCleanupExecutor   0 0  0   0 0
>> PerDiskMemtableFlushWriter_10  0 0 69   0 0
>> PerDiskMemtableFlushWriter_11  0 0 69   0 0
>> MemtableReclaimMemory  0 0 81   0 0
>> PendingRangeCalculator 0 0 32   0 0
>> SecondaryIndexManagement   0 0  0   0 0
>> HintsDispatcher0 0  

node keeps dying

2014-09-24 Thread Prem Yadav
Hi,
this is an issue that has happened a few times. We are using DSE 4.0
One of the Cassandra nodes is detected as dead by the opscenter even though
I can see the process is up.

the logs show heap space error:

 INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24 08:31:05,340
StorageService.java (line 2538) Starting repair command #30766, repairing 1
ranges for keyspace 
ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java (line
196) Exception in thread Thread[BatchlogTasks:1,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at
org.antlr.runtime.CommonTokenStream.<init>(CommonTokenStream.java:68)
at
org.antlr.runtime.CommonTokenStream.<init>(CommonTokenStream.java:72)
at
org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:413)
at
org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:396)
at
org.apache.cassandra.cql3.QueryProcessor.processInternal(QueryProcessor.java:253)
at
org.apache.cassandra.db.BatchlogManager.process(BatchlogManager.java:355)
at
org.apache.cassandra.db.BatchlogManager.replayAllFailedBatches(BatchlogManager.java:179)
at
org.apache.cassandra.db.BatchlogManager$1.runMayThrow(BatchlogManager.java:97)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown
Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)


Any advice will be helpful

thanks


Re: node keeps dying

2014-09-24 Thread Prem Yadav
Well, it's not the Linux OOM killer. The system is running with all default
settings.

Total memory 7 GB; Cassandra gets assigned 2 GB.
2-core processors.
Two rings with 3 nodes in each ring.

On Wed, Sep 24, 2014 at 9:53 PM, Michael Shuler 
wrote:

> On 09/24/2014 11:32 AM, Prem Yadav wrote:
>
>> this is an issue that has happened a few times. We are using DSE 4.0
>>
>
> I believe this is Apache Cassandra 2.0.5, which is better info for this
> list.
>
>  One of the Cassandra nodes is detected as dead by the opscenter even
>> though I can see the process is up.
>>
>> the logs show heap space error:
>>
>>   INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24 08:31:05,340
>> StorageService.java (line 2538) Starting repair command #30766,
>> repairing 1 ranges for keyspace 
>> ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java
>> (line 196) Exception in thread Thread[BatchlogTasks:1,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>  at java.util.ArrayList.<init>(Unknown Source)
>>
>
> OOM.
>
> System environment and configuration modification details might be helpful
> for others to give you advice. Searching for "cassandra oom" gave me a few
> good links to read, and knowing some details about your nodes might be
> really helpful. Additionally, CASSANDRA-7507 [0] suggests that an OOM
> leaving the process running in an unclean state is not desired, and the
> process should be killed.
>
> Several of the search links provide details on how to capture and dig
> around a heap dump to aid in troubleshooting.
>
> [0] https://issues.apache.org/jira/browse/CASSANDRA-7507
> --
> Kind regards,
> Michael
>


Re: node keeps dying

2014-09-24 Thread Prem Yadav
BTW, thanks Michael.
I am surprised I didn't search for Cassandra OOM before.
I got some good links that discuss that. Will try to optimize and see how
it goes.


On Wed, Sep 24, 2014 at 10:27 PM, Prem Yadav  wrote:

> Well its not the Linux OOM killer. The system is running with all default
> settings.
>
> Total memory 7GB- Cassandra gets assigned 2GB
> 2 core processors.
> Two rings with 3 nodes in each ring.
>
> On Wed, Sep 24, 2014 at 9:53 PM, Michael Shuler 
> wrote:
>
>> On 09/24/2014 11:32 AM, Prem Yadav wrote:
>>
>>> this is an issue that has happened a few times. We are using DSE 4.0
>>>
>>
>> I believe this is Apache Cassandra 2.0.5, which is better info for this
>> list.
>>
>>  One of the Cassandra nodes is detected as dead by the opscenter even
>>> though I can see the process is up.
>>>
>>> the logs show heap space error:
>>>
>>>   INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24 08:31:05,340
>>> StorageService.java (line 2538) Starting repair command #30766,
>>> repairing 1 ranges for keyspace 
>>> ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java
>>> (line 196) Exception in thread Thread[BatchlogTasks:1,5,main]
>>> java.lang.OutOfMemoryError: Java heap space
>>>  at java.util.ArrayList.<init>(Unknown Source)
>>>
>>
>> OOM.
>>
>> System environment and configuration modification details might be
>> helpful for others to give you advice. Searching for "cassandra oom" gave
>> me a few good links to read, and knowing some details about your nodes
>> might be really helpful. Additionally, CASSANDRA-7507 [0] suggests that an
>> OOM leaving the process running in an unclean state is not desired, and the
>> process should be killed.
>>
>> Several of the search links provide details on how to capture and dig
>> around a heap dump to aid in troubleshooting.
>>
>> [0] https://issues.apache.org/jira/browse/CASSANDRA-7507
>> --
>> Kind regards,
>> Michael
>>
>
>


repair getting stuck

2014-10-14 Thread Prem Yadav
Hi,
this is an issue we have faced a couple of times now.

Every once in a while OpsCenter throws an error that the repair service
failed due to errors. In the logs we can see multiple lines like:

 Repair task (,
(-6964720218971987043L, -6963882488374905088L), set([tables])) timed out
after 3600 seconds.

manually running "nodetool repair -pr" on that node just hangs there and
doesn't do anything.
Once we restart dse, the repair job starts fine.

Any ideas?

Thanks


Re: Why select returns tombstoned results?

2015-03-30 Thread Prem Yadav
Increase the read CL to QUORUM and you should get correct results.
How many nodes do you have in the cluster, and what is the replication
factor for the keyspace?
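e.g. in cqlsh, using the table from your mail:

cqlsh> CONSISTENCY QUORUM;
cqlsh> SELECT * FROM tomb_test WHERE guid='guid-1' AND content='content-1'
       AND range='week';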

On Mon, Mar 30, 2015 at 7:41 PM, Benyi Wang  wrote:

> Create table tomb_test (
>guid text,
>content text,
>range text,
>rank int,
>id text,
>    cnt int,
>primary key (guid, content, range, rank)
> )
>
> Sometime I delete the rows using cassandra java driver using this query
>
> DELETE FROM tomb_test WHERE guid=? and content=? and range=?
>
> in a Batch statement with UNLOGGED. The consistency level is LOCAL_ONE.
>
> But if I run
>
> SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
> range='week'
> or
> SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
> range='week' and rank = 1
>
> The result shows the deleted rows.
>
> If I run this select, the deleted rows are not shown
>
> SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1'
>
> If I run delete statement in cqlsh, the deleted rows won't show up.
>
> How can I fix this?
>
>


Re: Data migration

2015-04-14 Thread Prem Yadav
Look into Sqoop. I believe using Sqoop you can transfer data between C*
clusters, though I haven't tested it.
The other option is to write a program to read from one cluster and write the
required data to another.

On Tue, Apr 14, 2015 at 12:27 PM, skrynnikov_m  wrote:

> Hello!!!
> Need to migrate data from one C* cluster to another periodically. During
> migration the schema can change (add or remove one or two fields). Could you
> please suggest some tool?
>
>


Add new DC to cluster

2015-06-07 Thread Prem Yadav
Hi,
We have an existing cluster consisting of 3 DCs. Authentication is enabled.
I am trying to add a new DC. I followed the steps mentioned at:

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

But, I still can't log in to any of the nodes in the new DC using the
superuser. The cassandra superuser was always disabled.

I ran the repair. How long should the repair take for system_auth? Anything
else I am missing?

I restarted the nodes a couple of times. Shouldn't the nodes pull the data?

Any help is appreciated.


Re: Add new DC to cluster

2015-06-07 Thread Prem Yadav
Thanks Anuj.
In our setup it's already NetworkTopologyStrategy for system_auth. Not sure
if it was the default or whether we altered it at some point.
Repair seems to take forever. A few things that worked for me:

I used cqlsh to connect to one of the nodes in the existing DCs and set the
consistency to ALL. After that I changed the password of the user I wanted
to connect with. Voila :) the consistency worked and I can now connect to the
new DCs using that user.
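For the record, it was along these lines (user name and password are
placeholders):

cqlsh> CONSISTENCY ALL;
cqlsh> ALTER USER myuser WITH PASSWORD 'newpassword';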

I guess if I do it for all users, it will work. This is enough for me.
However, I would think 'nodetool rebuild' and a restart should take care of
this. Not sure why it doesn't.






On Sun, Jun 7, 2015 at 5:10 PM, Anuj Wadehra  wrote:

> Hi Prem,
>
> I think the system_auth keyspace uses SimpleStrategy by default and is thus
> not applicable across DCs. Please alter the keyspace to use NetworkTopology
> and set an appropriate replication factor for both DCs. After that you can
> run a full repair across DCs on this keyspace.
>
> I don't see any drawbacks of doing this. Any suggestions against
> this... please feel free to discuss...
>
> Thanks
> Anuj Wadehra
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> --
>   *From*:"Prem Yadav" 
> *Date*:Sun, 7 Jun, 2015 at 8:19 pm
> *Subject*:Add new DC to cluster
>
> Hi,
> We have an existing cluster consisting of 3 DCs. Authentication is enabled.
> I am trying to add a new DC. I followed the steps mentioned at:
>
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>
> Bit, I still can't login to any of the nodes in the new DC using the
> superuser. The cassandra superuser was always disabled.
>
> I ran the repair. How long should the repair take for system_auth?
> Anything else I am missing?
>
> I restarted the nodes a couple of times. Shouldn't the nodes pull the data?
>
> Any help is appreciated.
>


Cassandra blob storage

2014-03-18 Thread prem yadav
Hi,
I have been spending some time looking into whether large files (>100 MB) can
be stored in Cassandra. As per the Cassandra FAQ:


*"Currently Cassandra isn't optimized specifically for large file or BLOB
storage. However, files of around 64Mb and smaller can be easily stored in
the database without splitting them into smaller chunks. This is primarily
due to the fact that Cassandra's public API is based on Thrift, which
offers no streaming abilities; any value written or fetched has to fit in
to memory."*

Does the above statement still hold? Thrift supports framed data transport;
does that change the above statement? If not, why does Cassandra not adopt
Thrift's framed data transfer support?

Thanks


Re: Cassandra blob storage

2014-03-18 Thread prem yadav
Thanks Brian,
I have seen that. It's more of a workaround and a hack, though of course a
great solution.
But my question is more about why Cassandra itself can't support that, given
that Thrift supports frames.

Thanks.


On Tue, Mar 18, 2014 at 5:55 PM, Brian O'Neill wrote:

> You may want to look at:
> https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store
>
> -brian
>
> ---
>
> Brian O'Neill
>
> Chief Technology Officer
>
>
> *Health Market Science*
>
> *The Science of Better Results*
>
> 2700 Horizon Drive * King of Prussia, PA * 19406
>
> M: 215.588.6024 * @boneill42 <http://www.twitter.com/boneill42>  *
>
> healthmarketscience.com
>
>
>
>
>
>
> From: prem yadav 
> Reply-To: 
> Date: Tuesday, March 18, 2014 at 1:41 PM
> To: 
> Subject: Cassandra blob storage
>
> Hi,
> I have been spending some time looking into whether large files(>100mb)
> can be stores in Cassandra. As per Cassandra faq:
>
>
> *"Currently Cassandra isn't optimized specifically for large file or BLOB
> storage. However, files of around 64Mb and smaller can be easily stored in
> the database without splitting them into smaller chunks. This is primarily
> due to the fact that Cassandra's public API is based on Thrift, which
> offers no streaming abilities; any value written or fetched has to fit in
> to memory."*
>
> Does the above statement still hold? Thrift supports framed data
> transport, does that change the above statement. If not, why does
> casssandra not adopt the Thrift framed data transfer support?
>
> Thanks
>
>


Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
Hi,
I have a 3 node Cassandra test cluster. The nodes have 4 GB total memory and
2 cores. Cassandra runs with all default settings.
But the Cassandra process keeps getting killed due to OOM. The Cassandra
version in use is 1.1.9.
Here are the settings in use:

compaction_throughput_mb_per_sec: 16
row_cache_save_period: 0
encryption_options:
  keystore: conf/.keystore
  internode_encryption: none
  truststore: conf/.truststore
  algorithm: SunX509
  cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,
TLS_RSA_WITH_AES_256_CBC_SHA]

  protocol: TLS
  store_type: JKS
multithreaded_compaction: false
#authority: org.apache.cassandra.auth.AllowAllAuthority
populate_io_cache_on_flush: false
storage_port: 7000
key_cache_save_period: 14400
hinted_handoff_throttle_delay_in_ms: 1
trickle_fsync_interval_in_kb: 10240
rpc_timeout_in_ms: 1
dynamic_snitch_update_interval_in_ms: 100
column_index_size_in_kb: 64
thrift_framed_transport_size_in_mb: 15
hinted_handoff_enabled: true
dynamic_snitch_reset_interval_in_ms: 60
reduce_cache_capacity_to: 0.6
snapshot_before_compaction: false
request_scheduler_options: {weights: null, default_weight: 5,
throttle_limit: 80}
request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
incremental_backups: false
commitlog_sync: periodic
trickle_fsync: false
rpc_keepalive: true
max_hint_window_in_ms: 360
commitlog_segment_size_in_mb: 32
thrift_max_message_length_in_mb: 16
request_scheduler_id: keyspace
cluster_name: TVCASCLU002
memtable_flush_queue_size: 4
index_interval: 128
authenticator: com.datastax.bdp.cassandra.auth.PasswordAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.CassandraAuthorizer
auth_replication_options:
replication_factor : 3
#DC1: 3
row_cache_size_in_mb: 0
row_cache_provider: SerializingCacheProvider
dynamic_snitch_badness_threshold: 0.1
commitlog_sync_period_in_ms: 1
auto_snapshot: true
concurrent_reads: 32
in_memory_compaction_limit_in_mb: 64
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
partitioner: org.apache.cassandra.dht.RandomPartitioner
saved_caches_directory: /var/lib/cassandra/saved_caches
ssl_storage_port: 7001

rpc_port: 9160
commitlog_directory: /var/lib/cassandra/commitlog

rpc_server_type: sync
compaction_preheat_key_cache: true
concurrent_writes: 32
data_file_directories: [/var/lib/cassandra/data]
initial_token: 56713727820156410577229101238628035242
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:


rpc_server_type: sync #changed from sync to hsha

rpc_min_threads: 2
rpc_max_threads: 64

How do I resolve this?


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
It's Oracle JDK 1.6.
Robert, any fix that you know of which went into 1.2.15 for this particular
issue?


On Sat, Mar 22, 2014 at 4:50 PM, Robert Coli  wrote:

> On Sat, Mar 22, 2014 at 7:48 AM, prem yadav  wrote:
>
>> But, the cassandra process keeps getting killed due to OOM. Cassandra
>> version in use is 1.1.9.
>>
>
> Try using 1.2.15, instead?
>
> =Rob
>
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
Michael, no memory constraints. System memory is 4 GB and Cassandra runs on
defaults.


On Sat, Mar 22, 2014 at 5:32 PM, prem yadav  wrote:

> Its Oracle jdk 1.6.
> Robert, any fix that you know of which went into 1.2.15 for this
> particular issue?
>
>
> On Sat, Mar 22, 2014 at 4:50 PM, Robert Coli  wrote:
>
>> On Sat, Mar 22, 2014 at 7:48 AM, prem yadav  wrote:
>>
>>> But, the cassandra process keeps getting killed due to OOM. Cassandra
>>> version in use is 1.1.9.
>>>
>>
>> Try using 1.2.15, instead?
>>
>> =Rob
>>
>>
>
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
Upgrading is not possible right now. Any other suggestions, guys?
I have already tried reducing the number of RPC threads. I also tried
reducing the Linux kernel overcommit.
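For anyone else hitting this, a sketch of capping the heap explicitly in
cassandra-env.sh instead of letting it auto-size (the values are guesses for
a 4 GB box, not something we have verified):

MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"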


On Sat, Mar 22, 2014 at 5:44 PM, Laing, Michael
wrote:

> I ran into the same problem some time ago.
>
> Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it.
>
> I think the jdk change was the key for my similarly small memory cluster.
>
> ml
>
>
>
> On Sat, Mar 22, 2014 at 1:36 PM, prem yadav  wrote:
>
>> Michael, no memory constraints. System memory is 4 GB and Cassandra run
>> on default.
>>
>>
>> On Sat, Mar 22, 2014 at 5:32 PM, prem yadav  wrote:
>>
>>> Its Oracle jdk 1.6.
>>> Robert, any fix that you know of which went into 1.2.15 for this
>>> particular issue?
>>>
>>>
>>> On Sat, Mar 22, 2014 at 4:50 PM, Robert Coli wrote:
>>>
>>>> On Sat, Mar 22, 2014 at 7:48 AM, prem yadav wrote:
>>>>
>>>>> But, the cassandra process keeps getting killed due to OOM. Cassandra
>>>>> version in use is 1.1.9.
>>>>>
>>>>
>>>> Try using 1.2.15, instead?
>>>>
>>>> =Rob
>>>>
>>>>
>>>
>>>
>>
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
The output of ps waux is below. Also, there is no load on the cluster. None.

USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root 1  0.0  0.0  19224  1076 ?Ss   Mar19   0:01 /sbin/init
root 2  0.0  0.0  0 0 ?SMar19   0:00 [kthreadd]
root 3  0.0  0.0  0 0 ?SMar19   0:00
[migration/0]
root 4  0.0  0.0  0 0 ?SMar19   0:00
[ksoftirqd/0]
root 5  0.0  0.0  0 0 ?SMar19   0:00
[migration/0]
root 6  0.0  0.0  0 0 ?SMar19   0:00
[watchdog/0]
root 7  0.0  0.0  0 0 ?SMar19   0:00
[migration/1]
root 8  0.0  0.0  0 0 ?SMar19   0:00
[migration/1]
root 9  0.0  0.0  0 0 ?SMar19   0:00
[ksoftirqd/1]
root10  0.0  0.0  0 0 ?SMar19   0:00
[watchdog/1]
root11  0.0  0.0  0 0 ?SMar19   0:03 [events/0]
root12  0.0  0.0  0 0 ?SMar19   0:52 [events/1]
root13  0.0  0.0  0 0 ?SMar19   0:00 [cpuset]
root14  0.0  0.0  0 0 ?SMar19   0:00 [khelper]
root15  0.0  0.0  0 0 ?SMar19   0:00 [netns]
root16  0.0  0.0  0 0 ?SMar19   0:00 [async/mgr]
root17  0.0  0.0  0 0 ?SMar19   0:00 [pm]
root18  0.0  0.0  0 0 ?SMar19   0:00
[sync_supers]
root19  0.0  0.0  0 0 ?SMar19   0:00
[bdi-default]
root20  0.0  0.0  0 0 ?SMar19   0:00
[kintegrityd/0]
root21  0.0  0.0  0 0 ?SMar19   0:00
[kintegrityd/1]
root22  0.0  0.0  0 0 ?SMar19   0:00 [kblockd/0]
root23  0.0  0.0  0 0 ?SMar19   0:00 [kblockd/1]
root24  0.0  0.0  0 0 ?SMar19   0:00 [kacpid]
root25  0.0  0.0  0 0 ?SMar19   0:00
[kacpi_notify]
root26  0.0  0.0  0 0 ?SMar19   0:00
[kacpi_hotplug]
root27  0.0  0.0  0 0 ?SMar19   0:00 [ata/0]
root28  0.0  0.0  0 0 ?SMar19   0:00 [ata/1]
root29  0.0  0.0  0 0 ?SMar19   0:00 [ata_aux]
root30  0.0  0.0  0 0 ?SMar19   0:00
[ksuspend_usbd]
root31  0.0  0.0  0 0 ?SMar19   0:00 [khubd]
root32  0.0  0.0  0 0 ?SMar19   0:00 [kseriod]
root33  0.0  0.0  0 0 ?SMar19   0:00 [md/0]
root34  0.0  0.0  0 0 ?SMar19   0:00 [md/1]
root35  0.0  0.0  0 0 ?SMar19   0:00 [md_misc/0]
root36  0.0  0.0  0 0 ?SMar19   0:00 [md_misc/1]
root37  0.0  0.0  0 0 ?SMar19   0:00
[khungtaskd]
root38  0.0  0.0  0 0 ?SMar19   0:23 [kswapd0]
root39  0.0  0.0  0 0 ?SN   Mar19   0:00 [ksmd]
root40  0.0  0.0  0 0 ?SN   Mar19   0:01
[khugepaged]
root41  0.0  0.0  0 0 ?SMar19   0:00 [aio/0]
root42  0.0  0.0  0 0 ?SMar19   0:00 [aio/1]
root43  0.0  0.0  0 0 ?SMar19   0:00 [crypto/0]
root44  0.0  0.0  0 0 ?SMar19   0:00 [crypto/1]
root49  0.0  0.0  0 0 ?SMar19   0:00
[kthrotld/0]
root50  0.0  0.0  0 0 ?SMar19   0:00
[kthrotld/1]
root51  0.0  0.0  0 0 ?SMar19   0:00 [pciehpd]
root53  0.0  0.0  0 0 ?SMar19   0:00 [kpsmoused]
root54  0.0  0.0  0 0 ?SMar19   0:00
[usbhid_resumer]
root85  0.0  0.0  0 0 ?SMar19   0:00 [iscsi_eh]
root89  0.0  0.0  0 0 ?SMar19   0:00 [cnic_wq]
root   101  0.0  0.0  0 0 ?SMar19   0:00 [kstriped]
root   179  0.0  0.0  0 0 ?SMar19   0:00 [scsi_eh_0]
root   180  0.0  0.0  0 0 ?SMar19   0:00 [scsi_eh_1]
root   217  0.0  0.0  0 0 ?SMar19   0:07
[mpt_poll_0]
root   218  0.0  0.0  0 0 ?SMar19   0:00 [mpt/0]
root   219  0.0  0.0  0 0 ?SMar19   0:00 [scsi_eh_2]
root   343  0.0  0.0  0 0 ?SMar19   0:00 [kdmflush]
root   345  0.0  0.0  0 0 ?SMar19   0:00 [kdmflush]
root   364  0.0  0.0  0 0 ?SMar19   0:16
[jbd2/dm-0-8]
root   365  0.0  0.0  0 0 ?SMar19   0:00
[ext4-dio-unwrit]
root   366  0.0  0.0  0 0 ?SMar19   0:00
[ext4-dio-unwrit]
root   449  0.0  0.0  11420  1420 ?S

Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread prem yadav
Also, we use DataStax. Cassandra 1.1.9 doesn't work with Java 7.


On Sat, Mar 22, 2014 at 9:09 PM, prem yadav  wrote:

> The output of ps waux . Also there is no load on cluster. None
>
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> root 1  0.0  0.0  19224  1076 ?Ss   Mar19   0:01
> /sbin/init
> root 2  0.0  0.0  0 0 ?SMar19   0:00 [kthreadd]
> root 3  0.0  0.0  0 0 ?SMar19   0:00
> [migration/0]
> root 4  0.0  0.0  0 0 ?SMar19   0:00
> [ksoftirqd/0]
> root 5  0.0  0.0  0 0 ?SMar19   0:00
> [migration/0]
> root 6  0.0  0.0  0 0 ?SMar19   0:00
> [watchdog/0]
> root 7  0.0  0.0  0 0 ?SMar19   0:00
> [migration/1]
> root 8  0.0  0.0  0 0 ?SMar19   0:00
> [migration/1]
> root 9  0.0  0.0  0 0 ?SMar19   0:00
> [ksoftirqd/1]
> root10  0.0  0.0  0 0 ?SMar19   0:00
> [watchdog/1]
> root11  0.0  0.0  0 0 ?SMar19   0:03 [events/0]
> root12  0.0  0.0  0 0 ?SMar19   0:52 [events/1]
> root13  0.0  0.0  0 0 ?SMar19   0:00 [cpuset]
> root14  0.0  0.0  0 0 ?SMar19   0:00 [khelper]
> root15  0.0  0.0  0 0 ?SMar19   0:00 [netns]
> root16  0.0  0.0  0 0 ?SMar19   0:00
> [async/mgr]
> root17  0.0  0.0  0 0 ?SMar19   0:00 [pm]
> root18  0.0  0.0  0 0 ?SMar19   0:00
> [sync_supers]
> root19  0.0  0.0  0 0 ?SMar19   0:00
> [bdi-default]
> root20  0.0  0.0  0 0 ?SMar19   0:00
> [kintegrityd/0]
> root21  0.0  0.0  0 0 ?SMar19   0:00
> [kintegrityd/1]
> root22  0.0  0.0  0 0 ?SMar19   0:00
> [kblockd/0]
> root23  0.0  0.0  0 0 ?SMar19   0:00
> [kblockd/1]
> root24  0.0  0.0  0 0 ?SMar19   0:00 [kacpid]
> root25  0.0  0.0  0 0 ?SMar19   0:00
> [kacpi_notify]
> root26  0.0  0.0  0 0 ?SMar19   0:00
> [kacpi_hotplug]
> root27  0.0  0.0  0 0 ?SMar19   0:00 [ata/0]
> root28  0.0  0.0  0 0 ?SMar19   0:00 [ata/1]
> root29  0.0  0.0  0 0 ?SMar19   0:00 [ata_aux]
> root30  0.0  0.0  0 0 ?SMar19   0:00
> [ksuspend_usbd]
> root31  0.0  0.0  0 0 ?SMar19   0:00 [khubd]
> root32  0.0  0.0  0 0 ?SMar19   0:00 [kseriod]
> root33  0.0  0.0  0 0 ?SMar19   0:00 [md/0]
> root34  0.0  0.0  0 0 ?SMar19   0:00 [md/1]
> root35  0.0  0.0  0 0 ?SMar19   0:00
> [md_misc/0]
> root36  0.0  0.0  0 0 ?SMar19   0:00
> [md_misc/1]
> root37  0.0  0.0  0 0 ?SMar19   0:00
> [khungtaskd]
> root38  0.0  0.0  0 0 ?SMar19   0:23 [kswapd0]
> root39  0.0  0.0  0 0 ?SN   Mar19   0:00 [ksmd]
> root40  0.0  0.0  0 0 ?SN   Mar19   0:01
> [khugepaged]
> root41  0.0  0.0  0 0 ?SMar19   0:00 [aio/0]
> root42  0.0  0.0  0 0 ?SMar19   0:00 [aio/1]
> root43  0.0  0.0  0 0 ?SMar19   0:00 [crypto/0]
> root44  0.0  0.0  0 0 ?SMar19   0:00 [crypto/1]
> root49  0.0  0.0  0 0 ?SMar19   0:00
> [kthrotld/0]
> root50  0.0  0.0  0 0 ?SMar19   0:00
> [kthrotld/1]
> root51  0.0  0.0  0 0 ?SMar19   0:00 [pciehpd]
> root53  0.0  0.0  0 0 ?SMar19   0:00
> [kpsmoused]
> root54  0.0  0.0  0 0 ?SMar19   0:00
> [usbhid_resumer]
> root85  0.0  0.0  0 0 ?SMar19   0:00 [iscsi_eh]
> root89  0.0  0.0  0 0 ?SMar19   0:00 [cnic_wq]
> root   101  0.0  0.0  0 0 ?SMar19   0:00 [kstriped]
> root   179  0.0  0.0  0 0 ?SMar19   0:00
> [scsi_eh_0]
> root   180  0.0  0.0  0 0 ?SMar19   0:00
> [scsi_eh_1]
> root   217  0.0  0.0  0 0 ?SMar19   0:07
> [mpt_poll_0]
> root   218  0.0  0.0  0 0 ?SMar19   0:00 [mpt/0

Re: Kernel keeps killing cassandra process - OOM

2014-03-24 Thread prem yadav
the nodes die *without* being under any load. Completely idle.
And 4 GB system memory is not low. Or is it?
I have tried tweaking the overcommit memory: tried disabling it,
under-committing and over-committing.
I also reduced the rpc threads min and max. Will try other settings from the
link Michael has given.



On Mon, Mar 24, 2014 at 10:26 AM, Romain HARDOUIN  wrote:

> You have to tune Cassandra in order to run it under a low memory
> environment.
> Many settings must be tuned. The link that Michael mentions provides a
> quick start.
>
> There is a point that I haven't understood. *When* did your nodes die?
> Under load? Or can they be killed via OOM killer even if they are not
> loaded?
> If the nodes are VM you have to pay attention to hypervisor memory
> overcommit.
>
>
> "Laing, Michael"  a écrit sur 22/03/2014
> 22:25:30 :
>
> > De : "Laing, Michael" 
> > A : user@cassandra.apache.org,
> > Date : 22/03/2014 22:26
> > Objet : Re: Kernel keeps killing cassandra process - OOM
> >
> > You might want to look at:
> >
> > http://www.opensourceconnections.com/2013/08/31/building-the-
> > perfect-cassandra-test-environment/
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-26 Thread prem yadav
Thanks Robert. That seems to be the issue. However, the fix mentioned there
doesn't work. I downgraded Java to jdk6_37 and that seems to have done the
trick. Thanks for pointing me to that Jira ticket.


On Mon, Mar 24, 2014 at 6:48 PM, Robert Coli  wrote:

> On Mon, Mar 24, 2014 at 4:11 AM, prem yadav  wrote:
>
>> the nodes die *without * being under any load. Completely idle.
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6541
>
> ?
>
> =Rob
>
>


memory usage spikes

2014-03-26 Thread prem yadav
Hi,
in another thread, I had mentioned that we had an issue with Cassandra
getting killed by the kernel due to OOM. Downgrading to jdk6_37 seems to have
fixed it.

However, even now, every couple of hours the nodes show a spike in memory
usage. For example, on an 8 GB RAM machine, usage once reached 7.5 GB and
then slowly came back down to normal.

Cassandra version in use is 1.1.9.10.

Any idea why this could be happening? There is no load on the cluster.

Thanks.
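
One way to tell whether a spike like this is JVM heap or something off-heap is
to sample the collector while it happens (the jps/awk PID lookup mirrors the
one used later in this thread):

  # jstat ships with the JDK; -gcutil prints eden/survivor/old/perm usage in %
  # every 5 seconds. If these stay flat while RSS climbs, the growth is
  # off-heap (mmapped SSTables, native allocations), not the Java heap.
  jstat -gcutil `jps | awk '/CassandraDaemon/ {print $1}'` 5000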


Re: memory usage spikes

2014-03-26 Thread prem yadav
here:

ps -p `/usr/java/jdk1.6.0_37/bin/jps | awk '/Dse/ {print $1}'` uww

USER    PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
497  20450  0.9 31.0 4727620 2502644 ? SLl  06:55   3:28
/usr/java/jdk1.6.0_37//bin/java -ea
-javaagent:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1968M -Xmx1968M
-Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss190k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/dse.pid -cp
:/usr/share/dse/dse.jar:/usr/share/dse/common/commons-codec-1.6.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/guava-13.0.jar:/usr/share/dse/common/jbcrypt-0.3m.jar:/usr/share/dse/common/log4j-1.2.16.jar:/usr/share/dse/common/slf4j-api-1.6.1.jar:/usr/share/dse/common/slf4j-log4j12-1.6.1.jar:/etc/dse:/usr/share/java/jna.jar:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/antlr-2.7.7.jar:/usr/share/dse/cassandra/lib/antlr-3.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.2.jar:/usr/share/dse/cassandra/lib/avro-1.4.0-cassandra-1.jar:/usr/share/dse/cassandra/lib/cassandra-all-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-1.1.9.10.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.6.jar:/usr/share/dse/cassandra/lib/commons-lang-2.4.jar:/usr/share/dse/cassandra/lib/commons-logging-1.1.1.jar:/usr/share/dse/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/dse/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/dse/cassandra/lib/guava-13.0.jar:/usr/share/dse/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/dse/cassandra/lib/httpclient-4.0.1.jar:/usr/share/dse/cassandra/lib/httpcore-4.0.1.jar:/usr/share/dse/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar:/usr/share/dse/cassandra/lib/jline-0.9.94.jar:/usr/share/dse/cassandra/lib/joda-time-1.6.2.jar:/usr/share/dse/cassandra/lib/json-simple-1.1.jar:/usr/share/dse/cassandra/lib/libthrift-0.7.0.jar:/usr/share/dse/cassandra/lib/log4j-1.2.16.jar:/usr/share/dse/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/dse/cassandra/lib/servlet-api-2.5.jar:/usr/share/dse/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/dse/cassandra/lib/snakeyaml-1.6.jar:/usr/share/dse/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/dse/cassandra/lib/snaptree-0.1.jar:/usr/share/dse/cassandra/lib/stringtemplate-3.2.jar::/usr/share/dse/solr/lib/solr-4.0.2.4-SNAPSHOT-uber.jar:/usr/share/dse/solr/lib/solr-web-4.0.2.4-SNAPSHOT.jar:/usr/share/dse/solr/conf::/usr/share/dse/tomcat/lib/annotations-api-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-ha-6.0.32.jar:/usr/share/dse/tomcat/lib/coyote-6.0.32.jar:/usr/share/dse/tomcat/lib/el-api-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-el-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-jdt-6.0.29.jar:/usr/share/dse/tomcat/lib/jsp-api-6.0.29.jar:/usr/share/dse/tomcat/lib/juli-6.0.32.jar:/usr/share/dse/tomcat/lib/servlet-api-6.0.29.jar:/usr/share/dse/tomcat/lib/tribes-6.0.32.jar:/usr/share/dse/tomcat/conf::/usr/share/dse/hadoop:/etc/dse/hadoop:/usr/share/dse/hadoop/lib/ant-1.6.5.jar:/usr/share/dse/hadoop/lib/automaton-1.11-8.jar:/usr/share/dse/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/share/dse/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/share/dse/hadoop/lib/commons-cli-1.2.jar:/usr/share/dse/hadoop/lib/commons-codec-1.4.jar:/usr/share/dse/hadoop/lib/commons-collections-3.2.1.jar:/usr/share/dse/hadoop/lib/commons-configuration-1.6.jar:/usr/share/dse/hadoop/lib/commons-digester-1.8.jar:/usr/share/dse/hadoop/lib/commons-el-1.0.jar:/usr/share/dse/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/share/dse/hadoop/lib/commons-lang-2.4.jar:/u

It's the spike in RAM usage. Right now it is normal, but it keeps showing the spikes.


On Wed, Mar 26, 2014 at 5:31 PM, Marcin Cabaj wrote:

> Hi,
>
> RSS or VIRT?
>
> Could you paste output of:
> $ ps -p `jps | awk '/CassandraDaemon/ {print $1}'` uww
> please?
>
>
> On Wed, Mar 26, 2014 at 5:20 PM, prem yadav  wrote:
>
>> Hi,
>> in another thread, I has mentioned that we had issue with Cassandra
>> getting killed by kernel due to OOM. Downgrading to jdk6_37 seems to have
>> fixed it.
>>
>> However, even now, after every couple of hours, the nodes are showing a
>> spike in memory usage.
>>

Re: memory usage spikes

2014-03-26 Thread prem yadav
Thanks Don,
Yes, I have followed those steps, except for JNA; the version I am using is
3.2.4. The link you shared is for Cassandra 2.0, and I am using 1.1. Let me
install JNA 3.2.7 and see if that helps.

Thanks


On Wed, Mar 26, 2014 at 5:38 PM, Donald Smith <
donald.sm...@audiencescience.com> wrote:

>  Prem,
>
>
>
> Did you follow the instructions at
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k
>
>
>
> And did you install jna-3.2.7.jar into /usr/share/java, as per
> http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installJnaRHEL.html?
>
>
>
> Don
>
>
>
> *From:* prem yadav [mailto:ipremya...@gmail.com]
> *Sent:* Wednesday, March 26, 2014 10:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: memory usage spikes
>
>
>
> here:
>
>
>
> ps -p `/usr/java/jdk1.6.0_37/bin/jps | awk '/Dse/ {print $1}'` uww
>
>
>
> USER    PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
>
> 497  20450  0.9 31.0 4727620 2502644 ? SLl  06:55   3:28
> /usr/java/jdk1.6.0_37//bin/java -ea
> -javaagent:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1968M -Xmx1968M
> -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss190k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/dse.pid -cp
> :/usr/share/dse/dse.jar:/usr/share/dse/common/commons-codec-1.6.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/guava-13.0.jar:/usr/share/dse/common/jbcrypt-0.3m.jar:/usr/share/dse/common/log4j-1.2.16.jar:/usr/share/dse/common/slf4j-api-1.6.1.jar:/usr/share/dse/common/slf4j-log4j12-1.6.1.jar:/etc/dse:/usr/share/java/jna.jar:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/antlr-2.7.7.jar:/usr/share/dse/cassandra/lib/antlr-3.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.2.jar:/usr/share/dse/cassandra/lib/avro-1.4.0-cassandra-1.jar:/usr/share/dse/cassandra/lib/cassandra-all-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-1.1.9.10.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.6.jar:/usr/share/dse/cassandra/lib/commons-lang-2.4.jar:/usr/share/dse/cassandra/lib/commons-logging-1.1.1.jar:/usr/share/dse/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/dse/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/dse/cassandra/lib/guava-13.0.jar:/usr/share/dse/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/dse/cassandra/lib/httpclient-4.0.1.jar:/usr/share/dse/cassandra/lib/httpcore-4.0.1.jar:/usr/share/dse/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar:/usr/share/dse/cassandra/lib/jline-0.9.94.jar:/usr/share/dse/cassandra/lib/joda-time-1.6.2.jar:/usr/share/dse/cassandra/lib/json-simple-1.1.jar:/usr/share/dse/cassandra/lib/libthrift-0.7.0.jar:/usr/share/dse/cassandra/lib/log4j-1.2.16.jar:/usr/share/dse/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/dse/cassandra/lib/servlet-api-2.5.jar:/usr/share/dse/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/dse/cassandra/lib/snakeyaml-1.6.jar:/usr/share/dse/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/dse/cassandra/lib/snaptree-0.1.jar:/usr/share/dse/cassandra/lib/stringtemplate-3.2.jar::/usr/share/dse/solr/lib/solr-4.0.2.4-SNAPSHOT-uber.jar:/usr/share/dse/solr/lib/solr-web-4.0.2.4-SNAPSHOT.jar:/usr/share/dse/solr/conf::/usr/share/dse/tomcat/lib/annotations-api-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-ha-6.0.32.jar:/usr/share/dse/tomcat/lib/coyote-6.0.32.jar:/usr/share/dse/tomcat/lib/el-api-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-el-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-jdt-6.0.29.jar:/usr/share/dse/tomcat/lib/jsp-api-6.0.29.jar:/usr/share/dse/tomcat/lib/juli-6.0.32.jar:/usr/share/dse/tomcat/lib/servlet-api-6.0.29.jar:/usr/share/dse/tomcat/lib/tribes-6.0.32.jar:/usr/share/dse/tomcat/conf::/usr/share/dse/hadoop:/etc/dse/hadoop:/usr/share/dse/hadoop/lib/ant-1.6.5.jar:/usr/share/dse/hadoop/lib/automaton-1.11-8.jar:/usr/share/dse/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/share/dse/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/share/dse/hadoop/lib/commons-cli-

Re: Kernel keeps killing cassandra process - OOM

2014-03-27 Thread prem yadav
Hi all,
thanks for the help. The issue happened again. I looked into Romain's
suggestion and, yes, it was VMware ballooning.
So I will not be updating the Jira ticket, as I am not sure whether the fix
actually helped or not.

Thanks,
Prem
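
For anyone needing to confirm ballooning themselves: assuming VMware Tools is
installed in the guest, the balloon driver's current reclaim can be read
directly:

  # Reports the guest memory currently reclaimed by the balloon driver (in MB);
  # a non-zero value during the "spikes" points at the host, not Cassandra.
  vmware-toolbox-cmd stat balloon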


On Wed, Mar 26, 2014 at 6:54 PM, Robert Coli  wrote:

> On Wed, Mar 26, 2014 at 8:35 AM, prem yadav  wrote:
>
>> Thanks Robert. That seems to be the issue. however the fix mentioned
>> there doesn't work. I downgraded Java to jdk6_37 and that seems to have
>> done the trick. Thanks for pointing me to that Jira ticket.
>>
>
> If the workaround on that ticket doesn't work with some versions, I'm sure
> the community would appreciate it if you registered for the Apache JIRA and
> detailed your findings there. :)
>
> =Rob
>
>


Re: Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-27 Thread prem yadav
I have noticed that too. But even though DSE installs OpenJDK, it never
gets used, so you should be OK.


On Thu, Mar 27, 2014 at 8:29 PM, Jon Forrest  wrote:

> I'm using Oracle Java 7 on a CentOS 6.5 system.
> Running 'java -version' works correctly and shows
> I'm running Oracle Java. I don't want to use OpenJDK.
> This is good since I notice that the Datastax
> documentation in several places says to install
> Oracle Java, not OpenJDK.
>
> I want to install Datastax Cassandra 1.2.15 using Yum.
> I added the Datastax repository and then ran
>
> yum install dsc12-1.2.10-1 cassandra12-1.2.10-1
>
> Guess what! Yum tries to install java-1.7.0-openjdk !
> This is in spite of the fact that a) I already have Oracle
> Java installed, and b) the docs say not to install
> OpenJDK.
>
> It appears that the Datastax rpm file is violating its
> own documentation. What am I doing wrong? How to other
> people get around this problem? Is it worth somehow
> rebuilding the rpm so that it's not dependent on OpenJDK
> or is installing from a tarball the only way to resolve
> this?
>
> Thanks for any advice.
>
> Jon Forrest
>
>
>


Re: Using C* and CAS to coordinate workers

2014-04-04 Thread prem yadav
Though Cassandra can work, it looks to me like you could use a persistent
queue, for example RabbitMQ, to implement this. All your workers can
subscribe to the queue.
In fact, why not just MySQL?


On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen  wrote:

> Hi,
>
> maybe someone knows a nice solution to the following problem:
>
> I have N worker processes that are intentionally masterless and do not
> know about each other - they are stateless and independent instances of a
> given service system.
>
> These workers need to poll an event feed, say about every 10 seconds and
> persist a state after processing the polled events so the next worker knows
> where to continue processing events.
>
> I would like to use C*'s CAS feature to coordinate the workers and protect
> the shared state (a row or cell in  a C* key space, too).
>
> Has anybody done something similar and can suggest a 'clever' data model
> design and interaction?
>
>
>
> Jan


Re: Using C* and CAS to coordinate workers

2014-04-04 Thread prem yadav
Oh, OK. I thought you did not already have a Cassandra cluster. Sorry about
that.


On Fri, Apr 4, 2014 at 11:42 AM, Jan Algermissen  wrote:

>
> On 04 Apr 2014, at 11:18, prem yadav  wrote:
>
> Though cassandra can work but to me it looks like you could use a
> persistent queue for example (rabbitMQ) to implement this. All your workers
> can subscribe to a queue.
> In fact, why not just MySQL?
>
>
> Hey, I have got a C* cluster that can (potentially) do CAS.
>
> Why would I set up a MySQL cluster to solve that problem?
>
> And yeah, I could use a queue or redis or whatnot, but I want to avoid yet
> another moving part :-)
>
> Jan
>
>
>
>
> On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen <
> jan.algermis...@nordsc.com> wrote:
>
>> Hi,
>>
>> maybe someone knows a nice solution to the following problem:
>>
>> I have N worker processes that are intentionally masterless and do not
>> know about each other - they are stateless and independent instances of a
>> given service system.
>>
>> These workers need to poll an event feed, say about every 10 seconds and
>> persist a state after processing the polled events so the next worker knows
>> where to continue processing events.
>>
>> I would like to use C*'s CAS feature to coordinate the workers and
>> protect the shared state (a row or cell in  a C* key space, too).
>>
>> Has anybody done something similar and can suggest a 'clever' data model
>> design and interaction?
>>
>>
>>
>> Jan
>
>
>
>
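
For what it's worth, a minimal sketch of the CAS pattern being discussed,
using CQL lightweight transactions (the table and all names here are made up
purely for illustration):

  CREATE TABLE feed_checkpoint (
      feed_id  text PRIMARY KEY,
      owner    text,
      position bigint
  );

  -- Each worker attempts the conditional update; exactly one CAS wins per
  -- position, and the [applied] column in the result tells it whether it did.
  UPDATE feed_checkpoint
  SET owner = 'worker-42', position = 1001
  WHERE feed_id = 'events'
  IF position = 1000;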


Re: Cassandra Disk storage capacity

2014-04-07 Thread Prem Yadav
you can specify multiple data directories in cassandra.yaml, for example:

data_file_directories:
  - /var/lib/cass1
  - /var/lib/cass2


On Mon, Apr 7, 2014 at 12:10 PM, Jan Kesten  wrote:

>  Hi Hari,
>
> C* will use your entire space - that is something one should monitor.
> Depending on your choice of compaction strategy, your data_dir should not
> be filled up entirely - in the worst case, compaction needs as much space
> as the sstables already on disk, so 50% should be kept free.
>
> The parameters used for on-disk storage are commitlog_directory,
> data_file_directories and saved_caches_directory. The parameter
> data_file_directories is plural: you can easily put more than one
> directory here (and you should do this instead of using RAID).
>
> Cheers,
> Jan
>
> On 07.04.2014 12:56, Hari Rajendhran wrote:
>
> Hi Team,
>
>  We have a 3-node Apache Cassandra 2.0.4 setup installed in our lab. We
> have set the data directory to /var/lib/cassandra/data. What would be the
> maximum disk storage that will be used for Cassandra data storage?
>
>  Note: the /var partition has a storage capacity of 40 GB.
>
>  My question is whether Cassandra will use the entire / directory for data
> storage. If not, how do I specify multiple directories for data storage?
>
>
>
>
>
> Best Regards
> Hari Krishnan Rajendhran
> Hadoop Admin
> DESS-ABIM ,Chennai BIGDATA Galaxy
> Tata Consultancy Services
> Cell:- 9677985515
> Mailto: hari.rajendh...@tcs.com
> Website: http://www.tcs.com
> 
> Experience certainty. IT Services
> Business Solutions
> Consulting
> 
>
>
>
>
> --
> Jan Kesten, mailto:j.kes...@enercast.de 
> Tel.: +49 561/4739664-0 FAX: -9
> enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   
> HRB15471http://www.enercast.de Online-Prognosen für erneuerbare Energien
> Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz
>
>
>


regarding schema and suitability of cassandra

2014-04-11 Thread Prem Yadav
Hi,
I am new to Cassandra, and since I am not yet familiar with its
implementation and architecture, I struggle with how best to design the
schema.

We have an application where we need to store huge amounts of data. It is
per-user storage where we store a lot of data for each user and do a lot of
random reads using the user id.
Initially there will be a lot of writes, and once things have stabilized, the
reads will increase.

We are expecting to randomly read about 15 GB of data every day. The reads
will be per user id.

Could you please suggest an implementation and the things I need to consider
if I go with Cassandra?
Also, I have heard that Cassandra doesn't perform well with high read ops.
How true is that? How many read connections can each machine handle, and how
do I measure that in Cassandra?


Thanks


Re: regarding schema and suitability of cassandra

2014-04-11 Thread Prem Yadav
Thanks.
For this use case, what should I be thinking about schema-wise?

Thanks,
Prem


On Fri, Apr 11, 2014 at 2:16 PM, Sergey Murylev wrote:

>  Hi Prem,
>
>
> Also, I have heard that Cassandra doesn't perform will with high read ops.
> How true is that?
>
> I think that it isn't true. Cassandra has very good read performance. For
> more details you can look at this benchmark:
> http://planetcassandra.org/nosql-performance-benchmarks/#EndPoint
>
> How many read connections per machine can handle and how do I measure that
> in cassandra/
>
>  Cassandra uses one thread-per-client for remote procedure calls. For a
> large number of client connections, this can cause excessive memory usage
> for the thread stack. Connection pooling on the client side is highly
> recommended.
>
> --
> Thanks,
> Sergey
>
>
> On 11/04/14 13:03, Prem Yadav wrote:
>
> Hi,
> I am now to cassandra and even though I am not familiar to the
> implementation and architecture of cassandra, Is struggle with how to best
> design the schema.
>
>  We have an application where we need to store huge amounts of data. Its
> a per user storage where we store a lot of data for each user and do a lot
> of random reads using userid.
> Initially, there will be a lot of writes and once it has stabilized, the
> reads will increase.
>
>  We are expecting to randomly read about 15 GB of data everyday. The
> reads will be per user id.
>
>  Could you please suggest an implementation and things I need to consider
> if I have to go with Cassandra.
> Also, I have heard that Cassandra doesn't perform will with high read ops.
> How true is that? How many read connections per machine can handle and how
> do I measure that in cassandra/
>
>
>  Thanks
>
>
>
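
Since the schema question came up twice above, a minimal sketch of the usual
per-user, time-ordered layout (all names are illustrative only):

  CREATE TABLE user_events (
      user_id    text,
      event_time timeuuid,
      payload    text,
      PRIMARY KEY (user_id, event_time)
  ) WITH CLUSTERING ORDER BY (event_time DESC);

  -- A random read by user id is then a single-partition lookup:
  SELECT * FROM user_events WHERE user_id = 'u123' LIMIT 100;

All data for one user lives in one partition, which keeps reads by user id
cheap; the thing to watch is partition size if some users accumulate far more
data than others.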


Re: java.lang.OutOfMemoryError: Java heap space

2014-04-28 Thread Prem Yadav
Are these virtual machines? The last time I had this issue, it was because of
VMware "ballooning".
If not, what versions of Cassandra and Java are you using?


On Mon, Apr 28, 2014 at 6:30 PM, Gary Zhao  wrote:

> BTW, the CPU usage on this node is pretty high, but data size is pretty
> small.
>
>   PID USERNAME  THR PRI NICE  SIZE   RES   SHR STATE   TIMECPU COMMAND
> 28674 cassandr   89  250 9451M 8970M  525M sleep  32.1H   329% java
>
> UN  8.92 GB256 35.4%  c2d9d02e-bdb3-47cb-af1b-eabc2eeb503b  rack1
> UN  5.32 GB256 33.1%  874a269f-96bc-4db6-9d15-7cac6771a2ea  rack1
> DN  339.94 MB  256 31.5%  fba5b071-aee1-451f-ae42-49c3bbe70bbe  rack1
>
>
> On Mon, Apr 28, 2014 at 10:25 AM, Gary Zhao  wrote:
>
>> Hello
>>
>> I have a three-node cluster. I noticed one node was always down.
>> Restarting Cassandra fixes it, but it goes down again after a couple of
>> days. I'm pretty new to Cassandra, so I'm wondering how I should
>> troubleshoot it. Logs are below.
>>
>>  INFO [StorageServiceShutdownHook] 2014-04-28 13:21:05,091
>> ThriftServer.java (line 141) Stop listening to thrift clients
>> ERROR [GossipStage:4] 2014-04-28 13:11:59,877 CassandraDaemon.java (line
>> 196) Exception in thread Thread[GossipStage:4,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [ACCEPT-/10.20.132.44] 2014-04-28 13:06:10,261
>> CassandraDaemon.java (line 196) Exception in thread Thread[ACCEPT-/
>> 10.20.132.44,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [GossipStage:3] 2014-04-28 12:54:02,116 CassandraDaemon.java (line
>> 196) Exception in thread Thread[GossipStage:3,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [GossipStage:2] 2014-04-28 12:52:27,644 CassandraDaemon.java (line
>> 196) Exception in thread Thread[GossipStage:2,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [Thread-222] 2014-04-28 12:50:18,689 CassandraDaemon.java (line
>> 196) Exception in thread Thread[Thread-222,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [GossipTasks:1] 2014-04-28 12:47:12,879 CassandraDaemon.java (line
>> 196) Exception in thread Thread[GossipTasks:1,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> ERROR [GossipTasks:1] 2014-04-28 13:24:59,113 CassandraDaemon.java (line
>> 196) Exception in thread Thread[GossipTasks:1,5,main]
>> java.lang.IllegalThreadStateException
>> at java.lang.Thread.start(Thread.java:704)
>>  at
>> org.apache.cassandra.service.CassandraDaemon$2.uncaughtException(CassandraDaemon.java:202)
>> at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.handleOrLog(DebuggableThreadPoolExecutor.java:220)
>>  at
>> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>  at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>>  INFO [StorageServiceShutdownHook] 2014-04-28 13:25:35,105 Server.java
>> (line 181) Stop listening for CQL clients
>>  INFO [StorageServiceShutdownHook] 2014-04-28 13:25:35,105 Gossiper.java
>> (line 1251) Announcing shutdown
>>  INFO [StorageServiceShutdownHook] 2014-04-28 13:26:12,524
>> MessagingService.java (line 667) Waiting for messaging service to quiesce
>>
>> Thanks
>> Gary
>>
>
>


Re: autoscaling cassandra cluster

2014-05-21 Thread Prem Yadav
Hi Jabbar,
with vnodes, scaling up should not be a problem. You could just add a
machine with the right cluster/seed/datacenter config and it should join the
cluster.
Scaling down has to be manual: you drain the node and decommission it (a
rough sketch follows below this thread).

thanks,
Prem



On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam  wrote:

> Hello,
>
> Has anybody got a cassandra cluster which autoscales depending on load or
> times of the day?
>
> I've seen the documentation on the datastax website and that only
> mentioned adding and removing nodes, unless I've missed something.
>
> I want to know how to do this for the google compute engine. This isn't
> for a production system but a test system(multiple nodes) where I want to
> learn. I'm not sure how to check the performance of the cluster, whether I
> use one performance metric or a mix of performance metrics and then invoke
> a script to add or remove nodes from the cluster.
>
> I'd be interested to know whether people out there are autoscaling
> cassandra on demand.
>
> Thanks
>
> Jabbar Azam
>
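
A rough sketch of that manual scale-down, run on the node that is leaving:

  # Streams this node's token ranges to the remaining replicas and removes it
  # from the ring; run on the leaving node itself.
  nodetool decommission

  # From any other node, verify it has left:
  nodetool status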


Re: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Prem Yadav
I would think its because of the index and filter files. Also the
additional data which gets added because of serialization. Also, since
SStables are only deleted after the compaction us finished, it might be
possible that when you checked, the intermediate SSTables were not yet
deleted.

However, 50% additional disk usage does sound bad.


On Wed, May 21, 2014 at 4:42 PM, Phil Luckhurst <
phil.luckhu...@powerassure.com> wrote:

> I'm wondering if the lack of response to this means it was a dumb
> question; however, I've searched the documentation again and I still can't
> find an answer :-(
>
> Phil
>
>
>
>
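
To see how much of the on-disk footprint is live versus still waiting to be
reclaimed after compaction, something like this helps (the keyspace and column
family names are illustrative; older nodetool versions may only support
running it without arguments):

  # Shows SSTable count plus live vs. total space used per column family.
  nodetool cfstats mykeyspace.mycf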


Question about replacing a dead node

2014-06-03 Thread Prem Yadav
Hi,

In the last week, we have seen at least two emails about dead node
replacement. Though I have seen the documentation about how to do this, I am
not sure I understand why it is required.

Assuming the replication factor is >2, if a node dies, why does it matter?
If a new node is added, shouldn't it just take over, from the other existing
nodes, the chunk of data it serves as the "primary" node?
Why do we need to worry about replacing the dead node?

Thanks
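
For reference, the point of explicit replacement is that the dead node still
owns its token ranges, so until it is replaced (or removed) those ranges run
with one replica fewer. The usual procedure boils down to starting the fresh
node with a replace flag (the IP here is illustrative):

  # In cassandra-env.sh on the replacement node, before its first start:
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"

  # Once it has finished bootstrapping the dead node's ranges, remove the
  # flag and restart normally.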


Cassandra use cases/Strengths/Weakness

2014-07-04 Thread Prem Yadav
Hi,
I have seen in a lot of replies that Cassandra is not designed for this or
that. I don't want to sound rude; I just need some info about this so that I
can compare it to technologies like HBase, Mongo, Elasticsearch, Solr, etc.

1) What is Cassandra designed for? Heavy writes, yes. But so is HBase, or
Elasticsearch. What are the use cases that suit Cassandra?

2) What kind of queries are best suited for Cassandra?
I ask this because I have seen people asking about queries and getting
replies that they are not suited to Cassandra. For example: queries where a
large number of rows are requested and a timeout happens, or range queries,
or aggregate queries.

3) Where does Cassandra excel compared to other technologies?

I have been working with Cassandra for some time. I know how it works and I
like it very much. We are moving towards building a big cluster, but at this
point I am not sure if it is the right decision.

A lot of people, including me, like Cassandra in my company. But that has
more to do with CQL than with the internals or the use cases. Until now
there have been small PoCs and people enjoyed them. But for a large-scale
project, we are not so sure.

Please guide us.
Please note that the drawbacks of other technologies do not interest me;
it is the strengths and weaknesses of Cassandra I am interested in.
Thanks


Re: Cassandra use cases/Strengths/Weakness

2014-07-04 Thread Prem Yadav
Thanks Manoj. Great post for those who already have Cassandra in production.
However, it brings me back to my original post.
All the points you have mentioned apply to any big data technology:
Storage - all of them.
Query - all of them; in fact, a lot of them perform better. I agree that the
CQL structure is better, but Hive and Mongo are all good too.
Availability - many of them.

So my question is basically for the Cassandra support people (e.g. Datastax)
or the developers:
what makes Cassandra special?
If I have to convince my CTO to spend a million dollars on a cluster and
support, his first question will be: why Cassandra? Why not this or that?

So I am still not sure what special thing Cassandra brings to the table.

Sorry about the rant. But in the enterprise world, decisions are made taking
stability into account, convincing managers, and what not. The chosen
technology has to be stable for years, and people should be convinced that
the engineers are not going to do a lot of firefighting.

Any inputs appreciated.



On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar 
wrote:

> These are my personal opinions based on few months using Cassandra. These
> are my views. Others
> may have different opinion
>
>
>
> http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html
>
> regards
>
>
>
> On Fri, Jul 4, 2014 at 7:37 AM, Prem Yadav  wrote:
>
>> Hi,
>> I have seen this in a lot of replies that Cassandra is not designed for
>> this and that. I don't want to sound rude, i just need some info about this
>> so that i can compare it to technologies like hbase, mongo, elasticsearch, 
>> solr,
>> etc.
>>
>> 1) what is Cassandra designed for. Heave writes yes. So is Hbase. Or
>> ElasticSearch
>> What is the use case(s) that suit Cassandra.
>>
>> 2) What kind of queries are best suited for Cassandra.
>> I ask this Because I have seen people asking about queries and getting
>> replies that its not suited for Cassandra. For ex: queries where large
>> number of rows are requested and timeout happens. Or range queries or
>> aggregate queries.
>>
>> 3) Where does Cassandra excel compared to other technologies?
>>
>> I have been working on Casandra for some time. I know how it works and I
>> like it very much.
>> We are moving towards building a big cluster. But at this point, I am not
>> sure if its a right decision.
>>
>> A lot of people including me like Cassandra in my company. But it has
>> more to do with the CQL and not the internals or the use cases. Until now,
>> there have been small PoCs and people enjoyed it. But a large scale
>> project, we are not so sure.
>>
>> Please guide us.
>> Please note that the drawbacks of other technologies do not interest me,
>> its the strengths/weaknesses of Cassandra I am interested in.
>> Thanks
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> http://khangaonkar.blogspot.com/
>


Re: Cassandra use cases/Strengths/Weakness

2014-07-04 Thread Prem Yadav
Jens,
thanks for the response, but your reply doesn't serve my purpose. I asked
about the use cases suitable for Cassandra; it is a basic question about
what purpose this technology serves. My use case or requirements do not
matter in that regard, and "fits our requirements" is not a valid reason
anymore. Until Hadoop came along, an RDBMS fit all requirements just fine.
It's about choosing a superior technology, which translates into minimal
overhead and more profit for the company.


*James*, excellent points and very helpful. I have supported multiple
systems as well - Hadoop, HBase, Elasticsearch, Solr - and we also have a
very efficient, fast, horizontally scalable Scala-based internal system.
Apart from the performance and operational excellence that you mentioned,
is there a use case at which Cassandra excels by design?
Take OpenTSDB, for example. They chose HBase because HBase serves that use
case by design: great scan performance.

Any use case like that where Cassandra is the obvious choice?

Thanks



On Fri, Jul 4, 2014 at 8:58 PM, James Horey  wrote:

> I’ve supported a variety of different “big data” systems and most have
> their own particular set of use cases that make sense. Having said that, I
> believe that Cassandra uniquely excels at the following:
>
> * Low write latency with respect to small to medium write sizes (logs,
> sensor data, etc.)
> * Linear write scalability
> * Fault-tolerance across geographic locations
>
> The first two points makes it an excellent candidate for high-throughput
> “transactional” systems. Other systems that play in this space tend to be
> HBase and Riak (there may be others, but I’m most familiar with those two).
> However, the last point is pretty unique to Cassandra.
>
> So if you’re looking for a high-scale out, high-throughput transactional
> system then Cassandra may make sense for you. If you’re looking for
> something more geared towards analytics (so few bulk writes, many reads),
> then something in the Hadoop space may make sense.
>
> Cheers
> James
>
> On Jul 4, 2014, at 3:31 PM, Prem Yadav  wrote:
>
> Thanks Manoj. Great post for those who already have Cassandra in
> production.
> However it brings me back to my original post.
> All the points you have mentioned apply to any big data technology.
> Storage- All of them
> Query- All of them. In fact lot of them perform better. Agree that CQL
> structure is better. But hive,mongo all good
> Availability- many of them
>
> So my question is basically to Cassandra support people e.g.- Datastax Or
> the developers.
> What makes Cassandra special.
> If I have to convince my CTO to spend million dollars on a cluster and
> support, his first question would be why Cassandra? Why not this or that?
>
> So I still am not sure about what special Cassandra brings to the table?
>
> Sorry about the rant. But in the enterprise world, decisions are taken
> based on taking into account the stability, convincing managers and what
> not. Chosen technology has to be stable for years. People should be
> convinced that the engineers are not going to do a lot of firefighting.
>
> Any inputs appreciated.
>
>
>
> On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar 
> wrote:
>
>> These are my personal opinions based on few months using Cassandra. These
>> are my views. Others
>> may have different opinion
>>
>>
>>
>> http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html
>>
>> regards
>>
>>
>>
>> On Fri, Jul 4, 2014 at 7:37 AM, Prem Yadav  wrote:
>>
>>> Hi,
>>> I have seen this in a lot of replies that Cassandra is not designed for
>>> this and that. I don't want to sound rude, i just need some info about this
>>> so that i can compare it to technologies like hbase, mongo, elasticsearch, 
>>> solr,
>>> etc.
>>>
>>> 1) what is Cassandra designed for. Heave writes yes. So is Hbase. Or
>>> ElasticSearch
>>> What is the use case(s) that suit Cassandra.
>>>
>>> 2) What kind of queries are best suited for Cassandra.
>>> I ask this Because I have seen people asking about queries and getting
>>> replies that its not suited for Cassandra. For ex: queries where large
>>> number of rows are requested and timeout happens. Or range queries or
>>> aggregate queries.
>>>
>>> 3) Where does Cassandra excel compared to other technologies?
>>>
>>> I have been working on Casandra for some time. I know how it works and I
>>> like it very much.
>>> We are moving towards building a big cluster. But at this point, I am
>>> not 

Re: Cassandra use cases/Strengths/Weakness

2014-07-04 Thread Prem Yadav
Duy,
if you are not already working for Datastax, they should hire you. :)

Great response. You have given me some good points to think about.  I will
do the rest of the research.

Thanks.





On Fri, Jul 4, 2014 at 10:10 PM, DuyHai Doan  wrote:

> I would answer your question this way:
>
> 1) Why should I choose C* ?
>
>  a. linear scalability: throughput scales "almost" linearly with the
> number of nodes
>
>  b. almost unbounded extensibility (there is no limit, or at least a huge
> limit, in terms of the number of nodes you can have in a cluster)
>
>  c. operational simplicity due to the master-less architecture. This
> feature, although quite transparent for developers, is a key selling
> point. Having suffered through manually installing a Hadoop cluster, I
> happen to love the deployment simplicity of C*: only one process per node,
> no moving parts.
>
> d. high availability. C* clearly trades consistency for availability, so
> you can expect something like 99.99% uptime. A strong selling point for
> critical businesses which need to be up all the time
>
> e. support for multiple data centers out of the box. Again, on the
> operational side, it's a great feature if you plan a worldwide deployment
>
> That's all I can see for now
>
> 2) Why shouldn't I choose C* ?
>
> a. a need for strong consistency most of the time. Although you can
> perform all requests with consistency level ALL, that's clearly not the
> best use of C*: you'll suffer from higher latency and reduced
> availability. Even the new "lightweight transaction" feature is not meant
> to be used at large scale
>
> b. very complicated and changing queries. Denormalizing is great when you
> know ahead of time exactly how you'll query your data. Once that's done,
> any new way of querying will require new code and new tables to support it
>
> c. a ridiculously small data load. I've seen people in prod using C* for
> only 200 GB because they want to be trendy and use bleeding-edge
> technologies. They'd be better off using a classical RDBMS solution that
> perfectly fits their load
>
> Hope that helps
>
> Duy Hai DOAN
>
>
>
> On Fri, Jul 4, 2014 at 9:31 PM, Prem Yadav  wrote:
>
>> Thanks Manoj. Great post for those who already have Cassandra in
>> production.
>> However it brings me back to my original post.
>> All the points you have mentioned apply to any big data technology.
>> Storage- All of them
>> Query- All of them. In fact lot of them perform better. Agree that CQL
>> structure is better. But hive,mongo all good
>> Availability- many of them
>>
>> So my question is basically to Cassandra support people e.g.- Datastax Or
>> the developers.
>> What makes Cassandra special.
>> If I have to convince my CTO to spend million dollars on a cluster and
>> support, his first question would be why Cassandra? Why not this or that?
>>
>> So I still am not sure about what special Cassandra brings to the table?
>>
>> Sorry about the rant. But in the enterprise world, decisions are taken
>> based on taking into account the stability, convincing managers and what
>> not. Chosen technology has to be stable for years. People should be
>> convinced that the engineers are not going to do a lot of firefighting.
>>
>> Any inputs appreciated.
>>
>>
>>
>> On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar 
>> wrote:
>>
>>> These are my personal opinions based on few months using Cassandra.
>>> These are my views. Others
>>>  may have different opinion
>>>
>>>
>>>
>>> http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html
>>>
>>> regards
>>>
>>>
>>>
>>> On Fri, Jul 4, 2014 at 7:37 AM, Prem Yadav  wrote:
>>>
>>>> Hi,
>>>> I have seen this in a lot of replies that Cassandra is not designed for
>>>> this and that. I don't want to sound rude, i just need some info about this
>>>> so that i can compare it to technologies like hbase, mongo, elasticsearch, 
>>>> solr,
>>>> etc.
>>>>
>>>> 1) what is Cassandra designed for. Heave writes yes. So is Hbase. Or
>>>> ElasticSearch
>>>> What is the use case(s) that suit Cassandra.
>>>>
>>>> 2) What kind of queries are best suited for Cassandra.
>>>> I ask this Because I have seen people asking about queries and getting
>>>> replies that its not suited for Cassandra. For ex: queries where large
>>>> number of rows are requested and timeout happens. Or range qu

Re: UnavailableException

2014-07-11 Thread Prem Yadav
Please post the full exception.


On Fri, Jul 11, 2014 at 1:50 PM, Ruchir Jha  wrote:

> We have a 12-node cluster and we are consistently seeing this exception
> being thrown during peak write traffic. We have a replication factor of 3
> and a write consistency level of QUORUM. Also note there is no unusual or
> full GC activity during this time. Appreciate any help.
>
> Sent from my iPhone
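
For context on why this happens: at RF=3 and QUORUM, the coordinator throws
UnavailableException when it believes fewer than 2 replicas for the written
partition are alive, and rejects the write up front rather than timing out.
A quick first check:

  # Look for DN (down) nodes; newer versions also accept a keyspace name to
  # show effective ownership per node.
  nodetool status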


spark on cassandra

2014-08-08 Thread Prem Yadav
Hi,
are there any cluster-specific prerequisites for running Spark on Cassandra?

I created two DCs, DC1 and DC2. DC1 had two Cassandra nodes with vnodes.
I created two nodes in DC2 with Murmur3 partitioning and set num_tokens: 1,
enabled Hadoop and Spark, and started DSE.

I can verify that Hadoop started because the JobTracker page shows up, but
I don't know how to verify Spark.
I also see a lot of this in the logs:

Activating plugin: com.datastax.bdp.plugin.ExternalProcessAuthPlugin
 INFO [main] 2014-08-08 17:31:14,492 PluginManager.java (line 232) No
enough available nodes to start plugin
com.datastax.bdp.plugin.ExternalProcessAuthPlugin. Trying once again...


When I run the command 'dse shark', it just hangs.

Am I doing something wrong?

Thanks
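
For anyone hitting the same question, a rough way to sanity-check Spark on
DSE nodes (the commands and ports here are from the DSE 4.5 era and may
differ by version):

  # The Spark Master web UI should list the DC2 workers; in DSE it usually
  # listens on port 7080 (open-source Spark defaults to 8080):
  #   http://<analytics-node>:7080

  # The REPL is the quickest end-to-end test:
  dse spark
  # then, at the scala> prompt:
  #   sc.parallelize(1 to 1000).count()   // should return 1000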