write timeouts when saving a big object

2015-04-22 Thread Alexander Shutyaev
Hi all! We have a problem with cassandra. We're getting write timeouts when saving a big object. The object size is approx. 30 Mb which is big but not enormous as (if I'm not wrong) cassandra can handle up to 2Gb in theory. We tried increasing write timeout but that didn't help - 10 minutes that w

Re: Is 2.1.5 ready for upgrade?

2015-04-22 Thread Nathan Bijnens
We had some serious issues with 2.1.3: - Bootstrapping a new node resulted in OOM - Repair resulted in an OOM on several nodes - Reading some parts of the data caused cascading crashes on all its replica nodes. Downgrading to the 2.0.X branch didn't work because of some incompatibilities,

Reading hundreds of thousands of rows at once?

2015-04-22 Thread John Anderson
Hey, I'm looking at querying around 500,000 rows that I need to pull into a Pandas data frame for processing. Currently testing this on a single cassandra node it takes around 21 seconds: https://gist.github.com/sontek/4ca95f5c5aa539663eaf I tried introducing multiprocessing so I could use 4 pro
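The gist in the thread isn't reproduced here, so the sketch below is hedged: it assumes the DataStax Python driver (cassandra-driver) for the fetch, all table and column names are invented, and only the driver-independent step is executed — transposing row tuples into the column-oriented dict that pandas.DataFrame accepts in one shot:

```python
# Hedged sketch -- names are hypothetical. A paged fetch with the DataStax
# Python driver would look roughly like:
#
#   from cassandra.cluster import Cluster
#   from cassandra.query import SimpleStatement
#   session = Cluster(['127.0.0.1']).connect('my_keyspace')
#   stmt = SimpleStatement("SELECT id, value FROM my_table", fetch_size=5000)
#   rows = list(session.execute(stmt))   # the driver pages transparently
#
# Driver-independent part: transpose row tuples into columns, so pandas
# builds the frame column-wise instead of appending row by row.

def rows_to_columns(rows, names):
    """Transpose row tuples into {column_name: [values]}."""
    cols = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            cols[name].append(value)
    return cols

# Stand-in rows, since no cluster is available in this sketch:
fake_rows = [(1, 'a'), (2, 'b'), (3, 'c')]
columns = rows_to_columns(fake_rows, ['id', 'value'])
# pd.DataFrame(columns) would then build the data frame.
```

For 500,000 rows the usual next levers are splitting the token range across workers or, as suggested elsewhere in this digest, adding nodes.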

Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Anishek Agarwal
I think these will help speed up: - removing compression - you have a lot of independent columns mentioned. If you are always going to query all of them together, one other thing that will help is to have a full JSON (or some custom object representation) of the value data and change the model to just have s

Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Anishek Agarwal
also might want to go through a thread here with the subject "High latencies for simple queries" On Wed, Apr 22, 2015 at 1:55 PM, Anishek Agarwal wrote: > I think these will help speed up > > - removing compression > - you have lot of independent columns mentioned. If you are always going > to qu

OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Mich Talebzadeh
Hi, I have a table of 300,000 rows. When I try to do a simple cqlsh:ase> select count(1) from t; OperationTimedOut: errors={}, last_host=127.0.0.1 Appreciate any feedback Thanks, Mich NOTE: The information in this email is proprietary and confidential. This message i

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Tommy Stendahl
Hi, Checkout CASSANDRA-8899, my guess is that you have to increase the timeout in cqlsh. /Tommy On 2015-04-22 11:15, Mich Talebzadeh wrote: Hi, I have a table of 300,000 rows. When I try to do a simple cqlsh:ase> select count(1) from t; OperationTimedOut: errors={}, last_host=127.0.0.1
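Tommy's suggestion (raising cqlsh's own client-side timeout, per CASSANDRA-8899) is done in cqlshrc; the exact option name varies by cqlsh version, so treat this as a sketch to check against your version's docs:

```ini
; ~/.cassandra/cqlshrc -- raise cqlsh's client-side timeout (in seconds).
; Older cqlsh versions read client_timeout; newer ones read request_timeout.
; Keep whichever option your cqlsh accepts.
[connection]
client_timeout = 3600
request_timeout = 3600
```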

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Umang Shah
Hi, It is a common problem: if your machine has 4 GB of RAM then you can only retrieve records about 20 so you have to increase the RAM of your system to avoid this problem. Thanks, Umang Shah On Wed, Apr 22, 2015 at 9:34 AM, Mich Talebzadeh wrote: > Hi, > > > > I have a table of 300,000 ro

RE: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Mich Talebzadeh
Thanks Umang. I have 9GB of memory free here out of 24GB

cassandra@rhes564::/apps/cassandra> free
             total       used       free     shared    buffers     cached
Mem:      24675328   14774532    9900796          0     539900    9097992
-/+ buffers/cache:    5136640   19538688

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Umang Shah
In that case you have to increase the readtimeout as some suggested. On Wed, Apr 22, 2015 at 10:06 AM, Mich Talebzadeh wrote: > Thanks Umang. > > > > I have 9GB of memory free here out of 24GB > > > > cassandra@rhes564::/apps/cassandra> free > > total used free share

Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
Hi all, Currently we are setting up a “big” data cluster; we are only going to have a couple of servers to start with, but we need to be able to scale out quickly when usage ramps up. Previously we have used Hadoop/HBase for our big data cluster, but since we are starting this one on only two

Re: Adhoc querying in Cassandra?

2015-04-22 Thread Ali Akhtar
You might find it better to use elasticsearch for your aggregate queries and analytics. Cassandra is more of just a data store. On Apr 22, 2015 4:42 PM, "Matthew Johnson" wrote: > Hi all, > > > > Currently we are setting up a “big” data cluster, but we are only going to > have a couple of servers

Re: Adhoc querying in Cassandra?

2015-04-22 Thread Brian O'Neill
+1, I think many organizations (including ours) pair Elastic Search with Cassandra. Use Cassandra as your system of record, then index the data with ES. -brian --- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile • @boneill42

RE: Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
Hi Ali, Brian, Thanks for the suggestion – we have previously used Solr (SolrCloud for distribution) for a lot of other products, presumably this will do the same job as ElasticSearch? Or does ElasticSearch have specifically better integration with Cassandra or better support for aggregate queri

Re: Adhoc querying in Cassandra?

2015-04-22 Thread Ali Akhtar
I believe ElasticSearch has better support for scaling horizontally (by adding nodes) than Solr does. Some benchmarks that I've looked at, also show it as performing better under high load. I probably wouldn't run them both on the same node, or you might see low performance as they compete for res

Re: Adhoc querying in Cassandra?

2015-04-22 Thread Brian O'Neill
Again, agreed. They have different usage patterns (C* heavy writes, ES heavy reads), so I would separate them. SOLR should be sufficient. I believe DSE is a tight integration between SOLR and C*. -brian --- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588

Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs

2015-04-22 Thread Phil Yang
A GC is logged only if it pauses for more than 200 ms. You can use jstat to see whether each young-gen GC takes that long; if so, you may need to reduce the size of the young gen in conf/cassandra-env.sh to shorten the pause time. Of course, that will make GC trigger more frequently

Re: Is 2.1.5 ready for upgrade?

2015-04-22 Thread Phil Yang
I think it is an acceptable idea to build the latest code in cassandra-2.1 branch rather than waiting for official release because the older versions for 2.1.x indeed have some serious issues. At least I did this in our cluster and our troubles in 2.1.1 had been fixed. 2015-04-22 15:22 GMT+08:00 N

Re: LCS Strategy, compaction pending tasks keep increasing

2015-04-22 Thread Brice Dutheil
Reads are mostly limited by IO, so I’d set concurrent_reads to something related to your disks; we have set it to 64 (but we have SSDs). Writes are mostly limited by CPU, so the number of cores matters; we set concurrent_writes to 48 and 128 (depending on the CPU on the nodes). Careful with LCS, it is not
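As a rough illustration of the numbers above, the corresponding cassandra.yaml settings would look like this (the values are examples from the thread, not recommendations; the stock guidance in the default yaml comments is 16 × number_of_drives for reads and 8 × number_of_cores for writes):

```yaml
# cassandra.yaml fragment -- illustrative values only, tune to your hardware
concurrent_reads: 64    # IO-bound; higher values are viable on SSDs
concurrent_writes: 128  # CPU-bound; scale with core count
```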

Re: Handle Write Heavy Loads in Cassandra 2.0.3

2015-04-22 Thread Anuj Wadehra
Thanks Brice for all the comments. We analyzed GC logs and a heap dump before tuning the JVM and GC. With the new JVM config I specified, we were able to remove the promotion failures seen with the default config. From the heap dump I got an idea that memtables and compaction are the biggest culprits. CASSANDRA-6142

Re: Handle Write Heavy Loads in Cassandra 2.0.3

2015-04-22 Thread Anuj Wadehra
Any other suggestions on the JVM tuning and Cassandra config we did to solve the promotion failures during GC? I would appreciate it if someone could answer the queries mentioned in my initial mail. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Wed, 22 Apr, 2

Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Robert Wille
Add more nodes to your cluster On Apr 22, 2015, at 1:39 AM, John Anderson mailto:son...@gmail.com>> wrote: Hey, I'm looking at querying around 500,000 rows that I need to pull into a Pandas data frame for processing. Currently testing this on a single cassandra node it takes around 21 seconds

cassandra and spark from cloudera distribution

2015-04-22 Thread Serega Sheypak
Hi, are Cassandra and Spark from Cloudera compatible? Where can I find these compatibility notes?

RE: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Mich Talebzadeh
Thanks Robert, In an RDBMS, select count(1) basically returns the row count. 1> select count(1) from t 2> go --- 30 (1 row affected) Is count(1) fundamentally different in Cassandra? Does count(1) mean return (in my case) 1 three hundred thousand times? Cheers

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
Keep in mind that "select count(l)" and "select l" amount to essentially the same thing. On Apr 22, 2015, at 3:41 AM, Tommy Stendahl mailto:tommy.stend...@ericsson.com>> wrote: Hi, Checkout CASSANDRA-8899, my guess is that you have to increase the timeout in cqlsh. /Tommy On 2015-04-22 11:1

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
I should have been more clear. What I meant was that it’s about the same amount of work for the cluster to do a “select count(l)” as a “select l” (unlike in the RDBMS world, where count(l) can use the primary key index). The reason is that the coordinator has to retrieve all the rows f

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
And I should have read the post more clearly. I thought it was count(l), not count(1). But, either way, you’re counting the number of records in the table, which in the RDBMS world means scanning an index, and in Cassandra means the coordinator has to select all the records from all the nodes.

RE: Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
Our requirements are somewhat in flux at the moment, but initially it will be mostly writes with periodic read spikes (probably overnight etc) for various analytics. Going forwards however, as our application usage scales up, we may end up using it as a read/write replacement for MySQL in some case

RE: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Mich Talebzadeh
Thanks Robert for explanation. Please correct me if I am wrong. Currently running a single node cluster of Cassandra. There is the primary key on object_id column in both RDBMS and Cassandra. As you correctly pointed out RDBMS does not need to touch the base table. It can just go throug

RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-22 Thread Walsh, Stephen
Hey Anuj, I think this might be related to me quickly dropping the tables and re-creating them to set gc_grace_seconds to 0, instead of doing an ALTER TABLE command. This might have caused the FileNotFound issue. I might just drop the keyspace, do a nodetool cleanup on each node, then re-

Re: OperationTimedOut in select count statement in cqlsh

2015-04-22 Thread Robert Wille
Use a counter table to maintain the count so you don’t have to compute it. When you do something that affects the count, it’s generally easy to issue an asynchronous query to update the counter in parallel with the actual work. It definitely complicates the code, especially if you have a lot of p
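The counter-table pattern described above can be sketched roughly as follows; the table and column names are invented, and `session` stands for a DataStax Python driver session (an assumption — the thread shows no code):

```python
# Hypothetical sketch of the counter-table pattern. All names are invented.

CREATE_COUNTER_TABLE = """
CREATE TABLE IF NOT EXISTS row_counts (
    table_name text PRIMARY KEY,
    row_count  counter
)
"""

# '?' placeholders assume these strings are prepared via session.prepare().
BUMP_COUNT = "UPDATE row_counts SET row_count = row_count + ? WHERE table_name = ?"
READ_COUNT = "SELECT row_count FROM row_counts WHERE table_name = ?"

def record_write(session, bump_stmt, table, delta=1):
    """Fire the counter update asynchronously so it runs in parallel with
    the actual insert, as the thread suggests. bump_stmt is the prepared
    form of BUMP_COUNT."""
    return session.execute_async(bump_stmt, (delta, table))
```

Reading the count is then a single-partition lookup instead of a full scan, at the cost of keeping the counter in sync from every write path.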

Re: Adhoc querying in Cassandra?

2015-04-22 Thread Nathan Bijnens
For Analytics workloads combining Spark and Cassandra will bring you lots of flexibility and performance. However you will have to setup and learn Spark. The Spark Cassandra connector is very performant and a joy to work with. N. On Wed, Apr 22, 2015 at 4:09 PM Matthew Johnson wrote: > Our requ

Re: Handle Write Heavy Loads in Cassandra 2.0.3

2015-04-22 Thread Brice Dutheil
Another reason for memtables to be kept in memory is wide rows. Maybe someone can chime in and confirm, but I believe wide rows (in the Thrift sense) need to be synced entirely across nodes. So from the numbers you gave, a node can send ~100 MB over the network for a single row. With compa

unsubscribe

2015-04-22 Thread Bill Tsay
From: Mich Talebzadeh mailto:m...@peridale.co.uk>> Reply-To: "user@cassandra.apache.org" mailto:user@cassandra.apache.org>> Date: Wednesday, April 22, 2015 at 3:06 AM To: "user@cassandra.apache.org" mailto:user@cassandra.apache

Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-22 Thread Anuj Wadehra
Thanks Robert! The JIRA was very helpful in understanding how the tombstone threshold is implemented. The ticket also says that running major compaction weekly is an alternative. I actually want to understand what happens if I run major compaction on a cf with 500 GB of data and a single giant file is created.

RE: unsubscribe

2015-04-22 Thread Matthew Johnson
Hi Bill, To remove your address from the list, send a message to: Cheers, Matt *From:* Bill Tsay [mailto:bt...@splunk.com] *Sent:* 22 April 2015 15:36 *To:* user@cassandra.apache.org *Subject:* unsubscribe *From: *Mich Talebzadeh *Reply-To: *"user@cassandra.apache.org" *Da

any "nodetool-like showparameters" to show loaded cassandra.yaml parameters ?

2015-04-22 Thread DE VITO Dominique
Hi, I have not seen any available cmd like "nodetool showparameters" to show loaded cassandra.yaml parameters of one node (to display them remotely, or to check if loaded parameters are the ones of the "cassandra.yaml"). Does anyone know if there is a cmd to display those parameters (I don't th

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Jay Ken
There is an Enterprise Edition from DataStax where they have Spark and Cassandra integration. http://www.datastax.com/what-we-offer/products-services/datastax-enterprise Thanks, Jay On Wed, Apr 22, 2015 at 6:41 AM, Serega Sheypak wrote: > Hi, are Cassandra and Spark from Cloudera compatible? >

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Serega Sheypak
We already use it. We would like to use Spark from the Cloudera distribution. Should it work? 2015-04-22 19:43 GMT+02:00 Jay Ken : > There is a Enerprise Edition from Datastax; where they have Spark and > Cassandra Integration. > > http://www.datastax.com/what-we-offer/products-services/datastax-enterpr

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Brian O'Neill
Depends which version of Spark you are running on Cloudera. Once you know that, have a look at the compatibility chart here: https://github.com/datastax/spark-cassandra-connector -brian --- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Sebastian Estevez
There is no supported way to replace the embedded spark that comes in DSE with something else. However you could probably read or write from/to DSE / Cassandra from a cloudera spark cluster using the open source DataStax connector. Are you looking for a particular feature that is not available in

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Serega Sheypak
What is "embedded" Spark? Where can I read about it? Right now we just install Spark 1.2 built for Hadoop 2.4 and use it to query data from Cassandra. 2015-04-22 19:56 GMT+02:00 Sebastian Estevez : > There is no supported way to replace the embedded spark that comes in DSE > with something else.
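For the vanilla-Spark route described here, pairing a Cloudera-packaged Spark with Cassandra through the open-source connector would look roughly like this; host, jar path, and script names are placeholders, and the connector version must match your Spark version per the compatibility chart in the connector repo:

```shell
# Sketch: submit a job against Cassandra from a plain Spark install,
# shipping the open-source connector assembly jar (placeholder path --
# pick the connector release that matches your Spark version).
spark-submit \
  --jars spark-cassandra-connector-assembly-1.2.0.jar \
  --conf spark.cassandra.connection.host=10.0.0.1 \
  my_job.py
```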

Re: RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-22 Thread Anuj Wadehra
Hi Stephen, Dropping the cf or keyspace and recreating it looks undesirable here. What I understood is that your rows survive 10 sec, but if you set gc_grace_seconds to 10 you find a lot of tombstones in queries. Please correct me if needed. I think the problem is that auto compaction is not getting trigg

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Sebastian Estevez
Oh sorry, Jay mentioned DSE and you said you already use it. I think the answer to your question is Brian's response. These are the DSE docs if you want to read about it: http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkTOC.html On Apr 22, 2015 2:05 PM, "Serega Sh

Re: cassandra and spark from cloudera distribution

2015-04-22 Thread Serega Sheypak
we use vanilla spark right now. Here is an ansible script to install spark: https://github.com/seregasheypak/ansible-vagrant-dse-spark/blob/master/roles/spark_configuration/tasks/main.yml and cassandra dse: https://github.com/seregasheypak/ansible-vagrant-dse-spark/blob/master/roles/dse-install/ta

Re: any "nodetool-like showparameters" to show loaded cassandra.yaml parameters ?

2015-04-22 Thread Robert Coli
On Wed, Apr 22, 2015 at 10:09 AM, DE VITO Dominique < dominique.dev...@thalesgroup.com> wrote: > I have not seen any available cmd like “nodetool showparameters” to show > loaded cassandra.yaml parameters of one node (to display them remotely, or > to check if loaded parameters are the ones of the

Re: Is 2.1.5 ready for upgrade?

2015-04-22 Thread Robert Coli
On Tue, Apr 21, 2015 at 2:48 PM, Brian Sam-Bodden wrote: > Robert, > Can you elaborate more please? > 2.1.3 had enough serious issues that I would not run it in production; Nathan mentions some down-thread. 2.1.4 is just a security release. 2.1.5 is unreleased, and I don't generally run unrelease

Re: Cluster imbalance caused due to #Num_Tokens

2015-04-22 Thread Robert Coli
On Tue, Apr 21, 2015 at 10:14 PM, Tiwari, Tarun wrote: > I read that there was nodetool balance kind of command in Cassandra 0.7 > but not anymore. > It never worked, really. > UN Node3 23.72 MB 1 0.4% > 41a71df-7e6c-40ab-902f-237697eaaf3e rack1 > > UN Node2 79.35 MB 1 0

Re: Cluster imbalance caused due to #Num_Tokens

2015-04-22 Thread Kiran mk
Bring down the second node using nodetool removenode or decommission. Add the node back with num_tokens set and run nodetool repair. Lastly, run nodetool cleanup on both nodes (one after the other). Observe after some time using nodetool status. On Apr 23, 2015 12:39 AM, "Robert Coli" wrote:
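Spelled out as commands, the sequence above would look roughly like this; node names are placeholders, and the exact flags should be checked against the nodetool reference for your version before decommissioning anything:

```shell
# On the node being rebuilt (here "node2"): stream its data away first.
nodetool -h node2 decommission
# Then set num_tokens (e.g. 256) in node2's cassandra.yaml, remove any
# initial_token, and restart it so it bootstraps with vnodes.
nodetool -h node2 repair      # bring replicas back in sync
nodetool -h node1 cleanup     # drop ranges each node no longer owns
nodetool -h node2 cleanup
nodetool status               # confirm ownership has evened out
```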

How much data is bootstrapping supposed to send?

2015-04-22 Thread Dave Galbraith
I had a one-node Cassandra 2.1.3 cluster, where the output of nodetool status looked like this:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load  Tokens  Owns  Host ID  Rack
UN  172.31.2