Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Jens Rantil
Maybe slightly off-topic, but what is a mutation? Is it equivalent to a CQL row? Or maybe a column in a row? Does it include tombstones within the selected range? Thanks, Jens On Thu, Dec 11, 2014 at 9:56 PM, Ryan Svihla wrote: > Nothing magic, just put in there based on experience. You can find

RE: batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
Ryan, Thanks for the quick response. I did see that jira before posting my question on this list. However, I didn’t see any information about why 5kb+ data will cause instability. 5kb or even 50kb seems too small. For example, if each mutation is 1000+ bytes, then with just 5 mutations, you wil
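
The arithmetic in the message above can be checked with a minimal sketch. The helper names are hypothetical, and treating 1 KB as 1024 bytes is an assumption made here, not something stated in the thread; the mutation sizes are illustrative rather than measured.

```python
# Hypothetical helpers for illustration only; not part of Cassandra or any driver.
# Assumption: the threshold in kilobytes is converted to bytes as kb * 1024.
WARN_THRESHOLD_KB = 5  # default batch_size_warn_threshold_in_kb mentioned in the thread


def batch_size_bytes(mutation_sizes):
    """Sum the sizes (in bytes) of the mutations in one batch."""
    return sum(mutation_sizes)


def exceeds_warn_threshold(mutation_sizes, threshold_kb=WARN_THRESHOLD_KB):
    """Return True when the batch is larger than the warn threshold."""
    return batch_size_bytes(mutation_sizes) > threshold_kb * 1024


# Five mutations of exactly 1000 bytes total 5000 bytes, just under 5 * 1024 = 5120.
print(exceeds_warn_threshold([1000] * 5))
# Five mutations of 1100 bytes total 5500 bytes, which is over the threshold.
print(exceeds_warn_threshold([1100] * 5))
```

Under these assumptions, a handful of mutations slightly larger than 1 KB each is enough to cross the default threshold, which is the point the message is making.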

Re: Get column family size

2014-12-11 Thread Chamila Wijayarathna
Hi Philip, Ryan, I checked cassandra system.log for any issues, but it showed no error there. I tried using cfstats and it gave me https://gist.github.com/cdwijayarathna/e6b4d3d7d8c272fcfd24. It doesn't seem to have any information like number of keys. I am running cassandra in a single node and

nodetool breaks on firewall ?

2014-12-11 Thread Kevin Burton
I have a firewall I need to bring up to keep our boxes off the Internet (obviously). The problem is that once I do, nodetool doesn’t work anymore. There’s a bunch of advice on this on the Internet: http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-t

Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Shane Hansen
I don't know why 5kb was chosen. The general trend is that larger batches will put more stress on the coordinator node. The precise point at which things fall over will vary. On Thu, Dec 11, 2014 at 1:43 PM, Mohammed Guller wrote: > Hi – > > The cassandra.yaml file has property called *batch_

Re: Get column family size

2014-12-11 Thread Ryan Svihla
An estimated partition key count can be had from nodetool cfstats, however for large data sets analytics style queries (such as verification of large data sets) I recommend spark, hive, hadoop, and even solr for some use cases. On Thu, Dec 11, 2014 at 3:10 PM, Philip Thompson < philip.thomp...@dat

Re: Get column family size

2014-12-11 Thread Ryan Svihla
So that query in cqlsh actually has a default limit of 10,000 and so if you're timing out trying to retrieve only 10k rows that makes me suspect you have either a lot of data per row, or you've got a really really unhappy server. I'd check the cassandra logs for errors, there is probably a lot more

Re: Get column family size

2014-12-11 Thread Philip Thompson
Chamila, You can find more detailed explanations in previous posts on this mailing list as to why, but a "Select count(*) from table;" query is inefficient in Cassandra for non-trivial datasets. You will need a better way to get the number of partition keys of a CF, which hopefully someone else in

Re: batch_size_warn_threshold_in_kb

2014-12-11 Thread Ryan Svihla
Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: "Yes that was in bytes. Just in my own experience, I don't recommend

batch_size_warn_threshold_in_kb

2014-12-11 Thread Mohammed Guller
Hi - The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb. The default size is 5kb and according to the comments in the yaml file, it is used to log WARN on any batch size exceeding this value in kilobytes. It says caution should be taken on increasing the size of this thre

Re: Get column family size

2014-12-11 Thread Chamila Wijayarathna
Hi Philip, Yes, I'm using cqlsh. Is there any way I can solve this? Thank You! On Fri, Dec 12, 2014 at 12:26 AM, Philip Thompson < philip.thomp...@datastax.com> wrote: > I assume the query you are sending is through cqlsh. You are actually > getting a client-side timeout error, which is unclear

Re: Get column family size

2014-12-11 Thread Philip Thompson
I assume the query you are sending is through cqlsh. You are actually getting a client-side timeout error, which is unclear in 2.1.2, but I believe the error message will be more helpful as of 2.1.3. On Thu, Dec 11, 2014 at 1:52 PM, Chamila Wijayarathna < cdwijayarat...@gmail.com> wrote: > Hello

Get column family size

2014-12-11 Thread Chamila Wijayarathna
Hello all, I am trying to get the number of key-value pairs. I used the following query for this. select count(*) from corpus.word_usage ; This returns the number of key-value pairs when the CF is relatively small. But when I insert more key-value pairs, I am getting an error saying, "errors={}, last_host=127

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Chamila Wijayarathna
Done. https://issues.apache.org/jira/browse/CASSANDRA-8461 On Thu, Dec 11, 2014 at 9:00 PM, Philip Thompson < philip.thomp...@datastax.com> wrote: > That is definitely a bug, and I do not see a JIRA with the same problem > already filed. Could you file an issue please? Cassandra - ASF JIRA >

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Tyler Hobbs
It looks like this is resolved in the latest 2.1. I think the fix was a combination of CASSANDRA-8286 and something else. On Thu, Dec 11, 2014 at 9:30 AM, Philip Thompson < philip.thomp...@datastax.com> wrote: > That is definitely a bug, and

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Philip Thompson
That is definitely a bug, and I do not see a JIRA with the same problem already filed. Could you file an issue please? Cassandra - ASF JIRA On Thu, Dec 11, 2014 at 10:15 AM, Chamila Wijayarathna < cdwijayarat...@gmail.com> wrote:

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Chamila Wijayarathna
Hi Philip, I'm using version 2.1.2. Following is the error log at system.log. ( https://gist.github.com/cdwijayarathna/2749f52c52f5c7fd807d ) ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, / 127.0.0.1:35624 =

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Philip Thompson
The full error should be in that node's system.log file. What version are you running? On Thu, Dec 11, 2014 at 9:42 AM, Chamila Wijayarathna < cdwijayarat...@gmail.com> wrote: > Hi Philip, > > I ran my queries on cqlsh terminal and it only shows this. > > Thank you! > > On Thu, Dec 11, 2014 at 6:

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Chamila Wijayarathna
Hi Philip, I ran my queries on cqlsh terminal and it only shows this. Thank you! On Thu, Dec 11, 2014 at 6:43 PM, Philip Thompson < philip.thomp...@datastax.com> wrote: > There is definitely a problem here, but without the entire stack trace, > it is unclear what exactly may be wrong. > > On

Re: Good partition key doubt

2014-12-11 Thread DuyHai Doan
"what is a good partition key? Is partition key direct related with my query performance? What is the best practices?" A good partition key is a partition key that will scale with your data. An example: if you have a business involving individuals, it is likely that your business will scale as soo

Re: java.lang.AssertionError in cqlsh

2014-12-11 Thread Philip Thompson
There is definitely a problem here, but without the entire stack trace, it is unclear what exactly may be wrong. On Thu, Dec 11, 2014 at 7:37 AM, Chamila Wijayarathna < cdwijayarat...@gmail.com> wrote: > Hello all, > > I have a column family with the following schema. > > CREATE TABLE corpus.trigra

Re: Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Peter Lin
Spark is an in-memory architecture, so you're not going to see it go faster than CQL for a simple select from 1 table on a few keys. Where you'll see a benefit is loading lots of data into memory and doing some "report like" query where you join data from multiple tables. On Thu, Dec 11, 2014 at 8

Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Ajay
Hi, To test Spark SQL Vs CQL performance on Cassandra, I did the following: 1) Cassandra standalone server (1 server in a cluster) 2) Spark Master and 1 Worker Both running in a Thinkpad laptop with 4 cores and 8GB RAM. 3) Written Spark SQL code using Cassandra-Spark Driver from Cassandra (JavaAp

java.lang.AssertionError in cqlsh

2014-12-11 Thread Chamila Wijayarathna
Hello all, I have a column family with the following schema. CREATE TABLE corpus.trigram_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, word3 varchar, category varchar, frequency int, PRIMARY KEY(category,frequency,word1,word2,word3) ); When I run

Good partition key doubt

2014-12-11 Thread José Guilherme Vanz
Hello folks, I have been studying Cassandra for a short period of time and now I am modeling a database for study purposes. During my modeling I have faced a doubt: what is a good partition key? Is the partition key directly related to my query performance? What are the best practices? Just to study case, l