cassandra dump file path

2016-10-03 Thread Jean Carlo
Hi, I see in the log of my Cassandra node that the parameter -XX:HeapDumpPath is loaded twice. INFO [main] 2016-10-03 04:21:29,941 CassandraDaemon.java:205 - JVM Arguments: [-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar, -XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities, -XX:
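One way to confirm which JVM options are passed more than once is to parse the "JVM Arguments" log line and group the options by name. The sketch below is only an illustration: the sample line and both heap dump paths in it are hypothetical, not taken from the original log, and the helper name is made up for this example.

```python
from collections import defaultdict


def duplicated_jvm_options(log_line):
    """Return {option_name: [values, ...]} for options listed more than once."""
    marker = "JVM Arguments: ["
    start = log_line.find(marker)
    if start == -1:
        return {}
    # Keep only the comma-separated argument list, without the closing bracket.
    body = log_line[start + len(marker):].rstrip().rstrip("]")
    seen = defaultdict(list)
    for arg in body.split(", "):
        # "-XX:HeapDumpPath=/some/path" -> name "-XX:HeapDumpPath", value "/some/path"
        name, _, value = arg.partition("=")
        seen[name].append(value)
    return {name: values for name, values in seen.items() if len(values) > 1}


# Hypothetical sample line; the two paths are placeholders.
SAMPLE = (
    "INFO [main] CassandraDaemon.java:205 - JVM Arguments: [-ea, "
    "-XX:+UseThreadPriorities, -XX:HeapDumpPath=/path/one/dump.hprof, "
    "-Xms1G, -XX:HeapDumpPath=/path/two/dump.hprof]"
)

if __name__ == "__main__":
    for name, values in duplicated_jvm_options(SAMPLE).items():
        print(name, "appears", len(values), "times:", values)
```

The values are kept in the order they appear on the command line, which matters when working out which of the repeated settings the JVM ends up using.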

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
While that sentence leaves a lot to be desired (for me because it confers a different meaning on row store), it doesn't say "Cassandra is like a RDBMS" - it says "like an RDBMS, it organises data by rows and columns" - i.e., in this regard only it is like an RDBMS, not more generally. I believe it

Re: cassandra dump file path

2016-10-03 Thread Jean Carlo
OK, I got the answer to one of my questions. In the script /etc/init.d/cassandra we set the path for the heap dump by default in the cassandra_home. Now the thing I don't understand is, why are the dumps located at the path set by /etc/init.d/cassandra and not by the conf file cassandra-env.s

Re: Difference in token range count

2016-10-03 Thread techpyaasa .
Can someone please reply? On Fri, Sep 30, 2016 at 11:25 PM, laxmikanth sadula wrote: > Hi Eric, > > Thanks for the reply... > RF=3 for all DCs... > > On Fri, Sep 30, 2016 at 9:57 PM, Eric Stevens wrote: > >> What is your replication factor in this DC? >> >> On Fri, Sep 30, 2016 at 8:08 AM techp

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
The phrase is defensible, but that is the root of the problem. Take for example a skateboard. "A skateboard is like a bike because it has wheels and you ride on it." That is true and defensibly true. :) However with not much more text you can accurately describe what it is, as opposed to somethi

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
Also every piece of technical information that describes a rowstore http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems Does it like this: 001:10,Smith,Joe,4; 002:12,Jones,Mary,5; 003:11,Johnson,Cathy

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
The equivalent statement would be: "Like a bike, a scooter has wheels." This is a really important linguistic distinction you seem to be glossing over. It is not saying "A is like X," it is saying "A has specific traits in common with X." For example "Like cancer, heart disease is a leading cau

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Sorry Ed, but you're really stretching here. A table in Cassandra is structured by a schema with the data for each row stored together in each data file. Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from being called a "row store" Po

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
Whether a storage engine requires schema isn't really critical for row oriented storage. How about CSV that doesn't have a header row? CSV is probably the most commonly used row oriented storage and tons of businesses still use it for B2B transactions. As you pointed out, some traditional RDBMS ha

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
A couple things I would like to note: 1. Cassandra does not determine how data is stored on disk, the compaction strategy does. One could, in theory (and I believe some are trying), create a column-store compaction strategy. There is a large effort in the database community overall to sepa

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
My original point can be summed up as: Do not define cassandra in terms of SIMILES & METAPHORS. Such words include "like" and "close relative". For the specifics: Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it n

Unsubscribe

2016-10-03 Thread Christof Bornhoevd
Unsubscribe On Monday, October 3, 2016, Benedict Elliott Smith wrote: > While that sentence leaves a lot to be desired (for me because it confers > a different meaning on row store), it doesn't say "Cassandra is like a > RDBMS" - it says "like an RDBMS, it organises data by rows and columns" - >

Re: Repairing without -pr shows unexpected out-of-sync ranges

2016-10-03 Thread Stefano Ortolani
I was wondering: is (2) a direct consequence of a repair on the full token range (and thus anti-compaction ran only on a subset of the RF nodes)? If I understand correctly, a repair with -pr should fix this, at the cost of all nodes performing the anticompaction phase? Cheers, Stefano On Tue, Se

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Eric Stevens
It sounds like you're trying to avoid the latency of waiting for a write confirmation to a remote data center? App ==> DC1 ==high-latency==> DC2 If you need the write to be confirmed before you consider the write successful in your application (definitely recommended unless you're ok with losing

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Dorian Hoxha
Thanks for the explanation Eric. I would think of it as something like: The keyspace will be on dc1 + dc2, with the option that no long-term-data is in dc1. So you write to dc1 (to the right nodes), they write to commit-log/memtable and when they push for inter-dc-replication dc1 then deletes local d

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread INDRANIL BASU
Hello All, I am getting the below error repeatedly in the system log of C* 2.1.0 WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000 columns was reques

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Nobody is claiming Cassandra is a relational database. I'm not sure why that keeps coming up. On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo wrote: > My original point can be summed up as: > > Do not define cassandra in terms SMILES & METAPHORS. Such words include > "like" and "close relative". > > For th

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
It's a row store because it has a schema (vs ad hoc documents), and data (rows) are stored together. What would you call the things you iterate over when you query a partition? Rows. That makes it a thing that stores "rows" of data; "row store" isn't some crazy stretch. On Mon, Oct 3, 2016 at 12:33 PM Jo

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
... and my response can be summed up as "you are not parsing English correctly." The word "like" does not mean what you think it means in this context. It does not mean "close relative." It is constrained to the similarities expressed, and no others. You don't seem to be reading any of my respo

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
I've met clients that read the cassandra docs and then said in a big meeting "it's just like relational database, it has tables just like sqlserver/oracle." I'm not putting words in other people's mouth either, but I've heard that said enough times to want to puke. Do the docs claim cassandra is

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
"X-store" refers to how data is stored; in almost every case it refers to what logical constructs are grouped together physically on disk. It has nothing to do with whether a database is relational or not. Cassandra does, in fact, meet the definition of row-store, however, I would like to re-itera

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
Nobody is disputing that the docs can and should be improved to avoid this misreading. I've invited Ed to file a JIRA and/or pull request twice now. You are of course just as welcome to do this. Perhaps you will actually do it, so we can all move on with our lives! On 3 October 2016 at 17:45

Tombstoned error and then OOM

2016-10-03 Thread INDRANIL BASU
Hello All, I am getting the below error repeatedly in the system log of C* 2.1.0 WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000 columns was requ

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Dorian Hoxha
@INDRANIL Please go find your own thread and don't hijack mine. On Mon, Oct 3, 2016 at 6:19 PM, INDRANIL BASU wrote: > Hello All, > > I am getting the below error repeatedly in the system log of C* 2.1.0 > > WARN [SharedPool-Worker-64] 2016-09-27 00:43:35,835 > SliceQueryFilter.java:236 - Read

An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi, I was working on a utility which can be used for cassandra full table scan, at a tremendously high velocity, cassandra fast full table scan. How fast? The script dumped ~ 229 million rows in 116 seconds, on a 6-node cluster. Data transfer rates of up to 25 MBps were observed on cassan

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread DuyHai Doan
Hello Siddharth, I just took a look at the architecture diagram. The idea of using multiple threads, one for each token range, is great. It helps max out parallelism. With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even faster. On Mon, Oct 3, 2016 at 7:51 PM, siddharth v

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi DuyHai, Thanks for your reply. A few more features are planned for the next one (if there is one), like a custom policy that keeps in mind the replication of token ranges on specific nodes, fine graining the token range (for more speedup), and a few more. I think, as fine graining a token range, If one toke
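The idea of scanning one token range per thread, and of fine graining those ranges, can be illustrated with a small sketch. This is not the utility's actual code: it assumes the Murmur3Partitioner token space (-2^63 to 2^63 - 1), it ignores which nodes own which ranges, and the keyspace, table and key names in the generated query are hypothetical.

```python
MIN_TOKEN = -(2 ** 63)      # lowest Murmur3Partitioner token
MAX_TOKEN = 2 ** 63 - 1     # highest Murmur3Partitioner token


def split_token_range(start, end, parts):
    """Split the inclusive range [start, end] into `parts` contiguous subranges."""
    total = end - start + 1
    if parts < 1 or parts > total:
        raise ValueError("parts must be between 1 and the size of the range")
    ranges = []
    low = start
    for i in range(parts):
        # Spread the remainder over the first subranges so sizes differ by at most 1.
        size = total // parts + (1 if i < total % parts else 0)
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


def range_query(low, high, table="my_keyspace.my_table", key="id"):
    """Build a CQL range query for one subrange (table and key are placeholders)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({key}) >= {low} AND token({key}) <= {high}"
    )


if __name__ == "__main__":
    # Each query could be handed to its own worker thread.
    for low, high in split_token_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(range_query(low, high))
```

Increasing `parts` gives finer-grained ranges, so more workers can be kept busy and a slow range holds up less of the scan.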

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
Hi, can anyone please help me with this? Thanks, Abhinav On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan wrote: > Hi Everyone, > > My table looks like this - > CREATE TABLE test.reads ( > svc_pt_id bigint, > meas_type_id bigint, > flags bigint, > read_time timestamp, > value do

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
You know what, don't "go low" and pin the recent un-subscriber on me. If you're so eager to deal with my pull request, I would rather you review this one: https://issues.apache.org/jira/browse/CASSANDRA-10825 On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Sm

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
It will be interesting to have a comparison with spark here for basic use cases. From a naive observation it appears that this could be slower than spark as a lot of data is streamed over network. On the other hand in this approach we have seen that Young GC takes nearly full CPU (possibly becau

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Jonathan Haddad
It almost sounds like you're duplicating all the work of both spark and the connector. May I ask why you decided to not use the existing tools? On Mon, Oct 3, 2016 at 2:21 PM siddharth verma wrote: > Hi DuyHai, > Thanks for your reply. > A few more features planned in the next one(if there is on

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Siddharth Verma
Hi Jon, We couldn't set up a spark cluster. For some use case, a spark cluster was required, but for some reason we couldn't create a spark cluster. Hence, one may use this utility to iterate through the entire table at very high speed. We had to find a workaround that would be faster than paging on r

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Jonathan Haddad
Couldn't set up as in couldn't get it working, or it's not allowed? On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma wrote: > Hi Jon, > We couldn't setup a spark cluster. > > For some use case, a spark cluster was required, but for some reason we > couldn't create spark cluster. Hence, one may use this

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi Jon, It wasn't allowed. Moreover, someone who isn't familiar with spark, and might be new to map, filter, reduce etc. operations, could also use the utility for some simple operations assuming a sequential scan of the cassandra table. Regards Siddharth Verma On Tue, Oct 4, 2016 at 1:32 AM, Jon

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
Hi Jonathan, If full scan is a regular requirement then setting up a spark cluster in locality with Cassandra nodes makes perfect sense. But supposing that it is a one off requirement, say a weekly or a fortnightly task, a spark cluster could be an added overhead with additional capacity, resource

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it would make sense to make a utility for the unit tests that could enable tracing and then assert that a number of steps in the trace happened. Something like: setup() runQuery("SELECT * FROM X") Assertion.assertTrace("Preparing
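The trace assertion idea can be sketched roughly as follows. This is only an illustration of the concept, not an existing Cassandra test utility: `assert_trace` is a hypothetical helper, and the trace event strings in the example are made-up placeholders rather than real tracing output.

```python
def assert_trace(trace_events, expected_steps):
    """Assert that each expected step appears, in order, within the trace events.

    A step matches an event when it is a substring of that event's text.
    """
    remaining = iter(trace_events)
    for step in expected_steps:
        # Consume events until one matches; later steps must match later events.
        if not any(step in event for event in remaining):
            raise AssertionError(f"trace step not found in order: {step!r}")


if __name__ == "__main__":
    # Placeholder events standing in for what a traced query might record.
    events = [
        "Parsing SELECT * FROM X",
        "Preparing statement",
        "Row cache hit",
        "Request complete",
    ]
    assert_trace(events, ["Preparing", "Row cache hit"])
    print("trace contains the expected steps in order")
```

A test for the row cache could then assert that the second read of the same partition records a cache hit step, which would catch the behavior discussed in this thread.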

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Which version of Cassandra are you running (I can tell it’s newer than 2.1, but exact version would be useful)? From: Abhinav Solan Reply-To: "user@cassandra.apache.org" Date: Monday, October 3, 2016 at 11:35 AM To: "user@cassandra.apache.org" Subject: Re: Row cache not working Hi, ca

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Edward Capriolo
I undertook a similar effort a while ago. https://issues.apache.org/jira/browse/CASSANDRA-7014 Other than the fact that it was closed with no comments, I can tell you that other efforts I had to embed things in Cassandra did not go swimmingly. Although at the time ideas were rejected like groovy

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
It's cassandra 3.0.7. I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only then does it work, and I don't know why. If I set 'rows_per_partition':'1' then it does not work. I also wanted to ask one thing: if I set row_cache_save_period: 60 then this cache would be refreshed automatically o

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Seems like it’s probably worth opening a jira issue to track it (either to confirm it’s a bug, or to be able to better explain if/that it’s working as intended – the row cache is probably missing because trace indicates the read isn’t cacheable, but I suspect it should be cacheable).  

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might only be as deep as the specific tests that test it. On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa wrote: > Seems like it’s probably worth opening a jira issue to track it (either to > confirm it’s a bug, or to be able to better explain i

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
I did not ascribe blame. I only empathised with their predicament; I don't want to listen to either of us, either! On 3 October 2016 at 19:45, Edward Capriolo wrote: > You know what don't "go low" and suggest the recent un-subscriber on me. > > If your so eager to deal with my pull request

Re: Row cache not working

2016-10-03 Thread Hannu Kröger
If I remember correctly row cache caches only N rows from the beginning of the partition. N being some configurable number. See this link which is suggesting that: http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1 Br, Hannu > On 4 Oct 2016, at 1.32, Edward Capriolo wrote: > > Sin

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
That’s true for versions 2.1 and newer. However, it’s possible that 3.0 engine rewrite introduced a bug or two that haven’t yet been found. From: Hannu Kröger Reply-To: "user@cassandra.apache.org" Date: Monday, October 3, 2016 at 3:52 PM To: "user@cassandra.apache.org" Subject: Re: Row

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Yabin Meng
Dorian, I don't think Cassandra is able to achieve what you want natively. In short, what you want to achieve is conditional data replication. Yabin On Mon, Oct 3, 2016 at 1:37 PM, Dorian Hoxha wrote: > @INDRANIL > Please go find your own thread and don't hijack mine. > > On Mon, Oct 3,

Re: cassandra dump file path

2016-10-03 Thread Yabin Meng
Have you restarted Cassandra after making changes in cassandra-env.sh? Yabin On Mon, Oct 3, 2016 at 7:44 AM, Jean Carlo wrote: > OK I got the response to one of my questions. In the script > /etc/init.d/cassandra we set the path for the heap dump by default in the > cassandra_home. > > Now the

Re: Cassandra 3 node cluster with intermittent network issues on one node

2016-10-03 Thread Yabin Meng
Most likely node A has some gossip related problems. You can try purging the gossip state on node A, as per the procedure: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html . Yabin On Mon, Oct 3, 2016 at 2:38 AM, Girish Kamarthi < girish.kamar...@stellapps.com>

Re: Replacing a dead node in a live Cassandra Cluster

2016-10-03 Thread Yabin Meng
Are you sure the cassandra.yaml file of the new node is correctly configured? What are the seeds and listen_address settings of your new node and existing nodes? Yabin On Fri, Sep 30, 2016 at 7:56 PM, Rajath Subramanyam wrote: > Hello Cassandra-users, > > I was running some tests today. My end goal wa

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread INDRANIL BASU
@Dorian, yes I did that by mistake. I rectified it by starting a new thread. Thanks and regards, -- Indranil Basu From: Dorian Hoxha To: user@cassandra.apache.org; INDRANIL BASU Sent: Monday, 3 October 2016 11:07 PM Subject: Re: Way to write to dc1 but keep data only in dc2 @INDRA