UUID coming as int while using SPARK SQL

2016-05-24 Thread Rajesh Radhakrishnan
Hi, I have a Cassandra keyspace, but reading the data (especially UUID columns) via Spark SQL using Python does not return the correct value. Cassandra: -- My table 'SAM' is described below: CREATE TABLE ks.sam (id uuid, dept text, workflow text, type double, PRIMARY KEY (id, dept))

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Try converting that int from decimal to hex and inserting dashes in the appropriate spots - or go the other way. Also, you are looking at different rows, based upon your selection criteria... ml On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > Hi

RE: UUID coming as int while using SPARK SQL

2016-05-24 Thread Rajesh Radhakrishnan
Hi Michael, Thank you for the quick reply. So you are suggesting to convert this int value (the UUID comes back as an int via Spark SQL) to hex? And the selection is just an example to highlight the UUID conversion issue. So in Cassandra it should be SELECT id, workflow FROM sam WHERE dept='blah'; And in S

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Yes - a UUID is just a 128 bit value. You can view it using any base or format. If you are looking at the same row, you should see the same 128 bit value, otherwise my theory is incorrect :) Cheers, ml On Tue, May 24, 2016 at 6:57 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote
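
As a rough illustration of the conversion Michael describes, and assuming the value Spark SQL hands back really is the UUID's raw 128-bit integer, Python's standard uuid module can rebuild the dashed hex form (the sample integer below is made up):

    import uuid

    # Made-up value standing in for the int returned through Spark SQL.
    raw_int = 199329978900784430486564918907131371632

    # uuid.UUID accepts the raw 128-bit integer and yields the canonical
    # dashed-hex representation stored in Cassandra.
    restored = uuid.UUID(int=raw_int)
    print(restored)                   # dashed hex of the same 128 bits
    assert restored.int == raw_int    # and it round-trips back to the int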

Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Aaditya Vadnere
Hi experts, We are evaluating Cassandra as messaging infrastructure for a project. In our workflow the Cassandra database will be synchronized across two nodes; a component will INSERT/UPDATE records on one node and another component (which has registered for the specific table) on the second node will get

RE: Removing a datacenter

2016-05-24 Thread Anubhav Kale
Sorry, I should have been more clear. What I meant was doing exactly what you wrote, but doing a “removenode” instead of a “decommission” to make it even faster. Will that have any side-effect (I think it shouldn’t)? From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] Sent: Monday, May 23, 2016 4:43 PM T

Re: Removing a datacenter

2016-05-24 Thread Jeff Jirsa
The fundamental difference between a removenode and a decommission is which node(s) stream data. In decom, the leaving node streams. In removenode, other owners of the data stream. If you set replication factor for that DC to 0, there’s nothing to stream, so it’s irrelevant – do whichever you l
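
A minimal sketch of the order of operations Jeff describes, with made-up keyspace and datacenter names, using the Python driver for the CQL step:

    from cassandra.cluster import Cluster

    cluster = Cluster(['10.0.0.1'])   # placeholder contact point
    session = cluster.connect()

    # Drop the leaving datacenter's replication factor to 0 first; with no
    # replicas owned there, neither decommission nor removenode has
    # anything to stream.
    session.execute("""
        ALTER KEYSPACE my_ks
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'dc_keep': 3, 'dc_leaving': 0}
    """)

    # Then, on each node in dc_leaving (shell, not Python):
    #   nodetool decommission        # or: nodetool removenode <host-id>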

Re: Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Eric Stevens
It sounds like you're trying to build a queue in Cassandra, which is one of the classic anti-pattern use cases for Cassandra. You may be able to do something clever with triggers, but I highly recommend you look at purpose-built queuing software such as Kafka to solve this instead. On Tue, May 24
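
For contrast, a tiny sketch of what the same notification flow looks like on a purpose-built queue such as Kafka (broker address and topic name are made up, using the kafka-python client):

    from kafka import KafkaProducer, KafkaConsumer

    # The writing component publishes an event instead of relying on the
    # database to notify anyone.
    producer = KafkaProducer(bootstrap_servers='broker1:9092')
    producer.send('record-events', b'{"op": "INSERT", "id": "..."}')
    producer.flush()

    # The registered component on the other side consumes events as they
    # arrive, without polling a table.
    consumer = KafkaConsumer('record-events', bootstrap_servers='broker1:9092')
    for message in consumer:
        print(message.value)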

Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
Stop-The-World GC might block the connection until it times out. This is the log that I think is relevant. INFO 20160524-060930.028882 :: Initializing sandbox_20160524_t06_09_18.table1 INFO 20160524-060933.908008 :: G1 Young Generation GC in 551ms. G1 Eden Space: 98112 -> 0; G1
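
If the pause really is the issue, one client-side sketch (it masks the symptom rather than fixing the GC) is to raise the Python driver's connection timeout, which defaults to 5 seconds:

    from cassandra.cluster import Cluster

    # A multi-second stop-the-world pause during connection setup can
    # exceed the default connect_timeout of 5 seconds.
    cluster = Cluster(['10.0.0.1'], connect_timeout=30)   # placeholder host
    session = cluster.connect()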

Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Eric Stevens
ht block the connection until it times > out. This is the log that i think is relevant. > > INFO 20160524-060930.028882 :: Initializing > sandbox_20160524_t06_09_18.table1 > > INFO 20160524-060933.908008 :: G1 Young Generation GC in 551ms. G1 Eden > Space: 98112 -> 0; G1

Re: Thrift client creates massive amounts of network packets

2016-05-24 Thread Eric Stevens
I'm not familiar with Titan's usage patterns for Cassandra, but I wonder if this is because of the consistency level it's querying Cassandra at - i.e. if CL isn't LOCAL_[something], then this might just be lots of little checksums required to satisfy consistency requirements. On Mon, May 23, 2016
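
If the consistency level does turn out to be the cause, keeping reads inside the local datacenter is a one-line change in the Python driver; a sketch with placeholder keyspace and table names:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.0.0.1'])
    session = cluster.connect('my_ks')

    # LOCAL_QUORUM keeps the coordinator from involving remote replicas
    # in every read.
    query = SimpleStatement("SELECT * FROM my_table LIMIT 10",
                            consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    rows = session.execute(query)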

Cassandra and Kubernetes and scaling

2016-05-24 Thread Mike Wojcikiewicz
I saw a thread from April 2016 talking about Cassandra and Kubernetes, and have a few follow-up questions. It seems that especially after v1.2 of Kubernetes, and the upcoming 1.3 features, this would be a very viable platform to run Cassandra on. My questions pertain to HostIds and Scaling Up/D

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Luke Jolly
Here's my setup:
Datacenter: gce-us-central1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load    Tokens  Owns (effective)  Host ID                       Rack
UN  10.128.0.3  6.4 GB  256     100.0%            3317a3de-9113-48e2-9a85-bbf7

Re: Too many keyspaces causes cql connection to time out ?

2016-05-24 Thread Justin Lin
an connect >> to cassandra). And from cassandra log, we can see it takes roughly 3 >> seconds to do gc when there is an incoming connection. And the gc is the >> only difference between the timeout connection and the successful >> connection. So we suspect this Stop-The

Re: Cassandra event notification on INSERT/DELETE of records

2016-05-24 Thread Mark Reddy
+1 to what Eric said, a queue is a classic C* anti-pattern. Something like Kafka or RabbitMQ might fit your use case better. Mark On 24 May 2016 at 18:03, Eric Stevens wrote: > It sounds like you're trying to build a queue in Cassandra, which is one > of the classic anti-pattern use cases for

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
Hi Luke, You mentioned that the replication factor was increased from 1 to 2. In that case, was the node bearing IP 10.128.0.20 carrying around 3 GB of data earlier? You can run nodetool repair with the -local option to initiate a repair of the local datacenter, gce-us-central1. Also you may suspect that if a lot o
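
The two steps being discussed, sketched with the datacenter name from this thread and a placeholder keyspace (CQL issued via the Python driver):

    from cassandra.cluster import Cluster

    cluster = Cluster(['10.128.0.3'])
    session = cluster.connect()

    # Raise the replication factor from 1 to 2 for gce-us-central1.
    # This only changes metadata; existing rows are not copied by it.
    session.execute("""
        ALTER KEYSPACE my_ks
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'gce-us-central1': 2}
    """)

    # Streaming the data to the new replica is what repair is for
    # (shell, not Python):
    #   nodetool repair -local my_ks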

Re: Cassandra and Kubernetes and scaling

2016-05-24 Thread Aiman Parvaiz
Looking forward to hearing from the community about this. Sent from my iPhone > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz wrote: > > I saw a thread from April 2016 talking about Cassandra and Kubernetes, and > have a few follow up questions. It seems that especially after v1.2 of > Kub

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Luke Jolly
So I guess the problem may have been with the initial addition of the 10.128.0.20 node, because when I added it, it never synced data, I guess? It was at around 50 MB when it first came up and transitioned to "UN". After it was in, I did the 1->2 replication change and tried repair but it didn't fix

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bhuvan Rawal
For the other DC it can be acceptable, because each partition resides on one node, so if you have a large partition, it may skew things a bit. On May 25, 2016 2:41 AM, "Luke Jolly" wrote: > So I guess the problem may have been with the initial addition of the > 10.128.0.20 node because when I adde

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread kurt Greaves
Not necessarily, considering RF is 2, so both nodes should have all partitions. Luke, are you sure the repair is succeeding? You don't have other keyspaces/duplicate data/extra data in your cassandra data directory? Also, you could try querying on the node with less data to confirm if it has the same
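
One way to do the per-node comparison kurt suggests from Python is to whitelist a single host and read at consistency ONE (the address is the node from this thread; keyspace and table are placeholders):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.policies import WhiteListRoundRobinPolicy

    # Restrict the driver to the suspect node so it coordinates the read.
    cluster = Cluster(['10.128.0.20'],
                      load_balancing_policy=WhiteListRoundRobinPolicy(['10.128.0.20']))
    session = cluster.connect('my_ks')
    session.default_consistency_level = ConsistencyLevel.ONE

    rows = session.execute("SELECT count(*) FROM my_table")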

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bryan Cheng
Hi Luke, I've never found nodetool status' load to be useful beyond a general indicator. You should expect some small skew, as this will depend on your current compaction status, tombstones, etc. IIRC repair will not provide consistency of intermediate states nor will it remove tombstones, it onl

Re: OOM under high write throughputs on 2.2.5

2016-05-24 Thread Bryan Cheng
Hi Zhiyan, Silly question but are you sure your heap settings are actually being applied? "697,236,904 (51.91%)" would represent a sub-2GB heap. What's the real memory usage for Java when this crash happens? Other thing to look into might be memtable_heap_space_in_mb, as it looks like you're usi

Error while rebuilding a node: Stream failed

2016-05-24 Thread George Sigletos
I am getting this error repeatedly while I am trying to add a new DC consisting of one node in AWS to my existing cluster. I have tried 5 times already. Running Cassandra 2.1.13 I have also set: streaming_socket_timeout_in_ms: 360 in all of my nodes Does anybody have any idea how this can be

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Mike Yeap
Hi Luke, I've encountered similar problem before, could you please advise on following? 1) when you add 10.128.0.20, what are the seeds defined in cassandra.yaml? 2) when you add 10.128.0.20, were the data and cache directories in 10.128.0.20 empty? - /var/lib/cassandra/data - /var/lib/cas

Re: Error while rebuilding a node: Stream failed

2016-05-24 Thread Mike Yeap
Hi George, are you using NetworkTopologyStrategy as the replication strategy for your keyspace? If yes, can you check the cassandra-rackdc.properties of this new node? https://issues.apache.org/jira/browse/CASSANDRA-8279 Regards, Mike Yeap On Wed, May 25, 2016 at 2:31 PM, George Sigletos wrote
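
For reference, the check Mike points at boils down to making sure the new node's cassandra-rackdc.properties (typically read by GossipingPropertyFileSnitch) names the datacenter the rebuild expects; an illustrative example with placeholder values:

    # conf/cassandra-rackdc.properties on the new AWS node
    dc=aws_dc_name
    rack=rack1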