Re: Cassandra - Spark - Flume: best architecture for log analytics.

2015-07-22 Thread Pierre Devops
Cassandra is not very good at massive read/bulk read if you need to retrieve and compute a large amount of data on multiple machines using something like Spark or Hadoop (or you'll need to hack and process the sstable directly, something which is not "natively" supported, you'll have to hack your w

Issues with SSL encryption after updating to 2.2.0 from 2.1.6

2015-07-22 Thread Carlos Scheidecker
Hello all, After updating to Cassandra 2.2.0 from 2.1.6 I am having SSL issues: My JVM is java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Ubuntu 14.04.2 LTS is on all nodes, they are the same. Below i

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Robert, thanks for these references! We're not using DTCS, so 9056 and 8243 seem out, but I'll take a look at 9577 (also looked at the referenced thread on this list, which seems to have some interesting data) On Wed, Jul 22, 2015 at 5:33 PM, Robert Coli wrote: > On Wed, Jul 22, 2015 at 2:55 PM,

Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
I agree, Michael. I was generating stuff for it again. Looks like they had the SSL stack changed. I came from 2.1.6 to 2.2.0. Thanks. On Wed, Jul 22, 2015 at 5:45 PM, Michael Shuler wrote: > What version of Cassandra did you upgrade to 2.2.0 *from*? > > This would help with looking at config dif

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Robert Coli
On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng wrote: > nodetool still reports the node as being healthy, and it does respond to > some local queries; however, the CPU is pegged at 100%. One common thread > (heh) each time this happens is that there always seems to be one or more > compaction threa

Cassandra - Spark - Flume: best architecture for log analytics.

2015-07-22 Thread Renato Perini
Problem: Log analytics. Solutions: 1) Aggregating logs using Flume and storing the aggregations into Cassandra. Spark reads data from Cassandra, makes some computations and writes the results in distinct tables, still in Cassandra. 2) Aggregating logs using Flume to a sink, streamin

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
I faced something similar in the past, and the reason for nodes becoming unresponsive intermittently was long GC pauses. That's why I wanted to bring this to your attention in case GC pauses are a potential cause. Sent from my iPhone > On Jul 22, 2015, at 4:32 PM, Bryan Cheng wrote: > > Aiman, > > Y

RE: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Alec Collier
I believe what he really wants is to be able to search for the x most recently modified documents, i.e. without specifying the docID. I don’t believe there is a ‘nice’ way of doing this in Cassandra by itself, given it really favours key-value storage. Even having the date as the partition key

Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Michael Shuler
What version of Cassandra did you upgrade to 2.2.0 *from*? This would help with looking at config differences, changelogs, etc. It seems you have some pretty clear SSL connection errors, according to the logs, which at least helps with seeing why the nodes can't talk to each other. I'm not ter

Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
Thanks for the reply, Michael! Yes, I did follow the upgrade notes. I am running Ubuntu 14.04.2 LTS on all and kernel 3.13.0-57-generic on all. I have 4 machines: .31, .32, .33 and .34. If I run nodetool status from .34 I now see all the others as DN; the same happens if I log in from th

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Aiman, Your post made me look back at our data a bit. The most recent occurrence of this incident was not preceded by any abnormal GC activity; however, the previous occurrence (which took place a few days ago) did correspond to a massive, order-of-magnitude increase in both ParNew and CMS collect

Re: Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Michael Shuler
On 07/22/2015 04:45 PM, Carlos Scheidecker wrote: I have a 4 node Cassandra system running on 4 Ubuntu boxes. After updating to Cassandra 2.2.0 and keeping the same cassandra.yaml file, the nodes cannot see each other. What version did you upgrade from? Usually, when upgrading, it is probably

Upgraded to Cassandra 2.2.0 nodes not seeing each other

2015-07-22 Thread Carlos Scheidecker
All, I have a 4 node Cassandra system running on 4 Ubuntu boxes. After updating to Cassandra 2.2.0 and keeping the same cassandra.yaml file, the nodes cannot see each other. When I do a nodetool status, it reports as up only the machine where I issued the command. In other words, all the

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi Aiman, We previously had issues with GC, but since upgrading to 2.1.7 things seem a lot healthier. We collect GC statistics through collectd via the garbage collector mbean, ParNew GC's report sub 500ms collection time on average (I believe accumulated per minute?) and CMS peaks at about 300ms

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Aiman Parvaiz
Hi Bryan, how's GC behaving on these boxes? On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng wrote: > Hi there, > > Within our Cassandra cluster, we're observing, on occasion, one or two > nodes at a time becoming partially unresponsive. > > We're running 2.1.7 across the entire cluster. > > nodetool

Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Jack Krupansky
"No way to query recently-modified documents." I don't follow why you say that. I mean, that was the point of the data model suggestion I proposed. Maybe you could clarify. I also wanted to mention that the new materialized view feature of Cassandra 3.0 might handle this use case, including takin

Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi there, Within our Cassandra cluster, we're observing, on occasion, one or two nodes at a time becoming partially unresponsive. We're running 2.1.7 across the entire cluster. nodetool still reports the node as being healthy, and it does respond to some local queries; however, the CPU is pegged

[Ann] Cassandra Interpreter for Zeppelin

2015-07-22 Thread DuyHai Doan
Hello, I'm pleased to announce a Cassandra interpreter for Apache Zeppelin. For those who don't know, Apache Zeppelin[1] is a web-based notebook that enables interactive data analytics. It is similar to IPython/Jupyter but is JVM-based and its architecture is modular enough to allow various back-en

Re: howto do sql query like in a relational database

2015-07-22 Thread Carlos Rolo
Hello Anton, You need to look into the DataStax Enterprise (DSE) offering. It integrates Solr search, which allows you to do searches like the one you mention. There are also some open-source projects doing this kind of integration, so it's up to you. And as Oded mentioned, Cassandra really shines on key

Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Carlos Alonso
Ah, so your access pattern is to get all documents modified on a particular date, right? Then I think your approach is good, and to avoid duplication, why not add the docId as the first clustering column and remove the last_modified field from it? That way, your primary key would be PRIMARY
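
The exact key definition is cut off in the snippet above, so the following is only a rough in-memory sketch of the idea as described: rows grouped by a modification-day partition, with the document id acting as the clustering column so that a document appears at most once per day. The class and method names (ModifiedByDay, record_modification, modified_on) are hypothetical, and nothing here is Cassandra API; it only models the upsert and ordering behaviour such a key would presumably give.

```python
from collections import defaultdict
from datetime import date


class ModifiedByDay:
    """Hypothetical in-memory stand-in for a table partitioned by day
    and clustered by document id (an assumption, not the poster's schema)."""

    def __init__(self):
        # partition key (day) -> {clustering key (doc_id) -> payload}
        self._partitions = defaultdict(dict)

    def record_modification(self, day, doc_id, payload):
        # Writing the same (day, doc_id) again overwrites the earlier row,
        # so a document modified twice on one day is not duplicated.
        self._partitions[day][doc_id] = payload

    def modified_on(self, day):
        # Single-partition read; rows come back ordered by doc_id,
        # mirroring clustering-column order.
        return sorted(self._partitions.get(day, {}).items())


table = ModifiedByDay()
table.record_modification(date(2015, 7, 22), "doc-b", "v1")
table.record_modification(date(2015, 7, 22), "doc-a", "v1")
table.record_modification(date(2015, 7, 22), "doc-b", "v2")  # overwrites doc-b
rows = table.modified_on(date(2015, 7, 22))
```

With docId in the key rather than last_modified, a second edit on the same day replaces the earlier row instead of adding another one, which seems to be the duplication the reply is trying to avoid.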