Re: AMI to use to launch a cluster with OpsCenter on AWS

2015-02-23 Thread Carlos Rolo
Regarding AWS, the only thing I normally do (besides the normal installation, etc.) is set up the firewall zones so that the ports Cassandra needs are open. You can follow this guide: https://razvantudorica.com/02/create-a-cassandra-cluster-with-opscenter-on-amazon-ec2/a Regards, Carlos Juzart
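For the firewall step, here is a minimal Python sketch that builds EC2 ingress rules for the default Cassandra 2.x and OpsCenter ports. It assumes the ports were left at their defaults in cassandra.yaml and the OpsCenter configuration, and it uses a hypothetical CIDR. The dicts follow the shape that boto3's EC2 client accepts for `authorize_security_group_ingress(IpPermissions=...)`.

```python
# Default ports (assumption: not changed in cassandra.yaml / OpsCenter config).
CASSANDRA_PORTS = {
    7000: "inter-node traffic (storage_port)",
    7001: "inter-node traffic over SSL (ssl_storage_port)",
    7199: "JMX (nodetool, OpsCenter agent)",
    9042: "CQL native transport (native_transport_port)",
    9160: "Thrift clients (rpc_port)",
}
OPSCENTER_PORTS = {
    8888: "OpsCenter web UI",
    61620: "opscenterd, receives agent connections",
    61621: "datastax-agent, receives opscenterd connections",
}


def ingress_rules(ports, cidr):
    """Build one TCP IpPermissions entry per port, in port order."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }
        for port, description in sorted(ports.items())
    ]
```

Usage (placeholder group ID and CIDR): `boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-…", IpPermissions=ingress_rules(CASSANDRA_PORTS, "10.0.0.0/16"))`. Limiting the inter-node and JMX ports to the cluster's own address range, rather than 0.0.0.0/0, is generally the safer choice.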

Re: C* 2.1.2 invokes oom-killer

2015-02-23 Thread Michał Łowicki
After a couple of days it's still behaving fine. Case closed. On Thu, Feb 19, 2015 at 11:15 PM, Michał Łowicki wrote: > Upgrading to 2.1.3 seems to help so far. After ~12 hours total memory > consumption grew from 10GB to 10.5GB. > > On Thu, Feb 19, 2015 at 2:02 PM, Carlos Rolo wrote: > >> Then you

Commitlog activities

2015-02-23 Thread ssiv...@gmail.com
Hi! I have the following keyspaces: cqlsh> SELECT * FROM system.schema_keyspaces; keyspace_name | durable_writes | strategy_class | strategy_options ---++-+---

Re: Running Cassandra + Spark on AWS - architecture questions

2015-02-23 Thread Clint Kelly
These are both good suggestions, thanks! I thought I remembered reading that different virtual datacenters should always have the same number of nodes. I think I was mistaken about that. In the past we had major issues running huge analytics jobs on data stored in HBase (it would bring down

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Clint Kelly
Hi mck, I'm not familiar with this ticket, but my understanding was that performance of Hadoop jobs on C* clusters with vnodes was poor because a given Hadoop input split has to run many individual scans (one for each vnode) rather than just a single scan. I've run C* and Hadoop in production wit
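The vnode/split argument can be made concrete with a toy ring model in Python. Each node picks `num_tokens` random tokens in the Murmur3 range, and each token owns the range that starts after the previous token. A split that covers one node's primary data therefore spans `num_tokens` separate ranges instead of one. This is an illustration under simplifying assumptions (random token placement, no replication, adjacent ranges never merged). It is not the actual Hadoop input-format code.

```python
import random

MIN_TOKEN, MAX_TOKEN = -(2**63), 2**63 - 1  # Murmur3Partitioner token range


def build_ring(num_nodes: int, num_tokens: int, seed: int = 0):
    """Toy vnode ring: each node draws num_tokens random tokens; returns sorted (token, node)."""
    rng = random.Random(seed)
    ring = [
        (rng.randint(MIN_TOKEN, MAX_TOKEN), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    ]
    ring.sort()
    return ring


def ranges_per_node(ring):
    """Each token owns (previous token, token]; ring[-1] supplies the wrap-around start for ring[0]."""
    owned = {}
    for i, (token, node) in enumerate(ring):
        owned.setdefault(node, []).append((ring[i - 1][0], token))
    return owned


def ranges_in_node_split(num_nodes: int, num_tokens: int) -> int:
    """Number of separate token ranges (scans, if none are merged) for a split covering node 0."""
    return len(ranges_per_node(build_ring(num_nodes, num_tokens))[0])
```

With one token per node, `ranges_in_node_split(3, 1)` is a single range. With `num_tokens: 256`, the same split covers 256 ranges.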

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread mck
> … my understanding was that > performance of Hadoop jobs on C* clusters with vnodes was poor because a > given Hadoop input split has to run many individual scans (one for each > vnode) rather than just a single scan. I've run C* and Hadoop in > production with a custom input format that used v

Any notion of "unions" in C* user-defined types?

2015-02-23 Thread Clint Kelly
Hi all, I am building an application that keeps a time-series record of clickstream data (clicks, impressions, etc.). The data model looks something like: CREATE TABLE clickstream ( userid text, event_time timestamp, interaction frozen , PRIMARY KEY (userid, event_time) ) WITH CLUSTERING
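CQL has no union or sum type for UDT fields. One possible workaround is a UDT with a discriminator field plus one nullable field per variant, for example a type with fields (kind text, click_target text, impression_ad_id text). The application then enforces the "exactly one variant" rule itself. Below is a minimal Python sketch of that client-side check. The field names and kind tags are hypothetical and do not come from the original post.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical mirror of a UDT used as a tagged union: a discriminator
# plus one optional field per variant (CQL itself has no union type).
@dataclass(frozen=True)
class Interaction:
    kind: str                               # "click" or "impression" (hypothetical tags)
    click_target: Optional[str] = None      # set only when kind == "click"
    impression_ad_id: Optional[str] = None  # set only when kind == "impression"


VARIANT_FIELDS = {
    "click": "click_target",
    "impression": "impression_ad_id",
}


def validate(interaction: Interaction) -> None:
    """Raise ValueError unless exactly the field belonging to interaction.kind is set."""
    if interaction.kind not in VARIANT_FIELDS:
        raise ValueError(f"unknown kind {interaction.kind!r}")
    for kind, field in VARIANT_FIELDS.items():
        value = getattr(interaction, field)
        if kind == interaction.kind and value is None:
            raise ValueError(f"{field} is required for kind {interaction.kind!r}")
        if kind != interaction.kind and value is not None:
            raise ValueError(f"{field} must be empty for kind {interaction.kind!r}")
```

Usage: `validate(Interaction("click", click_target="signup-button"))` passes. `validate(Interaction("click"))` raises `ValueError`.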

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Eric Stevens
Vnodes are officially not recommended for DSE Solr integration (though a small number isn't ruinous). That might be why they still don't enable them by default. On Feb 21, 2015 3:58 PM, "mck" wrote: > At least the problem of hadoop and vnodes described in CASSANDRA-6091 > doesn't apply to spark. >

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Jack Krupansky
DSE 4.6 improved Solr vnode performance dramatically, so vnodes for Search workloads are no longer officially discouraged. As per the official doc for improvements: "Ability to use virtual nodes (vnodes) in Solr nodes. Recommended range: 64 to 256 (overhead increases by approximately 30%

Re: run cassandra on a small instance

2015-02-23 Thread Nate McCall
Agreed and good point. Just added it to mine - thanks Ben. On Sun, Feb 22, 2015 at 9:43 PM, Ben Bromhead wrote: > You might also have some gains setting in_memory_compaction_limit_in_mb to > something very low to force Cassandra to use on disk compaction rather than > doing it in memory. > > On

Re: run cassandra on a small instance

2015-02-23 Thread Nate McCall
Glad that helped. Thanks for reporting back! On Sun, Feb 22, 2015 at 9:12 PM, Tim Dunphy wrote: > Nate, > > Definitely thank you for this advice. After leaving the new Cassandra > node running on the 2GB instance for the past couple of days, I think I've > had ample reason to report complete su

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Eric Stevens
30% overhead is pretty brutal. I think this is basic support for it, and not necessarily a recommendation to use it. From http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/ana/anaNdeOps.html?scroll=anaNdeOps__implicationsVnodes "DataStax does not recommend turning

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Jack Krupansky
Thanks for pointing out a mistake in the doc - that statement (for Search/Solr) was simply a leftover from before 4.6. Besides, it's in the Analytics section, which is not relevant for Search/Solr anyway. -- Jack Krupansky On Mon, Feb 23, 2015 at 11:54 AM, Eric Stevens wrote: > 30% overhead is

Re: run cassandra on a small instance

2015-02-23 Thread Tim Dunphy
> > You might also have some gains setting in_memory_compaction_limit_in_mb > to something very low to force Cassandra to use on disk compaction rather > than doing it in memory. Cool, Ben, thanks! I'll add that to my config as well. Glad that helped. Thanks for reporting back! No problem, Nate

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Eric Stevens
That link is the one from the 4.6 New Features page: http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/newFeatures.html - Ability to use virtual nodes (vnodes)

memtable_offheap_space_in_mb and memtable_cleanup_threshold

2015-02-23 Thread ssiv...@gmail.com
Hi everyone! I'm running a write-only workload (into one column family) and experimenting with the offheap_objects memtable space. I set the parameters to: memtable_offheap_space_in_mb = 51200 # 50GB, memtable_cleanup_threshold = 0.99, and expect that a flush will not be triggered until available memtable
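The expectation above can be checked with back-of-the-envelope arithmetic. The 2.1 cassandra.yaml comments describe memtable_cleanup_threshold as the ratio of occupied non-flushing memtable size to total permitted size that triggers a flush of the largest memtable. With these settings, that works out to 51200 MB × 0.99 ≈ 50688 MB. The Python sketch below also models the on-heap pool (memtable_heap_space_in_mb), on the assumption that crossing the threshold in either pool triggers a flush. As we understand it, offheap_objects still keeps some per-object overhead on heap, so the much smaller heap pool may be the one that fills first. This is a simplified model, not Cassandra's allocator code, and the usage numbers are hypothetical.

```python
def flush_trigger_mb(pool_limit_mb: float, cleanup_threshold: float) -> float:
    """Non-flushing memtable occupancy (MB) above which the largest memtable is flushed."""
    return pool_limit_mb * cleanup_threshold


def flush_needed(heap_used_mb: float, heap_limit_mb: float,
                 offheap_used_mb: float, offheap_limit_mb: float,
                 cleanup_threshold: float) -> bool:
    # Assumption: on-heap and off-heap pools are tracked separately, and
    # exceeding the cleanup threshold in EITHER pool triggers a flush.
    return (heap_used_mb > flush_trigger_mb(heap_limit_mb, cleanup_threshold)
            or offheap_used_mb > flush_trigger_mb(offheap_limit_mb, cleanup_threshold))
```

Usage with a hypothetical 2048 MB heap pool: `flush_needed(2040, 2048, 100, 51200, 0.99)` is `True` because 2040 MB exceeds 2048 × 0.99 = 2027.52 MB. The off-heap pool is nowhere near its 50688 MB trigger in that case.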

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread Jack Krupansky
Right, and subject to techniques for reducing that overhead that I listed. In fact, I would recommend simply picking the largest number of tokens for which the overhead is acceptable for your app, even if it is only 8 or 16 tokens, but 16, 32, or 64 may be sufficient for most apps. -- Jack Krupansk

build failure with cassandra 2.0.12

2015-02-23 Thread Cheng Ren
Hi, I am experiencing a build failure with Cassandra 2.0.12. I downloaded the source from http://cassandra.apache.org/download/, ran ant mvn-install and got the following error: [artifact:dependencies] -- [artifact:dependencies] 1 required artifact is missing. [artifact:dependencies] [artifact:depen

Problem with Cassandra 2.1 and Spark 1.2.1

2015-02-23 Thread Bosung Seo
Hi all, I'm trying to use Spark and Cassandra. I have two datacenters in different regions on AWS and tried running a simple table count program. However, I keep getting "WARN TaskSchedulerImpl: Initial job has not accepted any resources;" and Spark can't finish the processing. The test table

One node taking more resources than others in the ring

2015-02-23 Thread Jaydeep Chovatia
Hi, I have a three-node cluster with RF=1 (only one datacenter) with the following sizes: Datacenter: DC1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 4.02 GB 1 33.3% RAC1 UN 4.05 GB 1

Re: One node taking more resources than others in the ring

2015-02-23 Thread Robert Coli
On Mon, Feb 23, 2015 at 3:42 PM, Jaydeep Chovatia < chovatia.jayd...@gmail.com> wrote: > I have created different tables and my test application reads/writes with > CL=QUORUM. Under load I found that one of my nodes is taking more > resources (double the CPU) than the other two. I have also verified that
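With RF=1, QUORUM is satisfied by the single replica, so in this setup it behaves like ONE. The standard single-datacenter quorum size, floor(RF / 2) + 1, as a tiny Python sketch:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a CL=QUORUM request in one datacenter: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1
```

So `quorum(1)` is 1, `quorum(3)` is 2 and `quorum(5)` is 3.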

Re: One node taking more resources than others in the ring

2015-02-23 Thread Jonathan Haddad
If you're not using prepared statements you won't get any token aware routing. That's an even better option than round robin since it reduces the number of nodes involved. On Mon, Feb 23, 2015 at 4:48 PM Robert Coli wrote: > On Mon, Feb 23, 2015 at 3:42 PM, Jaydeep Chovatia < > chovatia.jayd...@g

Re: One node taking more resources than others in the ring

2015-02-23 Thread Robert Coli
On Mon, Feb 23, 2015 at 5:18 PM, Jonathan Haddad wrote: > If you're not using prepared statements you won't get any token aware > routing. That's an even better option than round robin since it reduces the > number of nodes involved. Fair statement. Thrust of my comment is "don't send all conne
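The advice in this thread can be illustrated with a toy simulation. It models a three-node, RF=1 ring with one token per node, matching the original post, and compares three ways of choosing a coordinator: pinning every connection to one node, round robin, and token-aware routing. The node names, the md5 hash (a stand-in for the real partitioner) and the key set are assumptions. In the real drivers, token-aware routing is provided by a TokenAwarePolicy wrapper around another load-balancing policy.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from itertools import cycle

# Hypothetical three-node ring, one evenly spaced token per node, RF=1.
RING_SIZE = 2**127  # RandomPartitioner-style token space; a stand-in, not Murmur3
NODES = ["node1", "node2", "node3"]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(key: str) -> int:
    """Hash a partition key to a token (md5 as a stand-in for the partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % RING_SIZE


def replica_for(key: str) -> str:
    """With RF=1 the only replica is the first node whose token is >= the key's token, wrapping."""
    i = bisect_left(TOKENS, token_for(key))
    return NODES[i % len(NODES)]


def simulate(keys, policy: str):
    """Return (requests coordinated per node, requests needing an extra coordinator-to-replica hop)."""
    coordinated = Counter()
    extra_hops = 0
    round_robin = cycle(NODES)
    for key in keys:
        if policy == "pinned":          # every connection goes to one node
            coordinator = NODES[0]
        elif policy == "round_robin":   # spread coordination work evenly
            coordinator = next(round_robin)
        elif policy == "token_aware":   # coordinate on the replica itself
            coordinator = replica_for(key)
        else:
            raise ValueError(f"unknown policy {policy!r}")
        coordinated[coordinator] += 1
        if coordinator != replica_for(key):
            extra_hops += 1
    return coordinated, extra_hops
```

In this model, "pinned" puts all coordination work on node1. "round_robin" spreads it evenly, but most requests still need an extra hop to reach the replica. "token_aware" removes that hop entirely.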

Efficient .net client for cassandra

2015-02-23 Thread Asit KAUSHIK
Hi All, We have found an approach for our case-specific full-text search, which we are analyzing using Stratio Cassandra. It has a modified secondary index API that uses Lucene indexes. The performance also seems good to me. Still, I wanted to ask you gurus: 1) Has anybody used Stratio, and are there any drawbacks to it? 2