Re: C 2.1

2014-09-15 Thread James Briggs
Hi Ram. 1) As an Operations DBA, I consider all versions of Cassandra to be alpha. So whether you pick 2.0.10 or 2.1.0 doesn't really matter since you will have to do your own acceptance testing. 2) Data modelling is everything when it comes to a distributed database like Cassandra. You can read

Re: why bloom filter is only for row key?

2014-09-15 Thread Philo Yang
Thanks DuyHai, I think the trouble of bloom filter on all row keys & column names is memory usage. However, if a CF has only hundreds of columns per row, the number of total columns will be much fewer, so the bloom filter is possible for this condition, right? Is there a good way to adjust bloom
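The memory concern raised here can be estimated with the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n items at false-positive rate p. The sketch below is a back-of-the-envelope illustration only: the row count, columns per row, and false-positive rate are hypothetical numbers, and the formula is the textbook one, not necessarily how Cassandra sizes its filters internally.

```python
import math


def bloom_filter_bits(n_items: int, fp_rate: float) -> int:
    """Textbook optimal size in bits: m = -n * ln(p) / (ln 2)^2."""
    if n_items <= 0 or not 0.0 < fp_rate < 1.0:
        raise ValueError("n_items must be positive and fp_rate in (0, 1)")
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_filter_mib(n_items: int, fp_rate: float) -> float:
    """Same size expressed in MiB."""
    return bloom_filter_bits(n_items, fp_rate) / 8 / 2**20


# Hypothetical workload, chosen only for illustration.
rows = 10_000_000
columns_per_row = 300
fp_rate = 0.01

row_key_filter = bloom_filter_mib(rows, fp_rate)
row_and_column_filter = bloom_filter_mib(rows * columns_per_row, fp_rate)

print(f"filter on row keys only:        {row_key_filter:.1f} MiB")
print(f"filter on row key + column name: {row_and_column_filter:.1f} MiB")
```

At a fixed false-positive rate the size grows linearly with the number of entries, so a filter covering every (row key, column name) pair costs roughly `columns_per_row` times the memory of a row-key filter. Raising the false-positive rate shrinks the filter; Cassandra exposes this trade-off per table through the `bloom_filter_fp_chance` property.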

Re: why bloom filter is only for row key?

2014-09-15 Thread Robert Coli
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang wrote: > After reading some docs, I find that bloom filter is built on row keys, > not on column key. Can anyone tell me what is considered for not building > bloom filter on column key? Is it a good idea to offer a table property > option between row

Re: C 2.1

2014-09-15 Thread Robert Coli
On Sat, Sep 13, 2014 at 3:49 PM, Ram N wrote: > Is 2.1 a production ready release? > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ > Datastax Java driver - I get too confused with CQL and the underlying > storage model. I am also not clear on the indexing stru

Re: Quickly loading C* dataset into memory (row cache)

2014-09-15 Thread Robert Coli
On Sat, Sep 13, 2014 at 11:48 PM, Paulo Ricardo Motta Gomes < paulo.mo...@chaordicsystems.com> wrote: > Apparently Apple is using Cassandra as a massive multi-DC cache, as per > their announcement during the summit, but probably DSE with in-memory > enabled option. Would love to hear about similar

Re: why bloom filter is only for row key?

2014-09-15 Thread DuyHai Doan
Nice catch Rob On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli wrote: > On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang wrote: > >> After reading some docs, I find that bloom filter is built on row keys, >> not on column key. Can anyone tell me what is considered for not building >> bloom filter on c

Re: C 2.1

2014-09-15 Thread Jack Krupansky
If you’re indexing and querying on that many columns (dozens, or more than a handful), consider DSE/Solr, especially if you need to query on multiple columns in the same query. -- Jack Krupansky From: Robert Coli Sent: Monday, September 15, 2014 11:07 AM To: user@cassandra.apache.org Subject:

Re: C 2.1

2014-09-15 Thread Ram N
Jack, Using Solr or an external search/indexing service is an option but increases the complexity of managing different systems. I am curious to understand the impact of having wide-rows on a separate CF for inverted index purpose which if I understand correctly is what Rob's response, having a se

Issues during Multi-DC setup across AWS regions + VPC setup

2014-09-15 Thread Dinesh Narayanan
We are trying to add a new data center in us-east. Servers in each DC are running inside a VPC. We currently have a cluster in us-west and all servers are running 2.0.7. The two DCs are talking via VPN. listen_address and broadcast_address are set to private IPs. Our endpoint_snitch is GossipingPropertyFileSn

Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Hello. http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the spark connector for cassandra would the s

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Sorry. Trigger finger on the send. Would vnodes affect performance for spark in a similar fashion for spark. On Monday, September 15, 2014, Eric Plowe wrote: > Hello. > > > http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 > > Based on t

Re: C 2.1

2014-09-15 Thread James Briggs
Ram, The reason secondary indexes are not recommended is that since they can't use the partition key, the values have to be fetched from all nodes. So you have higher latency, and likely timeouts. The C* solutions are: a) use a denormalized ("materialized") table b) use a clustered index if all
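The fan-out described above can be illustrated with a toy in-memory model. Everything in this sketch is hypothetical (the `Cluster` class, the six-node count, the `users` and `users_by_email` table names, the CRC-based placement); it is not Cassandra code. It only contrasts a lookup that cannot use the partition key with a lookup against a denormalized table keyed by the queried value.

```python
import zlib

NODE_COUNT = 6  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Toy placement: hash the partition key onto one node."""
    return zlib.crc32(partition_key.encode()) % NODE_COUNT


class Cluster:
    """Each node is a dict of table name -> {partition key -> row}."""

    def __init__(self) -> None:
        self.nodes = [dict() for _ in range(NODE_COUNT)]

    def write(self, table: str, partition_key: str, row: dict) -> None:
        node = self.nodes[node_for(partition_key)]
        node.setdefault(table, {})[partition_key] = row

    def read_by_key(self, table: str, partition_key: str):
        """Partition-key lookup: exactly one node is contacted."""
        node = self.nodes[node_for(partition_key)]
        return node.get(table, {}).get(partition_key), 1

    def scan_by_column(self, table: str, column: str, value):
        """Lookup on a non-key column: every node must be asked."""
        matches = []
        for node in self.nodes:
            for row in node.get(table, {}).values():
                if row.get(column) == value:
                    matches.append(row)
        return matches, len(self.nodes)


cluster = Cluster()
for i in range(100):
    user = {"user_id": f"u{i}", "email": f"user{i}@example.com"}
    cluster.write("users", user["user_id"], user)
    # Denormalized copy, keyed by the value we want to query on.
    cluster.write("users_by_email", user["email"], user)

rows, contacted = cluster.scan_by_column("users", "email", "user42@example.com")
row, contacted_one = cluster.read_by_key("users_by_email", "user42@example.com")
print(f"non-key lookup contacted {contacted} nodes")
print(f"denormalized lookup contacted {contacted_one} node")
```

The denormalized table costs an extra write per row, but the read touches a single node regardless of cluster size, which is the trade-off option a) is pointing at.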

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
As hadoop* again sorry.. On Monday, September 15, 2014, Eric Plowe wrote: > Sorry. Trigger finger on the send. > > Would vnodes affect performance for spark in a similar fashion for spark. > > On Monday, September 15, 2014, Eric Plowe > wrote: > >> Hello. >> >> >> http://stackoverflow.com/quest

Re: 2.0.10 debian/ubuntu apt release?

2014-09-15 Thread Michael Shuler
On 09/12/2014 04:34 PM, Michael Shuler wrote: I'll have 2.0.10 deb/rpm packages in the repos on Monday, barring any issues. Just a quick update - I had a few issues with the Windows 2.0.10 release, which finally succeeded a few minutes ago, so I'll push to the repositories tomorrow, so we can

hs_err_pid3013.log, out of memory?

2014-09-15 Thread Yatong Zhang
Hi there, I just encountered an error which left a log '/hs_err_pid3013.log'. So is there a way to solve this? # > # There is insufficient memory for the Java Runtime Environment to > continue. > # Native memory allocation (malloc) failed to allocate 12288 bytes for > committing reserved memory.

Re: hs_err_pid3013.log, out of memory?

2014-09-15 Thread Robert Coli
On Mon, Sep 15, 2014 at 5:55 PM, Yatong Zhang wrote: > I just encountered an error which left a log '/hs_err_pid3013.log'. So is > there a way to solve this? > > # There is insufficient memory for the Java Runtime Environment to >> continue. >> # Native memory allocation (malloc) failed to alloca

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Robert Coli
On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe wrote: > Based on this stackoverflow question, vnodes affect the number of mappers > Hadoop needs to spawn, which in turn affects performance. > > With the spark connector for cassandra would the same situation happen? > Would vnodes affect performance i

Re: C 2.1

2014-09-15 Thread Robert Coli
On Mon, Sep 15, 2014 at 1:34 PM, Ram N wrote: > Would be great to understand the design decision to go with present > implementation on Secondary Index when the alternative is better? Looking > at JIRAs is still confusing to come up with the why :) > http://mail-archives.apache.org/mod_mbox/incu

Re: hs_err_pid3013.log, out of memory?

2014-09-15 Thread Yatong Zhang
It's during the startup. I tried to upgrade cassandra from 2.0.7 to 2.0.10, but looks like cassandra could not start again. Also I found the following log at '/var/log/messages': Sep 16 09:06:59 storage6 kernel: INFO: task java:4971 blocked for more than > 120 seconds. > Sep 16 09:06:59 storage6 k

Re: Cassandra, vnodes, and spark

2014-09-15 Thread Eric Plowe
Interesting. The way I understand the spark connector is that it's basically a client executing a cql query and filling a spark rdd. Spark will then handle the partitioning of data. Again, this is my understanding, and it may be incorrect. On Monday, September 15, 2014, Robert Coli wrote: > On Mo

Trying to understand cassandra gc logs

2014-09-15 Thread Donald Smith
I understand that cassandra uses ParNew GC for New Gen and CMS for Old Gen (tenured). I'm trying to interpret in the logs when a Full GC happens and what kind of Full GC is used. It never says "Full GC" or anything like that. But I see that whenever there's a line like 2014-09-15T18:04:1

Re: Cassandra, vnodes, and spark

2014-09-15 Thread DuyHai Doan
Look into the source code of the Spark connector. CassandraRDD tries to find all token ranges (even when using vnodes) for each node (endpoint) and creates RDD partitions to match this distribution of token ranges. Thus data locality is guaranteed. On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe wrote: >
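The idea of matching partitions to token-range ownership can be sketched as follows. This is a simplified illustration, not the connector's actual code: `TokenRange`, `build_partitions`, the `ranges_per_partition` parameter, the addresses, and the token values are all hypothetical names and data introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    start: int
    end: int
    endpoint: str  # node assumed to own this range


def group_ranges_by_endpoint(ranges):
    """Collect the token ranges owned by each endpoint."""
    grouped = defaultdict(list)
    for token_range in ranges:
        grouped[token_range.endpoint].append(token_range)
    return dict(grouped)


def build_partitions(ranges, ranges_per_partition):
    """Build (endpoint, [ranges]) partitions that never mix endpoints,
    so each partition can be scheduled next to the data it reads."""
    if ranges_per_partition <= 0:
        raise ValueError("ranges_per_partition must be positive")
    partitions = []
    for endpoint, owned in sorted(group_ranges_by_endpoint(ranges).items()):
        for i in range(0, len(owned), ranges_per_partition):
            partitions.append((endpoint, owned[i:i + ranges_per_partition]))
    return partitions


# Hypothetical ring: 3 nodes with 4 vnodes each, tokens assigned round-robin.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
ranges = [TokenRange(i * 100, (i + 1) * 100, nodes[i % 3]) for i in range(12)]

partitions = build_partitions(ranges, ranges_per_partition=2)
for endpoint, owned in partitions:
    print(endpoint, [(r.start, r.end) for r in owned])
```

With vnodes each node owns many small ranges; grouping several of them into one partition keeps the partition count from growing with the number of vnodes while still keeping every partition local to a single endpoint.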