Re: map reduce for Cassandra

2014-07-21 Thread Gaspar Muñoz
Check Stratio Deep. This integration between Spark and Cassandra is not based on Cassandra's Hadoop interface. 2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle: > Hi, > > >> But if you are only relying on memtables to sort writes, that seems like >>

Re: Authentication exception

2014-07-21 Thread Rahul Menon
Could you perhaps check your NTP? On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma wrote: > I routinely get this exception from cqlsh on one of my clusters: > > cql.cassandra.ttypes.AuthenticationException: > AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException: >
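The suggestion to check NTP is about clock synchronization between machines. As a reminder of what NTP actually measures, here is a minimal sketch of the standard offset/delay calculation for one request/response exchange (the formulas from the NTPv4 specification). The function name and timestamps are hypothetical; this is not something Cassandra or cqlsh runs.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Return (offset, delay) for one NTP-style request/response exchange.

    t0: client transmit time, read from the client clock
    t1: server receive time, read from the server clock
    t2: server transmit time, read from the server clock
    t3: client receive time, read from the client clock
    """
    # Positive offset means the server clock is ahead of the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Round-trip network delay, excluding the server's own processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: server clock 5 s ahead, 0.1 s per network leg,
# 0.1 s of processing on the server.
offset, delay = ntp_offset_and_delay(100.0, 105.1, 105.2, 100.3)
print(f"offset={offset:.3f}s delay={delay:.3f}s")
```

With these numbers the offset comes out at about 5 s and the round-trip delay at about 0.2 s; a skew of that size between Cassandra nodes would be worth fixing before looking further.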

Re: horizontal query scaling issues follow on

2014-07-21 Thread Diane Griffith
So I appreciate all the help so far. Upfront, it is possible the schema and data query pattern could be contributing to the problem. The schema was born out of certain design requirements. If it proves to be part of what makes the scalability crumble, then I hope it will help shape the design re

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi, > But if you are only relying on memtables to sort writes, that seems like a > pretty heavyweight reason to use Cassandra? Actually, it's not a reason to use Cassandra. I already use Cassandra and I need to map reduce data from it. I am trying to see a reason to use the conventional M/R too

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > Although several sstables (disk fragments) may have the same row key, > inside a single sstable row keys and column keys are indexed, right? > Otherwise, doing a GET in Cassandra would take some time. > Fr

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi Robert, First of all, thanks for answering. 2014-07-21 20:18 GMT-03:00 Robert Coli : > You're wrong, unless you're talking about insertion into a memtable, which > you probably aren't and which probably doesn't actually work that way > enough to be meaningful. > > On disk, Cassandra has immu
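The point in this exchange is that data on disk lives in immutable, individually sorted sstables, so a single row key can appear in several of them. The following is a deliberately simplified toy of that log-structured idea, not Cassandra's storage engine: the class name and flush threshold are hypothetical, the memtable is a plain dict that is only sorted when flushed, and "newest flushed run wins" stands in for Cassandra's per-cell timestamp reconciliation.

```python
import bisect


class ToyLSM:
    """Toy log-structured store: a mutable memtable plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}            # recent writes, unsorted until flush
        self.sstables = []            # list of (sorted_keys, values), oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # A write only touches the memtable; no on-disk data is rewritten.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort once and append a new immutable run.
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # The same key may live in several runs; check newest first,
        # using a binary search inside each sorted run.
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = ToyLSM(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # triggers the first flush
store.put("a", 3)
store.put("c", 4)   # triggers the second flush; "a" now exists in two runs
```

Here `store.get("a")` returns 3 from the newer run, while `store.get("b")` still has to fall through to the older run. That per-read merging across runs is why a scan in key order over several sstables is more than a single sorted pass.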

Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-21 Thread Karl Rieb
I did not include unit tests in my patch. I think many people did not run into this issue because many Cassandra clients handle the DateType when found as a CUSTOM type. -Karl > On Jul 21, 2014, at 8:26 PM, Robert Coli wrote: > >> On Mon, Jul 21, 2014 at 1:58 AM, Ben Hood <0x6e6...@gmail.com
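Karl's remark that many clients already handle DateType when it arrives as a CUSTOM type can be sketched as follows. This assumes the native protocol `[option]` encoding: a `[short]` type id, followed by a `[string]` class name when the id is 0x0000 (CUSTOM); 0x000B is the native timestamp id. The function names are hypothetical and not taken from any particular driver.

```python
import struct
from datetime import datetime, timedelta, timezone

CUSTOM = 0x0000
TIMESTAMP = 0x000B
DATE_TYPE_CLASS = "org.apache.cassandra.db.marshal.DateType"


def read_type_option(buf, pos=0):
    """Decode one [option] column type; return (type_name, new_position)."""
    (type_id,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    if type_id == CUSTOM:
        # CUSTOM carries a [string]: a 2-byte length followed by UTF-8 bytes.
        (n,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        class_name = buf[pos:pos + n].decode("utf-8")
        pos += n
        # Treat a CUSTOM DateType exactly like the native timestamp type.
        if class_name == DATE_TYPE_CLASS:
            return "timestamp", pos
        return "custom:" + class_name, pos
    if type_id == TIMESTAMP:
        return "timestamp", pos
    return "id:0x%04X" % type_id, pos


def decode_timestamp(value):
    """Decode an 8-byte big-endian signed count of milliseconds since the epoch."""
    (millis,) = struct.unpack(">q", value)
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
```

Mapping both encodings to the same decoder is what lets such a client keep working whether the server reports the column as CUSTOM DateType or as TIMESTAMP.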

Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 1:58 AM, Ben Hood <0x6e6...@gmail.com> wrote: > On Sat, Jul 19, 2014 at 7:35 PM, Karl Rieb wrote: > > Can now be followed at: > > https://issues.apache.org/jira/browse/CASSANDRA-7576. > > Nice work! Finally we have a proper solution to this issue, so well done > to you. >

Re: horizontal query scaling issues follow on

2014-07-21 Thread Robert Coli
On Sun, Jul 20, 2014 at 6:12 PM, Diane Griffith wrote: > I am running tests again across different numbers of client threads and > numbers of nodes but this time I tweaked some of the timeouts configured for > the nodes in the cluster. I was able to get better performance on the > nodes at 10 clie

Re: TTransportException (java.net.SocketException: Broken pipe)

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 8:07 AM, Bhaskar Singhal wrote: > I have not seen the issue after changing the commit log segment size to > 1024MB. > Yes... your insanely over-huge commitlog will be contained in fewer files if you increase the size of segments; that will not make it any less of an in
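Robert's point is simple arithmetic: the segment size changes how many files the commitlog is split into, not how much data it holds. A minimal sketch, using a hypothetical 8192 MB commitlog and comparing 32 MB segments (the default `commitlog_segment_size_in_mb` in this era) with the 1024 MB used above:

```python
import math


def segment_count(total_commitlog_mb, segment_size_mb):
    """Number of segment files needed to hold a commitlog of the given total size."""
    return math.ceil(total_commitlog_mb / segment_size_mb)


total_mb = 8192  # hypothetical total commitlog size
for size_mb in (32, 1024):
    print(size_mb, "MB segments ->", segment_count(total_mb, size_mb), "files")
```

Either way the same 8192 MB sits on disk; only the file count drops from 256 to 8.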

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 10:54 AM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > My understanding (please someone correct me if I am wrong) is that when you > insert N items in a Cassandra CF, you are executing N binary searches to > insert the item already indexed by a key. When you rea

Authentication exception

2014-07-21 Thread Jeremy Jongsma
I routinely get this exception from cqlsh on one of my clusters: cql.cassandra.ttypes.AuthenticationException: AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.') The system_auth keyspace is set to replicate X times
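"received only 2 responses" means the coordinator needed more replicas to answer than actually did. Assuming the credential read was done at QUORUM (as far as I know, Cassandra 2.0 reads the default `cassandra` superuser at QUORUM and other users at a single-replica level), the number of required responses follows from the replication factor of `system_auth`. The replication factors below are hypothetical, since the actual value is not given.

```python
def quorum(replication_factors):
    """QUORUM = (sum of replication factors across data centers) // 2 + 1."""
    return sum(replication_factors) // 2 + 1


def satisfies_quorum(responses, replication_factors):
    """True if this many replica responses is enough for a QUORUM read."""
    return responses >= quorum(replication_factors)


for rfs in ([3], [5], [3, 3]):
    print(rfs, "needs", quorum(rfs), "responses; 2 enough?", satisfies_quorum(2, rfs))
```

With RF 3 two responses suffice, but with RF 5 in one data center, or 3 + 3 across two, a QUORUM read needs 3 or 4 responses, so two slow or unreachable replicas are enough to produce this timeout.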

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Jonathan, From what I have read in the docs, the Python API still has some limitations: it is not yet possible to use any Hadoop binary input format. The python example for Cassandra is only in the master branch: https://github.com/apache/spark/blob/master/examples/src/main/python/cassandra_inputformat.py I

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
I haven't tried pyspark yet, but it's part of the distribution. My main language is Python too, so I intend on getting deep into it. On Mon, Jul 21, 2014 at 9:38 AM, Marcelo Elias Del Valle wrote: > Hi Jonathan, > > Do you know if this RDD can be used with Python? AFAIK, python + Cassandra > wil

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi Jonathan, Do you know if this RDD can be used with Python? AFAIK, Python + Cassandra will only be supported in the next version, but I would like to be wrong... Best regards, Marcelo Valle. 2014-07-21 13:06 GMT-03:00 Jonathan Haddad : > Hey Marcelo, > > You should check out spark. It inte

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a M/

When will a node's host ID change?

2014-07-21 Thread John Sanda
Under what circumstances, if any, will a node's host ID change? - John

map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi, I need to execute a map/reduce job to identify data stored in Cassandra before indexing this data into Elasticsearch. I have already used ColumnFamilyInputFormat (before I started using CQL) to write Hadoop jobs to do that, but I used to have a lot of trouble with tuning, as hado
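For readers new to the pattern being discussed, here is an in-memory, single-process sketch of the map, shuffle and reduce phases, producing one document per key as a pre-indexing step might. It is not Hadoop's API or ColumnFamilyInputFormat; the row layout `(entity_id, field, value)` and the merge rule are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(row):
    """Hypothetical mapper: emit (entity_id, (field, value)) for one stored row."""
    entity_id, field, value = row
    yield entity_id, (field, value)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Hypothetical reducer: merge every (field, value) for one key into a document."""
    doc = {"id": key}
    doc.update(values)
    return doc


def run_job(rows):
    intermediate = chain.from_iterable(map_phase(r) for r in rows)
    return [reduce_phase(k, vs) for k, vs in sorted(shuffle(intermediate).items())]


rows = [("u1", "name", "Ann"), ("u2", "name", "Bob"), ("u1", "email", "ann@example.com")]
documents = run_job(rows)
```

`documents` holds one merged document per entity, ready to be sent to an indexer. In a real cluster job, the shuffle is the expensive network step, which is much of what the tuning discussed in this thread is about.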

Re: estimated row count for a pk range

2014-07-21 Thread tommaso barbugli
thank you for the reply; I was hoping for something with a bit less overhead than the first solution; the second is not really an option for me. On Monday, 21 July 2014, DuyHai Doan wrote: > 1) Use separate counter to count number of entries in each column family > but it will require you to man
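DuyHai's first option keeps a separate counter next to the data. A minimal in-memory sketch, with a hypothetical class and method names: only genuinely new entries bump the count, and deletes decrement it. The existence check before each increment is the bookkeeping this approach adds; it is free in this toy, but against a real cluster it would mean a read before every write, which is presumably part of the overhead mentioned here.

```python
class CountedTable:
    """Hypothetical table wrapper that keeps an entry counter per partition key."""

    def __init__(self):
        self.rows = {}     # partition key -> {clustering key: value}
        self.counts = {}   # partition key -> number of entries

    def insert(self, pk, ck, value):
        partition = self.rows.setdefault(pk, {})
        if ck not in partition:
            # Only a new entry changes the count; overwrites must not.
            self.counts[pk] = self.counts.get(pk, 0) + 1
        partition[ck] = value

    def delete(self, pk, ck):
        partition = self.rows.get(pk, {})
        if ck in partition:
            del partition[ck]
            self.counts[pk] -= 1

    def count_range(self, pks):
        """Sum the maintained counters for the given partition keys."""
        return sum(self.counts.get(pk, 0) for pk in pks)


table = CountedTable()
table.insert("p1", "a", 1)
table.insert("p1", "a", 2)   # overwrite: count unchanged
table.insert("p1", "b", 3)
table.insert("p2", "x", 4)
```

Counting a range of partition keys then costs one counter lookup per key instead of a scan.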

Re: TTransportException (java.net.SocketException: Broken pipe)

2014-07-21 Thread Bhaskar Singhal
I have not seen the issue after changing the commit log segment size to 1024MB. tpstats output:
Pool Name              Active   Pending   Completed   Blocked   All time blocked
ReadStage                   0         0           0         0                  0
RequestResponseStage
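When comparing `nodetool tpstats` output across runs, it can help to parse the thread-pool table and list only the pools with pending or blocked tasks. A minimal sketch, assuming each data row is a pool name followed by five integer columns (Active, Pending, Completed, Blocked, All time blocked). The function names are hypothetical, and the `MutationStage` row in the sample is invented for illustration.

```python
COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text):
    """Parse thread-pool rows of `nodetool tpstats` output into a dict."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # blank lines, bare names, or two-column "Dropped" rows
        try:
            numbers = [int(p) for p in parts[-5:]]
        except ValueError:
            continue  # the header row, whose last columns are not numbers
        name = " ".join(parts[:-5])
        pools[name] = dict(zip(COLUMNS, numbers))
    return pools


def busy_pools(pools):
    """Names of pools that have pending, blocked, or ever-blocked tasks."""
    return sorted(
        name for name, s in pools.items()
        if s["pending"] or s["blocked"] or s["all_time_blocked"]
    )


SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0              0         0                 0
MutationStage                     2        15           1200         0                 0
"""
pools = parse_tpstats(SAMPLE)
```

`busy_pools(pools)` returns only the pool with a backlog, which makes a before/after comparison of the segment-size change easier to read.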

Re: horizontal query scaling issues follow on

2014-07-21 Thread Jonathan Lacefield
Hello, Here is the documentation for cfhistograms, which is in microseconds. http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html Your question about setting timeouts is subjective, but you have set your timeout limits to 4 mins, which seems excessive. The
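Because cfhistograms reports latency buckets in microseconds while the request timeouts are configured in milliseconds, it is easy to compare the wrong units. A minimal sketch that estimates a percentile from `(offset, count)` buckets and expresses it in milliseconds next to a roughly 4 minute timeout. The helper is hypothetical (nodetool prints the buckets, it does not compute this), and the bucket values are invented.

```python
MICROS_PER_MS = 1000
TIMEOUT_MS = 4 * 60 * 1000  # a 4 minute timeout, expressed in milliseconds


def percentile_from_buckets(buckets, p):
    """Approximate the p-th percentile from (offset, count) histogram buckets.

    Returns the offset of the first bucket at which the cumulative count
    reaches p percent of all samples, i.e. an upper-bound estimate.
    """
    ordered = sorted(buckets)
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * p / 100.0
    running = 0
    for offset, count in ordered:
        running += count
        if running >= threshold:
            return offset
    return ordered[-1][0]


# Hypothetical read-latency buckets: (offset in microseconds, count).
read_latency = [(1000, 50), (2000, 40), (10000, 9), (100000, 1)]
p99_ms = percentile_from_buckets(read_latency, 99) / MICROS_PER_MS
print(f"p99 ~ {p99_ms} ms vs timeout {TIMEOUT_MS} ms")
```

With these invented numbers the 99th percentile is about 10 ms, several orders of magnitude below a 240,000 ms timeout, which illustrates why a 4 minute limit mostly hides slow requests rather than helping them complete.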

RE: How to prevent writing to a Keyspace?

2014-07-21 Thread Lu, Boying
I see. Thanks a lot ☺ From: Vivek Mishra [mailto:mishra.v...@gmail.com] Sent: July 21, 2014 14:16 To: user@cassandra.apache.org Subject: Re: How to prevent writing to a Keyspace? Create a different user and assign roles and privileges. Create a user like guest and grant SELECT only to that user. That

Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-21 Thread Ben Hood
On Sat, Jul 19, 2014 at 7:35 PM, Karl Rieb wrote: > Can now be followed at: > https://issues.apache.org/jira/browse/CASSANDRA-7576. Nice work! Finally we have a proper solution to this issue, so well done to you.

Re: "ghost" table is breaking compactions and won't go away… even during a drop.

2014-07-21 Thread Philo Yang
In my experience, an SSTable FileNotFoundException, which can be caused not only by recreating a table but also by other operations or even a bug, cannot be solved by any nodetool command. However, restarting the node more than once can make this exception disappear. I don't know the reason, but it does work... Tha