Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Yang Song
Thanks Michael. I've attached the reply I got back from the CDH4 user group, from Harsh, hoping to share the experience. " In CDH4, the MR1 and MR2 APIs are both fully compatible (such that moving to YARN in future would require no recompilation from MR1-produced jars). You can consider it "2.0" API in binar

Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Edward Capriolo
Here is the deal. http://wiki.apache.org/hadoop/Defining%20Hadoop INAPPROPRIATE: Automotive Joe's Crankshaft: 100% compatible with Hadoop Bad, because "100% compatible" is a meaningless statement. Even Apache releases have regressions; cases where versions are incompatible *even when the Java int

Re: virtual nodes + map reduce = too many mappers

2013-02-16 Thread Edward Capriolo
No one had ever tried vnodes with hadoop until the OP did, or they would have noticed this. No one extensively used it with secondary indexes either, from the last ticket I mentioned. My mistake, they are not a default. I do think vnodes are awesome; it's great that C* has the longer release cycle.

Re: virtual nodes + map reduce = too many mappers

2013-02-16 Thread Eric Evans
On Sat, Feb 16, 2013 at 9:13 AM, Edward Capriolo wrote: > No one had ever tried vnodes with hadoop until the OP did, or they > would have noticed this. No one extensively used it with secondary > indexes either from the last ticket I mentioned. > > My mistake they are not a default. > > I do think

Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Jeremy Hanna
Fwiw - here are some changes that a friend said should make C*'s Hadoop support work with CDH4 - for ColumnFamilyRecordReader. https://gist.github.com/jeromatron/4967799 On Feb 16, 2013, at 8:23 AM, Edward Capriolo wrote: > Here is the deal. > > http://wiki.apache.org/hadoop/Defining%20Hado

Re: Size Tiered -> Leveled Compaction

2013-02-16 Thread Mike
Another piece of information that would be useful is advice on how to properly set the SSTable size for your use case. I understand the default is 5MB, a lot of examples show the use of 10MB, and I've seen cases where people have set it as high as 200MB. Any information is appreciated, -Mike
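For reference, in Cassandra 1.x the leveled-compaction SSTable target size is set per column family through the `sstable_size_in_mb` compaction option. A minimal CQL3 sketch (the table name `users` is hypothetical; the 10MB value is the one from the examples the poster mentions, not a recommendation):

```sql
-- Switch a table to leveled compaction with a 10MB SSTable target.
ALTER TABLE users
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'sstable_size_in_mb' : 10 };
```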

Re: RuntimeException during leveled compaction

2013-02-16 Thread aaron morton
That sounds like something wrong with the way the rows are merged during compaction then. Can you run the compaction with DEBUG logging and raise a ticket? You may want to do this with the node not in the ring. Five minutes after it starts it will run pending compactions, so if there if compac
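To get compaction logging at DEBUG without turning the whole log noisy, one option is a per-package logger entry in `conf/log4j-server.properties` (a sketch; `org.apache.cassandra.db.compaction` is the compaction package in Cassandra 1.x):

```
# log4j-server.properties fragment: DEBUG for compaction only,
# leaving the root logger at INFO so the rest of the log stays readable.
log4j.logger.org.apache.cassandra.db.compaction=DEBUG
```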

Re: Deleting old items

2013-02-16 Thread aaron morton
> Is that a feature that could possibly be developed one day ? No. Timestamps are essentially an internal implementation detail used to resolve different values for the same column. > With "min_compaction_level_threshold" did you mean "min_compaction_threshold" > ? If so, why should I do that, what ar

Re: Deleting old items

2013-02-16 Thread Alain RODRIGUEZ
"Can you point to the docs." http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold And thanks about the rest of your answers, once again ;-). Alain 2013/2/16 aaron morton > Is that a feature that could possibly be developed one day ? > > No. > Timesta

Is there any consolidated literature about Read/Write and Data Consistency in Cassandra ?

2013-02-16 Thread Mateus Ferreira e Freitas
Like articles with tests and conclusions about it, and such, rather than the DataStax documentation or the Cassandra books. Thank you.

Re: Is there any consolidated literature about Read/Write and Data Consistency in Cassandra ?

2013-02-16 Thread Edward Capriolo
Asking the question three times will not help it get answered faster. Furthermore, I believe no one has answered it because no one understands what you're asking. Here is something with tests and conclusions, and it is not written by datastax or part of a book on cassandra. http://pbs.cs.berkele

RE: Is there any consolidated literature about Read/Write and Data Consistency in Cassandra ?

2013-02-16 Thread Mateus Ferreira e Freitas
I'm sorry, it's because the mail was bouncing back to me, and I thought it wasn't working. That link is similar to what I asked for. I'm searching, for example, for PhD or MS dissertations about those topics in Cassandra; articles like http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2

Re: virtual nodes + map reduce = too many mappers

2013-02-16 Thread Jonathan Ellis
Wouldn't you have more than 256 splits anyway, given a normal amount of data? (Default split size is 64k rows.) On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo wrote: > Seems like the hadoop Input format should combine the splits that are > on the same node into the same map task, like Hadoop's

Re: virtual nodes + map reduce = too many mappers

2013-02-16 Thread Edward Capriolo
Split size does not have to equal block size. http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html An abstract InputFormat that returns CombineFileSplit's in InputFormat.getSplits(JobConf, int) method. Splits are constructed from the files under the in
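The split-combining idea the thread is circling can be sketched in plain Java. This is a minimal illustration with hypothetical types, not the real Hadoop `CombineFileInputFormat` API: group the many per-vnode splits by the node that holds them, so each node gets one map task's worth of input instead of one task per vnode.

```java
import java.util.*;

// Sketch of combining input splits by locality (hypothetical types,
// not the actual Hadoop or Cassandra InputFormat classes).
public class CombineSplits {
    record Split(String node, long rows) {}

    // Group splits by the node they live on, preserving encounter order.
    static Map<String, List<Split>> combineByNode(List<Split> splits) {
        Map<String, List<Split>> combined = new LinkedHashMap<>();
        for (Split s : splits) {
            combined.computeIfAbsent(s.node(), k -> new ArrayList<>()).add(s);
        }
        return combined;
    }

    public static void main(String[] args) {
        // With 256 vnodes spread over 2 nodes, a naive input format makes
        // 256 map tasks; grouping by node reduces that to 2.
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            splits.add(new Split("node" + (i % 2), 1));
        }
        System.out.println(combineByNode(splits).size()); // prints 2
    }
}
```

A real implementation would also cap each combined split's size so one map task does not receive an entire node's data, which is roughly what `CombineFileInputFormat`'s max-split-size settings do.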

NPE in running "ClientOnlyExample"

2013-02-16 Thread Jain Rahul
Hi All, I am a newbie to Cassandra and am trying to run an example program, "ClientOnlyExample", taken from https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java. But while executing the program it gives me a null pointer exception. Can you guys pleas