See https://issues.apache.org/jira/browse/CASSANDRA-2388
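
For readers following the thread, the fallback described in the quoted message below (try the local node first, then the other replicas for the split) can be sketched roughly as follows. This is neither the pastebin patch nor the code attached to the JIRA ticket. The class `ReplicaFailover`, the `Connector` interface and both method names are hypothetical and stand in for whatever ColumnFamilyRecordReader does to open its Thrift connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual ColumnFamilyRecordReader code. */
class ReplicaFailover {

    /** Hypothetical stand-in for opening a connection to a single node. */
    interface Connector<C> {
        C connect(String host) throws IOException;
    }

    /** Puts localHost first if it is one of the replicas; the rest keep their order. */
    static List<String> localFirst(List<String> replicas, String localHost) {
        List<String> ordered = new ArrayList<>();
        if (replicas.contains(localHost)) {
            ordered.add(localHost);
        }
        for (String host : replicas) {
            if (!host.equals(localHost)) {
                ordered.add(host);
            }
        }
        return ordered;
    }

    /** Tries each host in turn and returns the first connection that succeeds. */
    static <C> C connectToAny(List<String> hosts, Connector<C> connector) throws IOException {
        IOException lastFailure = null;
        for (String host : hosts) {
            try {
                return connector.connect(host);
            } catch (IOException e) {
                lastFailure = e; // remember the failure and move on to the next replica
            }
        }
        // Only fail the task once every replica in the list has been tried.
        throw new IOException("no replica reachable among " + hosts, lastFailure);
    }
}
```

A caller would pass the split's replica list through `localFirst` and hand the result to `connectToAny`, so data locality is still preferred whenever the local node is up. With RF=1 the list has a single entry, so this does not help with the "ignore this range" case raised further down the thread.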

On Wed, Aug 17, 2011 at 6:28 AM, Patrik Modesto <patrik.mode...@gmail.com> wrote:
> Hi,
>
> while I was investigating this issue, I've found that hadoop+cassandra
> don't work if you stop even just one node in the cluster. It doesn't
> depend on RF. ColumnFamilyRecordReader gets a list of nodes (according
> to the RF) but chooses just the local host, and if there is no cassandra
> running locally it throws a RuntimeException, which in turn marks
> the MapReduce task as failed.
>
> I've created a patch that makes ColumnFamilyRecordReader try the
> local node and, if that fails, try the other nodes in its list. The
> patch is here http://pastebin.com/0RdQ0HMx I think attachments are
> not allowed on this ML.
>
> Please test it and apply. It's for version 0.7.8.
>
> Regards,
> P.
>
>
> On Wed, Aug 3, 2011 at 13:59, aaron morton <aa...@thelastpickle.com> wrote:
>> If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits()
>> is the function that gets the splits.
>>
>>
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3 Aug 2011, at 16:18, Patrik Modesto wrote:
>>
>>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan
>>> <jeremiah.jor...@morningstar.com> wrote:
>>>> If you have RF=1, taking one node down is going to cause 25% of your
>>>> data to be unavailable. If you want to tolerate a machine going down
>>>> you need to have at least RF=2; if you want to use quorum and have a
>>>> machine go down, you need at least RF=3.
>>>
>>> I know I can have RF > 1, but I have limited resources and I don't care
>>> about losing 25% of the data. RF > 1 basically means that if a node goes
>>> down I have the data elsewhere, but what I need is that if a node goes
>>> down its range is just ignored. I can handle it in my applications using
>>> thrift, but the hadoop-mapreduce can't handle it. It just fails with
>>> "Exception in thread "main" java.io.IOException: Could not get input
>>> splits". Is there a way to tell hadoop to ignore this range?
>>>
>>> Regards,
>>> P.
>>
>>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com