Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
On Mon, 2011-09-05 at 21:52 +0200, Patrik Modesto wrote: > I'm not sure about 0.8.x and 0.7.9 (to be released today with your > patch) but 0.7.8 will fail even with RF>1 when there is a Hadoop > TaskTracker without local Cassandra. So increasing RF is not a > solution. This isn't true (or not the in

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Patrik Modesto
On Mon, Sep 5, 2011 at 09:39, Mick Semb Wever wrote: > I've entered a jira issue covering this request. > https://issues.apache.org/jira/browse/CASSANDRA-3136 > > Would you mind attaching your patch to the issue. > (No review of it will happen anywhere else.) I see Jonathan didn't change his mind

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
On Fri, 2011-09-02 at 09:28 +0200, Patrik Modesto wrote: > We use Cassandra as storage for web pages: we store the HTML, all > URLs that have the same HTML data, and some computed data. We run Hadoop > MR jobs to compute lexical and thematical data for each page and for > exporting the data to a bi

Re: RF=1 w/ hadoop jobs

2011-09-02 Thread Patrik Modesto
On Fri, Sep 2, 2011 at 08:54, Mick Semb Wever wrote: > Patrik: is it possible to describe the use-case you have here? Sure. We use Cassandra as storage for web pages: we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and th

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mick Semb Wever
On Fri, 2011-09-02 at 08:20 +0200, Patrik Modesto wrote: > As Jonathan > already explained himself: "ignoring unavailable ranges is a > misfeature, imo" Generally it's not what one would want, I think. But I can see the case when data is to be treated as volatile and ignoring unavailable ranges may b

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Patrik Modesto
Hi, On Thu, Sep 1, 2011 at 12:36, Mck wrote: >> It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) > > I'm interested in this patch and see its usefulness, but no one will act > until you attach it to an issue. (I think a new issue is appropriate > here). I'm glad someone is i

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mck
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote: > But there is another problem with Hadoop-Cassandra: if there is no > node available for a range of keys, it fails on RuntimeError. For > example, with a keyspace at RF=1 and one node down, all MapReduce > tasks fail. CASSANDRA-2388

Re: RF=1

2011-08-19 Thread Jonathan Ellis
(a) this really isn't the right forum to review patches; I've pointed out the relevant jira ticket (b) ignoring unavailable ranges is a misfeature, imo On Fri, Aug 19, 2011 at 8:11 AM, Patrik Modesto wrote: > Is there really no interest in the patch? > > P. > > On Thu, Aug 18, 2011 at 08:54, Pat

Re: RF=1

2011-08-19 Thread Patrik Modesto
Is there really no interest in the patch? P. On Thu, Aug 18, 2011 at 08:54, Patrik Modesto wrote: > On Wed, Aug 17, 2011 at 17:08, Jonathan Ellis wrote: >> See https://issues.apache.org/jira/browse/CASSANDRA-2388 > > Ok, thanks for the JIRA ticket. I've found that very same problem > during my

Re: RF=1

2011-08-17 Thread Patrik Modesto
On Wed, Aug 17, 2011 at 17:08, Jonathan Ellis wrote: > See https://issues.apache.org/jira/browse/CASSANDRA-2388 Ok, thanks for the JIRA ticket. I've found that very same problem during my work on ignoring unavailable ranges. But there is another problem with Hadoop-Cassandra: if there is no

Re: RF=1

2011-08-17 Thread Jonathan Ellis
See https://issues.apache.org/jira/browse/CASSANDRA-2388 On Wed, Aug 17, 2011 at 6:28 AM, Patrik Modesto wrote: > Hi, > > while I was investigating this issue, I've found that hadoop+cassandra > don't work if you stop even just one node in the cluster. It doesn't > depend on RF. ColumnFamilyRecor

Re: RF=1

2011-08-17 Thread Patrik Modesto
And one more patch: http://pastebin.com/zfNPjtQz This one handles a case where there are no nodes available for a slice. For example, where there is a keyspace with RF=1 and a node is shut down, its range of keys gets ignored. Regards, P. On Wed, Aug 17, 2011 at 13:28, Patrik Modesto wrote: > Hi, >

Re: RF=1

2011-08-17 Thread Patrik Modesto
Hi, while I was investigating this issue, I've found that hadoop+cassandra don't work if you stop even just one node in the cluster. It doesn't depend on RF. ColumnFamilyRecordReader gets the list of nodes (according to the RF) but chooses just the local host, and if there is no cassandra running locally it
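
For readers following the thread, the behaviour described above, and the kind of fallback being argued over, can be sketched roughly as below. This is not the ColumnFamilyRecordReader code: `pick_endpoint`, `is_alive`, and the host names are hypothetical, chosen only to illustrate "prefer the local replica, otherwise try any other live replica of the range".

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    replicas: Sequence[str],
    local_host: str,
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Choose a host to read one key range from (illustrative only).

    `replicas` holds the hosts that own the range (its length follows the RF).
    """
    # Prefer the replica on the same machine as the task, for data locality.
    if local_host in replicas and is_alive(local_host):
        return local_host
    # Otherwise fall back to any other replica that is currently up.
    for host in replicas:
        if is_alive(host):
            return host
    # No live replica: the caller has to fail the task or skip the range.
    return None


# Hypothetical cluster state: only node2 is up.
alive = {"node2"}
print(pick_endpoint(["node1", "node2"], "tasktracker1", alive.__contains__))
print(pick_endpoint(["node1"], "tasktracker1", alive.__contains__))
```

With RF=1 and the owning node down, the sketch returns `None`; whether that should fail the job or silently skip the range is exactly the point of disagreement in this thread.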

Re: RF=1

2011-08-03 Thread aaron morton
If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits() is the function that gets the splits. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 3 Aug 2011, at 16:18, Patrik Modesto wrote: > On Tue, Aug 2, 201

Re: RF=1

2011-08-02 Thread Patrik Modesto
On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan wrote: > If you have RF=1, taking one node down is going to cause 25% of your > data to be unavailable. If you want to tolerate a machine going down > you need to have at least RF=2, if you want to use quorum and have a > machine go down, you need at

Re: RF=1

2011-08-02 Thread Jeremiah Jordan
If you have RF=1, taking one node down is going to cause 25% of your data to be unavailable. If you want to tolerate a machine going down you need to have at least RF=2, if you want to use quorum and have a machine go down, you need at least RF=3. On Tue, 2011-08-02 at 16:22 +0200, Patrik Modest
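
The replication arithmetic in the message above can be checked with a small sketch. It is not Cassandra code; the function names are made up for illustration, and it assumes the usual quorum size of floor(RF/2) + 1.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    """Replicas of a range that may be down while `required` can still respond."""
    return rf - required


for rf in (1, 2, 3):
    at_one = tolerated_failures(rf, 1)              # consistency level ONE
    at_quorum = tolerated_failures(rf, quorum(rf))  # consistency level QUORUM
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"down replicas tolerated at ONE={at_one}, at QUORUM={at_quorum}")
```

RF=2 is the smallest value that survives one replica being down at consistency level ONE, and RF=3 is the smallest that survives it at QUORUM, which matches the advice given in the message.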

Re: RF=1

2011-08-02 Thread Paul Loy
this is expected behaviour. Either increase RF or do a nodetool decommission on a node to remove it from the ring. On Tue, Aug 2, 2011 at 3:22 PM, Patrik Modesto wrote: > Hi all! > > I've a test cluster of 4 nodes running cassandra 0.7.8, with one > keyspace with RF=1, each node owns 25% of the d