Re: Deduplicating data on a node (RF=1)

2014-11-19 Thread Robert Coli
On Tue, Nov 18, 2014 at 10:04 AM, Alain Vandendorpe wrote: > Rob - thanks for that, I was wondering whether either of those would > successfully deduplicate the data. We were hypothesizing that a > decommission would merely stream the duplicates out as well as though they > were valid data - is t

Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Alain Vandendorpe
ith almost exactly 2x the expected disk >> usage. CQL returns correct results but we depend on the ability to directly >> read the SSTable files (hence also RF=1.) >> >> Would anyone have suggestions on a good way to resolve this? >> > > (If I understand correctl

Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Robert Coli
ead > the SSTable files (hence also RF=1.) > > Would anyone have suggestions on a good way to resolve this? > (If I understand correctly, the new node is now joined to the cluster; the below assumes this.) ** The simplest, slightly inconvenient way, which temporarily reduces capacity:

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Eric Stevens
l Shuler wrote: > On 11/17/2014 07:20 PM, Jonathan Haddad wrote: > > If he deletes all the data with RF=1, won't he have data loss? > > Of course, ignore my quick answer, Alain. > > -- > Michael > >

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler
On 11/17/2014 07:20 PM, Jonathan Haddad wrote: If he deletes all the data with RF=1, won't he have data loss? Of course, ignore my quick answer, Alain. -- Michael

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Jonathan Haddad
If he deletes all the data with RF=1, won't he have data loss? On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler wrote: > On 11/17/2014 02:04 PM, Alain Vandendorpe wrote: > > Hey all, > > > > For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler
On 11/17/2014 02:04 PM, Alain Vandendorpe wrote: Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being moved away from ASAP. In the meantime, adding a node recently encountered a Stream Failed error (http://pastie.org/9725846). Cassandra restarted a

Deduplicating data on a node (RF=1)

2014-11-17 Thread Alain Vandendorpe
Hey all, For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup. This is being moved away from ASAP. In the meantime, adding a node recently encountered a Stream Failed error (http://pastie.org/9725846). Cassandra restarted and it seemingly restarted streaming from zero, wi

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
On Mon, 2011-09-05 at 21:52 +0200, Patrik Modesto wrote: > I'm not sure about 0.8.x and 0.7.9 (to be released today with your > patch) but 0.7.8 will fail even with RF>1 when there is a Hadoop > TaskTracker without a local Cassandra. So increasing RF is not a > solution. This is

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Patrik Modesto
an didn't change his mind, as the ticket was resolved "won't fix". So that's it. I'm not going to attach the patch until another core Cassandra developer is interested in the use-cases mentioned there. I'm not sure about 0.8.x and 0.7.9 (to be released today with yo

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
he data to binary files for later use. A URL gets into > Cassandra on user request (a pageview), so if we delete a URL, it comes > back quickly if the page is active. Because of that, and because there > is a lot of data, we have the keyspace set to RF=1. We can drop the > whole keyspac

Re: RF=1 w/ hadoop jobs

2011-09-02 Thread Patrik Modesto
and thematic data for each page and for exporting the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is a lot of data, we have the keyspace set to RF=1.

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mick Semb Wever
On Fri, 2011-09-02 at 08:20 +0200, Patrik Modesto wrote: > As Jonathan > already explained himself: "ignoring unavailable ranges is a > misfeature, imo" Generally it's not what one would want, I think. But I can see the case when data is to be treated as volatile and ignoring unavailable ranges may b

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Patrik Modesto
Hi, On Thu, Sep 1, 2011 at 12:36, Mck wrote: >> It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) > > I'm interested in this patch and see its usefulness but no one will act > until you attach it to an issue. (I think a new issue is appropriate > here). I'm glad someone is i

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mck
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote: > But there is another problem with Hadoop-Cassandra: if there is no > node available for a range of keys, it fails with a RuntimeError. For > example, with a keyspace at RF=1 and one node down, all MapReduce > tasks fail.

Re: RF=1

2011-08-19 Thread Jonathan Ellis
ork on ignoring unavailable ranges. >> >> But there is another problem with Hadoop-Cassandra: if there is no >> node available for a range of keys, it fails with a RuntimeError. For >> example, with a keyspace at RF=1 and one node down, all MapReduce >> tasks fail. I

Re: RF=1

2011-08-19 Thread Patrik Modesto
problem > during my work on ignoring unavailable ranges. > > But there is another problem with Hadoop-Cassandra: if there is no > node available for a range of keys, it fails with a RuntimeError. For > example, with a keyspace at RF=1 and one node down, all MapReduce > tasks

Re: RF=1

2011-08-17 Thread Patrik Modesto
re is no node available for a range of keys, it fails with a RuntimeError. For example, with a keyspace at RF=1 and one node down, all MapReduce tasks fail. I've reworked my previous patch, which addressed this issue, and now there are ConfigHelper methods to enable/disable ignoring unava

Re: RF=1

2011-08-17 Thread Jonathan Ellis
elance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 3 Aug 2011, at 16:18, Patrik Modesto wrote: >> >>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan >>> wrote: >>>> If you have RF=1, taking one node down

Re: RF=1

2011-08-17 Thread Patrik Modesto
And one more patch: http://pastebin.com/zfNPjtQz This one handles a case where there are no nodes available for a slice. For example, where there is a keyspace with RF=1 and a node is shut down. Its range of keys gets ignored. Regards, P. On Wed, Aug 17, 2011 at 13:28, Patrik Modesto wrote: >

Re: RF=1

2011-08-17 Thread Patrik Modesto
Cheers > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 3 Aug 2011, at 16:18, Patrik Modesto wrote: > >> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan >> wrote: >>> If you have RF=1, t

Re: RF=1

2011-08-03 Thread aaron morton
g 2, 2011 at 23:10, Jeremiah Jordan > wrote: >> If you have RF=1, taking one node down is going to cause 25% of your >> data to be unavailable. If you want to tolerate a machine going down >> you need to have at least RF=2; if you want to use quorum and have a >> machine g

Re: RF=1

2011-08-02 Thread Patrik Modesto
On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan wrote: > If you have RF=1, taking one node down is going to cause 25% of your > data to be unavailable. If you want to tolerate a machine going down > you need to have at least RF=2; if you want to use quorum and have a > machine go do

Re: RF=1

2011-08-02 Thread Jeremiah Jordan
If you have RF=1, taking one node down is going to cause 25% of your data to be unavailable. If you want to tolerate a machine going down, you need to have at least RF=2; if you want to use quorum and have a machine go down, you need at least RF=3. On Tue, 2011-08-02 at 16:22 +0200, Patrik
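
The arithmetic behind this rule of thumb can be sketched as follows. This is an illustrative calculation only, not Cassandra code; the function names are hypothetical, and it assumes the usual quorum definition of floor(RF / 2) + 1 replicas.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM operation: floor(rf / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    """How many replicas of a key can be down while `required` replicas still respond."""
    return max(rf - required, 0)


# Compare consistency level ONE (1 replica required) with QUORUM for small RFs.
for rf in (1, 2, 3):
    print(
        f"RF={rf}: ONE tolerates {tolerated_failures(rf, 1)} down, "
        f"QUORUM ({quorum(rf)}) tolerates {tolerated_failures(rf, quorum(rf))} down"
    )
```

Under these assumptions RF=2 survives one failed replica only at consistency level ONE, because a quorum of two replicas is still two; RF=3 is the smallest value where a quorum survives one failure.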

Re: RF=1

2011-08-02 Thread Paul Loy
This is expected behaviour. Either increase RF or do a nodetool decommission on a node to remove it from the ring. On Tue, Aug 2, 2011 at 3:22 PM, Patrik Modesto wrote: > Hi all! > > I've a test cluster of 4 nodes running Cassandra 0.7.8, with one > keyspace with RF=1; each nod

RF=1

2011-08-02 Thread Patrik Modesto
Hi all! I've a test cluster of 4 nodes running Cassandra 0.7.8, with one keyspace with RF=1; each node owns 25% of the data. As long as all nodes are alive, there is no problem, but when I shut down just one node I get UnavailableException in my application. cassandra-cli returns "
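
The situation described here can be illustrated with a toy model of the ring. This is a simplified sketch, not Cassandra's actual placement code: it assumes equal-sized ranges and that range i is stored on node i and the next RF-1 nodes around the ring, and the function name `unavailable_fraction` is hypothetical.

```python
def unavailable_fraction(num_nodes: int, rf: int, down: set[int]) -> float:
    """Fraction of equal-sized ranges that have no live replica.

    Simplified model: range i is stored on nodes i, i+1, ..., i+rf-1 (mod num_nodes).
    """
    dead_ranges = 0
    for primary in range(num_nodes):
        replicas = {(primary + k) % num_nodes for k in range(rf)}
        # A range is unreadable only if every one of its replicas is down.
        if replicas <= down:
            dead_ranges += 1
    return dead_ranges / num_nodes


print(unavailable_fraction(4, 1, {2}))  # 0.25: one node down with RF=1
print(unavailable_fraction(4, 2, {2}))  # 0.0: every range still has a live replica
```

In this model, requests for keys in the dead range have no replica to answer them, which is consistent with the UnavailableException reported above.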