Command Request: rename a column
I think it would be really cool to be able to rename a column, or, more generally, to have a move command that moves data from one column to another in the same CF without the client having to read and resend the column value. This would be *extremely* powerful, imo. I suspect the execution would be quick and could even be made atomic (per node), as I suspect it would mostly entail only reference updates.

Has anything like this been discussed before? Seems like such a natural operation for a hash-table-like data store.

aj
Re: Command Request: rename a column
On Fri, Jul 8, 2011 at 9:22 AM, AJ wrote:
> I think it would be really cool to be able to rename a column, or, more
> generally, a move command to move data from one column to another in the
> same CF without the client having to read and resend the column value. This
> would be extremely powerful, imo. I suspect the execution would be quick
> and could even be made atomic (per node) as I suspect it would mostly entail
> only reference updates.

Cassandra doesn't work like that. We would have no other choice than to read the column and write it back with a different name (and it would not be atomic). So the only win we would get from doing this server side would lie in not transferring the value across the network.

--
Sylvain
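To make the read-then-write point concrete, here is a minimal sketch of what a rename amounts to. It assumes a plain Python dict standing in for one row (column name -> (value, timestamp)); `rename_column` and the dict layout are hypothetical illustrations, not a Cassandra client or server API, and a real delete would be recorded as a tombstone rather than an in-place removal.

```python
def rename_column(row, old_name, new_name, timestamp):
    """Emulate a rename as three separate steps; nothing makes them atomic."""
    # Step 1: read the existing column (the value has to be fetched).
    value, _old_timestamp = row[old_name]
    # Step 2: write the same value back under the new name.
    row[new_name] = (value, timestamp)
    # Step 3: delete the old column (simplified: removed from the dict).
    del row[old_name]
    return row


row = {"old": (b"payload", 1)}
rename_column(row, "old", "new", timestamp=2)
print(row)
```

A concurrent reader arriving between steps 2 and 3 would see both columns, which is the non-atomic window described above. Doing the same steps server side would only save shipping the value over the network.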
Re: Command Request: rename a column
On 7/8/2011 2:18 AM, Sylvain Lebresne wrote:
> On Fri, Jul 8, 2011 at 9:22 AM, AJ wrote:
>> I think it would be really cool to be able to rename a column, or, more
>> generally, a move command to move data from one column to another in the
>> same CF without the client having to read and resend the column value. This
>> would be extremely powerful, imo. I suspect the execution would be quick
>> and could even be made atomic (per node) as I suspect it would mostly entail
>> only reference updates.
>
> Cassandra doesn't work like that. We would have no other choice than to read
> the column and write it back with a different name

I figured as much :) Not that bad though.

> (and it would not be atomic). So the only win we would get from doing this
> server side would lie in not transferring the value across the network.

That would be the main benefit I think, esp with large values.

> --
> Sylvain
Re: What does a write lock ?
Questions like this seem to come up a lot:
http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no
http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily
http://www.mail-archive.com/user@cassandra.apache.org/msg14701.html

Let's say you read state A (from one key in one CF), you change the data to A' in your client, and you write A'. Are you worried that someone else might have changed A to B during this process (making the "new" state a race between A' and B)? It doesn't sound to me like you are... It sounds to me like you're worried about a set of columns for the key being in a consistent state before, during, and after a process. And A -> A' and A -> B will each be atomic for the key (based on my understanding). But, if A' and B are changes to a different set of columns, I believe that would interleave, which itself could be "inconsistent" from your application's point of view.

will

On Thu, Jul 7, 2011 at 11:41 PM, Jeffrey Kesselman wrote:
> Really, as I lay in the bath thinking about it, I concluded what I am
> looking for is a very limited form of Consistency.
>
> It's consistency over a single row on a single node just for the period of
> update.
>
> On Thu, Jul 7, 2011 at 10:34 PM, Jeffrey Kesselman wrote:
>> It's not really isolation, btw, because we aren't talking about anyone
>> seeing an update mid-update. Rather, we are talking about when updates
>> are allowed to occur.
>>
>> Atomicity means that all the updates happen together or they don't happen
>> at all.
>> Isolation means that no results of the update are visible until the entire
>> update operation is complete.
>>
>> This really lies somewhere in the middle of the two concepts. It's part
>> of the results of the combined effects of ACID.
>>
>> On Thu, Jul 7, 2011 at 10:27 PM, Jonathan Ellis wrote:
>>> Sounds to me like you're confusing atomicity with isolation.
>>>
>>> On Thu, Jul 7, 2011 at 2:54 PM, Jeffrey Kesselman wrote:
>>> > Yup, I'm even more confused. Let's talk about the model, not the
>>> > implementation.
>>> > AIUI updates to a row are atomic across all columns in that row at once,
>>> > true?
>>> > If true then the next question is, does the validation happen inside or
>>> > outside of that guarantee, and is the row guaranteed not to change between
>>> > validation and update?
>>> > If that is *not* the case then it makes a whole class of solutions to
>>> > synchronization problems fail and puts my larger project
>>> > in serious question.
>>> >
>>> > On Thu, Jul 7, 2011 at 3:43 PM, Yang wrote:
>>> >> no, the memtable is a ConcurrentSkipListMap
>>> >>
>>> >> insertion can happen in parallel
>>> >>
>>> >> On Jul 7, 2011 9:24 AM, "Jeffrey Kesselman" wrote:
>>> >> > This has me more confused.
>>> >> >
>>> >> > Does this mean that ALL rows on a given node are only updated
>>> >> > sequentially, never in parallel?
>>> >> >
>>> >> > On Thu, Jul 7, 2011 at 3:21 PM, Yang wrote:
>>> >> >> just to add onto what Jonathan said
>>> >> >>
>>> >> >> the columns are immutable. if you overwrite/reconcile, a new obj is
>>> >> >> created and shoved into the memtable
>>> >> >>
>>> >> >> there is a shared lock for all writes though, which guards against an
>>> >> >> exclusive lock on memtable switching/flushing
>>> >> >>
>>> >> >> On Jul 7, 2011 7:51 AM, "A J" wrote:
>>> >> >> > Does a write lock:
>>> >> >> > 1. Just the columns in question for the specific row in question?
>>> >> >> > 2. The full row in question?
>>> >> >> > 3. The full CF?
>>> >> >> >
>>> >> >> > I doubt read does any locks.
>>> >> >> >
>>> >> >> > Thanks.
>>> >> >
>>> >> > --
>>> >> > It's always darkest just before you are eaten by a grue.
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
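Yang's description of immutable columns in the quoted history (an overwrite creates a new object that is put into the memtable) can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the memtable (Yang says the real one is a ConcurrentSkipListMap), the `Column`, `reconcile`, and `write` names are hypothetical, and the newest-timestamp-wins rule is assumed here rather than taken from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Immutable column: an overwrite produces a new object, never an edit."""
    name: str
    value: bytes
    timestamp: int


def reconcile(existing, incoming):
    """Return whichever column is newer; neither input is modified."""
    if existing is None or incoming.timestamp > existing.timestamp:
        return incoming
    return existing


def write(memtable, row_key, column):
    """Store the reconciled column object under the row key."""
    row = memtable.setdefault(row_key, {})
    row[column.name] = reconcile(row.get(column.name), column)


memtable = {}
write(memtable, "A", Column("c1", b"first", 1))
write(memtable, "A", Column("c1", b"second", 2))
print(memtable["A"]["c1"])
```

Because each write swaps in a whole new column object, there is no per-column or per-row lock to take in this model. It also provides no read-validate-write guarantee across separate writes.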
Re: What does a write lock ?
Disregard most of my post (already). I forgot that reads aren't isolated. That means A and B are states Cassandra will *eventually* be in, but at any point in time a read might see a "partial B" (where some columns are still A, and others are B). Though, I'm sure someone else will confirm if I'm wrong yet again.

For me, if I need two pieces of data to be consistently related to each other and stored in Cassandra, I encode them (usually JSON) and store them in one column.

will
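Will's single-column approach from the message above can be sketched with the standard `json` module. The field names (`balance`, `last_txn_id`) and helper names are hypothetical, chosen only for illustration; the resulting string is what would be written as the value of one column.

```python
import json


def encode_pair(balance, last_txn_id):
    """Pack two related values into one string for a single column value."""
    return json.dumps(
        {"balance": balance, "last_txn_id": last_txn_id}, sort_keys=True
    )


def decode_pair(column_value):
    """Unpack the two values; they always come from the same write."""
    data = json.loads(column_value)
    return data["balance"], data["last_txn_id"]


stored = encode_pair(100, "t-42")
print(decode_pair(stored))
```

Since both values travel in one column, a reader can never observe one updated without the other. The cost is that every change rewrites the whole encoded value.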
Re: What does a write lock ?
Not quite, it's more limited and specific.

The order of operations is all within the Cassandra node server and looks like this:

We have one row, A. That's the only row being operated on.

Client -> submits A'
Server does the following:
(1) Validate function reads current A
(2) Validate function validates A' vs. A
(3) If validation succeeds, allows update to A'.

My fear/concern is that after 1 and before 3, a second update to A'' comes in and changes the "current" value of A, therefore invalidating my validation check, see?

If Cassandra does not guard against this then one possible solution would be to make my own key-to-mutex map in memory, lock the mutex for A's key as a precursor to (1) and release it in a post-update function. But I am always very nervous about inserting locking into a process that wasn't designed with it already in mind...
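The key-to-mutex map Jeffrey describes might look roughly like the sketch below. Everything here is hypothetical: `KeyLocks`, `validated_update`, and the dict used as a store are illustrations, not Cassandra hooks. The lock only serializes threads inside one process, so it would not coordinate writes arriving at other nodes or replicas.

```python
import threading


class KeyLocks:
    """Lazily creates one mutex per row key (the map grows without bound)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, key):
        # Protect the map itself so two threads cannot create two locks
        # for the same key.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def validated_update(store, locks, key, new_value, is_valid):
    """Steps (1)-(3) under the key's mutex: read, validate, then update."""
    with locks.lock_for(key):
        current = store.get(key)              # (1) read current A
        if not is_valid(current, new_value):  # (2) validate A' vs. A
            return False
        store[key] = new_value                # (3) apply the update
        return True


store = {"A": 1}
locks = KeyLocks()
print(validated_update(store, locks, "A", 2, lambda cur, new: new > cur))
```

Holding the mutex across all three steps closes the window between (1) and (3) for writers that go through this function in the same process. Writers that bypass it are not constrained.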
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
I think node repair involves some compaction too. See the issue:
https://issues.apache.org/jira/browse/CASSANDRA-2811
It talks of 'validation compaction' being triggered concurrently during node repair.

On Thu, Jun 30, 2011 at 8:51 PM, Watanabe Maki wrote:
> Repair doesn't compact. Those are different processes already.
>
> maki
>
> On 2011/07/01, at 7:21, A J wrote:
>> Thanks all!
>> In other words, I think it is safe to say that a node as a whole can
>> be made consistent only on 'nodetool repair'.
>>
>> Has there been enough interest in providing anti-entropy without
>> compaction as a separate operation (nodetool repair does both)?
>>
>> On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote:
>>> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo wrote:
>>>> Read repair does NOT repair tombstones.
>>>
>>> It does, but you can't rely on RR to repair _all_ tombstones, because
>>> RR only happens if the row in question is requested by a client.
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
Re: What does a write lock ?
It doesn't look like that at all.

Row A exists.

Client submits mutation Am. This is not necessarily a full row.

Coordinator validates Am.

If validation succeeds, coordinator sends Am to the replica owners, effectively creating A'.

Neither A nor A' is ever explicitly assembled on the write path.

On Fri, Jul 8, 2011 at 9:22 AM, Jeffrey Kesselman wrote:
> Not quite, it's more limited and specific.
> The order of operations is all within the Cassandra node server and looks
> like this:
> We have one row, A. That's the only row being operated on.
> Client -> submits A'
> Server does the following:
> (1) Validate function reads current A
> (2) Validate function validates A' vs. A
> (3) If validation succeeds, allows update to A'.
> My fear/concern is that after 1 and before 3, a second update to A'' comes
> in and changes the "current" value of A, therefore invalidating my
> validation check, see?
> If Cassandra does not guard against this then one possible solution would be
> to make my own key-to-mutex map in memory, lock the mutex for A's key as a
> precursor to (1) and release it in a post-update function. But I am always
> very nervous about inserting locking into a process that wasn't designed
> with it already in mind...
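The write path Jonathan outlines can be sketched as below. This is a toy model under stated assumptions: dicts stand in for each replica's copy of row A, the mutation is a partial dict of columns, and `validate` and `coordinate_write` are hypothetical names rather than Cassandra internals. The point it illustrates is that validation looks only at the mutation and never reads the current row.

```python
def validate(mutation):
    """Check the mutation by itself; the stored row is never consulted."""
    if not mutation:
        raise ValueError("mutation must name at least one column")
    for name in mutation:
        if not isinstance(name, str) or not name:
            raise ValueError(f"invalid column name: {name!r}")


def coordinate_write(replica_rows, mutation):
    """Validate once, then hand the same partial mutation to every replica."""
    validate(mutation)
    for row in replica_rows:
        # Each replica merges the columns it was sent; nobody builds a
        # complete "A" or "A'" object along the way.
        row.update(mutation)


replicas = [{"c1": "old", "c2": "keep"}, {"c1": "old", "c2": "keep"}]
coordinate_write(replicas, {"c1": "new"})
print(replicas)
```

In this model there is no step where the old row is available to compare against, so a check of the form "validate A' vs. A" has nothing to hook into.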
Re: What does a write lock ?
I think you need to look into ZooKeeper, or another distributed coordinator, as you have little/no guarantees from Cassandra between 1-3 (in terms of the guarantees you want and need).

And my terminology in my post is different than yours. My "client" == your "server". Specifically, I was thinking in terms of:
user -> Cassandra client code (that runs on a "server") -> Cassandra server code (e.g. Cassandra itself) that runs either on the same or different server
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
That's an internal term meaning "background i/o," not sstable merging per se.

On Fri, Jul 8, 2011 at 9:24 AM, A J wrote:
> I think node repair involves some compaction too. See the issue:
> https://issues.apache.org/jira/browse/CASSANDRA-2811
> It talks of 'validation compaction' being triggered concurrently
> during node repair.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: What does a write lock ?
Also, one point of early confusion for me is that there is a slightly different definition of "atomicity" depending on whether you're talking software vs. database, and I'm a "software guy". From Wikipedia:

Software = Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition: they either successfully change the state of the system, or have no visible effect.

Database = In an atomic transaction, a series of database operations either all occur, or nothing occurs.

I believe that Cassandra is using the database definition.

will
Re: 'select * from ' - FTS or Index
On Thu, 2011-07-07 at 19:34 -0400, A J wrote:
> Does a 'select * from ' with no filter still use the primary
> index on the key or do a 'full table scan' ?

It's the equivalent of a range slice with no starting or ending keys, and no starting or ending columns. Indexing cannot save you; this is an expensive query.

--
Eric Evans
eev...@rackspace.com
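To make the "range slice with no starting or ending keys" point concrete, here is a minimal sketch of a toy in-memory model. It is not the Thrift API: `range_slice` and `scan_all` are hypothetical helpers, rows are ordered by plain key rather than by partitioner token, and an empty string stands for "no bound". It only illustrates that an unbounded slice has to walk every row, and how a client could page through them.

```python
from bisect import bisect_left


def range_slice(rows, start_key="", end_key="", count=100):
    """Toy stand-in for a range slice over rows kept in key order.

    An empty start_key or end_key means "unbounded" on that side.
    """
    keys = sorted(rows)
    lo = bisect_left(keys, start_key) if start_key else 0
    selected = []
    for key in keys[lo:]:
        if end_key and key > end_key:
            break
        selected.append((key, rows[key]))
        if len(selected) == count:
            break
    return selected


def scan_all(rows, page_size=2):
    """Page through every row by restarting each slice at the last key seen."""
    if page_size < 2:
        # The start key is inclusive, so a page of 1 could never advance.
        raise ValueError("page_size must be at least 2")
    start, seen = "", []
    while True:
        page = range_slice(rows, start_key=start, count=page_size)
        if start:
            page = page[1:]  # drop the row already returned by the previous page
        if not page:
            return seen
        seen.extend(page)
        start = page[-1][0]
```

With no bounds there is nothing to narrow the walk, so the cost grows with the total number of rows regardless of page size.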
Re: What does a write lock ?
I am confused by what you mean by "Cassandra client code." Is this part of the Cassandra server? In my architecture, my "user" talks thrift to Cassandra.
Re: What does a write lock ?
Where does a custom validation method run? Given that it is validating a row update, my assumption was that it ran on the node that "owns" the row. That would make sense to me, as it would fulfill the NoSQL philosophy of taking computation to data rather than data to computation.

I don't follow the relevance of the rest of your comment, sorry.

On Fri, Jul 8, 2011 at 10:34 AM, Jonathan Ellis wrote:
> It doesn't look like that at all.
>
> Row A exists.
>
> Client submits mutation Am. This is not necessarily a full row.
>
> Coordinator validates Am.
>
> If validation succeeds, coordinator sends Am to the replica owners,
> effectively creating A'.
>
> Neither A nor A' is ever explicitly assembled on the write path.
>
> On Fri, Jul 8, 2011 at 9:22 AM, Jeffrey Kesselman wrote:
> > Not quite, it's more limited and specific.
> > The order of operations is all within the Cassandra node server and looks
> > like this...
> > We have one row, A. That's the only row being operated on.
> > Client -> submits A'
> > Server does the following:
> > (1) Validate function reads current A
> > (2) Validate function validates A' vs. A
> > (3) If validation succeeds, allows update to A'.
> > My fear/concern is that after 1 and before 3, a second update to A''
> > comes in and changes the "current" value of A, therefore invalidating my
> > validation check, see?
> > If Cassandra does not guard against this then one possible solution would be
> > to make my own key-to-mutex map in memory, lock the mutex for A's key as a
> > precursor to (1) and release it in a post-update function. But I am always
> > very nervous about inserting locking into a process that wasn't designed
> > with it already in mind...
> > > > On Fri, Jul 8, 2011 at 8:30 AM, William Oberman < > ober...@civicscience.com> > > wrote: > >> > >> Questions like this seem to come up a lot: > >> > >> > http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no > >> > >> > http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily > >> http://www.mail-archive.com/user@cassandra.apache.org/msg14701.html > >> Lets say you read state A (from one key in one CF), you change the data > to > >> A' in your client, and you write A'. Are you worried that someone else > >> might have changed A to B during this process (making the "new" state a > race > >> between A' and B)? It doesn't sound to me like you are... It sounds to > me > >> like you're worried about a set of columns for the key being in a > consistent > >> state before, during, and after a process. And A -> A' and A -> B will > each > >> be atomic for the key (based on my understanding). But, if A' and B are > >> changes to a different set of columns, I believe that would interleave, > >> which itself could be "inconsistent" from your application's point of > view. > >> > >> will > >> > >> On Thu, Jul 7, 2011 at 11:41 PM, Jeffrey Kesselman > >> wrote: > >>> > >>> Really, as i lay in the bath thinking nabout it, I concluded what I am > >>> looking for is a very limited form of Consistency. > >>> Its consistency over a single row on a single node just for the period > of > >>> update. > >>> > >>> On Thu, Jul 7, 2011 at 10:34 PM, Jeffrey Kesselman > >>> wrote: > > Its not really isolation, btw, because we > arent talking about anyone seeing an update mid-update.Rather, we > are talking about when updates are allowed to occur. > Atomicity means that all the updates happen together or they don't > happen at all. > Isolation means that no results of the update are visible until the > entire update operation is complete. 
> This really lies somewhere in the middle of the two concepts. Its > part > of the results of the combined effects of ACID > > > On Thu, Jul 7, 2011 at 10:27 PM, Jonathan Ellis > wrote: > > > > Sounds to me like you're confusing atomicity with isolation. > > > > On Thu, Jul 7, 2011 at 2:54 PM, Jeffrey Kesselman > > wrote: > > > Yup, im even more confused.Lets talk about the model, not the > > > implementation. > > > AIUI updates to a row are atomic across all columns in that row at > > > once, > > > true? > > > If true then the next question is, does the validation happen > inside > > > or > > > outside of that guarantee, and is the row guaranteed not to change > > > between > > > validation and update? > > > If that is *not* the case then it makes a whole class > of solutions to > > > synchronization problems fail and puts my larger project > > > in serious question. > > > > > > On Thu, Jul 7, 2011 at 3:43 PM, Yang > wrote: > > >> > > >> no , the memtable is a concurrentskiplistmap > > >> > > >> insertion can happen in parallel > > >> > > >> On Jul 7, 2011 9:24 AM, "Jeffrey Kesselman" > >>>
Re: What does a write lock ?
I use a language-specific wrapper around thrift as my "client", but yes, I guess I fundamentally mean thrift == client, and the cassandra server == server.

will

On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
> I am confused by what you mean by "Cassandra client code." Is this part of
> the Cassandra server?
>
> My architecture is my "user" talks thrift to Cassandra.
Re: What does a write lock ?
Alright,

So are you saying the column validator, as specified by conf/storage-conf.xml, is checked in the client interface library and not on the server side? That seems odd to me on a number of levels, not the least being that I can't see how thrift could autogenerate that for different languages, or how those other languages would use a Java class.

On Fri, Jul 8, 2011 at 11:13 AM, William Oberman wrote:
> I use a language specific wrapper around thrift as my "client", but yes, I
> guess I fundamentally mean thrift == client, and the cassandra server ==
> server.
>
> will
>
> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
>> I am confused by what you mean by "Cassandra client code." Is this part
>> of the Cassandra server?
>>
>> My architecture is my "user" talks thrift to Cassandra.

--
It's always darkest just before you are eaten by a grue.
Re: What does a write lock ?
I haven't ever written my own org.apache.cassandra.db.marshal.AbstractType (which is, I think, what you're talking about), so I have no idea.

Looking up the JavaDoc for that class, validate says "validate that the byte array is a valid sequence for the type we are supposed to be comparing", which sounds like a local operation to me (e.g. it shouldn't fetch remote data, it's just saying "yep, this is a valid member of type T").

will

On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman wrote:
> Alright,
>
> So are you saying the column validator, as specified
> by conf/storage-conf.xml is checked in the client interface library and not
> on the server side? That seems odd to me on a number of levels, not the
> least being I can't see how thrift could autogenerate that
> for different languages or how those other languages would use a Java class.
>
> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman wrote:
>> I use a language specific wrapper around thrift as my "client", but yes, I
>> guess I fundamentally mean thrift == client, and the cassandra server ==
>> server.
>>
>> will
>>
>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
>>> I am confused by what you mean by "Cassandra client code." Is this part
>>> of the Cassandra server?
>>>
>>> My architecture is my "user" talks thrift to Cassandra.
>
> --
> It's always darkest just before you are eaten by a grue.
how large cassandra could scale when it need to do manual operation?
Hi all,

I am curious about how large Cassandra can scale. From the information I can find, the largest deployment is at Facebook, at about 150 nodes. Meanwhile they are running 2000+ nodes of Hadoop, and Yahoo is even running 4000 nodes of Hadoop.

I don't understand why that is the case; I have only a little knowledge of Cassandra and none of Hadoop.

Currently I am running Cassandra on 3 nodes and am having trouble bringing one node back after it fell out of sync. The problems I have encountered make me worry about how Cassandra can scale out:

1) Load balancing has to be performed manually on every node, according to:

    def tokens(nodes):
        for x in xrange(nodes):
            print 2 ** 127 / nodes * x

2) When adding new nodes, repair and cleanup have to be run on every node.

3) When decommissioning a node, there is a chance of slowing down the entire cluster (I am not sure why, but I have seen people ask about it), and the only way around it is to shut down the entire cluster, rsync the data, and start all nodes except the decommissioned one.

All in all, I think there is a lot of human work involved in maintaining the cluster, which makes it impossible to scale to thousands of nodes. I hope I am totally wrong about all of this. Currently I am serving 1 million page views (pv) every day with Cassandra and it makes me feel unsafe: I am afraid that one day a single node crash will corrupt the data and bring the whole cluster down.

By contrast, a relational database makes me feel safe, but it does not scale well.

Thanks for any guidance here.
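The token snippet in the message above is Python 2 (`xrange`, the `print` statement, and `/` acting as integer division on integers). A minimal Python 3 sketch of the same formula follows; returning a list and the input check are additions for illustration, not part of the original.

```python
def tokens(nodes):
    """Return evenly spaced initial tokens for a ring of `nodes` nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    ring_size = 2 ** 127  # token space used by the formula in the message
    # Floor division keeps the result an exact integer; in Python 3 a plain
    # "/" would produce a float and lose precision at this magnitude.
    return [ring_size // nodes * x for x in range(nodes)]


if __name__ == "__main__":
    for token in tokens(3):
        print(token)
```

Each printed value would be assigned to one node as its initial token, which is the manual step the message is describing.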
Re: What does a write lock ?
Validation occurs at the API level, returning an InvalidRequestException to the caller of the API (a thrift client in this case). Specifically, a mutation will not be scheduled for storage until it has been validated at the API level.

If the intention is to do a read-before-write validation as an AbstractType extension, then yes, the underlying value could indeed change between validation and storage. If this were the goal, you would need to implement locking externally (via ZooKeeper or similar, as previously mentioned).

On Fri, Jul 8, 2011 at 10:21 AM, William Oberman wrote:
> I haven't ever written my own org.apache.cassandra.db.marshal.AbstractType
> (which is I think what you're talking about), so I have no idea.
>
> Looking up the JavaDoc for that class, validate says "validate that the byte
> array is a valid sequence for the type we are supposed to be comparing",
> which sounds like a local operation to me (e.g. it shouldn't fetch remote
> data, it's just saying "yep, this is a valid member of type T").
>
> will
>
> On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman wrote:
>>
>> Alright,
>> So are you saying the column validator, as specified
>> by conf/storage-conf.xml is checked in the client interface library and not
>> on the server side? That seems odd to me on a number of levels, not the
>> least being I can't see how thrift could autogenerate that
>> for different languages or how those other languages would use a Java class.
>>
>> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman wrote:
>>>
>>> I use a language specific wrapper around thrift as my "client", but yes,
>>> I guess I fundamentally mean thrift == client, and the cassandra server ==
>>> server.
>>>
>>> will
>>>
>>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
>>>> I am confused by what you mean by "Cassandra client code." Is this part
>>>> of the Cassandra server?
>>>> My architecture is my "user" talks thrift to Cassandra.
>>
>> --
>> It's always darkest just before you are eaten by a grue.
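The key-to-mutex map floated earlier in this thread can be sketched roughly as follows. This is an illustration only: `GuardedStore` and its methods are hypothetical names, the "store" is a plain dict rather than Cassandra, and the lock only serializes updates inside a single process. It does nothing for writes arriving through other coordinators, which is why an external lock service was suggested for the cluster-wide case.

```python
import threading
from collections import defaultdict


class GuardedStore:
    """In-memory stand-in for a row store with per-key read-validate-write."""

    def __init__(self):
        self._rows = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock map itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def update(self, key, new_value, validate):
        """Apply new_value only if validate(current, new_value) is true.

        The key's lock is held from the read through the write, so no other
        update to the same key can slip in between steps (1) and (3).
        """
        with self._lock_for(key):
            current = self._rows.get(key)         # (1) read current A
            if not validate(current, new_value):  # (2) validate A' against A
                return False
            self._rows[key] = new_value           # (3) apply A'
            return True

    def get(self, key):
        return self._rows.get(key)
```

For example, `store.update("row", 5, lambda cur, new: cur is None or new > cur)` accepts only increasing values. Without the lock, two writers could both pass step (2) against the same stale value.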
Re: What does a write lock ?
Hmm. Thanks Nate. I need to think about this and our data store design some. In general I dislike architecture with large numbers of independent servers, I think it invites communication latencies and partial failures into the mix. But I'll cogitate some. On Fri, Jul 8, 2011 at 12:21 PM, Nate McCall wrote: > Validation occurs at the API level, returning an > InvalidRequestException to the caller of the API (a thrift client in > this case). Specifically, a mutation will not be scheduled for the > storage until it has been validated at the API level. > > If the intention is to do a read-before-write validation as an > AbstractType extension, then yes, the underlying value could indeed > change between validation and storage. If this were the goal, you need > to implement locking externally (via zookeper or similar as previously > mentioned). > > On Fri, Jul 8, 2011 at 10:21 AM, William Oberman > wrote: > > I haven't ever written my own > org.apache.cassandra.db.marshal.AbstractType > > (which is I think what your talking about), so I have no idea. > > > > Looking up the JavaDoc for that class, validate says "validate that the > byte > > array is a valid sequence for the type we are supposed to be comparing", > > which sounds like a local operation to me (e.g. it shouldn't fetch remote > > data, it's just saying "yep, this is a valid member of type T"). > > > > will > > > > On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman > wrote: > >> > >> Alright, > >> So are you saying the column validator, as specified > >> by conf/storage-conf.xml is checked in the client interface library and > not > >> on the server side? That seems odd to me on a number of levels, not the > >> least being I cant see how thrift could autogenerate that > >> for different languages or how those other languages would use a Java > class. 
> >> > >> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman > >> wrote: > >>> > >>> I use a language specific wrapper around thrift as my "client", but > yes, > >>> I guess I fundamentally mean thrift == client, and the cassandra server > == > >>> server. > >>> > >>> will > >>> > >>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman > >>> wrote: > > I am confused by what you mean by "Cassandra client code." Is this > part > of the Cassnadra server? > My architecture is my "user" talks thrift to Cassandra. > > >>> > >> > >> > >> > >> -- > >> It's always darkest just before you are eaten by a grue. > > > > > > > -- It's always darkest just before you are eaten by a grue.
Re: What does a write lock ?
I should add, Nate, that the intention is to do a read before write validation and have that occur as close to the data as possible. if there is a better hook to implement it on I'd love a pointer to it. JK On Fri, Jul 8, 2011 at 12:21 PM, Nate McCall wrote: > Validation occurs at the API level, returning an > InvalidRequestException to the caller of the API (a thrift client in > this case). Specifically, a mutation will not be scheduled for the > storage until it has been validated at the API level. > > If the intention is to do a read-before-write validation as an > AbstractType extension, then yes, the underlying value could indeed > change between validation and storage. If this were the goal, you need > to implement locking externally (via zookeper or similar as previously > mentioned). > > On Fri, Jul 8, 2011 at 10:21 AM, William Oberman > wrote: > > I haven't ever written my own > org.apache.cassandra.db.marshal.AbstractType > > (which is I think what your talking about), so I have no idea. > > > > Looking up the JavaDoc for that class, validate says "validate that the > byte > > array is a valid sequence for the type we are supposed to be comparing", > > which sounds like a local operation to me (e.g. it shouldn't fetch remote > > data, it's just saying "yep, this is a valid member of type T"). > > > > will > > > > On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman > wrote: > >> > >> Alright, > >> So are you saying the column validator, as specified > >> by conf/storage-conf.xml is checked in the client interface library and > not > >> on the server side? That seems odd to me on a number of levels, not the > >> least being I cant see how thrift could autogenerate that > >> for different languages or how those other languages would use a Java > class. 
> >> > >> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman > >> wrote: > >>> > >>> I use a language specific wrapper around thrift as my "client", but > yes, > >>> I guess I fundamentally mean thrift == client, and the cassandra server > == > >>> server. > >>> > >>> will > >>> > >>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman > >>> wrote: > > I am confused by what you mean by "Cassandra client code." Is this > part > of the Cassnadra server? > My architecture is my "user" talks thrift to Cassandra. > > >>> > >> > >> > >> > >> -- > >> It's always darkest just before you are eaten by a grue. > > > > > > > -- It's always darkest just before you are eaten by a grue.
Corrupted data
Hi everyone,

I'm having thousands of these errors:

 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
    at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
    at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
    at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Impossible row size 6292724931198053
    ... 9 more
 INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too. Skipping to next row (retry's stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8

This is during scrub, which I ran because I saw similar errors during normal operation. Is there anything I can do? It looks like I'm going to lose a ton of data.
Pre-CassandraSF Happy Hour on Sunday
Hi all, If you're in San Francisco for CassandraSF on Monday 11th, then come and join fellow Cassandra users and committers on Sunday evening. Starting at 6:30pm at ThirstyBear, the famous brewing company. We'll have drinks, food and more. RSVP at Eventbrite: http://pre-cassandrasf-happyhour.eventbrite.com/ Hope you can join us! -- Richard Low Acunu | http://www.acunu.com | @acunu
Re: Pig pulling an older value from cassandra
Jeremy did you get anywhere with this ? If you are reading at CL ONE Read Repair will run in the background, so it may only be visible to subsequent reads. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 6 Jul 2011, at 20:52, Jeremy Hanna wrote: > I'm seeing some strange behavior and not sure how it is possible. We updated > some data using a pig script and that wrote back to cassandra. We get the > value and list the value on the Cassandra CLI and it's the updated value - > from MARKET to market. However, when doing a pig script to filter by the > known good values, we are left with about 42k rows that still have MARKET. > If we list a subset of them, get the key, and get/list them on the CLI, they > are lowercase market. > > Anyone have any suggestions as to how this might be possible? Our read > repair chance is set to 1.0. > > Jeremy
Re: Re : result sorted by keys in reversed
> Is it possible to have same results sorting in reversed by another method
> without get_range_slice in JAVA ?

Sorry, I don't understand your question.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jul 2011, at 01:56, Monnom Monprenom wrote:
> Thanks,
>
> Is it possible to have same results sorting in reversed by another method
> without get_range_slice in JAVA ?
>
> From: Aaron Morton
> To: "user@cassandra.apache.org"
> Sent: Thursday, 7 July 2011, 2:52
> Subject: Re: result sorted by keys in reversed
>
> It's not currently supported via the api. But I *think* it's technically
> possible, the code could page backwards using the index sampling the same
> way it does for columns.
>
> Best advice is to raise a ticket on
> https://issues.apache.org/jira/browse/CASSANDRA (maybe do a search first,
> someone else may have requested it)
>
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/07/2011, at 1:39 AM, Monnom Monprenom wrote:
>
>> Hi,
>>
>> I am using get_range_slice and I get the results sorted by keys, Is it
>> possible to have the results also sorted by keys but in reverse (from the
>> biggest to the smallest)?
Re: Pig pulling an older value from cassandra
Not yet - we've updated the CassandraStorage with a patch we've done for CASSANDRA-2869 to see if that might indirectly do something to the inputs, but not sure it would affect that part of it. The hadoop default in ConfigHelper is CL ONE. I need to do some more focused study of that data. We just have multiple things we're trying to get working properly so I haven't had a chance yet. For example, we have checked all the possible scripts to make sure we're not introducing more of those, but haven't looked at the dates for those that we're seeing through pig to see when those were added. Things like that. Thanks for the response and I'll update this thread when we find out more. On Jul 8, 2011, at 3:30 PM, aaron morton wrote: > Jeremy did you get anywhere with this ? > > If you are reading at CL ONE Read Repair will run in the background, so it > may only be visible to subsequent reads. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 6 Jul 2011, at 20:52, Jeremy Hanna wrote: > >> I'm seeing some strange behavior and not sure how it is possible. We >> updated some data using a pig script and that wrote back to cassandra. We >> get the value and list the value on the Cassandra CLI and it's the updated >> value - from MARKET to market. However, when doing a pig script to filter >> by the known good values, we are left with about 42k rows that still have >> MARKET. If we list a subset of them, get the key, and get/list them on the >> CLI, they are lowercase market. >> >> Anyone have any suggestions as to how this might be possible? Our read >> repair chance is set to 1.0. >> >> Jeremy >
Re: List nodes where write was applied to
The logs will give you some idea, but it's not information that is available as part of a request. Turn the logging up to DEBUG and watch what happens. You will see the coordinator log where it is sending messages together with some unique identifiers that you will also see logged on the replicas. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 7 Jul 2011, at 10:01, A J wrote: > Is there a way to find what all nodes was a write applied to ? It > could be a successful write (i.e. w was met) or unsuccessful write > (i.e. less than w nodes were met). In either case, I am interested in > finding: > Number of nodes written to (before timeout or on success) > Name of nodes written to (before timeout or on success) > > Thanks.
Re: how large cassandra could scale when it need to do manual operation?
AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. Twitter is a vocal supporter with a large Apache Cassandra install, e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters."
http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011

If you are working with a 3 node cluster, removing/rebuilding/whatever one node will affect 33% of your capacity. When you scale up, the contribution from each individual node goes down, and the impact of one node going down is less. Problems that happen with a few nodes will go away at scale, to be replaced by a whole set of new ones.

> 1): the load balance need to manually performed on every node, according to:

Yes.

> 2): when adding new nodes, need to perform node repair and cleanup on every
> node

You only need to run cleanup, see http://wiki.apache.org/cassandra/Operations#Bootstrap

> 3) when decommission a node, there is a chance that slow down the entire
> cluster. (not sure why but I saw people ask around about it.) and the only
> way to do is shutdown the entire the cluster, rsync the data, and start all
> nodes without the decommission one.

I cannot remember any specific cases where decommission requires a full cluster stop, do you have a link? With regard to slowing down, the decommission process will stream data from the node you are removing onto the other nodes; this can slow down the target node (I think it's more intelligent now about what is moved). This will be exaggerated in a 3 node cluster, as you are removing 33% of the processing and adding some (temporary) extra load to the remaining nodes.

> after all, I think there is alot of human work to do to maintain the cluster
> which make it impossible to scale to thousands of nodes,

Automation, Automation, Automation is the only way to go.

Chef, Puppet, CFEngine for general config and deployment; Cloudkick, munin, ganglia etc. for monitoring. And OpsCenter (http://www.datastax.com/products/opscenter) for Cassandra-specific management.

> I am totally wrong about all of this, currently I am serving 1 millions pv
> every day with Cassandra and it make me feel unsafe, I am afraid one day one
> node crash will cause the data broken and all cluster goes wrong

With RF 3 and a 3 node cluster you have room to lose one node and the cluster will be up for 100% of the keys. While better than having to worry about *the* database server, it's still entry-level fault tolerance. With RF 3 in a 6 node cluster you can lose up to 2 nodes and still be up for 100% of the keys.

Is there something you are specifically concerned about with your current installation?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 08:50, Yan Chunlu wrote:
> hi, all:
> I am curious about how large that Cassandra can scale?
>
> from the information I can get, the largest usage is at facebook, which is
> about 150 nodes. in the mean time they are using 2000+ nodes with Hadoop,
> and yahoo even using 4000 nodes of Hadoop.
>
> I am not understand why is the situation, I only have little knowledge with
> Cassandra and even no knowledge with Hadoop.
>
> currently I am using cassandra with 3 nodes and having problem bring one back
> after it out of sync, the problems I encountered making me worry about how
> cassandra could scale out:
>
> 1): the load balance need to manually performed on every node, according to:
>
> def tokens(nodes):
>     for x in xrange(nodes):
>         print 2 ** 127 / nodes * x
>
> 2): when adding new nodes, need to perform node repair and cleanup on every
> node
>
> 3) when decommission a node, there is a chance that slow down the entire
> cluster. (not sure why but I saw people ask around about it.) and the only
> way to do is shutdown the entire the cluster, rsync the data, and start all
> nodes without the decommission one.
> > > > > > after all, I think there is alot of human work to do to maintain the cluster > which make it impossible to scale to thousands of nodes, but I hope I am > totally wrong about all of this, currently I am serving 1 millions pv every > day with Cassandra and it make me feel unsafe, I am afraid one day one node > crash will cause the data broken and all cluster goes wrong > > > > in the contrary, relational database make me feel safety but it does not > scale well. > > > > thanks for any guidance here. >
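The replication-factor arithmetic in the reply above can be checked with a toy model. It rests on assumptions not stated in the thread: ranges are equal-sized, each range lives on its owner plus the next RF - 1 nodes around the ring, and `required` is the number of live replicas a request needs (for instance 1 for a read at ONE, 2 for a quorum at RF 3). The function names are hypothetical.

```python
from itertools import combinations


def replicas(range_index, node_count, rf):
    """Nodes holding a range: its owner plus the next rf - 1 nodes on the ring."""
    return {(range_index + i) % node_count for i in range(rf)}


def available_fraction(node_count, rf, dead, required):
    """Fraction of ranges that still have `required` live replicas."""
    ok = sum(
        1
        for r in range(node_count)
        if len(replicas(r, node_count, rf) - set(dead)) >= required
    )
    return ok / node_count


def worst_case(node_count, rf, failures, required):
    """Lowest available fraction over every way of losing `failures` nodes."""
    return min(
        available_fraction(node_count, rf, dead, required)
        for dead in combinations(range(node_count), failures)
    )
```

Under these assumptions, `worst_case(3, 3, 1, 2)` and `worst_case(6, 3, 2, 1)` both come out at full availability, while requiring two live replicas in the 6 node case can leave some ranges short when the two failed nodes are neighbours. How many nodes you can lose therefore depends on the consistency level in use.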
Re: Corrupted data
You may not lose data. - What version and whats the upgrade history? - What RF / node count / CL ? - Have you been running repair consistently ? - Is this on a single node or all nodes ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote: > Hi everyone, > > I'm having thousands of these errors: > > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 > CompactionManager.java (line 737) Non-fatal error reading row > (stacktrace follows) > java.io.IOError: java.io.IOException: Impossible row size > 6292724931198053 > at > org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719) > at > org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633) > at org.apache.cassandra.db.compaction.CompactionManager.access > $600(CompactionManager.java:65) > at org.apache.cassandra.db.compaction.CompactionManager > $3.call(CompactionManager.java:250) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.IOException: Impossible row size 6292724931198053 > ... 9 more > INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 > CompactionManager.java (line 743) Retrying from row index; data is -8 > bytes starting at 4735525245 > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 > CompactionManager.java (line 767) Retry failed too. Skipping to next > row (retry's stacktrace follows) > java.io.IOError: java.io.EOFException: bloom filter claims to be > 863794556 bytes, longer than entire row size -8 > > > THis is during scrub, as I saw similar errors while in normal operation. > Is there anything I can do? 
It looks like I'm going to lose a ton of > data >
Performance deterioration while building secondary index
I have roughly 150 million rows in my database, which will grow as I continue testing. I'm building an index on a particular column via cassandra-cli, something of the sort:

update column family jobs with column_metadata = [{column_name : 'DATE', validation_class : AsciiType, index_type : 0, index_name : 'date'}]

At this point, the cluster just becomes unresponsive: just doing "list" on a CF takes a while. A random query test that I used to run rather quickly becomes terribly slow (it hasn't returned since I started typing this).

Is that normal? I can't imagine this happening in a production situation, when I decide to add an index for some valid reason. Really scratching my head now. TIA.

The version is 0.8.1.

Thanks!
node stuck "leaving"
I've got a node that is stuck "Leaving" the ring. Running "nodetool decommission" never terminates. It's been in this state for about a week, and the load has not decreased:

$ nodetool -h localhost ring
Address  DC           Rack   Status  State    Load       Owns    Token
                                                                 Token(bytes[de4075d0a474c4a773efa2891c020529])
x.x.x.1  datacenter1  rack1  Up      Leaving  150.63 GB  33.33%  Token(bytes[10956f12b46304bf70412ad0eac14344])
x.x.x.2  datacenter1  rack1  Up      Normal   79.21 GB   33.33%  Token(bytes[50af14df71eafac7bac60fbc836c6722])
x.x.x.3  datacenter1  rack1  Up      Normal   60.74 GB   33.33%  Token(bytes[de4075d0a474c4a773efa2891c020529])

Any ideas?

Regards,
Casey
Re: Performance deterioration while building secondary index
My guess: index build isn't respecting the background i/o throttle. On Fri, Jul 8, 2011 at 5:55 PM, Maxim Potekhin wrote: > I have roughly 150 million rows in my database, which will grow as I > continue testing. I'm building an index on a particular column, via > cassandra-cli, something of the sort: > update column family jobs with column_metadata = [{column_name : 'DATE', > validation_class : AsciiType, index_type : 0, index_name : 'date'}] > > At this point, the cluster just becomes unresponsive -- just doing "list" on > a CF takes a while. Random query test I used to run rather quickly, becomes > terribly slow (hasn't returned since I started typing this). > > Is that normal? I can' imagine this happening in a production situation, > when I decided to add an index for some valid reasons. Really scratching my > head now. TIA. > > The version is 0.8.1 > > Thanks! > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Corrupted data
Hi Aaron,

On Fri, 08-07-2011 at 14:47 -0700, aaron morton wrote:
> You may not lose data.
>
> - What version and what's the upgrade history?

All versions from 0.7.1 to 0.8.1. All CFs were in 0.8.1 format, though.

> - What RF / node count / CL ?

RF=3, node count = 6

> - Have you been running repair consistently ?

Nope, only when something breaks.

> - Is this on a single node or all nodes ?

A couple of nodes. Scrub reported a few thousand columns that it could not restore.

>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:
>
> > Hi everyone,
> >
> > I'm having thousands of these errors:
> >
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
> > java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
> >     at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
> >     at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
> >     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
> >     at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.IOException: Impossible row size 6292724931198053
> >     ... 9 more
> > INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too. Skipping to next row (retry's stacktrace follows)
> > java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8
> >
> >
> > This is during scrub, as I saw similar errors while in normal operation.
> > Is there anything I can do? It looks like I'm going to lose a ton of
> > data
> >
>
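[Editor's sketch] The messages in the log above ("Impossible row size", "bloom filter claims to be ... longer than entire row size -8") read like bounds checks on length fields read from the data file. The Python below is only a rough illustration of that kind of check. It is not Cassandra's scrub code, the function names are hypothetical, and the 5 GiB file size is an assumption made up for the example; only the three numbers are taken from the log.

```python
# Illustrative only: NOT Cassandra's scrub implementation, just the kind of
# bounds check that the log messages above suggest.

def plausible_row_size(row_size: int, bytes_remaining: int) -> bool:
    """A row cannot be negative or larger than what is left in the file."""
    return 0 <= row_size <= bytes_remaining


def plausible_filter_size(filter_size: int, row_size: int) -> bool:
    """A per-row bloom filter cannot be larger than the row that contains it."""
    return 0 <= filter_size <= row_size


# Numbers from the log; the file size is a made-up assumption.
ASSUMED_FILE_SIZE = 5 * 1024**3   # hypothetical 5 GiB data file
ROW_START = 4735525245            # "starting at 4735525245"
remaining = ASSUMED_FILE_SIZE - ROW_START

print(plausible_row_size(6292724931198053, remaining))  # False
print(plausible_filter_size(863794556, -8))             # False
```

Under these assumptions both checks fail, which is consistent with scrub skipping the row instead of trusting the length fields.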