date:20110708

Command Request: rename a column

2011-07-08 Thread AJ



I think it would be really cool to be able to rename a column, or, more 
generally, a move command to move data from one column to another in the 
same CF without the client having to read and resend the column value.  
This would be *extremely* powerful, imo.  I suspect the execution would 
be quick and could even be made atomic (per node) as I suspect it would 
mostly entail only reference updates.  Has anything like this been 
discussed before?  Seems like such a natural operation for a 
hash-table-like data store.


aj

Re: Command Request: rename a column

2011-07-08 Thread Sylvain Lebresne

On Fri, Jul 8, 2011 at 9:22 AM, AJ  wrote:
>
> I think it would be really cool to be able to rename a column, or, more
> generally, a move command to move data from one column to another in the
> same CF without the client having to read and resend the column value.  This
> would be extremely powerful, imo.  I suspect the execution would be quick
> and could even be made atomic (per node) as I suspect it would mostly entail
> only reference updates.

Cassandra doesn't work like that. We would have no other choice than to read the
column and write it back with a different name (and it would not be atomic). So
the only win we would get from doing this server side would lie in not
transferring the value across the network.

--
Sylvain
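
A minimal sketch of the point above, assuming a toy in-memory store: a rename amounts to reading the old column, writing it under the new name, and deleting the old one. The `rows` dict and the `rename_column` helper are hypothetical stand-ins for illustration, not a Cassandra API. The three steps are separate operations, so a concurrent reader could observe the intermediate state.

```python
# Toy model (assumption): row key -> {column name -> value}. Not a Cassandra API.
rows = {"user42": {"old_name": b"large value"}}


def rename_column(rows, key, old, new):
    """Rename a column by read + write + delete; the steps are not atomic."""
    row = rows[key]
    value = row[old]   # 1. read the existing column value
    row[new] = value   # 2. write it back under the new name
    del row[old]       # 3. remove the old column (a real store would write a tombstone)
    return row


rename_column(rows, "user42", "old_name", "new_name")
```

Doing the same three steps server side would change where the value travels, not how many steps there are.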

Re: Command Request: rename a column

2011-07-08 Thread AJ


On 7/8/2011 2:18 AM, Sylvain Lebresne wrote:
> On Fri, Jul 8, 2011 at 9:22 AM, AJ wrote:
>> I think it would be really cool to be able to rename a column, or, more
>> generally, a move command to move data from one column to another in the
>> same CF without the client having to read and resend the column value.  This
>> would be extremely powerful, imo.  I suspect the execution would be quick
>> and could even be made atomic (per node) as I suspect it would mostly entail
>> only reference updates.
>
> Cassandra doesn't work like that. We would have no other choice than to read the
> column and write it back with a different name

I figured as much :)  Not that bad though.

> (and it would not be atomic). So
> the only win we would get from doing this server side would lie in not
> transferring the value across the network.

That would be the main benefit I think, esp. with large values.

> --
> Sylvain

Re: What does a write lock ?

2011-07-08 Thread William Oberman

Questions like this seem to come up a lot:
http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no
http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily
http://www.mail-archive.com/user@cassandra.apache.org/msg14701.html

Let's say you read state A (from one key in one CF), you change the data to
A' in your client, and you write A'.  Are you worried that someone else
might have changed A to B during this process (making the "new" state a race
between A' and B)?  It doesn't sound to me like you are...  It sounds to me
like you're worried about a set of columns for the key being in a consistent
state before, during, and after a process.  And A -> A' and A -> B will each
be atomic for the key (based on my understanding).  But, if A' and B are
changes to a different set of columns, I believe that would interleave,
which itself could be "inconsistent" from your application's point of view.


will

On Thu, Jul 7, 2011 at 11:41 PM, Jeffrey Kesselman  wrote:

> Really, as I lay in the bath thinking about it, I concluded what I am
> looking for is a very limited form of Consistency.
>
> It's consistency over a single row on a single node just for the period of
> update.
>
>
> On Thu, Jul 7, 2011 at 10:34 PM, Jeffrey Kesselman wrote:
>
>> It's not really isolation, btw, because we
>> aren't talking about anyone seeing an update mid-update. Rather, we
>> are talking about when updates are allowed to occur.
>>
>> Atomicity means that all the updates happen together or they don't happen
>> at all.
>> Isolation means that no results of the update are visible until the entire
>> update operation is complete.
>>
>> This really lies somewhere in the middle of the two concepts.  It's part
>> of the results of the combined effects of ACID.
>>
>>
>> On Thu, Jul 7, 2011 at 10:27 PM, Jonathan Ellis wrote:
>>
>>> Sounds to me like you're confusing atomicity with isolation.
>>>
>>> On Thu, Jul 7, 2011 at 2:54 PM, Jeffrey Kesselman wrote:
>>> > Yup, I'm even more confused. Let's talk about the model, not the
>>> > implementation.
>>> > AIUI updates to a row are atomic across all columns in that row at
>>> > once, true?
>>> > If true then the next question is, does the validation happen inside or
>>> > outside of that guarantee, and is the row guaranteed not to change
>>> > between validation and update?
>>> > If that is *not* the case then it makes a whole class of solutions to
>>> > synchronization problems fail and puts my larger project
>>> > in serious question.
>>> >
>>> > On Thu, Jul 7, 2011 at 3:43 PM, Yang  wrote:
>>> >>
>>> >> No, the memtable is a ConcurrentSkipListMap.
>>> >>
>>> >> insertion can happen in parallel
>>> >>
>>> >> On Jul 7, 2011 9:24 AM, "Jeffrey Kesselman"  wrote:
>>> >> > This has me more confused.
>>> >> >
>>> >> > Does this mean that ALL rows on a given node are only updated
>>> >> > sequentially,
>>> >> > never in parallel?
>>> >> >
>>> >> > On Thu, Jul 7, 2011 at 3:21 PM, Yang  wrote:
>>> >> >
>>> >> >> Just to add onto what Jonathan said:
>>> >> >>
>>> >> >> The columns are immutable. If you overwrite/reconcile, a new object is
>>> >> >> created and shoved into the memtable.
>>> >> >>
>>> >> >> There is a shared lock for all writes, though, which guards against an
>>> >> >> exclusive lock on memtable switching/flushing.
>>> >> >> On Jul 7, 2011 7:51 AM, "A J"  wrote:
>>> >> >> > Does a write lock:
>>> >> >> > 1. Just the columns in question for the specific row in question?
>>> >> >> > 2. The full row in question?
>>> >> >> > 3. The full CF?
>>> >> >> >
>>> >> >> > I doubt read does any locks.
>>> >> >> >
>>> >> >> > Thanks.
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > It's always darkest just before you are eaten by a grue.
>>> >
>>> >
>>> >
>>> > --
>>> > It's always darkest just before you are eaten by a grue.
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>>
>>
>> --
>> It's always darkest just before you are eaten by a grue.
>>
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>

Re: What does a write lock ?

2011-07-08 Thread William Oberman

Disregard most of my post (already).  I forgot that reads aren't isolated.
 That means A and B are states cassandra will *eventually* be in, but at any
point in time a read might see a "partial B" (where some columns are still
A, and others are B).  Though, I'm sure someone else will confirm if I'm
wrong yet again.

For me, if I need two pieces of data to be consistently related to each
other and stored in cassandra, I encode them (usually JSON) and store them
in one column.

will
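
A minimal sketch of the single-column approach described above, assuming two related fields that must always be read together. The field names (`balance`, `last_txn_id`) and both helpers are hypothetical; only the standard `json` module is used, and the resulting string is what would be stored as one column value.

```python
import json


def encode_pair(balance, last_txn_id):
    """Pack two related fields into a single column value."""
    return json.dumps({"balance": balance, "last_txn_id": last_txn_id}, sort_keys=True)


def decode_pair(column_value):
    """Unpack the fields that were written together."""
    data = json.loads(column_value)
    return data["balance"], data["last_txn_id"]


column_value = encode_pair(100, "txn-7")        # written as one column
balance, last_txn_id = decode_pair(column_value)  # read back as one unit
```

Because both fields live in one column value, a reader sees either the old pair or the new pair, never one field from each.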


Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

Not quite, it's more limited and specific.

The order of operations is all within the Cassandra node server and looks
like this...

We have one row, A.  That's the only row being operated on.

Client -> submits A'
Server does the following:
(1) Validate function reads current A
(2) Validate function validates A' vs. A
(3) If validation succeeds, allows update to A'.

My fear/concern is that after 1 and before 3, a second update to A'' comes
in and changes the "current" value of A, therefore invalidating my
validation check, see?

If Cassandra does not guard against this then one possible solution would be
to make my own key-to-mutex map in memory, lock the mutex for A's key as a
precursor to (1) and release it in a post-update function.  But I am always
very nervous about inserting locking into a process that wasn't designed
with it already in mind...
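
A rough sketch of the key-to-mutex idea, under the assumption that everything runs inside one process. `KeyedLocks`, `validated_update`, and the dict-backed `store` are hypothetical names for illustration and are not part of Cassandra. The lock is held across steps (1) to (3), so a second update to the same key cannot slip in between validation and write.

```python
import threading
from collections import defaultdict


class KeyedLocks:
    """Hands out one mutex per row key (in-process only)."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the map itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]


store = {"A": 1}          # toy stand-in for the row data
locks = KeyedLocks()


def validated_update(key, new_value, is_valid):
    with locks.lock_for(key):                  # held for steps (1)-(3)
        current = store.get(key)               # (1) read current A
        if not is_valid(current, new_value):   # (2) validate A' vs. A
            return False
        store[key] = new_value                 # (3) allow update to A'
        return True


def only_increment(current, new):
    return new == current + 1


ok = validated_update("A", 2, only_increment)     # 1 -> 2 passes validation
stale = validated_update("A", 2, only_increment)  # 2 -> 2 is rejected
```

Note the limitation: a map like this serializes updates only within the process that owns it, so it says nothing about writes arriving through another node or replica.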



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-08 Thread A J

I think node repair involves some compaction too. See the issue:
https://issues.apache.org/jira/browse/CASSANDRA-2811
It talks of 'validation compaction' being triggered concurrently
during node repair.

On Thu, Jun 30, 2011 at 8:51 PM, Watanabe Maki  wrote:
> Repair doesn't compact. Those are different processes already.
>
> maki
>
>
> On 2011/07/01, at 7:21, A J  wrote:
>
>> Thanks all !
>> In other words, I think it is safe to say that a node as a whole can
>> be made consistent only on 'nodetool repair'.
>>
>> Has there been enough interest in providing anti-entropy without
>> compaction as a separate operation (nodetool repair does both) ?
>>
>>
>> On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis  wrote:
>>> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo wrote:
>>>> Read repair does NOT repair tombstones.
>>>
>>> It does, but you can't rely on RR to repair _all_ tombstones, because
>>> RR only happens if the row in question is requested by a client.
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>

Re: What does a write lock ?

2011-07-08 Thread Jonathan Ellis

It doesn't look like that at all.

Row A exists.

Client submits mutation Am.  This is not necessarily a full row.

Coordinator validates Am.

If validation succeeds, coordinator sends Am to the replica owners,
effectively creating A'.

Neither A nor A' is ever explicitly assembled on the write path.
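
A toy illustration of the write path described above, assuming a replica's row is modeled as a dict of column name to `(value, timestamp)`. `apply_mutation` is a hypothetical helper, and ties on timestamp are simply ignored here as a simplification. The mutation Am touches only the columns it names; the full row A' is never built up front, it is just whatever the merged columns add up to.

```python
def apply_mutation(row, mutation):
    """Merge a partial mutation into a row, column by column; newest timestamp wins."""
    for name, (value, ts) in mutation.items():
        existing = row.get(name)
        if existing is None or ts > existing[1]:
            row[name] = (value, ts)
    return row


# Row A as stored on a replica: column -> (value, timestamp)
row_a = {"name": ("alice", 1), "email": ("a@example.com", 1)}

# Mutation Am: not a full row, only one column
am = {"email": ("alice@example.com", 2)}

apply_mutation(row_a, am)   # row_a now plays the role of A'
```

Two mutations that touch different columns merge independently in this model, which is why their effects can interleave from the application's point of view.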


Re: What does a write lock ?

2011-07-08 Thread William Oberman

I think you need to look into Zookeeper, or other distributed coordinator,
as you have little/no guarantees from cassandra between 1-3 (in terms of the
guarantees you want and need).

And my terminology in my post is different than yours.  My "client" == your
"server".  Specifically, I was thinking in terms of:
user -> cassandra client code (that runs on a "server") -> cassandra server
code (e.g. cassandra itself) that runs either on the same or different
server



Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-08 Thread Jonathan Ellis

That's an internal term meaning "background I/O," not sstable merging per se.

On Fri, Jul 8, 2011 at 9:24 AM, A J  wrote:
> I think node repair involves some compaction too. See the issue:
> https://issues.apache.org/jira/browse/CASSANDRA-2811
> It talks of 'validation compaction' being triggered concurrently
> during node repair.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: What does a write lock ?

2011-07-08 Thread William Oberman

Also, one point of early confusion for me is that there is a slightly different
definition of "atomicity" depending on whether you're talking software vs.
database, and I'm a "software guy".  From Wikipedia:

Software = Atomicity is a guarantee of isolation from concurrent processes.
Additionally, atomic operations commonly have a succeed-or-fail definition —
they either successfully change the state of the system, or have no visible
effect.

Database = In an atomic transaction, a series of database operations either
all occur, or nothing occurs.

I believe that cassandra is using the database definition.

will


On Fri, Jul 8, 2011 at 10:35 AM, William Oberman
wrote:

> I think you need to look into Zookeeper, or other distributed coordinator,
> as you have little/no guarantees from cassandra between 1-3 (in terms of the
> guarantees you want and need).
>
> And my terminology in my post is different than yours.  My "client" == your
> "server".  Specifically, I was thinking in terms of:
> user -> cassandra client code (that runs on a "server") -> cassandra server
> code (e.g. cassandra itself) that runs either on the same or different
> server
>
>
>
> On Fri, Jul 8, 2011 at 10:22 AM, Jeffrey Kesselman wrote:
>
>> Not quite, its more limited and specific
>>
>> The order of operations is all within the Cassandra node server and looks
>> like this this...
>>
>> We have one row, A.  Thats the only row being operated on.
>>
>> Client -> submits A'
>> Server does the following:
>> (1) Validate function reads current A
>> (2) Validate function validates A' vs. A
>> (3) If validation succeeds, allows update to A'.
>>
>> My fear/concern is that after 1 and before 3, a second update to A'' comes
>> in and changes the "current" value of A, therefor invalidating my
>> validation check, see?
>>
>> If Cassandra does not guard against this then one possible
>> solution would be to make my own key-to-mutex map in memory, lock the mutex
>> for A's key as a precursor to (1) and release it in a post-update function.
>>  But I am always very nervous about inserting locking into a process that
>> wasn't designed with it already in mind...
>>
>>
>> On Fri, Jul 8, 2011 at 8:30 AM, William Oberman > > wrote:
>>
>>> Questions like this seem to come up a lot:
>>>
>>> http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no
>>>
>>> http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg14701.html
>>>
>>> Lets say you read state A (from one key in one CF), you change the data
>>> to A' in your client, and you write A'.  Are you worried that someone else
>>> might have changed A to B during this process (making the "new" state a race
>>> between A' and B)?  It doesn't sound to me like you are...  It sounds to me
>>> like you're worried about a set of columns for the key being in a consistent
>>> state before, during, and after a process.  And A -> A' and A -> B will each
>>> be atomic for the key (based on my understanding).  But, if A' and B are
>>> changes to a different set of columns, I believe that would interleave,
>>> which itself could be "inconsistent" from your application's point of view.
>>>
>>>
>>> will
>>>
>>> On Thu, Jul 7, 2011 at 11:41 PM, Jeffrey Kesselman wrote:
>>>
>>>> Really, as I lay in the bath thinking about it, I concluded what I am
>>>> looking for is a very limited form of Consistency.
>>>>
>>>> It's consistency over a single row on a single node just for the period
>>>> of update.
>>>>
>>>>
>>>> On Thu, Jul 7, 2011 at 10:34 PM, Jeffrey Kesselman wrote:
>>>>
>>>>> It's not really isolation, btw, because we
>>>>> aren't talking about anyone seeing an update mid-update. Rather, we
>>>>> are talking about when updates are allowed to occur.
>>>>>
>>>>> Atomicity means that all the updates happen together or they don't
>>>>> happen at all.
>>>>> Isolation means that no results of the update are visible until the
>>>>> entire update operation is complete.
>>>>>
>>>>> This really lies somewhere in the middle of the two concepts.   It's
>>>>> part of the results of the combined effects of ACID
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2011 at 10:27 PM, Jonathan Ellis wrote:
>>>>>
>>>>>> Sounds to me like you're confusing atomicity with isolation.
>>>>>>
>>>>>> On Thu, Jul 7, 2011 at 2:54 PM, Jeffrey Kesselman
>>>>>> wrote:
>>>>>> > Yup, I'm even more confused. Let's talk about the model, not the
>>>>>> > implementation.
>>>>>> > AIUI updates to a row are atomic across all columns in that row at
>>>>>> once,
>>>>>> > true?
>>>>>> > If true then the next question is, does the validation happen inside
>>>>>> or
>>>>>> > outside of that guarantee, and is the row guaranteed not to change
>>>>>> between
>>>>>> > validation and update?
>>>>>> > If that is *not* the case then it makes a whole class
>>>>>> of solutions to
>>>>>> > synchronization problems fail and put

Re: 'select * from ' - FTS or Index

2011-07-08 Thread Eric Evans

On Thu, 2011-07-07 at 19:34 -0400, A J wrote:
> Does a 'select * from '  with no filter still use the primary
> index on the key or do a 'full table scan' ? 

It's the equivalent of a range slice with no starting or ending keys,
and no starting or ending columns.  Indexing cannot save you; This is an
expensive query.

-- 
Eric Evans
eev...@rackspace.com

Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

I am confused by what you mean by "Cassandra client code."  Is this part of
the Cassandra server?

My architecture is that my "user" talks Thrift to Cassandra.

Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

Where does a custom validation method run?

Given that it is validating a row update, my assumption was that it ran on
the node that "owns" the row.  That would make sense to me, as it would
fulfill the NoSQL philosophy of taking computation to data, rather than data
to computation.

I don't follow the relevance of the rest of your comment, sorry.

On Fri, Jul 8, 2011 at 10:34 AM, Jonathan Ellis  wrote:

> It doesn't look like that at all.
>
> Row A exists.
>
> Client submits mutation Am.  This is not necessarily a full row.
>
> Coordinator validates Am.
>
> If validation succeeds, coordinator sends Am to the replica owners,
> effectively creating A'.
>
> Neither A nor A' is ever explicitly assembled on the write path.
>
> On Fri, Jul 8, 2011 at 9:22 AM, Jeffrey Kesselman 
> wrote:
> > Not quite, its more limited and specific
> > The order of operations is all within the Cassandra node server and looks
> > like this this...
> > We have one row, A.  Thats the only row being operated on.
> > Client -> submits A'
> > Server does the following:
> > (1) Validate function reads current A
> > (2) Validate function validates A' vs. A
> > (3) If validation succeeds, allows update to A'.
> > My fear/concern is that after 1 and before 3, a second update to A''
> comes
> > in and changes the "current" value of A, therefor invalidating my
> > validation check, see?
> > If Cassandra does not guard against this then one possible
> solution would be
> > to make my own key-to-mutex map in memory, lock the mutex for A's key as
> a
> > precursor to (1) and release it in a post-update function.  But I am
> always
> > very nervous about inserting locking into a process that wasn't designed
> > with it already in mind...
> >
> > On Fri, Jul 8, 2011 at 8:30 AM, William Oberman <
> ober...@civicscience.com>
> > wrote:
> >>
> >> Questions like this seem to come up a lot:
> >>
> >>
> http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no
> >>
> >>
> http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily
> >> http://www.mail-archive.com/user@cassandra.apache.org/msg14701.html
> >> Lets say you read state A (from one key in one CF), you change the data
> to
> >> A' in your client, and you write A'.  Are you worried that someone else
> >> might have changed A to B during this process (making the "new" state a
> race
> >> between A' and B)?  It doesn't sound to me like you are...  It sounds to
> me
> >> like you're worried about a set of columns for the key being in a
> consistent
> >> state before, during, and after a process.  And A -> A' and A -> B will
> each
> >> be atomic for the key (based on my understanding).  But, if A' and B are
> >> changes to a different set of columns, I believe that would interleave,
> >> which itself could be "inconsistent" from your application's point of
> view.
> >>
> >> will
> >>
> >> On Thu, Jul 7, 2011 at 11:41 PM, Jeffrey Kesselman 
> >> wrote:
> >>>
> >>> Really, as i lay in the bath thinking nabout it, I concluded what I am
> >>> looking for is a very limited form of Consistency.
> >>> Its consistency over a single row on a single node just for the period
> of
> >>> update.
> >>>
> >>> On Thu, Jul 7, 2011 at 10:34 PM, Jeffrey Kesselman 
> >>> wrote:
> 
> >>>> It's not really isolation, btw, because we
> >>>> aren't talking about anyone seeing an update mid-update. Rather, we
> >>>> are talking about when updates are allowed to occur.
> >>>> Atomicity means that all the updates happen together or they don't
> >>>> happen at all.
> >>>> Isolation means that no results of the update are visible until the
> >>>> entire update operation is complete.
> >>>> This really lies somewhere in the middle of the two concepts.   It's
> part
> >>>> of the results of the combined effects of ACID
> >>>>
> >>>>
> >>>> On Thu, Jul 7, 2011 at 10:27 PM, Jonathan Ellis
> >>>> wrote:
> >
> > Sounds to me like you're confusing atomicity with isolation.
> >
> > On Thu, Jul 7, 2011 at 2:54 PM, Jeffrey Kesselman 
> > wrote:
> > > Yup, im even more confused.Lets talk about the model, not the
> > > implementation.
> > > AIUI updates to a row are atomic across all columns in that row at
> > > once,
> > > true?
> > > If true then the next question is, does the validation happen
> inside
> > > or
> > > outside of that guarantee, and is the row guaranteed not to change
> > > between
> > > validation and update?
> > > If that is *not* the case then it makes a whole class
> of solutions to
> > > synchronization problems fail and puts my larger project
> > > in serious question.
> > >
> > > On Thu, Jul 7, 2011 at 3:43 PM, Yang 
> wrote:
> > >>
> > >> no , the memtable is a concurrentskiplistmap
> > >>
> > >> insertion can happen in parallel
> > >>
> > >> On Jul 7, 2011 9:24 AM, "Jeffrey Kesselman" 
> >>>

Re: What does a write lock ?

2011-07-08 Thread William Oberman

I use a language-specific wrapper around Thrift as my "client", but yes, I
guess I fundamentally mean Thrift == client, and the Cassandra server ==
server.

will

On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman  wrote:

> I am confused by what you mean by "Cassandra client code."  Is this part of
> the Cassnadra server?
>
> My architecture is my "user" talks thrift to Cassandra.
>
>
>

Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

Alright,

So are you saying the column validator, as specified by
conf/storage-conf.xml, is checked in the client interface library and not
on the server side?  That seems odd to me on a number of levels, not the
least being that I can't see how Thrift could autogenerate that for
different languages, or how those other languages would use a Java class.
On Fri, Jul 8, 2011 at 11:13 AM, William Oberman
wrote:

> I use a language specific wrapper around thrift as my "client", but yes, I
> guess I fundamentally mean thrift == client, and the cassandra server ==
> server.
>
> will
>
>
> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
>
>> I am confused by what you mean by "Cassandra client code."  Is this part
>> of the Cassnadra server?
>>
>> My architecture is my "user" talks thrift to Cassandra.
>>
>>
>>
>

-- 
It's always darkest just before you are eaten by a grue.

Re: What does a write lock ?

2011-07-08 Thread William Oberman

I haven't ever written my own org.apache.cassandra.db.marshal.AbstractType
(which I think is what you're talking about), so I have no idea.

Looking up the JavaDoc for that class, validate says "validate that the byte
array is a valid sequence for the type we are supposed to be comparing",
which sounds like a local operation to me (e.g. it shouldn't fetch remote
data, it's just saying "yep, this is a valid member of type T").
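
To make the "local operation" reading concrete, here is a minimal Python
sketch. It is not Cassandra's Java API: validate_ascii is a hypothetical
stand-in for an AbstractType-style validate hook. It only inspects the bytes
it is handed and never reads stored data, so it has no way to compare a new
value against the current row.

```python
def validate_ascii(value):
    """Raise ValueError unless every byte in `value` is 7-bit ASCII.

    Hypothetical analogue of a type validator: purely local, no reads.
    """
    for position, byte in enumerate(bytearray(value)):
        if byte > 0x7F:
            raise ValueError("non-ASCII byte 0x%02x at offset %d" % (byte, position))


validate_ascii(b"market")            # a valid member of the type: no error
try:
    validate_ascii(b"caf\xc3\xa9")   # UTF-8 encoded text, not plain ASCII
except ValueError as error:
    print("rejected: %s" % error)
```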

will

On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman  wrote:

> Alright,
>
> So are you saying the column validator, as specified
> by conf/storage-conf.xml is checked in the client interface library and not
> on the server side?  That seems odd to me on a number of levels, not the
> least being I cant see how thrift could autogenerate that
> for different languages or how those other languages would use a Java class.
> *
> *
> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman wrote:
>
>> I use a language specific wrapper around thrift as my "client", but yes, I
>> guess I fundamentally mean thrift == client, and the cassandra server ==
>> server.
>>
>> will
>>
>>
>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
>>
>>> I am confused by what you mean by "Cassandra client code."  Is this part
>>> of the Cassnadra server?
>>>
>>> My architecture is my "user" talks thrift to Cassandra.
>>>
>>>
>>>
>>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>

how large cassandra could scale when it need to do manual operation?

2011-07-08 Thread Yan Chunlu

Hi all,
I am curious about how large Cassandra can scale.

From the information I can find, the largest deployment is at Facebook, at
about 150 nodes.  Meanwhile they run 2000+ nodes of Hadoop, and Yahoo even
runs 4000 nodes of Hadoop.

I do not understand why that is the case; I have only a little knowledge of
Cassandra and none of Hadoop.

Currently I am running Cassandra on 3 nodes and am having trouble bringing
one back after it fell out of sync.  The problems I have encountered make me
worry about how Cassandra could scale out:

1) Load balancing needs to be performed manually on every node, according
to:

def tokens(nodes):
    for x in range(nodes):
        print(2 ** 127 // nodes * x)



2) When adding new nodes, node repair and cleanup need to be run on every
node.

3) When decommissioning a node, there is a chance that it slows down the
entire cluster (not sure why, but I have seen people ask about it), and the
only way to do it is to shut down the entire cluster, rsync the data, and
start all nodes without the decommissioned one.
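
A small sketch of why point 1 turns into manual work as the cluster grows,
using the same token formula as above. It assumes the 0 .. 2**127 token range
that the formula implies; balanced_tokens is a name made up for this
illustration and is not part of any Cassandra tool.

```python
def balanced_tokens(node_count):
    """Evenly spaced tokens over the 0 .. 2**127 range, one per node."""
    return [2 ** 127 // node_count * i for i in range(node_count)]


before = balanced_tokens(3)
after = balanced_tokens(4)

# Tokens that stay the same when growing from 3 to 4 evenly balanced nodes.
unchanged = sorted(set(before) & set(after))
print(unchanged)  # [0]
```

Under this formula only the first token survives the resize, so keeping the
ring perfectly even after adding one node means moving the other existing
nodes as well.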





All in all, I think there is a lot of human work involved in maintaining the
cluster, which makes it impossible to scale to thousands of nodes.  I hope I
am totally wrong about all of this.  Currently I am serving 1 million page
views every day with Cassandra and it makes me feel unsafe; I am afraid that
one day a single node crash will corrupt the data and bring down the whole
cluster.

In contrast, a relational database makes me feel safe, but it does not
scale well.

Thanks for any guidance here.

Re: What does a write lock ?

2011-07-08 Thread Nate McCall

Validation occurs at the API level, returning an
InvalidRequestException to the caller of the API (a thrift client in
this case). Specifically, a mutation will not be scheduled for the
storage until it has been validated at the API level.

If the intention is to do a read-before-write validation as an
AbstractType extension, then yes, the underlying value could indeed
change between validation and storage. If that is the goal, you need
to implement locking externally (via ZooKeeper or similar, as previously
mentioned).
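
As a rough illustration of the race and of what the lock buys you, here is a
Python sketch of the key-to-mutex idea. Everything in it is an assumption for
illustration: store is a plain dict standing in for the stored rows, and
validated_update and lock_for are made-up names, not Cassandra or ZooKeeper
calls. An in-process lock like this only serializes writers inside one
process; with several coordinators the lock has to live in an external
service.

```python
import threading
from collections import defaultdict

store = {}                                  # stand-in for stored rows, not Cassandra
_guard = threading.Lock()                   # protects the lock map itself
_key_locks = defaultdict(threading.Lock)    # hypothetical key-to-mutex map


def lock_for(key):
    """Return the lock for `key`, creating it on first use."""
    with _guard:
        return _key_locks[key]


def validated_update(key, new_value, is_valid):
    """Read, validate against the current value, and write under one lock."""
    with lock_for(key):
        current = store.get(key)
        if not is_valid(current, new_value):
            return False
        store[key] = new_value
        return True


def increments_by_one(current, new):
    # Example rule: a write is valid only if it is exactly current + 1.
    return new == (current or 0) + 1


print(validated_update("A", 1, increments_by_one))  # True
print(validated_update("A", 1, increments_by_one))  # False: "A" is already 1
```

Without the lock, two writers could both read the same current value, both
pass validation, and both write, which is exactly the window between
validation and storage described above.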

On Fri, Jul 8, 2011 at 10:21 AM, William Oberman
 wrote:
> I haven't ever written my own org.apache.cassandra.db.marshal.AbstractType
> (which is I think what your talking about), so I have no idea.
>
> Looking up the JavaDoc for that class, validate says "validate that the byte
> array is a valid sequence for the type we are supposed to be comparing",
> which sounds like a local operation to me (e.g. it shouldn't fetch remote
> data, it's just saying "yep, this is a valid member of type T").
>
> will
>
> On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman  wrote:
>>
>> Alright,
>> So are you saying the column validator, as specified
>> by conf/storage-conf.xml is checked in the client interface library and not
>> on the server side?  That seems odd to me on a number of levels, not the
>> least being I cant see how thrift could autogenerate that
>> for different languages or how those other languages would use a Java class.
>>
>> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman
>>  wrote:
>>>
>>> I use a language specific wrapper around thrift as my "client", but yes,
>>> I guess I fundamentally mean thrift == client, and the cassandra server ==
>>> server.
>>>
>>> will
>>>
>>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman 
>>> wrote:

>>>> I am confused by what you mean by "Cassandra client code."  Is this part
>>>> of the Cassandra server?
>>>> My architecture is my "user" talks thrift to Cassandra.

>>>
>>
>>
>>
>> --
>> It's always darkest just before you are eaten by a grue.
>
>
>

Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

Hmm.

Thanks Nate.

I need to think about this and our data store design some.  In general I
dislike architectures with large numbers of independent servers; I think
they invite communication latencies and partial failures into the mix.  But
I'll cogitate some.

On Fri, Jul 8, 2011 at 12:21 PM, Nate McCall  wrote:

> Validation occurs at the API level, returning an
> InvalidRequestException to the caller of the API (a thrift client in
> this case). Specifically, a mutation will not be scheduled for the
> storage until it has been validated at the API level.
>
> If the intention is to do a read-before-write validation as an
> AbstractType extension, then yes, the underlying value could indeed
> change between validation and storage. If this were the goal, you need
> to implement locking externally (via zookeper or similar as previously
> mentioned).
>
> On Fri, Jul 8, 2011 at 10:21 AM, William Oberman
>  wrote:
> > I haven't ever written my own
> org.apache.cassandra.db.marshal.AbstractType
> > (which is I think what your talking about), so I have no idea.
> >
> > Looking up the JavaDoc for that class, validate says "validate that the
> byte
> > array is a valid sequence for the type we are supposed to be comparing",
> > which sounds like a local operation to me (e.g. it shouldn't fetch remote
> > data, it's just saying "yep, this is a valid member of type T").
> >
> > will
> >
> > On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman 
> wrote:
> >>
> >> Alright,
> >> So are you saying the column validator, as specified
> >> by conf/storage-conf.xml is checked in the client interface library and
> not
> >> on the server side?  That seems odd to me on a number of levels, not the
> >> least being I cant see how thrift could autogenerate that
> >> for different languages or how those other languages would use a Java
> class.
> >>
> >> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman
> >>  wrote:
> >>>
> >>> I use a language specific wrapper around thrift as my "client", but
> yes,
> >>> I guess I fundamentally mean thrift == client, and the cassandra server
> ==
> >>> server.
> >>>
> >>> will
> >>>
> >>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman 
> >>> wrote:
> 
> >>>> I am confused by what you mean by "Cassandra client code."  Is this
> part
> >>>> of the Cassandra server?
> >>>> My architecture is my "user" talks thrift to Cassandra.
> 
> >>>
> >>
> >>
> >>
> >> --
> >> It's always darkest just before you are eaten by a grue.
> >
> >
> >
>



-- 
It's always darkest just before you are eaten by a grue.

Re: What does a write lock ?

2011-07-08 Thread Jeffrey Kesselman

I should add, Nate, that the intention is to do a read-before-write
validation and have that occur as close to the data as possible.

If there is a better hook to implement it on, I'd love a pointer to it.

JK

On Fri, Jul 8, 2011 at 12:21 PM, Nate McCall  wrote:

> Validation occurs at the API level, returning an
> InvalidRequestException to the caller of the API (a thrift client in
> this case). Specifically, a mutation will not be scheduled for the
> storage until it has been validated at the API level.
>
> If the intention is to do a read-before-write validation as an
> AbstractType extension, then yes, the underlying value could indeed
> change between validation and storage. If this were the goal, you need
> to implement locking externally (via zookeper or similar as previously
> mentioned).
>
> On Fri, Jul 8, 2011 at 10:21 AM, William Oberman
>  wrote:
> > I haven't ever written my own
> org.apache.cassandra.db.marshal.AbstractType
> > (which is I think what your talking about), so I have no idea.
> >
> > Looking up the JavaDoc for that class, validate says "validate that the
> byte
> > array is a valid sequence for the type we are supposed to be comparing",
> > which sounds like a local operation to me (e.g. it shouldn't fetch remote
> > data, it's just saying "yep, this is a valid member of type T").
> >
> > will
> >
> > On Fri, Jul 8, 2011 at 11:17 AM, Jeffrey Kesselman 
> wrote:
> >>
> >> Alright,
> >> So are you saying the column validator, as specified
> >> by conf/storage-conf.xml is checked in the client interface library and
> not
> >> on the server side?  That seems odd to me on a number of levels, not the
> >> least being I cant see how thrift could autogenerate that
> >> for different languages or how those other languages would use a Java
> class.
> >>
> >> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman
> >>  wrote:
> >>>
> >>> I use a language specific wrapper around thrift as my "client", but
> yes,
> >>> I guess I fundamentally mean thrift == client, and the cassandra server
> ==
> >>> server.
> >>>
> >>> will
> >>>
> >>> On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman 
> >>> wrote:
> 
> >>>> I am confused by what you mean by "Cassandra client code."  Is this
> part
> >>>> of the Cassandra server?
> >>>> My architecture is my "user" talks thrift to Cassandra.
> 
> >>>
> >>
> >>
> >>
> >> --
> >> It's always darkest just before you are eaten by a grue.
> >
> >
> >
>



-- 
It's always darkest just before you are eaten by a grue.

Corrupted data

2011-07-08 Thread Héctor Izquierdo Seliva

Hi everyone,

I'm having thousands of these errors:

 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
    at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
    at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
    at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Impossible row size 6292724931198053
    ... 9 more
 INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too.  Skipping to next row (retry's stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8


This is during scrub, which I ran because I saw similar errors during normal
operation. Is there anything I can do? It looks like I'm going to lose a ton
of data.

Pre-CassandraSF Happy Hour on Sunday

2011-07-08 Thread Richard Low

Hi all,

If you're in San Francisco for CassandraSF on Monday 11th, then come
and join fellow Cassandra users and committers on Sunday evening.
Starting at 6:30pm at ThirstyBear, the famous brewing company.  We'll
have drinks, food and more.

RSVP at Eventbrite: http://pre-cassandrasf-happyhour.eventbrite.com/

Hope you can join us!

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu

Re: Pig pulling an older value from cassandra

2011-07-08 Thread aaron morton

Jeremy, did you get anywhere with this?

If you are reading at CL ONE, Read Repair will run in the background, so the 
repaired value may only be visible to subsequent reads. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 Jul 2011, at 20:52, Jeremy Hanna wrote:

> I'm seeing some strange behavior and not sure how it is possible.  We updated 
> some data using a pig script and that wrote back to cassandra.  We get the 
> value and list the value on the Cassandra CLI and it's the updated value - 
> from MARKET to market.  However, when doing a pig script to filter by the 
> known good values, we are left with about 42k rows that still have MARKET.  
> If we list a subset of them, get the key, and get/list them on the CLI, they 
> are lowercase market. 
> 
> Anyone have any suggestions as to how this might be possible?  Our read 
> repair chance is set to 1.0. 
> 
> Jeremy

Re: Re : result sorted by keys in reversed

2011-07-08 Thread aaron morton

> Is it possible to have same results sorting in reversed by another method 
> without get_range_slice in JAVA ?

Sorry I don't understand your question.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jul 2011, at 01:56, Monnom Monprenom wrote:

> Thanks,
> 
> Is it possible to have same results sorting in reversed by another method 
> without get_range_slice in JAVA ?
> 
> De : Aaron Morton 
> À : "user@cassandra.apache.org" 
> Envoyé le : Jeudi 7 Juillet 2011 2h52
> Objet : Re: result sorted by keys in reversed
> 
> It's not currently supported via the api. But I *think* it's technically 
> possible, the code could  page backwards using the index sampling the same 
> way it does for columns. 
> 
> Best advice is to raise a ticket on 
> https://issues.apache.org/jira/browse/CASSANDRA (maybe do a search first, 
> someone else may have requested it)
> 
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7/07/2011, at 1:39 AM, Monnom Monprenom  wrote:
> 
>> Hi,
>> 
>> I am using get_range_slice and I get the results sorted by keys, Is it 
>> possible to have the results also sorted by keys but in reverse (from the 
>> biggest to the smallest)?
> 
>

Re: Pig pulling an older value from cassandra

2011-07-08 Thread Jeremy Hanna

Not yet - we've updated the CassandraStorage with a patch we've done for 
CASSANDRA-2869 to see if that might indirectly do something to the inputs, but 
not sure it would affect that part of it.

The hadoop default in ConfigHelper is CL ONE.  I need to do some more focused 
study of that data.  We just have multiple things we're trying to get working 
properly so I haven't had a chance yet.  For example, we have checked all the 
possible scripts to make sure we're not introducing more of those, but haven't 
looked at the dates for those that we're seeing through pig to see when those 
were added.  Things like that.

Thanks for the response and I'll update this thread when we find out more.

On Jul 8, 2011, at 3:30 PM, aaron morton wrote:

> Jeremy did you get anywhere with this ? 
> 
> If you are reading at CL ONE Read Repair will run in the background, so it 
> may only be visible to subsequent reads. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6 Jul 2011, at 20:52, Jeremy Hanna wrote:
> 
>> I'm seeing some strange behavior and not sure how it is possible.  We 
>> updated some data using a pig script and that wrote back to cassandra.  We 
>> get the value and list the value on the Cassandra CLI and it's the updated 
>> value - from MARKET to market.  However, when doing a pig script to filter 
>> by the known good values, we are left with about 42k rows that still have 
>> MARKET.  If we list a subset of them, get the key, and get/list them on the 
>> CLI, they are lowercase market. 
>> 
>> Anyone have any suggestions as to how this might be possible?  Our read 
>> repair chance is set to 1.0. 
>> 
>> Jeremy
>

Re: List nodes where write was applied to

2011-07-08 Thread aaron morton

The logs will give you some idea, but it's not information that is available as 
part of a request. 

Turn the logging up to DEBUG and watch what happens. You will see the 
coordinator log where it is sending messages together with some unique 
identifiers that you will also see logged on the replicas.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jul 2011, at 10:01, A J wrote:

> Is there a way to find what all nodes was a write applied to ? It
> could be a successful write (i.e. w was met) or unsuccessful write
> (i.e. less than w nodes were met). In either case, I am interested in
> finding:
> Number of nodes written to (before timeout or on success)
> Name of nodes written to (before timeout or on success)
> 
> Thanks.

Re: how large cassandra could scale when it need to do manual operation?

2011-07-08 Thread aaron morton

AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. 
Twitter is a vocal supporter with a large Apache Cassandra install, e.g. 
"Twitter currently runs a couple hundred Cassandra nodes across a half dozen 
clusters. " 
http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011

If you are working with a 3-node cluster, removing, rebuilding, or otherwise 
taking out one node will affect 33% of your capacity. When you scale up, the 
contribution from each individual node goes down, and the impact of one node 
going down is smaller. Problems that happen with a few nodes will go away at 
scale, to be replaced by a whole set of new ones.   

> 1):  the load balance need to manually performed on every node, according to: 

Yes

> 2): when adding new nodes, need to perform node repair and cleanup on every 
> node 
You only need to run cleanup, see 
http://wiki.apache.org/cassandra/Operations#Bootstrap

> 3) when decommission a node, there is a chance that slow down the entire 
> cluster. (not sure why but I saw people ask around about it.) and the only 
> way to do is shutdown the entire the cluster, rsync the data, and start all 
> nodes without the decommission one. 

I cannot remember any specific cases where decommission requires a full cluster 
stop; do you have a link? As for slowing down, the decommission process 
streams data from the node you are removing onto the other nodes, which can 
slow down the target nodes (I think it's more intelligent now about what is 
moved). This will be exaggerated in a 3-node cluster, as you are removing 33% of 
the processing and adding some (temporary) extra load to the remaining nodes. 

> after all, I think there is alot of human work to do to maintain the cluster 
> which make it impossible to scale to thousands of nodes, 
Automation, Automation, Automation is the only way to go. 

Chef, Puppet, or CFEngine for general config and deployment; Cloudkick, Munin, 
Ganglia, etc. for monitoring; and 
OpsCenter (http://www.datastax.com/products/opscenter) for Cassandra-specific 
management.

> I am totally wrong about all of this, currently I am serving 1 millions pv 
> every day with Cassandra and it make me feel unsafe, I am afraid one day one 
> node crash will cause the data broken and all cluster goes wrong
With RF 3 and a 3-node cluster you have room to lose one node and the cluster 
will be up for 100% of the keys. While better than having to worry about *the* 
database server, it's still entry-level fault tolerance. With RF 3 in a 6-node 
cluster you can lose up to 2 nodes and still be up for 100% of the keys. 
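
How many nodes you can lose also depends on the consistency level you read and 
write at. The sketch below is a simplified model, not Cassandra code: it 
assumes each range is replicated to its primary node plus the next RF - 1 
nodes around the ring, and worst_case_live_replicas is a name made up for 
this example. It reports the fewest live replicas any range is left with, 
over every possible set of failed nodes.

```python
from itertools import combinations


def quorum(rf):
    """Replicas that must respond at QUORUM for replication factor `rf`."""
    return rf // 2 + 1


def worst_case_live_replicas(nodes, rf, failures):
    """Fewest live replicas any range has, over all sets of `failures` down nodes.

    Assumes rf <= nodes and that replicas sit on consecutive ring positions.
    """
    worst = rf
    for down in combinations(range(nodes), failures):
        down = set(down)
        for primary in range(nodes):
            replicas = [(primary + i) % nodes for i in range(rf)]
            live = sum(1 for r in replicas if r not in down)
            worst = min(worst, live)
    return worst


print(quorum(3))                          # 2
print(worst_case_live_replicas(3, 3, 1))  # 2: QUORUM still succeeds
print(worst_case_live_replicas(6, 3, 2))  # 1: enough for CL ONE, not QUORUM
```

Under these assumptions, losing two neighbouring nodes in the 6-node case 
leaves some ranges with a single replica, so "up for 100% of the keys" holds 
for CL ONE in the worst case but not for QUORUM.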

Is there something you are specifically concerned about with your current 
installation ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 08:50, Yan Chunlu wrote:

> hi, all:
> I am curious about how large that Cassandra can scale? 
> 
> from the information I can get, the largest usage is at facebook, which is 
> about 150 nodes.  in the mean time they are using 2000+ nodes with Hadoop, 
> and yahoo even using 4000 nodes of Hadoop. 
> 
> I am not understand why is the situation, I only have  little knowledge with 
> Cassandra and even no knowledge with Hadoop. 
> 
> 
> 
> currently I am using cassandra with 3 nodes and having problem bring one back 
> after it out of sync, the problems I encountered making me worry about how 
> cassandra could scale out: 
> 
> 1):  the load balance need to manually performed on every node, according to: 
> 
> def tokens(nodes):
>     for x in range(nodes):
>         print(2 ** 127 // nodes * x)
> 
> 
> 
> 2): when adding new nodes, need to perform node repair and cleanup on every 
> node 
> 
> 
> 
> 3) when decommission a node, there is a chance that slow down the entire 
> cluster. (not sure why but I saw people ask around about it.) and the only 
> way to do is shutdown the entire the cluster, rsync the data, and start all 
> nodes without the decommission one. 
> 
> 
> 
> 
> 
> after all, I think there is alot of human work to do to maintain the cluster 
> which make it impossible to scale to thousands of nodes, but I hope I am 
> totally wrong about all of this, currently I am serving 1 millions pv every 
> day with Cassandra and it make me feel unsafe, I am afraid one day one node 
> crash will cause the data broken and all cluster goes wrong 
> 
> 
> 
> in the contrary, relational database make me feel safety but it does not 
> scale well. 
> 
> 
> 
> thanks for any guidance here.
>

Re: Corrupted data

2011-07-08 Thread aaron morton

You may not lose data. 

- What version are you on, and what's the upgrade history?
- What RF / node count / CL?
- Have you been running repair consistently?
- Is this on a single node or all nodes?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:

> Hi everyone,
> 
> I'm having thousands of these errors:
> 
> WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
>   at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
>   at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
>   at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
>   at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Impossible row size 6292724931198053
>   ... 9 more
> INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
> WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too.  Skipping to next row (retry's stacktrace follows)
> java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8
> 
> 
> This is during scrub; I saw similar errors during normal operation.
> Is there anything I can do? It looks like I'm going to lose a ton of
> data.
>

Performance deterioration while building secondary index

2011-07-08 Thread Maxim Potekhin

I have roughly 150 million rows in my database, which will grow as I 
continue testing. I'm building an index on a particular column, via 
cassandra-cli, something of the sort:
update column family jobs with column_metadata = [{column_name : 'DATE', 
  validation_class : AsciiType, index_type : 0, index_name : 'date'}]


At this point, the cluster just becomes unresponsive -- just doing 
"list" on a CF takes a while. A random query test that I used to run rather 
quickly becomes terribly slow (it hasn't returned since I started typing 
this).


Is that normal? I can't imagine this happening in a production situation, 
when I decide to add an index for some valid reason. Really scratching 
my head now. TIA.


The version is 0.8.1

Thanks!

node stuck "leaving"

2011-07-08 Thread Casey Deccio

I've got a node that is stuck "Leaving" the ring.  Running "nodetool
decommission" never terminates.  It's been in this state for about a week,
and the load has not decreased:

$ nodetool -h localhost ring
Address   DC           Rack   Status  State    Load       Owns    Token
                                                                  Token(bytes[de4075d0a474c4a773efa2891c020529])
x.x.x.1   datacenter1  rack1  Up      Leaving  150.63 GB  33.33%  Token(bytes[10956f12b46304bf70412ad0eac14344])
x.x.x.2   datacenter1  rack1  Up      Normal   79.21 GB   33.33%  Token(bytes[50af14df71eafac7bac60fbc836c6722])
x.x.x.3   datacenter1  rack1  Up      Normal   60.74 GB   33.33%  Token(bytes[de4075d0a474c4a773efa2891c020529])
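
A minimal sketch, assuming ring output with the Address, DC, Rack, Status and State columns in the order shown above, of how a monitoring script might flag nodes that are not Up/Normal (such as the node stuck in Leaving). The function name `non_normal_nodes` and the `SAMPLE` text are illustrative only and are not part of nodetool.

```python
import re

# One node row of ring output: address, data center, rack,
# status (Up/Down), state (Normal, Leaving, ...). Header and
# token-only lines do not match this pattern and are skipped.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(Up|Down)\s+(\w+)\s")

def non_normal_nodes(ring_output):
    """Return (address, status, state) for every node that is not Up/Normal."""
    flagged = []
    for line in ring_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        address, _dc, _rack, status, state = match.groups()
        if status != "Up" or state != "Normal":
            flagged.append((address, status, state))
    return flagged

# Sample text copied from the output above (assumed layout).
SAMPLE = """\
Address   DC           Rack   Status  State    Load       Owns    Token
                                                                  Token(bytes[de4075d0a474c4a773efa2891c020529])
x.x.x.1   datacenter1  rack1  Up      Leaving  150.63 GB  33.33%  Token(bytes[10956f12b46304bf70412ad0eac14344])
x.x.x.2   datacenter1  rack1  Up      Normal   79.21 GB   33.33%  Token(bytes[50af14df71eafac7bac60fbc836c6722])
x.x.x.3   datacenter1  rack1  Up      Normal   60.74 GB   33.33%  Token(bytes[de4075d0a474c4a773efa2891c020529])
"""

if __name__ == "__main__":
    print(non_normal_nodes(SAMPLE))
```

Run periodically against saved ring output, a check like this would report a node that stays in Leaving, although it says nothing about why the decommission is not finishing.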

Any ideas?

Regards,
Casey

Re: Performance deterioration while building secondary index

2011-07-08 Thread Jonathan Ellis

My guess: index build isn't respecting the background i/o throttle.

On Fri, Jul 8, 2011 at 5:55 PM, Maxim Potekhin  wrote:
> I have roughly 150 million rows in my database, which will grow as I
> continue testing. I'm building an index on a particular column, via
> cassandra-cli, something of the sort:
> update column family jobs with column_metadata = [{column_name : 'DATE',
>   validation_class : AsciiType, index_type : 0, index_name : 'date'}]
>
> At this point, the cluster just becomes unresponsive -- just doing "list" on
> a CF takes a while. A random query test that I used to run rather quickly becomes
> terribly slow (it hasn't returned since I started typing this).
>
> Is that normal? I can't imagine this happening in a production situation,
> when I decide to add an index for some valid reason. Really scratching my
> head now. TIA.
>
> The version is 0.8.1
>
> Thanks!
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Corrupted data

2011-07-08 Thread Héctor Izquierdo Seliva

Hi Aaron,

On Fri, 08-07-2011 at 14:47 -0700, aaron morton wrote:
> You may not lose data. 
> 
> - What version and whats the upgrade history?

All versions from 0.7.1 to 0.8.1. All CFs were in 0.8.1 format, though.

> - What RF / node count / CL  ?

RF=3, node count = 6
> - Have you been running repair consistently ?

Nope, only when something breaks.

> - Is this on a single node or all nodes ?

A couple of nodes. Scrub reported a few thousand columns it
could not restore.
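
As a rough illustration of why the data may still be recoverable with RF=3, here is a minimal sketch of the replica arithmetic. It assumes the usual QUORUM definition (a majority of replicas); the function names are hypothetical, and the consistency level actually in use was not stated in this thread.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write (a majority)."""
    return replication_factor // 2 + 1

def tolerable_replica_losses(replication_factor, required_replicas):
    """How many replicas of a row can be damaged or unavailable while
    `required_replicas` good copies still remain."""
    return max(replication_factor - required_replicas, 0)

if __name__ == "__main__":
    rf = 3  # replication factor reported above
    print(quorum(rf))                                # 2
    print(tolerable_replica_losses(rf, 1))           # 2
    print(tolerable_replica_losses(rf, quorum(rf)))  # 1
```

With RF=3, a row that scrub discards on one node still has two other replicas, so running repair can restore it as long as at least one of those copies is intact and up to date.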
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:
> 
> > Hi everyone,
> > 
> > I'm having thousands of these errors:
> > 
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
> > java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
> >   at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
> >   at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
> >   at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
> >   at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.IOException: Impossible row size 6292724931198053
> >   ... 9 more
> > INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too.  Skipping to next row (retry's stacktrace follows)
> > java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8
> > 
> > 
> > This is during scrub; I saw similar errors during normal operation.
> > Is there anything I can do? It looks like I'm going to lose a ton of
> > data.
> > 
>
