Re: Memcached protocol?

2010-04-05 Thread David Strauss
On 2010-04-05 03:42, Paul Prescod wrote:
> On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black  wrote:
>> ...
>>
>> Are you suggesting this would give you counter semantics?
> 
> Yes: My understanding of cassandra-580 is that it gives you increment
> and decrement which are the basis of counters.

There is a difference between Cassandra allowing inc/dec on values and
actually *knowing* the resultant value at the time of the write. It's
likely that inc/dec support will still feature blind writes if at all
possible. The memcached protocol returns a resultant value from inc/dec.
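
For illustration only -- a minimal sketch (neither the real memcached nor
Cassandra client API) of the semantic gap being described:

class MemcachedLikeCounter:
    """incr is an atomic read-modify-write that returns the new value."""
    def __init__(self):
        self.counters = {}

    def incr(self, key, delta=1):
        self.counters[key] = self.counters.get(key, 0) + delta
        return self.counters[key]       # caller learns the resultant value

class BlindWriteCounter:
    """A blind inc/dec only records the delta; the write path never
    computes -- and so cannot return -- the resulting total."""
    def __init__(self):
        self.deltas = []

    def incr(self, key, delta=1):
        self.deltas.append((key, delta))  # no read, nothing to return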

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]





Flush Commit Log

2010-04-05 Thread JKnight JKnight
Dear all,

How can I flush the entire commit log for Cassandra version 0.4.2?
I use nodeprobe flush, but it does not seem to run.

Thanks a lot for your support.

-- 
Best regards,
JKnight


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 12:01 AM, David Strauss  wrote:
> On 2010-04-05 03:42, Paul Prescod wrote:
>...
>
> There is a difference between Cassandra allowing inc/dec on values and
> actually *knowing* the resultant value at the time of the write. It's
> likely that inc/dec support will still feature blind writes if at all
> possible. The memcached protocol returns a resultant value from inc/dec.

Right. That's why I said that the proxy layer would need to read the
result with an appropriate consistency level before returning to the
memcached client application. The client application would need to
declare its consistency preference using a configuration file.
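
A minimal sketch of that proxy flow, with hypothetical insert_delta/read
calls standing in for the real client API:

def proxy_incr(store, key, delta, consistency):
    # 1) blind write of the increment
    store.insert_delta(key, delta)                   # hypothetical call
    # 2) read the result back at the configured consistency level
    return store.read(key, consistency=consistency)  # hypothetical call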

 Paul Prescod


Re: Memcached protocol?

2010-04-05 Thread David Strauss
On 2010-04-05 07:47, Paul Prescod wrote:
> On Mon, Apr 5, 2010 at 12:01 AM, David Strauss  wrote:
>> On 2010-04-05 03:42, Paul Prescod wrote:
>> ...
>>
>> There is a difference between Cassandra allowing inc/dec on values and
>> actually *knowing* the resultant value at the time of the write. It's
>> likely that inc/dec support will still feature blind writes if at all
>> possible. The memcached protocol returns a resultant value from inc/dec.
> 
> Right. That's why I said that the proxy layer would need to read the
> result with an appropriate consistency level before returning to the
> memcached client application. The client application would need to
> declare its consistency preference using a configuration file.

But your "write then read" model lacks the atomicity of the memcached
API. It's possible for two clients to read the same value.
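
One possible interleaving, sketched as a timeline (counter starts at 5,
two clients increment concurrently through the proxy):

# client A: blind write +1        (total will eventually become 6)
# client B: blind write +1        (total will eventually become 7)
# client A: read -> 6             (B's write not yet visible to A's read)
# client B: read -> 6             (same value returned to both clients)
#
# memcached's atomic incr would have returned 6 to one client and 7 to
# the other; the write-then-read proxy cannot guarantee that.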

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]





Re: Cassandra Design or another solution

2010-04-05 Thread JKnight JKnight
Thanks for your reply, David.

I will give more detail about the system. My system is used to store the
score (points) users earn when they play games.

"Mark" is the score.
A user's score changes when they win a game or buy or sell anything.

Sorry, I made a mistake. My data model is:

Mark{ //Column Family
    gameId:{ //row key
        mark_userId: "" // (column name : value),
        mark2_userId2: ""
    },
    gameId2:{ //row key
        mark_userId: ""
    }
}


On Sun, Apr 4, 2010 at 11:44 PM, David Strauss wrote:

> On 2010-04-05 02:48, JKnight JKnight wrote:
> > I want to design the data storage to store user's mark for a large
> > amount of user. When system run, user's mark changes frequently.
>
> What is a "mark"?
>
> > I want to list top 10 user have largest mark.
>
> Do the "marks" increase monotonically? What other properties do they have?
>
> > Could we use Cassandra for store this data?
> >
> > Ex, here my Cassandra data model design:
> > Mark{
> > userId{
> > mark_userId
> > },
> > }
>
> I do not understand that notation. What parts are the CF, key/row, and
> column?
>
> > When user's mark changes, we remove old mark_userId and add new
> > mark_userId.
> > Because user's mark change frequently and with large amount  of user, I
> > think Cassandra can not satisfy.
>
> On the contrary, Cassandra excels at tracking rapidly changing data and
> even shards rows to scale I/O horizontally.
>
> --
> David Strauss
>   | da...@fourkitchens.com
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>
>


-- 
Best regards,
JKnight


Re: Cassandra Design or another solution

2010-04-05 Thread David Strauss
I need the question about monotonicity answered, too.

You should also know: Cassandra is not ideal for directly tracking
values you increment or decrement.

On 2010-04-05 08:04, JKnight JKnight wrote:
> Thanks for for reply, David.
> 
> I will tell more the detail about the system. My system is used to store
> the score (point) user earn when they play game.
> 
> "Mark" is the score.
> User's score changes when user win game, buy or sell anything.
> 
> Sorry I make a mistake. My data model is:
> 
> Mark{ //Column Family
> gameId:{ //row key
> mark_userId: ""// (column name : value),
> mark2_userId2: ""
> },
> gameId2:{//row key
> mark_userId: ""
> }
> }
> 
> 
> On Sun, Apr 4, 2010 at 11:44 PM, David Strauss  > wrote:
> 
> On 2010-04-05 02:48, JKnight JKnight wrote:
> > I want to design the data storage to store user's mark for a large
> > amount of user. When system run, user's mark changes frequently.
> 
> What is a "mark"?
> 
> > I want to list top 10 user have largest mark.
> 
> Do the "marks" increase monotonically? What other properties do they
> have?
> 
> > Could we use Cassandra for store this data?
> >
> > Ex, here my Cassandra data model design:
> > Mark{
> > userId{
> > mark_userId
> > },
> > }
> 
> I do not understand that notation. What parts are the CF, key/row, and
> column?
> 
> > When user's mark changes, we remove old mark_userId and add new
> > mark_userId.
> > Because user's mark change frequently and with large amount  of
> user, I
> > think Cassandra can not satisfy.
> 
> On the contrary, Cassandra excels at tracking rapidly changing data and
> even shards rows to scale I/O horizontally.
> 
> --
> David Strauss
>   | da...@fourkitchens.com 
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
> 
> 
> 
> 
> -- 
> Best regards,
> JKnight


-- 
David Strauss
   | da...@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]





Re: Cassandra Design or another solution

2010-04-05 Thread JKnight JKnight
Thanks David,
But what does "monotonicity" mean?

A user's score depends on their actions. When they win a game or sell
something, their score will increase. When they lose a game or buy
something, their score will decrease.

On Mon, Apr 5, 2010 at 4:09 AM, David Strauss wrote:

> I need the question about monotonicity answered, too.
>
> You should also know: Cassandra is not ideal for directly tracking
> values you increment or decrement.
>
> On 2010-04-05 08:04, JKnight JKnight wrote:
> > Thanks for for reply, David.
> >
> > I will tell more the detail about the system. My system is used to store
> > the score (point) user earn when they play game.
> >
> > "Mark" is the score.
> > User's score changes when user win game, buy or sell anything.
> >
> > Sorry I make a mistake. My data model is:
> >
> > Mark{ //Column Family
> > gameId:{ //row key
> > mark_userId: ""// (column name : value),
> > mark2_userId2: ""
> > },
> > gameId2:{//row key
> > mark_userId: ""
> > }
> > }
> >
> >
> > On Sun, Apr 4, 2010 at 11:44 PM, David Strauss  > > wrote:
> >
> > On 2010-04-05 02:48, JKnight JKnight wrote:
> > > I want to design the data storage to store user's mark for a large
> > > amount of user. When system run, user's mark changes frequently.
> >
> > What is a "mark"?
> >
> > > I want to list top 10 user have largest mark.
> >
> > Do the "marks" increase monotonically? What other properties do they
> > have?
> >
> > > Could we use Cassandra for store this data?
> > >
> > > Ex, here my Cassandra data model design:
> > > Mark{
> > > userId{
> > > mark_userId
> > > },
> > > }
> >
> > I do not understand that notation. What parts are the CF, key/row,
> and
> > column?
> >
> > > When user's mark changes, we remove old mark_userId and add new
> > > mark_userId.
> > > Because user's mark change frequently and with large amount  of
> > user, I
> > > think Cassandra can not satisfy.
> >
> > On the contrary, Cassandra excels at tracking rapidly changing data
> and
> > even shards rows to scale I/O horizontally.
> >
> > --
> > David Strauss
> >   | da...@fourkitchens.com 
> > Four Kitchens
> >   | http://fourkitchens.com
> >   | +1 512 454 6659 [office]
> >   | +1 512 870 8453 [direct]
> >
> >
> >
> >
> > --
> > Best regards,
> > JKnight
>
>
> --
> David Strauss
>   | da...@fourkitchens.com
>| +1 512 577 5827 [mobile]
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>
>


-- 
Best regards,
JKnight


Re: Cassandra Design or another solution

2010-04-05 Thread Andriy Bohdan
Hello guys

I have a pretty similar task. There's a need to store tags of products
with score. Score may go up and down and tags have to be ordered by
their score for each product. Score is updated "very" often.

I was thinking of using the following model (simplified here for clarity):

Product = {
    product_key: {
        name: 
        etc..
    }
    ...
}

Product_Tags = {
    product_key : {
        tag_name: score
        ...
    }
    ...
}

Product_Tags_Ordered (compareWith:BytesType) = {
    product_key: {
        (score, time_uuid) : tag_name
        ...
    }
    ...
}

So to update the score of a tag:
1) look up the old score, to be able to remove it from Product_Tags_Ordered
2) remove the entry with the old score from Product_Tags_Ordered
3) update the score in Product_Tags
4) insert a new entry with the new score into Product_Tags_Ordered

Four I/O operations seem like a bit too much to update one score, in my
opinion.

I'm curious if there's a better solution I missed.
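
For illustration, the same four steps over plain dicts standing in for the
column families (using (score, tag_name) in place of the (score, time_uuid)
column key for brevity):

def update_tag_score(product_tags, product_tags_ordered,
                     product_key, tag_name, new_score):
    # 1) read the old score -- the extra I/O in question
    old_score = product_tags[product_key].get(tag_name)
    # 2) remove the old ordered entry
    if old_score is not None:
        del product_tags_ordered[product_key][(old_score, tag_name)]
    # 3) update the score
    product_tags[product_key][tag_name] = new_score
    # 4) insert the new ordered entry
    product_tags_ordered[product_key][(new_score, tag_name)] = tag_name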


On Mon, Apr 5, 2010 at 11:54 AM, JKnight JKnight  wrote:
> Thanks David,
> But what's does "monotonicity" mean?
>
> User's score belongs to their action. When they win the game or sale
> something, user's score  will increase. When user lose the game or buy
> something, user's score will decrease.
>
> On Mon, Apr 5, 2010 at 4:09 AM, David Strauss 
> wrote:
>>
>> I need the question about monotonicity answered, too.
>>
>> You should also know: Cassandra is not ideal for directly tracking
>> values you increment or decrement.
>>
>> On 2010-04-05 08:04, JKnight JKnight wrote:
>> > Thanks for for reply, David.
>> >
>> > I will tell more the detail about the system. My system is used to store
>> > the score (point) user earn when they play game.
>> >
>> > "Mark" is the score.
>> > User's score changes when user win game, buy or sell anything.
>> >
>> > Sorry I make a mistake. My data model is:
>> >
>> > Mark{ //Column Family
>> >     gameId:{ //row key
>> >         mark_userId: ""// (column name : value),
>> >         mark2_userId2: ""
>> >     },
>> >     gameId2:{//row key
>> >         mark_userId: ""
>> >     }
>> > }
>> >
>> >
>> > On Sun, Apr 4, 2010 at 11:44 PM, David Strauss > > > wrote:
>> >
>> >     On 2010-04-05 02:48, JKnight JKnight wrote:
>> >     > I want to design the data storage to store user's mark for a large
>> >     > amount of user. When system run, user's mark changes frequently.
>> >
>> >     What is a "mark"?
>> >
>> >     > I want to list top 10 user have largest mark.
>> >
>> >     Do the "marks" increase monotonically? What other properties do they
>> >     have?
>> >
>> >     > Could we use Cassandra for store this data?
>> >     >
>> >     > Ex, here my Cassandra data model design:
>> >     > Mark{
>> >     >     userId{
>> >     >         mark_userId
>> >     >     },
>> >     > }
>> >
>> >     I do not understand that notation. What parts are the CF, key/row,
>> > and
>> >     column?
>> >
>> >     > When user's mark changes, we remove old mark_userId and add new
>> >     > mark_userId.
>> >     > Because user's mark change frequently and with large amount  of
>> >     user, I
>> >     > think Cassandra can not satisfy.
>> >
>> >     On the contrary, Cassandra excels at tracking rapidly changing data
>> > and
>> >     even shards rows to scale I/O horizontally.
>> >
>> >     --
>> >     David Strauss
>> >       | da...@fourkitchens.com 
>> >     Four Kitchens
>> >       | http://fourkitchens.com
>> >       | +1 512 454 6659 [office]
>> >       | +1 512 870 8453 [direct]
>> >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > JKnight
>>
>>
>> --
>> David Strauss
>>   | da...@fourkitchens.com
>>   | +1 512 577 5827 [mobile]
>> Four Kitchens
>>   | http://fourkitchens.com
>>   | +1 512 454 6659 [office]
>>   | +1 512 870 8453 [direct]
>>
>
>
>
> --
> Best regards,
> JKnight
>



-- 
Andriy


Re: Cassandra Design or another solution

2010-04-05 Thread David Timothy Strauss

If user scores move in more than one direction, as they apparently do in your 
case, they are not monotonic. Monotonicity can make system design a bit easier 
for various reasons. 
- "JKnight JKnight"  wrote: 


Thanks David, 
But what's does "monotonicity" mean? 

User's score belongs to their action. When they win the game or sale something, 
user's score will increase. When user lose the game or buy something, user's 
score will decrease. 


On Mon, Apr 5, 2010 at 4:09 AM, David Strauss < da...@fourkitchens.com > wrote: 


I need the question about monotonicity answered, too. 

You should also know: Cassandra is not ideal for directly tracking 
values you increment or decrement. 


On 2010-04-05 08:04, JKnight JKnight wrote: 
> Thanks for for reply, David. 
> 
> I will tell more the detail about the system. My system is used to store 
> the score (point) user earn when they play game. 
> 
> "Mark" is the score. 
> User's score changes when user win game, buy or sell anything. 
> 
> Sorry I make a mistake. My data model is: 
> 
> Mark{ //Column Family 
> gameId:{ //row key 
> mark_userId: ""// (column name : value), 
> mark2_userId2: "" 
> }, 
> gameId2:{//row key 
> mark_userId: "" 
> } 
> } 
> 
> 
> On Sun, Apr 4, 2010 at 11:44 PM, David Strauss < da...@fourkitchens.com 

> > wrote: 
> 
> On 2010-04-05 02:48, JKnight JKnight wrote: 
> > I want to design the data storage to store user's mark for a large 
> > amount of user. When system run, user's mark changes frequently. 
> 
> What is a "mark"? 
> 
> > I want to list top 10 user have largest mark. 
> 
> Do the "marks" increase monotonically? What other properties do they 
> have? 
> 
> > Could we use Cassandra for store this data? 
> > 
> > Ex, here my Cassandra data model design: 
> > Mark{ 
> > userId{ 
> > mark_userId 
> > }, 
> > } 
> 
> I do not understand that notation. What parts are the CF, key/row, and 
> column? 
> 
> > When user's mark changes, we remove old mark_userId and add new 
> > mark_userId. 
> > Because user's mark change frequently and with large amount of 
> user, I 
> > think Cassandra can not satisfy. 
> 
> On the contrary, Cassandra excels at tracking rapidly changing data and 
> even shards rows to scale I/O horizontally. 
> 
> -- 
> David Strauss 
> | da...@fourkitchens.com  

> Four Kitchens 
> | http://fourkitchens.com 
> | +1 512 454 6659 [office] 
> | +1 512 870 8453 [direct] 
> 
> 
> 
> 
> -- 
> Best regards, 
> JKnight 


-- 

David Strauss 
| da...@fourkitchens.com 
| +1 512 577 5827 [mobile] 



Four Kitchens 
| http://fourkitchens.com 
| +1 512 454 6659 [office] 
| +1 512 870 8453 [direct] 




-- 
Best regards, 
JKnight 



-- 
David Strauss 
| da...@fourkitchens.com 
| +1 512 577 5827 [mobile] 
Four Kitchens 
| http://fourkitchens.com 
| +1 512 454 6659 [office] 
| +1 512 870 8453 [direct] 


Re: Cassandra Design or another solution

2010-04-05 Thread David Timothy Strauss

In any case, the common approach to this in Cassandra is to not directly 
manipulate the user's total score but to insert columns representing changes to 
the score, later totaling them (and possibly inserting them elsewhere so you 
get the automatic sort). There are many fancy ways to approach this problem and 
reduce recalculation work. 
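
A rough sketch of that approach, with a list per user standing in for a
row of change columns:

from collections import defaultdict

score_changes = defaultdict(list)   # user_id -> score-change columns

def record_change(user_id, delta):
    # each win/loss/purchase is a new column: a blind write, no read
    score_changes[user_id].append(delta)

def current_score(user_id):
    # total on demand (or in a periodic job that also writes the total
    # into a sorted structure elsewhere, for the automatic sort)
    return sum(score_changes[user_id])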


To be more specific about my reason for asking about monotonicity, if user 
scores increased monotonically, you could simply keep a current "top ten" list 
and bump out old members as new people's scores qualified for the top ten. 
Thus, you'd be keeping an ordered set of the top ten instead of an ordered set 
of all people. Unfortunately, the possibility of decreasing scores means 
members of the top ten may self-disqualify by a score decline, requiring 
promotion of the former 11th person to the top ten and immediate identification 
of the former 12th person to fill the 11th spot. (With a lack of monotonicity, 
the top eleven -- not just ten -- must always be tracked to efficiently know 
when someone drops below the top ten.) 
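
To make the monotonic case concrete -- a toy sketch of the bounded
leaderboard that is only safe when scores never decrease (re-entries by
the same user are ignored for brevity):

import heapq

top10 = []   # min-heap of (score, user_id)

def on_score_increase(user_id, new_score):
    if len(top10) < 10:
        heapq.heappush(top10, (new_score, user_id))
    elif new_score > top10[0][0]:
        # bump out the current lowest member
        heapq.heapreplace(top10, (new_score, user_id))
    # if scores could also decrease, entries already in the heap could
    # silently go stale -- exactly the problem described above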
- "David Timothy Strauss"  wrote: 




If user scores move in more than one direction, as they apparently do in your 
case, they are not monotonic. Monotonicity can make system design a bit easier 
for various reasons. 
- "JKnight JKnight"  wrote: 


Thanks David, 
But what's does "monotonicity" mean? 

User's score belongs to their action. When they win the game or sale something, 
user's score will increase. When user lose the game or buy something, user's 
score will decrease. 


On Mon, Apr 5, 2010 at 4:09 AM, David Strauss < da...@fourkitchens.com > wrote: 


I need the question about monotonicity answered, too. 

You should also know: Cassandra is not ideal for directly tracking 
values you increment or decrement. 


On 2010-04-05 08:04, JKnight JKnight wrote: 
> Thanks for for reply, David. 
> 
> I will tell more the detail about the system. My system is used to store 
> the score (point) user earn when they play game. 
> 
> "Mark" is the score. 
> User's score changes when user win game, buy or sell anything. 
> 
> Sorry I make a mistake. My data model is: 
> 
> Mark{ //Column Family 
> gameId:{ //row key 
> mark_userId: ""// (column name : value), 
> mark2_userId2: "" 
> }, 
> gameId2:{//row key 
> mark_userId: "" 
> } 
> } 
> 
> 
> On Sun, Apr 4, 2010 at 11:44 PM, David Strauss < da...@fourkitchens.com 

> > wrote: 
> 
> On 2010-04-05 02:48, JKnight JKnight wrote: 
> > I want to design the data storage to store user's mark for a large 
> > amount of user. When system run, user's mark changes frequently. 
> 
> What is a "mark"? 
> 
> > I want to list top 10 user have largest mark. 
> 
> Do the "marks" increase monotonically? What other properties do they 
> have? 
> 
> > Could we use Cassandra for store this data? 
> > 
> > Ex, here my Cassandra data model design: 
> > Mark{ 
> > userId{ 
> > mark_userId 
> > }, 
> > } 
> 
> I do not understand that notation. What parts are the CF, key/row, and 
> column? 
> 
> > When user's mark changes, we remove old mark_userId and add new 
> > mark_userId. 
> > Because user's mark change frequently and with large amount of 
> user, I 
> > think Cassandra can not satisfy. 
> 
> On the contrary, Cassandra excels at tracking rapidly changing data and 
> even shards rows to scale I/O horizontally. 
> 
> -- 
> David Strauss 
> | da...@fourkitchens.com  

> Four Kitchens 
> | http://fourkitchens.com 
> | +1 512 454 6659 [office] 
> | +1 512 870 8453 [direct] 
> 
> 
> 
> 
> -- 
> Best regards, 
> JKnight 


-- 

David Strauss 
| da...@fourkitchens.com 
| +1 512 577 5827 [mobile] 



Four Kitchens 
| http://fourkitchens.com 
| +1 512 454 6659 [office] 
| +1 512 870 8453 [direct] 




-- 
Best regards, 
JKnight 



-- 
David Strauss 
| da...@fourkitchens.com 
| +1 512 577 5827 [mobile] 
Four Kitchens 
| http://fourkitchens.com 
| +1 512 454 6659 [office] 
| +1 512 870 8453 [direct] 



-- 
David Strauss 
| da...@fourkitchens.com 
| +1 512 577 5827 [mobile] 
Four Kitchens 
| http://fourkitchens.com 
| +1 512 454 6659 [office] 
| +1 512 870 8453 [direct] 


Re: Cassandra Design or another solution

2010-04-05 Thread David Timothy Strauss
Cache the (key => score) map as you write values (a "write-through"
cache) so that reading the current score hits something like memcached instead
of Cassandra. With a cache hit, you get an ideal, write-only path in Cassandra.
Three blind writes in Cassandra are cheap -- no matter what your scale. The
only risk is the inability to efficiently remove old scores if you lose the
contents of memcached, but that risk can be mitigated in various ways.

Of course, I'm assuming a single data center here. Memcached isn't too useful
for this if you need to update scores at two data centers.

I'm not sure how much the 0.6 row cache might help in this case, either.
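
A rough sketch of that write path, with hypothetical cache and cassandra
client objects standing in for real memcached and Thrift clients:

def update_score(cache, cassandra, user_id, new_score):
    old_score = cache.get(user_id)         # write-through cache hit?
    if old_score is None:
        old_score = cassandra.read_score(user_id)  # fallback read path
    # the three blind writes (hypothetical calls):
    if old_score is not None:
        cassandra.remove_ordered(old_score, user_id)
    cassandra.write_score(user_id, new_score)
    cassandra.insert_ordered(new_score, user_id)
    cache.set(user_id, new_score)          # keep the cache current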

- "Andriy Bohdan"  wrote:

> Hello guys
> 
> I have a pretty similar task. There's a need to store tags of
> products
> with score. Score may go up and down and tags have to be ordered by
> their score for each product. Score is updated "very" often.
> 
> I was thinking of using the following model (simplified here for
> clarity):
> 
> Product = {
> product_key: {
>  name: 
>  etc..
>}
>...
> }
> 
> Product_Tags = {
> product_key : {
> tag_name: score
>  ...
> }
> ...
> }
> 
> Product_Tags_Ordered (compareWith:BytesType) = {
> product_key: {
> (score, time_uuid) :  tag_name
> ...
> }
> ...
> }
> 
> So to update a score of a tag:
> 1) need to look old value of score to be able to remove it from
> Product_Tags_Ordered
> 2) remove Row from Product_Tags_Ordered with old score
> 3) update score in Product_Tags
> 4) insert new Row into Product_Tags_Ordered with new score
> 
> 4 IO operations look like a bit too much to update one score as for
> me.
> 
> I'm curious if there's any better solution I missed.
> 
> 
> On Mon, Apr 5, 2010 at 11:54 AM, JKnight JKnight 
> wrote:
> > Thanks David,
> > But what's does "monotonicity" mean?
> >
> > User's score belongs to their action. When they win the game or
> sale
> > something, user's score  will increase. When user lose the game or
> buy
> > something, user's score will decrease.
> >
> > On Mon, Apr 5, 2010 at 4:09 AM, David Strauss
> 
> > wrote:
> >>
> >> I need the question about monotonicity answered, too.
> >>
> >> You should also know: Cassandra is not ideal for directly tracking
> >> values you increment or decrement.
> >>
> >> On 2010-04-05 08:04, JKnight JKnight wrote:
> >> > Thanks for for reply, David.
> >> >
> >> > I will tell more the detail about the system. My system is used
> to store
> >> > the score (point) user earn when they play game.
> >> >
> >> > "Mark" is the score.
> >> > User's score changes when user win game, buy or sell anything.
> >> >
> >> > Sorry I make a mistake. My data model is:
> >> >
> >> > Mark{ //Column Family
> >> >     gameId:{ //row key
> >> >         mark_userId: ""// (column name : value),
> >> >         mark2_userId2: ""
> >> >     },
> >> >     gameId2:{//row key
> >> >         mark_userId: ""
> >> >     }
> >> > }
> >> >
> >> >
> >> > On Sun, Apr 4, 2010 at 11:44 PM, David Strauss
>  >> > > wrote:
> >> >
> >> >     On 2010-04-05 02:48, JKnight JKnight wrote:
> >> >     > I want to design the data storage to store user's mark for
> a large
> >> >     > amount of user. When system run, user's mark changes
> frequently.
> >> >
> >> >     What is a "mark"?
> >> >
> >> >     > I want to list top 10 user have largest mark.
> >> >
> >> >     Do the "marks" increase monotonically? What other properties
> do they
> >> >     have?
> >> >
> >> >     > Could we use Cassandra for store this data?
> >> >     >
> >> >     > Ex, here my Cassandra data model design:
> >> >     > Mark{
> >> >     >     userId{
> >> >     >         mark_userId
> >> >     >     },
> >> >     > }
> >> >
> >> >     I do not understand that notation. What parts are the CF,
> key/row,
> >> > and
> >> >     column?
> >> >
> >> >     > When user's mark changes, we remove old mark_userId and add
> new
> >> >     > mark_userId.
> >> >     > Because user's mark change frequently and with large amount
>  of
> >> >     user, I
> >> >     > think Cassandra can not satisfy.
> >> >
> >> >     On the contrary, Cassandra excels at tracking rapidly
> changing data
> >> > and
> >> >     even shards rows to scale I/O horizontally.
> >> >
> >> >     --
> >> >     David Strauss
> >> >       | da...@fourkitchens.com 
> >> >     Four Kitchens
> >> >       | http://fourkitchens.com
> >> >       | +1 512 454 6659 [office]
> >> >       | +1 512 870 8453 [direct]
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > JKnight
> >>
> >>
> >> --
> >> David Strauss
> >>   | da...@fourkitchens.com
> >>   | +1 512 577 5827 [mobile]
> >> Four Kitchens
> >>   | http://fourkitchens.com
> >>   | +1 512 454 6659 [office]
> >>   | +1 512 870 8453 [direct]
> >>
> >
> >
> >
> > --
> > Best regards,
> > JKnight
> >
> 
> 
> 
> -- 
> Andriy

-- 
David Strauss
   | da...@fourkitchens.com
   | +1 512 577 5827 [mobile]

Re: Cassandra Design or another solution

2010-04-05 Thread Andriy Bohdan
It makes sense.

Thanks, David!

On Mon, Apr 5, 2010 at 2:34 PM, David Timothy Strauss
 wrote:
> Cache the (key => score) map as you write values (a "write-through" 
> "write-through" cache) so that reading the current score hits something like 
> memcached instead of Cassandra. With a cache hit, you get an ideal, 
> write-only path in Cassandra. Three blind writes in Cassandra is cheap -- no 
> matter what your scale. The only risk is inability to efficiently remove old 
> scores if you lose the contents of memcached, but that risk can be mitigated 
> various ways.
>
> Of course, I'm assuming a single data center, here. Memcached isn't too 
> useful for this if you need to update scores at two data centers.
>
> I'm not sure how much the 0.6 row cache might help in this case, too.
>
> - "Andriy Bohdan"  wrote:
>
>> Hello guys
>>
>> I have a pretty similar task. There's a need to store tags of
>> products
>> with score. Score may go up and down and tags have to be ordered by
>> their score for each product. Score is updated "very" often.
>>
>> I was thinking of using the following model (simplified here for
>> clarity):
>>
>> Product = {
>>     product_key: {
>>          name: 
>>          etc..
>>    }
>>    ...
>> }
>>
>> Product_Tags = {
>>     product_key : {
>>         tag_name: score
>>          ...
>>     }
>>     ...
>> }
>>
>> Product_Tags_Ordered (compareWith:BytesType) = {
>>     product_key: {
>>         (score, time_uuid) :  tag_name
>>         ...
>>     }
>>     ...
>> }
>>
>> So to update a score of a tag:
>> 1) need to look old value of score to be able to remove it from
>> Product_Tags_Ordered
>> 2) remove Row from Product_Tags_Ordered with old score
>> 3) update score in Product_Tags
>> 4) insert new Row into Product_Tags_Ordered with new score
>>
>> 4 IO operations look like a bit too much to update one score as for
>> me.
>>
>> I'm curious if there's any better solution I missed.
>>
>>
>> On Mon, Apr 5, 2010 at 11:54 AM, JKnight JKnight 
>> wrote:
>> > Thanks David,
>> > But what's does "monotonicity" mean?
>> >
>> > User's score belongs to their action. When they win the game or
>> sale
>> > something, user's score  will increase. When user lose the game or
>> buy
>> > something, user's score will decrease.
>> >
>> > On Mon, Apr 5, 2010 at 4:09 AM, David Strauss
>> 
>> > wrote:
>> >>
>> >> I need the question about monotonicity answered, too.
>> >>
>> >> You should also know: Cassandra is not ideal for directly tracking
>> >> values you increment or decrement.
>> >>
>> >> On 2010-04-05 08:04, JKnight JKnight wrote:
>> >> > Thanks for for reply, David.
>> >> >
>> >> > I will tell more the detail about the system. My system is used
>> to store
>> >> > the score (point) user earn when they play game.
>> >> >
>> >> > "Mark" is the score.
>> >> > User's score changes when user win game, buy or sell anything.
>> >> >
>> >> > Sorry I make a mistake. My data model is:
>> >> >
>> >> > Mark{ //Column Family
>> >> >     gameId:{ //row key
>> >> >         mark_userId: ""// (column name : value),
>> >> >         mark2_userId2: ""
>> >> >     },
>> >> >     gameId2:{//row key
>> >> >         mark_userId: ""
>> >> >     }
>> >> > }
>> >> >
>> >> >
>> >> > On Sun, Apr 4, 2010 at 11:44 PM, David Strauss
>> > >> > > wrote:
>> >> >
>> >> >     On 2010-04-05 02:48, JKnight JKnight wrote:
>> >> >     > I want to design the data storage to store user's mark for
>> a large
>> >> >     > amount of user. When system run, user's mark changes
>> frequently.
>> >> >
>> >> >     What is a "mark"?
>> >> >
>> >> >     > I want to list top 10 user have largest mark.
>> >> >
>> >> >     Do the "marks" increase monotonically? What other properties
>> do they
>> >> >     have?
>> >> >
>> >> >     > Could we use Cassandra for store this data?
>> >> >     >
>> >> >     > Ex, here my Cassandra data model design:
>> >> >     > Mark{
>> >> >     >     userId{
>> >> >     >         mark_userId
>> >> >     >     },
>> >> >     > }
>> >> >
>> >> >     I do not understand that notation. What parts are the CF,
>> key/row,
>> >> > and
>> >> >     column?
>> >> >
>> >> >     > When user's mark changes, we remove old mark_userId and add
>> new
>> >> >     > mark_userId.
>> >> >     > Because user's mark change frequently and with large amount
>>  of
>> >> >     user, I
>> >> >     > think Cassandra can not satisfy.
>> >> >
>> >> >     On the contrary, Cassandra excels at tracking rapidly
>> changing data
>> >> > and
>> >> >     even shards rows to scale I/O horizontally.
>> >> >
>> >> >     --
>> >> >     David Strauss
>> >> >       | da...@fourkitchens.com 
>> >> >     Four Kitchens
>> >> >       | http://fourkitchens.com
>> >> >       | +1 512 454 6659 [office]
>> >> >       | +1 512 870 8453 [direct]
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best regards,
>> >> > JKnight
>> >>
>> >>
>> >> --
>> >> David Strauss
>> >>   | da...@fourkitchens.com
>> >>   | +1 512 577 5827 [mobile]

Re: Memcached protocol?

2010-04-05 Thread Ryan Daum
It seems pretty clear to me that the full memcached protocol is not
appropriate for Cassandra. The question is whether some subset of it is of
any use to anybody. The only advantage I can see is that there are a large
number of clients out there that can speak it already; but any app that is
making extensive use of it is probably doing so in a way that would preclude
Cassandra+Jmemcached from being a "drop-in" addition.

Ryan

On Mon, Apr 5, 2010 at 9:02 AM, David Strauss wrote:

> On 2010-04-05 07:47, Paul Prescod wrote:
> > On Mon, Apr 5, 2010 at 12:01 AM, David Strauss 
> wrote:
> >> On 2010-04-05 03:42, Paul Prescod wrote:
> >> ...
> >>
> >> There is a difference between Cassandra allowing inc/dec on values and
> >> actually *knowing* the resultant value at the time of the write. It's
> >> likely that inc/dec support will still feature blind writes if at all
> >> possible. The memcached protocol returns a resultant value from inc/dec.
> >
> > Right. That's why I said that the proxy layer would need to read the
> > result with an appropriate consistency level before returning to the
> > memcached client application. The client application would need to
> > declare its consistency preference using a configuration file.
>
> But your "write then read" model lacks the atomicity of the memcached
> API. It's possible for two clients to read the same value.
>
> --
> David Strauss
>   | da...@fourkitchens.com
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>
>


Re: multinode cluster wiki page

2010-04-05 Thread Ted Zlatanov
On Sat, 3 Apr 2010 13:52:22 -0700 Benjamin Black  wrote: 

BB> What happens if the IP I get back is for a seed that happens to be
BB> down right then?  And then that IP is cached locally by my resolver?

You have to set the TTL to be the right number of seconds for your
environment.  With tinydns on a dedicated subdomain, even an old machine
could support really short TTLs.

BB> There is certainly a tempting conceptual simplicity to using DNS, I
BB> just don't think the reality is that simple nor is it for the trade in
BB> predictability, for me.  IMO, this is better done either through
BB> automation to generate the configs (how I do it; I just update
BB> chef-server) or through a service like ZK (how I might do it in the
BB> future, in combination with automation).

DNS tends to be everywhere and easily configurable, so it's a pretty
good lowest common denominator.  I think Zeroconf AKA mDNS/DNS-SD is a
good alternative to simple DNS RR for many environments and I will
eventually propose a contrib plugin for Cassandra that provides it if no
one else gets to it first (we discussed this previously).  Modern Linux
systems support Zeroconf AKA mDNS/DNS-SD through Avahi.

Ted



Re: multinode cluster wiki page

2010-04-05 Thread Ted Zlatanov
On Sat, 3 Apr 2010 14:10:37 -0500 Jonathan Ellis  wrote: 

JE> IMO the "right" way to do it is to configure your machines so that
JE> autodetecting listenaddress Just Works, so you can deploy exactly the
JE> same config to all nodes.

It would be nice if Cassandra looked at all the available interfaces and
selected the one whose reverse DNS lookup returned ".*cassandra.*" (or
some keyword the user provided).

In other words, when you have

eth0 = address X, reverse = "67.frontend.com"
eth1 = address Y, reverse = "cassandra-67.backend.com"

eth1 should look better.  So maybe ListenAddress could support this in
the configuration somehow, as a string spec or a
ListenAddressPreferReverse option.  That would let those of us with
multiple interfaces use the exact same config everywhere.
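
A sketch of that selection rule over a hypothetical list of local
interface addresses:

import re
import socket

def pick_listen_address(candidate_ips, pattern=r".*cassandra.*"):
    # prefer the interface whose reverse DNS name matches the keyword
    for ip in candidate_ips:
        try:
            name = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            continue
        if re.match(pattern, name):
            return ip
    return candidate_ips[0]   # fall back to the first interface

# e.g. with PTR records like the ones above,
# pick_listen_address(["10.1.2.67", "10.3.4.67"]) returns the address
# whose reverse lookup is "cassandra-67.backend.com"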

Ted



Re: Cassandra Design or another solution

2010-04-05 Thread JKnight JKnight
Thanks for your help, David.


On Mon, Apr 5, 2010 at 7:25 PM, Andriy Bohdan  wrote:

> It makes sense.
>
> Thanks, David!
>
> On Mon, Apr 5, 2010 at 2:34 PM, David Timothy Strauss
>  wrote:
> > Cache the (key => score) map as you write values (a
> "write-through" cache) so that reading the current score hits something like
> memcached instead of Cassandra. With a cache hit, you get an ideal,
> write-only path in Cassandra. Three blind writes in Cassandra is cheap -- no
> matter what your scale. The only risk is inability to efficiently remove old
> scores if you lose the contents of memcached, but that risk can be mitigated
> various ways.
> >
> > Of course, I'm assuming a single data center, here. Memcached isn't too
> useful for this if you need to update scores at two data centers.
> >
> > I'm not sure how much the 0.6 row cache might help in this case, too.
> >
> > - "Andriy Bohdan"  wrote:
> >
> >> Hello guys
> >>
> >> I have a pretty similar task. There's a need to store tags of
> >> products
> >> with score. Score may go up and down and tags have to be ordered by
> >> their score for each product. Score is updated "very" often.
> >>
> >> I was thinking of using the following model (simplified here for
> >> clarity):
> >>
> >> Product = {
> >> product_key: {
> >>  name: 
> >>  etc..
> >>}
> >>...
> >> }
> >>
> >> Product_Tags = {
> >> product_key : {
> >> tag_name: score
> >>  ...
> >> }
> >> ...
> >> }
> >>
> >> Product_Tags_Ordered (compareWith:BytesType) = {
> >> product_key: {
> >> (score, time_uuid) :  tag_name
> >> ...
> >> }
> >> ...
> >> }
> >>
> >> So to update a score of a tag:
> >> 1) need to look old value of score to be able to remove it from
> >> Product_Tags_Ordered
> >> 2) remove Row from Product_Tags_Ordered with old score
> >> 3) update score in Product_Tags
> >> 4) insert new Row into Product_Tags_Ordered with new score
> >>
> >> 4 IO operations look like a bit too much to update one score as for
> >> me.
> >>
> >> I'm curious if there's any better solution I missed.
> >>
> >>
> >> On Mon, Apr 5, 2010 at 11:54 AM, JKnight JKnight 
> >> wrote:
> >> > Thanks David,
> >> > But what's does "monotonicity" mean?
> >> >
> >> > User's score belongs to their action. When they win the game or
> >> sale
> >> > something, user's score  will increase. When user lose the game or
> >> buy
> >> > something, user's score will decrease.
> >> >
> >> > On Mon, Apr 5, 2010 at 4:09 AM, David Strauss
> >> 
> >> > wrote:
> >> >>
> >> >> I need the question about monotonicity answered, too.
> >> >>
> >> >> You should also know: Cassandra is not ideal for directly tracking
> >> >> values you increment or decrement.
> >> >>
> >> >> On 2010-04-05 08:04, JKnight JKnight wrote:
> >> >> > Thanks for for reply, David.
> >> >> >
> >> >> > I will tell more the detail about the system. My system is used
> >> to store
> >> >> > the score (point) user earn when they play game.
> >> >> >
> >> >> > "Mark" is the score.
> >> >> > User's score changes when user win game, buy or sell anything.
> >> >> >
> >> >> > Sorry I make a mistake. My data model is:
> >> >> >
> >> >> > Mark{ //Column Family
> >> >> > gameId:{ //row key
> >> >> > mark_userId: ""// (column name : value),
> >> >> > mark2_userId2: ""
> >> >> > },
> >> >> > gameId2:{//row key
> >> >> > mark_userId: ""
> >> >> > }
> >> >> > }
> >> >> >
> >> >> >
> >> >> > On Sun, Apr 4, 2010 at 11:44 PM, David Strauss
> >>  >> >> > > wrote:
> >> >> >
> >> >> > On 2010-04-05 02:48, JKnight JKnight wrote:
> >> >> > > I want to design the data storage to store user's mark for
> >> a large
> >> >> > > amount of user. When system run, user's mark changes
> >> frequently.
> >> >> >
> >> >> > What is a "mark"?
> >> >> >
> >> >> > > I want to list top 10 user have largest mark.
> >> >> >
> >> >> > Do the "marks" increase monotonically? What other properties
> >> do they
> >> >> > have?
> >> >> >
> >> >> > > Could we use Cassandra for store this data?
> >> >> > >
> >> >> > > Ex, here my Cassandra data model design:
> >> >> > > Mark{
> >> >> > > userId{
> >> >> > > mark_userId
> >> >> > > },
> >> >> > > }
> >> >> >
> >> >> > I do not understand that notation. What parts are the CF,
> >> key/row,
> >> >> > and
> >> >> > column?
> >> >> >
> >> >> > > When user's mark changes, we remove old mark_userId and add
> >> new
> >> >> > > mark_userId.
> >> >> > > Because user's mark change frequently and with large amount
> >>  of
> >> >> > user, I
> >> >> > > think Cassandra can not satisfy.
> >> >> >
> >> >> > On the contrary, Cassandra excels at tracking rapidly
> >> changing data
> >> >> > and
> >> >> > even shards rows to scale I/O horizontally.
> >> >> >
> >> >> > --
> >> >> > David Strauss
> >> >> >   | da...@fourkitchens.com

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 1:02 AM, David Strauss  wrote:
> ...
>
> But your "write then read" model lacks the atomicity of the memcached
> API. It's possible for two clients to read the same value.

Do you have an example application where this particular side effect
of eventual consistency is problematic? Obviously memcached and
Cassandra are different because of eventual consistency. The question
is whether they are different enough to break an inconvenient number
of real applications. Do you depend on add returning a unique number
to each client in an application you've deployed? I have always
imagined it as being primarily for simple counters.

 Paul Prescod


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 5:29 AM, Ryan Daum  wrote:
> It seems pretty clear to me that the full memcached protocol is not
> appropriate for Cassandra. The question is whether some subset of it is of
> any use to anybody. The only advantage I can see is that there are a large
> number of clients out there that can speak it already; but any app that is
> making extensive use of it is probably doing so in a way that would preclude
> Cassandra+Jmemcached from being a "drop-in" addition.

Here are a couple of example projects for info.

Django:

http://docs.djangoproject.com/en/dev/topics/cache/

It says of "increment/decrement": "incr()/decr() methods are not
guaranteed to be atomic. On those backends that support atomic
increment/decrement (most notably, the memcached backend), increment
and decrement operations will be atomic. However, if the backend
doesn't natively provide an increment/decrement operation, it will be
implemented using a two-step retrieve/update."

add() is implied to be atomic.

Django itself does use add() in exactly one line of code that I can
find. I believe it is just an optimization (don't bother saving this
object if it already exists) and is not semantically meaningful. In
fact, I don't believe that there is a code path to the add() call but
I'm really not investigating very deeply.

Rails:

http://github.com/rails/rails/blob/master/actionpack/lib/action_controller/caching/actions.rb

Here is the complete usage of the cache_store object in Rails.

actionpack/lib/action_controller/caching/fragments.rb
44:  cache_store.write(key, content, options)
55:  result = cache_store.read(key, options)
66:  cache_store.exist?(key, options)
94:cache_store.delete_matched(key, options)
96:cache_store.delete(key, options)

actionpack/lib/action_controller/caching.rb
79:cache_store.fetch(ActiveSupport::Cache.expand_cache_key(key,
:controller), options, &block)

Fetch is an abstraction on top of read. delete_matched is not
supported by the memcached plugin and not used by Rails.

So as far as I can see, Rails only uses write, read, exist? and delete.

It does expose more functions to the actual application, but the Rails
framework does not use them. Most of them (including
increment/decrement) are not even documented, and not supported with
most cache stores.

 * http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029

I checked a few of my own apps. They use get/set/add/delete, but the
add is almost always used as an optimization.

 Paul Prescod


Re: Memcached protocol?

2010-04-05 Thread Ryan Daum
Are these applications using memcached for caching or for something else?

I don't see the point in putting Cassandra in as a level 1 or 2 cache
replacement, especially given that it does not support any reasonable
expiration policy that would be of use in those circumstances.

Ryan

On Mon, Apr 5, 2010 at 1:08 PM, Paul Prescod  wrote:

> On Mon, Apr 5, 2010 at 5:29 AM, Ryan Daum  wrote:
> > It seems pretty clear to me that the full memcached protocol is not
> > appropriate for Cassandra. The question is whether some subset of it is
> of
> > any use to anybody. The only advantage I can see is that there are a
> large
> > number of clients out there that can speak it already; but any app that
> is
> > making extensive use of it is probably doing so in a way that would
> preclude
> > Cassandra+Jmemcached from being a "drop-in" addition.
>
> Here are a couple of example projects for info.
>
> Django:
>
> http://docs.djangoproject.com/en/dev/topics/cache/
>
> It says of "increment/decrement": "incr()/decr() methods are not
> guaranteed to be atomic. On those backends that support atomic
> increment/decrement (most notably, the memcached backend), increment
> and decrement operations will be atomic. However, if the backend
> doesn't natively provide an increment/decrement operation, it will be
> implemented using a two-step retrieve/update."
>
> add() is implied to be atomic.
>
> Django itself does use add() in exactly one line of code that I can
> find. I believe it is just an optimization (don't bother saving this
> object if it already exists) and is not semantically meaningful. In
> fact, I don't believe that there is a code path to the add() call but
> I'm really not investigating very deeply.
>
> Rails:
>
>
> http://github.com/rails/rails/blob/master/actionpack/lib/action_controller/caching/actions.rb
>
> Here is the complete usage of the cache_store object in Rails.
>
> actionpack/lib/action_controller/caching/fragments.rb
> 44:  cache_store.write(key, content, options)
> 55:  result = cache_store.read(key, options)
> 66:  cache_store.exist?(key, options)
> 94:cache_store.delete_matched(key, options)
> 96:cache_store.delete(key, options)
>
> actionpack/lib/action_controller/caching.rb
> 79:cache_store.fetch(ActiveSupport::Cache.expand_cache_key(key,
> :controller), options, &block)
>
> Fetch is an abstraction on top of read. delete_matched is not
> supported by the memcached plugin and not used by Rails.
>
> So as far as I can see, Rails only uses write, read, exist? and delete.
>
> It does expose more functions to the actual application, but the Rails
> framework does not use them. Most of them (including
> increment/decrement) are not even documented, and not supported with
> most cache stores.
>
>  *
> http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029
>
> I checked a few of my own apps. They use get/set/add/delete, but the
> add is almost always used as an optimization.
>
>  Paul Prescod
>


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 10:19 AM, Ryan Daum  wrote:
> Are these applications using memcached for caching or for something else?
> I don't see the point in putting Cassandra in as a level 1 or 2 cache
> replacement? Especially given as it does not support any reasonable
> expiration policy that would be of use in those circumstances.
> Ryan

You're right that without cache expiration, it's of questionable value
for page/fragment caches. I was just curious about what methods are
used out in the real world, so I looked at some big apps that I know
use memcached.

As far as client libraries go, I can attest that in Ruby at least, the
memcached client library is vastly faster than the thrift one. I don't
know about avro. In my tests with Ruby, the marshalling was dominating
the networking in Cassandra performance. 25% of the time in my
benchmark was used by a function called "write_byte" (which is
implemented in Ruby!). I would be happy to hear that I'm Doing
Something Wrong, but I think it's just a consequence of the thrift
protocol and the client implementation.

I have no idea whether Avro is better. I'm not sure if it works well
enough to be tested yet...

 Paul Prescod


Re: Memcached protocol?

2010-04-05 Thread Mike Malone
>
> Here are a couple of example projects for info.
>
> Django:
>
> http://docs.djangoproject.com/en/dev/topics/cache/
>
> It says of "increment/decrement": "incr()/decr() methods are not
> guaranteed to be atomic. On those backends that support atomic
> increment/decrement (most notably, the memcached backend), increment
> and decrement operations will be atomic. However, if the backend
> doesn't natively provide an increment/decrement operation, it will be
> implemented using a two-step retrieve/update."
>
> add() is implied to be atomic.
>
> Django itself does use add() in exactly one line of code that I can
> find. I believe it is just an optimization (don't bother saving this
> object if it already exists) and is not semantically meaningful. In
> fact, I don't believe that there is a code path to the add() call but
> I'm really not investigating very deeply.
>

FWIW, I added the atomic increment/decrement operations to the Django cache
interface (and wrote that documentation) because the functionality was
useful for large scale apps. I didn't implement atomic increment/decrement
or atomic add for backends that didn't natively support it because, in my
opinion (and in the opinion of the other Django contributors) any site that
requires that sort of functionality should be running memcached as their
cache backend. So I guess what I'm saying is that the functionality _is_
useful. However, there probably are some users who would find the subset of
the memcache protocol that you _can_ implement on top of Cassandra useful.

Meh.

Mike


Re: multinode cluster wiki page

2010-04-05 Thread Brandon Williams
2010/4/5 Ted Zlatanov 

> On Sat, 3 Apr 2010 14:10:37 -0500 Jonathan Ellis 
> wrote:
>
> JE> IMO the "right" way to do it is to configure your machines so that
> JE> autodetecting listenaddress Just Works, so you can deploy exactly the
> JE> same config to all nodes.
>
> It would be nice if Cassandra looked at all the available interfaces and
> selected the one whose reverse DNS lookup returned ".*cassandra.*" (or
> some keyword the user provided).
>
> In other words, when you have
>
> eth0 = address X, reverse = "67.frontend.com"
> eth1 = address Y, reverse = "cassandra-67.backend.com"
>
> eth1 should look better.  So maybe ListenAddress could support this in
> the configuration somehow, as a string spec or a
> ListenAddressPreferReverse option.  That would let those of us with
> multiple interfaces use the exact same config everywhere.


You can already accomplish this.  Set up /etc/hosts correctly and leave
ListenAddress blank.
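
Roughly, a blank ListenAddress falls back to whatever the local hostname
resolves to (Java's InetAddress.getLocalHost() performs the equivalent of
the lookup sketched below), which /etc/hosts controls:

import socket

hostname = socket.gethostname()
listen_ip = socket.gethostbyname(hostname)  # /etc/hosts decides this
print(hostname, "->", listen_ip)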

-Brandon


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 10:45 AM, Mike Malone  wrote:
> ...
>
> FWIW, I added the atomic increment/decrement operations to the Django cache
> interface (and wrote that documentation) because the functionality was
> useful for large scale apps. I didn't implement atomic increment/decrement
> or atomic add for backends that didn't natively support it because, in my
> opinion (and in the opinion of the other Django contributors) any site that
> requires that sort of functionality should be running memcached as their
> cache backend. So I guess what I'm saying is that the functionality _is_
> useful. However, there probably are some users who would find the subset of
> the memcache protocol that you _can_ implement on top of Cassandra useful.

That's useful information Mike. I am a bit curious about what the most
common use cases are for atomic increment/decrement. I'm familiar with
atomic add as a sort of locking mechanism.

 Paul Prescod


Re: multinode cluster wiki page

2010-04-05 Thread Ted Zlatanov
On Mon, 5 Apr 2010 13:10:38 -0500 Brandon Williams  wrote: 

BW> 2010/4/5 Ted Zlatanov 
>> It would be nice if Cassandra looked at all the available interfaces and
>> selected the one whose reverse DNS lookup returned ".*cassandra.*" (or
>> some keyword the user provided).
>> 
>> In other words, when you have
>> 
>> eth0 = address X, reverse = "67.frontend.com"
>> eth1 = address Y, reverse = "cassandra-67.backend.com"
>> 
>> eth1 should look better.  So maybe ListenAddress could support this in
>> the configuration somehow, as a string spec or a
>> ListenAddressPreferReverse option.  That would let those of us with
>> multiple interfaces use the exact same config everywhere.

BW> You can already accomplish this.  Setup /etc/hosts correctly and leave
BW> ListenAddress blank.

Thanks, that's a much better solution.

Ted
getAddressFromNameService



Re: Memcached protocol?

2010-04-05 Thread Mike Malone
>
> That's useful information Mike. I am a bit curious about what the most
> common use cases are for atomic increment/decrement. I'm familiar with
> atomic add as a sort of locking mechanism.
>

They're useful for caching denormalized counts of things. Especially things
that change rapidly. Instead of invalidating the counter whenever an event
occurs that would incr/decr the counter, you can incr/decr the cached count
too.

In the case of Cassandra, they're useful for keeping counts of things in
general, since there's no efficient way to perform count operations with
Cassandra.
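
For instance, keeping a denormalized comment count fresh instead of
invalidating it (hypothetical memcached-style client):

def on_new_comment(cache, post_id):
    key = "comment_count:%d" % post_id
    if not cache.add(key, 1):   # create the counter if absent (atomic)
        cache.incr(key)         # otherwise atomically bump the count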

Mike


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone  wrote:
>> That's useful information Mike. I am a bit curious about what the most
>> common use cases are for atomic increment/decrement. I'm familiar with
>> atomic add as a sort of locking mechanism.
>
> They're useful for caching denormalized counts of things. Especially things
> that change rapidly. Instead of invalidating the counter whenever an event
> occurs that would incr/decr the counter, you can incr/decr the cached count
> too.

Do you think that a future cassandra increment/decrement would be
incompatible with those use cases?

It seems to me that in that use case, an eventually consistent counter
is as useful as any other eventually consistent datum. In other words,
there is no problem incrementing from 12 to 13 and getting back 15 as
the return value (due to coinciding increments). 15 is the current
correct value. It's arguably more correct than a memcached value which
other processes are trying to update but cannot because of locking.
Benjamin seemed to think that there were applications that depended on
the result always being 13.

I'm trying to understand whether a future cassandra "eventually
consistent" increment/decrement feature based on vector clocks would
have semantics that are incompatible with most deployed uses of
memcached increment/decrement.
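
A toy sketch of why such a counter can legitimately return 15 rather than
13: each replica accumulates its own partial count and a read merges the
partials (illustrative only -- not Cassandra's actual design):

class PartitionedCounter:
    def __init__(self, replicas=3):
        self.partials = [0] * replicas  # one partial count per replica

    def incr(self, replica, delta=1):
        # a blind write to one replica; no total is computed here
        self.partials[replica] += delta

    def value(self):
        # a read merges partials, including concurrent increments
        return sum(self.partials)

c = PartitionedCounter()
c.incr(0, 12)            # value is 12
c.incr(1)                # our increment...
c.incr(2); c.incr(2)     # ...plus two coinciding increments
print(c.value())         # 15, not 13 -- still the correct total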

 Paul Prescod


Re: Question about node failure...

2010-04-05 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta  wrote:
> Perhaps it would be good to have convenience workflow for replacing
> broken host ("squashing lemons")? I would assume that most common use
> case is to effectively replace host that can't be repaired (or perhaps
> it might sometimes be best way to do it anyway), by combination of
> removing failed host, bringing in new one. Handling this is as
> high-level logical operation could be more efficient than doing it
> step by step.

Does anyone have numbers on how badly "nodetool repair" sucks vs
bootstrap + removetoken?  If it's within a reasonable factor of
performance, then I'd say that's the easiest solution.


Re: Slow Responses from 2 of 3 nodes in RC1

2010-04-05 Thread Jonathan Ellis
When you're saying you can check 50 or 100 per second, how many rows
and columns does a check involve?  What query api are you using?

Your cassandra nodes look mostly idle.  Is each client thread getting
the same amount of work or are some finishing sooner than others?  Is
your client cpu or disk perhaps the bottleneck?

On Fri, Apr 2, 2010 at 2:39 PM, Mark Jones  wrote:
> To further complicate matters,
>  when I read only from cassdb1, I can check about 100/second/thread (40 threads)
>  when I read only from cassdb2, I can check about 50/second/thread (40 threads)
>  when I read only from cassdb3, I can check about 50/second/thread (40 threads)
>
> This is with a consistency level of ONE, ALL, or QUORUM  All 3 levels 
> return about the same read rate (~5/second), yet 2 nodes return them at 1/2 
> speed of the other node.
>
> I don't understand how this could be since QUORUM or ALL would require 2 of 
> the 3 to respond in ALL cases, so you would expect the read rate to the 
> 50/second/thread or 100/second/thread, regardless of who does the proxy.
>
> -Original Message-
> From: Mark Jones [mailto:mjo...@imagehawk.com]
> Sent: Friday, April 02, 2010 1:38 PM
> To: user@cassandra.apache.org
> Subject: Slow Responses from 2 of 3 nodes in RC1
>
> I have a 3 node cassandra cluster I'm trying to work with:
>
> All three machines are about the same:
> 6-8GB per machine  (fastest machine has 8GB, JavaVM limited to 5GB)
> separate spindle for cassandra data and commit log
>
> I wrote ~7 Million items to Cassandra, now, I'm trying to read them back, the 
> ones that are missing, might be troubling, but I'm not worried about that 
> yet.  Part of the reason I only have ~7 million items in, is that 2 of the 
> nodes are NOT pulling their weight:
>
>
> I've used "nodetool loadbalance" on them, to get the data evened out, it was 
> terribly imbalanced after ingestion, but it now looks like this:
>
> Address       Status     Load          Range                                       Ring
>                                        169214437894733073017295274330696200891
> 192.168.1.116 Up         1.88 GB       83372832363385696737577075791407985563     |<--|     (cassdb2)
> 192.168.1.119 Up         2.59 GB       167732545381904888270252256443838855184    |   |     (cassdb3)
> 192.168.1.12  Up         2.5 GB        169214437894733073017295274330696200891    |-->|     (cassdb1)
>
> This is a summary report from my checking program(c++).  It runs one thread 
> per file (files contain the originally ingested data), checking to see if the 
> data inserted is present and the same as when it was inserted.  Each thread 
> has its own thrift and Cassandra connection setup. Connection point is 
> randomly chosen at startup and that connection is reused by that thread until 
> the end of the test.  All the threads are running simultaneously and I would 
> expect similar results, but one node is beating the pants off the other two 
> nodes for performance.
>
> In the logs, there are nothing but INFO lines like these (there are others 
> that give less info about performance), no exceptions, warnings:
> cassdb1:
> INFO [COMPACTION-POOL:1] 2010-04-02 08:20:35,339 CompactionManager.java (line 
> 326) Compacted to /cassandra/data/bumble/Contacts-15-Data.db.  
> 262279345/243198299 bytes for 324378 keys.  Time: 16488ms.
>
> cassdb2:
> INFO [COMPACTION-POOL:1] 2010-04-02 08:20:16,448 CompactionManager.java (line 
> 326) Compacted to /cassandra/data/bumble/Contacts-5-Data.db.  
> 251086153/234535924 bytes for 284088 keys.  Time: 22805ms.
>
> cassdb3:
> INFO [COMPACTION-POOL:1] 2010-04-02 08:20:24,429 CompactionManager.java (line 
> 326) Compacted to /cassandra/data/bumble/Contacts-20-Data.db.  
> 266451419/248084737 bytes for 347531 keys.  Time: 25094ms.
>
>
> How do I go about figuring out what is going on in this setup?
>
> Iostat -x data is at the bottom
>
> cassdb1 Checked:    9773 Good:    9770 Missing:       3 Miscompared:        0  '/tmp/QUECD05
> cassdb1 Checked:    9818 Good:    9817 Missing:       1 Miscompared:        0  '/tmp/QUEDE05
> cassdb1 Checked:    9820 Good:    9820 Missing:       0 Miscompared:        0  '/tmp/QUEQ05
> cassdb1 Checked:    9836 Good:    9836 Missing:       0 Miscompared:        0  '/tmp/QUEJ05
> cassdb1 Checked:    9843 Good:    9843 Missing:       0 Miscompared:        0  '/tmp/QUEFG05
> cassdb1 Checked:    9883 Good:    9883 Missing:       0 Miscompared:        0  '/tmp/QUENO05
> cassdb1 Checked:    9884 Good:    9883 Missing:       1 Miscompared:        0  '/tmp/QUEIJ05
> cassdb1 Checked:    9890 Good:    9890 Missing:       0 Miscompared:        0  '/tmp/QUER05
> cassdb1 Checked:    9915 Good:    9913 Missing:       2 Miscompared:        0  '/tmp/QUEMN05
> cassdb1 Checked:    9962 Good:    9962 Missing:       0 Miscompared:        0  '/tmp/QUEF05
> cassdb1 Checke

Re: cms content and numerous sort operations

2010-04-05 Thread Brandon Williams
On Fri, Apr 2, 2010 at 10:06 AM, S Ahmed  wrote:

> Greetings!
>
> Content management systems usually have complex sort operations, how would
> this be best handled with Cassandra?
>
> Is the only way to handle this type of situation to build indexes for each
> and every sort?
>
> example model:
>
> Content: {
> contentID: {
> title: "this is a title",
> body: "this is the body"
>
>// now these are all columns that need to be sorted by
>isActive: "true",
>publishingStatus: 3, // enumeration
>revisionNumber: 234,
>dateCreated: "2010/03/03",
>dateModified: "2010/03/03",
>datePublished: "2010/03/06",
>authorID: 234,
>
>  }
> }
>
>
> The only solution I can think of is to create a seperate CF that maps the
> contentID and the column I need to sort by, so for dateCreated:
>
> ContentDateCreatedSort : {
>   contentID: { dateCreated: "2010/03/03" }
> }
>
>
> Am I on the right track here? Or is there a better way?


You're on the right track, denormalization is the best way to handle this.
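To make that concrete, a write then touches the content row plus one row per
sort order. A minimal sketch, assuming a hypothetical thin client wrapper
(insert(columnFamily, rowKey, columnName, value)) rather than any particular
library's API:

interface CassandraClient {
    // hypothetical wrapper over the Thrift insert call; not a real library API
    void insert(String columnFamily, String rowKey, String columnName, String value);
}

class ContentWriter {
    void save(CassandraClient c, String contentId, String title, String body,
              String dateCreated, String authorId) {
        // 1. the content itself
        c.insert("Content", contentId, "title", title);
        c.insert("Content", contentId, "body", body);
        // 2. one denormalized index row per sort order; putting the sortable
        //    value first in the column name makes the columns sort for free
        c.insert("ContentByDateCreated", "all", dateCreated + ":" + contentId, "");
        c.insert("ContentByAuthor", authorId, dateCreated + ":" + contentId, "");
    }
}

Reading "newest first" is then just a reversed column slice on the index row,
and each additional sort order costs one more insert at write time.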

-Brandon


Re: Heap sudden jump during import

2010-04-05 Thread Jonathan Ellis
Usually sudden heap jumps involve compacting large rows.

0.6 (since beta3) logs a warning when it finishes compacting a row over
500MB (the default threshold), in the hope that this will give you enough
time to fix things before whatever is making large rows makes one too
large to fit in memory.
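On the jhat question below: the stock JDK tools usually work for this. A
sketch, assuming a Sun JDK 6 (jhat wants roughly as much heap as the dump
is large, and serves its results on port 7000 by default):

    jmap -dump:format=b,file=cassandra.hprof <cassandra-pid>
    jhat -J-Xmx6g cassandra.hprof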

On Fri, Apr 2, 2010 at 4:57 PM, Weijun Li  wrote:
> I'm running a test to write 30 million columns (700 bytes each) to Cassandra:
> the process ran smoothly for about 20mil, then the heap usage suddenly jumped
> from 2GB to 3GB, which is the upper limit of the JVM. From this point Cassandra
> will freeze for a long time (terrible latency, no response to nodetool, so I
> have to stop the import client) before it comes back to normal. It's a
> single node cluster with a JVM maximum heap size of 3GB. So what could cause
> this spike? What kind of tool can I use to find out what the objects are
> that are filling the additional 1GB of heap? I did a heap dump but couldn't
> get jhat to work to browse the dumped file.
>
> Thanks,
>
> -Weijun
>


Re: 0.5.1 exception: java.io.IOException: Reached an EOL or something bizzare occured

2010-04-05 Thread Jonathan Ellis
Short answer: upgrade to 0.6.

On Sat, Apr 3, 2010 at 7:56 AM, Anty  wrote:
> Does anyone have solve the problem?I encounter the same error too.
>
> On Mon, Mar 29, 2010 at 12:12 AM, Benoit Perroud  wrote:
>>
>> I got the same error when the nodes are using lot of I/O, i.e during
>> compaction.
>>
>> 2010/3/28 Eric Yu :
>> > I have not restart my nodes.
>> > OK, may be I should give 0.6 a try.
>> >
>> > On Sun, Mar 28, 2010 at 9:53 AM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> It means that a MessagingService socket closed unexpectedly.  If
>> >> you're starting and restarting nodes that could cause it.
>> >>
>> >> This code is obsolete in 0.6 anyway.
>> >>
>> >> On Sat, Mar 27, 2010 at 8:51 PM, Eric Yu  wrote:
>> >> > And one more clue here, when ReplicateFactor is 1, it's OK, after
>> >> > changed to
>> >> > 2, the exception occurred.
>> >> >
>> >> > On Sun, Mar 28, 2010 at 9:46 AM, Eric Yu  wrote:
>> >> >>
>> >> >> Hi Jonathan,
>> >> >>
>> >> >> I upgraded my jdk to latest version, and I am sure I start Cassandra
>> >> >> with
>> >> >> it (set JAVA_HOME in cassansra.in.sh).
>> >> >> But the exception still there, any idea?
>> >> >>
>> >> >> On Sun, Mar 28, 2010 at 12:02 AM, Jonathan Ellis 
>> >> >> wrote:
>> >> >>>
>> >> >>> This means you need to upgrade your jdk to build 18 or later
>> >> >>>
>> >> >>> On Sat, Mar 27, 2010 at 10:55 AM, Eric Yu  wrote:
>> >> >>> > Hi, list
>> >> >>> > I got this exception when insert into a cluster with 5 node, is
>> >> >>> > this
>> >> >>> > a
>> >> >>> > bug
>> >> >>> > or something else is wrong.
>> >> >>> >
>> >> >>> > here is the system log:
>> >> >>> >
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:15:16,145 Gossiper.java (line 543) InetAddress /172.19.15.210 is now UP
>> >> >>> > ERROR [Timer-1] 2010-03-27 23:23:27,739 TcpConnection.java (line 308) Closing down connection java.nio.channels.SocketChannel[connected local=/172.19.15.209:58261 remote=/172.19.15.210:7000] with 342218 writes remaining.
>> >> >>> >  INFO [Timer-1] 2010-03-27 23:23:27,792 Gossiper.java (line 194) InetAddress /172.19.15.210 is now dead.
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:23:32,214 Gossiper.java (line 543) InetAddress /172.19.15.210 is now UP
>> >> >>> > ERROR [Timer-1] 2010-03-27 23:24:47,846 TcpConnection.java (line 308) Closing down connection java.nio.channels.SocketChannel[connected local=/172.19.15.209:59801 remote=/172.19.15.210:7000] with 256285 writes remaining.
>> >> >>> >  INFO [Timer-1] 2010-03-27 23:24:47,846 Gossiper.java (line 194) InetAddress /172.19.15.210 is now dead.
>> >> >>> >  WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,580 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000 remote=/172.19.15.210:55473]
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:25:05,580 Gossiper.java (line 543) InetAddress /172.19.15.210 is now UP
>> >> >>> >  WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000 remote=/172.19.15.210:45504]
>> >> >>> >  WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 TcpConnection.java (line 485) Exception was generated at : 03/27/2010 23:25:05 on thread MESSAGING-SERVICE-POOL:2
>> >> >>> > Reached an EOL or something bizzare occured. Reading from: /172.19.15.210 BufferSizeRemaining: 16
>> >> >>> > java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /172.19.15.210 BufferSizeRemaining: 16
>> >> >>> >     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>> >> >>> >     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>> >> >>> >     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>> >> >>> >     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>> >> >>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >>> >     at java.lang.Thread.run(Thread.java:636)
>> >> >>> >
>> >> >>> >  INFO [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 TcpConnection.java
>> >

Re: cascal - high level scala cassandra client (yes - another one)

2010-04-05 Thread Jonathan Ellis
Cool, you should add it to
http://wiki.apache.org/cassandra/ClientOptions.  (Click Login to get a
sign up page.)

On Sat, Apr 3, 2010 at 11:38 AM, Chris Shorrock  wrote:
> For the past week or so I've been developing (another) Scala-based high-level
> Cassandra client - Cascal. While I know there are several other (good-quality)
> clients, I thought developing my own would be a great way to familiarize
> myself with Cassandra as part of my analysis at work (which it was!).
> While I didn't write it intending to release it, now that I've completed it
> I've decided to release it into the wild (currently built against the
> 0.6-beta3 version of Cassandra), as I feel it takes a slightly different
> approach than some of the other libraries out there. While it may not be
> production ready, I will be using it to perform several tests for work, so if
> those perform as I expect my plan is to use and maintain this for
> some time.
> Documentation is available at: http://wiki.github.com/shorrockin/cascal/
> Source is available through: http://github.com/shorrockin/cascal
> ScalaDocs: http://shorrockin.com/cascal/scaladocs/
> Any and all feedback is welcome. Cheers.


Re: Memcached protocol?

2010-04-05 Thread Mike Malone
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod  wrote:

> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone  wrote:
> >> That's useful information Mike. I am a bit curious about what the most
> >> common use cases are for atomic increment/decrement. I'm familiar with
> >> atomic add as a sort of locking mechanism.
> >
> > They're useful for caching denormalized counts of things. Especially
> things
> > that change rapidly. Instead of invalidating the counter whenever an
> event
> > occurs that would incr/decr the counter, you can incr/decr the cached
> count
> > too.
>
> Do you think that a future cassandra increment/decrement would be
> incompatible with those use cases?
>
> It seems to me that in that use case, an eventually consistent counter
> is as useful as any other eventually consistent datum.


An eventually consistent count operation in Cassandra would be great, and it
would satisfy all of the use cases I would typically use counts for in
memcached. It's just a matter of reconciling inconsistencies with a more
sophisticated operation than "latest write wins" (specifically, the
reconciliation operation should apply all incr/decr ops).
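To sketch what such a reconciliation could look like (illustrative only --
nothing like this exists in Cassandra today): keep one partial count per
replica and merge divergent copies by taking the per-replica maximum, so
concurrent increments are all preserved.

import java.util.HashMap;
import java.util.Map;

// Illustrative increment-preserving counter; decrements would need a
// second map handled the same way.
class PartitionedCounter {
    final Map<String, Long> perReplica = new HashMap<String, Long>();

    void increment(String replicaId, long delta) {
        Long cur = perReplica.get(replicaId);
        perReplica.put(replicaId, (cur == null ? 0L : cur) + delta);
    }

    long value() {                      // true count = sum of the partials
        long sum = 0;
        for (long v : perReplica.values()) sum += v;
        return sum;
    }

    // Merge two divergent copies: per replica, the larger partial count
    // subsumes the smaller, so no increment is lost or double-counted.
    static PartitionedCounter merge(PartitionedCounter a, PartitionedCounter b) {
        PartitionedCounter out = new PartitionedCounter();
        out.perReplica.putAll(a.perReplica);
        for (Map.Entry<String, Long> e : b.perReplica.entrySet()) {
            Long cur = out.perReplica.get(e.getKey());
            if (cur == null || cur < e.getValue())
                out.perReplica.put(e.getKey(), e.getValue());
        }
        return out;
    }
}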

Mike


Re: Flush Commit Log

2010-04-05 Thread Jonathan Ellis
You'll have to give a more detailed error.  "nodeprobe flush" is
exactly what you should be trying.
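For 0.4.x that would be something along these lines, once per keyspace (the
keyspace name is a placeholder; 8080 was the default JMX port in that era):

    bin/nodeprobe -host 127.0.0.1 -port 8080 flush Keyspace1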

On Mon, Apr 5, 2010 at 2:37 AM, JKnight JKnight  wrote:
> Dear all,
>
> How can I flush all Commit Log for Cassandra version 042?
> I use nodeprobe flush but It seem does not run.
>
> Thank a lot for support.
>
> --
> Best regards,
> JKnight
>


Re: cascal - high level scala cassandra client (yes - another one)

2010-04-05 Thread Mike Malone
On Sat, Apr 3, 2010 at 12:12 PM, Matthew Chambers
wrote:

> Your git page looks great, I like your cassandra explanation and graphic.


+1 on the docs - they're very nice. Off-topic, but what'd you use to create
that graphic?

Mike


Re: cascal - high level scala cassandra client (yes - another one)

2010-04-05 Thread Chris Shorrock
Thanks guys - will definitely toss a mention of it in the Wiki. (The graphic
was created using http://yuml.me/ - a great tool for quickly throwing
something together using pretty simple syntax.)

On Mon, Apr 5, 2010 at 2:43 PM, Mike Malone  wrote:

> On Sat, Apr 3, 2010 at 12:12 PM, Matthew Chambers  > wrote:
>
>> Your git page looks great, I like your cassandra explanation and graphic.
>
>
> +1 on the docs - they're very nice. Off-topic, but what'd you use to create
> that graphic?
>
> Mike
>


Re: Question about node failure...

2010-04-05 Thread Rob Coli

On 4/5/10 2:11 PM, Jonathan Ellis wrote:

On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta  wrote:

Perhaps it would be good to have convenience workflow for replacing
broken host ("squashing lemons")? I would assume that most common use

 [ snip ]
Does anyone have numbers on how badly "nodetool repair" sucks vs
bootstrap + removetoken?  If it's within a reasonable factor of
performance, then I'd say that's the easiest solution.


As I understand it, a node which is in the midst of a "repair" operation 
is actually in a meaningfully different state from a node which is 
bootstrapping. The "repair"ing node can serve blank (?) data in the case 
where it is asked for data it should have but doesn't yet, with a 
ConsistencyLevel of ONE. AFAIK, there is no way to make a bootstrapping 
node return invalid responses in this way.


Any details from people who are more familiar with the particular code 
path in question would of course be appreciated. :)


=Rob



Re: Question about node failure...

2010-04-05 Thread Jonathan Ellis
On Mon, Apr 5, 2010 at 5:20 PM, Rob Coli  wrote:
> On 4/5/10 2:11 PM, Jonathan Ellis wrote:
>>
>> On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta
>>  wrote:
>>>
>>> Perhaps it would be good to have convenience workflow for replacing
>>> broken host ("squashing lemons")? I would assume that most common use
>>
>>  [ snip ]
>> Does anyone have numbers on how badly "nodetool repair" sucks vs
>> bootstrap + removetoken?  If it's within a reasonable factor of
>> performance, then I'd say that's the easiest solution.
>
> As I understand it, a node which is in the midst of a "repair" operation is
> actually in a meaningfully different state from a node which is
> bootstrapping. The "repair"ing node can serve blank (?) data in the case
> where it is asked for data it should have but doesn't yet, with a
> ConsistencyLevel of ONE. AFAIK, there is no way to make a bootstrapping node
> return invalid responses in this way.

True enough.  Created https://issues.apache.org/jira/browse/CASSANDRA-957


Re: Memcached protocol?

2010-04-05 Thread Tatu Saloranta
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod  wrote:
> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone  wrote:
>>> That's useful information Mike. I am a bit curious about what the most
>>> common use cases are for atomic increment/decrement. I'm familiar with
>>> atomic add as a sort of locking mechanism.
>>
>> They're useful for caching denormalized counts of things. Especially things
>> that change rapidly. Instead of invalidating the counter whenever an event
>> occurs that would incr/decr the counter, you can incr/decr the cached count
>> too.
>
> Do you think that a future cassandra increment/decrement would be
> incompatible with those use cases?
>
> It seems to me that in that use case, an eventually consistent counter
> is as useful as any other eventually consistent datum. In other words,
> there is no problem incrementing from 12 to 13 and getting back 15 as
> the return value (due to coinciding increments). 15 is the current
> correct value. It's arguably more correct then a memcached value which
> other processes are trying to update but cannot because of locking.
> Benjamin seemed to think that there were applications that depended on
> the result always being 13.

I would think that there is also the possibility of losing some
increments, or perhaps getting duplicate increments?
It is not just isolation but also correctness that is hard to maintain.
This can be more easily worked around in cases
where there is additional data that can be used to resolve potentially
ambiguous changes (like inferring which shopping-cart additions are
real and which are duplicates).
With more work I am sure it is possible to get things mostly working,
it's just question of cost/benefit for specific use cases.

I think distributed counters are useful, but the difficulty depends on
the expected levels of concurrency/correctness/isolation.
There are many use cases where "about right" (or at least only losing
additions, or only gaining extra ones) is enough. For example, when
calculating charges for usage, it is probably ok to lose some usage
charges, but not to add bogus ones. If a mostly consistent result can
be achieved cheaply, there is no point in implementing a more complex
system for a minor improvement (preventing loss of, say, 2% of
unaccounted-for requests).

-+ Tatu +-


cassandra data viewer?

2010-04-05 Thread AJ Chen
Is there a generic GUI tool for viewing a Cassandra datastore? Being able to
view and edit data from a GUI tool like Oracle SQL Developer is very useful.
-aj


Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta  wrote:
> ...
>
> I would think that there is also possibility of losing some
> increments, or perhaps getting duplicate increments?

I believe that with vector clocks in Cassandra 0.7 you won't lose
anything. The conflict resolver will do the summation for you
properly.

If I'm wrong, I'd love to hear more, though.

 Paul Prescod


Re: cassandra data viewer?

2010-04-05 Thread selam
Look at chiton on GitHub.

On Tue, Apr 6, 2010 at 3:06 AM, AJ Chen  wrote:
> Is there a generic GUI tool for viewing cassandra datastore? being able to
> view and edit data from a GUI tool like oracle sqldeveloper is very useful.
> -aj
>



-- 
Regards && Good work
Timu EREN ( a.k.a selam )


Re: Flush Commit Log

2010-04-05 Thread JKnight JKnight
Thanks Jonathan,

When I run "nodeprobe flush" with parameter -host is Cassandra server setup
on my computer, my computer is hang up by Cassandra. (When I kill all Java
process, the computer will work well)

Yesterday, when run "nodeprobe flush" on my live server, I didn't flush all
keyspace so that commit log files weren't deleted. Today, after flush for
all keyspace, commit log files were deleted


On Mon, Apr 5, 2010 at 5:42 PM, Jonathan Ellis  wrote:

> You'll have to give a more detailed error.  "nodeprobe flush" is
> exactly what you should be trying.
>
> On Mon, Apr 5, 2010 at 2:37 AM, JKnight JKnight 
> wrote:
> > Dear all,
> >
> > How can I flush all Commit Log for Cassandra version 042?
> > I use nodeprobe flush but It seem does not run.
> >
> > Thank a lot for support.
> >
> > --
> > Best regards,
> > JKnight
> >
>



-- 
Best regards,
JKnight


Re: Flush Commit Log

2010-04-05 Thread Jonathan Ellis
On Mon, Apr 5, 2010 at 9:11 PM, JKnight JKnight  wrote:
> Thanks Jonathan,
>
> When I run "nodeprobe flush" with parameter -host is Cassandra server setup
> on my computer, my computer is hang up by Cassandra. (When I kill all Java
> process, the computer will work well)

Sounds like flush generates a lot of i/o.  Not surprising.

> Yesterday, when I ran "nodeprobe flush" on my live server, I didn't flush all
> keyspaces, so the commit log files weren't deleted. Today, after flushing
> all keyspaces, the commit log files were deleted.

So... no problem, right?

-Jonathan


Overwhelming a cluster with writes?

2010-04-05 Thread Ilya Maykov
Hi all,

I've just started experimenting with Cassandra to get a feel for the
system. I've set up a test cluster and to get a ballpark idea of its
performance I wrote a simple tool to load some toy data into the
system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
writes from a single client. I'm trying to figure out if this is a
problem with my setup, if I'm hitting bugs in the Cassandra codebase,
or if this is intended behavior. Sorry this email is kind of long,
here is the TLDR version:

While writing to Cassandra from a single node, I am able to get the
cluster into a bad state, where nodes are randomly disconnecting from
each other, write performance plummets, and sometimes nodes even
crash. Further, the nodes do not recover as long as the writes
continue (even at a much lower rate), and sometimes do not recover at
all unless I restart them. I can get this to happen simply by throwing
data at the cluster fast enough, and I'm wondering if this is a known
issue or if I need to tweak my setup.

Now, the details.

First, a little bit about the setup:

4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
in. Node specs:
8-core Intel Xeon e5...@2.00ghz
8GB RAM
1Gbit ethernet
Red Hat Linux 2.6.18
JVM 1.6.0_19 64-bit
1TB spinning disk houses both commitlog and data directories (which I
know is not ideal).
The client machine is on the same local network and has very similar specs.

The cassandra nodes are started with the following JVM options:

./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
-XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"

I'm using default settings for all of the tunable stuff at the bottom
of storage-conf.xml. I also selected my initial tokens to evenly
partition the key space when the cluster was bootstrapped. I am using
the RandomPartitioner.

Now, about the test. Basically I am trying to get an idea of just how
fast I can make this thing go. I am writing ~250M data records into
the cluster, replicated at 3x, using Ran Tavory's Hector client
(Java), writing with ConsistencyLevel.ZERO and
FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
threads talking to each of the 4 nodes in the cluster. Records are
identified by a numeric id, and I'm writing them in batches of up to
10k records per row, with each record in its own column. The row key
identifies the bucket into which records fall. So, records with ids 0
- 9999 are written to row "0", 10000 - 19999 are written to row
"10000", etc. Each record is a JSON object with ~10-20 fields.

Records: {  // Column Family
  0 : {  // row key for the start of the bucket. Buckets span a range
of up to 10000 records
    1 : "{ /* some JSON */ }",  // Column for record with id=1
    3 : "{ /* some more JSON */ }",  // Column for record with id=3
    ...
    9999 : "{ /* ... */ }"
  },
  10000 : {  // row key for the start of the next bucket
    10001 : ...
    10004 : ...
  }
}
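In code, the writer derives the row key by rounding the record id down to
the bucket size -- a sketch, with names made up for illustration:

    static final long BUCKET_SIZE = 10000;

    static String bucketRowKey(long recordId) {
        return String.valueOf((recordId / BUCKET_SIZE) * BUCKET_SIZE); // 10004 -> "10000"
    }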

I am reading the data out of a local, sorted file on the client, so I
only write a row to Cassandra once all records for that row have been
read, and each row is written to exactly once. I'm using a
producer-consumer queue to pump data from the input reader thread to
the output writer threads. I found that I have to throttle the reader
thread heavily in order to get good behavior. So, if I make the reader
sleep for 7 seconds every 1M records, everything is fine - the data
loads in about an hour, half of which is spent by the reader thread
sleeping. In between the sleeps, I see ~40-50 MB/s throughput on the
client's network interface while the reader is not sleeping, and it
takes ~7-8 seconds to write each batch of 1M records.

Now, if I remove the 7 second sleeps on the client side, things get
bad after the first ~8M records are written by the client. Write
throughput drops to <5 MB/s. I start seeing messages about nodes
disconnecting and reconnecting in Cassandra's system.log, as well as
lots of GC messages:

...
 INFO [Timer-1] 2010-04-06 04:03:27,178 Gossiper.java (line 179)
InetAddress /10.15.38.88 is now dead.
 INFO [GC inspection] 2010-04-06 04:03:30,259 GCInspector.java (line
110) GC for ConcurrentMarkSweep: 2989 ms, 55326320 reclaimed leaving
1035998648 used; max is 1211170816
 INFO [GC inspection] 2010-04-06 04:03:41,838 GCInspector.java (line
110) GC for ConcurrentMarkSweep: 3004 ms, 24377240 reclaimed leaving
1066120952 used; max is 1211170816
 INFO [Timer-1] 2010-04-06 04:03:44,136 Gossiper.java (line 179)
InetAddress /10.15.38.55 is now dead.
 INFO [GMFD:1] 2010-04-06 04:03:44,138 Gossiper.java (line 568)
InetAddress /10.15.38.55 is now UP
 INFO [GC inspection] 2010-04-06 04:03:52,957 GCInspector.java (line
110) GC for ConcurrentMarkSweep: 2319 ms, 4504888 reclaimed leaving
1086023832 used; max is 1211170816
 INFO [Timer-1] 2010-04-06 04:04:19,508 Gossiper.java (line 179)
InetAddress /10.15.38.242 is now dead.
 INFO [Tim

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Ilya Maykov
I just tried the same test with ConsistencyLevel.ALL, and the problem
went away - the writes are somewhat slower but the cluster never gets
into a bad state. So, I wonder if this is a bug in Cassandra's
handling of async / "non-ConsistencyLevel.ALL" writes ...
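For reference, the only difference between the two runs is the consistency
argument on each insert -- roughly, in 0.6 Thrift terms (client, path, and
ts as in the surrounding test code), where ZERO is documented as
fire-and-forget and ALL blocks for every replica's ack:

    // ZERO: returns as soon as the request is handed off, so the client
    // can outrun the cluster and mutations pile up server-side
    client.insert(keyspace, key, path, value, ts, ConsistencyLevel.ZERO);

    // ALL: blocks until every replica acks, throttling the client to the
    // speed of the slowest node
    client.insert(keyspace, key, path, value, ts, ConsistencyLevel.ALL);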

-- Ilya

On Mon, Apr 5, 2010 at 9:31 PM, Ilya Maykov  wrote:
> Hi all,
>
> I've just started experimenting with Cassandra to get a feel for the
> system. I've set up a test cluster and to get a ballpark idea of its
> performance I wrote a simple tool to load some toy data into the
> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
> writes from a single client. I'm trying to figure out if this is a
> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
> or if this is intended behavior. Sorry this email is kind of long,
> here is the TLDR version:
>
> While writing to Cassandra from a single node, I am able to get the
> cluster into a bad state, where nodes are randomly disconnecting from
> each other, write performance plummets, and sometimes nodes even
> crash. Further, the nodes do not recover as long as the writes
> continue (even at a much lower rate), and sometimes do not recover at
> all unless I restart them. I can get this to happen simply by throwing
> data at the cluster fast enough, and I'm wondering if this is a known
> issue or if I need to tweak my setup.
>
> Now, the details.
>
> First, a little bit about the setup:
>
> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
> in. Node specs:
> 8-core Intel Xeon e5...@2.00ghz
> 8GB RAM
> 1Gbit ethernet
> Red Hat Linux 2.6.18
> JVM 1.6.0_19 64-bit
> 1TB spinning disk houses both commitlog and data directories (which I
> know is not ideal).
> The client machine is on the same local network and has very similar specs.
>
> The cassandra nodes are started with the following JVM options:
>
> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
>
> I'm using default settings for all of the tunable stuff at the bottom
> of storage-conf.xml. I also selected my initial tokens to evenly
> partition the key space when the cluster was bootstrapped. I am using
> the RandomPartitioner.
>
> Now, about the test. Basically I am trying to get an idea of just how
> fast I can make this thing go. I am writing ~250M data records into
> the cluster, replicated at 3x, using Ran Tavory's Hector client
> (Java), writing with ConsistencyLevel.ZERO and
> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
> threads talking to each of the 4 nodes in the cluster. Records are
> identified by a numeric id, and I'm writing them in batches of up to
> 10k records per row, with each record in its own column. The row key
> identifies the bucket into which records fall. So, records with ids 0
> - 9999 are written to row "0", 10000 - 19999 are written to row
> "10000", etc. Each record is a JSON object with ~10-20 fields.
>
> Records: {  // Column Family
>   0 : {  // row key for the start of the bucket. Buckets span a range
> of up to 10000 records
>     1 : "{ /* some JSON */ }",  // Column for record with id=1
>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
>     ...
>     9999 : "{ /* ... */ }"
>   },
>   10000 : {  // row key for the start of the next bucket
>     10001 : ...
>     10004 : ...
>   }
> }
>
> I am reading the data out of a local, sorted file on the client, so I
> only write a row to Cassandra once all records for that row have been
> read, and each row is written to exactly once. I'm using a
> producer-consumer queue to pump data from the input reader thread to
> the output writer threads. I found that I have to throttle the reader
> thread heavily in order to get good behavior. So, if I make the reader
> sleep for 7 seconds every 1M records, everything is fine - the data
> loads in about an hour, half of which is spent by the reader thread
> sleeping. In between the sleeps, I see ~40-50 MB/s throughput on the
> client's network interface while the reader is not sleeping, and it
> takes ~7-8 seconds to write each batch of 1M records.
>
> Now, if I remove the 7 second sleeps on the client side, things get
> bad after the first ~8M records are written by the client. Write
> throughput drops to <5 MB/s. I start seeing messages about nodes
> disconnecting and reconnecting in Cassandra's system.log, as well as
> lots of GC messages:
>
> ...
>  INFO [Timer-1] 2010-04-06 04:03:27,178 Gossiper.java (line 179)
> InetAddress /10.15.38.88 is now dead.
>  INFO [GC inspection] 2010-04-06 04:03:30,259 GCInspector.java (line
> 110) GC for ConcurrentMarkSweep: 2989 ms, 55326320 reclaimed leaving
> 1035998648 used; max is 1211170816
>  INFO [GC inspection] 2010-04-06 04:03:41,838 GCInspector.java (line
> 110) GC for ConcurrentMarkSweep: 3004 ms, 24377240 reclaimed leaving
> 1066120952 used; max i

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Boris Shulman
You are running out of memory on your nodes. Before the final crash
your nodes are probably slow  due to GC. What is your memtable size?
What cache options did you configure?

On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov  wrote:
> Hi all,
>
> I've just started experimenting with Cassandra to get a feel for the
> system. I've set up a test cluster and to get a ballpark idea of its
> performance I wrote a simple tool to load some toy data into the
> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
> writes from a single client. I'm trying to figure out if this is a
> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
> or if this is intended behavior. Sorry this email is kind of long,
> here is the TLDR version:
>
> While writing to Cassandra from a single node, I am able to get the
> cluster into a bad state, where nodes are randomly disconnecting from
> each other, write performance plummets, and sometimes nodes even
> crash. Further, the nodes do not recover as long as the writes
> continue (even at a much lower rate), and sometimes do not recover at
> all unless I restart them. I can get this to happen simply by throwing
> data at the cluster fast enough, and I'm wondering if this is a known
> issue or if I need to tweak my setup.
>
> Now, the details.
>
> First, a little bit about the setup:
>
> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
> in. Node specs:
> 8-core Intel Xeon e5...@2.00ghz
> 8GB RAM
> 1Gbit ethernet
> Red Hat Linux 2.6.18
> JVM 1.6.0_19 64-bit
> 1TB spinning disk houses both commitlog and data directories (which I
> know is not ideal).
> The client machine is on the same local network and has very similar specs.
>
> The cassandra nodes are started with the following JVM options:
>
> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
>
> I'm using default settings for all of the tunable stuff at the bottom
> of storage-conf.xml. I also selected my initial tokens to evenly
> partition the key space when the cluster was bootstrapped. I am using
> the RandomPartitioner.
>
> Now, about the test. Basically I am trying to get an idea of just how
> fast I can make this thing go. I am writing ~250M data records into
> the cluster, replicated at 3x, using Ran Tavory's Hector client
> (Java), writing with ConsistencyLevel.ZERO and
> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
> threads talking to each of the 4 nodes in the cluster. Records are
> identified by a numeric id, and I'm writing them in batches of up to
> 10k records per row, with each record in its own column. The row key
> identifies the bucket into which records fall. So, records with ids 0
> - 9999 are written to row "0", 10000 - 19999 are written to row
> "10000", etc. Each record is a JSON object with ~10-20 fields.
>
> Records: {  // Column Family
>   0 : {  // row key for the start of the bucket. Buckets span a range
> of up to 10000 records
>     1 : "{ /* some JSON */ }",  // Column for record with id=1
>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
>     ...
>     9999 : "{ /* ... */ }"
>   },
>   10000 : {  // row key for the start of the next bucket
>     10001 : ...
>     10004 : ...
>   }
> }
>
> I am reading the data out of a local, sorted file on the client, so I
> only write a row to Cassandra once all records for that row have been
> read, and each row is written to exactly once. I'm using a
> producer-consumer queue to pump data from the input reader thread to
> the output writer threads. I found that I have to throttle the reader
> thread heavily in order to get good behavior. So, if I make the reader
> sleep for 7 seconds every 1M records, everything is fine - the data
> loads in about an hour, half of which is spent by the reader thread
> sleeping. In between the sleeps, I see ~40-50 MB/s throughput on the
> client's network interface while the reader is not sleeping, and it
> takes ~7-8 seconds to write each batch of 1M records.
>
> Now, if I remove the 7 second sleeps on the client side, things get
> bad after the first ~8M records are written by the client. Write
> throughput drops to <5 MB/s. I start seeing messages about nodes
> disconnecting and reconnecting in Cassandra's system.log, as well as
> lots of GC messages:
>
> ...
>  INFO [Timer-1] 2010-04-06 04:03:27,178 Gossiper.java (line 179)
> InetAddress /10.15.38.88 is now dead.
>  INFO [GC inspection] 2010-04-06 04:03:30,259 GCInspector.java (line
> 110) GC for ConcurrentMarkSweep: 2989 ms, 55326320 reclaimed leaving
> 1035998648 used; max is 1211170816
>  INFO [GC inspection] 2010-04-06 04:03:41,838 GCInspector.java (line
> 110) GC for ConcurrentMarkSweep: 3004 ms, 24377240 reclaimed leaving
> 1066120952 used; max is 1211170816
>  INFO [Timer-1] 2010-04-06 04:03:44,136 Gossiper.java (line 179)
> InetAddress /

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Ilya Maykov
I'm running the nodes with a JVM heap size of 6GB, and here are the
related options from my storage-conf.xml. As mentioned in the first
email, I left everything at the default value. I briefly googled
around for "Cassandra performance tuning" etc but haven't found a
definitive guide ... any help with tuning these parameters is greatly
appreciated!

  <DiskAccessMode>auto</DiskAccessMode>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>64</ConcurrentWrites>
  <CommitLogSync>periodic</CommitLogSync>
  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
  <GCGraceSeconds>864000</GCGraceSeconds>

-- Ilya

On Mon, Apr 5, 2010 at 11:26 PM, Boris Shulman  wrote:
> You are running out of memory on your nodes. Before the final crash
> your nodes are probably slow  due to GC. What is your memtable size?
> What cache options did you configure?
>
> On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov  wrote:
>> Hi all,
>>
>> I've just started experimenting with Cassandra to get a feel for the
>> system. I've set up a test cluster and to get a ballpark idea of its
>> performance I wrote a simple tool to load some toy data into the
>> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
>> writes from a single client. I'm trying to figure out if this is a
>> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
>> or if this is intended behavior. Sorry this email is kind of long,
>> here is the TLDR version:
>>
>> While writing to Cassandra from a single node, I am able to get the
>> cluster into a bad state, where nodes are randomly disconnecting from
>> each other, write performance plummets, and sometimes nodes even
>> crash. Further, the nodes do not recover as long as the writes
>> continue (even at a much lower rate), and sometimes do not recover at
>> all unless I restart them. I can get this to happen simply by throwing
>> data at the cluster fast enough, and I'm wondering if this is a known
>> issue or if I need to tweak my setup.
>>
>> Now, the details.
>>
>> First, a little bit about the setup:
>>
>> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
>> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
>> in. Node specs:
>> 8-core Intel Xeon e5...@2.00ghz
>> 8GB RAM
>> 1Gbit ethernet
>> Red Hat Linux 2.6.18
>> JVM 1.6.0_19 64-bit
>> 1TB spinning disk houses both commitlog and data directories (which I
>> know is not ideal).
>> The client machine is on the same local network and has very similar specs.
>>
>> The cassandra nodes are started with the following JVM options:
>>
>> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
>> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
>>
>> I'm using default settings for all of the tunable stuff at the bottom
>> of storage-conf.xml. I also selected my initial tokens to evenly
>> partition the key space when the cluster was bootstrapped. I am using
>> the RandomPartitioner.
>>
>> Now, about the test. Basically I am trying to get an idea of just how
>> fast I can make this thing go. I am writing ~250M data records into
>> the cluster, replicated at 3x, using Ran Tavory's Hector client
>> (Java), writing with ConsistencyLevel.ZERO and
>> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
>> threads talking to each of the 4 nodes in the cluster. Records are
>> identified by a numeric id, and I'm writing them in batches of up to
>> 10k records per row, with each record in its own column. The row key
>> identifies the bucket into which records fall. So, records with ids 0
>> - 9999 are written to row "0", 10000 - 19999 are written to row
>> "10000", etc. Each record is a JSON object with ~10-20 fields.
>>
>> Records: {  // Column Family
>>   0 : {  // row key for the start of the bucket. Buckets span a range
>> of up to 10000 records
>>     1 : "{ /* some JSON */ }",  // Column for record with id=1
>>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
>>     ...
>>     9999 : "{ /* ... */ }"
>>   },
>>   10000 : {  // row key for the start of the next bucket
>>     10001 : ...
>>     10004 : ...
>>   }
>> }
>>
>> I am reading the data out of a local, sorted file on the client, so I
>> only write a row to Cassandra once all records for that row have been
>> read, and each row is written to exactly once. I'm using a
>> producer-consumer queue to pump data from the input reader thread to
>> the output writer threads. I found that I have to throttle the reader
>> thread heavily in order to get good behavior. So, if I make the reader
>> sleep for 7 seconds every 1M records, everything is fine - the data
>> loads in about an hour, half of which is spent by the reader thread
>> sleeping. In between the sleeps, I see ~40-50 MB/s throughput on the
>> client's network interface while the reader is not sleeping, and it
>> takes ~7-8 seconds to write each batch of 1M records.
>>
>> Now, if I remove the 7 second sleeps on the client side, things get
>> bad after the first ~8M records are written by the client. Write
>> throughput drops to <5 MB/s. I start seeing messages about nodes
>> disconnecting and reconnecting in Cassandra'

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Ran Tavory
Do you see one of the disks used by Cassandra fill up when a node crashes?

On Tue, Apr 6, 2010 at 9:39 AM, Ilya Maykov  wrote:

> I'm running the nodes with a JVM heap size of 6GB, and here are the
> related options from my storage-conf.xml. As mentioned in the first
> email, I left everything at the default value. I briefly googled
> around for "Cassandra performance tuning" etc but haven't found a
> definitive guide ... any help with tuning these parameters is greatly
> appreciated!
>
>  <DiskAccessMode>auto</DiskAccessMode>
>  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>  <MemtableThroughputInMB>64</MemtableThroughputInMB>
>  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>  <ConcurrentReads>8</ConcurrentReads>
>  <ConcurrentWrites>64</ConcurrentWrites>
>  <CommitLogSync>periodic</CommitLogSync>
>  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>  <GCGraceSeconds>864000</GCGraceSeconds>
>
> -- Ilya
>
> On Mon, Apr 5, 2010 at 11:26 PM, Boris Shulman  wrote:
> > You are running out of memory on your nodes. Before the final crash
> > your nodes are probably slow  due to GC. What is your memtable size?
> > What cache options did you configure?
> >
> > On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov  wrote:
> >> Hi all,
> >>
> >> I've just started experimenting with Cassandra to get a feel for the
> >> system. I've set up a test cluster and to get a ballpark idea of its
> >> performance I wrote a simple tool to load some toy data into the
> >> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
> >> writes from a single client. I'm trying to figure out if this is a
> >> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
> >> or if this is intended behavior. Sorry this email is kind of long,
> >> here is the TLDR version:
> >>
> >> While writing to Cassandra from a single node, I am able to get the
> >> cluster into a bad state, where nodes are randomly disconnecting from
> >> each other, write performance plummets, and sometimes nodes even
> >> crash. Further, the nodes do not recover as long as the writes
> >> continue (even at a much lower rate), and sometimes do not recover at
> >> all unless I restart them. I can get this to happen simply by throwing
> >> data at the cluster fast enough, and I'm wondering if this is a known
> >> issue or if I need to tweak my setup.
> >>
> >> Now, the details.
> >>
> >> First, a little bit about the setup:
> >>
> >> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
> >> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
> >> in. Node specs:
> >> 8-core Intel Xeon e5...@2.00ghz
> >> 8GB RAM
> >> 1Gbit ethernet
> >> Red Hat Linux 2.6.18
> >> JVM 1.6.0_19 64-bit
> >> 1TB spinning disk houses both commitlog and data directories (which I
> >> know is not ideal).
> >> The client machine is on the same local network and has very similar
> specs.
> >>
> >> The cassandra nodes are started with the following JVM options:
> >>
> >> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
> >> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
> >>
> >> I'm using default settings for all of the tunable stuff at the bottom
> >> of storage-conf.xml. I also selected my initial tokens to evenly
> >> partition the key space when the cluster was bootstrapped. I am using
> >> the RandomPartitioner.
> >>
> >> Now, about the test. Basically I am trying to get an idea of just how
> >> fast I can make this thing go. I am writing ~250M data records into
> >> the cluster, replicated at 3x, using Ran Tavory's Hector client
> >> (Java), writing with ConsistencyLevel.ZERO and
> >> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
> >> threads talking to each of the 4 nodes in the cluster. Records are
> >> identified by a numeric id, and I'm writing them in batches of up to
> >> 10k records per row, with each record in its own column. The row key
> >> identifies the bucket into which records fall. So, records with ids 0
> >> - 9999 are written to row "0", 10000 - 19999 are written to row
> >> "10000", etc. Each record is a JSON object with ~10-20 fields.
> >>
> >> Records: {  // Column Family
> >>   0 : {  // row key for the start of the bucket. Buckets span a range
> >> of up to 10000 records
> >>     1 : "{ /* some JSON */ }",  // Column for record with id=1
> >>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
> >>     ...
> >>     9999 : "{ /* ... */ }"
> >>   },
> >>   10000 : {  // row key for the start of the next bucket
> >>     10001 : ...
> >>     10004 : ...
> >>   }
> >> }
> >>
> >> I am reading the data out of a local, sorted file on the client, so I
> >> only write a row to Cassandra once all records for that row have been
> >> read, and each row is written to exactly once. I'm using a
> >> producer-consumer queue to pump data from the input reader thread to
> >> the output writer threads. I found that I have to throttle the reader
> >> thread heavily in order to get good behavior. So, if I make the reader
> >> sleep for 7 seconds every 1M records, everything is fine - the data
> >> loads in about an hour, half of which is spent by the reader thread
> >> sleeping. In between the sleeps, I see ~40-50 MB/s throughput on the
> >> client's network interface while the reader

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Ilya Maykov
No, the disks on all nodes have about 750GB free space. Also as
mentioned in my follow-up email, writing with ConsistencyLevel.ALL
makes the slowdowns / crashes go away.

-- Ilya

On Mon, Apr 5, 2010 at 11:46 PM, Ran Tavory  wrote:
> Do you see one of the disks used by cassandra filled up when a node crashes?
>
> On Tue, Apr 6, 2010 at 9:39 AM, Ilya Maykov  wrote:
>>
>> I'm running the nodes with a JVM heap size of 6GB, and here are the
>> related options from my storage-conf.xml. As mentioned in the first
>> email, I left everything at the default value. I briefly googled
>> around for "Cassandra performance tuning" etc but haven't found a
>> definitive guide ... any help with tuning these parameters is greatly
>> appreciated!
>>
>>  <DiskAccessMode>auto</DiskAccessMode>
>>  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>>  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>>  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>>  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>>  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>>  <MemtableThroughputInMB>64</MemtableThroughputInMB>
>>  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>>  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>>  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>>  <ConcurrentReads>8</ConcurrentReads>
>>  <ConcurrentWrites>64</ConcurrentWrites>
>>  <CommitLogSync>periodic</CommitLogSync>
>>  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>
>> -- Ilya
>>
>> On Mon, Apr 5, 2010 at 11:26 PM, Boris Shulman  wrote:
>> > You are running out of memory on your nodes. Before the final crash
>> > your nodes are probably slow  due to GC. What is your memtable size?
>> > What cache options did you configure?
>> >
>> > On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov  wrote:
>> >> Hi all,
>> >>
>> >> I've just started experimenting with Cassandra to get a feel for the
>> >> system. I've set up a test cluster and to get a ballpark idea of its
>> >> performance I wrote a simple tool to load some toy data into the
>> >> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
>> >> writes from a single client. I'm trying to figure out if this is a
>> >> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
>> >> or if this is intended behavior. Sorry this email is kind of long,
>> >> here is the TLDR version:
>> >>
>> >> While writing to Cassandra from a single node, I am able to get the
>> >> cluster into a bad state, where nodes are randomly disconnecting from
>> >> each other, write performance plummets, and sometimes nodes even
>> >> crash. Further, the nodes do not recover as long as the writes
>> >> continue (even at a much lower rate), and sometimes do not recover at
>> >> all unless I restart them. I can get this to happen simply by throwing
>> >> data at the cluster fast enough, and I'm wondering if this is a known
>> >> issue or if I need to tweak my setup.
>> >>
>> >> Now, the details.
>> >>
>> >> First, a little bit about the setup:
>> >>
>> >> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
>> >> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
>> >> in. Node specs:
>> >> 8-core Intel Xeon e5...@2.00ghz
>> >> 8GB RAM
>> >> 1Gbit ethernet
>> >> Red Hat Linux 2.6.18
>> >> JVM 1.6.0_19 64-bit
>> >> 1TB spinning disk houses both commitlog and data directories (which I
>> >> know is not ideal).
>> >> The client machine is on the same local network and has very similar
>> >> specs.
>> >>
>> >> The cassandra nodes are started with the following JVM options:
>> >>
>> >> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
>> >> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
>> >>
>> >> I'm using default settings for all of the tunable stuff at the bottom
>> >> of storage-conf.xml. I also selected my initial tokens to evenly
>> >> partition the key space when the cluster was bootstrapped. I am using
>> >> the RandomPartitioner.
>> >>
>> >> Now, about the test. Basically I am trying to get an idea of just how
>> >> fast I can make this thing go. I am writing ~250M data records into
>> >> the cluster, replicated at 3x, using Ran Tavory's Hector client
>> >> (Java), writing with ConsistencyLevel.ZERO and
>> >> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
>> >> threads talking to each of the 4 nodes in the cluster. Records are
>> >> identified by a numeric id, and I'm writing them in batches of up to
>> >> 10k records per row, with each record in its own column. The row key
>> >> identifies the bucket into which records fall. So, records with ids 0
>> >> - 9999 are written to row "0", 10000 - 19999 are written to row
>> >> "10000", etc. Each record is a JSON object with ~10-20 fields.
>> >>
>> >> Records: {  // Column Family
>> >>   0 : {  // row key for the start of the bucket. Buckets span a range
>> >> of up to 10000 records
>> >>     1 : "{ /* some JSON */ }",  // Column for record with id=1
>> >>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
>> >>     ...
>> >>     9999 : "{ /* ... */ }"
>> >>   },
>> >>   10000 : {  // row key for the start of the next bucket
>> >>     10001 : ...
>> >>     10004 : ...
>> >>   }
>> >> }
>> >>
>> >> I am reading the data out of a local, sorted file on the client, so I
>> >> only write a row to Cassandra once all records for that row have been
>> >> read, and each row is written to exactly once. I'm using a
>> >> producer-consumer queue to pump data from the input reader thread to
>> >> the output writer threads. I found that I have to throttle the r

Re: Overwhelming a cluster with writes?

2010-04-05 Thread Benjamin Black
You are blowing away the mostly saner JVM_OPTS by running it that way.
Edit cassandra.in.sh (or wherever the config is on your system) to
increase -Xmx to 4G (not 6G, for now), leave everything else
untouched, and do not specify JVM_OPTS on the command line.  See if you
get the same behavior.
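The stock cassandra.in.sh already carries sane GC flags; if memory serves
(so check your copy), the only line that needs to change is the heap
ceiling:

    -        -Xmx1G \
    +        -Xmx4G \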


b

On Mon, Apr 5, 2010 at 11:48 PM, Ilya Maykov  wrote:
> No, the disks on all nodes have about 750GB free space. Also as
> mentioned in my follow-up email, writing with ConsistencyLevel.ALL
> makes the slowdowns / crashes go away.
>
> -- Ilya
>
> On Mon, Apr 5, 2010 at 11:46 PM, Ran Tavory  wrote:
>> Do you see one of the disks used by cassandra filled up when a node crashes?
>>
>> On Tue, Apr 6, 2010 at 9:39 AM, Ilya Maykov  wrote:
>>>
>>> I'm running the nodes with a JVM heap size of 6GB, and here are the
>>> related options from my storage-conf.xml. As mentioned in the first
>>> email, I left everything at the default value. I briefly googled
>>> around for "Cassandra performance tuning" etc but haven't found a
>>> definitive guide ... any help with tuning these parameters is greatly
>>> appreciated!
>>>
>>>  <DiskAccessMode>auto</DiskAccessMode>
>>>  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>>>  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>>>  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>>>  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>>>  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>>>  <MemtableThroughputInMB>64</MemtableThroughputInMB>
>>>  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>>>  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>>>  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>>>  <ConcurrentReads>8</ConcurrentReads>
>>>  <ConcurrentWrites>64</ConcurrentWrites>
>>>  <CommitLogSync>periodic</CommitLogSync>
>>>  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>
>>> -- Ilya
>>>
>>> On Mon, Apr 5, 2010 at 11:26 PM, Boris Shulman  wrote:
>>> > You are running out of memory on your nodes. Before the final crash
>>> > your nodes are probably slow  due to GC. What is your memtable size?
>>> > What cache options did you configure?
>>> >
>>> > On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov  wrote:
>>> >> Hi all,
>>> >>
>>> >> I've just started experimenting with Cassandra to get a feel for the
>>> >> system. I've set up a test cluster and to get a ballpark idea of its
>>> >> performance I wrote a simple tool to load some toy data into the
>>> >> system. Surprisingly, I am able to "overwhelm" my 4-node cluster with
>>> >> writes from a single client. I'm trying to figure out if this is a
>>> >> problem with my setup, if I'm hitting bugs in the Cassandra codebase,
>>> >> or if this is intended behavior. Sorry this email is kind of long,
>>> >> here is the TLDR version:
>>> >>
>>> >> While writing to Cassandra from a single node, I am able to get the
>>> >> cluster into a bad state, where nodes are randomly disconnecting from
>>> >> each other, write performance plummets, and sometimes nodes even
>>> >> crash. Further, the nodes do not recover as long as the writes
>>> >> continue (even at a much lower rate), and sometimes do not recover at
>>> >> all unless I restart them. I can get this to happen simply by throwing
>>> >> data at the cluster fast enough, and I'm wondering if this is a known
>>> >> issue or if I need to tweak my setup.
>>> >>
>>> >> Now, the details.
>>> >>
>>> >> First, a little bit about the setup:
>>> >>
>>> >> 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with
>>> >> the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched
>>> >> in. Node specs:
>>> >> 8-core Intel Xeon e5...@2.00ghz
>>> >> 8GB RAM
>>> >> 1Gbit ethernet
>>> >> Red Hat Linux 2.6.18
>>> >> JVM 1.6.0_19 64-bit
>>> >> 1TB spinning disk houses both commitlog and data directories (which I
>>> >> know is not ideal).
>>> >> The client machine is on the same local network and has very similar
>>> >> specs.
>>> >>
>>> >> The cassandra nodes are started with the following JVM options:
>>> >>
>>> >> ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64
>>> >> -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC"
>>> >>
>>> >> I'm using default settings for all of the tunable stuff at the bottom
>>> >> of storage-conf.xml. I also selected my initial tokens to evenly
>>> >> partition the key space when the cluster was bootstrapped. I am using
>>> >> the RandomPartitioner.
>>> >>
>>> >> Now, about the test. Basically I am trying to get an idea of just how
>>> >> fast I can make this thing go. I am writing ~250M data records into
>>> >> the cluster, replicated at 3x, using Ran Tavory's Hector client
>>> >> (Java), writing with ConsistencyLevel.ZERO and
>>> >> FailoverPolicy.FAIL_FAST. The client is using 32 threads with 8
>>> >> threads talking to each of the 4 nodes in the cluster. Records are
>>> >> identified by a numeric id, and I'm writing them in batches of up to
>>> >> 10k records per row, with each record in its own column. The row key
>>> >> identifies the bucket into which records fall. So, records with ids 0
>>> >> - 9999 are written to row "0", 10000 - 19999 are written to row
>>> >> "10000", etc. Each record is a JSON object with ~10-20 fields.
>>> >>
>>> >> Records: {  // Column Family
>>> >>   0 : {  // row key for the start of the bucket. Buckets span a range
>>> >> of up to 10000 records
>>> >>     1 : "{ /* some JSON */ }",  // Column for record with id=1
>>> >>     3 : "{ /* some more JSON */ }",  // Column for record with id=3
>>> >>     ...
>>> >>     9999 : "{ /* ... */ }"
>>> >>   },
>>> >>   10000 : {

Re: Memcached protocol?

2010-04-05 Thread Tatu Saloranta
On Mon, Apr 5, 2010 at 5:10 PM, Paul Prescod  wrote:
> On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta  wrote:
>> ...
>>
>> I would think that there is also possibility of losing some
>> increments, or perhaps getting duplicate increments?
>
> I believe that with vector clocks in Cassandra 0.7 you won't lose
> anything. The conflict resolver will do the summation for you
> properly.
>
> If I'm wrong, I'd love to hear more, though.

I think the key is that this is not automatic -- there is no general
mechanism for aggregating distinct modifications. Point being that
vector clocks let you choose one among the competing answers, but not
decide what to do with concurrent modifications. So what is done
instead is to have an application-specific resolution strategy which
makes use of the semantics of the operations, to know how to combine
such concurrent modifications into the "correct" answer. I don't know
that this is trivial for the case of counter increments, especially
since two concurrent increments give the same new value, yet the
correct combined result would be one higher (both used the same base
and added one).
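A toy illustration of why the values alone aren't enough (hypothetical
numbers):

    int base = 12;
    int a = base + 1;                    // replica A writes 13
    int b = base + 1;                    // replica B writes 13
    int lastWriteWins = Math.max(a, b);  // 13: one increment silently lost
    int opBased = base + 1 + 1;          // 14: what replaying both ops gives

So the resolver has to see the two +1 operations (or per-replica
partial counts), not just the resulting values.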

That is to say, my understanding was that vector clocks would be
required but not sufficient for reconciliation of concurrent value
updates.

I may be off here; apologies if I have misunderstood some crucial piece.

-+ Tatu +-