Re: question about deleting from cassandra

2010-03-18 Thread Sylvain Lebresne
Hi, I modified the patch to work against the current 0.6 svn branch (as I needed it myself). I attached the files to JIRA in case someone wants to play with it. Should I remove the old files, as they only worked against an old, random svn trunk? -- Sylvain On Mon, Mar 15, 2010 at 6:01 PM,

Re: nodetool-compact duplicated data files again and again

2010-03-18 Thread Sylvain Lebresne
I believe this is caused by two things (and sorry if I go into too much detail): 1) there is http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives. That is, Cassandra has to wait GCGraceSeconds before physically removing deleted columns, and by default this is 10 days. For "normal" colu
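A simplified sketch of the timing rule described above, assuming only what the message states: a deleted column can be physically dropped by compaction only after GCGraceSeconds (10 days by default) have passed since the delete. `tombstone_purgeable` is a hypothetical name, and this models the rule only, not Cassandra's compaction code.

```python
from datetime import datetime, timedelta, timezone

# Default GCGraceSeconds as stated in the message: 10 days, in seconds.
GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 864000


def tombstone_purgeable(deleted_at: datetime, compacted_at: datetime,
                        gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """Simplified model: may a compaction at `compacted_at` physically drop
    a column deleted at `deleted_at`? Only once more than GCGraceSeconds
    have elapsed since the delete."""
    return compacted_at > deleted_at + timedelta(seconds=gc_grace_seconds)


if __name__ == "__main__":
    deleted = datetime(2010, 3, 10, tzinfo=timezone.utc)
    for day in (18, 21):
        compaction = datetime(2010, 3, day, tzinfo=timezone.utc)
        print(compaction.date(), tombstone_purgeable(deleted, compaction))
```

Under this model a column deleted on 2010-03-10 is still on disk after a compaction on 2010-03-18, and only becomes eligible for removal in compactions after 2010-03-20.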

Re: Storing large blobs

2010-03-18 Thread Ted Zlatanov
On Wed, 17 Mar 2010 22:42:13 -0400 Carlos Sanchez wrote: CS> We could have blobs as large as 50MB compressed (XML compresses quite CS> well). Typical documents we would deal with would be between 500K CS> and 3MB. When just starting to use Cassandra I had serious issues with 0.5 and blobs (comp

Re: question about deleting from cassandra

2010-03-18 Thread Bill Au
That is very true from the users' point of view, especially since their data is being stored for free. But I am looking at it from the service providers' point of view. Maybe that's why NoSQL solutions are so popular right now: they scale much better than RDBMSs. I wonder if service provider

Re: Model to store biggest score

2010-03-18 Thread Erik Holstad
Another approach you can take is to add the userid to the score, like => (column=140_uid2, value=[], timestamp=1268841641979), and if you need the scores sorted by time you can add => (column=140_268841641979_uid2, value=[], timestamp=1268841641979). But I do think that in any case you need to remove the
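These composite names only work if the column family sorts names as strings. A minimal sketch, assuming a byte-wise or UTF-8 comparator (the thread does not say which): zero-padding the score to a fixed width keeps lexicographic order equal to numeric order, so the highest score is the last column. The 10-digit width and the helper names are hypothetical choices for illustration.

```python
# Assumption: the column family's comparator orders names byte-wise / as UTF-8
# strings, so numbers must be zero-padded to sort numerically.
SCORE_WIDTH = 10  # hypothetical fixed width, enough for scores below 10**10
TS_WIDTH = 13     # millisecond timestamps such as 1268841641979 have 13 digits


def score_column(score: int, user_id: str) -> str:
    """Name like '0000000140_uid2' (the '140_uid2' idea, zero-padded)."""
    return f"{score:0{SCORE_WIDTH}d}_{user_id}"


def score_time_column(score: int, ts_millis: int, user_id: str) -> str:
    """Name ordered by score, then by time, then by user id."""
    return f"{score:0{SCORE_WIDTH}d}_{ts_millis:0{TS_WIDTH}d}_{user_id}"


def best(names: list) -> tuple:
    """(score, user_id) of the highest-sorting score_column name."""
    top = max(names)  # the last column in comparator order
    score, user_id = top.split("_", 1)
    return int(score), user_id


if __name__ == "__main__":
    names = [score_column(140, "uid2"), score_column(95, "uid7"), score_column(1000, "uid1")]
    print(sorted(names))
    print(best(names))
```

Without padding, "1000_uid1" would sort before "95_uid7". Changing a user's score means writing a new name; the old one stays in the row unless it is explicitly removed.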

Re: question about deleting from cassandra

2010-03-18 Thread Vick Khera
On Thu, Mar 18, 2010 at 9:15 AM, Bill Au wrote: > In theory there is a breaking point somewhere, right? I don't think Google has hit it yet, so I'd have to say nobody has reached "the breaking point" yet. What do the big places do when people quit the service? I.e., if I close my Facebook or t

renaming a SuperColumn

2010-03-18 Thread Ted Zlatanov
I find it useful in one particular schema to have SuperColumns with specific names and rename them sometimes. Rather than doing it client-side (read, write, delete), it would be nice if there were a SuperColumnRename Mutation that encapsulated that sequence on the server side and perhaps implemented it more

Re: renaming a SuperColumn

2010-03-18 Thread Jonathan Ellis
-1 on adding a special case for this. 2010/3/18 Ted Zlatanov : > I find it useful in one particular schema to have SuperColumns with > specific names and rename them sometimes.  Rather than client-side > (read, write, delete) it would be nice if there was a SuperColumnRename > Mutation than encaps

Re: renaming a SuperColumn

2010-03-18 Thread Vijay
+1 for renaming the S Column/Column as an atomic operation :) Regards, On Thu, Mar 18, 2010 at 9:50 AM, Jonathan Ellis wrote: > -1 on adding a special case for this. > > 2010/3/18 Ted Zlatanov : > > I find it useful in one particular schema to have SuperColumns with > > specific names and ren

Re: renaming a SuperColumn

2010-03-18 Thread Ted Zlatanov
On Thu, 18 Mar 2010 11:50:53 -0500 Jonathan Ellis wrote: JE> 2010/3/18 Ted Zlatanov : >> I find it useful in one particular schema to have SuperColumns with >> specific names and rename them sometimes.  Rather than client-side >> (read, write, delete) it would be nice if there was a SuperColumnR

write performance thrift interfaces

2010-03-18 Thread Martin Probst (RobHost Support)
Hi, we've tested the write performance on a single-node and a dual-node cluster, and the results are strangely poor. We get about 30 inserts per second, which seems a little slow. The strange thing is that the nodes we used (single CPU, 3 GB RAM, single disk) had a load of 0.02-0.05 while th

Re: write performance thrift interfaces

2010-03-18 Thread Roger Schildmeijer
Yes, 30 writes/s sounds a little poor. Maybe you could show your benchmark code? And what adjustments had to be made to the CF? // Roger On 18 Mar 2010, at 19:03, Martin Probst (RobHost Support) wrote: > Hi, > > we've tested the write performance on a single and dual node cluster an

Re: write performance thrift interfaces

2010-03-18 Thread Jonathan Ellis
Perhaps you're only inserting with a single thread? On Thu, Mar 18, 2010 at 1:03 PM, Martin Probst (RobHost Support) wrote: > Hi, > > we've tested the write performance on a single and dual node cluster and the > results are strangely poor. We've got about 30 inserts per second which seems > a
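The single-thread question points at client-side latency: if each blocking insert takes roughly 33 ms end to end, one thread can only manage about 30 inserts per second however idle the server is, which would be consistent with the low load reported above. The sketch below models that with a sleep in place of a real Thrift call (`fake_insert` and the 33 ms figure are assumptions, not measurements) and issues the same inserts from a thread pool. With real Thrift clients, each thread would normally need its own connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: ~33 ms per blocking insert, i.e. ~30 inserts/s from one thread.
ROUND_TRIP_SECONDS = 0.033


def fake_insert(i: int) -> int:
    """Hypothetical stand-in for one blocking insert; the sleep models round-trip latency."""
    time.sleep(ROUND_TRIP_SECONDS)
    return i


def inserts_per_second(n_inserts: int, n_threads: int) -> float:
    """Run n_inserts fake inserts on n_threads worker threads and return the achieved rate."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(fake_insert, range(n_inserts)))
    return n_inserts / (time.perf_counter() - start)


if __name__ == "__main__":
    for threads in (1, 10):
        print(f"{threads:>2} threads: {inserts_per_second(60, threads):6.1f} inserts/s")
```

Later in this thread Martin reports going from about 30 to about 240 inserts per second after rewriting his client to use threads.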

Re: write performance thrift interfaces

2010-03-18 Thread Tom Chen
Hi Martin, Are you using a connection pool? I have been able to get 1000+ inserts with Java code on one Cassandra node with small values (100 bytes). Tom On Thu, Mar 18, 2010 at 11:08 AM, Roger Schildmeijer wrote: > Yes, 30 writes / s sounds a little bit poor. > > Maybe you could show

Re: write performance thrift interfaces

2010-03-18 Thread Martin Probst (RobHost Support)
Hi Roger, we've only adjusted the names of the keyspaces and the column families. This is the second Perl benchmark script, which switches the node after 100 records: #!/usr/bin/perl use strict; use warnings; use Data::Dumper qw( Dumper ); use Net::Cassandra; my $host1 = "localhost"; my $host

Re: write performance thrift interfaces

2010-03-18 Thread Martin Probst (RobHost Support)
What did you mean by that: are there some config adjustments, or did you mean the inserting client? Martin On 18.03.2010 at 19:18, Jonathan Ellis wrote: > Perhaps you're only inserting with a single thread? > > On Thu, Mar 18, 2010 at 1:03 PM, Martin Probst (RobHost Support) > wrote: >> Hi, >>

Re: write performance thrift interfaces

2010-03-18 Thread Martin Probst (RobHost Support)
Hi Tom, no, we're not using a connection pool, only plain Java from the command line. Cheers, Martin On 18.03.2010 at 19:18, Tom Chen wrote: > Hi Martin, > > Are you using a connection pool? I have been able to get about a 1000+ > inserts with java code on one cassandra node with small values(100 bytes). >

Re: write performance thrift interfaces

2010-03-18 Thread Brandon Williams
On Thu, Mar 18, 2010 at 1:22 PM, Martin Probst (RobHost Support) <supp...@robhost.de> wrote: > Hi Tom, > > no we're not using a connection pool, only pure java on cmd. > > Cheers, > Martin > > The second graph here is relevant: http://spyced.blogspot.com/2010/01/cassandra-05.html Rather than cre
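The reply above is cut off, so this does not reconstruct it. As a related illustration of the connection-pool question earlier in the thread, here is a minimal pool built on Python's `queue.Queue` that opens a fixed set of connections once and hands them back out for each insert, instead of opening a new connection per insert. `FakeConnection` is a hypothetical placeholder for a real Thrift transport and client; it only counts how many connections get opened.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical placeholder for a real Thrift transport + client.
    It only records how many connections have been opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self, host: str):
        with FakeConnection._lock:
            FakeConnection.opened += 1
        self.host = host

    def insert(self, key: str, value: bytes) -> None:
        pass  # a real client would send the write to the node here


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused."""

    def __init__(self, host: str, size: int):
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection(host))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until some connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it
```

Each worker thread wraps its inserts in `with pool.connection() as conn:`. A connection is never shared by two threads at once, because only the thread holding it between `get()` and `put()` can use it.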

Re: renaming a SuperColumn

2010-03-18 Thread Sylvain Lebresne
2010/3/18 Ted Zlatanov : > On Thu, 18 Mar 2010 11:50:53 -0500 Jonathan Ellis wrote: > > JE> 2010/3/18 Ted Zlatanov : >>> I find it useful in one particular schema to have SuperColumns with >>> specific names and rename them sometimes.  Rather than client-side >>> (read, write, delete) it would be

Re: renaming a SuperColumn

2010-03-18 Thread Ted Zlatanov
On Thu, 18 Mar 2010 19:26:06 +0100 Sylvain Lebresne wrote: SL> Given how Cassandra works, I don't think that the server can do much SL> better than the read, write, delete your client already does SL> (basically everything is immutable, you only 'add' new versions). As SL> this cannot be done effi
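To make the client-side read, write, delete sequence discussed in this thread concrete, here is a sketch against a plain in-memory dictionary standing in for one row (the `Row` layout and `rename_super_column` are hypothetical, not a Cassandra client API). It shows the three separate steps and that nothing ties them together: a reader arriving between the write and the delete would see the super column under both names.

```python
import time
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for one row:
# super column name -> {sub-column name: (value, timestamp in microseconds)}
Row = Dict[str, Dict[str, Tuple[bytes, int]]]


def now_micros() -> int:
    return int(time.time() * 1_000_000)


def rename_super_column(row: Row, old: str, new: str) -> None:
    """Client-side rename as three independent steps; nothing makes them atomic."""
    # 1. read the old super column
    subcols = row.get(old)
    if subcols is None:
        raise KeyError(old)
    # 2. write a copy under the new name, with a fresh timestamp
    ts = now_micros()
    row[new] = {name: (value, ts) for name, (value, _) in subcols.items()}
    # 3. delete the old super column (in Cassandra this would write a tombstone)
    del row[old]


if __name__ == "__main__":
    row: Row = {"draft": {"title": (b"hello", 1)}}
    rename_super_column(row, "draft", "final")
    print(sorted(row))
```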

Unsubscribe

2010-03-18 Thread John Alessi
-- John Alessi SocketLabs, Inc. 484-418-1282 On Mar 18, 2010, at 10:12 AM, Erik Holstad wrote: Another approach you can take is to add the userid to the score like, => (column=140_uid2, value=[], timestamp=1268841641979) and f you need the score time sorted you can add => (column=140_2688416419

Storing lots of data as Columns in a Column Family (ref Twissandra)

2010-03-18 Thread Muhammed Nasrullah
Hello folks, Twissandra (Twitter clone example for Cassandra) has a public page where every public update/tweet is stored in a column family under the key !public! like so: Userline = { '!public!': { # timestamp of tweet: tweet id 1267414247561777: '75

Re: Storing lots of data as Columns in a Column Family (ref Twissandra)

2010-03-18 Thread Brandon Williams
', this won't fit > in memory eventually. Is there a better way to model this? The problem is > that the data needs to be retrieved in reverse chronological order, > something which cannot be done while getting a range of keys without knowing > the start and finish keys in advanc

any cassandra consultants?

2010-03-18 Thread Ken Williams
I'm looking for a Cassandra consultant for a short-term setup project. Please email me if you have experience setting up Cassandra in a high-traffic environment. ken.willi...@meteorgames.com. Thanks!

Re: any cassandra consultants?

2010-03-18 Thread Joe Stein
/* Joe Stein http://www.linkedin.com/in/charmalloc */ On Mar 18, 2010, at 6:42 PM, Ken Williams wrote: I'm looking for a cassandra consultant for a short term setup project. Please email me if you have experience setting up cassandra in a high traffic environment. ken.willi...@meteorga

Issue with TimeUUID

2010-03-18 Thread John Alessi
I am having an issue where Cassandra doesn't seem to be able to distinguish between two different UUIDs based on the exact same time when sorting by TimeUUID. * Some of my config: * 0.01

Re: Issue with TimeUUID

2010-03-18 Thread Brandon Williams
On Thu, Mar 18, 2010 at 6:12 PM, John Alessi wrote: > I am having an issue where Cassandra doesn't seem to be able to distinguish > between 2 different UUIDs if based on the same exact time, and sorting by > TimeUUID. > *snip* > Cassandra doesn't seem to be able to distinguish between 2 differen

Re: Issue with TimeUUID

2010-03-18 Thread John Alessi
But they are different names. In my example they are: 1077e700-c7f2-11de-86d5-f5bcc793a028 1077e700-c7f2-11de-982e-6fad363d5f29 But Cassandra sees them as the same. -- John On Mar 18, 2010, at 7:17 PM, Brandon Williams wrote: On Thu, Mar 18, 2010 at 6:12 PM, John Alessi mailto:j...@socketla
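The two names quoted above are distinct version-1 UUIDs that share the same 60-bit timestamp and differ only in their clock-sequence and node fields. Below is a short check with Python's standard `uuid` module, plus two hypothetical sort keys: one that compares timestamps only (under which the names collide) and one that falls back to the raw bytes on a tie (under which they do not). The messages here do not establish which of these behaviours the TimeUUID comparator in the Cassandra version being used actually had.

```python
import uuid

# The two column names quoted in the thread.
a = uuid.UUID("1077e700-c7f2-11de-86d5-f5bcc793a028")
b = uuid.UUID("1077e700-c7f2-11de-982e-6fad363d5f29")


def timestamp_only(u: uuid.UUID) -> int:
    """Hypothetical sort key that looks only at the 60-bit version-1 timestamp."""
    return u.time


def timestamp_then_bytes(u: uuid.UUID) -> tuple:
    """Hypothetical sort key that breaks timestamp ties on the full 16 bytes."""
    return (u.time, u.bytes)


if __name__ == "__main__":
    print("same UUID?", a == b)
    print("same timestamp?", a.time == b.time)
    print("clock_seq:", hex(a.clock_seq), hex(b.clock_seq))
    print("node:", hex(a.node), hex(b.node))
    print("equal under timestamp-only key?", timestamp_only(a) == timestamp_only(b))
    print("equal under timestamp+bytes key?", timestamp_then_bytes(a) == timestamp_then_bytes(b))
```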

Re: write performance thrift interfaces

2010-03-18 Thread Martin Probst (RobHost Support)
Hi Brandon, I've recoded my client (using threads). Now I'm getting roughly 240 inserts per second (I think the bottleneck is now the virtualized hardware, i.e. a single CPU). The stress.py script gives about 50 inserts/sec. I'll test Cassandra on real hardware to see if it performs better under a

Re: Re: exception when adding new node

2010-03-18 Thread casablinca126.com
This is a bug in JDK versions older than jdk1.6.0_18; the problem was resolved after updating the JDK. File channel bug: http://bugs.sun.com/view_bug.do?bug_id=5103988 Fixed in version 6u18: http://java.sun.com/javase/6/webnotes/6u18.html -- casablinca126.com 2010-03-19 --

Re: Storing lots of data as Columns in a Column Family (ref Twissandra)

2010-03-18 Thread Eric Florenzano
> > The rows could be named and partitioned by date/time, which can be known in > advance. For example, '!public!20100318' could contain the public timeline > for that day. > Yes, I thought of doing this. Then I realized there'd be boundary cases, on the start of
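The date-partitioned layout quoted above, and the day-boundary problem it raises, can be sketched with an in-memory stand-in for the column family (`Rows`, `public_row_key` and `latest_public` are hypothetical; UTC days and microsecond column names as in the Twissandra example are assumptions). To return the newest N public tweets, the reader starts at the current day's row and walks back one day at a time until it has enough columns, so a request made just after midnight still finds the previous day's tweets.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the column family:
# row key -> {timestamp in microseconds: tweet id}
Rows = Dict[str, Dict[int, str]]


def _day_key(day: datetime) -> str:
    return "!public!" + day.strftime("%Y%m%d")


def public_row_key(ts_micros: int) -> str:
    """Row holding the public timeline for the UTC day containing ts_micros."""
    return _day_key(datetime.fromtimestamp(ts_micros / 1_000_000, tz=timezone.utc))


def latest_public(rows: Rows, now_micros: int, limit: int,
                  max_days: int = 7) -> List[Tuple[int, str]]:
    """Newest-first (timestamp, tweet id) pairs, stepping back one daily row
    at a time so results continue across midnight boundaries."""
    result: List[Tuple[int, str]] = []
    day = datetime.fromtimestamp(now_micros / 1_000_000, tz=timezone.utc)
    for _ in range(max_days):
        columns = rows.get(_day_key(day), {})
        for ts in sorted(columns, reverse=True):  # reverse chronological within a row
            if ts <= now_micros:
                result.append((ts, columns[ts]))
                if len(result) == limit:
                    return result
        day -= timedelta(days=1)
    return result
```

`max_days` bounds how far back a sparse timeline is searched; in Cassandra terms each step would be one reversed column slice on one row.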