Re: (another) Newbie question

2010-04-16 Thread Jonathan Ellis
The important distinction here is that you can slice on columns in a row, but you can't slice on column family (or keyspace) names, because the data isn't stored contiguously. The row, within the column family, is the unit of data storage and API focus. On Sat, Apr 17, 2010 at 12:42 AM, Benjamin B
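
To illustrate the distinction above, here is a minimal sketch that models the data as nested dictionaries. The keyspace contents, the column family name "Projects", the row key, and the helper slice_columns are all hypothetical and are not part of any Cassandra client API. Columns inside one row are sorted and can be sliced by range, while the column family is found only by exact name.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in: column family -> row key -> {column: value}
keyspace = {
    "Projects": {
        "20100107": {"a": 1, "b": 2, "c": 3, "d": 4},
    }
}

def slice_columns(row, start, finish):
    """Return the columns whose names fall in [start, finish], in sorted order."""
    names = sorted(row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]

# The column family and the row are looked up by exact name...
row = keyspace["Projects"]["20100107"]
# ...but the columns within that row can be sliced by range.
result = slice_columns(row, "b", "c")
print(result)
```

In this model a pattern such as 'project*' is just another dictionary key that does not exist, which is roughly why a wildcard column family name is rejected.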

Re: (another) Newbie question

2010-04-16 Thread Benjamin Black
The multi-level dictionary explanation holds. Regex on keys like that is something specific language implementations support, not something inherent in a dictionary data structure. The table model is particularly fraught because it drags in a lot of relational assumptions, none of which hold. An

Re: Any optimization strategy?

2010-04-16 Thread Ken Sandney
Take a look at the insert benchmark result, it shows almost linear increment with threads number. At last it was about 4 rows/s with 1000 threads, is this normal? [r...@localhost py_stress]# python stress.py -n 20 -t 50 -c 10 -d > 10.0.0.169,10.0.0.185 -o insert > total,interval_op_rate,in

Re: Just starting to play with Cassandra: (Surely) Dumb Question

2010-04-16 Thread Jonathan Ellis
You're supposed to request a few hundred or thousand columns per call, then, if you need more, request the next set using the start parameter. On Fri, Apr 16, 2010 at 7:13 PM, Lucas Di Pentima wrote: > Hello all, > > I'm playing with Cassandra 0.6.0-rc1 on a MacOSX, with the 'cassandra' ruby > gem
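
The paging approach described above can be sketched as follows. This is only an illustration against an in-memory row: get_slice and iter_columns are hypothetical helpers, not the Ruby gem's or Thrift's actual signatures, and the sketch assumes the start column is inclusive, so each later page re-reads the last column already seen and drops it.

```python
def get_slice(row, start, count):
    """Hypothetical stand-in for a column slice call: up to `count` columns
    with names >= start, in sorted order (start is inclusive)."""
    names = sorted(name for name in row if name >= start)
    return [(name, row[name]) for name in names[:count]]

def iter_columns(row, page_size=100):
    """Yield every column of a row by requesting one page at a time."""
    start = ""
    first_page = True
    while True:
        # Later pages ask for one extra column, because the start column
        # was already returned as the last column of the previous page.
        count = page_size if first_page else page_size + 1
        page = get_slice(row, start, count)
        if not first_page:
            page = page[1:]
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return
        start = page[-1][0]
        first_page = False

# A row with more columns than fit in a single page.
row = {"col%04d" % i: i for i in range(250)}
columns = list(iter_columns(row, page_size=100))
print(len(columns))
```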

Re: Is that possible to write a file system over Cassandra?

2010-04-16 Thread Jeff Zhang
Yes, we were in a rush at the beginning of this prototype. Now the code structure looks better. On Fri, Apr 16, 2010 at 5:46 AM, Jonathan Ellis wrote: > The strange part is copying the entire cassandra source tree. > > On Thu, Apr 15, 2010 at 8:35 PM, Jeff Zhang wrote: > > Jonathan, > > > > Prev

Re: Any optimization strategy?

2010-04-16 Thread Ken Sandney
Uh, yes, I am using a single thread. Thank you for your link, it helps. On Fri, Apr 16, 2010 at 8:50 PM, Jonathan Ellis wrote: > sounds like you are only using a single thread. > > look at the second graph on > http://spyced.blogspot.com/2010/01/cassandra-05.html > > On Fri, Apr 16, 2010 at 5:59 AM

Re: Is that possible to write a file system over Cassandra?

2010-04-16 Thread Tatu Saloranta
On Fri, Apr 16, 2010 at 4:08 AM, Mark Robson wrote: > On 15 April 2010 02:42, Zhuguo Shi wrote: >> >> Hi, >> Cassandra has a good distributed model: decentralized, auto-partition, >> auto-recovery. I am evaluating about writing a file system over Cassandra >> (like CassFS: http://github.com/jdarc

Just starting to play with Cassandra: (Surely) Dumb Question

2010-04-16 Thread Lucas Di Pentima
Hello all, I'm playing with Cassandra 0.6.0-rc1 on a MacOSX, with the 'cassandra' ruby gem. I load some test data to it and I was trying the gem's get() API when I realized that if I call it some way like this: db.get('SomeSCFName', 'SomeKey') It returned me only 100 subcolumns when 'SomeKey'

Re: (another) Newbie question

2010-04-16 Thread Jonathan Ellis
On Mon, Apr 12, 2010 at 9:48 AM, Colin Yates wrote: > I was hoping I could do a get_range_slices specifying 'project*' for the > columnFamily and a keyRange start: 20100107, end:20100109 but I > get an error 'InvalidRequestException(why:unconfigured columnfamily > project*)'. I think you've been

Re: Regarding Cassandra Scalability

2010-04-16 Thread Ryan King
That's just the data our analytics team produces (logs, etc). Production/online data is separate. -ryan On Fri, Apr 16, 2010 at 2:22 PM, Stu Hood wrote: > http://twitter.com/jromeh/status/12295736793 > > -Original Message- > From: "Mike Gallamore" > Sent: Friday, April 16, 2010 3:46pm >

Maintaining secondary indices on mutable columns

2010-04-16 Thread David King
I'm having trouble visualising how to maintain a secondary index on a mutable column. For instance, given some objects and the number that we have in inventory: widget1: { count = 0, colour = blue } widget2: { count = 5, colour = red } widget3: { count = 8, colour = green } widget4: { count = 8,

Re: Regarding Cassandra Scalability

2010-04-16 Thread Peter Chang
The redundancy/denormalization takes advantage of cheap writes to make reads really quick. Imagine a query that returns one row with your whole tweet stream vs having to do 50 separate lookups per tweet. Space is cheap and the upside is performance. Especially if you're getting a lot of fail wha

Re: Regarding Cassandra Scalability

2010-04-16 Thread Mike Gallamore
Does that include HD copies of CNN et al reading tweets to people on T.V.? You know your medium is doomed when you're reduced to reading comments from random_dude64 and omg69 because they get the news out faster than you can. They must be tracking a lot more than just the tweets themselves (wh

Re: Regarding Cassandra Scalability

2010-04-16 Thread Stu Hood
http://twitter.com/jromeh/status/12295736793 -Original Message- From: "Mike Gallamore" Sent: Friday, April 16, 2010 3:46pm To: user@cassandra.apache.org Subject: Re: Regarding Cassandra Scalability Also people with 1M followers tend to have "public" tweets, which means really I think it

Re: Regarding Cassandra Scalability

2010-04-16 Thread Mike Gallamore
Also people with 1M followers tend to have "public" tweets, which means really I think it would be the same as subscribing to a RSS feed or whatever. You aren't getting a local copy because you will "always" have access to the tweet as will everyone else. Also tweets don't change AFAIK so no po

Re: Regarding Cassandra Scalability

2010-04-16 Thread Peter Chang
Yeah. I wasn't sure if Cassandra was optimized for binary data especially since any site of that size will use a CDN. Interesting read though. I think 1K per tweet is off by an order of magnitude considering they only allow 140 characters. Regardless the number of users with > 1MM is probably a ha

Re: Cassandra Java Client

2010-04-16 Thread Weijun Li
I'm using spymemcached and it works great! Easy to use, supports sharding and compression, and can handle high-volume traffic. http://code.google.com/p/spymemcached/ -Weijun On Fri, Apr 16, 2010 at 3:29 AM, Linton N wrote: > import java.util.List; > import java.io.UnsupportedEncodingException; >

Re: effective modeling for fixed limit columns

2010-04-16 Thread Mike Gallamore
The problem I'm working on is very similar to this. I'm working on a reputation system and we keep a fixed number of day buckets for the scores. So when new data comes in you need to find out what bucket is supposed to be used, remove the data in it if you've moved to a new bucket as the data t
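
The bucket rotation described above might look roughly like the sketch below. It keeps everything in a plain dictionary rather than in Cassandra, and the window size NUM_BUCKETS, the integer day numbers, and the helper names are assumptions made for illustration; the original message does not specify them.

```python
NUM_BUCKETS = 7  # assumed window size; the original message does not give one

def add_score(buckets, day, score):
    """Add `score` to the bucket for `day` (an integer day number).

    `buckets` maps slot index -> (day, total). A slot is reused once the
    window wraps around, so stale data from an older day is discarded first.
    """
    slot = day % NUM_BUCKETS
    stored_day, total = buckets.get(slot, (day, 0))
    if stored_day != day:
        total = 0  # moved on to a new day: drop the expired bucket contents
    buckets[slot] = (day, total + score)

def window_total(buckets, today):
    """Sum the scores for the last NUM_BUCKETS days, ignoring stale slots."""
    return sum(total for day, total in buckets.values()
               if today - NUM_BUCKETS < day <= today)

buckets = {}
add_score(buckets, 100, 5)
add_score(buckets, 100, 3)
add_score(buckets, 107, 2)  # same slot as day 100, so the old total is dropped
print(window_total(buckets, 107))
```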

Re: cassandra instability

2010-04-16 Thread Jonathan Ellis
On Fri, Apr 16, 2010 at 2:30 PM, Lee Parker wrote: > As for the Memtable thresholds, when I ran with lower thresholds, the server > would be thrashing with compaction runs due to the dramatically increased > number of sstable files.  That was when I was running 0.5.0.  Has 0.6.0 > improved compact

Re: cassandra instability

2010-04-16 Thread Lee Parker
I don't think it is a hardware issue. This is happening on multiple servers and clients on ec2 instances and my local development VM. I think you are right that the timestamp errors are likely being caused by the Thrift PHP bindings. The frustrating part is that I can't get the error to consisten

effective modeling for fixed limit columns

2010-04-16 Thread Chris Shorrock
I'm attempting to come up with a technique for limiting the number of columns a single key (or super column - doesn't matter too much for the context of this conversation) may contain at any one time. My actual use-case is a little too meaty to try to describe so an alternate use-case of this mecha

Re: cassandra instability

2010-04-16 Thread Paul Brown
Two more things you can do: 1) If you're running the updaters in the JVM (sounded like you were doing PHP?), then be sure that you're cleaning up the database sessions properly. Hibernate, in particular, will keep a lot of bookkeeping data around otherwise, and that can easily overflow your h

Re: cassandra instability

2010-04-16 Thread Jonathan Ellis
On Fri, Apr 16, 2010 at 12:50 PM, Lee Parker wrote: > Each time I start it up, it will > work fine for about 1 hour and then it will crash the servers.  The error > message on the servers is usually an out of memory error. Sounds like http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_in

Re: cassandra instability

2010-04-16 Thread banks
Is crashing really how it should deal with restricted memory? Seems like if this was true either a minimum required memory needs to be defined, or it should adjust how it uses memory in the absence of it... On Fri, Apr 16, 2010 at 11:07 AM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: >

Re: cassandra instability

2010-04-16 Thread Avinash Lakshman
Those memtable thresholds also need looking into. You are using a really poor hardware configuration; 1.7 GB RAM is not a configuration worth experimenting with IMO. Typical production deployments are running 16 GB RAM and quad-core 64-bit machines. It's hard, I would presume, to make any recommenda

Re: cassandra instability

2010-04-16 Thread Lee Parker
Row caching is not turned on. Lee Parker On Fri, Apr 16, 2010 at 12:58 PM, Paul Brown wrote: > > On Apr 16, 2010, at 10:50 AM, Lee Parker wrote: > > [...] > > I am trying to migrate data from mysql into the cluster using the > following methodology: > > 1. get 500 rows (12 columns each) from mys

Re: Regarding Cassandra Scalability

2010-04-16 Thread gabriele renzi
On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang wrote: > FB also does pics and movies so 1MB is way off depending on where they > manage such binary data. apparently not in cassandra http://www.facebook.com/note.php?note_id=76191543919 >I do agree that 1MB of text alone is a lot of text > which is

Re: cassandra instability

2010-04-16 Thread Paul Brown
On Apr 16, 2010, at 10:50 AM, Lee Parker wrote: > [...] > I am trying to migrate data from mysql into the cluster using the following > methodology: > 1. get 500 rows (12 columns each) from mysql > 2. build a batch_mutate to insert these rows into one CF (1 row = 1 row ) > 3. build a second batch

cassandra instability

2010-04-16 Thread Lee Parker
I am having major issues with stability on my cassandra nodes. Here is the setup: Cassandra Cluster - 2 EC2 small instances (1.7G RAM, single 32 bit core) with an EBS for the cassandra sstables Cassandra 0.6.0 w/ 1G heap space and 128M/1mil Memtable Thresholds Clients are also small EC2 webservers

Re: Regarding Cassandra Scalability

2010-04-16 Thread Tatu Saloranta
On Fri, Apr 16, 2010 at 9:17 AM, Mike Gallamore wrote: > On 04/16/2010 01:38 AM, dir dir wrote: > > I hear Facebook.com and tweeter.com using cassandra database. In my opinion > Facebook and > tweeter have hundreds TB data.  because their user reach hundreds million > people. > > I think you might

Re: Regarding Cassandra Scalability

2010-04-16 Thread Peter Chang
FB also does pics and movies so 1MB is way off depending on where they manage such binary data. I do agree that 1MB of text alone is a lot of text which is more relevant in the case of Twitter. The only large thing you leave out is denormalization. Every tweet you write is likely denormalized acros

Re: Regarding Cassandra Scalability

2010-04-16 Thread Mike Gallamore
On 04/16/2010 01:38 AM, dir dir wrote: I hear Facebook.com and tweeter.com using cassandra database. In my opinion Facebook and tweeter have hundreds TB data. because their user reach hundreds million people. I think you might be forgetting just how tiny tweets are. The las

Re: Cassandra 0.5.1 slow down after doing a lot of inserts

2010-04-16 Thread Jonathan Ellis
Moving to user list. You should read http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts 2010/4/16 Lu Ming : > Hi: > We have built a storage system with 4 nodes in the cluster. We use the > default configuration file, > every node has 2*E5504 CPU, 8G memory and 6*1T SATA disk

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-16 Thread Brandon Williams
On Fri, Apr 16, 2010 at 10:42 AM, Heath Oderman wrote: > Any pointers on why a thrift client would fly against cass on one os vs > another? The only delta is os. > That's probably not the only delta. There's networking involved too, so perhaps the problem lies there. Let us know how stress.py

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-16 Thread Heath Oderman
Any pointers on why a thrift client would fly against cass on one os vs another? The only delta is os. I'm going to try py_stress remote. Stu On Apr 16, 2010 11:39 AM, "Jonathan Ellis" wrote: Sounds like the problem is with the C# client code, then. On Fri, Apr 16, 2010 at 10:36 AM, Heath O

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-16 Thread Jonathan Ellis
Sounds like the problem is with the C# client code, then. On Fri, Apr 16, 2010 at 10:36 AM, Heath Oderman wrote: > Ok, it took me a long time to get py_stress working. > I didn't have thrift / boost / gcc on my debian box :)   > I'm using this command line believing it's similar to my c# test

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-16 Thread Heath Oderman
Ok, it took me a long time to get py_stress working. I didn't have thrift / boost / gcc on my debian box :) I'm using this command line believing it's similar to my c# tests from a remote box: cnb:~/apache-cassandra-0.6.0-src/contrib/py_stress# python stress.py -o insert -n 100 -d 10.1

Re: timestamp not found

2010-04-16 Thread Lee Parker
In an attempt to continue troubleshooting these errors, I took the text from one and converted it from hex to ASCII. Here is the original error: Required field 'timestamp' was not found in serialized data! Struct: Column(name:61 75 74 68 6F 72 5F 69 63 6F 6E, value:68 74 74 70 3A 2F 2F 61 31 2E 7
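
For reference, the hex-to-ASCII conversion mentioned above can be reproduced with a few lines of Python. This is a small sketch, not the method the poster used; hex_to_ascii is a hypothetical helper name. Only the complete bytes visible in the error are decoded, since the value is cut off in this listing.

```python
def hex_to_ascii(hex_bytes):
    """Decode a space-separated string of hex byte values into ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")

# Column name, exactly as it appears in the error message.
column_name = hex_to_ascii("61 75 74 68 6F 72 5F 69 63 6F 6E")
# Leading bytes of the value; the rest is truncated in the listing.
value_prefix = hex_to_ascii("68 74 74 70 3A 2F 2F 61 31 2E")

print(column_name)   # author_icon
print(value_prefix)  # http://a1.
```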

Re: Any optimization strategy?

2010-04-16 Thread Jonathan Ellis
sounds like you are only using a single thread. look at the second graph on http://spyced.blogspot.com/2010/01/cassandra-05.html On Fri, Apr 16, 2010 at 5:59 AM, Ken Sandney wrote: > Hi, > I am just doing a simple insert test with a cluster of two nodes, but seems > relatively slow: about 1000 r

Re: Is that possible to write a file system over Cassandra?

2010-04-16 Thread Jonathan Ellis
The strange part is copying the entire cassandra source tree. On Thu, Apr 15, 2010 at 8:35 PM, Jeff Zhang wrote: > Jonathan, > > Previously we use the cassandra-0.6, but we'd like to leverage the hector > java client since it has more advanced features. And hector currently only > support cassand

Re: Regarding Cassandra Scalability

2010-04-16 Thread Mark Robson
> > On Fri, Apr 16, 2010 at 1:28 PM, Linton N wrote: > >> hi , >> I am working for the past 1 year with hadoop, but quite new to >> cassandra, I would like to get clarified few things regarding the >> scalability of Cassandra. Can it scall up to TB of data ? >> > TB of data is not really t

Re: Is that possible to write a file system over Cassandra?

2010-04-16 Thread Mark Robson
On 15 April 2010 02:42, Zhuguo Shi wrote: > Hi, > > Cassandra has a good distributed model: decentralized, auto-partition, > auto-recovery. I am evaluating about writing a file system over Cassandra > (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if > Cassandra is good at such

Re: Clarification on Ring operations in Cassandra 0.5.1

2010-04-16 Thread gabriele renzi
On Fri, Apr 16, 2010 at 1:10 AM, Anthony Molinaro wrote: > Hi, > >  I have a cluster running on ec2, and would like to do some ring > management.  Specifically, I'd like to replace an existing node > without another node (I want to change the instance type). does maybe `nodetool move` do what yo

Any optimization strategy?

2010-04-16 Thread Ken Sandney
Hi, I am just doing a simple insert test with a cluster of two nodes, but it seems relatively slow: about 1000 rows/second. The test boxes are normal PCs, 2GB RAM, Intel E3200 2.4GHz. Is there any general optimization strategy? Thanks

Re: Cassandra Java Client

2010-04-16 Thread Linton N
import java.util.List; import java.io.UnsupportedEncodingException; import org.apache.thrift.transport.TTransport; import org.apache.thrift.transport.TSocket; import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.protocol.TBinaryProtocol; import org.apache.thrift.TException; import

RE: Cassandra Java Client

2010-04-16 Thread Ake Tangkananond
Hi Nirmala, Welcome to Cassandra! Is this the one you are looking for ? http://www.sodeso.nl/?p=80 -Ake From: Nirmala Agadgar [mailto:nirmala...@gmail.com] Sent: Friday, April 16, 2010 4:56 PM To: user@cassandra.apache.org Subject: Cassandra Java Client Hi, Can anyone tell how

Re: Cassandra Java Client

2010-04-16 Thread Colin Vipurs
Take a look at Hector, a Java client: http://wiki.github.com/rantav/hector/ There's example code here: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExampleClient.java On Fri, Apr 16, 2010 at 10:56 AM, Nirmala Agadgar wrote: > Hi, > > Can anyone tel

Cassandra Java Client

2010-04-16 Thread Nirmala Agadgar
Hi, Can anyone tell me how to implement a client in Java that can insert data into Cassandra? Any code or guidelines would be helpful. - Nirmala

Re: inserting rows in columns inside a supercolumn

2010-04-16 Thread Julio Carlos Barrera Juez
Hi again, First of all, obviously, I have omitted the timestamps to simplify the representation, not in the code. Secondly, there is one supercolumn with two rows, A and D; all the others are columns, including B, with various key-values (1, 2, etc.). I need two levels for my design, it is manda

Re: Regarding Cassandra Scalability

2010-04-16 Thread dir dir
I hear Facebook.com and tweeter.com using cassandra database. In my opinion Facebook and tweeter have hundreds TB data. because their user reach hundreds million people. Dir. On Fri, Apr 16, 2010 at 1:28 PM, Linton N wrote: > hi , > I am working for the past 1 year with hadoop, but qu

Re: Regarding Cassandra Scalability

2010-04-16 Thread Linton N
Thank you very much. Sorry for the trouble. I could have done it myself. On Fri, Apr 16, 2010 at 1:29 PM, Paul Prescod wrote: > http://www.google.ca/search?hl=en&q=cassandra+terabyte > > On Thu, Apr 15, 2010 at 11:28 PM, Linton N > wrote: > > hi , > > I am working for the past 1

Re: Regarding Cassandra Scalability

2010-04-16 Thread Paul Prescod
http://www.google.ca/search?hl=en&q=cassandra+terabyte On Thu, Apr 15, 2010 at 11:28 PM, Linton N wrote: > hi , > I am working for the past 1 year with hadoop, but quite new to > cassandra, I would like to get clarified few things regarding the > scalability of Cassandra. Can it scall up