Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread Jonathan Ellis
If you're benchmarking throughput, then you absolutely have to be multithreaded. 2010/3/20 郭鹏 : > You mean I waste lots of time waiting for the write reply? > > How could I avoid this problem? Using multiple threads for every write operation? > > I am totally confused. > > Thx. > > On March 21, 2010, at 11:16 AM
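
Jonathan's point can be illustrated with a small, self-contained Python sketch. This is not Cassandra client code. `fake_insert` is a hypothetical stand-in for one blocking write, and its `time.sleep` models the network round trip. A thread pool keeps several writes in flight at once, so the client is not idle while it waits for each reply.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical round-trip time for one write; not a measured Cassandra figure.
SIMULATED_LATENCY_S = 0.01


def fake_insert(record):
    """Hypothetical stand-in for a blocking insert: it only waits out the latency."""
    time.sleep(SIMULATED_LATENCY_S)
    return record


def writes_per_second(records, threads):
    """Push every record through `threads` workers and return the observed rate."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces map() to wait until every insert has completed.
        list(pool.map(fake_insert, records))
    return len(records) / (time.perf_counter() - start)


if __name__ == "__main__":
    records = list(range(100))
    for n in (1, 8, 32):
        print(f"{n:>2} threads: {writes_per_second(records, n):8.0f} writes/s")
```

With one thread, the rate is capped near 1 / latency. Adding threads raises it roughly in proportion. In a real benchmark, the gain flattens out once the cluster itself becomes the bottleneck.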

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread 郭鹏
You mean I waste lots of time waiting for the write reply? How could I avoid this problem? Using multiple threads for every write operation? I am totally confused. Thx. On March 21, 2010, at 11:16 AM, Jonathan Ellis wrote: > time a single-threaded client is spending waiting for a reply, is time > it's not spendi

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread Jonathan Ellis
Time a single-threaded client is spending waiting for a reply is time it's not spending sending another request. 2010/3/20 郭鹏 : > Good idea, I will try it with multithreaded client writes. > > But I don't understand the meaning of "and the latency in cluster mode hurts > that much more." > > Thx. > >

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread 郭鹏
Good idea, I will try it with multithreaded client writes. But I don't understand the meaning of "and the latency in cluster mode hurts that much more." Thx. 2010/3/21 Jonathan Ellis > 2010/3/20 郭鹏 : > > :) > > > > I just want to know why the write speed is ok in the standalone mode but too > >

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread Jonathan Ellis
2010/3/20 郭鹏 : > :) > > I just want to know why the write speed is ok in the standalone mode but too slow in the cluster mode. > > Am I doing something wrong? Probably, but it's hard to say what. Maybe you're just not using enough threads in your benchmark and the latency in cluster mode hurts
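
The arithmetic behind "the latency in cluster mode hurts that much more" can be sketched as follows. In a closed-loop client, each thread waits for one reply before it sends its next request. Throughput is therefore at most threads / round-trip latency. The latencies below are hypothetical and only show the shape of the effect. The 18,000 writes/s target is the standalone figure from the original post.

```python
MICROS_PER_SECOND = 1_000_000


def max_writes_per_second(threads, latency_us):
    """Upper bound for a closed-loop client: each thread has one request in flight."""
    return threads * MICROS_PER_SECOND / latency_us


def threads_needed(target_writes_per_second, latency_us):
    """Smallest thread count whose upper bound reaches the target (integer ceiling)."""
    return -(-target_writes_per_second * latency_us // MICROS_PER_SECOND)


# Hypothetical round trips: 50 us to a local node vs 1,000 us (1 ms) over a network.
print(max_writes_per_second(1, 50))     # 20000.0
print(max_writes_per_second(1, 1_000))  # 1000.0
print(threads_needed(18_000, 1_000))    # 18
```

This is only an upper bound and ignores server-side limits. It still shows why a single thread that kept up locally can fall far short once every request pays a network round trip.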

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread 郭鹏
:) I just want to know why the write speed is ok in the standalone mode but too slow in the cluster mode. Am I doing something wrong? Thx. Peng Guo 2010/3/21 Jonathan Ellis > Looks like the right question to ask is "what _else_ changed during the move?" > > On Sat, Mar 20, 2010 at 8:44 AM,

RE: node repair

2010-03-20 Thread Todd Burruss
FYI ... I just compacted, and node 105 is definitely not being repaired. From: Todd Burruss Sent: Saturday, March 20, 2010 12:34 PM To: user@cassandra.apache.org Subject: RE: node repair Same IP, same token. I'm trying Handling Failure, #3. It is running, a

Supervisord

2010-03-20 Thread Mark Jarecki
Hi there, Is anyone using Supervisord to manage starting/stopping Cassandra? I was wondering what command and stopsignal instructions you might be using. Cheers, mark

Re: data corruption experiences

2010-03-20 Thread Alex Durgin
Thanks for the input. My primary draw to Cassandra is the dynamic schema. I could make it work relationally, perhaps even nicely with something like Postgres' hstore, but I haven't investigated that fully yet. Relatively linear scaling has its appeal and competitive advantages too. I also find

Re: SimpleCassie - ORM PHP Client

2010-03-20 Thread Marcin
Hi Jonathan, of course. In order to use TimeUUID you need to use $cassie->uuid(), which will generate a UUID object that can be passed directly into ->column(), e.g. for setting a new entry: $cassie->keyspace('MyBlog')->cf('RawFood')->key('Posts')->column($cassie->uuid())->set('I like green food
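
For readers unfamiliar with TimeUUID columns: a TimeUUID is a version-1 (time-based) UUID, and Cassandra's `TimeUUIDType` comparator sorts columns by the timestamp embedded in it. The Python sketch below uses only the standard `uuid` module to show what such a value contains. It is not SimpleCassie's `$cassie->uuid()` implementation, and `new_time_uuid` / `unix_seconds` are illustrative helper names.

```python
import time
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch,
# measured in 100-nanosecond intervals.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def new_time_uuid():
    """Return a version-1 (time-based) UUID, the kind a TimeUUIDType column name holds."""
    return uuid.uuid1()


def unix_seconds(u):
    """Recover the creation time embedded in a version-1 UUID as Unix seconds."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7


col = new_time_uuid()
print(col.version)     # 1
print(len(col.bytes))  # 16 raw bytes, the usual binary form of a column name
print(unix_seconds(col), time.time())
```

Because the timestamp lives inside the value, a client only needs to generate a fresh version-1 UUID per column. Ordering by time then happens on the server through the comparator.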

RE: node repair

2010-03-20 Thread Todd Burruss
Same IP, same token. I'm trying Handling Failure, #3. It is running, part of the ring, and seems to be handling reads/writes, but does not appear to have received a copy of its data (the last node below). I've searched all the logs for ERRORs, but there are none. I will compact the other no

Re: Startup issue when big data in.

2010-03-20 Thread Lenin Gali
Thanx Joe. Great stuff and very encouraging. Lenin -----Original Message----- From: Joe Stump Date: Sat, 20 Mar 2010 09:27:21 To: Subject: Re: Startup issue when big data in. On Mar 20, 2010, at 3:33 AM, Lenin Gali wrote: > 1. What kind of performan

Re: running mapreduce against a remote cassandra

2010-03-20 Thread Matteo Caprari
Oops, missed it. On Sat, Mar 20, 2010 at 6:26 PM, Jonathan Ellis wrote: > This is covered in the README: "If you want to point wordcount at a real cluster, modify the seed and listenaddress settings in storage-conf.xml accordingly." > > On Sat, Mar 20, 2010 at 9:10 AM, Matteo Caprari wrote

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread Jonathan Ellis
Looks like the right question to ask is "what _else_ changed during the move?" On Sat, Mar 20, 2010 at 8:44 AM, 郭鹏 wrote: > Hi: > > I'm researching moving our data from MySQL to Cassandra 0.5.1. > > At first, I did it on Windows XP, reading each record from MySQL and then inserting it in

Re: running mapreduce against a remote cassandra

2010-03-20 Thread Jonathan Ellis
This is covered in the README: "If you want to point wordcount at a real cluster, modify the seed and listenaddress settings in storage-conf.xml accordingly." On Sat, Mar 20, 2010 at 9:10 AM, Matteo Caprari wrote: > Hi. > > How do I configure the mapreduce job (example in contrib/wordcount) > so
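
The README's advice amounts to editing two settings in storage-conf.xml. The Python below is a hedged sketch that rewrites them with the standard `xml.etree.ElementTree` module. The sample file is a trimmed, hypothetical excerpt, and the `<Seeds>/<Seed>` and `<ListenAddress>` element names are assumed to match the 0.5/0.6-era file layout, so check them against your own copy. The addresses are placeholders. Which address belongs in ListenAddress depends on your setup.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of a storage-conf.xml; element names are assumed.
SAMPLE_CONF = """<Storage>
  <Seeds>
    <Seed>127.0.0.1</Seed>
  </Seeds>
  <ListenAddress>localhost</ListenAddress>
</Storage>"""


def point_at_cluster(conf_xml, seed_hosts, listen_address):
    """Return conf_xml with the seed list and ListenAddress replaced."""
    root = ET.fromstring(conf_xml)
    seeds = root.find("Seeds")
    for old in list(seeds):  # drop the existing <Seed> entries
        seeds.remove(old)
    for host in seed_hosts:  # add one <Seed> per remote seed node
        ET.SubElement(seeds, "Seed").text = host
    root.find("ListenAddress").text = listen_address
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    print(point_at_cluster(SAMPLE_CONF, ["10.0.0.5"], "10.0.0.20"))
```

ElementTree discards XML comments when parsing, and the real file is heavily commented. For a one-off change, editing the two elements by hand is probably simpler. A script like this is mainly useful when you generate configs for many machines.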

Re: node repair

2010-03-20 Thread Jonathan Ellis
If you bring up a new node with a different IP but the same token, it will confuse things. The "handling failure" section of http://wiki.apache.org/cassandra/Operations covers best practices here. On Sat, Mar 20, 2010 at 11:51 AM, Todd Burruss wrote: > I had a node fail, lost all data. So I brought it b

Re: data corruption experiences

2010-03-20 Thread Chris Goffinet
We saw corruption in the pre-0.4 days. Digg hasn't seen corruption since that got taken care of. We are only doing this "just in case the shit hits the fan." Cassandra is rapidly changing, and it would be completely careless of us to forgo a path of using a new database as our primary datastore.

data corruption experiences

2010-03-20 Thread Alex Durgin
Recent messages to the list regarding durability and backup strategies lead me to a few questions that other new users may also have. What's the general experience with corruption to date? Is it common? Would I regret operating a single-node cluster? Digg referenced sending snapshots to HDFS

Re: SimpleCassie - ORM PHP Client

2010-03-20 Thread Jonathan Ellis
Cool, thanks! Does it make it easy to use TimeUUID columns? That is the biggest problem I see people having from PHP. On Sat, Mar 20, 2010 at 7:32 AM, Marcin wrote: > Hi guys, > > I would like to share with you a link to a PHP client for Cassandra built with flexibility and ease of use i

node repair

2010-03-20 Thread Todd Burruss
I had a node fail and lost all its data, so I brought it back up fresh, but assigned it the same token in storage-conf.xml, then ran nodetool repair. All compactions have finished; no streams are happening. Nothing. So I did it again. Same thing. I don't think it's working. Is there a log message
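
Reusing the token matters because ownership on the ring follows the token. Each node owns the tokens after its predecessor's token, up to and including its own, wrapping around the ring. A rebuilt node that keeps the old token is therefore responsible for exactly the old node's range. The toy model below shows this. The MD5-based `key_token` only approximates RandomPartitioner, and the tokens and IP addresses are hypothetical. It models primary ownership only, not replication, gossip, or repair.

```python
import bisect
import hashlib


def key_token(key):
    """Toy token: MD5 of the key read as an unsigned 128-bit integer
    (an approximation of RandomPartitioner, not its exact arithmetic)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def owner_of_token(ring, token):
    """ring: list of (token, endpoint) sorted by token. A node owns the range
    (predecessor's token, its own token]; tokens past the last node wrap to the first."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def owner(ring, key):
    """Endpoint that owns `key` in this toy model."""
    return owner_of_token(ring, key_token(key))


# Hypothetical three-node ring; 10.0.0.2 plays the failed node.
RING = [(2**126, "10.0.0.1"), (2**127, "10.0.0.2"), (3 * 2**126, "10.0.0.3")]

if __name__ == "__main__":
    # Rebuilding 10.0.0.2 with the same IP and token leaves RING unchanged, so it
    # still owns the same keys; that data is what repair is expected to restore.
    for k in ("alice", "bob", "carol", "dave"):
        print(k, "->", owner(RING, k))
```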

Re: Digg's data model

2010-03-20 Thread Chris Goffinet
On Mar 20, 2010, at 9:10 AM, Jeremy Dunck wrote: > On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet wrote: >>> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what do you >>> recommend the backup scenarios could be? >> >> Worst case scenario (total failure) we opted to do global snapsh

Re: Digg's data model

2010-03-20 Thread Jeremy Dunck
On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet wrote: >> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what do you >> recommend the backup scenarios could be? > > Worst case scenario (total failure) we opted to do global snapshots every 24 > hours. This creates hard links to SSTable

Re: Digg's data model

2010-03-20 Thread Chris Goffinet
> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what do you recommend the backup scenarios could be? Worst case scenario (total failure), we opted to do global snapshots every 24 hours. This creates hard links to SSTables on each node. We copy those SSTables to HDFS on a daily basis.
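
The hard-link trick Chris describes works because SSTables are immutable once written. A snapshot can link to the existing files instantly, without copying data. The linked files also stay valid after compaction deletes the originals. The Python sketch below shows the idea on a local directory. It is not Cassandra's snapshot code, the `-Data.db` file names are hypothetical, and copying the snapshot to HDFS is left out.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every SSTable file (*.db) in data_dir into snapshot_dir.
    A hard link is a second name for the same inode, so no bytes are copied."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if name.endswith(".db") and os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        data = os.path.join(root, "data")
        os.makedirs(data)
        sstable = os.path.join(data, "Users-1-Data.db")  # hypothetical SSTable name
        with open(sstable, "wb") as f:
            f.write(b"immutable sstable bytes")

        snap = os.path.join(root, "snapshots", "daily")
        print(snapshot(data, snap))       # ['Users-1-Data.db']
        print(os.stat(sstable).st_nlink)  # 2: two names, one file
        # Simulate compaction removing the original; the snapshot still has the data.
        os.remove(sstable)
        with open(os.path.join(snap, "Users-1-Data.db"), "rb") as f:
            print(f.read())               # b'immutable sstable bytes'
```

Hard links only work within one filesystem. That is one reason the snapshot is a cheap local step, and the off-node copy (to HDFS, in Digg's case) is a separate, slower job.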

Re: Digg's data model

2010-03-20 Thread Chris Goffinet
> Also, does Cassandra support counters? Digg's article said they are going to contribute their work to open source; any idea when that would be? > All of the custom work has been pushed upstream from Digg, and that continues. We have a few operational tools we will be releasing that will go into co

Re: Startup issue when big data in.

2010-03-20 Thread Joe Stump
On Mar 20, 2010, at 3:33 AM, Lenin Gali wrote: > 1. What kind of performance are you getting? How many writes vs. reads do you do per minute? Our performance is quite good. Here are some HDD benchmarks I've run: http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html > 2. have

Re: Digg's data model

2010-03-20 Thread Joe Stump
On Mar 20, 2010, at 2:53 AM, Lenin Gali wrote: > 1. Eventual consistency: Given a volume of 5K writes/sec, where roughly 1,500 writes/sec are updates while the rest are inserts, what kind of latency can be expected for eventual consistency? Depending on the size of the cluster you're not

running mapreduce against a remote cassandra

2010-03-20 Thread Matteo Caprari
Hi. How do I configure the mapreduce job (example in contrib/wordcount) so that it connects to a cassandra running on a remote host? Thanks -- :Matteo Caprari matteo.capr...@gmail.com

why Cassandra 0.5.1 write speed is very slow

2010-03-20 Thread 郭鹏
Hi: I'm researching moving our data from MySQL to Cassandra 0.5.1. At first, I did it on Windows XP, reading each record from MySQL and then inserting it into Cassandra. The write speed was OK, about 18,000 records per second. But when I moved it to five Red Hat Linux 5 servers, doing

SimpleCassie - ORM PHP Client

2010-03-20 Thread Marcin
Hi guys, I would like to share with you a link to a PHP client for Cassandra built with flexibility and ease of use in mind. It implements some ORM concepts. Here you go: http://code.google.com/p/simpletools-php/wiki/SimpleCassie P.S. I'd appreciate any feedback. Cheers, /Marcin

Re: Startup issue when big data in.

2010-03-20 Thread Lenin Gali
Thanks, Stu, for pointing out the article. It really helped me, and I would also like to ask the user group the following questions. I am curious about your users' EC2 experience, if you can share some details on: 1. What kind of performance are you getting? How many writes vs. reads do you do per minute? 2. Have you used EBS o

Re: Digg's data model

2010-03-20 Thread Lenin Gali
Hi, I have several questions. I hope some of you can share your experience with any or all of the following. I would be curious about Twitter's and Digg's experience, as they might be processing 1. Eventual consistency: Given a volume of 5K writes/sec, where roughly 1,500 writes/sec are updates wh