Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread Sandeep Kalidindi at PaGaLGuY.com
@michael - benjamin answered your question. The thing is, if you use MySQL just for indices, you are not using the benefits of the whole relational database engine (which is fine) but you are still inheriting all its disadvantages. You can use MySQL for storing indices and then write your own sharding

Cassandra client - clock sync

2010-07-13 Thread Narendra Sharma
Hi, We have an application that uses Cassandra to store data. The application is deployed on multiple nodes that are part of an application cluster. We are at present using a single Cassandra node. We have noticed a few errors in the application, and our analysis revealed that the root cause was that the c

Re: CassandraBulkLoader

2010-07-13 Thread Torsten Curdt
On Tue, Jul 13, 2010 at 04:35, Mubarak Seyed wrote: > Where can I find the documentation for BinaryMemTable (bmt_example in contrib) > to use CassandraBulkLoader? What is the input to be supplied to > CassandraBulkLoader? > How do I form the input data, and what is its format? The

Re: Is anyone using version 0.7 schema update API

2010-07-13 Thread GH
They are not complicated; it's more that they are not in the package that they should be in. I assume the client package exposes the functionality of the server, and it does not have the ability to manage the tables in the database, which to me seems extremely limiting. When I did not see that co

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread Paul Prescod
On Mon, Jul 12, 2010 at 11:44 PM, Benjamin Black wrote: > We use Cassandra (multidimensional metrics) *and* redis (counters and > alerts) *and* MySQL (supporting Rails).  Right tool for each job.  The > idea that it is a good thing to cram everything into a single database > (and data model), beat

Re: concurrent reads

2010-07-13 Thread Schubert Zhang
For reads, the bottleneck is usually the disk. Use iostat to check the utilization of your disks. On Tue, Jul 13, 2010 at 2:07 PM, Peter Schuller wrote: > > Has anyone experimented with different settings for concurrent reads? I > > have set our servers to 4 ( 2 per processor core ). I have notic

Re: Iterate all keys - doing it as the faq fails for me :(

2010-07-13 Thread Thomas Heller
I'm not entirely sure but I think you can only use get_range_slices with start_key/end_key on a cluster using OrderPreservingPartitioner. Don't know if that is intentional or buggy like Jonathan suggested but I saw the same "duplicates" behaviour when trying to iterate all rows using RP and start_key/

Re: Question regarding consistency and deletion

2010-07-13 Thread Samuru Jackson
Thanks for the links. Actually it is pretty easy to catch those tombstoned keys on the client side. However, in certain applications it can generate some additional overhead on the network. I think it would be nice to have a forced garbage collection in the API. This would IMHO ease to write Unit
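
As a rough illustration of catching tombstoned keys on the client side: in Cassandra of this era, a range query can still return the keys of deleted rows, with an empty column list. The sketch below assumes the range result has already been converted to a dict mapping each key to its list of columns; the `rows` shape and the `live_rows` helper are hypothetical and not part of any client API.

```python
def live_rows(rows):
    """Keep only keys that still have columns.

    Assumption: a deleted (tombstoned) row shows up as a key with an
    empty column list, so an empty list is treated as "deleted".
    """
    return {key: cols for key, cols in rows.items() if cols}


# Hypothetical range result: user2 was deleted but its key is still returned.
rows = {
    "user1": [("name", "alice")],
    "user2": [],
    "user3": [("name", "carol")],
}
print(sorted(live_rows(rows)))
```

The filtering itself is cheap; the overhead mentioned in the message is that the empty rows still travel over the network before they can be dropped.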

RE: Iterate all keys - doing it as the faq fails for me :(

2010-07-13 Thread Per Olesen
>I'm not entirely sure but I think you can only use get_range_slices >with start_key/end_key on a cluster using OrderPreservingPartitioner. >Don't know if that is intentional or buggy like Jonathan suggested but I >saw the same "duplicates" behaviour when trying to iterate all rows >using RP and start

Re: Iterate all keys - doing it as the faq fails for me :(

2010-07-13 Thread Jonathan Ellis
On Tue, Jul 13, 2010 at 7:38 AM, Thomas Heller wrote: > I'm not entirely sure but I think you can only use get_range_slices > with start_key/end_key on a cluster using OrderPreservingPartitioner. > Don't know if that is intentional or buggy like Jonathan suggested but I > saw the same "duplicates" be
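
One possible source of the "duplicates" discussed in this thread is that the start key of a range query is inclusive, so reusing the last key of one page as the start of the next returns that row twice. The sketch below is a simulation only, not the Thrift API: `fetch_page` is a hypothetical stand-in for a range query, and ordering keys by their MD5 hash is an assumption used to mimic how RandomPartitioner returns rows in token order rather than key order.

```python
import hashlib


def token(key):
    # Stand-in for RandomPartitioner ordering: sort by the MD5 of the key.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def make_fetch_page(keys):
    """Build a fake range query over an in-memory key set (hypothetical)."""
    ordered = sorted(keys, key=token)

    def fetch_page(start_key, count):
        # start_key is inclusive; "" means "start from the beginning".
        start = 0 if start_key == "" else ordered.index(start_key)
        return ordered[start:start + count]

    return fetch_page


def iterate_keys(fetch_page, page_size=3):
    """Yield every key once, paging with the last key as the next start."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start, first = "", True
    while True:
        page = fetch_page(start, page_size)
        # Every page after the first begins with the row already seen.
        for key in (page if first else page[1:]):
            yield key
        if len(page) < page_size:
            return
        start, first = page[-1], False


all_keys = ["k%d" % i for i in range(10)]
result = list(iterate_keys(make_fetch_page(all_keys), page_size=3))
print(len(result), len(set(result)))
```

The keys come back in hash order, not alphabetical order, which is expected under the ordering assumption above.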

Re: Cassandra client - clock sync

2010-07-13 Thread Jonathan Ellis
You should use ntp in daemon mode, not as a one-time fix. http://linux.die.net/man/1/ntpd On Tue, Jul 13, 2010 at 2:45 AM, Narendra Sharma wrote: > Hi, > > We have an application that uses Cassandra to store data. The application is > deployed on multiple nodes that are part of an application clu
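
To see why client clock drift matters here: when two writes hit the same column, Cassandra keeps the one with the higher client-supplied timestamp. The sketch below is a simplified model of that rule, not Cassandra code; the `resolve` helper and the sample timestamps are hypothetical.

```python
def resolve(writes):
    """Return the value of the write with the highest timestamp.

    Each write is a (timestamp_in_microseconds, value) pair.
    """
    return max(writes, key=lambda write: write[0])[1]


real_time = 1_279_000_000_000_000  # arbitrary "true" time in microseconds

# Client A writes first, with a correct clock.
write_a = (real_time, "written first by client A")

# Client B writes one second later, but its clock runs five seconds behind.
write_b = (real_time + 1_000_000 - 5_000_000, "written later by client B")

print(resolve([write_a, write_b]))
```

In this model client A's older value wins because client B's timestamp is lower, which is the kind of lost update that keeping ntpd running on every client machine is meant to prevent.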

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread S Ahmed
The only issue I see (please correct me if I am wrong) is that you now have single points of failure in the system, i.e. redis etc. On Tue, Jul 13, 2010 at 3:33 AM, Sandeep Kalidindi at PaGaLGuY.com < sandeep.kalidi...@pagalguy.com> wrote: > @michael - benjamin answered your que

Re: concurrent reads

2010-07-13 Thread Lee Parker
The iostat numbers are rather low, as is CPU utilization. We have a couple of nightly jobs which do a lot of reads in a short amount of time. That is when the pending reads were climbing. I'm going to bump up the number and see how things run. Lee Parker On Tue, Jul 13, 2010 at 6:18 AM, Schubert

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread Sandeep Kalidindi at PaGaLGuY.com
@Ahmed - we are trying to use Redis + gizzard, with gizzard responsible for sharding and maintaining replicas. Need to test it well before plunging into production though. Cheers, Deepu. On Tue, Jul 13, 2010 at 7:46 PM, S Ahmed wrote: > The only issue I see (please correct me if I am wrong)

Consequences of having many columns

2010-07-13 Thread Kochheiser,Todd W - TOK-DITT-1
I recently ran across a blog posting with a comment from a Cassandra committer that indicated a performance penalty when having a large number of columns per row/key. Unfortunately I didn't bookmark the blog posting and now I can't find it. Regardless, since our current plan and design is to h

Performance Issues

2010-07-13 Thread Samuru Jackson
Hi, I have set up a ring with a couple of servers and wanted to run some stress tests. Unfortunately, there is some kind of bottleneck at the client side. I'm using Hector and Cassandra 0.6.1. The subsequent profile results are based on a small Java program that sequentially inserts records, wit

Re: Consequences of having many columns

2010-07-13 Thread Mason Hale
Currently there is a limitation that each row must fit in memory (with some not insignificant overhead), thus having lots of columns per row can trigger out-of-memory errors. This limitation should be removed in a future release. Please see: - http://wiki.apache.org/cassandra/CassandraLimitation

Re: GCGraceSeconds per ColumnFamily/Keyspace

2010-07-13 Thread Todd Burruss
Yes -Original Message- From: Jonathan Ellis [jbel...@gmail.com] Received: 7/12/10 9:15 PM To: user@cassandra.apache.org [u...@cassandra.apache.org] Subject: Re: GCGraceSeconds per ColumnFamily/Keyspace Probably. Can you open a ticket? On Mon, Jul 12, 2010 at 10:41 PM, Todd Burruss wr

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread Benjamin Black
On Tue, Jul 13, 2010 at 2:43 AM, Paul Prescod wrote: > On Mon, Jul 12, 2010 at 11:44 PM, Benjamin Black wrote: >> We use Cassandra (multidimensional metrics) *and* redis (counters and >> alerts) *and* MySQL (supporting Rails).  Right tool for each job.  The >> idea that it is a good thing to cram

Re: Question regarding consistency and deletion

2010-07-13 Thread Benjamin Black
On Tue, Jul 13, 2010 at 5:47 AM, Samuru Jackson wrote: > Thanks for the links. > > Actually it is pretty easy to catch those tombstoned keys on the > client side. However, in certain applications it can generate some > additional overhead on the network. > > I think it would be nice to have a forc

Re: Is anyone using version 0.7 schema update API

2010-07-13 Thread Benjamin Black
I updated the Ruby client to 0.7, but I am not a Cassandra committer (and not much of a Java guy), so haven't touched the Java client. Is there more to it than regenerating Thrift bindings? On Tue, Jul 13, 2010 at 1:42 AM, GH wrote: > They are not complicated, its more that they are not in the p

Consulting for Rollout + Cassandra

2010-07-13 Thread David Boxenhorn
We are planning a rollout of our online product ~September 1. Cassandra is a major part of our online system. We need some Cassandra consulting + general online consulting for determining our server configuration so it will support Cassandra under all possible scenarios. Does anybody have any ide

Re: Consulting for Rollout + Cassandra

2010-07-13 Thread Benjamin Black
http://riptano.com On Tue, Jul 13, 2010 at 9:14 AM, David Boxenhorn wrote: > We are planning a rollout of our online product ~September 1. Cassandra is a > major part of our online system. > > We need some Cassandra consulting + general online consulting for > determining our server configuration

Re: Cassandra client - clock sync

2010-07-13 Thread Benjamin Black
On Tue, Jul 13, 2010 at 12:45 AM, Narendra Sharma wrote: > How are other Cassandra users handling the clock sync in production > environment? > By structuring access in the app such that there are never conflicts in the first place, for example by using UUIDs for row and column names. At the p
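
The idea of avoiding conflicts by using UUIDs for column names can be sketched as follows. This is an in-memory illustration only: the `row` dict stands in for a Cassandra row, and `append_event` is a hypothetical helper, not a client API. Because every write goes to its own uniquely named column, no two clients overwrite the same column and timestamp ordering between clients stops mattering for correctness.

```python
import uuid


def new_column_name():
    # Version 1 (time-based) UUID: built from a timestamp and a node id,
    # so names generated on different machines should not collide.
    return uuid.uuid1()


def append_event(row, value):
    """Store the value under a fresh column name instead of overwriting one."""
    name = new_column_name()
    row[name] = value
    return name


row = {}
for i in range(100):
    append_event(row, "event %d" % i)
print(len(row))
```

The trade-off is that readers must fetch and merge a slice of columns instead of reading one well-known column.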

Re: Authentication

2010-07-13 Thread Ben Standefer
Are there any plans or talks of adding SSL/encryption support between Cassandra nodes? This would make setting up secure cross-country Cassandra clusters much easier, without having to set up a secure overlay network. MySQL supports this in its replication. -Ben On Mon, Jul 12, 2010 at 11:23 P

Re: Performance Issues

2010-07-13 Thread Ran Tavory
Since you're using Hector, hector-users@ is a good place to be, so u...@cassandra to bcc. operateWithFailover is one stop before sending the request over the network and waiting, so it makes a lot of sense that a significant part of the application's time is spent in it. On Tue, Jul 13, 2010 at 6:22 PM, Sa

Re: CassandraBulkLoader

2010-07-13 Thread Mubarak Seyed
Thanks Torsten. Jonathan's blog on Fact Vs Fiction says that Fact: It has always been straightforward to send the output of Hadoop jobs to Cassandra, and Facebook, Digg, and others have been using Hadoop like this as a Cassandra bulk-loader for over a year. Does anyone from Facebook or Digg shar

understand thrift

2010-07-13 Thread S Ahmed
Just want some clarifications on thrift. 1. thrift creates a layer between Cassandra and the client, specific to whatever language you want. 2. thrift generates an interface to Cassandra's service endpoints *3. when Cassandra's endpoints have been modified, thrift needs to be re-generated (along

Re: GCGraceSeconds per ColumnFamily/Keyspace

2010-07-13 Thread B. Todd Burruss
https://issues.apache.org/jira/browse/CASSANDRA-1276 On Tue, 2010-07-13 at 09:05 -0700, Todd Burruss wrote: > From: Jonathan Ellis [jbel...@gmail.com] > Received: 7/12/10 9:15 PM > To: user@cassandra.apache.org [u...@cassandra.apache.org] > Subject: Re: GCGraceSeconds per ColumnFamily/Keyspace >

Re: understand thrift

2010-07-13 Thread Peter Schuller
> Just want some clarifications on thrift. > 1. thrift creates a layer between Cassandra and the client, specific to > whatever language you want. Well, thrift allows cassandra to expose an RPC interface in a language neutral fashion. > 2. thrift generates an interface to Cassandra's service endp

RE: Consequences of having many columns

2010-07-13 Thread Kochheiser,Todd W - TOK-DITT-1
So it would appear that 0.7 will have solved the requirement that a single row must be able to fit in memory. That issue aside, how would one expect the read/write performance to be in the scenarios listed below? From: Mason Hale [mailto:ma...@onespot.com] Sent:

Re: CassandraBulkLoader

2010-07-13 Thread Jonathan Ellis
look at contrib/bmt_example, with the caveat that it's usually premature optimization On Tue, Jul 13, 2010 at 12:31 PM, Mubarak Seyed wrote: > Thanks Torsten. > Jonathan's blog on Fact Vs Fiction says that > Fact: It has always been straightforward to send the output of Hadoop jobs > to Cassandra

Re: NYC Cassandra training

2010-07-13 Thread Jonathan Ellis
We would like to do one in Europe in October. On Fri, Jul 9, 2010 at 11:02 AM, Dave Gardner wrote: > > Do you have a rough estimate as to when there might be a training day in > London (UK). I'm currently weighing up whether I should be making a journey > across the pond for one of the US-based e

Re: NYC Cassandra training

2010-07-13 Thread Jonathan Ellis
On Fri, Jul 9, 2010 at 9:36 AM, Jeremy Dunck wrote: > On Fri, Jul 2, 2010 at 1:08 PM, Jonathan Ellis wrote: >> Riptano's one day Cassandra training is coming to NYC in August, our >> first public session on the East coast: >> http://www.eventbrite.com/event/749518831 > > Is there a calendar where

Re: High CPU usage on all nodes without any read or write

2010-07-13 Thread Jonathan Ellis
did you look at compaction activity? On Mon, Jul 12, 2010 at 9:31 AM, Olivier Rosello wrote: >> > But in Cassandra output log : >> > r...@cassandra-2:~#  tail -f /var/log/cassandra/output.log >> >  INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 >> reclaimed leaving 1684169392 u

Re: Authentication

2010-07-13 Thread Jonathan Ellis
It's been suggested, but it's not very useful w/o having encryption for Thrift as well (in case a client has to fail over to the cross-country Cassandra nodes). So using a secure VPN makes the most sense to me. On Tue, Jul 13, 2010 at 12:02 PM, Ben Standefer wrote: > Are there any plans or talks

Re: Authentication

2010-07-13 Thread Ben Standefer
Many apps would find it realistic or feasible to failover database connections across the country (going from <1ms latency to ~90ms latency). The scheme of failing over client database connections across the country is probably the minority case. SSL between Cassandra nodes, even without encrypti

Re: Authentication

2010-07-13 Thread Ben Standefer
Err, find it *unrealistic* -Ben On Tue, Jul 13, 2010 at 2:22 PM, Ben Standefer wrote: > Many apps would find it realistic or feasible to failover database > connections across the country (going from <1ms latency to ~90ms latency). > The scheme of failing over client database connections acro

Re: CassandraBulkLoader

2010-07-13 Thread Torsten Curdt
> look at contrib/bmt_example, with the caveat that it's usually > premature optimization I wish that was true for us :) >> Fact: It has always been straightforward to send the output of Hadoop jobs >> to Cassandra, and Facebook, Digg, and others have been using Hadoop like >> this as a Cassandra

Re: RE: Consequences of having many columns

2010-07-13 Thread Aaron Morton
If you do not need range scans (and assuming Random Partitioner), I would probably go with B. I tend to feel better when things are spread out. I'm not sure on any overhead on asking the coordinator to send requests to a lot of nodes. But I feel that it will make better use of new nodes added to th

Re: live nodes list in ring

2010-07-13 Thread Artie Copeland
Benjamin, Yes, I have seen this when adding a new node into the cluster. The new node doesn't see the complete ring through nodetool, but the strange part is that looking at the ring through jconsole shows the complete ring. It is as if there is a bug in nodetool publishing the actual ring. Has anyo

Re: Authentication

2010-07-13 Thread Jonathan Ellis
Are you interested in contributing this? On Tue, Jul 13, 2010 at 4:22 PM, Ben Standefer wrote: > Many apps would find it realistic or feasible to failover database > connections across the country (going from <1ms latency to ~90ms latency). >  The scheme of failing over client database connection

Elastic Load Balancing Cassandra

2010-07-13 Thread Brian Helfrich
Hi, has anyone been able to load balance a Cassandra cluster with an AWS Elastic Load Balancer? I've set up an ELB with the obvious settings (namely, --listener "lb-port=9160,instance-port=9160,protocol=TCP") but clients simply hang trying to load records from the ELB hostname:9160. Thanks, --Bria

Using Pelops with Cassandra 0.7.X

2010-07-13 Thread Peter Harrison
I know Cassandra 0.7 isn't released yet, but I was wondering if anyone has used Pelops with the latest builds of Cassandra? I'm having some issues, but I wanted to make sure that somebody else isn't working on a branch of Pelops to support Cassandra 7. I have downloaded and built the latest code fr

java.lang.NoSuchMethodError: org.apache.cassandra.db.ColumnFamily.id()I

2010-07-13 Thread Arya Goudarzi
I just built today's trunk successfully and am getting the following exception on startup, which to me seems bogus as the method exists, but I don't know why: ERROR 15:27:00,957 Exception encountered during startup. java.lang.NoSuchMethodError: org.apache.cassandra.db.ColumnFamily.id()I

Re: Using Pelops with Cassandra 0.7.X

2010-07-13 Thread Ran Tavory
Hector doesn't have 0.7 support yet On Jul 14, 2010 1:34 AM, "Peter Harrison" wrote: I know Cassandra 0.7 isn't released yet, but I was wondering if anyone has used Pelops with the latest builds of Cassandra? I'm having some issues, but I wanted to make sure that somebody else isn't working on a

Re: java.lang.NoSuchMethodError: org.apache.cassandra.db.ColumnFamily.id()I

2010-07-13 Thread Jonathan Ellis
ant clean On Tue, Jul 13, 2010 at 5:33 PM, Arya Goudarzi wrote: > I just built today's trunk successfully and am getting the following > exception on startup, which to me seems bogus as the method exists, but I > don't know why: > > ERROR 15:27:00,957 Exception encountered during startup. > ja

Re: Elastic Load Balancing Cassandra

2010-07-13 Thread Dave Viner
I haven't used ELB, but I've set up HAProxy to do it... appears to work well so far. Dave Viner On Tue, Jul 13, 2010 at 3:30 PM, Brian Helfrich wrote: > Hi, has anyone been able to load balance a Cassandra cluster with an AWS > Elastic Load Balancer? I've set up an ELB with the obvious settings (

Re: nodetool loadbalance : Strerams Continue on Non Acceptance of New Token

2010-07-13 Thread Arya Goudarzi
Hi Gary, Thanks for the reply. I tried this again today. Streams gets stuck, pls read my comment: https://issues.apache.org/jira/browse/CASSANDRA-1221 -arya - Original Message - From: "Gary Dusbabek" To: user@cassandra.apache.org Sent: Wednesday, June 23, 2010 5:40:02 AM Subject: Re:

Re: Is anyone using version 0.7 schema update API

2010-07-13 Thread GH
To be honest, I do not know how to regenerate the bindings; I will look into that. Following your email, I went on and took the unit test code and created a client. Given that this code works, I am guessing that the thrift bindings are in place and it is more that the client code does not support the

Re: Using Pelops with Cassandra 0.7.X

2010-07-13 Thread Dan Washusen
http://github.com/danwashusen/pelops/tree/cassandra-0.7.0 p.s. Pelops doesn't have any test coverage and my implicit tests (my app integration tests) don't touch anywhere near all of the Pelops API. p.s.s. I've made API breaking changes to support the new 0.7.0 API and Dominic (the original Pelop

Re: Using Pelops with Cassandra 0.7.X

2010-07-13 Thread Peter Harrison
On Wed, Jul 14, 2010 at 2:43 PM, Dan Washusen wrote: > http://github.com/danwashusen/pelops/tree/cassandra-0.7.0 Doh - I've just finished making most of the changes for the new API. > p.s. Pelops doesn't have any test coverage and my implicit tests (my app > integration tests) don't touch anywhe

Re: Is anyone using version 0.7 schema update API

2010-07-13 Thread Dave Viner
Check out step 4 of this page: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP ./compiler/cpp/thrift -gen php ../PATH-TO-CASSANDRA/interface/cassandra.thrift That is how to compile the thrift client from the cassandra bindings. Just replace the "php" with the language of your ch

Re: Is anyone using version 0.7 schema update API

2010-07-13 Thread GH
Very cool stuff, thanks for the info Dave, I will give this a shot... On Wed, Jul 14, 2010 at 1:03 PM, Dave Viner wrote: > Check out step 4 of this page: > https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP > > ./compiler/cpp/thrift -gen php > ../PATH-TO-CASSANDRA/interface/cassan

Re: Authentication

2010-07-13 Thread Ben Standefer
Yes, possibly. We haven't written it yet, and I was putting some feelers out there to see if there's any interest or buy-in from committers if we did contribute it. -Ben On Tue, Jul 13, 2010 at 3:23 PM, Jonathan Ellis wrote: > Are you interested in contributing this? > > On Tue, Jul 13, 2010

Re: Authentication

2010-07-13 Thread Mike Malone
Yep, as Ben said, we're not asking for anyone to write this for us. We've been playing with some ideas around encryption between EC2 data-centers/regions (intra-region is already secure enough for us -- it's all switches / dedicated lines) and the easiest solution seems to be to wrap the inter-Cass