Predicate Indexes

2010-06-03 Thread David Boxenhorn
So I've been thinking about the problem of how to do range queries on keys with random partitioning. I'm new to Cassandra, and I don't know what the plans are, but I have an idea and I thought I'd just put it out there: Predicate Indexes. I would like to be able to define predicate indexes in Cass

problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
Hi all, my env: 6 servers with about 200GB data. data structure: 64B rowkey + (5B column)*20; rowkey and column.value are all random bytes from a-z,A-Z,0-9. problem: when I tried to iterate over the data in the storage, I always get org::apache::cassandra::TimedOutException (RpcTim

Re: problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
more info: CL = ONE, replica = 2, and when I tried to monitor the disk_io with iostat I get almost 0MB/s read & 0% CPU on the machine the scan-data app started on. Thanks! From: Shuai Yuan To: user@cassandra.apache.org Subject: problem when trying t

Re: Error during startup

2010-06-03 Thread David Boxenhorn
We didn't change partitioners. Maybe we did some other stupid thing, but not that one. On Wed, Jun 2, 2010 at 8:52 PM, Gary Dusbabek wrote: > I was able to reproduce the error by starting up a node using > RandomPartitioner, kill it, switch to OrderPreservingPartitioner, > restart, kill, switch b

Cassandra in the cloud

2010-06-03 Thread David Boxenhorn
We want to try out Cassandra in the cloud. Any recommendations? Comments? Should we use Amazon? Rackspace? Something else?

Re: Nodes dropping out of cluster due to GC

2010-06-03 Thread Peter Schüller
> We did indeed have a problem with our GC settings.  The survivor ratio was > too low.  After changing that things are better but we are still seeing GC > that takes 5-10 seconds, which is enough for the node to drop out of the > cluster briefly. This still indicates full GCs. What is your write

Re: Giant sets of ordered data

2010-06-03 Thread yoshiyuki kanno
Hi, I think that in this case (logging heavy traffic) neither of the two ideas can scale write operations in current Cassandra, so wait for secondary index support. 2010/6/3 Jonathan Shook > Insert "if you want to use long values for keys and column names" > above paragraph 2. I forgot that part. > > On Wed,

Re: Cassandra in the cloud

2010-06-03 Thread David King
> We want to try out Cassandra in the cloud. Any recommendations? Comments? > Should we use Amazon? Rackspace? Something else? I'm using it on Amazon, mostly with success. I'd recommend increasing Phi from 8 to 10, use the 4-core/15GB instances to start, and if you plan to be really heavy on rea

MessageDeserializationTask backlog crash

2010-06-03 Thread Daniel Kluesing
I've had a few nodes crash (Out of heap), and when I pull the heap dump, there are hundreds of thousands of MessageDeserializationTasks in the thread pool executor, using up GB of the heap. I'm running 0.6.2 on sun jvm u20 and the nodes are under heavy load. Has anyone else run into this? I have

Re: Effective cache size

2010-06-03 Thread David King
>> So with the row cache, that first node (the primary replica) is the one that >> has that row cached, yes? > No, it's the closest node as determined by snitch.sortByProximity. And with the default snitch, rack-unaware placement, random partitioner, and all nodes up, that's the primary replica,

OutOfMemoryError

2010-06-03 Thread Lev Stesin
Hi, I am getting OOM during load tests: java.lang.OutOfMemoryError: Java heap space at java.util.HashSet.<init>(HashSet.java:125) at com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181) at com.google.common.collect.HashMultimap.createCollection(HashMultimap

What is K-table ?

2010-06-03 Thread yaw
Hi all, connecting to a cluster with cassandra-cli and trying a describe command, I obtain a "missing K_TABLE" message : cassandra> describe Keyspace1 line 1:9 missing K_TABLE at 'Keyspace1' Keyspace1.Super1 Column Family Type: Super Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2

Re: OutOfMemoryError

2010-06-03 Thread Gary Dusbabek
Are you running "ant test"? It defaults to setting memory to 1G. If you're running them outside of ant, you'll need to set max memory manually. Gary. On Thu, Jun 3, 2010 at 10:35, Lev Stesin wrote: > Hi, > > I am getting OOM during load tests: > > java.lang.OutOfMemoryError: Java heap space >

Re: Cassandra in the cloud

2010-06-03 Thread Eric Evans
On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: > We want to try out Cassandra in the cloud. Any recommendations? > Comments? > > Should we use Amazon? Rackspace? Something else? I personally haven't used Cassandra on EC2, but others have reported significantly better disk IO, (and hen

Re: OutOfMemoryError

2010-06-03 Thread Lev Stesin
Gary, Is there a directive to set it? Or should I modify the cassandra script itself? Thanks. Lev. On Thu, Jun 3, 2010 at 10:48 AM, Gary Dusbabek wrote: > Are you running "ant test"?  It defaults to setting memory to 1G.  If > you're running them outside of ant, you'll need to set max memory >

Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe the ephemeral drives to achieve better I/O and have machines in multiple Availability Zones with a custom EndPointSnitch that replicates the data between AZs for high availability (to be open-sourced/contributed at some point).

Re: Cassandra in the cloud

2010-06-03 Thread Mike Subelsky
Ben, do you just keep the commit log on the ephemeral drive? Or data and commit? (I was confused by your reference to XFS and snapshots -- I assume you keep data on the XFS drive) -Mike On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer wrote: > We're using Cassandra on AWS at SimpleGeo.  We softwa

Cassandra Cluster Setup

2010-06-03 Thread Stephan Pfammatter
I'm having difficulties setting up a 3-way cassandra cluster. Any comments/help would be appreciated. My goal is that all data should be fully replicated amongst the 3 nodes. I want to simulate the failure of one node and prove that the test column family still can be accessed. In a nutshell I

Re: OutOfMemoryError

2010-06-03 Thread Gary Dusbabek
It's set in the build file: But I'm not sure if you're using the build file or not. It kind of sounds like you are not. Gary. On Thu, Jun 3, 2010 at 11:24, Lev Stesin wrote: > Gary, > > Is there a directive to set it? Or should I modify the cassandra > script itself? Thanks. > > Lev. > > On

Re: Cassandra Cluster Setup

2010-06-03 Thread Gary Dusbabek
Your replication factor is only set to 1, which means that each key will only live on a single node. If you do wait for bootstrapping to commence (takes 90s in trunk, I don't recall in 0.6), you should see some keys moving unless your inserts were all into a small range. Perhaps you're being impatie
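
To illustrate the point about replication factor, here is a minimal Python sketch of rack-unaware placement on a token ring: the first replica is the node owning the key's token, and the remaining replicas are the next nodes clockwise. The node names are borrowed from this thread; the token values and the function name replicas_for are made up for illustration, and this is a simplified model rather than Cassandra's actual code.

```python
import bisect
from typing import List, Tuple


def replicas_for(token: int, ring: List[Tuple[int, str]], rf: int) -> List[str]:
    """Return the nodes holding a key with the given token.

    `ring` is a list of (node_token, node_name) pairs sorted by token.
    The primary replica is the first node whose token is >= the key's
    token (wrapping around the ring); the other replicas are the next
    rf - 1 nodes clockwise.
    """
    tokens = [node_token for node_token, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical tokens for the three nodes mentioned in the thread.
ring = [(0, "cassandra-ca"), (100, "cassandra-or"), (200, "cassandra-az")]

print(replicas_for(150, ring, rf=1))  # a single node holds the key
print(replicas_for(150, ring, rf=3))  # every node holds the key
```

With a replication factor of 1, losing the one node that owns a key makes that key unavailable; with a replication factor of 3 on three nodes, every node has a copy.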

Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
The commit log and data directory are on the same mounted directory structure (the 2 RAID 0 striped ephemeral disks) rather than using 1 of the ephemeral disks for the commit log and 1 of the ephemeral disks for the data directory. While it's usually advised that for disk utilization reasons you keep th

Re: Cassandra in the cloud

2010-06-03 Thread Mike Subelsky
Ben, thanks for that, we may try that. I did find an AWS forum tidbit from two years ago: "4 ephemeral stores striped together can give significantly higher throughput for sequential writes than EBS." http://developer.amazonwebservices.com/connect/thread.jspa?messageID=125197&#125197 -Mike On Thu, J

Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
Mike, yep, there are a lot of benchmarks proving it (plus it just makes sense) http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/ http://orion.heroku.com/past/2009/7/29/io_performanc

Re: Cassandra Cluster Setup

2010-06-03 Thread Nahor
On 2010-06-03 13:07, Stephan Pfammatter wrote: Cassandra-or [...] cassandra-or Aside from the replication factor noted by Gary, this should point to your existing node (cassandra-ca) otherwise, how will this node know where existing node is and where to get the data from? Cassandra-az [

Cassandra training Jun 18 in SF

2010-06-03 Thread Jonathan Ellis
We're back with another public Cassandra training: http://www.eventbrite.com/event/718755818 This will be Riptano's 6th training session (including the four we've done that were on-site with a specific customer), and in my humble opinion the material's really solid at this point. The eventbrite t

Re: question about class SlicePredicate

2010-06-03 Thread Shuai Yuan
It's documented that get_range_slice() supports all partitioners in 0.6. Kevin From: Olivier Mallassi To: user@cassandra.apache.org Subject: Re: question about class SlicePredicate Date: Tue, 1 Jun 2010 13:38:03 +0200 Does it work whatever the chosen p

Re: problem when trying to get_range_slice()

2010-06-03 Thread Jonathan Ellis
use smaller slices and page through the data 2010/6/3 Shuai Yuan : > Hi all, > > my env > > 6 servers with about 200GB data. > > data structure, > > 64B rowkey + (5B column)*20, > rowkey and column.value are all random bytes from a-z,A-Z,0-9 > > problem > > when I tried iterate ove
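
A minimal Python sketch of "use smaller slices and page through the data": ask for a limited number of rows per request, then restart the next request from the last key seen and drop the repeated first row. The fetch_range callable is a hypothetical stand-in for a range-slice call that returns up to count rows beginning at start_key (inclusive); it is not the real Thrift client API, and make_fake_store exists only so that the sketch runs.

```python
from typing import Callable, Iterator, List


def page_through(fetch_range: Callable[[str, int], List[str]],
                 page_size: int = 100) -> Iterator[str]:
    """Yield every row key, fetching at most page_size new rows per call."""
    start = ""          # empty start key means "from the beginning"
    first_page = True
    while True:
        # Later pages start at the last key already seen, so request one
        # extra row and discard the duplicate.
        rows = fetch_range(start, page_size if first_page else page_size + 1)
        if not first_page:
            rows = rows[1:]
        if not rows:
            return
        yield from rows
        start = rows[-1]
        first_page = False


def make_fake_store(keys: List[str]) -> Callable[[str, int], List[str]]:
    """Hypothetical in-memory stand-in for the cluster, for illustration."""
    ordered = sorted(keys)

    def fetch_range(start_key: str, count: int) -> List[str]:
        begin = 0 if start_key == "" else ordered.index(start_key)
        return ordered[begin:begin + count]

    return fetch_range


keys = [f"row{i:03d}" for i in range(25)]
for key in page_through(make_fake_store(keys), page_size=10):
    pass  # process each row here
```

Each request stays small enough to finish inside the RPC timeout, at the cost of more round trips.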

Re: Effective cache size

2010-06-03 Thread Jonathan Ellis
On Thu, Jun 3, 2010 at 10:17 AM, David King wrote: >>> So with the row cache, that first node (the primary replica) is the one >>> that has that row cached, yes? >> No, it's the closest node as determined by snitch.sortByProximity. > > And with the default snitch, rack-unaware placement, random p

Re: What is K-table ?

2010-06-03 Thread Jonathan Ellis
Sounds like a bug in the cli. Maybe it only knows how to describe KS + CF together? Please file a bug report at https://issues.apache.org/jira/browse/CASSANDRA. On Thu, Jun 3, 2010 at 10:37 AM, yaw wrote: > Hi all, > connecting to a cluster with cassandra-cli and trying a describe command,  I >

Re: What is K-table ?

2010-06-03 Thread Philip Stanhope
Note the describe_keyspace API method does not exhibit this behavior in 0.6.2 ... seems to be a problem specific to cassandra-cli. -phil On Jun 3, 2010, at 10:18 PM, Jonathan Ellis wrote: > Sounds like a bug in the cli. Maybe it only knows how to describe KS > + CF together? > > Please file a

Re: MessageDeserializationTask backlog crash

2010-06-03 Thread Jonathan Ellis
having the write or read stage fill up will cause, as a secondary effect, deserialization to fill up. moral: when you start getting timeout exceptions, have your clients sleep for 100ms or otherwise back off (or maybe you just need to add capacity) On Thu, Jun 3, 2010 at 10:16 AM, Daniel Kluesing
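
One way a client might follow the advice to sleep or otherwise back off on timeouts, sketched in Python. The TimedOut class is a placeholder for whatever timeout exception the client library raises, and the retry count and doubling delay are assumptions; only the 100ms starting point comes from the message above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TimedOut(Exception):
    """Placeholder for the client library's timeout exception."""


def call_with_backoff(operation: Callable[[], T],
                      retries: int = 5,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, sleeping between attempts when it times out.

    The delay starts at base_delay seconds (100ms) and doubles after
    each timeout. The last timeout is re-raised to the caller.
    """
    delay = base_delay
    for attempt in range(retries):
        try:
            return operation()
        except TimedOut:
            if attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2
    raise ValueError("retries must be at least 1")
```

Pausing gives the overloaded read or write stage time to drain instead of piling more requests onto it; if timeouts persist after backing off, the cluster probably needs more capacity.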

Re: problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
Thanks for the hint. I found out it was a "too many opened files" error, and the server side just stopped responding to the get_range_slice() request by throwing an exception. Now works with "ulimit -n 32768". Kevin From: Jonathan Ellis To: user@cassandra.apache.or

Re: Cassandra Cluster Setup

2010-06-03 Thread Benjamin Black
http://wiki.apache.org/cassandra/MultinodeCluster On Thu, Jun 3, 2010 at 1:07 PM, Stephan Pfammatter wrote: > I’m having difficulties setting up a 3 way cassandra cluster. Any > comments/help would be appreciated. > > > > My goal is that all data should be fully replicated amongst the 3 nodes. I

High CPU Usage since 0.6.2

2010-06-03 Thread Lu Ming
I have ten 0.5.1 Cassandra nodes in my cluster, and I upgraded them to Cassandra 0.6.2 yesterday. But today I find six Cassandra nodes have high CPU usage, more than 400% on my 8-core CPU server. The worst one is more than 760%. It is very serious. I use jvisualvm to watch the worst node, and

Re: High CPU Usage since 0.6.2

2010-06-03 Thread Chris Goffinet
We're seeing this as well. We were testing with a 40+ node cluster on the latest 0.6 branch from few days ago. -Chris On Jun 3, 2010, at 9:55 PM, Lu Ming wrote: > > I have ten 0.5.1 Cassandra nodes in my cluster, and I update them to > cassandra to 0.6.2 yesterday. > But today I find six cass

High read latency

2010-06-03 Thread Ma Xiao
We have a SuperCF which may have up to 1000 super columns and 5 columns for each super column. The read latency may go up to 50ms (even higher), which I think is a long time to respond. How do I tune the storage config to optimize the performance? I read the wiki, may help to do this, suppose that