Predicate Indexes
I've been thinking about the problem of how to do range queries on keys with random partitioning. I'm new to Cassandra and I don't know what the plans are, but I have an idea and I thought I'd put it out there: Predicate Indexes.

I would like to be able to define predicate indexes in Cassandra. At each node, Cassandra would maintain an index for every key that matches the predicate that the index defines. Within each index, keys would be ordered in the token order implied by RandomPartitioner. A new attribute would be added to KeyRange: a name, i.e. setName(String name), getName(), etc. When we loop through the keys, we would pass the last key in as the start key until we finish, as we do now. The results would not be ordered, but we would have very quick access to the entire range implied by the predicate.

I very much want something like this, and I am willing to pay the price in disk space. Yes, I know that something like this can be approximated with super columns. But super columns have well-known problems: first, the practical limits on super column size; second, the extra round trips that working with super columns requires; and third, the cost of maintaining the super columns by hand.
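To make the paging idea concrete, here is a minimal, hypothetical sketch against the 0.6 Thrift API. The setName() attribute on KeyRange does not exist in Cassandra; it is exactly the feature being proposed, and the keyspace, column family, and index name are made up for illustration.

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class PredicateIndexSketch {
        // Hypothetical: read one page of rows from a named predicate index.
        // setName() is the proposed addition and does NOT exist in 0.6.
        static List<KeySlice> readPage(Cassandra.Client client, String startKey)
                throws Exception {
            KeyRange range = new KeyRange();
            range.setCount(100);
            range.setStart_key(startKey);   // pass the last key returned to get the next page
            range.setEnd_key("");
            // range.setName("active_users");  // proposed predicate-index name (hypothetical)

            SlicePredicate cols = new SlicePredicate();
            cols.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));

            return client.get_range_slices("Keyspace1", new ColumnParent("Users"),
                                           cols, range, ConsistencyLevel.ONE);
        }
    }

The loop the post describes would call readPage() repeatedly, feeding the key of the last KeySlice back in as the next start key until fewer than 100 rows come back.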
problem when trying to get_range_slice()
Hi all,

My environment: 6 servers with about 200GB of data.

Data structure: 64B rowkey + (5B column)*20; rowkeys and column values are all random bytes from a-z, A-Z, 0-9.

Problem: when I try to iterate over the data in storage, I always get org::apache::cassandra::TimedOutException (RpcTimeout = 2 minutes).

Questions:
1. How can I iterate over my data then?
2. In the code below, I give key start/end/count twice, once for get_range_slice() and once for the SlicePredicate. Are they the same?

Thanks!

Scan-data application:

CassandraFactory factory(argv[1], 9160);
tr1::shared_ptr client(factory.create());

Keyspace *key_space = client->getKeyspace("campaign");

map > mResult;
ColumnParent cp;
SlicePredicate sp;
string szStart, szEnd;
uint32_t uCount = 100; // try to retrieve 100 keys?

szStart = "";
szEnd = "1000";

cp.column_family = "test_col";

sp.__isset.slice_range = true;
sp.slice_range.start = "";
sp.slice_range.finish = "1000";
sp.slice_range.count = 100;

try {
    mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd, uCount); // by libcassandra, mainly invoking get_range_slice()
    /* then iterate the result and output */
} catch (org::apache::cassandra::InvalidRequestException &ire) {
    cout << ire.why << endl;
    delete key_space;
    return 1;
}

return 0;

--
Kevin Yuan
www.yuan-shuai.info
Re: [***SPAM*** ] problem when trying to get_range_slice()
More info: CL = ONE, replica = 2. When I try to monitor disk I/O with iostat I get almost 0 MB/s read and 0% CPU on the machine the scan-data app was started on. Thanks!

From: Shuai Yuan
To: user@cassandra.apache.org
Subject: [***SPAM*** ] problem when trying to get_range_slice()
Date: Thu, 03 Jun 2010 15:57:14 +0800

Hi all,

My environment: 6 servers with about 200GB of data.

Data structure: 64B rowkey + (5B column)*20; rowkeys and column values are all random bytes from a-z, A-Z, 0-9.

Problem: when I try to iterate over the data in storage, I always get org::apache::cassandra::TimedOutException (RpcTimeout = 2 minutes).

Questions:
1. How can I iterate over my data then?
2. In the code below, I give key start/end/count twice, once for get_range_slice() and once for the SlicePredicate. Are they the same?

Thanks!

Scan-data application:

CassandraFactory factory(argv[1], 9160);
tr1::shared_ptr client(factory.create());

Keyspace *key_space = client->getKeyspace("campaign");

map > mResult;
ColumnParent cp;
SlicePredicate sp;
string szStart, szEnd;
uint32_t uCount = 100; // try to retrieve 100 keys?

szStart = "";
szEnd = "1000";

cp.column_family = "test_col";

sp.__isset.slice_range = true;
sp.slice_range.start = "";
sp.slice_range.finish = "1000";
sp.slice_range.count = 100;

try {
    mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd, uCount); // by libcassandra, mainly invoking get_range_slice()
    /* then iterate the result and output */
} catch (org::apache::cassandra::InvalidRequestException &ire) {
    cout << ire.why << endl;
    delete key_space;
    return 1;
}

return 0;

--
Kevin Yuan
www.yuan-shuai.info
Re: Error during startup
We didn't change partitioners. Maybe we did some other stupid thing, but not that one. On Wed, Jun 2, 2010 at 8:52 PM, Gary Dusbabek wrote: > I was able to reproduce the error by staring up a node using > RandomPartioner, kill it, switch to OrderPreservingPartitioner, > restart, kill, switch back to RandomPartitioner, BANG! > > So it looks like you tinkered with the partitioner at some point. > This has the unfortunate effect of corrupting your system table. I'm > trying to figure out a way to detect this and abort before data is > overwritten. > > Gary. > > > On Sun, May 30, 2010 at 06:49, David Boxenhorn wrote: > > I deleted the system/LocationInfo files, and now everything works. > > > > Yay! (...what happened?) > > > > On Sun, May 30, 2010 at 4:18 PM, David Boxenhorn > wrote: > >> > >> I'm getting an "Expected both token and generation columns; found > >> ColumnFamily" error during startup can anyone tell me what it is? > Details > >> below. > >> > >> Starting Cassandra Server > >> Listening for transport dt_socket at address: > >> INFO 16:14:33,459 Auto DiskAccessMode determined to be standard > >> INFO 16:14:33,615 Sampling index for > >> C:\var\lib\cassandra\data\system\LocationInfo-1-Data.db > >> INFO 16:14:33,631 Removing orphan > >> C:\var\lib\cassandra\data\Lookin2\Users-tmp-27-Index.db > >> INFO 16:14:33,631 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db > >> INFO 16:14:33,662 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Users-18-Data.db > >> INFO 16:14:33,818 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db > >> INFO 16:14:33,850 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db > >> INFO 16:14:33,865 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db > >> INFO 16:14:33,881 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-580-Data.db > >> INFO 16:14:33,896 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-672-Data.db > >> INFO 16:14:33,912 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-681-Data.db > >> INFO 16:14:33,912 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-691-Data.db > >> INFO 16:14:33,928 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-696-Data.db > >> INFO 16:14:33,943 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Attractions-17-Data.db > >> INFO 16:14:34,006 Sampling index for > >> > C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-5-Data.db > >> INFO 16:14:34,006 Sampling index for > >> > C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-6-Data.db > >> INFO 16:14:34,021 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-29-Data.db > >> INFO 16:14:34,350 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-51-Data.db > >> INFO 16:14:34,693 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-72-Data.db > >> INFO 16:14:35,021 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-77-Data.db > >> INFO 16:14:35,225 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-78-Data.db > >> INFO 16:14:35,350 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-79-Data.db > >> INFO 16:14:35,459 Sampling index for > >> 
C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-80-Data.db > >> INFO 16:14:35,459 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-1-Data.db > >> INFO 16:14:35,475 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-2-Data.db > >> INFO 16:14:35,475 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Content-30-Data.db > >> INFO 16:14:35,631 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Content-35-Data.db > >> INFO 16:14:35,771 Sampling index for > >> C:\var\lib\cassandra\data\Lookin2\Content-40-Data.db > >> INFO 16:14:35,959 Compacting > >> > [org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db')] > >> ERROR 16:14:35,975 Exception encountered during startup. > >> java.lang.RuntimeException: Expected both token and generation columns; > >> found ColumnFamily(LocationInfo [Generation:false:4...@4,]) > >> at > >> org.apache.cassandra.db.SystemTable.initMetadata(SystemTable.java:159) > >> at > >> > org.apache.cassandra.service.StorageService.initServer(StorageService.java:305) > >> at > >> > org.apache.cassandra.thrift.CassandraDaemon.setup(Cassand
Cassandra in the cloud
We want to try out Cassandra in the cloud. Any recommendations? Comments? Should we use Amazon? Rackspace? Something else?
Re: Nodes dropping out of cluster due to GC
> We did indeed have a problem with our GC settings. The survivor ratio was
> too low. After changing that things are better but we are still seeing GC
> that takes 5-10 seconds, which is enough for the node to drop out of the
> cluster briefly.

This still indicates full GCs. What is your write activity like? Do you know if you're legitimately growing the heap quickly enough that the concurrent marking in CMS is unable to catch up?

What is the free heap ratio (according to the logs produced with -XX:+PrintGC/-XX:+PrintGCDetails) after a concurrent mark-sweep has finished? If the heap is very full even after a mark/sweep, you likely need a bigger heap, or smaller cache sizes / memtable flush thresholds, etc. On the other hand, if you have a very significant amount of free space in the heap after a mark/sweep, the problem may rather be that CMS is just kicking in too late. If so, you can experiment with the -XX:+UseCMSInitiatingOccupancyOnly and -XX:CMSInitiatingOccupancyFraction=XXX options.

If you're willing to temporarily accept that CMS is continuously running (due to an aggressive initiating occupancy fraction), that should at least tell you whether you can in fact avoid the fallbacks; if so, then look at more proper tuning...

--
/ Peter Schuller aka scode
Re: Giant sets of ordered data
Hi I think In this case (logging hard traffic) both of two idea can't scale write operation in current Cassandra. So wait for secondary index support. 2010/6/3 Jonathan Shook > Insert "if you want to use long values for keys and column names" > above paragraph 2. I forgot that part. > > On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote: > > If you want to do range queries on the keys, you can use OPP to do this: > > (example using UTF-8 lexicographic keys, with bursts split across rows > > according to row size limits) > > > > Events: { > > "20100601.05.30.003": { > >"20100601.05.30.003": > >"20100601.05.30.007": > >... > > } > > } > > > > With a future version of Cassandra, you may be able to use the same > > basic datatype for both key and column name, as keys will be binary > > like the rest, I believe. > > > > I'm not aware of specific performance improvements when using OPP > > range queries on keys vs iterating over known keys. I suspect (hope) > > that round-tripping to the server should be reduced, which may be > > significant. Does anybody have decent benchmarks that tell the > > difference? > > > > > > On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning wrote: > >> With a traffic pattern like that, you may be better off storing the > >> events of each burst (I'll call them group) in one or more keys and > >> then storing these keys in the day key. > >> > >> EventGroupsPerDay: { > >> "20100601": { > >>123456789: "group123", // column name is timestamp group was > >> received, column value is key > >>123456790: "group124" > >> } > >> } > >> > >> EventGroups: { > >> "group123": { > >>123456789: "value1", > >>123456799: "value2" > >> } > >> } > >> > >> If you think of Cassandra as a toolkit for building scalable indexes > >> it seems to make the modeling a bit easier. In this case, you're > >> building an index by day to lookup events that come in as groups. So, > >> first you'd fetch the slice of columns for the day you're interested > >> in to figure out which groups to look at then you'd fetch the events > >> in those groups. > >> > >> There are plenty of alternate ways to divide up the data among rows > >> also - you could use hour keys instead of days as an example. > >> > >> On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn > wrote: > >>> Let's say you're logging events, and you have billions of events. What > if > >>> the events come in bursts, so within a day there are millions of > events, but > >>> they all come within microseconds of each other a few times a day? How > do > >>> you find the events that happened on a particular day if you can't > store > >>> them all in one row? > >>> > >>> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook > wrote: > > Either OPP by key, or within a row by column name. I'd suggest the > latter. > If you have structured data to stick under a column (named by the > timestamp), then you can serialize and unserialize it yourself, or you > can use a supercolumn. It's effectively the same thing. Cassandra > only provides the super column support as a convenience layer as it is > currently implemented. That may change in the future. > > You didn't make clear in your question why a standard column would be > less suitable. I presumed you had layered structure within the > timestamp, hence my response. > How would you logically partition your dataset according to natural > application boundaries? This will answer most of your question. 
> If you have a dataset which can't be partitioned into a reasonable > size row, then you may want to use OPP and key concatenation. > > What do you mean by giant? > > On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn > wrote: > > How do I handle giant sets of ordered data, e.g. by timestamps, > which I > > want > > to access by range? > > > > I can't put all the data into a supercolumn, because it's loaded > into > > memory > > at once, and it's too much data. > > > > Am I forced to use an order-preserving partitioner? I don't want the > > headache. Is there any other way? > > > >>> > >>> > >> > > >
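Ben's two-level model above (a day row in EventGroupsPerDay pointing at group rows in EventGroups) is straightforward to write with the 0.6 Thrift API. A minimal sketch follows; the keyspace name, the string-encoded timestamp column names, and the consistency level are illustrative assumptions, not anything the posters actually ran.

    import org.apache.cassandra.thrift.*;

    public class EventIndexWriter {
        // Records one event into its group's row, and indexes the group
        // under the day row so a single column slice on e.g. "20100601"
        // finds every group received that day.
        static void recordEvent(Cassandra.Client client, String day, String groupKey,
                                long eventTimestampMicros, byte[] eventValue)
                throws Exception {
            long now = System.currentTimeMillis() * 1000;

            // 1. The event itself, keyed by its timestamp inside the group row.
            ColumnPath eventPath = new ColumnPath("EventGroups");
            eventPath.setColumn(Long.toString(eventTimestampMicros).getBytes("UTF-8"));
            client.insert("Keyspace1", groupKey, eventPath, eventValue,
                          now, ConsistencyLevel.QUORUM);

            // 2. The index entry: column name is the time the group was received,
            //    column value is the group's row key.
            ColumnPath dayPath = new ColumnPath("EventGroupsPerDay");
            dayPath.setColumn(Long.toString(eventTimestampMicros).getBytes("UTF-8"));
            client.insert("Keyspace1", day, dayPath, groupKey.getBytes("UTF-8"),
                          now, ConsistencyLevel.QUORUM);
        }
    }

Reading follows the same two steps in reverse: slice the day row to find the group keys, then slice each group row for its events.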
Re: Cassandra in the cloud
> We want to try out Cassandra in the cloud. Any recommendations? Comments?
> Should we use Amazon? Rackspace? Something else?

I'm using it on Amazon with mostly success. I'd recommend increasing Phi from 8 to 10, using the 4-core/15GB instances to start, and, if you plan to be really heavy on reads, using a software RAID of striped EBS volumes instead of just the straight-up EBS volumes.
MessageDeserializationTask backlog crash
I've had a few nodes crash (out of heap), and when I pull the heap dump there are hundreds of thousands of MessageDeserializationTasks in the thread pool executor, using up gigabytes of the heap. I'm running 0.6.2 on the Sun JVM u20 and the nodes are under heavy load. Has anyone else run into this? I haven't deeply understood the deserialization pool, but just based on the name it seems like it should be super fast. What could make it back up with hundreds of thousands of messages?
Re: Effective cache size
>> So with the row cache, that first node (the primary replica) is the one that
>> has that row cached, yes?
> No, it's the closest node as determined by snitch.sortByProximity.

And with the default snitch, rack-unaware placement, random partitioner, and all nodes up, that's the primary replica, right?

> any given node X will never know whether another node Y has a row cached or
> not. the overhead for communicating that level of detail would be totally
> prohibitive. all caching does is speed the read, once a request is received
> for data local to a given node. no more, no less.

Yes, that's my concern, but the details significantly affect the effective size of the cache (in the aforementioned case, the details place the effective size at either 6 million or 18 million, a 3x difference).

So given CL==ONE reads, only the node actually read from (which will be the primary replica given the default placement strategy and snitch) will cache the item, right? The checksum-checking doesn't cause the row to be cached on the non-read nodes?

If I read with CL==QUORUM in an RF==3 environment, do both read nodes then cache the item, or only the primary replica?
OutOfMemoryError
Hi, I am getting OOM during load tests: java.lang.OutOfMemoryError: Java heap space at java.util.HashSet.(HashSet.java:125) at com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181) at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112) at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47) at com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155) at com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207) at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194) at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80) at com.google.common.collect.HashMultimap.put(HashMultimap.java:47) at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87) at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204) at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451) at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361) at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Any suggestions as to how to fix/ease the problem? Thanks. -- Lev
What is K-table ?
Hi all,

Connecting to a cluster with cassandra-cli and trying a describe command, I obtain a "missing K_TABLE" message:

cassandra> describe Keyspace1
line 1:9 missing K_TABLE at 'Keyspace1'
Keyspace1.Super1
Column Family Type: Super
Columns Sorted By: org.apache.cassandra.db.marshal.BytesType@2bbe2893

Is this a real issue?

Many thanks,
Re: OutOfMemoryError
Are you running "ant test"? It defaults to setting memory to 1G. If you're running them outside of ant, you'll need to set max memory manually. Gary. On Thu, Jun 3, 2010 at 10:35, Lev Stesin wrote: > Hi, > > I am getting OOM during load tests: > > java.lang.OutOfMemoryError: Java heap space > at java.util.HashSet.(HashSet.java:125) > at > com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181) > at > com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112) > at > com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47) > at > com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155) > at > com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207) > at > com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194) > at > com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80) > at com.google.common.collect.HashMultimap.put(HashMultimap.java:47) > at > org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87) > at > org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204) > at > org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451) > at > org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361) > at > org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484) > at > org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > > Any suggestions as to how to fix/ease the problem? Thanks. > > -- > Lev >
Re: Cassandra in the cloud
On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: > We want to try out Cassandra in the cloud. Any recommendations? > Comments? > > Should we use Amazon? Rackspace? Something else? I personally haven't used Cassandra on EC2, but others have reported significantly better disk IO, (and hence, better performance), with Rackspace's Cloud Servers. Full disclosure though, I work for Rackspace. :) -- Eric Evans eev...@rackspace.com
Re: OutOfMemoryError
Gary, Is there a directive to set it? Or should I modify the cassandra script itself? Thanks. Lev. On Thu, Jun 3, 2010 at 10:48 AM, Gary Dusbabek wrote: > Are you running "ant test"? It defaults to setting memory to 1G. If > you're running them outside of ant, you'll need to set max memory > manually. > > Gary. > > On Thu, Jun 3, 2010 at 10:35, Lev Stesin wrote: >> Hi, >> >> I am getting OOM during load tests: >> >> java.lang.OutOfMemoryError: Java heap space >> at java.util.HashSet.(HashSet.java:125) >> at >> com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181) >> at >> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112) >> at >> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47) >> at >> com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155) >> at >> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207) >> at >> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194) >> at >> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80) >> at com.google.common.collect.HashMultimap.put(HashMultimap.java:47) >> at >> org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87) >> at >> org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204) >> at >> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451) >> at >> org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361) >> at >> org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484) >> at >> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125) >> at >> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> at java.lang.Thread.run(Thread.java:619) >> >> Any suggestions as to how to fix/ease the problem? Thanks. >> >> -- >> Lev >> > -- Lev
Re: Cassandra in the cloud
We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe the ephemeral drives to achieve better I/O and have machines in multiple Availability Zones with a custom EndPointSnitch that replicates the data between AZs for high availability (to be open-sourced/contributed at some point). Using XFS as described here http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663 also makes it very easy to snapshot your cluster to S3. We've had no real problems with EC2 and Cassandra, it's been great. -Ben Standefer On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans wrote: > On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: >> We want to try out Cassandra in the cloud. Any recommendations? >> Comments? >> >> Should we use Amazon? Rackspace? Something else? > > I personally haven't used Cassandra on EC2, but others have reported > significantly better disk IO, (and hence, better performance), with > Rackspace's Cloud Servers. > > Full disclosure though, I work for Rackspace. :) > > -- > Eric Evans > eev...@rackspace.com > >
Re: Cassandra in the cloud
Ben, do you just keep the commit log on the ephemeral drive? Or data and commit? (I was confused by your reference to XFS and snapshots -- I assume you keep data on the XFS drive) -Mike On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer wrote: > We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe > the ephemeral drives to achieve better I/O and have machines in > multiple Availability Zones with a custom EndPointSnitch that > replicates the data between AZs for high availability (to be > open-sourced/contributed at some point). > > Using XFS as described here > http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663 > also makes it very easy to snapshot your cluster to S3. > > We've had no real problems with EC2 and Cassandra, it's been great. > > -Ben Standefer > > > On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans wrote: >> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: >>> We want to try out Cassandra in the cloud. Any recommendations? >>> Comments? >>> >>> Should we use Amazon? Rackspace? Something else? >> >> I personally haven't used Cassandra on EC2, but others have reported >> significantly better disk IO, (and hence, better performance), with >> Rackspace's Cloud Servers. >> >> Full disclosure though, I work for Rackspace. :) >> >> -- >> Eric Evans >> eev...@rackspace.com >> >> > -- Mike Subelsky oib.com // ignitebaltimore.com // subelsky.com @subelsky // (410) 929-4022
Cassandra Cluster Setup
I'm having difficulties setting up a 3-way Cassandra cluster. Any comments/help would be appreciated.

My goal is that all data should be fully replicated amongst the 3 nodes. I want to simulate the failure of one node and prove that the test column family can still be accessed.

In a nutshell I did the following 3 steps. The excerpt below shows the values from the changed storage-conf.xml:

(1) I started Cassandra-ca and it worked fine. I added 3 rows to a cf cf_test.
(2) Then I started Cassandra-or. It comes up OK but goes into "sleeping 3..." mode.
(3) "nodetool streams" doesn't reveal anything.

Node: Cassandra-ca
false
org.apache.cassandra.locator.RackUnawareStrategy
1
org.apache.cassandra.locator.EndPointSnitch
cassandra-ca
cassandra-us
9160
cassandra-us

Cassandra-or
true
org.apache.cassandra.locator.RackUnawareStrategy
1
org.apache.cassandra.locator.EndPointSnitch
cassandra-or
cassandra-or
9160
cassandra-or

Cassandra-az
Cassandra-or
true
org.apache.cassandra.locator.RackUnawareStrategy
1
org.apache.cassandra.locator.EndPointSnitch
cassandra-az
cassandra-az
9160
cassandra-az
Re: OutOfMemoryError
It's set in the build file: But I'm not sure if you're using the build file or not. It kind of sounds like you are not. Gary. On Thu, Jun 3, 2010 at 11:24, Lev Stesin wrote: > Gary, > > Is there a directive to set it? Or should I modify the cassandra > script itself? Thanks. > > Lev. > > On Thu, Jun 3, 2010 at 10:48 AM, Gary Dusbabek wrote: >> Are you running "ant test"? It defaults to setting memory to 1G. If >> you're running them outside of ant, you'll need to set max memory >> manually. >> >> Gary. >> >> On Thu, Jun 3, 2010 at 10:35, Lev Stesin wrote: >>> Hi, >>> >>> I am getting OOM during load tests: >>> >>> java.lang.OutOfMemoryError: Java heap space >>> at java.util.HashSet.(HashSet.java:125) >>> at >>> com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181) >>> at >>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112) >>> at >>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47) >>> at >>> com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155) >>> at >>> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207) >>> at >>> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194) >>> at >>> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80) >>> at com.google.common.collect.HashMultimap.put(HashMultimap.java:47) >>> at >>> org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87) >>> at >>> org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204) >>> at >>> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451) >>> at >>> org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361) >>> at >>> org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484) >>> at >>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125) >>> at >>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>> at java.lang.Thread.run(Thread.java:619) >>> >>> Any suggestions as to how to fix/ease the problem? Thanks. >>> >>> -- >>> Lev >>> >> > > > > -- > Lev >
Re: Cassandra Cluster Setup
Your replication factor is only set to 1, which means that each key will only live on a single node. If you do wait for bootstrapping to commence (takes 90s in trunk, I don't recall in 0.6), you should see some keys moving unless your inserts were all into a small range. Perhaps your being impatient. If not, I recommend that you start over, set the replication factor to 3, wait a good while for all the nodes to fully join the cluster, and then make sure the keys you write to are random. Gary. On Thu, Jun 3, 2010 at 13:07, Stephan Pfammatter wrote: > I’m having difficulties setting up a 3 way cassandra cluster. Any > comments/help would be appreciated. > > > > My goal is that all data should be fully replicated amongst the 3 nodes. I > want to simulate the failure of one node and proof that the test column > family still can be accessed. > > In a nutshell I did the following 3 steps. The excerpt below shows the > changed storage-con.xml: > > (1) I started Cassandra-ca and worked fine. I added 3 rows to a cf > cf_test. > > (2) Then I started Cassandra-or. It comes up ok but goes into “sleeping > 3”…..mode > > (3) Nodetool streams doesn’t reveal anything. > > > > > > > > Node: Cassandra-ca > > false > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-ca > > cassandra-us > > 9160 > > cassandra-us > > > > Cassandra-or > > true > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-or > > cassandra-or > > 9160 > > cassandra-or > > > > Cassandra-az > > > > Cassandra-or > > true > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-az > > cassandra-az > > 9160 > > cassandra-az > > > > > >
Re: Cassandra in the cloud
The commit log and data directory are on the same mounted directory structure (the 2 RAID 0 striped ephemeral disks), rather than using 1 of the ephemeral disks for the commit log and 1 of the ephemeral disks for the data directory. While it's usually advised that, for disk utilization reasons, you keep the commit logs and data directory on separate disks, our RAID 0 configuration gives us much more space for the data directory without having to mess with EBSes. We've found it to be fine for now.

I see how my XFS snapshots reference was confusing. Our plan is to have a single AZ use EBSes for the data directory so that we can more easily snapshot our data (trusting our AZ-aware EndPointSnitch to keep replicas in the other AZs), while the other AZs will continue on ephemeral drives.

-Ben Standefer

On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky wrote:
> Ben,
>
> do you just keep the commit log on the ephemeral drive? Or data and
> commit? (I was confused by your reference to XFS and snapshots -- I
> assume you keep data on the XFS drive)
>
> -Mike
>
> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer wrote:
>> We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe
>> the ephemeral drives to achieve better I/O and have machines in
>> multiple Availability Zones with a custom EndPointSnitch that
>> replicates the data between AZs for high availability (to be
>> open-sourced/contributed at some point).
>>
>> Using XFS as described here
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
>> also makes it very easy to snapshot your cluster to S3.
>>
>> We've had no real problems with EC2 and Cassandra, it's been great.
>>
>> -Ben Standefer
>>
>> On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans wrote:
>>> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
>>> We want to try out Cassandra in the cloud. Any recommendations?
>>> Comments?
>>> Should we use Amazon? Rackspace? Something else?
>>>
>>> I personally haven't used Cassandra on EC2, but others have reported
>>> significantly better disk IO, (and hence, better performance), with
>>> Rackspace's Cloud Servers.
>>>
>>> Full disclosure though, I work for Rackspace. :)
>>>
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>
> --
> Mike Subelsky
> oib.com // ignitebaltimore.com // subelsky.com
> @subelsky // (410) 929-4022
Re: Cassandra in the cloud
Ben, thanks for that, we may try that. I did find an AWS forum tidbit from two years ago: "4 ephemeral stores striped together can give significantly higher throughput for sequential writes than EBS." http://developer.amazonwebservices.com/connect/thread.jspa?messageID=125197𞤍 -Mike On Thu, Jun 3, 2010 at 5:57 PM, Ben Standefer wrote: > The commit log and data directory are on the same mounted directory > structure (the 2 RAID 0 striped ephemeral disks) rather than using 1 > of the ephemeral disks for the data and 1 of the ephemeral disks for > the data directory. While it's usually advised that for disk > utilization reasons you keep the commit logs and data directory on > separate disks, our RAID0 configuration gives us much more space for > the data directory without having to mess with EBSes. We've found it > to be fine for now. > > I see how my XFS snapshots reference was confusing. Our plan is to > have a single AZ use EBSes for the data directory so that we can more > easily snapshot our data (trusting that our AZ-aware EndPointSnitch), > while other AZs will continue ephemeral drives. > > -Ben Standefer > > > On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky wrote: >> Ben, >> >> do you just keep the commit log on the ephemeral drive? Or data and >> commit? (I was confused by your reference to XFS and snapshots -- I >> assume you keep data on the XFS drive) >> >> -Mike >> >> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer wrote: >>> We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe >>> the ephemeral drives to achieve better I/O and have machines in >>> multiple Availability Zones with a custom EndPointSnitch that >>> replicates the data between AZs for high availability (to be >>> open-sourced/contributed at some point). >>> >>> Using XFS as described here >>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663 >>> also makes it very easy to snapshot your cluster to S3. >>> >>> We've had no real problems with EC2 and Cassandra, it's been great. >>> >>> -Ben Standefer >>> >>> >>> On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans wrote: On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: > We want to try out Cassandra in the cloud. Any recommendations? > Comments? > > Should we use Amazon? Rackspace? Something else? I personally haven't used Cassandra on EC2, but others have reported significantly better disk IO, (and hence, better performance), with Rackspace's Cloud Servers. Full disclosure though, I work for Rackspace. :) -- Eric Evans eev...@rackspace.com >>> >> >> >> >> -- >> Mike Subelsky >> oib.com // ignitebaltimore.com // subelsky.com >> @subelsky // (410) 929-4022 >> > -- Mike Subelsky oib.com // ignitebaltimore.com // subelsky.com @subelsky
Re: Cassandra in the cloud
Mike, yep, there are a lot of benchmarks proving it (plus it just makes sense) http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/ http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/ -Ben Standefer On Thu, Jun 3, 2010 at 4:41 PM, Mike Subelsky wrote: > Ben, > > thanks for that, we may try that. I did find an AWS forum tidbit from > two years ago: > > "4 ephemeral stores striped together can give significantly higher > throughput for sequential writes than EBS." > > http://developer.amazonwebservices.com/connect/thread.jspa?messageID=125197𞤍 > > -Mike > > On Thu, Jun 3, 2010 at 5:57 PM, Ben Standefer wrote: >> The commit log and data directory are on the same mounted directory >> structure (the 2 RAID 0 striped ephemeral disks) rather than using 1 >> of the ephemeral disks for the data and 1 of the ephemeral disks for >> the data directory. While it's usually advised that for disk >> utilization reasons you keep the commit logs and data directory on >> separate disks, our RAID0 configuration gives us much more space for >> the data directory without having to mess with EBSes. We've found it >> to be fine for now. >> >> I see how my XFS snapshots reference was confusing. Our plan is to >> have a single AZ use EBSes for the data directory so that we can more >> easily snapshot our data (trusting that our AZ-aware EndPointSnitch), >> while other AZs will continue ephemeral drives. >> >> -Ben Standefer >> >> >> On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky wrote: >>> Ben, >>> >>> do you just keep the commit log on the ephemeral drive? Or data and >>> commit? (I was confused by your reference to XFS and snapshots -- I >>> assume you keep data on the XFS drive) >>> >>> -Mike >>> >>> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer wrote: We're using Cassandra on AWS at SimpleGeo. We software RAID 0 stripe the ephemeral drives to achieve better I/O and have machines in multiple Availability Zones with a custom EndPointSnitch that replicates the data between AZs for high availability (to be open-sourced/contributed at some point). Using XFS as described here http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663 also makes it very easy to snapshot your cluster to S3. We've had no real problems with EC2 and Cassandra, it's been great. -Ben Standefer On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans wrote: > On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote: >> We want to try out Cassandra in the cloud. Any recommendations? >> Comments? >> >> Should we use Amazon? Rackspace? Something else? > > I personally haven't used Cassandra on EC2, but others have reported > significantly better disk IO, (and hence, better performance), with > Rackspace's Cloud Servers. > > Full disclosure though, I work for Rackspace. :) > > -- > Eric Evans > eev...@rackspace.com > > >>> >>> >>> >>> -- >>> Mike Subelsky >>> oib.com // ignitebaltimore.com // subelsky.com >>> @subelsky // (410) 929-4022 >>> >> > > > > -- > Mike Subelsky > oib.com // ignitebaltimore.com // subelsky.com > @subelsky >
Re: Cassandra Cluster Setup
On 2010-06-03 13:07, Stephan Pfammatter wrote:
> Cassandra-or
> [...]
> cassandra-or

Aside from the replication factor noted by Gary, this seed should point to your existing node (cassandra-ca); otherwise, how will this node know where the existing node is and where to get the data from?

> Cassandra-az
> [...]
> cassandra-az

Same here.
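For reference, here is a sketch of the 0.6 storage-conf.xml fragments this thread is circling around: the replication factor raised to 3 in each keyspace definition, and the joining nodes bootstrapping with a seed that points at the node that already holds data. The hostnames come from the thread; the element placement and the single-seed list are illustrative assumptions, not the poster's actual file.

<!-- In each node's <Keyspace> definition (identical on all three nodes): -->
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
<ReplicationFactor>3</ReplicationFactor>
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>

<!-- On the joining nodes (cassandra-or, cassandra-az): bootstrap, and seed
     from the node that is already running rather than from themselves. -->
<AutoBootstrap>true</AutoBootstrap>
<Seeds>
  <Seed>cassandra-ca</Seed>
</Seeds>
<ListenAddress>cassandra-or</ListenAddress>
<ThriftAddress>cassandra-or</ThriftAddress>
<ThriftPort>9160</ThriftPort>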
Cassandra training Jun 18 in SF
We're back with another public Cassandra training: http://www.eventbrite.com/event/718755818 This will be Riptano's 6th training session (including the four we've done that were on-site with a specific customer), and in my humble opinion the material's really solid at this point. The eventbrite text does a pretty good job of describing what we're covering. I would only add that it focuses on the stable 0.6 series, with notes as to where things will be changing for 0.7. Also, as suggested by the outline, there is about a 2:1 ratio of "ops" material to "dev" material. We are actively working on lining up other locations. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: [***SPAM*** ] Re: question about class SlicePredicate
It's documented that get_range_slice() supports all partitioners in 0.6.

Kevin

From: Olivier Mallassi
To: user@cassandra.apache.org
Subject: [***SPAM*** ] Re: question about class SlicePredicate
Date: Tue, 1 Jun 2010 13:38:03 +0200

Does it work whatever the chosen partitioner, or only for OrderPreservingPartitioner?

On Tuesday, June 1, 2010, Eric Yu wrote:
> It needs a SliceRange. For example:
> SliceRange range = new SliceRange();
> range.setStart("".getBytes());
> range.setFinish("".getBytes());
> range.setReversed(true);
> range.setCount(20);
>
> SlicePredicate sp = new SlicePredicate();
> sp.setSlice_range(range);
>
> client.get_slice(KEYSPACE, KEY, ColumnParent, sp, ConsistencyLevel.ONE);
>
> 2010/6/1 Shuai Yuan
> Hi all,
>
> I don't quite understand the usage of 'class SlicePredicate' when trying
> to retrieve a ranged slice.
>
> How should it be initialized?
>
> Thanks!
> --
> Kevin Yuan
> www.yuan-shuai.info
Re: problem when trying to get_range_slice()
use smaller slices and page through the data 2010/6/3 Shuai Yuan : > Hi all, > > my env > > 6 servers with about 200GB data. > > data structure, > > 64B rowkey + (5B column)*20, > rowkey and column.value are all random bytes from a-z,A-Z,0-9 > > problem > > when I tried iterate over the data in the storage, I always get > org::apache::cassandra::TimedOutException > (RpcTimeout = 2minutes) > > questions > > 1.How could I iterate over my data then? > > 2.In the code below, I gave key_start/end/count twice, one for > get_range_slice() and the other for SlicePreditor. Are they the same? > > Thanks! > > scan-data application > > CassandraFactory factory(argv[1], 9160); > tr1::shared_ptr client(factory.create()); > > Keyspace *key_space = client->getKeyspace("campaign"); > > map > mResult; > ColumnParent cp; > SlicePredicate sp; > string szStart, szEnd; > uint32_t uCount = 100; //try to retrieve 100 keys? > > szStart = > ""; > szEnd = > "1000"; > > cp.column_family = "test_col"; > > sp.__isset.slice_range = true; > sp.slice_range.start = > ""; > sp.slice_range.finish = > "1000"; > sp.slice_range.count = 100; > > try { > mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd, > uCount); //by libcassandra, mainly invoking get_range_slice() > > /* then iterate the result and output */ > } > } > } catch (org::apache::cassandra::InvalidRequestException &ire) { > cout << ire.why << endl; > delete key_space; > return 1; > } > > return 0; > -- > Kevin Yuan > www.yuan-shuai.info > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
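To make "smaller slices and page through the data" concrete, here is a minimal Java sketch against the 0.6 Thrift API (the original poster uses libcassandra from C++, but the pattern is the same): ask for a modest number of rows per call, then restart the next call at the last key returned. The page size and consistency level are illustrative assumptions; the keyspace and column family names come from the thread. Note that the start key is inclusive, so each page after the first repeats the previous page's last row.

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class RangeScan {
        // Page through every row of a column family in small chunks.
        // With RandomPartitioner the rows come back in token order, not key
        // order, but every row is eventually visited.
        static void scanAll(Cassandra.Client client) throws Exception {
            ColumnParent parent = new ColumnParent("test_col");
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 20));

            String start = "";
            while (true) {
                KeyRange range = new KeyRange();
                range.setStart_key(start);
                range.setEnd_key("");
                range.setCount(100);          // small pages keep each RPC well under the timeout

                List<KeySlice> page = client.get_range_slices(
                        "campaign", parent, predicate, range, ConsistencyLevel.ONE);
                if (page.isEmpty()) break;

                for (KeySlice row : page) {
                    // process row.getKey() and row.getColumns() here
                }

                if (page.size() < 100) break;                 // last page
                start = page.get(page.size() - 1).getKey();   // inclusive: this row repeats next call
            }
        }
    }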
Re: Effective cache size
On Thu, Jun 3, 2010 at 10:17 AM, David King wrote: >>> So with the row cache, that first node (the primary replica) is the one >>> that has that row cached, yes? >> No, it's the closest node as determined by snitch.sortByProximity. > > And with the default snitch, rack-unaware placement, random partitioner, and > all nodes up, that's the primary replica, right? No. When all replicas have equal weight it's basically random. >> any given node X will never know whether another node Y has a row cached or >> not. the overhead for communicating that level of detail would be totally >> prohibitive. all caching does is speed the read, once a request is received >> for data local to a given node. no more, no less. > > Yes, that's my concern, but the details significantly affect the effective > size of the cache (in the afoorementioned case, the details place the > effective size at either 6 million or 18 million, a 3x difference). > > So given CL==ONE reads, only the actually read node (which will be the > primary replica given the default placement strategy and snitch) will cache > the item, right? The checksum-checking doesn't cause the row to be cached on > the non-read nodes? You have to read the data, before you can checksum it. So on the contrary, digest (checksum) vs data read has no effect on cache behavior. > If I read with CL==QUORUM in an RF==3 environment, do both read nodes them > cache the item, or only the primary replica? Both. Which is what you want, otherwise your digest reads will cause substantial unnecessary i/o on hot keys. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: What is K-table ?
Sounds like a bug in the cli. Maybe it only knows how to describe KS + CF together? Please file a bug report at https://issues.apache.org/jira/browse/CASSANDRA. On Thu, Jun 3, 2010 at 10:37 AM, yaw wrote: > Hi all, > connecting to a cluster with cassandra-cli and trying a describe command, I > obtain a "missing K_TABLE" message : > > > cassandra> describe Keyspace1 > line 1:9 missing K_TABLE at 'Keyspace1' > Keyspace1.Super1 > Column Family Type: Super > Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2bbe2893 > > Is this a real issue ? > > Many thanks, > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: What is K-table ?
Note the describe_keyspace API method does not exhibit this behavior in 0.6.2 ... seems to be a problem specific to cassandra-cli. -phil On Jun 3, 2010, at 10:18 PM, Jonathan Ellis wrote: > Sounds like a bug in the cli. Maybe it only knows how to describe KS > + CF together? > > Please file a bug report at https://issues.apache.org/jira/browse/CASSANDRA. > > On Thu, Jun 3, 2010 at 10:37 AM, yaw wrote: >> Hi all, >> connecting to a cluster with cassandra-cli and trying a describe command, I >> obtain a "missing K_TABLE" message : >> >> >> cassandra> describe Keyspace1 >> line 1:9 missing K_TABLE at 'Keyspace1' >> Keyspace1.Super1 >> Column Family Type: Super >> Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2bbe2893 >> >> Is this a real issue ? >> >> Many thanks, >> >> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com
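For anyone who just needs the keyspace definition while the cli bug stands, the describe_keyspace call Phil mentions can be hit directly from a Thrift client. A minimal sketch, assuming the default unframed Thrift transport and a local node; the keyspace name is from the thread, the connection details are assumptions:

    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class DescribeKeyspace {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            // Returns a map of column family name -> attributes (type, comparator, ...).
            Map<String, Map<String, String>> description = client.describe_keyspace("Keyspace1");
            for (Map.Entry<String, Map<String, String>> cf : description.entrySet()) {
                System.out.println(cf.getKey() + " -> " + cf.getValue());
            }

            socket.close();
        }
    }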
Re: MessageDeserializationTask backlog crash
Having the write or read stage fill up will, as a secondary effect, cause deserialization to fill up.

Moral: when you start getting timeout exceptions, have your clients sleep for 100ms or otherwise back off (or maybe you just need to add capacity).

On Thu, Jun 3, 2010 at 10:16 AM, Daniel Kluesing wrote:
> I've had a few nodes crash (out of heap), and when I pull the heap dump,
> there are hundreds of thousands of MessageDeserializationTasks in the thread
> pool executor, using up GB of the heap. I'm running 0.6.2 on sun jvm u20 and
> the nodes are under heavy load. Has anyone else run into this? I haven't
> deeply understood the deserialization pool, but just based on the name, it
> seems like it should be super fast. What could make it back up with hundreds
> of thousands of messages?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
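A minimal sketch of the client-side back-off Jonathan describes, assuming a 0.6 Thrift client; the retry limit, delays, keyspace, and column family are illustrative assumptions, not anything prescribed by Cassandra:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.TimedOutException;

    public class BackoffInsert {
        // Retry a single insert with a short, growing sleep whenever the
        // cluster signals overload with a TimedOutException.
        static void insertWithBackoff(Cassandra.Client client, String key, byte[] value)
                throws Exception {
            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("data".getBytes("UTF-8"));

            long delayMs = 100;                       // start around 100ms, as suggested above
            for (int attempt = 0; attempt < 5; attempt++) {
                try {
                    client.insert("Keyspace1", key, path, value,
                                  System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
                    return;
                } catch (TimedOutException overloaded) {
                    Thread.sleep(delayMs);            // back off before retrying
                    delayMs = Math.min(delayMs * 2, 2000);
                }
            }
            throw new RuntimeException("insert kept timing out after 5 attempts");
        }
    }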
Re: [***SPAM*** ] Re: problem when trying to get_range_slice()
Thanks for the hint. I found out it was a "too many open files" error, and the server side simply stopped responding to the get_range_slice() request after throwing an exception. It works now with "ulimit -n 32768".

Kevin

From: Jonathan Ellis
To: user@cassandra.apache.org
Subject: [***SPAM*** ] Re: problem when trying to get_range_slice()
Date: Thu, 3 Jun 2010 19:07:28 -0700

use smaller slices and page through the data

2010/6/3 Shuai Yuan :
> Hi all,
>
> my env
>
> 6 servers with about 200GB data.
>
> data structure,
>
> 64B rowkey + (5B column)*20,
> rowkey and column.value are all random bytes from a-z,A-Z,0-9
>
> problem
>
> when I tried iterate over the data in the storage, I always get
> org::apache::cassandra::TimedOutException
> (RpcTimeout = 2minutes)
>
> questions
>
> 1.How could I iterate over my data then?
>
> 2.In the code below, I gave key_start/end/count twice, one for
> get_range_slice() and the other for SlicePredicate. Are they the same?
>
> Thanks!
>
> scan-data application
>
> CassandraFactory factory(argv[1], 9160);
> tr1::shared_ptr client(factory.create());
>
> Keyspace *key_space = client->getKeyspace("campaign");
>
> map > mResult;
> ColumnParent cp;
> SlicePredicate sp;
> string szStart, szEnd;
> uint32_t uCount = 100; //try to retrieve 100 keys?
>
> szStart = "";
> szEnd = "1000";
>
> cp.column_family = "test_col";
>
> sp.__isset.slice_range = true;
> sp.slice_range.start = "";
> sp.slice_range.finish = "1000";
> sp.slice_range.count = 100;
>
> try {
> mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd, uCount); //by libcassandra, mainly invoking get_range_slice()
>
> /* then iterate the result and output */
> } catch (org::apache::cassandra::InvalidRequestException &ire) {
> cout << ire.why << endl;
> delete key_space;
> return 1;
> }
>
> return 0;
> --
> Kevin Yuan
> www.yuan-shuai.info
Re: Cassandra Cluster Setup
http://wiki.apache.org/cassandra/MultinodeCluster On Thu, Jun 3, 2010 at 1:07 PM, Stephan Pfammatter wrote: > I’m having difficulties setting up a 3 way cassandra cluster. Any > comments/help would be appreciated. > > > > My goal is that all data should be fully replicated amongst the 3 nodes. I > want to simulate the failure of one node and proof that the test column > family still can be accessed. > > In a nutshell I did the following 3 steps. The excerpt below shows the > changed storage-con.xml: > > (1) I started Cassandra-ca and worked fine. I added 3 rows to a cf > cf_test. > > (2) Then I started Cassandra-or. It comes up ok but goes into “sleeping > 3”…..mode > > (3) Nodetool streams doesn’t reveal anything. > > > > > > > > Node: Cassandra-ca > > false > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-ca > > cassandra-us > > 9160 > > cassandra-us > > > > Cassandra-or > > true > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-or > > cassandra-or > > 9160 > > cassandra-or > > > > Cassandra-az > > > > Cassandra-or > > true > > > > > > > org.apache.cassandra.locator.RackUnawareStrategy > > 1 > > > org.apache.cassandra.locator.EndPointSnitch > > > > cassandra-az > > cassandra-az > > 9160 > > cassandra-az > > > > > >
High CPU Usage since 0.6.2
I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to Cassandra 0.6.2 yesterday. But today I found that six Cassandra nodes have high CPU usage, more than 400%, on my 8-core CPU servers. The worst one is at more than 760%. It is very serious.

I used jvisualvm to watch the worst node, and I found that there are many running threads named "Thread-xxx"; the status of the other threads is waiting or sleeping.

"Thread-130" - Thread t...@240
java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
Locked ownable synchronizers:
- None

"Thread-126" - Thread t...@236
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked java.lang.obj...@10808561
at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
Locked ownable synchronizers:
- None

"Thread-119" - Thread t...@229
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
- locked java.lang.obj...@65b4abbd
- locked java.lang.obj...@38773975
at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
Locked ownable synchronizers:
- None
Re: High CPU Usage since 0.6.2
We're seeing this as well. We were testing with a 40+ node cluster on the latest 0.6 branch from few days ago. -Chris On Jun 3, 2010, at 9:55 PM, Lu Ming wrote: > > I have ten 0.5.1 Cassandra nodes in my cluster, and I update them to > cassandra to 0.6.2 yesterday. > But today I find six cassandra nodes have high CPU usage more than 400% in my > 8-core CPU sever. > The worst one is more than 760%. It is very serious. > > I use jvisualvm to watch the worst node, and I found that there are many > running threads named "thread-xxx" > the status of other threads is waiting and sleeping. > > "Thread-130" - Thread t...@240 > java.lang.Thread.State: RUNNABLE > at sun.misc.Unsafe.setMemory(Native Method) > at sun.nio.ch.Util.erase(Util.java:202) > at > sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560) > at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603) > at > org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62) > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) > Locked ownable synchronizers: > - None > > "Thread-126" - Thread t...@236 > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.FileDispatcher.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233) > at sun.nio.ch.IOUtil.read(IOUtil.java:200) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) > - locked java.lang.obj...@10808561 > at > sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565) > at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603) > at > org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62) > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) > > Locked ownable synchronizers: > - None > > "Thread-119" - Thread t...@229 > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.NativeThread.current(Native Method) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182) > - locked java.lang.obj...@65b4abbd > - locked java.lang.obj...@38773975 > at > sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565) > at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603) > at > org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62) > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) > > Locked ownable synchronizers: > - None > >
High read latency
We have a super CF which may have up to 1000 super columns, with 5 columns in each super column. Read latency can go up to 50ms or even higher, which I think is too long a response time. How can I tune the storage config to optimize performance?

I read the wiki; the column index size setting may help here: suppose I assign it a big value (2, for example) so that no row can reach the limit and an index is never generated for a row. In our production scenario we only access 1 row at a time, with a slice of up to 1000 columns returned.

Any suggestions?