Predicate Indexes

2010-06-03 Thread David Boxenhorn
So I've been thinking about the problem of how to do range queries on keys
with random partitioning. I'm new to Cassandra, and I don't know what the
plans are, but I have an idea and I thought I'd just put it out there:
Predicate Indexes.

I would like to be able to define predicate indexes in Cassandra, something
like this:






At each node, Cassandra would maintain indexes for every key that matches
the predicate that each index defines. Within each index, keys would be
ordered by the order implied by Random Partitioner.

A new attribute should be added to KeyRange: Name - i.e. setName(String
name), getName(), etc.

When we loop through the keys, we would pass the last key in as the start
key, until we finish, as we do now. The results would not be ordered, but we
would have very quick access to the entire range implied by the predicate.
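
For illustration only, a rough sketch of what using the proposed attribute might look like on the client side. KeyRange.setName() does not exist in Cassandra today, and "active_users" is an invented index name; the rest follows the 0.6-era Thrift API:

import org.apache.cassandra.thrift.KeyRange;

public class NamedRangeExample {
    // Hypothetical only: build a KeyRange that would scan a predicate index.
    // setName() is the attribute proposed above, NOT part of the current API.
    static KeyRange scanPredicateIndex(String lastKeySeen) {
        KeyRange range = new KeyRange();
        range.setCount(100);
        range.setStart_key(lastKeySeen == null ? "" : lastKeySeen); // pass the last key back in as the start key
        range.setEnd_key("");
        // range.setName("active_users");  // proposed attribute; does not exist yet
        return range;
    }
}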

I very much want something like this. I am willing to pay the price in disk
space.

Yes, I know that something like this can be approximated by super columns.
But supercolumns have well-known problems: primarily, practical limits on
supercolumn size; secondly, the extra round-trips that working with
supercolumns requires; and thirdly, the cost of maintaining the supercolumns
by hand.


problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
Hi all,

my env

6 servers with about 200GB data.

data structure,

64B rowkey + (5B column)*20, 
rowkey and column.value are all random bytes from a-z,A-Z,0-9

problem

when I try to iterate over the data in storage, I always get
org::apache::cassandra::TimedOutException
(RpcTimeout = 2 minutes)

questions

1. How can I iterate over my data then?

2. In the code below, I give key_start/end/count twice, once for
get_range_slice() and once for the SlicePredicate. Are they the same?

Thanks!

scan-data application

CassandraFactory factory(argv[1], 9160);
tr1::shared_ptr<Cassandra> client(factory.create());

Keyspace *key_space = client->getKeyspace("campaign");

// result map: row key -> columns (the exact value type returned by
// libcassandra's getRangeSlice is assumed here)
map<string, vector<Column> > mResult;
ColumnParent cp;
SlicePredicate sp;
string szStart, szEnd;
uint32_t uCount = 100; // try to retrieve 100 keys per call

// row range for get_range_slice(): from the beginning up to "1000"
szStart = "";
szEnd = "1000";

cp.column_family = "test_col";

// column slice within each row: from "" to "1000", at most 100 columns
sp.__isset.slice_range = true;
sp.slice_range.start = "";
sp.slice_range.finish = "1000";
sp.slice_range.count = 100;

try {
    mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd,
        uCount); // by libcassandra, mainly invoking get_range_slice()

    /* then iterate the result and output */
} catch (org::apache::cassandra::InvalidRequestException &ire) {
    cout << ire.why << endl;
    delete key_space;
    return 1;
}

return 0;
-- 
Kevin Yuan
www.yuan-shuai.info




Re: [***SPAM*** ] problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
more info:

CL = ONE,

replica = 2,

and when I monitor disk I/O with iostat I see almost 0 MB/s read and ~0% CPU
on the machine the scan-data app was started on.

Thanks!

  
From: Shuai Yuan 
To: user@cassandra.apache.org
Subject: [***SPAM*** ] problem when trying to get_range_slice()
Date: Thu, 03 Jun 2010 15:57:14 +0800

Hi all,

my env

6 servers with about 200GB data.

data structure,

64B rowkey + (5B column)*20, 
rowkey and column.value are all random bytes from a-z,A-Z,0-9

problem

when I tried iterate over the data in the storage, I always get
org::apache::cassandra::TimedOutException
(RpcTimeout = 2minutes)

questions

1.How could I iterate over my data then?

2.In the code below, I gave key_start/end/count twice, one for
get_range_slice() and the other for SlicePreditor. Are they the same?

Thanks!

scan-data application

CassandraFactory factory(argv[1], 9160);
tr1::shared_ptr client(factory.create());

Keyspace *key_space = client->getKeyspace("campaign");

map > mResult;
ColumnParent cp;
SlicePredicate sp;
string szStart, szEnd;
uint32_t uCount = 100; //try to retrieve 100 keys?

szStart =
"";
szEnd =
"1000";

cp.column_family = "test_col";

sp.__isset.slice_range = true;
sp.slice_range.start =
"";
sp.slice_range.finish =
"1000";
sp.slice_range.count = 100;

try {
mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd,
uCount); //by libcassandra, mainly invoking get_range_slice()

/* then iterate the result and output */
}
}
} catch (org::apache::cassandra::InvalidRequestException &ire) {
cout << ire.why << endl;
delete key_space;
return 1;
}

return 0;

-- 
Kevin Yuan
www.yuan-shuai.info




Re: Error during startup

2010-06-03 Thread David Boxenhorn
We didn't change partitioners.

Maybe we did some other stupid thing, but not that one.


On Wed, Jun 2, 2010 at 8:52 PM, Gary Dusbabek  wrote:

> I was able to reproduce the error by starting up a node using
> RandomPartitioner, killing it, switching to OrderPreservingPartitioner,
> restarting, killing, switching back to RandomPartitioner, BANG!
>
> So it looks like you tinkered with the partitioner at some point.
> This has the unfortunate effect of corrupting your system table.  I'm
> trying to figure out a way to detect this and abort before data is
> overwritten.
>
> Gary.
>
>
> On Sun, May 30, 2010 at 06:49, David Boxenhorn  wrote:
> > I deleted the system/LocationInfo files, and now everything works.
> >
> > Yay! (...what happened?)
> >
> > On Sun, May 30, 2010 at 4:18 PM, David Boxenhorn 
> wrote:
> >>
> >> I'm getting an "Expected both token and generation columns; found
> >> ColumnFamily" error during startup can anyone tell me what it is?
> Details
> >> below.
> >>
> >> Starting Cassandra Server
> >> Listening for transport dt_socket at address: 
> >>  INFO 16:14:33,459 Auto DiskAccessMode determined to be standard
> >>  INFO 16:14:33,615 Sampling index for
> >> C:\var\lib\cassandra\data\system\LocationInfo-1-Data.db
> >>  INFO 16:14:33,631 Removing orphan
> >> C:\var\lib\cassandra\data\Lookin2\Users-tmp-27-Index.db
> >>  INFO 16:14:33,631 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db
> >>  INFO 16:14:33,662 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Users-18-Data.db
> >>  INFO 16:14:33,818 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db
> >>  INFO 16:14:33,850 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db
> >>  INFO 16:14:33,865 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db
> >>  INFO 16:14:33,881 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-580-Data.db
> >>  INFO 16:14:33,896 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-672-Data.db
> >>  INFO 16:14:33,912 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-681-Data.db
> >>  INFO 16:14:33,912 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-691-Data.db
> >>  INFO 16:14:33,928 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-696-Data.db
> >>  INFO 16:14:33,943 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Attractions-17-Data.db
> >>  INFO 16:14:34,006 Sampling index for
> >>
> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-5-Data.db
> >>  INFO 16:14:34,006 Sampling index for
> >>
> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-6-Data.db
> >>  INFO 16:14:34,021 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-29-Data.db
> >>  INFO 16:14:34,350 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-51-Data.db
> >>  INFO 16:14:34,693 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-72-Data.db
> >>  INFO 16:14:35,021 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-77-Data.db
> >>  INFO 16:14:35,225 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-78-Data.db
> >>  INFO 16:14:35,350 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-79-Data.db
> >>  INFO 16:14:35,459 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-80-Data.db
> >>  INFO 16:14:35,459 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-1-Data.db
> >>  INFO 16:14:35,475 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-2-Data.db
> >>  INFO 16:14:35,475 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Content-30-Data.db
> >>  INFO 16:14:35,631 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Content-35-Data.db
> >>  INFO 16:14:35,771 Sampling index for
> >> C:\var\lib\cassandra\data\Lookin2\Content-40-Data.db
> >>  INFO 16:14:35,959 Compacting
> >>
> [org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db')]
> >> ERROR 16:14:35,975 Exception encountered during startup.
> >> java.lang.RuntimeException: Expected both token and generation columns;
> >> found ColumnFamily(LocationInfo [Generation:false:4...@4,])
> >> at
> >> org.apache.cassandra.db.SystemTable.initMetadata(SystemTable.java:159)
> >> at
> >>
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:305)
> >> at
> >>
> org.apache.cassandra.thrift.CassandraDaemon.setup(Cassand

Cassandra in the cloud

2010-06-03 Thread David Boxenhorn
We want to try out Cassandra in the cloud. Any recommendations? Comments?

Should we use Amazon? Rackspace? Something else?


Re: Nodes dropping out of cluster due to GC

2010-06-03 Thread Peter Schüller
> We did indeed have a problem with our GC settings.  The survivor ratio was
> too low.  After changing that things are better but we are still seeing GC
> that takes 5-10 seconds, which is enough for the node to drop out of the
> cluster briefly.

This still indicates full GCs. What is your write activity like? Do
you know if you're legitimately growing the heap quickly enough that
the concurrent marking in CMS is unable to catch up? What is the free
heap ratio (according to the logs produced with
-XX:+PrintGC/-XX:+PrintGCDetails) after a concurrent mark-sweep has
finished?

If the heap is very full even after a mark/sweep, you likely need a
bigger heap or smaller cache sizes/memtable flush thresholds, etc.

On the other hand if you have very significant amounts of free space
in the heap after a mark/sweep, the problem may rather be that CMS is
just kicking in too late. If so you can experiment with the
-XX:+UseCMSInitiatingOccupancyOnly and
-XX:CMSInitiatingOccupancyFraction=XXX options. If you're willing to
temporarily accept that CMS is continuously running (due to an
aggressive initiating occupancy fraction) that should at least tell
you whether you can in fact avoid the fallbacks and if so, then look
at more proper tuning...

-- 
/ Peter Schuller aka scode


Re: Giant sets of ordered data

2010-06-03 Thread yoshiyuki kanno
Hi

I think in this case (logging heavy traffic) neither of the two ideas can
scale write operations in current Cassandra.
So wait for secondary index support.

2010/6/3 Jonathan Shook 

> Insert "if you want to use long values for keys and column names"
> above paragraph 2. I forgot that part.
>
> On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook  wrote:
> > If you want to do range queries on the keys, you can use OPP to do this:
> > (example using UTF-8 lexicographic keys, with bursts split across rows
> > according to row size limits)
> >
> > Events: {
> >  "20100601.05.30.003": {
> >"20100601.05.30.003": 
> >"20100601.05.30.007": 
> >...
> >  }
> > }
> >
> > With a future version of Cassandra, you may be able to use the same
> > basic datatype for both key and column name, as keys will be binary
> > like the rest, I believe.
> >
> > I'm not aware of specific performance improvements when using OPP
> > range queries on keys vs iterating over known keys. I suspect (hope)
> > that round-tripping to the server should be reduced, which may be
> > significant. Does anybody have decent benchmarks that tell the
> > difference?
> >
> >
> > On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning  wrote:
> >> With a traffic pattern like that, you may be better off storing the
> >> events of each burst (I'll call them group) in one or more keys and
> >> then storing these keys in the day key.
> >>
> >> EventGroupsPerDay: {
> >>  "20100601": {
> >>123456789: "group123", // column name is timestamp group was
> >> received, column value is key
> >>123456790: "group124"
> >>  }
> >> }
> >>
> >> EventGroups: {
> >>  "group123": {
> >>123456789: "value1",
> >>123456799: "value2"
> >>   }
> >> }
> >>
> >> If you think of Cassandra as a toolkit for building scalable indexes
> >> it seems to make the modeling a bit easier. In this case, you're
> >> building an index by day to lookup events that come in as groups. So,
> >> first you'd fetch the slice of columns for the day you're interested
> >> in to figure out which groups to look at then you'd fetch the events
> >> in those groups.
> >>
> >> There are plenty of alternate ways to divide up the data among rows
> >> also - you could use hour keys instead of days as an example.
> >>
> >> On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn 
> wrote:
> >>> Let's say you're logging events, and you have billions of events. What
> if
> >>> the events come in bursts, so within a day there are millions of
> events, but
> >>> they all come within microseconds of each other a few times a day? How
> do
> >>> you find the events that happened on a particular day if you can't
> store
> >>> them all in one row?
> >>>
> >>> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook 
> wrote:
> 
>  Either OPP by key, or within a row by column name. I'd suggest the
> latter.
>  If you have structured data to stick under a column (named by the
>  timestamp), then you can serialize and unserialize it yourself, or you
>  can use a supercolumn. It's effectively the same thing.  Cassandra
>  only provides the super column support as a convenience layer as it is
>  currently implemented. That may change in the future.
> 
>  You didn't make clear in your question why a standard column would be
>  less suitable. I presumed you had layered structure within the
>  timestamp, hence my response.
>  How would you logically partition your dataset according to natural
>  application boundaries? This will answer most of your question.
>  If you have a dataset which can't be partitioned into a reasonable
>  size row, then you may want to use OPP and key concatenation.
> 
>  What do you mean by giant?
> 
>  On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn 
>  wrote:
>  > How do I handle giant sets of ordered data, e.g. by timestamps,
> which I
>  > want
>  > to access by range?
>  >
>  > I can't put all the data into a supercolumn, because it's loaded
> into
>  > memory
>  > at once, and it's too much data.
>  >
>  > Am I forced to use an order-preserving partitioner? I don't want the
>  > headache. Is there any other way?
>  >
> >>>
> >>>
> >>
> >
>
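
As a rough illustration of the two-step lookup Ben Browning describes above (read the day's index row to find group keys, then read each group row), here is a sketch against the 0.6-era Thrift API; the keyspace and column family names are illustrative, and exact signatures vary between versions:

import java.util.List;
import org.apache.cassandra.thrift.*;

public class DayIndexLookup {
    // A predicate that returns every column in a row, up to 'count' of them.
    static SlicePredicate allColumns(int count) {
        SlicePredicate p = new SlicePredicate();
        p.setSlice_range(new SliceRange(new byte[0], new byte[0], false, count));
        return p;
    }

    public static void eventsForDay(Cassandra.Client client, String day) throws Exception {
        // Step 1: the day's index row; each column value is a group row key.
        List<ColumnOrSuperColumn> groups = client.get_slice("Logs", day,
                new ColumnParent("EventGroupsPerDay"), allColumns(1000), ConsistencyLevel.ONE);

        // Step 2: fetch the events stored under each group key.
        for (ColumnOrSuperColumn group : groups) {
            String groupKey = new String(group.getColumn().getValue(), "UTF-8");
            List<ColumnOrSuperColumn> events = client.get_slice("Logs", groupKey,
                    new ColumnParent("EventGroups"), allColumns(1000), ConsistencyLevel.ONE);
            /* process events */
        }
    }
}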


Re: Cassandra in the cloud

2010-06-03 Thread David King
> We want to try out Cassandra in the cloud. Any recommendations? Comments?
> Should we use Amazon? Rackspace? Something else? 

I'm using it on Amazon with mostly success. I'd recommend increasing Phi from 8
to 10, using the 4-core/15GB instances to start, and, if you plan to be really
heavy on reads, using software RAID striping across EBS volumes instead of just
the straight-up EBS volumes.

MessageDeserializationTask backlog crash

2010-06-03 Thread Daniel Kluesing
I've had a few nodes crash (Out of heap), and when I pull the heap dump, there 
are hundreds of thousands of MessageDeserializationTasks in the thread pool 
executor, using up GB of the heap. I'm running 0.6.2 on sun jvm u20 and the 
nodes are under heavy load. Has anyone else run into this? I haven't deeply 
understood the deserialization pool, but just based on the name, it seems like 
it should be super fast. What could make it back up with hundreds of thousands 
of messages?




Re: Effective cache size

2010-06-03 Thread David King
>> So with the row cache, that first node (the primary replica) is the one that 
>> has that row cached, yes?
> No, it's the closest node as determined by snitch.sortByProximity.

And with the default snitch, rack-unaware placement, random partitioner, and 
all nodes up, that's the primary replica, right?

> any given node X will never know whether another node Y has a row cached or 
> not.  the overhead for communicating that level of detail would be totally 
> prohibitive. all caching does is speed the read, once a request is received 
> for data local to a given node.  no more, no less.

Yes, that's my concern, but the details significantly affect the effective size 
of the cache (in the aforementioned case, the details place the effective size 
at either 6 million or 18 million, a 3x difference).

So given CL==ONE reads, only the actually read node (which will be the primary 
replica given the default placement strategy and snitch) will cache the item, 
right? The checksum-checking doesn't cause the row to be cached on the non-read 
nodes?

If I read with CL==QUORUM in an RF==3 environment, do both read nodes then
cache the item, or only the primary replica?

OutOfMemoryError

2010-06-03 Thread Lev Stesin
Hi,

I am getting OOM during load tests:

java.lang.OutOfMemoryError: Java heap space
at java.util.HashSet.(HashSet.java:125)
at 
com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181)
at 
com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112)
at 
com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47)
at 
com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155)
at 
com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207)
at 
com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
at 
com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80)
at com.google.common.collect.HashMultimap.put(HashMultimap.java:47)
at 
org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87)
at 
org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451)
at 
org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361)
at 
org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Any suggestions as to how to fix/ease the problem? Thanks.

-- 
Lev


What is K-table ?

2010-06-03 Thread yaw
Hi all,
connecting to a cluster with cassandra-cli and trying a describe command,  I
obtain a "missing K_TABLE" message :


cassandra> describe Keyspace1
line 1:9 missing K_TABLE at 'Keyspace1'
Keyspace1.Super1
Column Family Type: Super
Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2bbe2893

Is this a real issue ?

Many thanks,


Re: OutOfMemoryError

2010-06-03 Thread Gary Dusbabek
Are you running "ant test"?  It defaults to setting memory to 1G.  If
you're running them outside of ant, you'll need to set max memory
manually.

Gary.

On Thu, Jun 3, 2010 at 10:35, Lev Stesin  wrote:
> Hi,
>
> I am getting OOM during load tests:
>
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.HashSet.(HashSet.java:125)
>        at 
> com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181)
>        at 
> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112)
>        at 
> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47)
>        at 
> com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155)
>        at 
> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207)
>        at 
> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
>        at 
> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80)
>        at com.google.common.collect.HashMultimap.put(HashMultimap.java:47)
>        at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87)
>        at 
> org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
>        at 
> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451)
>        at 
> org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361)
>        at 
> org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484)
>        at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125)
>        at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> Any suggestions as to how to fix/ease the problem? Thanks.
>
> --
> Lev
>


Re: Cassandra in the cloud

2010-06-03 Thread Eric Evans
On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
> We want to try out Cassandra in the cloud. Any recommendations?
> Comments?
> 
> Should we use Amazon? Rackspace? Something else? 

I personally haven't used Cassandra on EC2, but others have reported
significantly better disk IO, (and hence, better performance), with
Rackspace's Cloud Servers.

Full disclosure though, I work for Rackspace. :)

-- 
Eric Evans
eev...@rackspace.com



Re: OutOfMemoryError

2010-06-03 Thread Lev Stesin
Gary,

Is there a directive to set it? Or should I modify the cassandra
script itself? Thanks.

Lev.

On Thu, Jun 3, 2010 at 10:48 AM, Gary Dusbabek  wrote:
> Are you running "ant test"?  It defaults to setting memory to 1G.  If
> you're running them outside of ant, you'll need to set max memory
> manually.
>
> Gary.
>
> On Thu, Jun 3, 2010 at 10:35, Lev Stesin  wrote:
>> Hi,
>>
>> I am getting OOM during load tests:
>>
>> java.lang.OutOfMemoryError: Java heap space
>>        at java.util.HashSet.(HashSet.java:125)
>>        at 
>> com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181)
>>        at 
>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112)
>>        at 
>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47)
>>        at 
>> com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155)
>>        at 
>> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207)
>>        at 
>> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
>>        at 
>> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80)
>>        at com.google.common.collect.HashMultimap.put(HashMultimap.java:47)
>>        at 
>> org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87)
>>        at 
>> org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
>>        at 
>> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451)
>>        at 
>> org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361)
>>        at 
>> org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484)
>>        at 
>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125)
>>        at 
>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> Any suggestions as to how to fix/ease the problem? Thanks.
>>
>> --
>> Lev
>>
>



-- 
Lev


Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
We're using Cassandra on AWS at SimpleGeo.  We software RAID 0 stripe
the ephemeral drives to achieve better I/O and have machines in
multiple Availability Zones with a custom EndPointSnitch that
replicates the data between AZs for high availability (to be
open-sourced/contributed at some point).

Using XFS as described here
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
also makes it very easy to snapshot your cluster to S3.

We've had no real problems with EC2 and Cassandra, it's been great.

-Ben Standefer


On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans  wrote:
> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
>> We want to try out Cassandra in the cloud. Any recommendations?
>> Comments?
>>
>> Should we use Amazon? Rackspace? Something else?
>
> I personally haven't used Cassandra on EC2, but others have reported
> significantly better disk IO, (and hence, better performance), with
> Rackspace's Cloud Servers.
>
> Full disclosure though, I work for Rackspace. :)
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Cassandra in the cloud

2010-06-03 Thread Mike Subelsky
Ben,

do you just keep the commit log on the ephemeral drive?  Or data and
commit? (I was confused by your reference to XFS and snapshots -- I
assume you keep data on the XFS drive)

-Mike

On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer  wrote:
> We're using Cassandra on AWS at SimpleGeo.  We software RAID 0 stripe
> the ephemeral drives to achieve better I/O and have machines in
> multiple Availability Zones with a custom EndPointSnitch that
> replicates the data between AZs for high availability (to be
> open-sourced/contributed at some point).
>
> Using XFS as described here
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
> also makes it very easy to snapshot your cluster to S3.
>
> We've had no real problems with EC2 and Cassandra, it's been great.
>
> -Ben Standefer
>
>
> On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans  wrote:
>> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
>>> We want to try out Cassandra in the cloud. Any recommendations?
>>> Comments?
>>>
>>> Should we use Amazon? Rackspace? Something else?
>>
>> I personally haven't used Cassandra on EC2, but others have reported
>> significantly better disk IO, (and hence, better performance), with
>> Rackspace's Cloud Servers.
>>
>> Full disclosure though, I work for Rackspace. :)
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>
>



-- 
Mike Subelsky
oib.com // ignitebaltimore.com // subelsky.com
@subelsky // (410) 929-4022


Cassandra Cluster Setup

2010-06-03 Thread Stephan Pfammatter
I'm having difficulties setting up a 3 way cassandra cluster. Any comments/help 
would be appreciated.

My goal is that all data should be fully replicated amongst the 3 nodes. I want
to simulate the failure of one node and prove that the test column family can
still be accessed.
In a nutshell I did the following 3 steps. The excerpt below shows the changed
storage-conf.xml:

(1) I started Cassandra-ca and it worked fine. I added 3 rows to a cf cf_test.

(2) Then I started Cassandra-or. It comes up OK but goes into "sleeping 3..." mode.

(3) Nodetool streams doesn't reveal anything.



Node: Cassandra-ca
false



org.apache.cassandra.locator.RackUnawareStrategy
1

org.apache.cassandra.locator.EndPointSnitch

  cassandra-ca
  cassandra-us
 9160
cassandra-us

Cassandra-or
true



org.apache.cassandra.locator.RackUnawareStrategy
1

org.apache.cassandra.locator.EndPointSnitch

  cassandra-or
  cassandra-or
 9160
cassandra-or

Cassandra-az

Cassandra-or
true



org.apache.cassandra.locator.RackUnawareStrategy
1

org.apache.cassandra.locator.EndPointSnitch

  cassandra-az
  cassandra-az
 9160
cassandra-az





Re: OutOfMemoryError

2010-06-03 Thread Gary Dusbabek
It's set in the build file:



But I'm not sure if you're using the build file or not.  It kind of
sounds like you are not.

Gary.


On Thu, Jun 3, 2010 at 11:24, Lev Stesin  wrote:
> Gary,
>
> Is there a directive to set it? Or should I modify the cassandra
> script itself? Thanks.
>
> Lev.
>
> On Thu, Jun 3, 2010 at 10:48 AM, Gary Dusbabek  wrote:
>> Are you running "ant test"?  It defaults to setting memory to 1G.  If
>> you're running them outside of ant, you'll need to set max memory
>> manually.
>>
>> Gary.
>>
>> On Thu, Jun 3, 2010 at 10:35, Lev Stesin  wrote:
>>> Hi,
>>>
>>> I am getting OOM during load tests:
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>>        at java.util.HashSet.(HashSet.java:125)
>>>        at 
>>> com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:181)
>>>        at 
>>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:112)
>>>        at 
>>> com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:47)
>>>        at 
>>> com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:155)
>>>        at 
>>> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:207)
>>>        at 
>>> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
>>>        at 
>>> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:80)
>>>        at com.google.common.collect.HashMultimap.put(HashMultimap.java:47)
>>>        at 
>>> org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:87)
>>>        at 
>>> org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
>>>        at 
>>> org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:451)
>>>        at 
>>> org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:361)
>>>        at 
>>> org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:1484)
>>>        at 
>>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125)
>>>        at 
>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>>
>>> Any suggestions as to how to fix/ease the problem? Thanks.
>>>
>>> --
>>> Lev
>>>
>>
>
>
>
> --
> Lev
>


Re: Cassandra Cluster Setup

2010-06-03 Thread Gary Dusbabek
Your replication factor is only set to 1, which means that each key
will only live on a single node.  If you do wait for bootstrapping to
commence (takes 90s in trunk, I don't recall in 0.6), you should see
some keys moving unless your inserts were all into a small range.
Perhaps you're being impatient.

If not, I recommend that you start over, set the replication factor to
3, wait a good while for all the nodes to fully join the cluster, and
then make sure the keys you write to are random.

Gary.


On Thu, Jun 3, 2010 at 13:07, Stephan Pfammatter
 wrote:
> I’m having difficulties setting up a 3 way cassandra cluster. Any
> comments/help would be appreciated.
>
>
>
> My goal is that all data should be fully replicated amongst the 3 nodes. I
> want to simulate the failure of one node and proof that the test column
> family still can be accessed.
>
> In a nutshell I did the following 3 steps. The excerpt below shows the
> changed storage-con.xml:
>
> (1)    I started Cassandra-ca and worked fine. I added 3 rows to a cf
> cf_test.
>
> (2)    Then I started Cassandra-or. It comes up ok but goes into “sleeping
> 3”…..mode
>
> (3)    Nodetool streams doesn’t reveal anything.
>
>
>
>
>
>
>
> Node: Cassandra-ca
>
> false
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-ca
>
>   cassandra-us
>
>  9160
>
> cassandra-us
>
>
>
> Cassandra-or
>
> true
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-or
>
>   cassandra-or
>
>  9160
>
> cassandra-or
>
>
>
> Cassandra-az
>
>
>
> Cassandra-or
>
> true
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-az
>
>   cassandra-az
>
>  9160
>
> cassandra-az
>
>
>
>
>
>


Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
The commit log and data directory are on the same mounted directory
structure (the 2 RAID 0 striped ephemeral disks), rather than using 1
of the ephemeral disks for the commit log and 1 of the ephemeral disks
for the data directory.  While it's usually advised that for disk
utilization reasons you keep the commit logs and data directory on
separate disks, our RAID 0 configuration gives us much more space for
the data directory without having to mess with EBSes.  We've found it
to be fine for now.

I see how my XFS snapshots reference was confusing.  Our plan is to
have a single AZ use EBSes for the data directory so that we can more
easily snapshot our data (trusting our AZ-aware EndPointSnitch),
while the other AZs will continue to use ephemeral drives.

-Ben Standefer


On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky  wrote:
> Ben,
>
> do you just keep the commit log on the ephemeral drive?  Or data and
> commit? (I was confused by your reference to XFS and snapshots -- I
> assume you keep data on the XFS drive)
>
> -Mike
>
> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer  wrote:
>> We're using Cassandra on AWS at SimpleGeo.  We software RAID 0 stripe
>> the ephemeral drives to achieve better I/O and have machines in
>> multiple Availability Zones with a custom EndPointSnitch that
>> replicates the data between AZs for high availability (to be
>> open-sourced/contributed at some point).
>>
>> Using XFS as described here
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
>> also makes it very easy to snapshot your cluster to S3.
>>
>> We've had no real problems with EC2 and Cassandra, it's been great.
>>
>> -Ben Standefer
>>
>>
>> On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans  wrote:
>>> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
 We want to try out Cassandra in the cloud. Any recommendations?
 Comments?

 Should we use Amazon? Rackspace? Something else?
>>>
>>> I personally haven't used Cassandra on EC2, but others have reported
>>> significantly better disk IO, (and hence, better performance), with
>>> Rackspace's Cloud Servers.
>>>
>>> Full disclosure though, I work for Rackspace. :)
>>>
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>>>
>>>
>>
>
>
>
> --
> Mike Subelsky
> oib.com // ignitebaltimore.com // subelsky.com
> @subelsky // (410) 929-4022
>


Re: Cassandra in the cloud

2010-06-03 Thread Mike Subelsky
Ben,

thanks for that, we may try that.  I did find an AWS forum tidbit from
two years ago:

"4 ephemeral stores striped together can give significantly higher
throughput for sequential writes than EBS."

http://developer.amazonwebservices.com/connect/thread.jspa?messageID=125197

-Mike

On Thu, Jun 3, 2010 at 5:57 PM, Ben Standefer  wrote:
> The commit log and data directory are on the same mounted directory
> structure (the 2 RAID 0 striped ephemeral disks) rather than using 1
> of the ephemeral disks for the data and 1 of the ephemeral disks for
> the data directory.  While it's usually advised that for disk
> utilization reasons you keep the commit logs and data directory on
> separate disks, our RAID0 configuration gives us much more space for
> the data directory without having to mess with EBSes.  We've found it
> to be fine for now.
>
> I see how my XFS snapshots reference was confusing.  Our plan is to
> have a single AZ use EBSes for the data directory so that we can more
> easily snapshot our data (trusting that our AZ-aware EndPointSnitch),
> while other AZs will continue ephemeral drives.
>
> -Ben Standefer
>
>
> On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky  wrote:
>> Ben,
>>
>> do you just keep the commit log on the ephemeral drive?  Or data and
>> commit? (I was confused by your reference to XFS and snapshots -- I
>> assume you keep data on the XFS drive)
>>
>> -Mike
>>
>> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer  wrote:
>>> We're using Cassandra on AWS at SimpleGeo.  We software RAID 0 stripe
>>> the ephemeral drives to achieve better I/O and have machines in
>>> multiple Availability Zones with a custom EndPointSnitch that
>>> replicates the data between AZs for high availability (to be
>>> open-sourced/contributed at some point).
>>>
>>> Using XFS as described here
>>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
>>> also makes it very easy to snapshot your cluster to S3.
>>>
>>> We've had no real problems with EC2 and Cassandra, it's been great.
>>>
>>> -Ben Standefer
>>>
>>>
>>> On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans  wrote:
 On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
> We want to try out Cassandra in the cloud. Any recommendations?
> Comments?
>
> Should we use Amazon? Rackspace? Something else?

 I personally haven't used Cassandra on EC2, but others have reported
 significantly better disk IO, (and hence, better performance), with
 Rackspace's Cloud Servers.

 Full disclosure though, I work for Rackspace. :)

 --
 Eric Evans
 eev...@rackspace.com


>>>
>>
>>
>>
>> --
>> Mike Subelsky
>> oib.com // ignitebaltimore.com // subelsky.com
>> @subelsky // (410) 929-4022
>>
>



-- 
Mike Subelsky
oib.com // ignitebaltimore.com // subelsky.com
@subelsky


Re: Cassandra in the cloud

2010-06-03 Thread Ben Standefer
Mike, yep, there are a lot of benchmarks proving it (plus it just makes sense)

http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html
http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/
http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/

-Ben Standefer


On Thu, Jun 3, 2010 at 4:41 PM, Mike Subelsky  wrote:
> Ben,
>
> thanks for that, we may try that.  I did find an AWS forum tidbit from
> two years ago:
>
> "4 ephemeral stores striped together can give significantly higher
> throughput for sequential writes than EBS."
>
> http://developer.amazonwebservices.com/connect/thread.jspa?messageID=125197𞤍
>
> -Mike
>
> On Thu, Jun 3, 2010 at 5:57 PM, Ben Standefer  wrote:
>> The commit log and data directory are on the same mounted directory
>> structure (the 2 RAID 0 striped ephemeral disks) rather than using 1
>> of the ephemeral disks for the data and 1 of the ephemeral disks for
>> the data directory.  While it's usually advised that for disk
>> utilization reasons you keep the commit logs and data directory on
>> separate disks, our RAID0 configuration gives us much more space for
>> the data directory without having to mess with EBSes.  We've found it
>> to be fine for now.
>>
>> I see how my XFS snapshots reference was confusing.  Our plan is to
>> have a single AZ use EBSes for the data directory so that we can more
>> easily snapshot our data (trusting that our AZ-aware EndPointSnitch),
>> while other AZs will continue ephemeral drives.
>>
>> -Ben Standefer
>>
>>
>> On Thu, Jun 3, 2010 at 1:26 PM, Mike Subelsky  wrote:
>>> Ben,
>>>
>>> do you just keep the commit log on the ephemeral drive?  Or data and
>>> commit? (I was confused by your reference to XFS and snapshots -- I
>>> assume you keep data on the XFS drive)
>>>
>>> -Mike
>>>
>>> On Thu, Jun 3, 2010 at 2:29 PM, Ben Standefer  wrote:
 We're using Cassandra on AWS at SimpleGeo.  We software RAID 0 stripe
 the ephemeral drives to achieve better I/O and have machines in
 multiple Availability Zones with a custom EndPointSnitch that
 replicates the data between AZs for high availability (to be
 open-sourced/contributed at some point).

 Using XFS as described here
 http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663
 also makes it very easy to snapshot your cluster to S3.

 We've had no real problems with EC2 and Cassandra, it's been great.

 -Ben Standefer


 On Thu, Jun 3, 2010 at 11:51 AM, Eric Evans  wrote:
> On Thu, 2010-06-03 at 11:29 +0300, David Boxenhorn wrote:
>> We want to try out Cassandra in the cloud. Any recommendations?
>> Comments?
>>
>> Should we use Amazon? Rackspace? Something else?
>
> I personally haven't used Cassandra on EC2, but others have reported
> significantly better disk IO, (and hence, better performance), with
> Rackspace's Cloud Servers.
>
> Full disclosure though, I work for Rackspace. :)
>
> --
> Eric Evans
> eev...@rackspace.com
>
>

>>>
>>>
>>>
>>> --
>>> Mike Subelsky
>>> oib.com // ignitebaltimore.com // subelsky.com
>>> @subelsky // (410) 929-4022
>>>
>>
>
>
>
> --
> Mike Subelsky
> oib.com // ignitebaltimore.com // subelsky.com
> @subelsky
>


Re: Cassandra Cluster Setup

2010-06-03 Thread Nahor

 On 2010-06-03 13:07, Stephan Pfammatter wrote:

Cassandra-or
[...]
cassandra-or



Aside from the replication factor noted by Gary, this should point to
your existing node (cassandra-ca); otherwise, how will this node know
where the existing node is and where to get the data from?




Cassandra-az
[...]
cassandra-az


Same here




Cassandra training Jun 18 in SF

2010-06-03 Thread Jonathan Ellis
We're back with another public Cassandra training:
http://www.eventbrite.com/event/718755818

This will be Riptano's 6th training session (including the four we've
done that were on-site with a specific customer), and in my humble
opinion the material's really solid at this point.

The eventbrite text does a pretty good job of describing what we're
covering.  I would only add that it focuses on the stable 0.6 series,
with notes as to where things will be changing for 0.7.  Also, as
suggested by the outline, there is about a 2:1 ratio of "ops" material
to "dev" material.

We are actively working on lining up other locations.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: [***SPAM*** ] Re: question about class SlicePredicate

2010-06-03 Thread Shuai Yuan
It's documented that get_range_slice() supports all partitioners in 0.6.

Kevin

  
From: Olivier Mallassi 
To: user@cassandra.apache.org 
Subject: [***SPAM*** ] Re: question about class SlicePredicate
Date: Tue, 1 Jun 2010 13:38:03 +0200

Does it work whatever the chosen partitioner?
Or only for OrderPreservingPartitioner?

On Tuesday, June 1, 2010, Eric Yu  wrote:
> It needs a SliceRange. For example:
> SliceRange range = new SliceRange();
> range.setStart("".getBytes());
> range.setFinish("".getBytes());
> range.setReversed(true);
> range.setCount(20);
>
> SlicePredicate sp = new SlicePredicate();
> sp.setSlice_range(range);
>
> client.get_slice(KEYSPACE, KEY, ColumnParent, sp, ConsistencyLevel.ONE);
> 2010/6/1 Shuai Yuan 
> Hi all,
>
> I don't quite understand the usage of 'class SlicePredicate' when trying
> to retrieve a ranged slice.
>
> How should it be initialized?
>
> Thanks!
> --
> Kevin Yuan
> www.yuan-shuai.info
>
>
>
>
>






Re: problem when trying to get_range_slice()

2010-06-03 Thread Jonathan Ellis
use smaller slices and page through the data
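
A rough sketch of that paging approach in Java against the 0.6-era Thrift API (the keyspace and column family names are taken from the quoted code below; exact signatures can differ between versions). Note that the KeyRange start/end/count select rows, while the SlicePredicate start/finish/count select columns within each returned row:

import java.util.List;
import org.apache.cassandra.thrift.*;

public class RangePager {
    public static void scanAll(Cassandra.Client client) throws Exception {
        // Columns to return from each row: everything, up to 100 columns.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

        ColumnParent parent = new ColumnParent("test_col");
        String start = "";          // empty start/end key = the whole ring
        final int pageSize = 100;   // rows per call; keep small to stay under RpcTimeout

        while (true) {
            KeyRange range = new KeyRange();
            range.setCount(pageSize);
            range.setStart_key(start);
            range.setEnd_key("");

            List<KeySlice> page = client.get_range_slices("campaign", parent,
                    predicate, range, ConsistencyLevel.ONE);
            for (KeySlice row : page) {
                if (row.getKey().equals(start) && !start.isEmpty())
                    continue;       // the start key is returned again; skip it
                /* process row.getColumns() */
            }
            if (page.size() < pageSize)
                break;              // last (short) page
            start = page.get(page.size() - 1).getKey();  // resume from the last key seen
        }
    }
}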

2010/6/3 Shuai Yuan :
> Hi all,
>
> my env
>
> 6 servers with about 200GB data.
>
> data structure,
>
> 64B rowkey + (5B column)*20,
> rowkey and column.value are all random bytes from a-z,A-Z,0-9
>
> problem
>
> when I tried iterate over the data in the storage, I always get
> org::apache::cassandra::TimedOutException
> (RpcTimeout = 2minutes)
>
> questions
>
> 1.How could I iterate over my data then?
>
> 2.In the code below, I gave key_start/end/count twice, one for
> get_range_slice() and the other for SlicePreditor. Are they the same?
>
> Thanks!
>
> scan-data application
>
>        CassandraFactory factory(argv[1], 9160);
>        tr1::shared_ptr client(factory.create());
>
>        Keyspace *key_space = client->getKeyspace("campaign");
>
>        map > mResult;
>        ColumnParent cp;
>        SlicePredicate sp;
>        string szStart, szEnd;
>        uint32_t uCount = 100; //try to retrieve 100 keys?
>
>        szStart =
> "";
>        szEnd =
> "1000";
>
>        cp.column_family = "test_col";
>
>        sp.__isset.slice_range = true;
>        sp.slice_range.start =
> "";
>        sp.slice_range.finish =
> "1000";
>        sp.slice_range.count = 100;
>
>        try {
>                mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd,
> uCount); //by libcassandra, mainly invoking get_range_slice()
>
>                /* then iterate the result and output */
>                        }
>                }
>        } catch (org::apache::cassandra::InvalidRequestException &ire) {
>                cout << ire.why << endl;
>                delete key_space;
>                return 1;
>        }
>
>        return 0;
> --
> Kevin Yuan
> www.yuan-shuai.info
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Effective cache size

2010-06-03 Thread Jonathan Ellis
On Thu, Jun 3, 2010 at 10:17 AM, David King  wrote:
>>> So with the row cache, that first node (the primary replica) is the one 
>>> that has that row cached, yes?
>> No, it's the closest node as determined by snitch.sortByProximity.
>
> And with the default snitch, rack-unaware placement, random partitioner, and 
> all nodes up, that's the primary replica, right?

No.  When all replicas have equal weight it's basically random.

>> any given node X will never know whether another node Y has a row cached or 
>> not.  the overhead for communicating that level of detail would be totally 
>> prohibitive. all caching does is speed the read, once a request is received 
>> for data local to a given node.  no more, no less.
>
> Yes, that's my concern, but the details significantly affect the effective 
> size of the cache (in the afoorementioned case, the details place the 
> effective size at either 6 million or 18 million, a 3x difference).
>
> So given CL==ONE reads, only the actually read node (which will be the 
> primary replica given the default placement strategy and snitch) will cache 
> the item, right? The checksum-checking doesn't cause the row to be cached on 
> the non-read nodes?

You have to read the data, before you can checksum it.  So on the
contrary, digest (checksum) vs data read has no effect on cache
behavior.

> If I read with CL==QUORUM in an RF==3 environment, do both read nodes them 
> cache the item, or only the primary replica?

Both.  Which is what you want, otherwise your digest reads will cause
substantial unnecessary i/o on hot keys.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: What is K-table ?

2010-06-03 Thread Jonathan Ellis
Sounds like a bug in the cli.  Maybe it only knows how to describe KS
+ CF together?

Please file a bug report at https://issues.apache.org/jira/browse/CASSANDRA.

On Thu, Jun 3, 2010 at 10:37 AM, yaw  wrote:
> Hi all,
> connecting to a cluster with cassandra-cli and trying a describe command,  I
> obtain a "missing K_TABLE" message :
>
>
> cassandra> describe Keyspace1
> line 1:9 missing K_TABLE at 'Keyspace1'
> Keyspace1.Super1
> Column Family Type: Super
> Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2bbe2893
>
> Is this a real issue ?
>
> Many thanks,
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: What is K-table ?

2010-06-03 Thread Philip Stanhope
Note the describe_keyspace API method does not exhibit this behavior in 0.6.2 
... seems to be a problem specific to cassandra-cli.

-phil

On Jun 3, 2010, at 10:18 PM, Jonathan Ellis wrote:

> Sounds like a bug in the cli.  Maybe it only knows how to describe KS
> + CF together?
> 
> Please file a bug report at https://issues.apache.org/jira/browse/CASSANDRA.
> 
> On Thu, Jun 3, 2010 at 10:37 AM, yaw  wrote:
>> Hi all,
>> connecting to a cluster with cassandra-cli and trying a describe command,  I
>> obtain a "missing K_TABLE" message :
>> 
>> 
>> cassandra> describe Keyspace1
>> line 1:9 missing K_TABLE at 'Keyspace1'
>> Keyspace1.Super1
>> Column Family Type: Super
>> Columns Sorted By: org.apache.cassandra.db.marshal.bytest...@2bbe2893
>> 
>> Is this a real issue ?
>> 
>> Many thanks,
>> 
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com



Re: MessageDeserializationTask backlog crash

2010-06-03 Thread Jonathan Ellis
having the write or read stage fill up will, as a secondary effect, cause
deserialization to fill up

moral: when you start getting timeout exceptions, have your clients
sleep for 100ms or otherwise back off (or maybe you just need to add
capacity)
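
A minimal sketch of that kind of client-side backoff, assuming a 0.6-era Thrift client; the wrapped operation, sleep times, and retry limit are illustrative choices, not anything prescribed by Cassandra:

import org.apache.cassandra.thrift.TimedOutException;

public class BackoffRetry {
    interface ThriftOp { void run() throws Exception; }   // e.g. a client.insert(...) call

    // Back off and retry when the cluster signals overload with TimedOutException.
    public static void withBackoff(ThriftOp op) throws Exception {
        long sleepMs = 100;                         // start with the suggested 100 ms
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                op.run();
                return;
            } catch (TimedOutException e) {
                Thread.sleep(sleepMs);              // give the node time to drain its queues
                sleepMs = Math.min(sleepMs * 2, 2000);  // exponential backoff, capped
            }
        }
        throw new RuntimeException("still timing out after retries; consider adding capacity");
    }
}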

On Thu, Jun 3, 2010 at 10:16 AM, Daniel Kluesing  wrote:
> I’ve had a few nodes crash (Out of heap), and when I pull the heap dump,
> there are hundreds of thousands of MessageDeserializationTasks in the thread
> pool executor, using up GB of the heap. I’m running 0.6.2 on sun jvm u20 and
> the nodes are under heavy load. Has anyone else run into this? I haven’t
> deeply understood the deserilization pool, but just based on the name, it
> seems like it should be super fast. What could make it back up with hundreds
> of thousands of messages?
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: [***SPAM*** ] Re: problem when trying to get_range_slice()

2010-06-03 Thread Shuai Yuan
Thanks for the hint.

I found out it was a "too many open files" error, and the server side
just stopped responding to the get_range_slice() request by throwing an
exception.

It now works with "ulimit -n 32768".

Kevin

  
From: Jonathan Ellis 
To: user@cassandra.apache.org
Subject: [***SPAM*** ] Re: problem when trying to get_range_slice()
Date: Thu, 3 Jun 2010 19:07:28 -0700

use smaller slices and page through the data

2010/6/3 Shuai Yuan :
> Hi all,
>
> my env
>
> 6 servers with about 200GB data.
>
> data structure,
>
> 64B rowkey + (5B column)*20,
> rowkey and column.value are all random bytes from a-z,A-Z,0-9
>
> problem
>
> when I tried iterate over the data in the storage, I always get
> org::apache::cassandra::TimedOutException
> (RpcTimeout = 2minutes)
>
> questions
>
> 1.How could I iterate over my data then?
>
> 2.In the code below, I gave key_start/end/count twice, one for
> get_range_slice() and the other for SlicePreditor. Are they the same?
>
> Thanks!
>
> scan-data application
>
>CassandraFactory factory(argv[1], 9160);
>tr1::shared_ptr client(factory.create());
>
>Keyspace *key_space = client->getKeyspace("campaign");
>
>map > mResult;
>ColumnParent cp;
>SlicePredicate sp;
>string szStart, szEnd;
>uint32_t uCount = 100; //try to retrieve 100 keys?
>
>szStart =
> "";
>szEnd =
> "1000";
>
>cp.column_family = "test_col";
>
>sp.__isset.slice_range = true;
>sp.slice_range.start =
> "";
>sp.slice_range.finish =
> "1000";
>sp.slice_range.count = 100;
>
>try {
>mResult = key_space->getRangeSlice(cp, sp, szStart, szEnd,
> uCount); //by libcassandra, mainly invoking get_range_slice()
>
>/* then iterate the result and output */
>}
>}
>} catch (org::apache::cassandra::InvalidRequestException &ire) {
>cout << ire.why << endl;
>delete key_space;
>return 1;
>}
>
>return 0;
> --
> Kevin Yuan
> www.yuan-shuai.info
>
>
>








Re: Cassandra Cluster Setup

2010-06-03 Thread Benjamin Black
http://wiki.apache.org/cassandra/MultinodeCluster

On Thu, Jun 3, 2010 at 1:07 PM, Stephan Pfammatter
 wrote:
> I’m having difficulties setting up a 3 way cassandra cluster. Any
> comments/help would be appreciated.
>
>
>
> My goal is that all data should be fully replicated amongst the 3 nodes. I
> want to simulate the failure of one node and proof that the test column
> family still can be accessed.
>
> In a nutshell I did the following 3 steps. The excerpt below shows the
> changed storage-con.xml:
>
> (1)    I started Cassandra-ca and worked fine. I added 3 rows to a cf
> cf_test.
>
> (2)    Then I started Cassandra-or. It comes up ok but goes into “sleeping
> 3”…..mode
>
> (3)    Nodetool streams doesn’t reveal anything.
>
>
>
>
>
>
>
> Node: Cassandra-ca
>
> false
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-ca
>
>   cassandra-us
>
>  9160
>
> cassandra-us
>
>
>
> Cassandra-or
>
> true
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-or
>
>   cassandra-or
>
>  9160
>
> cassandra-or
>
>
>
> Cassandra-az
>
>
>
> Cassandra-or
>
> true
>
> 
>
>     
>
>
> org.apache.cassandra.locator.RackUnawareStrategy
>
>     1
>
>
> org.apache.cassandra.locator.EndPointSnitch
>
> 
>
>   cassandra-az
>
>   cassandra-az
>
>  9160
>
> cassandra-az
>
>
>
>
>
>


High CPU Usage since 0.6.2

2010-06-03 Thread Lu Ming


I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to
0.6.2 yesterday.
But today I found six Cassandra nodes with high CPU usage, more than 400% on
my 8-core CPU server.

The worst one is more than 760%. It is very serious.

I used jvisualvm to watch the worst node, and I found that there are many
running threads named "Thread-xxx";

the status of the other threads is waiting or sleeping.

"Thread-130" - Thread t...@240
  java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
	at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
	at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
	at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

  Locked ownable synchronizers:
- None

"Thread-126" - Thread t...@236
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked java.lang.obj...@10808561
	at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
	at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
	at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None

"Thread-119" - Thread t...@229
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
- locked java.lang.obj...@65b4abbd
- locked java.lang.obj...@38773975
	at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
	at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
	at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None





Re: High CPU Usage since 0.6.2

2010-06-03 Thread Chris Goffinet
We're seeing this as well. We were testing with a 40+ node cluster on the 
latest 0.6 branch from few days ago.

-Chris

On Jun 3, 2010, at 9:55 PM, Lu Ming wrote:

> 
> I have ten 0.5.1 Cassandra nodes in my cluster, and I update them to 
> cassandra to 0.6.2 yesterday.
> But today I find six cassandra nodes have high CPU usage more than 400% in my 
> 8-core CPU sever.
> The worst one is more than 760%. It is very serious.
> 
> I use jvisualvm to watch the worst node, and I found that there are many 
> running threads named "thread-xxx"
> the status of other threads is waiting and sleeping.
> 
> "Thread-130" - Thread t...@240
>  java.lang.Thread.State: RUNNABLE
>   at sun.misc.Unsafe.setMemory(Native Method)
>   at sun.nio.ch.Util.erase(Util.java:202)
>   at 
> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>   at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>   at 
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>  Locked ownable synchronizers:
>   - None
> 
> "Thread-126" - Thread t...@236
>  java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.FileDispatcher.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:200)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>   - locked java.lang.obj...@10808561
>   at 
> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
>   at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>   at 
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
>  Locked ownable synchronizers:
>   - None
> 
> "Thread-119" - Thread t...@229
>  java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.NativeThread.current(Native Method)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
>   - locked java.lang.obj...@65b4abbd
>   - locked java.lang.obj...@38773975
>   at 
> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
>   at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>   at 
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
>  Locked ownable synchronizers:
>   - None
> 
> 



High read latency

2010-06-03 Thread Ma Xiao
We have a super CF which may have up to 1000 super columns and 5
columns for each super column; the read latency may go up to 50 ms
(or even higher). I think that is a long time to respond. How can we tune the
storage config to optimize the performance? I read the wiki, and
 may help to do this: suppose that by assigning a big
value to this (2, for example), no row can reach this limit, so it never
generates an index for a row. In our production scenario, we only access
1 row at a time, with up to 1000 columns returned in the slice. Any
suggestions?