Re: Cassandra 0.8 Counters Inverted Index?

2011-10-03 Thread Pierre-Yves Ritschard
Unfortunately there's no way to do this in Cassandra right now, except by using another row as an index, like you're already doing. Of course, you could also store by source_id.date and have a batch job iterate over all sources to compute the top 100. It would no longer be real time, though.
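The batch alternative described above comes down to a top-N selection over counter totals. Below is a minimal Java sketch. It assumes the per-source totals for one date have already been read out of Cassandra into a map. The loading step is not shown, and the class name and source IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for the batch job; source IDs and counts are illustrative only.
class TopSources {
    // Returns the IDs of the n sources with the largest counts, largest first.
    static List<String> topN(Map<String, Long> countsBySource, int n) {
        // Min-heap on count: the head is always the smallest entry still in the top n.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<Map.Entry<String, Long>>(
                (a, b) -> Long.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Long> e : countsBySource.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so at most n entries remain
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // drains smallest first
        }
        Collections.reverse(result); // largest first
        return result;
    }
}
```

A bounded heap keeps memory at O(n) regardless of how many sources the job iterates over. The trade-off is the one already noted: the result is only as fresh as the last batch run.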

Re: Cassandra 0.8 Counters Inverted Index?

2011-10-03 Thread Richard Low
On Mon, Oct 3, 2011 at 9:14 AM, Pierre-Yves Ritschard wrote: > Unfortunately there's no way to do this in Cassandra right now, except > by using another row as index, like you're doing right now. > > Of course you could also store by source_id.date and have a batch job > iterate over all sources t

Re: Cassandra annotation

2011-10-03 Thread aaron morton
Nothing against annotations; they are like post-it notes from pixies. It's more about what you do with them. A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 3/10/2011, at 12:03 PM, Peter Lin wrote: > It can be dangerous if wielded like

Re: cfstats - check Read Count per minute

2011-10-03 Thread aaron morton
Other than manually pulling them from JMX, not really. Most monitoring templates will grab those stats per CF (and perhaps per keyspace). Cheers - Aaron Morton On 3/10/2011, at 3:41 PM, Marcus Both wrote: > Hi, >
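Here is a minimal sketch of the "pull them from JMX" route. It makes two assumptions: that the column family MBean is registered as `org.apache.cassandra.db:type=ColumnFamilies,keyspace=<ks>,columnfamily=<cf>`, and that it exposes a cumulative `ReadCount` attribute. Check the exact names in jconsole for your version. Because the count is cumulative, reads per minute is the difference between two samples divided by the elapsed time. The class name is hypothetical.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class ReadRate {
    // Assumed MBean name pattern for a column family; verify it in jconsole.
    static ObjectName cfMBean(String keyspace, String cf) throws Exception {
        return new ObjectName("org.apache.cassandra.db:type=ColumnFamilies,keyspace="
                + keyspace + ",columnfamily=" + cf);
    }

    // Reads the cumulative ReadCount attribute once over a standard JMX/RMI connection.
    static long readCount(String host, int port, String keyspace, String cf) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Object value = mbs.getAttribute(cfMBean(keyspace, cf), "ReadCount");
            return ((Number) value).longValue();
        }
    }

    // ReadCount only grows, so the per-minute rate is the delta between two samples.
    static double perMinute(long earlierCount, long laterCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be > 0");
        }
        return (laterCount - earlierCount) * 60_000.0 / elapsedMillis;
    }
}
```

To get the rate, sample `readCount` twice, a minute apart, and pass both values to `perMinute`. The JMX port I would expect for 0.8 is 7199, but check cassandra-env.sh. The mx4j bridge mentioned later in this thread would expose the same attributes over HTTP instead.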

Re: Cassandra annotation

2011-10-03 Thread Peter Lin
The annotations I'm thinking of are pretty simple: keyspace, key, composite key, column. It's probably easier if I post it on GitHub so that others can see. peter lin On Mon, Oct 3, 2011 at 5:46 AM, aaron morton wrote: > Nothing against annotations, the are like post-it notes from pixies. > > More

RE: invalid column name length 0

2011-10-03 Thread Desimpel, Ignace
I did an extra test, again starting from scratch but with replication factor 1. I still get the dead/up messages and timeout exceptions, but the system keeps running and storing. However, I ran out of disk space, which predictably produced a lot of other errors. Then I restarted the Cassandra servers, so

[ANN] Usergrid, Open Source Mobile Data Platform built on Cassandra

2011-10-03 Thread Ed Anuff
I made mention of this during my presentation at the Cassandra Summit back in July, but we're finally ready to release the source for Usergrid. This is a mobile platform stack built on top of Cassandra and using Hector and we're making the full source code available on GitHub. We'll be offering i

Re: [ANN] Usergrid, Open Source Mobile Data Platform built on Cassandra

2011-10-03 Thread Roshan Dawrani
This should be quite helpful as a reference. Thanks! On Mon, Oct 3, 2011 at 9:03 PM, Ed Anuff wrote: > I made mention of this during my presentation at the Cassandra Summit > back in July, but we're finally ready to release the source for > Usergrid. This is a mobile platform stack built on top

help needed interpreting Read/Write latency in cfstats and cfhistograms output

2011-10-03 Thread Ramesh Natarajan
I am running a cassandra 0.8.6 cluster. I started a clean test setup and ran my tests for a while. Later, when I ran cfstats and cfhistograms (both at the same time), the values for Read/Write latency didn't match. According to cfstats, the read and write latencies are 5.086 and 0.018 ms respec

Re: invalid column name length 0

2011-10-03 Thread Sylvain Lebresne
On the 'invalid column name length 0' exception, since you're embedding the Cassandra server, it could be that you modify a column ByteBuffer that you feed to Cassandra (that's fairly easy to do with ByteBuffer by calling some relative get method of ByteBuffer). Or more generally that you feed a ze
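Below is a minimal illustration of the ByteBuffer pitfall Sylvain describes, independent of Cassandra. A relative `get` advances the buffer's position. Any later reader of the same buffer then sees `remaining() == 0`, which would look like a zero-length column name. Reading through `duplicate()` leaves the original position untouched. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ByteBufferPitfall {
    static ByteBuffer columnName() {
        return ByteBuffer.wrap("col1".getBytes(StandardCharsets.UTF_8));
    }

    // Problematic pattern: relative get() moves position to limit,
    // so the buffer is "empty" for whoever reads it next.
    static String consumeRelative(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Safer: read through a duplicate, which shares content but has its own position/limit.
    static String readWithoutConsuming(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Using absolute gets (`buf.get(buf.position() + i)`) avoids the problem in the same way. Both approaches leave the buffer you later hand to Cassandra unchanged.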

Re: [ANN] Usergrid, Open Source Mobile Data Platform built on Cassandra

2011-10-03 Thread Jonathan Ellis
Nice! On Mon, Oct 3, 2011 at 10:33 AM, Ed Anuff wrote: > I made mention of this during my presentation at the Cassandra Summit > back in July, but we're finally ready to release the source for > Usergrid.  This is a mobile platform stack built on top of Cassandra > and using Hector and we're maki

Skeletor => Scala wrapper of Hector for Cassandra

2011-10-03 Thread Joe Stein
Hey folks, I pushed my Scala wrapper of Hector for Cassandra https://github.com/joestein/skeletor It not only gets Cassandra hooked into your Scala projects quick and simple but does so in a functional way. It is not a new library interface for Cassandra because Hector is a great library as is.

CQL select not working for CF defined programmatically with Hector API

2011-10-03 Thread Alexandru Sicoe
Hi, I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I define a CF with comparator LongType like this: BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(); columnFamilyDefinition.setKeyspaceName("XXX"); columnFamilyDef

cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by ESXi 5.0. Each VM is configured with 20GB of ram and 12 cores. Our test setup performs about 3000 inserts per second. The cassandra data partition is on a XFS filesystem mounted with options (noatime,nodiratime,nobarrier,l

Re: unwanted node discovery

2011-10-03 Thread Eric Czech
The tokens were different from the production cluster's, and after closer inspection a lot of data wasn't queryable (as expected, I suppose). I set the tokens and everything seems OK now. Auto bootstrap was false, so no issues there. Thanks for the insight Shyamal! It's good to finally have this up

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of  6 nodes running RHEL6 virtualized by > ESXi 5.0.  Each VM is configured with 20GB of ram and 12 cores. Our test > setup performs about 3000  inserts per second.  The cassandra data partition > is on a X

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
We have 5 CF. Attached is the output from the describe command. We don't have row cache enabled. Thanks Ramesh Keyspace: MSA: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:3] Column Families: ColumnFamily: admin

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Yang
Maybe try the row cache? Have you enabled mlock? (You need jna.jar and to set ulimit -l.) Using iostat -x would also give you more clues about disk performance. On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by > ESX

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
I am wondering if you are seeing issues because more frequent compactions are kicking in. Is this primarily write ops, or reads too? During the test, gather data like: 1. cfstats 2. tpstats 3. compactionstats 4. netstats 5. iostat. You have RSS memory close to 17 GB. Maybe someone can give f

sstable compatibility between 8.4 and 8.1

2011-10-03 Thread Eric Czech
Hi, we're trying to set up a cluster to run brisk/hadoop jobs on, and part of that setup is copying sstables from another cluster running 8.4. Could there be any compatibility issues with the files, since the brisk beta2 package uses 8.1? So far it seems to work fine, but now I'm a little nerv

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
I will start another test run to collect these stats. Our test model is in the neighborhood of 4500 inserts, 8000 updates & deletes and 1500 reads every second across 6 servers. Can you elaborate on reducing the heap space? Do you think the 17G RSS is a problem? thanks Ramesh On Mon, Oc
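Here is rough per-node write arithmetic for the load described above. It rests on three assumptions. First, the replication_factor of 3 shown in the describe output applies to the keyspace under test. Second, keys spread evenly over the 6 nodes. Third, updates and deletes count as writes, since deletes are written as tombstones. A write goes to all replicas regardless of consistency level; the CL only decides how many acknowledgements the coordinator waits for. The class name is hypothetical.

```java
class PerNodeLoad {
    // Each client write is sent to all rf replicas, and (assumed) evenly spread over nodes.
    static double replicaWritesPerNode(double clientWritesPerSec, int rf, int nodes) {
        return clientWritesPerSec * rf / nodes;
    }
}
```

With 4500 inserts plus 8000 updates and deletes per second, `replicaWritesPerNode(12500, 3, 6)` comes to 6250 replica writes per second on each node. These writes all end up being flushed and compacted on that node's disk.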

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
In order to understand what's going on, you might want to first do just a write test and look at the results, then do just the read tests, and then do both read and write tests. Since you mentioned a high rate of updates/deletes, I should also ask: what CL do you use for writes and reads? With high updates/delete + high CL I think

Re: sstable compatibility between 8.4 and 8.1

2011-10-03 Thread Jonathan Ellis
Nope, you're good to go. On Mon, Oct 3, 2011 at 1:34 PM, Eric Czech wrote: > Hi, we're trying to setup a cluster to run brisk/hadoop jobs on and part of > that setup is copying sstables from another cluster running 8.4.  Could > there be any compatibility issues with the files there since the bri

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Most likely, you are running single-threaded compaction. Look at cassandra.yaml for how to enable multi-threaded compaction. As more data comes into the system, bigger files get created during compaction. You could be in a situation where you might be compacting at a hi

Re: cfstats - check Read Count per minute

2011-10-03 Thread Chris Goffinet
If he puts the mx4j jar (http://mx4j.sourceforge.net/) in his lib/ folder, he can fetch stats out over HTTP. mx4j is a bridge for JMX->HTTP. On Mon, Oct 3, 2011 at 2:53 AM, aaron morton wrote: > Other than manually pull them from JMX, not really. > > Most monitoring templates will grab those sta

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
Thanks for the pointers. I checked the system, and iostat showed that we are saturating the disk at 100%. The disk is a SCSI device exposed by ESXi, and it is running on a dedicated LUN as RAID10 (4 600GB 15k drives) connected to the ESX host via iSCSI. When I run compactionstats I see we are compact

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Yes, look at cassandra.yaml; there is a section about throttling compaction. You still *want* multi-threaded compaction. Throttling will occur across all threads. The reason is that you don't want to get stuck compacting bigger files while the smaller ones build up waiting for bigger compactio
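The point that throttling applies across all threads can be illustrated with a single byte budget that every compaction thread reports into. The sketch below is minimal and hypothetical; it is not Cassandra's implementation. The MB/s cap stands in for the compaction throughput setting in cassandra.yaml. Because all threads share one counter, the combined rate is capped, not each thread's individual rate.

```java
class SharedThrottle {
    private final long bytesPerSecond;
    private final long startMillis;
    private long totalBytes;

    SharedThrottle(long megabytesPerSecond, long startMillis) {
        this.bytesPerSecond = megabytesPerSecond * 1024L * 1024L;
        this.startMillis = startMillis;
    }

    // Every compaction thread calls this after writing a chunk. It returns how long
    // that thread should sleep so the combined throughput stays under the cap.
    synchronized long recordAndGetDelayMillis(long bytes, long nowMillis) {
        totalBytes += bytes;
        long targetElapsedMillis = totalBytes * 1000L / bytesPerSecond; // time the budget allows
        long actualElapsedMillis = nowMillis - startMillis;
        return Math.max(0L, targetElapsedMillis - actualElapsedMillis);
    }
}
```

With a 16 MB/s cap, suppose two threads each report 16 MB after one second. The first is on budget, and the second is told to wait one more second. A big compaction therefore slows down alongside the small ones instead of starving them.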

Re: help needed interpreting Read/Write latency in cfstats and cfhistograms output

2011-10-03 Thread aaron morton
Hi Ramesh, Both tools output the "recent" latency, and while they do this slightly differently, the result is that it's the latency since the last time it was checked. Also, the two tools use different counters, so using cfstats will not update cfhistograms. S
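Below is a minimal sketch of what a "recent" latency means in practice. Each read of the tracker returns the mean latency of operations recorded since the previous read of that same tracker, then resets its snapshot. Two tools polling two separate trackers at the same moment can therefore report different numbers. The class is hypothetical and stores microseconds; a tool printing milliseconds would divide by 1000.

```java
class RecentLatencyTracker {
    private long opCount;
    private long totalMicros;
    private long lastOpCount;
    private long lastTotalMicros;

    // Called once per completed operation.
    synchronized void record(long micros) {
        opCount++;
        totalMicros += micros;
    }

    // Mean latency (micros) of operations since the previous call; 0.0 if none.
    synchronized double recentMicros() {
        long ops = opCount - lastOpCount;
        long micros = totalMicros - lastTotalMicros;
        lastOpCount = opCount;
        lastTotalMicros = totalMicros;
        return ops == 0 ? 0.0 : (double) micros / ops;
    }
}
```

Note that a value computed this way is a mean. It need not agree with a median read off a histogram, even over the same window.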

Re: [ANN] Usergrid, Open Source Mobile Data Platform built on Cassandra

2011-10-03 Thread aaron morton
Thanks Ed. A - Aaron Morton On 4/10/2011, at 5:05 AM, Jonathan Ellis wrote: > Nice! > > On Mon, Oct 3, 2011 at 10:33 AM, Ed Anuff wrote: >> I made mention of this during my presentation at the Cassandra S

Re: help needed interpreting Read/Write latency in cfstats and cfhistograms output

2011-10-03 Thread Ramesh Natarajan
Thanks Aaron. Is the "ms" in the latency microseconds or milliseconds? I ran the 2 commands at the same time. I was expecting the values to be somewhat similar, but from my earlier output you can see that the median read latency in the histogram output is about 10 milliseconds, whereas the cfs

node selection for replication factor 3

2011-10-03 Thread Ramesh Natarajan
I have 6 nodes in a cluster running RandomPartitioner with SimpleStrategy and replication factor 3. Let's say we insert a column with QUORUM consistency. Based on the MD5 hash, it decides to go to node 10.19.104.11. How does Cassandra pick the other 2 nodes? Is it sequential (.12 and .13) or any

disable mysterious GC

2011-10-03 Thread Yang
The following source code in the JDK's RMI implementation forces a full GC every hour if no old-gen GC has happened by then. /** maximum interval between complete garbage collections of local heap */ private static final long gcInterval = // default 1 hour AccessController.d

Re: disable mysterious GC

2011-10-03 Thread Jonathan Ellis
I would expect that client=nodetool and server=Cassandra. But sun's docs say that sun.rmi.dgc.server.gcInterval defaults to 60s which I am definitely NOT seeing. On Mon, Oct 3, 2011 at 4:12 PM, Yang wrote: > the following source code in jdk , RMI part, forces a full gc every 1 > hour , if no old

Re: disable mysterious GC

2011-10-03 Thread Yang
Looks like the doc is outdated: $ grep '\.gcInterval' ./j2se/src/share/classes/sun/rmi/transport/ObjectTable.java new GetLongAction("sun.rmi.dgc.server.gcInterval", 360)); On Mon, Oct 3, 2011 at 2:21 PM, Jonathan Ellis wrote: > I would expect that client=nodetool and server=Cassandra.

Re: disable mysterious GC

2011-10-03 Thread Yang
BTW, the first code snippet is from OpenJDK 7. On Mon, Oct 3, 2011 at 2:29 PM, Yang wrote: > looks doc is outdated : > > $ grep '\.gcInterval' > ./j2se/src/share/classes/sun/rmi/transport/ObjectTable.java > new GetLongAction("sun.rmi.dgc.server.gcInterval", 360)); > > > On Mon, Oct
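Below is a minimal sketch of the lookup pattern behind `sun.rmi.dgc.server.gcInterval` (and its `sun.rmi.dgc.client.gcInterval` counterpart). The property value is used if it is set and parses as a long; otherwise the code falls back to one hour, matching the "default 1 hour" comment in the JDK source quoted in this thread. This is a simplified stand-in modeled on `Long.getLong` semantics, not the JDK's own code, and the class name is hypothetical.

```java
import java.util.Properties;

class RmiGcInterval {
    static final String SERVER_PROP = "sun.rmi.dgc.server.gcInterval";
    static final String CLIENT_PROP = "sun.rmi.dgc.client.gcInterval";
    // One hour in milliseconds, the default named in the JDK source comment.
    static final long DEFAULT_MILLIS = 60L * 60L * 1000L;

    // Use the property if present and numeric, otherwise the default
    // (the same fallback behaviour as Long.getLong(name, default)).
    static long intervalMillis(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null) {
            return DEFAULT_MILLIS;
        }
        try {
            return Long.decode(value.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_MILLIS;
        }
    }
}
```

Starting the JVM with, for example, `-Dsun.rmi.dgc.server.gcInterval=86400000` would make the lookup return one day instead of one hour. Whether pushing the forced GC out that far is a good idea for a Cassandra node is a separate question.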

Running on Windows

2011-10-03 Thread Bryce Godfrey
I'm wondering what the consensus is for running a Cassandra cluster on top of Windows boxes? We are currently running a small 5 node cluster on top of CentOS without problems, so I have no desire to move. But we are a windows shop, and I have an IT department that is scared of Linux since they

nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-03 Thread Ramesh Natarajan
We have about 5000 column families, and when we run nodetool cfstats it throws this exception. This is running 1.0.0-rc1; it seems to work on 0.8.6. Is this a bug in 1.0.0? thanks Ramesh Keyspace: system Read Count: 28 Read Latency: 5.8675 ms. Write Count: 3

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-03 Thread Jonathan Ellis
Looks like you have unexpectedly large rows in your 1.0 cluster but not 0.8. I guess you could use sstable2json to manually check your row sizes. On Mon, Oct 3, 2011 at 5:20 PM, Ramesh Natarajan wrote: > It happens all the time on 1.0. It doesn't happen on 0.8.6.  Is there any > thing I can do t

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-03 Thread Ramesh Natarajan
It happens all the time on 1.0. It doesn't happen on 0.8.6. Is there anything I can do to check? thanks Ramesh On Mon, Oct 3, 2011 at 5:15 PM, Jonathan Ellis wrote: > My suspicion would be that it has more to do with "rare case when > running with 5000 CFs" than "1.0 regression." > > On Mon,

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-03 Thread Ramesh Natarajan
We recreated the schema using the same input file on both clusters, and they are running an identical load. Isn't the exception thrown in the system CF? This line looks strange: Compacted row maximum size: 9223372036854775807 thanks Ramesh On Mon, Oct 3, 2011 at 5:26 PM, Jonathan Ellis wrote: >
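9223372036854775807 is exactly `Long.MAX_VALUE` (2^63 - 1), which looks more like a sentinel than a measured row size. Below is a hypothetical sketch of one way such a value can appear. It assumes the statistic comes from a bucketed histogram that reports `Long.MAX_VALUE` as its maximum once any value falls past the largest bucket. That fits the "unexpectedly large rows" explanation, but the class is illustrative only and is not Cassandra's code.

```java
class BucketedHistogram {
    private final long[] offsets; // ascending bucket upper bounds
    private final long[] counts;  // one slot per bucket plus a final overflow slot

    BucketedHistogram(long[] offsets) {
        this.offsets = offsets.clone();
        this.counts = new long[offsets.length + 1];
    }

    void add(long value) {
        int i = 0;
        while (i < offsets.length && value > offsets[i]) {
            i++;
        }
        counts[i]++; // i == offsets.length means the value overflowed every bucket
    }

    // Upper bound of the highest non-empty bucket; unbounded if anything overflowed.
    long max() {
        if (counts[offsets.length] > 0) {
            return Long.MAX_VALUE;
        }
        for (int i = offsets.length - 1; i >= 0; i--) {
            if (counts[i] > 0) {
                return offsets[i];
            }
        }
        return 0L;
    }
}
```

A single row larger than the last bucket is enough to turn the reported maximum into `Long.MAX_VALUE`. Under this assumption, checking for very large rows (for example with sstable2json, as suggested in the thread) is the natural next step.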

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan wrote: > Thanks for the pointers.  I checked the system and the iostat showed that we > are saturating the disk to 100%. The disk is SCSI device exposed by ESXi and > it is running on a dedicated lun as RAID10 (4 600GB 15k drives) connected to > ESX

Re: node selection for replication factor 3

2011-10-03 Thread Konstantin Naryshkin
It picks sequentially (the two previous ones, I believe). So in your example it would be 105.12 and 105.11 - Original Message - From: "Ramesh Natarajan" To: user@cassandra.apache.org Sent: Monday, October 3, 2011 5:06:10 PM Subject: node selection for replication factor 3 I have 6 nod

Re: node selection for replication factor 3

2011-10-03 Thread Shyamal Prasad
"Ramesh" == Ramesh Natarajan writes: Ramesh> I have 6 nodes in a cluster running RandonPartitioner with Ramesh> SimpleStrategy and replication factor 3.  Lets say we insert Ramesh> a column with a QUORUM consistency. Based on the md5 hash Ramesh> it decides to go to node 10.19.

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-03 Thread Jonathan Ellis
My suspicion would be that it has more to do with "rare case when running with 5000 CFs" than "1.0 regression." On Mon, Oct 3, 2011 at 5:00 PM, Ramesh Natarajan wrote: > We have about 5000 column family and when we run the nodetool cfstats it > throws out this exception...  this is running 1.0.0-

Re: node selection for replication factor 3

2011-10-03 Thread Jonathan Ellis
Depends on the replication strategy used. http://www.datastax.com/docs/0.8/cluster_architecture/replication On Mon, Oct 3, 2011 at 4:06 PM, Ramesh Natarajan wrote: > > I have 6 nodes in a cluster running RandonPartitioner with SimpleStrategy > and replication factor 3.  Lets say we insert a colu
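For SimpleStrategy specifically, the replicas are the node whose token range contains the key, plus the next nodes clockwise around the ring (in increasing token order, wrapping around). Racks and data centers are ignored. Below is a minimal sketch. The token function mirrors RandomPartitioner's approach of taking the MD5 of the key as a non-negative BigInteger (an assumption about the details). The node addresses and toy tokens are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SimpleStrategySketch {
    // RandomPartitioner-style token: MD5 of the key bytes as a non-negative BigInteger.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // ring maps each node's token to its (hypothetical) address.
    static List<String> replicas(TreeMap<BigInteger, String> ring, BigInteger keyToken, int rf) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        // Primary replica: first node whose token is >= the key's token,
        // wrapping to the lowest token if the key sorts after every node.
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(keyToken);
        if (e == null) {
            e = ring.firstEntry();
        }
        int wanted = Math.min(rf, ring.size());
        while (result.size() < wanted) {
            result.add(e.getValue());
            // Next node clockwise: the next higher token, wrapping around.
            e = ring.higherEntry(e.getKey());
            if (e == null) {
                e = ring.firstEntry();
            }
        }
        return result;
    }
}
```

With nodes at tokens 0, 100, 200 and 300, a key whose token is 150 lands on the node at 200. With RF 3, the other two replicas are the nodes at 300 and 0.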

Re: node selection for replication factor 3

2011-10-03 Thread Edward Capriolo
On Mon, Oct 3, 2011 at 6:16 PM, Jonathan Ellis wrote: > Depends on the replication strategy used. > > http://www.datastax.com/docs/0.8/cluster_architecture/replication > > On Mon, Oct 3, 2011 at 4:06 PM, Ramesh Natarajan > wrote: > > > > I have 6 nodes in a cluster running RandonPartitioner with

Re: CQL select not working for CF defined programmatically with Hector API

2011-10-03 Thread Eric Evans
On Mon, Oct 3, 2011 at 12:02 PM, Alexandru Sicoe wrote: > Hi, >  I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I > define a CF with comparator LongType like this: > >     BasicColumnFamilyDefinition columnFamilyDefinition = new > BasicColumnFamilyDefinition(); >        

Re: Cassandra JVM heap size

2011-10-03 Thread Yi Yang
Someone on this mailing list just said that a bigger heap results in longer GC pauses, which could be one reason not to use a larger heap. But I have heard of others running Cassandra with heaps of some 60 gigabytes.

Re: Weird problem with empty CF

2011-10-03 Thread Daning Wang
Lots of SliceQueryFilter entries in the log; is that handling tombstones? DEBUG [ReadStage:49] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 123) collecting 0 of 1: 1317582939743663:true:4@1317582939933000 DEBUG [ReadStage:50] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 123) collecting 0 of 1

Re: Cassandra JVM heap size

2011-10-03 Thread Jonathan Ellis
That's misleading, because you don't necessarily need to give the memory to the JVM for Cassandra to make use of it. (See, for example, http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management.) In fact it's counterproductive to increase heap size past

Re: Cassandra JVM heap size

2011-10-03 Thread Ramesh Natarajan
Thanks. We are not planning to use the row cache, because we don't anticipate requests for the same row coming in often, and we would rather let the OS do the caching. So does this mean that in my case, instead of running 6 servers with 100 GB each, I can run 75 servers with 8 GB RAM and set the Xms/Xmx to 4

Re: Cassandra JVM heap size

2011-10-03 Thread Jonathan Ellis
Sure, other things being equal. Of course, other things are not truly equal and in practice I think dual-quad-core, 32GB servers are at a good sweet spot for a lot of applications. As a rule of thumb, inserts will be cpu-bound and reads will be ram/io bound. On Mon, Oct 3, 2011 at 11:10 PM, Rame
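The "other things being equal" comparison checks out on total memory: 6 servers at 100 GB and 75 servers at 8 GB are both 600 GB of RAM. For the per-node heap, here is a sketch of one common rule of thumb: max(min(RAM/2, 1 GB), min(RAM/4, 8 GB)). I believe this is what later cassandra-env.sh scripts compute by default. Treat it as an assumption, not as the 0.8/1.0 behaviour. The class name is hypothetical.

```java
class HeapSizing {
    // Assumed rule of thumb: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)).
    static long suggestedMaxHeapMb(long ramMb) {
        long halfCapped = Math.min(ramMb / 2, 1024L);
        long quarterCapped = Math.min(ramMb / 4, 8192L);
        return Math.max(halfCapped, quarterCapped);
    }

    // Total cluster RAM, for comparing "few big boxes" against "many small boxes".
    static long clusterRamGb(int nodes, int ramGbPerNode) {
        return (long) nodes * ramGbPerNode;
    }
}
```

Under this rule an 8 GB box would get a 2 GB heap, and anything from 32 GB up would stop at 8 GB. The rest of the RAM is left to the OS page cache, which matches the plan to let the OS do the caching.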