Unfortunately there's no way to do this in Cassandra right now, except
by using another row as an index, as you're doing right now.
Of course you could also store by source_id.date and have a batch job
iterate over all sources to compute the top 100. It would no longer be
real time, though.
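For the batch route, a minimal sketch of the top-100 selection step
(illustrative Java; the map of per-source counts and all names are
hypothetical, and the Cassandra read side is elided):

import java.util.*;

// Illustrative only: assumes the day's count for each source has already
// been read out of the source_id.date rows into a map. A bounded min-heap
// keeps the top 100 without sorting everything.
public class Top100Batch {
    public static List<Map.Entry<String, Long>> top100(Map<String, Long> countsBySource) {
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Long> e : countsBySource.entrySet()) {
            heap.offer(e);
            if (heap.size() > 100) {
                heap.poll(); // evict the smallest of the current candidates
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }
}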
-
On Mon, Oct 3, 2011 at 9:14 AM, Pierre-Yves Ritschard
wrote:
> Unfortunately there's no way to do this in Cassandra right now, except
> by using another row as an index, as you're doing right now.
>
> Of course you could also store by source_id.date and have a batch job
> iterate over all sources t
Nothing against annotations, they are like post-it notes from pixies.
It's more about what you do with them.
A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 3/10/2011, at 12:03 PM, Peter Lin wrote:
> It can be dangerous if wielded like
Other than manually pulling them from JMX, not really.
Most monitoring templates will grab those stats per cf (and perhaps per ks).
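If you want to script it rather than click around in jconsole, something
along these lines works (a sketch only; the MBean name pattern below is the
0.8-era one and 7199 is the default JMX port, so verify both against your
build):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Pulls per-CF read counts over JMX, the same numbers nodetool cfstats shows.
public class CfStatsOverJmx {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Wildcard pattern matches every column family in every keyspace.
            ObjectName pattern = new ObjectName(
                "org.apache.cassandra.db:type=ColumnFamilies,keyspace=*,columnfamily=*");
            for (ObjectName name : mbsc.queryNames(pattern, null)) {
                System.out.println(name + " ReadCount="
                    + mbsc.getAttribute(name, "ReadCount"));
            }
        } finally {
            jmxc.close();
        }
    }
}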
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 3/10/2011, at 3:41 PM, Marcus Both wrote:
> Hi,
>
The annotations I'm thinking of are pretty simple:
keyspace
key
composite key
column
It's probably easier if I post it on GitHub so that others can see.
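Roughly, something like this (the names here are just a sketch, not the
actual code):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch only: hypothetical annotation names for the four concepts above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Keyspace { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Key { }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CompositeKey { int ordinal(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String name() default ""; }

// A mapped entity would then read like:
@Keyspace("metrics")
class Measurement {
    @Key String sourceId;                       // row key
    @CompositeKey(ordinal = 0) long timestamp;  // first composite component
    @Column(name = "value") double reading;     // regular column
}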
peter lin
On Mon, Oct 3, 2011 at 5:46 AM, aaron morton wrote:
> Nothing against annotations, they are like post-it notes from pixies.
>
> More
I did an extra test, again starting from scratch but with replication factor 1.
I still get the dead/up messages and timeout exceptions, but the system keeps
running and storing. However, I ran out of disk space, which naturally
produced a lot of other errors.
Then I restarted the Cassandra servers, so
I made mention of this during my presentation at the Cassandra Summit
back in July, but we're finally ready to release the source for
Usergrid. This is a mobile platform stack built on top of Cassandra
and using Hector and we're making the full source code available on
GitHub. We'll be offering i
This should be quite helpful as a reference. Thanks!
On Mon, Oct 3, 2011 at 9:03 PM, Ed Anuff wrote:
> I made mention of this during my presentation at the Cassandra Summit
> back in July, but we're finally ready to release the source for
> Usergrid. This is a mobile platform stack built on top
I am running a Cassandra 0.8.6 cluster. I started a clean test setup and ran
my tests for a while. Later, when I ran cfstats and cfhistograms (both run
at the same time), the values for read/write latency didn't match. As per
cfstats, the latencies for read and write are 5.086 and 0.018 ms respec
On the 'invalid column name length 0' exception: since you're embedding the
Cassandra server, it could be that you modify a column ByteBuffer that you
feed to Cassandra (that's fairly easy to do with ByteBuffer, by calling one
of its relative get methods). Or more generally
that you feed a ze
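A small self-contained illustration of that relative-get pitfall
(hypothetical code, just to show the mechanics):

import java.nio.ByteBuffer;

// A relative get() advances the buffer's position as a side effect, so code
// that later reads position..limit sees fewer bytes -- possibly zero, which
// looks exactly like a zero-length column name.
public class ByteBufferPitfall {
    public static void main(String[] args) {
        ByteBuffer name = ByteBuffer.wrap("col".getBytes());
        System.out.println(name.remaining()); // 3

        while (name.hasRemaining()) {
            name.get(); // relative read: moves position forward
        }
        System.out.println(name.remaining()); // 0 -- now an "empty" name

        // Safer: inspect through a duplicate() so the original is untouched.
        ByteBuffer safe = ByteBuffer.wrap("col".getBytes());
        ByteBuffer view = safe.duplicate();
        while (view.hasRemaining()) view.get();
        System.out.println(safe.remaining()); // still 3
    }
}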
Nice!
On Mon, Oct 3, 2011 at 10:33 AM, Ed Anuff wrote:
> I made mention of this during my presentation at the Cassandra Summit
> back in July, but we're finally ready to release the source for
> Usergrid. This is a mobile platform stack built on top of Cassandra
> and using Hector and we're maki
Hey folks, I pushed my Scala wrapper of Hector for Cassandra:
https://github.com/joestein/skeletor
It not only gets Cassandra hooked into your Scala projects quickly and simply
but does so in a functional way.
It is not a new library interface for Cassandra because Hector is a great
library as is.
Hi,
I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I
define a CF with comparator LongType like this:

BasicColumnFamilyDefinition columnFamilyDefinition =
    new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName("XXX");
columnFamilyDef
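(A complete definition along those lines might look like the following; this
is a sketch assuming the Hector 0.8 API, with package names from memory, so
verify them against your jar:)

import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.factory.HFactory;

public class LongTypeCfExample {
    public static void main(String[] args) {
        BasicColumnFamilyDefinition def = new BasicColumnFamilyDefinition();
        def.setKeyspaceName("XXX");
        def.setName("Timeline");                         // hypothetical CF name
        def.setComparatorType(ComparatorType.LONGTYPE);  // column names sort as longs
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        cluster.addColumnFamily(new ThriftCfDef(def));
    }
}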
I am running a Cassandra cluster of 6 nodes running RHEL6 virtualized by
ESXi 5.0. Each VM is configured with 20GB of RAM and 12 cores. Our test
setup performs about 3000 inserts per second. The Cassandra data partition
is on an XFS filesystem mounted with options
(noatime,nodiratime,nobarrier,l
The tokens were different than the production cluster and after closer
inspection a lot of data wasn't queryable (as expected, I suppose). I set
the tokens and everything seems ok now.
Auto bootstrap was false so no issues there.
Thanks for the insight Shyamal! It's good to finally have this up
On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote:
> I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by
> ESXi 5.0. Each VM is configured with 20GB of ram and 12 cores. Our test
> setup performs about 3000 inserts per second. The cassandra data partition
> is on a X
We have 5 CFs. Attached is the output from the describe command. We don't
have row cache enabled.
Thanks
Ramesh
Keyspace: MSA:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]
  Column Families:
    ColumnFamily: admin
Maybe try row cache?
Have you enabled mlock? (Needs jna.jar, and set ulimit -l.)
Using iostat -x would also give you more clues about disk performance.
On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote:
> I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by
> ESX
I am wondering if you are seeing issues because of more frequent
compactions kicking in. Is this primarily write ops or reads too?
During the period of test gather data like:
1. cfstats
2. tpstats
3. compactionstats
4. netstats
5. iostat
You have RSS memory close to 17 GB. Maybe someone can give f
Hi, we're trying to set up a cluster to run brisk/hadoop jobs on, and part of
that setup is copying sstables from another cluster running 0.8.4. Could
there be any compatibility issues with the files there, since the brisk beta2
package uses 0.8.1? So far it seems to work fine, but now I'm a little
nerv
I will start another test run to collect these stats. Our test model is in
the neighborhood of 4500 inserts, 8000 updates/deletes and 1500 reads per
second across 6 servers.
Can you elaborate on reducing the heap space? Do you think the 17G RSS is a
problem?
thanks
Ramesh
On Mon, Oc
To understand what's going on, you might want to first do just a write
test, look at the results, then do just the read tests, and
then do combined read/write tests.
Since you mentioned high update/deletes, I should also ask: what CL do you
use for writes/reads? With high updates/deletes + a high CL, I think
Nope, you're good to go.
On Mon, Oct 3, 2011 at 1:34 PM, Eric Czech wrote:
> Hi, we're trying to setup a cluster to run brisk/hadoop jobs on and part of
> that setup is copying sstables from another cluster running 0.8.4. Could
> there be any compatibility issues with the files there since the bri
Most likely you are running single-threaded compaction. Look at
cassandra.yaml for how to enable multi-threaded compaction. As more data
comes into the system, bigger files get created during compaction. You could
be in a situation where you might be compacting
at a hi
If he puts the mx4j jar (http://mx4j.sourceforge.net/) in his lib/ folder,
he can fetch stats out over HTTP. mx4j is a bridge for JMX->HTTP.
On Mon, Oct 3, 2011 at 2:53 AM, aaron morton wrote:
> Other than manually pull them from JMX, not really.
>
> Most monitoring templates will grab those sta
Thanks for the pointers. I checked the system and iostat showed that we are
saturating the disk at 100%. The disk is a SCSI device exposed by ESXi,
running on a dedicated LUN as RAID10 (4 x 600GB 15k drives) connected to the
ESX host via iSCSI.
When I run compactionstats I see we are compact
Yes, look at cassandra.yaml; there is a section about throttling compaction.
You still *want* multi-threaded compaction; throttling occurs across all
threads. The reason is that you don't want to get stuck compacting
bigger files while the smaller ones build up waiting for bigger compactio
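The relevant knobs look something like this (0.8-era option names; check the
comments in your own cassandra.yaml):

# Throttles total compaction I/O, applied across all compaction threads.
# 16 is the shipped default; 0 disables throttling entirely.
compaction_throughput_mb_per_sec: 16

# Allow multiple compactions to run in parallel.
multithreaded_compaction: true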
Hi Ramesh,
Both tools output the "recent" latency, and while they do this slightly
differently, the result is that it's the latency since the last time it was
checked. Also, the two tools use different counters, so using cfstats will
not update cfhistograms.
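A toy illustration of why two "recent" counters read at the same moment
still disagree (not Cassandra's actual code):

import java.util.concurrent.atomic.AtomicLong;

// A "recent" counter resets every time it is read, so each reader only sees
// the window since its own previous read. Two tools reading two separate
// counters therefore report different windows and rarely agree.
public class RecentLatency {
    private final AtomicLong totalMicros = new AtomicLong();
    private final AtomicLong ops = new AtomicLong();

    public void record(long micros) {
        totalMicros.addAndGet(micros);
        ops.incrementAndGet();
    }

    // Average latency since the previous call; reading clears the window.
    public double recentMillis() {
        long n = ops.getAndSet(0);
        long total = totalMicros.getAndSet(0);
        return n == 0 ? 0.0 : (total / 1000.0) / n;
    }
}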
S
Thanks Ed.
A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 4/10/2011, at 5:05 AM, Jonathan Ellis wrote:
> Nice!
>
> On Mon, Oct 3, 2011 at 10:33 AM, Ed Anuff wrote:
>> I made mention of this during my presentation at the Cassandra S
Thanks Aaron. The "ms" in the latency output: is it microseconds or
milliseconds? I ran the two commands at the same time. I was expecting the
values to be somewhat similar, but from my output earlier you can see the
median read latency in the histogram output is about 10 milliseconds whereas
the cfs
I have 6 nodes in a cluster running RandomPartitioner with SimpleStrategy
and replication factor 3. Let's say we insert a column with
QUORUM consistency.
Based on the md5 hash it decides to go to node 10.19.104.11. How does
Cassandra pick the other 2 nodes? Is it sequential ( .12 and .13 ) or any
The following source code in the JDK's RMI implementation forces a full GC
every hour if no old-gen GC has happened by then.

/** maximum interval between complete garbage collections of local heap */
private static final long gcInterval = // default 1 hour
    AccessController.doPrivileged(
        new GetLongAction("sun.rmi.dgc.server.gcInterval", 3600000));
I would expect that client=nodetool and server=Cassandra. But Sun's
docs say that sun.rmi.dgc.server.gcInterval defaults to 60s, which I am
definitely NOT seeing.
On Mon, Oct 3, 2011 at 4:12 PM, Yang wrote:
> the following source code in jdk , RMI part, forces a full gc every 1
> hour , if no old
Looks like the doc is outdated:

$ grep '\.gcInterval' ./j2se/src/share/classes/sun/rmi/transport/ObjectTable.java
new GetLongAction("sun.rmi.dgc.server.gcInterval", 3600000));
On Mon, Oct 3, 2011 at 2:21 PM, Jonathan Ellis wrote:
> I would expect that client=nodetool and server=Cassandra.
btw the first code snippet is from openjdk 7
On Mon, Oct 3, 2011 at 2:29 PM, Yang wrote:
> Looks like the doc is outdated:
>
> $ grep '\.gcInterval' ./j2se/src/share/classes/sun/rmi/transport/ObjectTable.java
> new GetLongAction("sun.rmi.dgc.server.gcInterval", 3600000));
>
>
> On Mon, Oct
I'm wondering what the consensus is on running a Cassandra cluster on top of
Windows boxes. We are currently running a small 5-node cluster on top of
CentOS without problems, so I have no desire to move. But we are a Windows
shop, and I have an IT department that is scared of Linux since they
We have about 5000 column families, and when we run nodetool cfstats it
throws this exception... this is running 1.0.0-rc1.
This seems to work on 0.8.6. Is this a bug in 1.0.0?
thanks
Ramesh
Keyspace: system
  Read Count: 28
  Read Latency: 5.8675 ms.
  Write Count: 3
Looks like you have unexpectedly large rows in your 1.0 cluster but
not 0.8. I guess you could use sstable2json to manually check your
row sizes.
On Mon, Oct 3, 2011 at 5:20 PM, Ramesh Natarajan wrote:
> It happens all the time on 1.0. It doesn't happen on 0.8.6. Is there
> anything I can do t
It happens all the time on 1.0. It doesn't happen on 0.8.6. Is there
anything I can do to check?
thanks
Ramesh
On Mon, Oct 3, 2011 at 5:15 PM, Jonathan Ellis wrote:
> My suspicion would be that it has more to do with "rare case when
> running with 5000 CFs" than "1.0 regression."
>
> On Mon,
We recreated the schema using the same input file on both clusters and they
are running identical load.
Isn't the exception thrown in the system CF?
this line looks strange (9223372036854775807 is Long.MAX_VALUE):
Compacted row maximum size: 9223372036854775807
thanks
Ramesh
On Mon, Oct 3, 2011 at 5:26 PM, Jonathan Ellis wrote:
>
On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan wrote:
> Thanks for the pointers. I checked the system and iostat showed that we
> are saturating the disk at 100%. The disk is a SCSI device exposed by ESXi,
> running on a dedicated LUN as RAID10 (4 x 600GB 15k drives) connected to
> ESX
It picks sequentially (the next ones on the ring, I believe). So in your
example it would be 104.12 and 104.13.
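For reference, a toy sketch of SimpleStrategy-style placement (not
Cassandra's actual code): the replicas are the first RF distinct nodes met
while walking the token ring clockwise from the row's token:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Toy model: the ring maps each node's token to its address, sorted by token.
// Assumes rf <= number of nodes in the ring.
public class RingWalk {
    public static List<String> replicasFor(BigInteger rowToken,
                                           NavigableMap<BigInteger, String> ring,
                                           int rf) {
        List<String> replicas = new ArrayList<String>();
        // Primary replica: first node whose token is >= the row token (wrapping).
        BigInteger t = ring.ceilingKey(rowToken) != null
                ? ring.ceilingKey(rowToken) : ring.firstKey();
        while (replicas.size() < rf) {
            replicas.add(ring.get(t));
            BigInteger next = ring.higherKey(t);
            t = (next != null) ? next : ring.firstKey(); // wrap around
        }
        return replicas;
    }
}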
- Original Message -
From: "Ramesh Natarajan"
To: user@cassandra.apache.org
Sent: Monday, October 3, 2011 5:06:10 PM
Subject: node selection for replication factor 3
I have 6 nod
"Ramesh" == Ramesh Natarajan writes:
Ramesh> I have 6 nodes in a cluster running RandomPartitioner with
Ramesh> SimpleStrategy and replication factor 3. Let's say we insert
Ramesh> a column with a QUORUM consistency. Based on the md5 hash
Ramesh> it decides to go to node 10.19.
My suspicion would be that it has more to do with "rare case when
running with 5000 CFs" than "1.0 regression."
On Mon, Oct 3, 2011 at 5:00 PM, Ramesh Natarajan wrote:
> We have about 5000 column families and when we run nodetool cfstats it
> throws out this exception... this is running 1.0.0-
Depends on the replication strategy used.
http://www.datastax.com/docs/0.8/cluster_architecture/replication
On Mon, Oct 3, 2011 at 4:06 PM, Ramesh Natarajan wrote:
>
> I have 6 nodes in a cluster running RandomPartitioner with SimpleStrategy
> and replication factor 3. Let's say we insert a colu
On Mon, Oct 3, 2011 at 6:16 PM, Jonathan Ellis wrote:
> Depends on the replication strategy used.
>
> http://www.datastax.com/docs/0.8/cluster_architecture/replication
>
> On Mon, Oct 3, 2011 at 4:06 PM, Ramesh Natarajan
> wrote:
> >
> > I have 6 nodes in a cluster running RandomPartitioner with
On Mon, Oct 3, 2011 at 12:02 PM, Alexandru Sicoe wrote:
> Hi,
> I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I
> define a CF with comparator LongType like this:
>
> BasicColumnFamilyDefinition columnFamilyDefinition = new
> BasicColumnFamilyDefinition();
>
Someone on this mailing list just talked about heap size, saying that a
bigger heap results in a longer GC phase; that could be one reason not to
use a larger heap size.
But I have heard of others using Cassandra with some 60 gigabytes
of heap.
Sent from my Bla
Lots of SliceQueryFilter entries in the log; is that handling tombstones?
DEBUG [ReadStage:49] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 123) collecting 0 of 1: 1317582939743663:true:4@1317582939933000
DEBUG [ReadStage:50] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 123) collecting 0 of 1
That's misleading, because you don't necessarily need to give the
memory to the JVM for Cassandra to make use of it. (See, for example,
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management.)
In fact it's counterproductive to increase heap size past
Thanks. We are not planning to use row cache because we don't anticipate
requests for the same row coming in often, and we would rather let the OS do
the caching. So does this mean that in my case, instead of running 6 servers
with 100 GB each, I can run 75 servers with 8 GB RAM and set the Xms/Xmx to
4
Sure, other things being equal.
Of course, other things are not truly equal, and in practice I think
dual-quad-core, 32GB servers are a good sweet spot for a lot of
applications.
As a rule of thumb, inserts will be CPU-bound and reads will be RAM/IO-bound.
On Mon, Oct 3, 2011 at 11:10 PM, Rame