This is because of the "warm up" of Cassandra as it starts. On startup it
fetches the rows that were cached; these have to be loaded from disk,
as there is nothing in the cache yet. You can read more
about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations
2012/2/13 R. Verlangen
> This is because of the "warm up" of Cassandra as it starts. On a start it
> will start fetching the rows that were cached: this will have to be loaded
> from the disk, as there is nothing in the cache yet. You can read more
> about this at http://wiki.apache.org/cassandr
I also noticed that Cassandra appears to perform better under a continuous
load.
Are you sure the rows you're querying are actually in the cache?
2012/2/13 Franc Carter
> 2012/2/13 R. Verlangen
>
>> This is because of the "warm up" of Cassandra as it starts. On a start it
>> will start fetching
> I actually have the opposite 'problem'. I have a pair of servers that have
> been static since mid last week, but have seen performance vary
> significantly (x10) for exactly the same query. I hypothesised it was
> various caches so I shut down Cassandra, flushed the O/S buffer cache and
> then bo
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller wrote:
> > I actually have the opposite 'problem'. I have a pair of servers that have
> > been static since mid last week, but have seen performance vary
> > significantly (x10) for exactly the same query. I hypothesised it was
> > various caches so
2012/2/13 R. Verlangen
> I also noticed that Cassandra appears to perform better under a continuous
> load.
>
> Are you sure the rows you're querying are actually in the cache?
>
I'm making an assumption . . . I don't yet know enough about cassandra to
prove they are in the cache. I have my keyc
> Yep - I've been looking at these - I don't see anything in iostat/dstat etc
> that points strongly to a problem. There is quite a bit of I/O load, but it
> looks roughly uniform on slow and fast instances of the queries. The last
> compaction ran 4 days ago - which was before I started seeing vari
> I'm making an assumption . . . I don't yet know enough about cassandra to
> prove they are in the cache. I have my keycache set to 2 million, and am
> only querying ~900,000 keys. so after the first time I'm assuming they are
> in the cache.
Note that the key cache only caches the index positio
For one thing, what does ReadStage's pending look like if you
repeatedly run "nodetool tpstats" on these nodes? If you're simply
bottlenecking on I/O on reads, that is the easiest and most direct way to
observe this empirically. If you're saturated, you'll see active close
to maximum at all times, and
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller wrote:
> > I'm making an assumption . . . I don't yet know enough about cassandra
> to
> > prove they are in the cache. I have my keycache set to 2 million, and am
> > only querying ~900,000 keys. so after the first time I'm assuming they
> are
> >
What is your total data size (nodetool info/nodetool ring) per node,
your heap size, and the amount of memory on the system?
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller wrote:
> > Yep - I've been looking at these - I don't see anything in iostat/dstat
> etc
> > that points strongly to a problem. There is quite a bit of I/O load, but
> it
> > looks roughly uniform on slow and fast instances of the queries. The last
>
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller wrote:
> For one thing, what does ReadStage's pending look like if you
> repeatedly run "nodetool tpstats" on these nodes? If you're simply
> bottlenecking on I/O on reads, that is the easiest and most direct way to
> observe this empirically. If you'r
> the servers spending >50% of the time in io-wait
Note that I/O wait is not necessarily a good indicator, depending on
situation. In particular if you have multiple drives, I/O wait can
mostly be ignored. Similarly if you have non-trivial CPU usage in
addition to disk I/O, it is also not a good i
> Yep, the readstage is backlogging consistently - but the thing I am trying
> to explain is why it is good sometimes in an environment that is pretty well
> controlled - other than being on ec2
So pending is constantly > 0? What are the clients? Is it batch jobs
or something similar where there is
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller wrote:
> What is your total data size (nodetool info/nodetool ring) per node,
> your heap size, and the amount of memory on the system?
>
2 Node cluster, 7.9GB of ram (ec2 m1.large)
RF=2
11GB per node
Quorum reads
122 million keys
heap size is 1867
> 2 Node cluster, 7.9GB of ram (ec2 m1.large)
> RF=2
> 11GB per node
> Quorum reads
> 122 million keys
> heap size is 1867M (default from the AMI I am running)
> I'm reading about 900k keys
Ok, so basically a very significant portion of the data fits in page
cache, but not all.
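As a rough sanity check of that statement, here is a back-of-envelope sketch using only the figures quoted above (7.9GB of RAM, 1867M heap, 11GB per node). The function name is made up for illustration, and it assumes that all memory outside the JVM heap is usable as page cache, which overstates the real figure since the OS and off-heap JVM memory take their share.

```python
# Back-of-envelope estimate using the figures quoted in this thread.
# Assumption: everything outside the JVM heap is available as page cache.
# That is optimistic, so treat the result as an upper bound.

MIB_PER_GIB = 1024


def page_cache_fraction(ram_gib, heap_mib, data_gib):
    """Upper bound on the fraction of on-disk data that fits in page cache."""
    cache_mib = ram_gib * MIB_PER_GIB - heap_mib
    data_mib = data_gib * MIB_PER_GIB
    return min(1.0, cache_mib / data_mib)


if __name__ == "__main__":
    frac = page_cache_fraction(ram_gib=7.9, heap_mib=1867, data_gib=11)
    print(f"at most {frac:.0%} of the data fits in page cache")
```

With these numbers the bound comes out a little over half of the data, so reads that miss the page cache still have to go to disk.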
> As I was just go
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller wrote:
> > the servers spending >50% of the time in io-wait
>
> Note that I/O wait is not necessarily a good indicator, depending on
> situation. In particular if you have multiple drives, I/O wait can
> mostly be ignored. Similarly if you have non-
Are there plans to write a partitioner based on a faster hash algorithm
instead of MD5? I profiled Cassandra and a lot of time is spent inside the
MD5 function.
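For readers unfamiliar with why MD5 shows up in a profile at all: a hash-based partitioner has to hash every row key to place it on the ring. The sketch below is only an illustration of that idea, not Cassandra's code. The function name and the exact conversion (digest read as a signed big-endian integer, then made non-negative) are assumptions.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Illustrative ring token for a row key.

    Assumption: the 128-bit MD5 digest is read as a signed big-endian
    integer and made non-negative. This is a sketch, not Cassandra's code.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    for key in (b"row-1", b"row-2"):
        print(key, md5_token(key))
```

Every read and write pays for one such hash per key, which is why a cheaper hash function could matter on a CPU-bound node.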
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller wrote:
> > 2 Node cluster, 7.9GB of ram (ec2 m1.large)
> > RF=2
> > 11GB per node
> > Quorum reads
> > 122 million keys
> > heap size is 1867M (default from the AMI I am running)
> > I'm reading about 900k keys
>
> Ok, so basically a very significan
https://issues.apache.org/jira/browse/CASSANDRA-3772
2012/2/13 Radim Kolar :
> Are there plans to write a partitioner based on a faster hash algorithm instead of
> MD5? I profiled Cassandra and a lot of time is spent inside the MD5 function.
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.10.
Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:
http://cassan
Hi,
I've been looking at tpstats as various test queries run and I noticed
something I don't understand.
I have a two node cluster with RF=2 on which I run 4 parallel queries, each
job goes through a list of keys doing a multiget for 2 keys at a time. If
two of the queries go to one node and the
Hi Cassandra Users,
I have heard that indexing a field with high cardinality is not good. If we create a
CF to store the index information, with the indexed field as the key and the keys of
the original CF as columns in the row, will there be any performance improvement? Is
this the way secondary indexes are maintain
Hello everybody
I have a very simple cluster containing 2 servers. Replication_factor = 2,
Consistency_level of reads and writes = 1
10.111.1.141 datacenter1 rack1 Up Normal 1.5 TB 100.00%
vjpigMzv4KkX3x7z
10.111.1.142 datacenter1 rack1 Up Normal 1.41 TB
Hi Nikolay,
Some points that may be useful:
1/ auto_bootstrap = true is used for telling a new node to join the ring
(the cluster). It has nothing to do with hinted handoff
2/ both of your nodes seem to be using the same token? The output indicates
that 100% of your key range is assigned to 10.1
Sorry if this is the 4th copy of this letter, but cassandra.apache.org constantly
tells me that my message looks like spam...
> 2/ both of your nodes seem to be using the same token? The output indicates
> that 100% of your key range is assigned to 10.111.1.141 (and
> therefore 10.111.1.142 holds repl
Hi Guys,
A very trivial question on the batch mutation provided by Hector. Is the execution
of the batch sequential (in the order the data is added)?
Also, say there are 10 operations in a batch and the 3rd fails: will it try the
remaining 7?
Is execution of the batch mutator multithreaded?
Regards,
Dushyant
Hi all,
I am nursing an overloaded 0.6 cluster through compaction to get its disk
usage under 50%. Many rows' content has been replaced so that after
compaction there will be plenty of room, but a couple of nodes are
currently at 95%.
One strategy I considered is temporarily moving a couple of t
My understanding is you expected to see
111:ticks
222:ticks
333:ticks
444:ticks
But instead you are getting
111:ticks
111:quote
222:ticks
222:quote
333:ticks
333:quote
444:ticks
If that is the case things are working as expected.
The slice operation gets a column range. So if you start at 1
What CL are you reading at?
Write ops go to RF number of nodes. Read ops go to RF number of nodes 10% of
the time (the default probability that Read Repair will be running) and to CL
number of nodes 90% of the time. With 2 nodes and RF 2 the QUORUM is 2, so
every request will involve all nodes.
A
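The arithmetic above can be sketched as follows. This is a minimal illustration, assuming the 10% read repair probability mentioned in the message; the function names are made up for the example.

```python
def quorum(rf: int) -> int:
    """Number of replicas needed for QUORUM: a strict majority of RF."""
    return rf // 2 + 1


def expected_nodes_per_read(rf: int, cl_nodes: int,
                            read_repair_chance: float = 0.1) -> float:
    """Average number of replicas touched by one read.

    With probability read_repair_chance all RF replicas are read,
    otherwise only as many as the consistency level requires.
    """
    return read_repair_chance * rf + (1 - read_repair_chance) * cl_nodes


if __name__ == "__main__":
    rf = 2
    print("QUORUM for RF=2:", quorum(rf))
    print("nodes per QUORUM read:", expected_nodes_per_read(rf, quorum(rf)))
    print("nodes per CL ONE read:", expected_nodes_per_read(rf, 1))
```

With RF 2, QUORUM and ALL both mean 2 replicas, so read repair makes no difference to how many nodes a QUORUM read involves. At CL ONE the average would drop to about 1.1.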
> Heard that indexing a field with high cardinality is not good.
http://www.datastax.com/docs/0.7/data_model/secondary_indexes
> Will there be any performance improvement? Is this the way secondary indexes
> are maintained?
Updating secondary indexes requires a read and a write.
> Also this ma
> Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly
> tells me that my message looks like spam…
Send as text.
What version are you using?
It looks like you are using the ByteOrderedPartitioner, is that correct?
I would try to get the repair done first, what was the
> Is the execution of the batch sequential? (in the order data is added).
No, it runs in parallel; see concurrent_writes in cassandra.yaml.
> Also say there are 10 operations in a batch and 3rd fails will it try the
> remaining 7?
http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
Cheers
---
if the composite column was rearranged as ticks:111 wouldn't the result be as
desired?
- Original Message -
From: "aaron morton" <aa...@thelastpickle.com>
If you want to get all the ticks between two integers, yes.
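To make the effect of the component order concrete, here is a small sketch that models a row as a sorted list of composite column names and a slice as a contiguous range of that list. Python tuple ordering stands in for the composite comparator, which is an assumption, so the exact order within one integer may differ from what Cassandra returns. The helper name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_cols, start, finish):
    """Return the contiguous run of columns with start <= col <= finish."""
    lo = bisect_left(sorted_cols, start)
    hi = bisect_right(sorted_cols, finish)
    return sorted_cols[lo:hi]


ids = [111, 222, 333, 444]
kinds = ["quote", "ticks"]

# Layout from the original question: integer first, then the name.
by_id = sorted((i, k) for i in ids for k in kinds)
# Rearranged layout suggested above: name first, then the integer.
by_kind = sorted((k, i) for i in ids for k in kinds)

# A range over the integer-first layout sweeps up the quote columns too.
mixed = slice_columns(by_id, (111, "ticks"), (444, "ticks"))
# A range over the name-first layout stays inside the ticks columns.
only_ticks = slice_columns(by_kind, ("ticks", 111), ("ticks", 444))

if __name__ == "__main__":
    print(mixed)
    print(only_ticks)
```

A slice is a single contiguous range, so putting the name first keeps all the ticks columns adjacent and lets one slice cover exactly them.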
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 14/02/2012, at 8:36 AM, Dave Brosius wrote:
> if the composite column was rearranged as
>
> ticks:111
>
> wouldn't the result be as des
Too easy. Does anybody have a more difficult approach? :) Just kidding.
Thanks, Aaron.
On Mon, Feb 13, 2012 at 11:43 AM, aaron morton wrote:
> I am nursing an overloaded 0.6 cluster
>
> Shine on you crazy diamond.
>
> If you have some additional storage available I would:
>
> 1) Allocate a data d
Hi All
I am using expiring columns in my column family, and need to search for
the rows where a particular column expired (and no longer exists). I am
using Hector client. How can I make a query to find the rows of my interest?
thanks
asankha
--
Asankha C. Perera
AdroitLogic, http://adroitl
Hi all,
Those in the UK might be interested in the next Cassandra London events:
Monday 20th February
Two talks: "Cassandra as an email storage system" and "CQL - then and now"
http://www.meetup.com/Cassandra-London/events/29569461/
Tuesday 6th March
How Netflix uses Cassandra with Adrian Coc
Hi Experts,
My program queries all keys on Cassandra. I want to do this
as quickly as possible, in order to get as close to real-time as possible.
One solution I heard was to use the sstables2json tool, and read the data
in as JSON. I understand that reading from each line in Cassan
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton wrote:
> What CL are you reading at ?
>
Quorum
>
> Write ops go to RF number of nodes, read ops go to RF number of nodes 10%
> (the default probability that Read Repair will be running) of the time and
> CL number of nodes 90% of the time. With 2 no
Hi
I got the exception below in system.log after upgrading to 1.0.7 from
1.0.6. I am using the same configuration files that I used in
1.0.6.
2012-02-14 10:48:12,379 ERROR [AbstractCassandraDaemon] Fatal exception in
thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerEx
Perfect, Aaron, Thanks a lot
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, February 14, 2012 12:54 AM
To: user@cassandra.apache.org
Subject: Re: Secondary indexes and cardinality
Heard that indexing a field with high cardinality is not good.
http://www.datastax.com/docs/0.7