Wed, May 5, 2010 at 6:36 PM, Mark Jones <mjo...@imagehawk.com> wrote:
Have you actually managed to get 10K reads/second, or are you just estimating
that you can? I've run into similar issues, but I never got reads to scale
when searching for unique keys, even using 40 threads.
...@gmail.com]
Sent: Wednesday, May 05, 2010 7:04 PM
To: user@cassandra.apache.org
Subject: Re: performance tuning - where does the slowness come from?
On Wed, May 5, 2010 at 6:59 PM, Mark Jones <mjo...@imagehawk.com> wrote:
My data is a single row/key to a 500-byte column, and I'm reading
I have a 3-node Cassandra cluster I'm trying to work with:
All three machines are about the same:
6-8 GB per machine (the fastest machine has 8 GB; the Java VM is limited to 5 GB)
Separate spindles for Cassandra data and the commit log
I wrote ~7 million items to Cassandra; now I'm trying to read them back, the
o
I got the idea for this from:
http://wiki.apache.org/cassandra/StorageConfiguration
I put my keyspace setup on a web server, and I pull it into the config like this:
storage-conf.xml starts with:
http://cassandraconfig /seeds.xml">
http://cassandraconfig /autobootstrap.xml">
http://cassandracon
rate to the
50/second/thread or 100/second/thread, regardless of who does the proxy.
-Original Message-
From: Mark Jones [mailto:mjo...@imagehawk.com]
Sent: Friday, April 02, 2010 1:38 PM
To: user@cassandra.apache.org
Subject: Slow Responses from 2 of 3 nodes in RC1
I have a 3 node cass
or are some finishing sooner than others? Is
your client CPU or disk perhaps the bottleneck?
On Fri, Apr 2, 2010 at 2:39 PM, Mark Jones wrote:
> To further complicate matters,
> when I read only from cassdb1, I can check about 100/second/thread (40
> threads)
> when I read only from
It shouldn't remove a node from the ring, should it? (It appears it did.)
It shouldn't remove data from the DB, should it? (The data size appears to grow,
but records are now missing.)
Loaded 38 million "rows" and the ring looked like this:
m...@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --h
The log said "Bootstrapping" at 07:34 (since it was 08:35, I assumed it wasn't
doing anything; also, CPU usage was < 10%).
It turns out that when I restarted the node, it claimed the time was 7:35 rather
than 8:35. Why would log4j be off by one hour? We are on CDT here, and have been
for more than a w
0 per second, how many rows
and columns does a check involve? What query API are you using?
Your Cassandra nodes look mostly idle. Is each client thread getting
the same amount of work, or are some finishing sooner than others? Is
your client CPU or disk perhaps the bottleneck?
On Fri, Apr 2,
From cfstats:
SSTable count: 3
Space used (live): 4951669191
Space used (total): 5237040637
Memtable Columns Count: 190266
Memtable Data Size: 23459012
Memtable Switch Count: 89
Read Cou
I have 3 nodes in the cluster, and
bin/nodetool --host this-host-name ring
Works as expected, but
bin/nodetool --host some-other-host ring
always throws this exception:
Error connecting to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested
exceptio
I don't see any way to increase the # of active Deserializers in
storage-conf.xml
tpstats more than 8 hours after inserts/reads stopped:
Pool Name                 Active   Pending   Completed
FILEUTILS-DELETE-POOL          0         0         227
STREAM-STAGE                   0
:jbel...@gmail.com]
Sent: Thursday, April 08, 2010 10:12 AM
To: user@cassandra.apache.org
Subject: Re: Some insight into the slow read speed. Where to go from here? RC1
MESSAGE-DESERIALIZER-POOL
Have you checked iostat -x?
On Thu, Apr 8, 2010 at 9:45 AM, Mark Jones wrote:
> I don't s
you are overwhelming the server with requests. Could you run sar
and find out how many bytes/sec you are receiving/transmitting?
Cheers
Avinash
On Thu, Apr 8, 2010 at 7:45 AM, Mark Jones <mjo...@imagehawk.com> wrote:
I don't see any way to increase the # of active Deserializers
Sounds like we are experiencing some of the same problems (I'm using 0.6 RC1). I
have a 3-node cluster with 8 GB/machine (dual-core CPUs). I'm peaking on inserts
at about 6000-7000/second running 40 threads, with separate spindles for the
commit log and data.
My read speed is atrocious: 800/sec sustained
in EC2 as well) takes 400,000 ticks. Super
consistent. One thread.
My friend's setup with Cassandra on OS X takes 400,000 ticks for the first
insert, then drops to 20,000 ticks for every consecutive call.
That's what is so strange.
On Apr 9, 2010 12:15 PM, "Mark Jones"
mailto:mjo
I'm seeing some issues like this as well; in fact, I think seeing your graphs
has helped me understand the dynamics of my cluster better.
Using some ballpark figures for inserting single-column objects of ~500 bytes
onto individual nodes (not when combined as a cluster):
Node1: Inserts 12000/s
N
I too am seeing very slow performance while testing worst-case scenarios of 1
key leading to 1 supercolumn and 1 column beyond that.
Key -> SuperColumn -> 1 Column (of ~500 bytes)
Drive utilization is 80-90%, and I'm only dealing with 50-70 million rows
(with NO swapping). So far, I've found
olumns are in the supercolumn total?
"in super columnfamilies there is a third level of subcolumns; these
are not indexed, and any request for a subcolumn deserializes _all_
the subcolumns in that supercolumn"
http://wiki.apache.org/cassandra/CassandraLimitations
On Tue, Apr 20, 2010 at 9:50 AM, Mark Jones wrote:
> I too am seeing very slow performance while testing worst case scenarios of
> 1 key leading to 1 supercolumn and 1
t supercolumn into a new row.
On Tue, Apr 20, 2010 at 11:08 AM, Mark Jones wrote:
> When I first read this, it bothered me because it seemed like it couldn't be
> so. So I read the link, and it says the whole thing, so I have to ask for
> some clarification here.
>
> I had a
You will have to pull the columns and filter yourself.
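Since there is no WHERE clause here, "pull the columns and filter yourself" amounts to fetching a slice of the row and applying the predicate in client code. A minimal sketch, assuming a hypothetical `Column` struct standing in for whatever your client library returns (the real Thrift type will differ):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a column returned by a slice query.
struct Column {
    std::string name;
    std::string value;
    std::int64_t timestamp;
};

// Client-side "WHERE": keep only the fetched columns the predicate accepts.
// All filtering happens after the data has already crossed the network.
std::vector<Column> filterColumns(const std::vector<Column>& fetched,
                                  const std::function<bool(const Column&)>& keep) {
    std::vector<Column> out;
    for (const Column& c : fetched) {
        if (keep(c)) {
            out.push_back(c);
        }
    }
    return out;
}
```

The cost of this approach is that every column is transferred before it is discarded, so it works best when the slice you fetch is already narrow.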
From: Christian Torres [mailto:chtor...@gmail.com]
Sent: Tuesday, April 20, 2010 11:50 AM
To: user@cassandra.apache.org
Cc: d...@cassandra.apache.org
Subject: Filters
Hello!
Is there any way to make filters (WHEREs) in cassandra? Or I have t
I would think this is on the roadmap, just not available yet. It can be
managed (to a large degree) by adjusting the heap size.
-Original Message-
From: Tatu Saloranta [mailto:tsalora...@gmail.com]
Sent: Tuesday, April 20, 2010 12:18 PM
To: user@cassandra.apache.org
Subject: Re: 0.6.1 in
s [mailto:chtor...@gmail.com]
Sent: Tuesday, April 20, 2010 12:25 PM
To: user@cassandra.apache.org
Subject: Re: Filters
Mmmm...
According to this doc (http://wiki.apache.org/cassandra/API#get_slice), which a
developer mailed to me, it's possible!
I'm sending it to you as a reference.
On Tue, Apr 20, 2010 at
ave a CF for Emails, with 1 email per row, and another CF for
UserEmails with per-user index rows referencing the Emails rows.
b
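The Emails/UserEmails layout above is an index-row pattern: the email lives once in its own row, and each user's index row holds column names that are Emails row keys. A minimal in-memory sketch, assuming nested `std::map`s as a stand-in for the two column families and a hypothetical "body" column (this is not client-library code):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: row key -> (column name -> column value).
using Row = std::map<std::string, std::string>;
using ColumnFamily = std::map<std::string, Row>;

// Write the email once into Emails, then add an index column to the
// user's UserEmails row whose *name* is the Emails row key.
void storeEmail(ColumnFamily& emails, ColumnFamily& userEmails,
                const std::string& user, const std::string& emailId,
                const std::string& body) {
    emails[emailId]["body"] = body;
    userEmails[user][emailId] = "";  // value unused; the name carries the reference
}

// Read path: slice the user's index row, then fetch each referenced email.
std::vector<std::string> bodiesFor(const ColumnFamily& emails,
                                   const ColumnFamily& userEmails,
                                   const std::string& user) {
    std::vector<std::string> out;
    auto idx = userEmails.find(user);
    if (idx == userEmails.end()) {
        return out;
    }
    for (const auto& entry : idx->second) {
        auto row = emails.find(entry.first);
        if (row == emails.end()) {
            continue;
        }
        auto body = row->second.find("body");
        if (body != row->second.end()) {
            out.push_back(body->second);
        }
    }
    return out;
}
```

Because index column names sort, a real index row would typically use time-ordered IDs so that a slice returns mail in date order; the sketch just uses string order.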
On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones wrote:
> To make sure I'm clear on what you are saying:
>
> Are the "Individual Emails" in the exa
Stop the program, wipe the data dir and commit logs, and start the program;
that's what I'm doing.
I even made a script for it, so it's just a one-line command.
From: ROGER PUIG GANZA [mailto:rp...@tid.es]
Sent: Wednesday, April 21, 2010 5:20 AM
To: cassandra-u...@incubator.apache.org
Subject:
On my 4 GB machine I'm giving it 3 GB and having no trouble with 60+ million
500-byte columns.
From: Nicolas Labrot [mailto:nith...@gmail.com]
Sent: Wednesday, April 21, 2010 7:47 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra tuning for running test on a desktop
I have tried 1400M, and Cass
I'm seeing a cluster of 4 (replication factor = 2) be about as slow overall as,
or barely faster than, the slowest node in the group. When I run the 4 nodes
individually, I see:
For inserts:
Two nodes @ 12000/second
1 node @ 9000/second
1 node @ 7000/second
For reads:
Abysmal, less than 1000/s
If I wanted to store tags in Cassandra on a per-user basis, what would be the
best way to do that?
ColumnFamily:Tags
Key:UserID
SuperColumn: Tag names
Columns: keys to records using this Tag
And in each of the items, have a comma separated list of its tags?
Or some other way?
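To make the proposed Tags layout concrete, here is a minimal in-memory sketch, assuming nested `std::map`/`std::set` as a stand-in for row key (UserID) -> super column (tag name) -> column names (record keys). It is only a picture of the shape, not client code; note that, per the CassandraLimitations quote earlier in the thread, reading one tag's super column deserializes all of its subcolumns.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the Tags column family.
using TagIndex = std::map<std::string,                      // UserID (row key)
                          std::map<std::string,             // tag name (super column)
                                   std::set<std::string>>>; // record keys (columns)

// Tagging a record adds one column under the tag's super column.
void tagRecord(TagIndex& tags, const std::string& user,
               const std::string& tag, const std::string& recordKey) {
    tags[user][tag].insert(recordKey);
}

// "Which of this user's records carry tag X?" is a single super column read.
std::set<std::string> recordsWithTag(const TagIndex& tags, const std::string& user,
                                     const std::string& tag) {
    auto u = tags.find(user);
    if (u == tags.end()) {
        return {};
    }
    auto t = u->second.find(tag);
    return t == u->second.end() ? std::set<std::string>{} : t->second;
}
```

The comma-separated list in each item would then answer the reverse question ("which tags does this record have?") without touching the Tags row.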
How is this specified?
Is it a large hex #?
A string of bytes in hex?
http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.
std::string strUUID(uuid, 16) will do the right thing for you.
-Original Message-
From: Olivier Rosello [mailto:orose...@corp.free.fr]
Sent: Friday, April 23, 2010 9:59 AM
To: user@cassandra.apache.org
Subject: Re: How to insert a row with a TimeUUIDType column in C++
On Friday, April 23
Ellis [mailto:jbel...@gmail.com]
Sent: Friday, April 23, 2010 10:22 AM
To: user@cassandra.apache.org
Subject: Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token
a normal String from the same universe as your keys.
On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones wrote:
> How is t
It turns out assign can be called with the length as well, so modify your
code to be:
new_col.column.assign((char *)uuid, 16);
and you are fixed.
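The reason the length matters: a 16-byte TimeUUID is raw binary and can contain zero bytes, and the `std::string` constructor that takes only a `const char*` stops at the first zero. A minimal sketch, assuming (as the `assign` call suggests) that the column name field is a `std::string`; the UUID bytes below are made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <string>

// Made-up 16-byte value with zero bytes inside (index 3 and 8-10).
const std::array<unsigned char, 16> kSampleUuid = {
    0x6f, 0xa4, 0x59, 0x00, 0x4e, 0xd1, 0x11, 0xdf,
    0x00, 0x00, 0x00, 0x1e, 0xc2, 0x3a, 0x7b, 0x10};

// Wrong: treats the bytes as a NUL-terminated C string, so the result is
// silently truncated at the first zero byte.
std::string uuidAsCString(const unsigned char* uuid) {
    return std::string(reinterpret_cast<const char*>(uuid));
}

// Right: pass the length so all 16 bytes, zeros included, are copied
// (the std::string strUUID(uuid, 16) form).
std::string uuidAsBytes(const unsigned char* uuid) {
    return std::string(reinterpret_cast<const char*>(uuid), 16);
}

// Same fix via assign(), mirroring new_col.column.assign((char *)uuid, 16).
void assignUuid(std::string& column, const unsigned char* uuid) {
    column.assign(reinterpret_cast<const char*>(uuid), 16);
}
```

With the sample bytes, `uuidAsCString` yields a 3-byte string while the other two keep all 16 bytes.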
-Original Message-
From: Mark Jones [mailto:mjo...@imagehawk.com]
Sent: Friday, April 23, 2010 10:52 AM
To: user@cassandra.apache.org
Subject: RE
Eliminating GC hell would probably do a lot to help Cassandra maintain a steady
speed instead of alternating between periods of superfast and superslow
performance. I look forward to hearing how this experiment goes.
From: Eric Hauser [mailto:ewhau...@gmail.com]
Sent: Friday, April 23, 2010 3:37 PM
To: user@cassandra.apache.org
Subjec
Orthogonal in this case means "at cross purposes." Transactions can't really be
done with eventual consistency, because not all nodes have all the info at the
time the transaction is done. I think they recommend ZooKeeper for this kind
of stuff, but I don't know why you want to use Cassandra v
The max size would probably be best determined by looking at the size of your
Memtable.
64
Read repair is done on a per-column basis, and every column carries a timestamp
plus the overhead of its name. So balance those three out and you'll have a
pretty good idea of what to do.
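One way to "balance those three out" is a back-of-the-envelope size per column: name bytes plus value bytes plus the 64-bit (8-byte) timestamp every column carries. A minimal sketch; it deliberately ignores any other serialization overhead, so treat the numbers as a lower bound rather than Cassandra's actual on-disk size:

```cpp
#include <cassert>
#include <cstddef>

// Every column carries a 64-bit timestamp.
constexpr std::size_t kTimestampBytes = 8;

// Lower-bound payload of one column: name + value + timestamp.
constexpr std::size_t approxColumnBytes(std::size_t nameBytes, std::size_t valueBytes) {
    return nameBytes + valueBytes + kTimestampBytes;
}

// Lower-bound payload of a row holding `columns` columns of the same shape.
constexpr std::size_t approxRowBytes(std::size_t columns, std::size_t nameBytes,
                                     std::size_t valueBytes) {
    return columns * approxColumnBytes(nameBytes, valueBytes);
}
```

For example, a 500-byte value under a 10-byte name comes to at least 518 bytes, so for small values the name and timestamp are a noticeable fraction of what read repair has to compare and move.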
From: Dop Sun [mailto:su...@d
One of your problems here is that connect uses a daft connection-string
convention: you would think it's node:port, but it's actually node/port.
Your connection only succeeded because 9160 is the default when no port is
specified.
Then there's the keyspace thing that jbellis mentioned.
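To illustrate the node/port convention, here is a hypothetical parser (not the client library's actual code, which isn't shown in the thread) that splits on '/' and falls back to port 9160 when none is given. Under this convention a "node:port" string is not split at all:

```cpp
#include <cassert>
#include <string>

// Hypothetical host/port pair.
struct Endpoint {
    std::string host;
    int port;
};

constexpr int kDefaultThriftPort = 9160;  // used when no port is given

// Parses "node/port"; a string without '/' is taken entirely as the host.
// std::stoi throws std::invalid_argument if the part after '/' isn't a number.
Endpoint parseNodePort(const std::string& spec) {
    const std::string::size_type slash = spec.find('/');
    if (slash == std::string::npos) {
        return {spec, kDefaultThriftPort};
    }
    return {spec.substr(0, slash), std::stoi(spec.substr(slash + 1))};
}
```

So "cassdb1/9161" yields port 9161, "cassdb1" yields the default 9160, and "cassdb1:9160" is treated as one host name with the default port.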
-Original Message-
At the moment, they all have to fit in memory during compaction: all the
Columns OR SuperColumns for one key.
From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org]
Sent: Thursday, April 29, 2010 10:30 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra data model for financial data
What
Sounds like you want something like http://oss.oetiker.ch/rrdtool/ (assuming
you are trying to store computer log data).
Do you have any other data that can spread the load, like a machine name?
If so, you can use a hash of that value to place that "machine" randomly on
the net, then appe
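A minimal sketch of the hash-prefix idea, assuming an order-preserving partitioner (the message doesn't say) and a hypothetical key layout of hex(hash(machine)) + "/" + suffix. FNV-1a is used here only as a simple, stable stand-in for whatever hash you would actually choose (e.g. MD5); it is not what Cassandra uses:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// FNV-1a, 64-bit: stable across runs and platforms, unlike std::hash.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical key: 16 hex digits of hash(machine), '/', then the suffix.
// The prefix scatters machines across the key space, while one machine's
// rows stay contiguous and sorted by suffix.
std::string spreadKey(const std::string& machine, const std::string& suffix) {
    char prefix[17];
    std::snprintf(prefix, sizeof prefix, "%016llx",
                  static_cast<unsigned long long>(fnv1a64(machine)));
    return std::string(prefix) + "/" + suffix;
}
```

The trade-off is that range scans across all machines in time order are no longer a single slice; you read per machine prefix instead.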
MD5 is not a perfect hash; it can produce collisions. How are these dealt with?
Is there a size appended to them?
If two keys collide, would that result in a merging of data (if the column
names aren't the same) or an overwrite (if they are)?