RE: performance tuning - where does the slowness come from?

2010-05-05 Thread Mark Jones
Wed, May 5, 2010 at 6:36 PM, Mark Jones <mjo...@imagehawk.com> wrote: Have you actually managed to get 10K reads/second, or are you just estimating that you can? I've run into similar issues, but I never got reads to scale when searching for unique keys even using 40 threads

RE: performance tuning - where does the slowness come from?

2010-05-05 Thread Mark Jones
...@gmail.com] Sent: Wednesday, May 05, 2010 7:04 PM To: user@cassandra.apache.org Subject: Re: performance tuning - where does the slowness come from? On Wed, May 5, 2010 at 6:59 PM, Mark Jones <mjo...@imagehawk.com> wrote: My data is single row/key to a 500 byte column and I'm reading

Slow Responses from 2 of 3 nodes in RC1

2010-04-02 Thread Mark Jones
I have a 3-node Cassandra cluster I'm trying to work with. All three machines are about the same: 6-8GB per machine (fastest machine has 8GB, Java VM limited to 5GB), separate spindles for Cassandra data and commit log. I wrote ~7 million items to Cassandra; now I'm trying to read them back, the o

RE: best practice for migrating data

2010-04-02 Thread Mark Jones
I got the idea for this from: http://wiki.apache.org/cassandra/StorageConfiguration I put my keyspace setup on a webserver, and I pull it into the config like this: storage-conf.xml starts with: http://cassandraconfig /seeds.xml"> http://cassandraconfig /autobootstrap.xml"> http://cassandracon

RE: Slow Responses from 2 of 3 nodes in RC1

2010-04-02 Thread Mark Jones
rate to the 50/second/thread or 100/second/thread, regardless of who does the proxy. -Original Message- From: Mark Jones [mailto:mjo...@imagehawk.com] Sent: Friday, April 02, 2010 1:38 PM To: user@cassandra.apache.org Subject: Slow Responses from 2 of 3 nodes in RC1 I have a 3 node cass

RE: Slow Responses from 2 of 3 nodes in RC1

2010-04-07 Thread Mark Jones
or are some finishing sooner than others? Is your client cpu or disk perhaps the bottleneck? On Fri, Apr 2, 2010 at 2:39 PM, Mark Jones wrote: > To further complicate matters, > when I read only from cassdb1, I can check about 100/second/thread (40 > threads) > when I read only from

What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Mark Jones
It shouldn't remove a node from the ring should it? (appears it did) It shouldn't remove data from db, should it? (data size appears to grow, but records are now missing) Loaded 38 million "rows" and the ring looked like this: m...@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --h

RE: What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Mark Jones
The log said Bootstrapping @ 07:34 (since it was 08:35, I assumed it wasn't doing anything, also, CPU usage was < 10%) Turns out, when I restarted the node, it claimed the time was 7:35 rather than 8:35. Why would log4j be off by one hour? We are on CDT here, and have been for more than a w

Yet more strangeness RE: Slow Responses from 2 of 3 nodes in RC1

2010-04-07 Thread Mark Jones
0 per second, how many rows and columns does a check involve? What query api are you using? Your cassandra nodes look mostly idle. Is each client thread getting the same amount of work or are some finishing sooner than others? Is your client cpu or disk perhaps the bottleneck? On Fri, Apr 2,

Can these stats be right?

2010-04-07 Thread Mark Jones
From cfstats: SSTable count: 3 Space used (live): 4951669191 Space used (total): 5237040637 Memtable Columns Count: 190266 Memtable Data Size: 23459012 Memtable Switch Count: 89 Read Cou

Why can't you manage one node from another?

2010-04-07 Thread Mark Jones
I have 3 nodes in the cluster, and bin/nodetool --host this-host-name ring works as expected, but bin/nodetool --host some-other-host ring always throws this exception: Error connecting to remote JMX agent! java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exceptio

Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL

2010-04-08 Thread Mark Jones
I don't see any way to increase the # of active Deserializers in storage-conf.xml. Tpstats more than 8 hours after insert/read stop:
Pool Name               Active  Pending  Completed
FILEUTILS-DELETE-POOL   0       0        227
STREAM-STAGE            0

RE: Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL

2010-04-08 Thread Mark Jones
:jbel...@gmail.com] Sent: Thursday, April 08, 2010 10:12 AM To: user@cassandra.apache.org Subject: Re: Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL Have you checked iostat -x ? On Thu, Apr 8, 2010 at 9:45 AM, Mark Jones wrote: > I don't s

SAR results don't seem overwhelming

2010-04-08 Thread Mark Jones
you are overwhelming the server with requests. Could you run sar and find out how many bytes/sec you are receiving/transmitting? Cheers Avinash On Thu, Apr 8, 2010 at 7:45 AM, Mark Jones <mjo...@imagehawk.com> wrote: I don't see any way to increase the # of active Deserializers

RE: Very new user needs some troubleshooting pointers

2010-04-09 Thread Mark Jones
Sounds like we are experiencing some of the same problems. (I'm using 0.6RC1) I have a 3 node cluster with 8GB/machine (dual core CPU). I'm peaking on inserts at about 6000-7000/second running 40 threads. Separate spindles for commitlog and data. My read speed is atrocious, 800/sec sustained

RE: RE: Very new user needs some troubleshooting pointers

2010-04-09 Thread Mark Jones
in ec2 as well) takes 400,000 ticks. Super consistent. One thread. My friend's setup with cassandra on osx takes 400,000 ticks for the first insert, then drops to 20,000 ticks for every consecutive call. That's what is so strange. On Apr 9, 2010 12:15 PM, "Mark Jones" mailto:mjo

RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Mark Jones
I'm seeing some issues like this as well; in fact, I think seeing your graphs has helped me understand the dynamics of my cluster better. Using some ballpark figures for inserting single column objects of ~500 bytes onto individual nodes (not when combined as a cluster): Node1: Inserts 12000/s N

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
I too am seeing very slow performance while testing worst case scenarios of 1 key leading to 1 supercolumn and 1 column beyond that. Key -> SuperColumn -> 1 Column (of ~ 500 bytes) Drive utilization is 80-90% and I'm only dealing with 50-70 million rows. (With NO swapping) So far, I've found

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
olumns are in the supercolumn total? "in super columnfamilies there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn" http://wiki.apache.org/cassandra/CassandraLimitations On Tue, Apr 20, 2010 at

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
n deserializes _all_ the subcolumns in that supercolumn" http://wiki.apache.org/cassandra/CassandraLimitations On Tue, Apr 20, 2010 at 9:50 AM, Mark Jones wrote: > I too am seeing very slow performance while testing worst case scenarios of > 1 key leading to 1 supercolumn and 1

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
t supercolumn into a new row. On Tue, Apr 20, 2010 at 11:08 AM, Mark Jones wrote: > When I first read this, it bothered me because it seemed like it couldn't be > so. So I read the link, and it says the whole thing, so I have to ask for > some clarification here. > > I had a

RE: Filters

2010-04-20 Thread Mark Jones
You will have to pull the columns and filter yourself. From: Christian Torres [mailto:chtor...@gmail.com] Sent: Tuesday, April 20, 2010 11:50 AM To: user@cassandra.apache.org Cc: d...@cassandra.apache.org Subject: Filters Hello! Is there any way to make filters (WHEREs) in cassandra? Or I have t
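A minimal sketch of what "pull the columns and filter yourself" can look like on the client side. It uses plain Python over an in-memory dict that stands in for columns already fetched from the cluster; `filter_columns` and the sample row are hypothetical and are not part of any Cassandra client API.

```python
def filter_columns(columns, predicate):
    """Keep only the (name, value) pairs for which predicate(name, value) is true."""
    return {name: value for name, value in columns.items() if predicate(name, value)}


# Hypothetical row, as if its columns had already been read back by the client.
row = {"name": "alice", "city": "Austin", "age": "31"}

# Client-side stand-in for a WHERE clause: keep columns whose name starts with "a" or "c".
wanted = filter_columns(row, lambda name, value: name[0] in "ac")
print(wanted)
```

The cost of this approach is that every column crosses the network before it is discarded, so it tends to suit narrow rows better than wide ones.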

RE: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Mark Jones
I would think this is on the roadmap, just not available yet. It can be managed by adjusting the Heap size (to a large degree). -Original Message- From: Tatu Saloranta [mailto:tsalora...@gmail.com] Sent: Tuesday, April 20, 2010 12:18 PM To: user@cassandra.apache.org Subject: Re: 0.6.1 in

RE: Filters

2010-04-20 Thread Mark Jones
s [mailto:chtor...@gmail.com] Sent: Tuesday, April 20, 2010 12:25 PM To: user@cassandra.apache.org Subject: Re: Filters Mmmm... According with this doc http://wiki.apache.org/cassandra/API#get_slice that a developer mailed to me It's possible!! I sent you as reference On Tue, Apr 20, 2010 at

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
ave a CF for Emails, with 1 email per row, and another CF for UserEmails with per-user index rows referencing the Emails rows. b On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones wrote: > To make sure I'm clear on what you are saying: > > Are the "Individual Emails" in the exa
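The two-column-family layout described in this reply (one Emails row per email, plus per-user index rows in UserEmails that point at them) can be sketched roughly as follows. This is only an in-memory model using Python dicts; the function names and sample data are hypothetical, and no Cassandra client calls are shown.

```python
# Each "column family" is modelled as: row key -> {column name: value}.
emails = {}       # Emails CF: one email per row, keyed by email id
user_emails = {}  # UserEmails CF: one index row per user


def store_email(user_id, email_id, subject, body):
    """Write the email row, then add a column to the user's index row."""
    emails[email_id] = {"subject": subject, "body": body}
    # The column name is the email's row key; the value is left empty.
    user_emails.setdefault(user_id, {})[email_id] = ""


def emails_for(user_id):
    """Read the index row, then fetch each referenced email row."""
    return [emails[email_id] for email_id in sorted(user_emails.get(user_id, {}))]


store_email("mark", "msg-001", "slow reads", "800/sec sustained")
store_email("mark", "msg-002", "tpstats", "see attached")
print([e["subject"] for e in emails_for("mark")])
```

Reading a user's mail becomes one lookup of the index row followed by one lookup per email, rather than deserializing a large supercolumn.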

RE: problem with get_key_range in cassandra 0.4.1

2010-04-21 Thread Mark Jones
Stop the program, wipe the data dir and commit logs, start the program, it's what I'm doing. I even made a script that will do it so it's just a one line command. From: ROGER PUIG GANZA [mailto:rp...@tid.es] Sent: Wednesday, April 21, 2010 5:20 AM To: cassandra-u...@incubator.apache.org Subject:

RE: Cassandra tuning for running test on a desktop

2010-04-21 Thread Mark Jones
On my 4GB machine I'm giving it 3GB and having no trouble with 60+ million 500 byte columns. From: Nicolas Labrot [mailto:nith...@gmail.com] Sent: Wednesday, April 21, 2010 7:47 AM To: user@cassandra.apache.org Subject: Re: Cassandra tuning for running test on a desktop I have tried 1400M, and Cass

At what point does the cluster get faster than the individual nodes?

2010-04-21 Thread Mark Jones
I'm seeing a cluster of 4 (replication factor=2) to be about as slow overall as, or barely faster than, the slowest node in the group. When I run the 4 nodes individually, I see: For inserts: two nodes @ 12000/second, 1 node @ 9000/second, 1 node @ 7000/second. For reads: abysmal, less than 1000/s

Implementing Tags

2010-04-22 Thread Mark Jones
If I wanted to store tags in Cassandra, on a per-user basis, what would be the best way to do that?
ColumnFamily: Tags
Key: UserID
SuperColumn: Tag names
Columns: keys to records using this Tag
And in each of the items, have a comma-separated list of its tags? Or some other way?
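To make the proposed layout concrete, here is a rough in-memory model of it: user id as the row key, tag names as supercolumns, and record keys as the columns under each tag. It is a sketch in plain Python, not Cassandra client code, and every name in it is hypothetical.

```python
from collections import defaultdict

# tags[user_id][tag_name] -> set of record keys carrying that tag
tags = defaultdict(lambda: defaultdict(set))


def tag_record(user_id, tag, record_key):
    """Add record_key as a column under the tag's supercolumn."""
    tags[user_id][tag].add(record_key)


def records_with_tag(user_id, tag):
    """Forward lookup: one row, one supercolumn."""
    return sorted(tags.get(user_id, {}).get(tag, ()))


def tags_of_record(user_id, record_key):
    """Reverse lookup: has to scan every tag the user owns."""
    return sorted(t for t, keys in tags.get(user_id, {}).items() if record_key in keys)


tag_record("user42", "cassandra", "doc-1")
tag_record("user42", "cassandra", "doc-2")
tag_record("user42", "perf", "doc-1")
print(records_with_tag("user42", "cassandra"))
print(tags_of_record("user42", "doc-1"))
```

The reverse lookup is the expensive direction in this model, which is one argument for also keeping a list of tags on each item, as the question suggests.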

org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
How is this specified? Is it a large hex #? A string of bytes in hex? http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.

RE: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Mark Jones
std::string strUUID(uuid, 16) will do the right thing for you. -Original Message- From: Olivier Rosello [mailto:orose...@corp.free.fr] Sent: Friday, April 23, 2010 9:59 AM To: user@cassandra.apache.org Subject: Re: How to insert a row with a TimeUUIDType column in C++ Le vendredi 23 avril

RE: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
Ellis [mailto:jbel...@gmail.com] Sent: Friday, April 23, 2010 10:22 AM To: user@cassandra.apache.org Subject: Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token a normal String from the same universe as your keys. On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones wrote: > How is t

RE: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Mark Jones
Turns out assign can be called with the length as well. So mod your code to be new_col.column.assign((char *)uuid, 16); and you are fixed. -Original Message- From: Mark Jones [mailto:mjo...@imagehawk.com] Sent: Friday, April 23, 2010 10:52 AM To: user@cassandra.apache.org Subject: RE
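Why the explicit length of 16 matters: a UUID is 16 raw bytes, and those bytes can include zeros, so anything that treats the buffer as a NUL-terminated string may cut it short. The sketch below shows the effect in Python rather than C++, purely for brevity; `c_string_view` is a hypothetical helper that mimics NUL-terminated reading, and the sample value is the well-known version-1 UUID that Python exposes as `uuid.NAMESPACE_DNS`.

```python
import uuid


def c_string_view(data):
    """What a NUL-terminated reader would see: the bytes before the first zero byte."""
    return data.split(b"\x00", 1)[0]


# A version-1 (time-based) UUID whose raw form contains a zero byte.
sample = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
raw = sample.bytes

print(len(raw))                 # raw UUIDs are always 16 bytes
print(len(c_string_view(raw)))  # shorter: the reader stopped at the zero byte
print(len(str(sample)))         # the dashed hex text form is a different, longer value
```

Passing the length explicitly, as in the C++ calls quoted in this thread, avoids relying on a terminator at all.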

RE: Trove maps

2010-04-23 Thread Mark Jones
Eliminating GC hell would probably do a lot to help Cassandra maintain speed vs periods of superfast/superslow performance. I look forward to hearing how this experiment goes. From: Eric Hauser [mailto:ewhau...@gmail.com] Sent: Friday, April 23, 2010 3:37 PM To: user@cassandra.apache.org Subjec

RE: Does anybody work about transaction on cassandra ?

2010-04-26 Thread Mark Jones
Orthogonal in this case means "at cross purposes". Transactions can't really be done with eventual consistency because not all nodes have all the info at the time the transaction is done. I think they recommend ZooKeeper for this kind of stuff, but I don't know why you want to use Cassandra v

RE: What's the best maximum size for a single column?

2010-04-29 Thread Mark Jones
The max size would probably be best determined by looking at the size of your MemTable 64 Read repair is on a per column basis, every column gets a timestamp, and the overhead of a name. So, balance those 3 out and you have a pretty good idea of what to do. From: Dop Sun [mailto:su...@d

RE: Problem with JVM? concurrent mode failure

2010-04-29 Thread Mark Jones
One of your problems here is that the connect uses a daft connection string convention. You would think node:port, but it's actually node/port. Your connection only succeeded because 9160 is the default when the port is not specified. And the keyspace thing that jbellis mentioned. -Original Message-

RE: Cassandra data model for financial data

2010-04-29 Thread Mark Jones
At the moment they all have to fit in memory during compaction. Columns OR SuperColumns (for one Key). From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org] Sent: Thursday, April 29, 2010 10:30 AM To: user@cassandra.apache.org Subject: Re: Cassandra data model for financial data What

RE: OrderPreservingPartitioner limits and workarounds

2010-04-29 Thread Mark Jones
Sounds like you want something like http://oss.oetiker.ch/rrdtool/ Assuming you are trying to store computer log data. Do you have any other data that can spread the data load? Like a machine name? If so, you can use a hash of that value to place that "machine" randomly on the net, then appe

How does cassandra deal with collisions?

2010-04-29 Thread Mark Jones
MD5 is not a perfect hash; it can produce collisions. How are these dealt with? Is there a size appended to them? If 2 keys collide, would that result in a merging of data (if the column names aren't the same) or an overwrite if they were?
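The snippet below does not answer how Cassandra handles a collision; it only gives a feel for how unlikely an accidental one is. It is a sketch that assumes a token is simply the 128-bit MD5 digest of the key read as an integer, which may differ in detail from what the partitioner actually does. `md5_token` and `collision_probability` are hypothetical helper names, and 70 million is just an illustrative row count.

```python
import hashlib


def md5_token(key):
    """Interpret the 128-bit MD5 digest of the key as a non-negative integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def collision_probability(n_keys, bits=128):
    """Birthday-bound estimate: n(n-1)/2 key pairs, each colliding with chance 2**-bits."""
    return n_keys * (n_keys - 1) / 2 / 2**bits


print(md5_token("user42"))
print(collision_probability(70_000_000))
```

This estimate covers accidental collisions between ordinary keys only; keys deliberately crafted to collide are a separate concern.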