Re: Poor performance; PHP & Thrift to blame

2010-03-29 Thread David Strauss
On 2010-03-30 05:42, Julian Simon wrote: > More surprisingly, if I compile and enable the PHP native thrift > bindings (following this guide > https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP) > read performance actually degrades by another 50%. I have verified > that the Thrift c

Large data files and no "edit in place"?

2010-03-29 Thread Julian Simon
Forgive me as I'm probably a little out of my depth in trying to assess this particular design choice within Cassandra, but... My understanding is that Cassandra never updates data "in place" on disk - instead it completely re-creates the data files during a "flush". Stop me if I'm wrong already

Poor performance; PHP & Thrift to blame

2010-03-29 Thread Julian Simon
Hi, I've been trying to benchmark Cassandra for our use case and have been seeing poor performance on writes and extremely poor performance on reads. Using Cassandra 0.5.1 stable & thrift-0.2.0. It turns out all the CPU time is going to the PHP client process - the JVM operating the Cassan

Re: Ring management and load balance

2010-03-29 Thread Jonathan Ellis
On Fri, Mar 26, 2010 at 4:35 PM, Mike Malone wrote: > With the random partitioner there's no need to suggest a token. The key > space is statistically random so you should be able to just split 2^128 into > equal sized segments and get fairly equal storage load. Your read / write > load could get
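The idea in the quoted message, splitting the key space into equal-sized segments, can be sketched roughly as follows. This is only an illustration: the function name is hypothetical, and ring_size defaults to the 2^128 figure used in the quote. The real token range depends on the partitioner and Cassandra version, so treat it as a parameter.

```python
def evenly_spaced_tokens(node_count, ring_size=2**128):
    """Return one token per node, splitting the ring into equal segments.

    ring_size is an assumption taken from the quoted message; pass the
    value that matches the partitioner actually in use.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Integer arithmetic keeps the tokens exact even for very large rings.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(evenly_spaced_tokens(4)):
        print(f"node {node}: token {token}")
```

Equal segments balance storage only when keys are spread uniformly over the ring, which is the property the random partitioner is being credited with above.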

Re: Performance effects of tombstones in queue-like use cases

2010-03-29 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 8:25 PM, Tatu Saloranta wrote: > So if I understand the entry correctly, the answer is yes, they need to be > explicitly handled by Cassandra. > Which means that I would be better off trying to move "cursor" > (earliest timestamp to consider), with maybe leaving shorter window > fo
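A toy model of the "cursor" idea from the quoted text, for illustration only. It does not use Cassandra at all: entries is a plain sorted list, TOMBSTONE is a stand-in marker, and scan is a hypothetical helper. It only shows why starting a range scan at the cursor examines fewer deleted entries than starting at the beginning.

```python
import bisect

TOMBSTONE = object()  # stand-in marker for a deleted entry


def scan(entries, start_key, limit):
    """Scan a sorted list of (key, value) pairs from start_key.

    Returns (live_items, examined_count). Deleted entries still have to
    be stepped over, so they count towards examined_count.
    """
    keys = [key for key, _ in entries]
    position = bisect.bisect_left(keys, start_key)
    live, examined = [], 0
    while position < len(entries) and len(live) < limit:
        key, value = entries[position]
        examined += 1
        if value is not TOMBSTONE:
            live.append((key, value))
        position += 1
    return live, examined


# Queue-like data: 1000 consumed (deleted) entries, then 10 pending ones.
entries = [(ts, TOMBSTONE) for ts in range(1000)]
entries += [(ts, f"job-{ts}") for ts in range(1000, 1010)]

from_start = scan(entries, 0, 5)      # no cursor: walks every tombstone
from_cursor = scan(entries, 1000, 5)  # cursor at earliest live timestamp

if __name__ == "__main__":
    print("examined without cursor:", from_start[1])
    print("examined with cursor:", from_cursor[1])
```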

Re: Performance effects of tombstones in queue-like use cases

2010-03-29 Thread Tatu Saloranta
On Mon, Mar 29, 2010 at 5:57 PM, Jonathan Ellis wrote: > Does http://wiki.apache.org/cassandra/FAQ#range_ghosts help? Thank you for the quick answer, and apologies for missing this entry. So if I understand the entry correctly, the answer is yes, they need to be explicitly handled by Cassandra. Which means

Re: How reliable is cassandra?

2010-03-29 Thread Benjamin Black
That post is nonsense, start to finish. Disregard everything it says about both Cassandra and HBase. On Mon, Mar 29, 2010 at 10:55 AM, Eric Hauser wrote: > Does the information in the below link about Cassandra and replication over > WAN have any merit or is it just FUD? > http://www.roadtofailu

Re: Performance effects of tombstones in queue-like use cases

2010-03-29 Thread Jonathan Ellis
Does http://wiki.apache.org/cassandra/FAQ#range_ghosts help? On Mon, Mar 29, 2010 at 7:54 PM, Tatu Saloranta wrote: > Quick question: Cassandra documentation explains implementation of > deletes (using tombstones) quite well. > But what I was not quite sure about was what actual effects of > exis

Performance effects of tombstones in queue-like use cases

2010-03-29 Thread Tatu Saloranta
Quick question: Cassandra documentation explains implementation of deletes (using tombstones) quite well. But what I was not quite sure about was what actual effects existing tombstones might have on range queries that would include those tombstones. That is: for a use case where new entri

Re: How reliable is cassandra?

2010-03-29 Thread Matthew Stump
We are actually fairly write heavy. User enrollment, auditing, grouping, key maintenance all involve writing a fair amount of meta data to disk. If we were performing mostly read operations then postgres/clustering performance wouldn't be an issue. On Mar 29, 2010, at 4:49 PM, David Strauss w

Re: Question about node failure...

2010-03-29 Thread Tatu Saloranta
On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert wrote: > So,  what does "anti-entropy repair" do then? Fix discrepancies between live nodes? (caused by transient failures presumably) > Sounds like you have to 'decommission' the dead node, then I thought run > 'nodeprobe repair' to get the data adj

Re: Write times

2010-03-29 Thread Carlos Sanchez
Thanks a lot David On Mar 29, 2010, at 6:53 PM, David Strauss wrote: > The partitioner *is* the method by which Cassandra selects the node to > write to. Even if the client picks a node and requests a write there, > Cassandra will still do the write where it knows it belongs. Every node > is a g

Re: Write times

2010-03-29 Thread David Strauss
The partitioner *is* the method by which Cassandra selects the node to write to. Even if the client picks a node and requests a write there, Cassandra will still do the write where it knows it belongs. Every node is a gateway to do anything, anywhere in the cluster. On 2010-03-29 23:31, Carlos San
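A rough sketch of the point made above: whichever node the client contacts, the partitioner decides where the write lands. Everything here is a simplified assumption (single replica, MD5-based tokens, hypothetical class and function names) and not Cassandra's actual code.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: each node owns the keys up to and including its token."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to its token on the ring.
        self._entries = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    @staticmethod
    def token_for(key):
        # Assumption: hash the key with MD5, giving a 128-bit integer token.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping around.
        index = bisect.bisect_left(self._tokens, self.token_for(key))
        return self._entries[index % len(self._entries)][1]


def coordinate_write(ring, contacted_node, key):
    """The contacted node acts only as a gateway; the ring picks the owner."""
    return {"contacted": contacted_node, "stored_on": ring.owner(key)}


ring = Ring({"a": 0, "b": 2**126, "c": 2**127, "d": 3 * 2**126})

if __name__ == "__main__":
    for gateway in ("a", "b", "c", "d"):
        print(coordinate_write(ring, gateway, "user:42"))
```

Every line of output shows the same stored_on node, regardless of which gateway was contacted.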

Re: How reliable is cassandra?

2010-03-29 Thread David Strauss
On 2010-03-29 17:31, Matthew Stump wrote: > Am I crazy to want to switch our server's primary data store from postgres to > cassandra? This is a system used by banks and governments to store crypto > keys which absolutely can not be lost. This sounds like an LDAP problem. There are very nice LD

Re: Write times

2010-03-29 Thread Carlos Sanchez
Would it be best then for the client to select the node to write to when using OPP in order to evenly distribute the keys? On Mar 29, 2010, at 6:05 PM, David Timothy Strauss wrote: > OPP should only affect write speed if OPP's tendency to unevenly distribute > load causes some nodes to be over

Re: Write times

2010-03-29 Thread David Timothy Strauss
OPP should only affect write speed if OPP's tendency to unevenly distribute load causes some nodes to be overworked. In other words, OPP vs. RP on a single node system should have no real effect. -Original Message- From: Carlos Sanchez Date: Mon, 29 Mar 2010 18:58:50 To: user@cassandra

Write times

2010-03-29 Thread Carlos Sanchez
Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times? Thanks, Carlos

Re: How reliable is cassandra?

2010-03-29 Thread Eric Hauser
Thanks to all that responded. That was helpful information. On Mon, Mar 29, 2010 at 3:45 PM, Jonathan Ellis wrote: > On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump wrote: > > I know at least three Diggers patrol the list and one of them is a > committer to Cassandra. Last I heard from my former c

Re: How reliable is cassandra?

2010-03-29 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump wrote: > I know at least three Diggers patrol the list and one of them is a committer > to Cassandra. Last I heard from my former coworkers at Digg was that > ZooKeeper can be more overhead than wanted when doing locks in a high write > environment. Z

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 12:40 PM, Eric Hauser wrote: > BTW, does anyone from Digg patrol the list? I'm really interested in some > additional detail on the implementation of atomic counters with ZooKeeper. I know at least three Diggers patrol the list and one of them is a committer to Cassandra. Last I hea

Re: Which client API to choose?

2010-03-29 Thread Jonathan Ellis
I went ahead and removed the SP example from that wiki page. On Wed, Mar 24, 2010 at 1:22 PM, Jonathan Ellis wrote: > Should we just remove that from the wiki, seeing as how we have the > same (?) sample in contrib/ where it is more likely to be kept up to > date? > > 2010/3/24 Roland Hänel : >>

Re: Which client API to choose?

2010-03-29 Thread Charlie Mason
On Wed, Mar 24, 2010 at 5:07 PM, Peter Chang wrote: > Hector is the way to go if you're using java. I'm using it right now and > it's made things worlds easier. > The reason why it wasn't bundled was because it's a separate and relatively > new project. I think it's under a month old and it was do

Re: How reliable is cassandra?

2010-03-29 Thread Avinash Lakshman
We use ZK for some incrementing counters and this is the method that does it (this is wrapped in a Thrift call): public long getNextSequenceId() { Stat stat = null; String path = "//" + "/SequenceId"; try { stat = zk_.setData( path , new byte[0] , -1); }
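The snippet above is cut off before it shows how the sequence id is derived. One plausible reading, stated here as an assumption, is that the id comes from the data version in the Stat returned by setData, since ZooKeeper bumps a znode's data version on every successful write and a version argument of -1 matches any current version. The sketch below imitates that behaviour with an in-memory stand-in; FakeZnode and get_next_sequence_id are hypothetical names and no ZooKeeper client is involved.

```python
import threading


class FakeZnode:
    """In-memory stand-in for a znode with a per-write data version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._data = b""

    def set_data(self, data, expected_version=-1):
        # -1 means "any version", mirroring the call in the snippet above.
        with self._lock:
            if expected_version not in (-1, self._version):
                raise RuntimeError("version mismatch")
            self._data = data
            self._version += 1
            return self._version


def get_next_sequence_id(znode):
    # Assumption: the new data version doubles as the sequence id.
    return znode.set_data(b"", -1)


if __name__ == "__main__":
    counter = FakeZnode()
    print([get_next_sequence_id(counter) for _ in range(3)])
```

Because the version bump happens atomically on the server side, concurrent callers would each see a distinct value, which is what makes the trick usable as a counter.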

Re: How reliable is cassandra?

2010-03-29 Thread Eric Hauser
That's good to know. I've often seen high latency between availability zones. BTW, does anyone from Digg patrol the list? I'm really interested in some additional detail on the implementation of atomic counters with ZooKeeper. On Mon, Mar 29, 2010 at 1:58 PM, Joe Stump wrote: > > On Mar 29, 2010, at 1

Re: How reliable is cassandra?

2010-03-29 Thread Matthew Stump
I'm not too worried about ACLs, I'm going to have to tunnel Cassandra through SSL and for most deployments the data that matters will be encrypted using fairly large key sizes. The nodes that aren't allowed to store private keys will probably access data through a Thrift API which will use our

Re: How reliable is cassandra?

2010-03-29 Thread Jonathan Ellis
FUD is a good description of that piece to use in polite company. :) On Mon, Mar 29, 2010 at 12:55 PM, Eric Hauser wrote: > Does the information in the below link about Cassandra and replication over > WAN have any merit or is it just FUD? > http://www.roadtofailure.com/2009/10/29/hbase-vs-cassan

Re: How reliable is cassandra?

2010-03-29 Thread Matthew Stump
* Higher write throughput is one benefit. User enrollment, auditing, keeping track of client state and replication all generate a fair number of writes, which degrade postgres performance. * Built-in clustering. Postgres clustering is immature and even when things start to settle down, probab

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote: > Does the information is the below link about Cassandra and replication over > WAN have any merit or is it just FUD? I can attest Cassandra works fine over inter-DC connections. We have ~20 nodes spread across three Amazon "Availability Zones".

Re: How reliable is cassandra?

2010-03-29 Thread Eric Hauser
Does the information in the below link about Cassandra and replication over WAN have any merit or is it just FUD? http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/ On Mon, Mar 29, 2010 at 1:51 PM, Jonathan Ellis wrote: > Cassandra is an excellent choice for systems that

Re: How reliable is cassandra?

2010-03-29 Thread Ned Wolpert
The real question is can you handle 'eventual consistency' in this situation? Cassandra is not designed to lose data... quite the opposite. On Mon, Mar 29, 2010 at 10:47 AM, Joe Van Dyk wrote: > On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump > wrote: > > Am I crazy to want to switch our server

Re: How reliable is cassandra?

2010-03-29 Thread Jonathan Ellis
Cassandra is an excellent choice for systems that Can't Lose Data. - real single-server durability (set CommitLogSync to "batch"), not just "hope it replicates somewhere before you lose power" - best multi-DC replication anywhere - immutable data files mean it's very difficult to introduce corr

Re: How reliable is cassandra?

2010-03-29 Thread Joe Van Dyk
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump wrote: > Am I crazy to want to switch our server's primary data store from postgres to > cassandra?  This is a system used by banks and governments to store crypto > keys which absolutely can not be lost. What benefits would you get from switching

Re: Question about node failure...

2010-03-29 Thread Ned Wolpert
So, what does "anti-entropy repair" do then? Sounds like you have to 'decommission' the dead node, then I thought run 'nodeprobe repair' to get the data adjusted back to a replication factor of 3, right? Also, what is the method to decommission a dead node? pass in the IP address of the dead nod

Re: Question about node failure...

2010-03-29 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert wrote: > Folks- > > Can someone point out what happens during a node failure. Here is the > Specific usecase: > >   - Cassandra cluster with 4 nodes, replication factor of 3 >   - One node fails. >   - At this point, data that existed on the one failed

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 11:31 AM, Matthew Stump wrote: > Am I crazy to want to switch our server's primary data store from postgres to > cassandra? This is a system used by banks and governments to store crypto > keys which absolutely can not be lost. You might be crazy. PostgreSQL has all sorts

How reliable is cassandra?

2010-03-29 Thread Matthew Stump
Am I crazy to want to switch our server's primary data store from postgres to cassandra? This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

Question about node failure...

2010-03-29 Thread Ned Wolpert
Folks- Can someone point out what happens during a node failure? Here is the specific use case: - Cassandra cluster with 4 nodes, replication factor of 3 - One node fails. - At this point, data that existed on the one failed node has copies on 2 live nodes. - The failed node never comes ba
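The arithmetic in this use case (4 nodes, replication factor 3, one failure) can be checked with a small sketch. It assumes the simplest placement rule, where each range is stored on its primary node and the next nodes around the ring; the node names and helper functions are hypothetical.

```python
def replica_nodes(nodes, primary_index, replication_factor):
    """Primary node plus the next nodes around the ring (assumed placement)."""
    return [
        nodes[(primary_index + step) % len(nodes)]
        for step in range(replication_factor)
    ]


def live_copies_after_failure(nodes, replication_factor, failed_node):
    """Map each primary's range to the replicas still alive after a failure."""
    survivors = {}
    for primary_index, primary in enumerate(nodes):
        replicas = replica_nodes(nodes, primary_index, replication_factor)
        survivors[primary] = [node for node in replicas if node != failed_node]
    return survivors


nodes = ["node1", "node2", "node3", "node4"]
survivors = live_copies_after_failure(nodes, 3, "node4")

if __name__ == "__main__":
    for primary, live in survivors.items():
        print(f"range of {primary}: {len(live)} live copies on {live}")
```

Under that assumption, every range that had a copy on the failed node still has 2 live copies, matching the statement in the message, and the one range that never touched the failed node keeps all 3.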

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Jonathan Ellis
I see what you mean -- you have understood correctly. On Mon, Mar 29, 2010 at 8:13 AM, Henrik Schröder wrote: > On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote: >> >> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder >> wrote: >> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote: >> >>

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Mike Malone
On Mon, Mar 29, 2010 at 7:13 AM, Henrik Schröder wrote: > On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote: > >> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder >> wrote: >> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis >> wrote: >> >> It's a unique index then? And you're trying to read

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Henrik Schröder
On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote: > On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder > wrote: > > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote: > >> It's a unique index then? And you're trying to read things ordered by > >> the index, not just "give me keys with that

Re: Multi-indexing data

2010-03-29 Thread Gary Dusbabek
It sounds like you might need a main storage CF and several CFs to serve as inverted indices to support querying. The inverted indices basically map the searchable attribute (as a key) to the row id (column name) of the main storage. Keep in mind that the searchable attribute may need to map to m
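The layout described here, one main storage CF plus inverted-index CFs keyed by the searchable attribute, might look roughly like this when modelled with plain dictionaries. This is an in-memory illustration only, with hypothetical names; it makes no Cassandra calls and does not clean up stale index entries when a row changes.

```python
class IndexedStore:
    """Main storage CF plus one inverted-index CF per searchable attribute."""

    def __init__(self, indexed_attributes):
        # main_cf: row id -> {column name: value}
        self.main_cf = {}
        # index_cfs: attribute -> {attribute value (row key) -> {row id (column name): ""}}
        self.index_cfs = {attribute: {} for attribute in indexed_attributes}

    def insert(self, row_id, columns):
        self.main_cf[row_id] = dict(columns)
        for attribute, index_cf in self.index_cfs.items():
            if attribute in columns:
                # One attribute value may map to many row ids.
                index_cf.setdefault(columns[attribute], {})[row_id] = ""

    def find(self, attribute, value):
        row_ids = self.index_cfs[attribute].get(value, {})
        return [self.main_cf[row_id] for row_id in sorted(row_ids)]


if __name__ == "__main__":
    store = IndexedStore(["city"])
    store.insert("u1", {"name": "Ann", "city": "Oslo"})
    store.insert("u2", {"name": "Bo", "city": "Oslo"})
    print(store.find("city", "Oslo"))
```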

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder wrote: > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote: >> It's a unique index then?  And you're trying to read things ordered by >> the index, not just "give me keys with that have a column with this >> value?" > > Yes, because if we have mo

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Henrik Schröder
On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote: > On Fri, Mar 26, 2010 at 7:40 AM, Henrik Schröder > wrote: > > For each indexvalue we insert a row where the key is indexid + ":" + > > indexvalue encoded as hex string, and the row contains only one column, > > where the name is the object k
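The row key scheme in the quoted text (indexid + ":" + indexvalue encoded as a hex string) could be built along these lines. The sketch assumes only the value part is hex-encoded, that values are UTF-8 text, and that the index id contains no ":" character; the function names are hypothetical.

```python
def index_row_key(index_id, index_value):
    """Build an index row key: index id, a colon, then the hex-encoded value."""
    return index_id + ":" + index_value.encode("utf-8").hex()


def parse_index_row_key(row_key):
    """Split a row key back into (index id, original value)."""
    index_id, _, hex_value = row_key.partition(":")
    return index_id, bytes.fromhex(hex_value).decode("utf-8")


if __name__ == "__main__":
    key = index_row_key("email", "a@b")
    print(key)
    print(parse_index_row_key(key))
```

Hex encoding uses two characters per byte from an ordered alphabet, so for a single index id the keys sort in the same order as the underlying value bytes, which matters when reading them back in order.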