On 2010-03-30 05:42, Julian Simon wrote:
> More surprisingly, if I compile and enable the PHP native thrift
> bindings (following this guide
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> read performance actually degrades by another 50%. I have verified
> that the Thrift c
Forgive me as I'm probably a little out of my depth in trying to
assess this particular design choice within Cassandra, but...
My understanding is that Cassandra never updates data "in place" on
disk - instead it completely re-creates the data files during a
"flush". Stop me if I'm wrong already
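One way to picture the "never update in place" behaviour is a toy Java model. It assumes a simplified design and is not Cassandra's actual classes. Writes land in a sorted in-memory memtable, `flush()` turns the memtable into a new immutable sorted table, and reads merge all tables by timestamp. `ToyLsmStore` and its methods are hypothetical names. The real system also merges flushed files in the background (compaction), which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model, not Cassandra code: writes go to a sorted in-memory memtable,
// flush() freezes it into a new immutable table, and reads take the newest
// value (highest timestamp) across the memtable and every flushed table.
final class ToyLsmStore {
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private TreeMap<String, Cell> memtable = new TreeMap<>();
    private final List<SortedMap<String, Cell>> flushed = new ArrayList<>();

    void put(String key, String value, long timestamp) {
        Cell old = memtable.get(key);
        // Within the memtable, keep whichever write has the newer timestamp.
        if (old == null || old.timestamp <= timestamp) {
            memtable.put(key, new Cell(value, timestamp));
        }
    }

    // Freeze the current memtable as a new read-only table; older tables are untouched.
    void flush() {
        if (memtable.isEmpty()) return;
        flushed.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    // Merge on read: the cell with the highest timestamp wins.
    String get(String key) {
        Cell best = memtable.get(key);
        for (SortedMap<String, Cell> table : flushed) {
            Cell c = table.get(key);
            if (c != null && (best == null || c.timestamp > best.timestamp)) best = c;
        }
        return best == null ? null : best.value;
    }

    int tableCount() {
        return flushed.size();
    }
}
```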
Hi,
I've been trying to benchmark Cassandra for our use case and have been
seeing poor performance on both writes and (extremely) poor
performance on reads.
Using Cassandra 0.5.1 stable & thrift-0.2.0.
It turns out all the CPU time is going to the PHP client process - the
JVM operating the Cassan
On Fri, Mar 26, 2010 at 4:35 PM, Mike Malone wrote:
> With the random partitioner there's no need to suggest a token. The key
> space is statistically random so you should be able to just split 2^128 into
> equal sized segments and get fairly equal storage load. Your read / write
> load could get
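Here is a minimal sketch of the arithmetic Mike describes, evenly spaced initial tokens with token_i = i * ringSize / N. The ring size is a parameter because the email says 2^128, while RandomPartitioner's token space is usually given as 0 to 2^127 (an MD5 digest read as a non-negative integer). `InitialTokens` is a hypothetical helper name.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced initial tokens for nodeCount nodes
// on a ring of size 2^bits, token_i = i * 2^bits / nodeCount (integer division).
final class InitialTokens {
    static List<BigInteger> evenlySpaced(int nodeCount, int bits) {
        if (nodeCount <= 0) throw new IllegalArgumentException("nodeCount must be positive");
        BigInteger ringSize = BigInteger.ONE.shiftLeft(bits);
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(ringSize.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }
}
```

For a four-node cluster, `InitialTokens.evenlySpaced(4, 127)` gives 0, 2^125, 2^126 and 3 * 2^125.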
On Mon, Mar 29, 2010 at 8:25 PM, Tatu Saloranta wrote:
> So if I understand entry correctly, answer is yes, they need to be
> explicitly handled by Cassandra.
> Which means that I would be better off trying to move "cursor"
> (earliest timestamp to consider), with maybe leaving shorter window
> fo
On Mon, Mar 29, 2010 at 5:57 PM, Jonathan Ellis wrote:
> Does http://wiki.apache.org/cassandra/FAQ#range_ghosts help?
Thank you for quick answer, and apologies for missing this entry.
So if I understand entry correctly, answer is yes, they need to be
explicitly handled by Cassandra.
Which means
That post is nonsense, start to finish. Disregard everything it says
about both Cassandra and HBase.
On Mon, Mar 29, 2010 at 10:55 AM, Eric Hauser wrote:
> Does the information in the link below about Cassandra and replication over
> WAN have any merit, or is it just FUD?
> http://www.roadtofailu
Does http://wiki.apache.org/cassandra/FAQ#range_ghosts help?
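Keys whose columns were all deleted can still come back from a range query with an empty column list, because deletes are written as tombstones. Below is a hedged client-side sketch. It assumes the client has already turned a range-slice result into a map from row key to its list of columns. `RangeGhostFilter` and that map shape are hypothetical and not part of the Thrift API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side filter: rows that come back from a range query
// with no columns ("range ghosts") are dropped, preserving key order.
final class RangeGhostFilter {
    static <C> Map<String, List<C>> dropGhosts(Map<String, List<C>> rangeSlice) {
        Map<String, List<C>> live = new LinkedHashMap<>();
        for (Map.Entry<String, List<C>> row : rangeSlice.entrySet()) {
            if (!row.getValue().isEmpty()) live.put(row.getKey(), row.getValue());
        }
        return live;
    }
}
```

Ghosts presumably still count toward the number of rows requested, so a page can hold fewer live rows than asked for. Paging should continue from the last key returned, ghost or not.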
On Mon, Mar 29, 2010 at 7:54 PM, Tatu Saloranta wrote:
> Quick question: Cassandra documentation explains implementation of
> deletes (using tombstones) quite well.
> But what I was not quite sure about was what actual effects of
> exis
Quick question: Cassandra documentation explains implementation of
deletes (using tombstones) quite well.
But what I was not quite sure about was what actual effects of
existing tombstones might have on doing range queries that would
include those tombstones.
That is: for a use case where new entri
We are actually fairly write heavy. User enrollment, auditing, grouping, key
maintenance all involve writing a fair amount of meta data to disk. If we were
performing mostly read operations then postgres/clustering performance wouldn't
be an issue.
On Mar 29, 2010, at 4:49 PM, David Strauss w
On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert wrote:
> So, what does "anti-entropy repair" do then?
Fix discrepancies between live nodes? (caused by transient failures presumably)
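Anti-entropy repair in Cassandra compares Merkle trees built from each replica's data and only exchanges the ranges whose hashes disagree. The sketch below deliberately simplifies this. It uses a flat list of per-bucket MD5 digests instead of a real tree, and the first character of the row key stands in for a token sub-range. `RangeDigests` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified anti-entropy comparison (not a real Merkle tree): hash each
// replica's rows per bucket and report the buckets whose hashes differ.
final class RangeDigests {
    // Hypothetical stand-in for a token sub-range: the first character of the row key.
    private static char bucketOf(String key) {
        return key.isEmpty() ? '\0' : key.charAt(0);
    }

    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // One digest per bucket over that bucket's rows, in sorted key order.
    static SortedMap<Character, byte[]> digests(SortedMap<String, String> rows) {
        SortedMap<Character, MessageDigest> perBucket = new TreeMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            MessageDigest md = perBucket.computeIfAbsent(bucketOf(row.getKey()), b -> md5());
            md.update((row.getKey() + "=" + row.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
        }
        SortedMap<Character, byte[]> result = new TreeMap<>();
        perBucket.forEach((bucket, md) -> result.put(bucket, md.digest()));
        return result;
    }

    // Buckets whose digests differ (or exist on only one replica) are the ones to repair.
    static List<Character> mismatchedBuckets(SortedMap<String, String> a, SortedMap<String, String> b) {
        SortedMap<Character, byte[]> da = digests(a);
        SortedMap<Character, byte[]> db = digests(b);
        TreeSet<Character> all = new TreeSet<>(da.keySet());
        all.addAll(db.keySet());
        List<Character> diff = new ArrayList<>();
        for (Character bucket : all) {
            if (!Arrays.equals(da.get(bucket), db.get(bucket))) diff.add(bucket);
        }
        return diff;
    }
}
```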
> Sounds like you have to 'decommission' the dead node, then I thought run
> 'nodeprobe repair' to get the data adj
Thanks a lot David
On Mar 29, 2010, at 6:53 PM, David Strauss wrote:
> The partitioner *is* the method by which Cassandra selects the node to
> write to. Even if the client picks a node and requests a write there,
> Cassandra will still do the write where it knows it belongs. Every node
> is a g
The partitioner *is* the method by which Cassandra selects the node to
write to. Even if the client picks a node and requests a write there,
Cassandra will still do the write where it knows it belongs. Every node
is a gateway to do anything, anywhere in the cluster.
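Here is a minimal model of the placement David describes. It assumes numeric tokens and a rack-unaware strategy where the extra replicas are simply the next nodes clockwise. The primary replica for a key is the first node whose token is greater than or equal to the key's token, wrapping around the ring. The lookup only needs the ring layout, so any node can compute it, which is why any node can act as the gateway. `TokenRing` is a hypothetical class, not Cassandra's API.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring (hypothetical): primary replica = first node token >= key token,
// wrapping around; further replicas are the following nodes clockwise.
final class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    List<String> replicasFor(BigInteger keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) return replicas;
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(keyToken);
        if (e == null) e = ring.firstEntry(); // wrap around past the highest token
        int n = Math.min(replicationFactor, ring.size());
        for (int i = 0; i < n; i++) {
            replicas.add(e.getValue());
            e = ring.higherEntry(e.getKey());
            if (e == null) e = ring.firstEntry();
        }
        return replicas;
    }
}
```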
On 2010-03-29 23:31, Carlos San
On 2010-03-29 17:31, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to
> cassandra? This is a system used by banks and governments to store crypto
> keys which absolutely can not be lost.
This sounds like an LDAP problem. There are very nice LD
Would it be best, then, for the client to select the node to write to when using
OPP in order to distribute the keys evenly?
On Mar 29, 2010, at 6:05 PM, David Timothy Strauss wrote:
> OPP should only affect write speed if OPP's tendency to unevenly distribute
> load causes some nodes to be over
OPP should only affect write speed if OPP's tendency to unevenly distribute
load causes some nodes to be overworked.
In other words, OPP vs. RP on a single node system should have no real effect.
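The toy comparison below shows why OPP tends to concentrate load, under stated assumptions. For OPP, the key itself is the token and the node tokens split the alphabet. For RP, the key is hashed with MD5 and, for brevity, only the first digest byte picks one of N equal slices. `PartitionerSkew` and the sample keys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeMap;

// Hypothetical comparison of key placement under OPP-style and RP-style tokens.
final class PartitionerSkew {
    // OPP-style: the key is its own token; it belongs to the first node token >= key,
    // wrapping to the smallest token. Returns the number of keys per node (list order).
    static int[] oppLoad(List<String> keys, List<String> nodeTokens) {
        TreeMap<String, Integer> ring = new TreeMap<>();
        for (int i = 0; i < nodeTokens.size(); i++) ring.put(nodeTokens.get(i), i);
        int[] load = new int[nodeTokens.size()];
        for (String key : keys) {
            String owner = ring.ceilingKey(key);
            if (owner == null) owner = ring.firstKey();
            load[ring.get(owner)]++;
        }
        return load;
    }

    // RP-style (simplified): the first byte of MD5(key) picks one of nodeCount equal slices.
    static int[] rpLoad(List<String> keys, int nodeCount) {
        int[] load = new int[nodeCount];
        for (String key : keys) {
            int firstByte = md5(key)[0] & 0xFF;
            load[firstByte * nodeCount / 256]++;
        }
        return load;
    }

    private static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Take sequential keys `user0000` to `user0999` with node tokens `f`, `m`, `s`, `z`. Under the OPP model every key lands on the node with token `z`. The MD5 model spreads them roughly evenly.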
-Original Message-
From: Carlos Sanchez
Date: Mon, 29 Mar 2010 18:58:50
To: user@cassandra
Are writes on OrderPreservingPartitioner always slower than RandomPartitioner?
Is the replication factor a 'factor' in the write times?
Thanks,
Carlos
Thanks to all that responded. That was helpful information.
On Mon, Mar 29, 2010 at 3:45 PM, Jonathan Ellis wrote:
> On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump wrote:
> > I know at least three Diggers patrol the list and one of them is a
> committer to Cassandra. Last I heard from my former c
On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump wrote:
> I know at least three Diggers patrol the list and one of them is a committer
> to Cassandra. Last I heard from my former coworkers at Digg was that
> ZooKeeper can be more overhead than wanted when doing locks in a high write
> environment.
Z
On Mar 29, 2010, at 12:40 PM, Eric Hauser wrote:
> BTW, does anyone from Digg patrol the list? I'm really interested in learning
> more about the implementation of atomic counters with ZooKeeper.
I know at least three Diggers patrol the list and one of them is a committer to
Cassandra. Last I hea
I went ahead and removed the SP example from that wiki page.
On Wed, Mar 24, 2010 at 1:22 PM, Jonathan Ellis wrote:
> Should we just remove that from the wiki, seeing as how we have the
> same (?) sample in contrib/ where it is more likely to be kept up to
> date?
>
> 2010/3/24 Roland Hänel :
>>
On Wed, Mar 24, 2010 at 5:07 PM, Peter Chang wrote:
> Hector is the way to go if you're using java. I'm using it right now and
> it's made things worlds easier.
> The reason why it wasn't bundled was because it's a separate and relatively
> new project. I think it's under a month old and it was do
We use ZK for some incrementing counters, and this is the method that does it
(it is wrapped in a Thrift call):
public long getNextSequenceId()
{
    Stat stat = null;
    String path = "//" + "/SequenceId";
    try
    {
        stat = zk_.setData(path, new byte[0], -1);
    }
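The snippet appears to rely on two things. Each successful `setData` on a znode increments that znode's data version, and an expected version of -1 matches any version. The version in the returned `Stat` can then serve as the next sequence id. Below is an in-memory model of just that behaviour. It is not the ZooKeeper client, and `VersionCounterModel` is hypothetical. The rest of the method is cut off above. One plausible ending, not the poster's code, would be `return stat.getVersion();` plus handling of `KeeperException` and `InterruptedException`.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model (NOT the ZooKeeper API) of versioned setData:
// each successful call bumps the node's version by one; expectedVersion -1 matches any version.
final class VersionCounterModel {
    private final Map<String, Integer> versions = new HashMap<>();

    // synchronized stands in for the server serializing updates to one znode.
    synchronized int setData(String path, int expectedVersion) {
        int current = versions.getOrDefault(path, 0);
        if (expectedVersion != -1 && expectedVersion != current) {
            throw new IllegalStateException("version mismatch for " + path);
        }
        versions.put(path, current + 1);
        return current + 1; // plays the role of Stat.getVersion()
    }
}
```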
That's good to know. I've often seen high latency between availability
zones.
BTW, does anyone from Digg patrol the list? I'm really interested in learning
more about the implementation of atomic counters with ZooKeeper.
On Mon, Mar 29, 2010 at 1:58 PM, Joe Stump wrote:
>
> On Mar 29, 2010, at 1
I'm not too worried about ACLs, I'm going to have to tunnel Cassandra through
SSL and for most deployments the data that matters will be encrypted using
fairly large key sizes. The nodes that aren't allowed to store private keys
will probably access data through a Thrift API which will use our
FUD is a good description of that piece to use in polite company. :)
On Mon, Mar 29, 2010 at 12:55 PM, Eric Hauser wrote:
> Does the information in the link below about Cassandra and replication over
> WAN have any merit, or is it just FUD?
> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassan
* Higher write throughput is one benefit. User enrollment, auditing, keeping
track of client state and replication all generate a fair number of writes
which degrades postgres performance.
* Built in clustering. Postgres clustering is immature and even when things
start to settle down, probab
On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote:
> Does the information in the link below about Cassandra and replication over
> WAN have any merit, or is it just FUD?
I can attest Cassandra works fine over inter-DC connections. We have ~20 nodes
spread across three Amazon "Availability Zones".
Does the information in the link below about Cassandra and replication over
WAN have any merit, or is it just FUD?
http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
On Mon, Mar 29, 2010 at 1:51 PM, Jonathan Ellis wrote:
> Cassandra is an excellent choice for systems that
The real question is can you handle 'eventual consistency' in this
situation? Cassandra is not designed to lose data... quite the opposite.
On Mon, Mar 29, 2010 at 10:47 AM, Joe Van Dyk wrote:
> On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump
> wrote:
> > Am I crazy to want to switch our server
Cassandra is an excellent choice for systems that Can't Lose Data.
- real single-server durability (set CommitLogSync to "batch"), not
just "hope it replicates somewhere before you lose power"
- best multi-DC replication anywhere
- immutable data files mean it's very difficult to introduce corr
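Here is a hedged sketch of what "batch" commit-log syncing buys you. Writes are appended to the log, the file is forced to disk once per batch, and nothing would be acknowledged before `force()` returns. This is a toy `BatchCommitLog` that assumes one text line per mutation. It is not Cassandra's commit log format or class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Toy commit log (hypothetical): append a group of mutations, then fsync once;
// a real server would only acknowledge these writes after force() returns.
final class BatchCommitLog implements AutoCloseable {
    private final FileChannel channel;

    BatchCommitLog(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void appendBatch(List<String> mutations) {
        try {
            for (String m : mutations) {
                ByteBuffer buf = ByteBuffer.wrap((m + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) channel.write(buf);
            }
            // One sync for the whole batch; true also forces file metadata such as its length.
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Grouping several writes per sync amortizes the cost of the disk flush while keeping the guarantee that an acknowledged write is on disk.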
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to
> cassandra? This is a system used by banks and governments to store crypto
> keys which absolutely can not be lost.
What benefits would you get from switching
So, what does "anti-entropy repair" do then?
Sounds like you have to 'decommission' the dead node, then I thought run
'nodeprobe repair' to get the data adjusted back to a replication factor of
3, right?
Also, what is the method to decommission a dead node? Pass in the IP address
of the dead nod
On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert wrote:
> Folks-
>
> Can someone point out what happens during a node failure? Here is the
> specific use case:
>
> - Cassandra cluster with 4 nodes, replication factor of 3
> - One node fails.
> - At this point, data that existed on the one failed
On Mar 29, 2010, at 11:31 AM, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to
> cassandra? This is a system used by banks and governments to store crypto
> keys which absolutely can not be lost.
You might be crazy. PostgreSQL has all sorts
Am I crazy to want to switch our server's primary data store from postgres to
cassandra? This is a system used by banks and governments to store crypto keys
which absolutely can not be lost.
Folks-
Can someone point out what happens during a node failure? Here is the
specific use case:
- Cassandra cluster with 4 nodes, replication factor of 3
- One node fails.
- At this point, data that existed on the one failed node has copies on 2
live nodes.
- The failed node never comes ba
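The quorum arithmetic for this scenario is easy to check. Assume the usual definitions: ONE needs 1 replica, QUORUM needs RF/2 + 1 with integer division, and ALL needs RF. Then RF = 3 with one replica down leaves 2 live copies, which still satisfies ONE and QUORUM but not ALL. `ReplicaMath` is a hypothetical helper.

```java
// Hypothetical helper: how many live replicas each consistency level needs.
final class ReplicaMath {
    enum Level { ONE, QUORUM, ALL }

    static int required(Level level, int rf) {
        switch (level) {
            case ONE: return 1;
            case QUORUM: return rf / 2 + 1; // majority of the replicas
            case ALL: return rf;
            default: throw new IllegalArgumentException("unknown level " + level);
        }
    }

    static boolean available(Level level, int rf, int liveReplicas) {
        return liveReplicas >= required(level, rf);
    }
}
```

Requests keep working at ONE and QUORUM, but the affected data stays at two copies until the replica set is restored.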
I see what you mean -- you have understood correctly.
On Mon, Mar 29, 2010 at 8:13 AM, Henrik Schröder wrote:
> On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote:
>>
>> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder
>> wrote:
>> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote:
>> >>
On Mon, Mar 29, 2010 at 7:13 AM, Henrik Schröder wrote:
> On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote:
>
>> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder
>> wrote:
>> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis
>> wrote:
>> >> It's a unique index then? And you're trying to read
On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote:
> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder
> wrote:
> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote:
> >> It's a unique index then? And you're trying to read things ordered by
> >> the index, not just "give me keys with that
It sounds like you might need a main storage CF and several CFs to
serve as inverted indices to support querying. The inverted indices
basically map the searchable attribute (as a key) to the row id
(column name) of the main storage. Keep in mind that the searchable
attribute may need to map to m
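Here is a minimal in-memory sketch of that layout, assuming a hypothetical `email` attribute. One map plays the main CF (row id to columns). Another plays the inverted-index CF (attribute value as row key, row ids as column names). In this sketch the application maintains the index itself, including removing the stale entry when a value changes. `InvertedIndexSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for two column families (hypothetical):
// main: row id -> columns; index: email value -> row ids (stored as column names).
final class InvertedIndexSketch {
    private final Map<String, Map<String, String>> main = new HashMap<>();
    private final Map<String, SortedSet<String>> indexByEmail = new HashMap<>();

    void insert(String rowId, Map<String, String> columns) {
        Map<String, String> old = main.put(rowId, new TreeMap<>(columns));
        // Drop the stale index entry if this row was indexed under an old value.
        if (old != null && old.containsKey("email")) {
            SortedSet<String> ids = indexByEmail.get(old.get("email"));
            if (ids != null) ids.remove(rowId);
        }
        String email = columns.get("email");
        if (email != null) indexByEmail.computeIfAbsent(email, k -> new TreeSet<>()).add(rowId);
    }

    // Query step 1: read the index row; step 2 would fetch each id via row().
    SortedSet<String> rowIdsWithEmail(String email) {
        return Collections.unmodifiableSortedSet(indexByEmail.getOrDefault(email, new TreeSet<>()));
    }

    Map<String, String> row(String rowId) {
        return main.get(rowId);
    }
}
```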
On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder wrote:
> On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote:
>> It's a unique index then? And you're trying to read things ordered by
>> the index, not just "give me keys with that have a column with this
>> value?"
>
> Yes, because if we have mo
On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis wrote:
> On Fri, Mar 26, 2010 at 7:40 AM, Henrik Schröder
> wrote:
> > For each indexvalue we insert a row where the key is indexid + ":" +
> > indexvalue encoded as hex string, and the row contains only one column,
> > where the name is the object k
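The row-key scheme Henrik describes could look like the sketch below. It assumes the index value is UTF-8 encoded and then hex-encoded in lowercase; that encoding and `IndexRowKeys` are assumptions. One useful property is that lowercase hex preserves unsigned byte order. Under OrderPreservingPartitioner, the index rows for one index id therefore still sort by the raw value bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: index row key = indexId + ":" + lowercase hex of the value's UTF-8 bytes.
final class IndexRowKeys {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]); // high nibble, then low nibble
        }
        return sb.toString();
    }

    static String rowKey(String indexId, String indexValue) {
        return indexId + ":" + hex(indexValue.getBytes(StandardCharsets.UTF_8));
    }
}
```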