Re: Question about load balancing.

2010-11-17 Thread Benjamin Black
Random partitioner distributes keys approximately evenly across the entire range of the ring (0-2**127-1). This means that generally a given section of the range will contain about the same number of keys. If you assign tokens equal-size ranges, they will have similar numbers of keys. This is wh
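A minimal sketch of the token arithmetic described in this post, assuming the ring range quoted above (0 to 2**127-1). The helper name `balanced_tokens` is hypothetical and does not come from the thread; it only illustrates how equal-size ranges could be computed.

```python
# Sketch only: evenly spaced tokens for a ring covering 0 .. 2**127 - 1,
# as quoted in the post. The function name is an illustrative assumption.
RING_SIZE = 2**127


def balanced_tokens(node_count):
    """Return one token per node so that each node owns an equal-size range."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps every token inside the ring.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for token in balanced_tokens(4):
        print(token)
```

With equal-size ranges and a partitioner that spreads keys evenly, each node should end up with a similar number of keys, which is the point the post makes.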

Re: avro + cassandra + ruby

2010-11-17 Thread Benjamin Black
Full list of client options and defaults: https://github.com/fauna/thrift_client/blob/master/lib/thrift_client/abstract_thrift_client.rb#L28-43 On Wed, Nov 17, 2010 at 10:13 AM, Benjamin Black wrote: > Cassandra.new(keyspace, server, {:protocol => > Thrift::BinaryProtocolAccelerated}) >

Re: avro + cassandra + ruby

2010-11-17 Thread Benjamin Black
Cassandra.new(keyspace, server, {:protocol => Thrift::BinaryProtocolAccelerated}) On Tue, Nov 16, 2010 at 5:13 PM, Ryan King wrote: > On Tue, Nov 16, 2010 at 10:25 AM, Jonathan Ellis wrote: >> On Tue, Sep 28, 2010 at 6:35 PM, Ryan King wrote: >>> One thing you should try is to make thrift use >

Re: Insert after Delete fails silently

2010-09-30 Thread Benjamin Black
On Thu, Sep 30, 2010 at 5:25 PM, Peter Harrison wrote: > If you delete a row, and it therefore is marked as tombstone, and > subsequently you try to insert the row again it appears to succeed, > but if you try to request the row you don't get a result. > > If you try to insert a row that has been

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 3:48 PM, Alaa Zubaidi wrote: >  RF=2 With RF=2, QUORUM and ALL are the same. Again, your logs show you are attempting to insert about 180,000 columns/sec. The only way that is possible with your hardware is if you are using CL.ZERO. The available information does not ad
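The claim that QUORUM and ALL coincide at RF=2 can be checked with a short sketch, assuming the usual definition of a quorum as a strict majority of replicas (floor(RF/2) + 1). The function names are illustrative and not taken from the thread.

```python
# Sketch only: number of replicas that must acknowledge at each level.
# Assumes quorum means a strict majority of the replication factor.
def quorum(replication_factor):
    return replication_factor // 2 + 1


def all_replicas(replication_factor):
    return replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(rf, quorum(rf), all_replicas(rf))
```

At RF=2 both levels need 2 replicas; at RF=3 a quorum needs only 2 of the 3.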

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
Does that mean you are doing 600 rows/sec per process or 600/sec total across all processes? On Mon, Sep 27, 2010 at 3:14 PM, Alaa Zubaidi wrote: >  Its actually split to 8 different processes that are doing the insertion. > > Thanks > > On 9/27/2010 2:03 PM, Peter Schuller wrote: >> >> [note: i

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
What is your RF? On Mon, Sep 27, 2010 at 3:13 PM, Alaa Zubaidi wrote: >  Sorry 3 means QUORUM. > > > On 9/27/2010 2:55 PM, Benjamin Black wrote: >> >> On Mon, Sep 27, 2010 at 2:51 PM, Benjamin Black  wrote: >>> >>> On Mon, Sep 27, 2010 at 12:59 PM, Ala

Re: UnavailableException when data grows

2010-09-27 Thread Benjamin Black
Your ring is wildly unbalanced and you are almost certainly out of I/O on one or more nodes. You should be monitoring via JMX and common systems tools to know when you are starting to have issues. It is going to take you some effort to get out of this situation now. b On Mon, Sep 27, 2010 at 2

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 2:51 PM, Benjamin Black wrote: > On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi wrote: >> Thanks for the help. >> we have 2 drives using basic configurations, commitlog on one drive and data >> on another. >> and Yes the CL for writes is 3, how

Re: 0.7 memory usage problem

2010-09-27 Thread Benjamin Black
On Mon, Sep 27, 2010 at 12:59 PM, Alaa Zubaidi wrote: > Thanks for the help. > we have 2 drives using basic configurations, commitlog on one drive and data > on another. > and Yes the CL for writes is 3, however, the CL for reads is 1. > It is simply not possible that you are inserting at CL.ALL

Re: How to Retrieve all the rows from a ColumnFamily

2010-09-27 Thread Benjamin Black
http://wiki.apache.org/cassandra/FAQ#iter_world On Sun, Sep 26, 2010 at 11:51 PM, sekhar kosuru wrote: > Hi > I am new to Cassandra Database. > I want to know how to Retrieve all the records from a column family, is this > is different in the clustered servers vs single servers. > Please suggest

Re: Curious as to how Cassandra handles the following

2010-09-26 Thread Benjamin Black
On Sun, Sep 26, 2010 at 4:01 PM, Lucas Nodine wrote: > Ok, so based on everyone's input it seems that I need to put some sort of > server in front of Cassandra to handle locking and exclusive access. > > I am planning on building a system (DMS) that will store resources > (document, images, media,

Re: Curious as to how Cassandra handles the following

2010-09-26 Thread Benjamin Black
On Sun, Sep 26, 2010 at 11:04 AM, Lucas Nodine wrote: > I'm looking at a design where multiple clients will connect to Cassandra and > get/mutate resources, possibly concurrently.  After planning a bit, I ran > into the following scenero for which I have not been able to research to > find an answ

Re: 0.7 memory usage problem

2010-09-25 Thread Benjamin Black
t(file='C:\Cassandra\Cass07\commitlog\CommitLog-1285358848765.log', position=44950820) b On Sat, Sep 25, 2010 at 7:53 PM, Benjamin Black wrote: > The log posted shows _10_ pending in MPF stage, and the errors show > repeated failures trying to flush memtables at all: > >  IN

Re: 0.7 memory usage problem

2010-09-25 Thread Benjamin Black
The log posted shows _10_ pending in MPF stage, and the errors show repeated failures trying to flush memtables at all: INFO [GC inspection] 2010-09-24 13:16:11,281 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER 110 You are also flushing _really_ small memtables to disk (l

Re: Backporting Data Center Shard Strategy

2010-09-22 Thread Benjamin Black
7? > > > > On Tue, Sep 21, 2010 at 10:03 PM, Benjamin Black wrote: >> >> DCShard is in 0.6.  It has been rewritten in 0.7. >> >> On Tue, Sep 21, 2010 at 10:02 PM, rbukshin rbukshin >> wrote: >> > Is there any plan to backport DataCenterShardStrate

Re: "timestamp" parameter for Thrift "insert" API ??

2010-09-21 Thread Benjamin Black
On Mon, Sep 20, 2010 at 7:25 PM, Kuan(謝冠生) wrote: > By using cassandra-cli tool, we don't have to input timestamp while > insertion. Does it mean that Cassandra have time synchronization build-in > already? No, it means the cassandra-cli program is inserting a timestamp, which it then provides

Re: Backporting Data Center Shard Strategy

2010-09-21 Thread Benjamin Black
DCShard is in 0.6. It has been rewritten in 0.7. On Tue, Sep 21, 2010 at 10:02 PM, rbukshin rbukshin wrote: > Is there any plan to backport DataCenterShardStrategy to 0.6.x from 0.7? It > will be very useful for those who don't want to make drastic changes in > their code and get the benefits of

Re: Cassandra performance

2010-09-17 Thread Benjamin Black
It appears you are doing several things that assure terrible performance, so I am not surprised you are getting it. On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo wrote: > My main tool was stress.py for benchmarks (or equivalent written in > C++ to deal with python2.5 lack of multiprocessing). I wi

Re: questions on cassandra (repair and multi-datacenter)

2010-09-16 Thread Benjamin Black
On Thu, Sep 16, 2010 at 3:19 PM, Gurpreet Singh wrote: > 1.  I was looking to increase the RF to 3. This process entails changing the > config and calling repair on the keyspace one at a time, right? > So, I started with one node at a time, changed the config file on the first > node for the keysp

Re: Connect to localhost is ok,but the ip fails.

2010-09-09 Thread Benjamin Black
correct, 0.0.0.0 is a wildcard. On Thu, Sep 9, 2010 at 1:19 PM, Aaron Morton wrote: > I  set this to 0.0.0.0 I think the original storage_config.xml had a comment > that it would make thrift respond on all interfaces. > Aaron > On 09 Sep, 2010,at 08:37 PM, Benjamin Black wrote: >

Re: ganglia plugin

2010-09-09 Thread Benjamin Black
Nice! On Wed, Sep 8, 2010 at 6:45 PM, Scott Dworkis wrote: > in case the community is interested, my gmetric collector: > > http://github.com/scottnotrobot/gmetric/tree/master/database/cassandra/ > > note i have only tested with a special csv mode of gmetric... you can bypass > this mode and use

Re: Connect to localhost is ok,but the ip fails.

2010-09-09 Thread Benjamin Black
the ip. > > On Thu, Sep 9, 2010 at 4:14 AM, Ying Tang wrote: >> >> no , i didn't change the yaml file. >> >> On Thu, Sep 9, 2010 at 4:10 AM, Benjamin Black wrote: >>> >>> Do you mean you are changing the yaml file?  Does 'netstat -an |

Re: Connect to localhost is ok,but the ip fails.

2010-09-09 Thread Benjamin Black
Do you mean you are changing the yaml file? Does 'netstat -an | grep 9160' indicate cassandra is bound to ipv4 or ipv6 (tcp vs tcp6 in the netstat output)? b On Thu, Sep 9, 2010 at 1:06 AM, Ying Tang wrote: > I'm using cassandra 0.7 . > And in storage-conf . > > # The address to bind the Thrif

Re: Azure Cloud Storage - Tables

2010-09-08 Thread Benjamin Black
And having said all that: Azure Table storage model doesn't look like Cassandra. There is a schema, there are partition keys. It more resembles something like VoltDB than the map of maps (of maps) of Cassandra (and BigTable, and HBase). b On Wed, Sep 8, 2010 at 2:20 PM, Peter Harrison wrote:

Re: Azure Cloud Storage - Tables

2010-09-08 Thread Benjamin Black
They are not copying Cassandra with that, as it was in development for some time before Cassandra was released (possibly even before Cassandra development started). The BigTable-esque aspects, if they are 'copied' from anywhere, are copied from BigTable, just as they are in Cassandra. The underly

Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-07 Thread Benjamin Black
n, Sep 6, 2010 at 7:53 PM, Benjamin Black wrote: >> >> On Mon, Sep 6, 2010 at 12:41 AM, Janne Jalkanen >> wrote: >> > >> > So if I read this right, using lots of CF's is also a Bad Idea(tm)? >> > >> >> Yes, lots of CFs is bad means lots of CFs

Re: Few questions regarding cassandra deployment on windows

2010-09-07 Thread Benjamin Black
This does not sound like a good application for Cassandra at all. Why are you using it? On Tue, Sep 7, 2010 at 3:42 PM, kannan chandrasekaran wrote: > Hi All, > > We are currently considering Cassandra for our application. > > Platform: > * a single-node cluster. > * windows '08 > * 64-bit jvm >

Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-06 Thread Benjamin Black
On Mon, Sep 6, 2010 at 12:41 AM, Janne Jalkanen wrote: > > So if I read this right, using lots of CF's is also a Bad Idea(tm)? > Yes, lots of CFs is bad means lots of CFs is also bad.

Re: Migration from 6.X to 7.X

2010-09-06 Thread Benjamin Black
Welcome to Thrift. On Mon, Sep 6, 2010 at 4:04 PM, Edward Capriolo wrote: > > I was not aware of that. Also is the default for 6.o non framed and > 7.o framed? I was thinking possibly replace cassanda.client detect the > server version and use reflection. This way hector sees the same > interface

Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-05 Thread Benjamin Black
On Thu, Sep 2, 2010 at 6:25 PM, Mike Peters wrote: > > My concerns are - > #1. Will every single node end up with 4k folders under /cassandra/data/? > Yes (and you should review how Cassandra works if that is a question for you). > #2. Performance: Will Cassandra work better with a single keyspa

Re: the process of reading and writing

2010-09-02 Thread Benjamin Black
On Thu, Sep 2, 2010 at 8:19 PM, Ying Tang wrote: > Recently , i read the paper about Cassandra again . > And now i have some concepts about  the reading and writing . > We all know Cassandra uses NWR , > When read : > the request ---> a random node in Cassandra .This node acts as a proxy ,and > it

Re: question about Cassandra error

2010-09-02 Thread Benjamin Black
You seem to be typing 0.7 commands on a 0.6 cli. Please follow the README in the version you are using, e.g.: set Keyspace1.Standard2['jsmith']['first'] = 'John' On Thu, Sep 2, 2010 at 5:35 PM, Simon Chu wrote: > I downloaded cassendra 0.6.5 and ran it, got this error: > > bin/cassandra -f >  I

Re: Data Center Move

2010-09-02 Thread Benjamin Black
You will likely need to rename some of the files to avoid collisions (they are only unique per node). Otherwise, yes, this can work. On Thu, Sep 2, 2010 at 11:09 AM, Anthony Molinaro wrote: > Hi, > >  We're running cassandra 0.6.4, and need to do a data center move of > a cluster (from EC2 to ou

Re: Cassandra on AWS across Regions

2010-09-02 Thread Benjamin Black
On Thu, Sep 2, 2010 at 5:52 AM, Phil Stanhope wrote: > Ben, can you elaborate on some infrastructure topology issues that would > break this approach? > As noted, the naive approach results in nodes behind the same NAT having to communicate with each other through that NAT rather than directly.

Re: Cassandra on AWS across Regions

2010-09-01 Thread Benjamin Black
On Wed, Sep 1, 2010 at 4:16 PM, Andres March wrote: > I didn't have anything specific in mind. I understand all the issues around > DNS and not advocating only supporting hostnames (just thought it would be a > nice option).  I also wouldn't expect name resolution to be done all the > time, only w

Re: Cassandra on AWS across Regions

2010-09-01 Thread Benjamin Black
On Wed, Sep 1, 2010 at 3:18 PM, Andres March wrote: > I thought you might say that.  Is there some reason to gossip IP addresses > vs hostnames?  I thought that layer of indirection could be useful in more > than just this use case. > The trade-off for that flexibility is that nodes are now depen

Re: Cassandra on AWS across Regions

2010-09-01 Thread Benjamin Black
reat start. b On Wed, Sep 1, 2010 at 2:57 PM, Andres March wrote: > Is it not possible to put the external host name in cassandra.yaml and add a > host entry in /etc/hosts for that name to resolve to the local interface? > > On 09/01/2010 01:24 PM, Benjamin Black wrote: > > The

Re: Cassandra on AWS across Regions

2010-09-01 Thread Benjamin Black
The issue is this: The IP address by which an EC2 instance is known _externally_ is not actually on the instance itself (the address being translated), and the _internal_ address is not accessible across regions. Since you can't bind a specific address that is not on one of your local interfaces,

Re: column family names

2010-08-31 Thread Benjamin Black
of the world...), but "what is the >> XXX way" are not the type of topics I find interesting, so another time. >> >> Terje >> >> >> On Tue, Aug 31, 2010 at 5:30 PM, Benjamin Black wrote: >>> >>> This is not the Unix way for goo

Re: column family names

2010-08-31 Thread Benjamin Black
with >> simplicity. >> /Janne >> On Aug 31, 2010, at 08:39 , Terje Marthinussen wrote: >> >> Beyond aesthetics, specific reasons? >> >> Terje >> >> On Tue, Aug 31, 2010 at 11:54 AM, Benjamin Black wrote: >>> >>> URL encoding. >>> >> > >

Re: column family names

2010-08-31 Thread Benjamin Black
; logical names which map to pretty incomprehensible sequences that are > laborious to look up). > So my experience suggests to avoid it for ops reasons, and just go with > simplicity. > /Janne > On Aug 31, 2010, at 08:39 , Terje Marthinussen wrote: > > Beyond aesthetics, specific reaso

Re: column family names

2010-08-30 Thread Benjamin Black
URL encoding. On Mon, Aug 30, 2010 at 5:55 PM, Aaron Morton wrote: > under scores or URL encoding ? > Aaron > On 31 Aug, 2010,at 12:27 PM, Benjamin Black wrote: > > Please don't do this. > > On Mon, Aug 30, 2010 at 5:22 AM, Terje Marthinussen > wrote: >> Ah,

Re: column family names

2010-08-30 Thread Benjamin Black
Please don't do this. On Mon, Aug 30, 2010 at 5:22 AM, Terje Marthinussen wrote: > Ah, sorry, I forgot that underscore was part of \w. > That will do the trick for now. > > I do not see the big issue with file names though. Why not expand the > allowed characters a bit and escape the file names?

Re: get_slice sometimes returns previous result on php

2010-08-30 Thread Benjamin Black
On Mon, Aug 30, 2010 at 6:05 AM, Juho Mäkinen wrote: > The application is using the > same cassandra thrift connection (it doesn't close it in between) and > everything is happening inside same php process. > This is why you are seeing this problem (and is specific to connection reuse in certain

Re: Cassandra & HAProxy

2010-08-29 Thread Benjamin Black
On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro wrote: > > > I don't know it seems to tax our setup of 39 extra large ec2 nodes, its > also closer to 24000 reqs/sec at peak since there are different tables > (2 tables for each read and 2 for each write) > Could you clarify what you mean here?

Re: Cassandra & HAProxy

2010-08-29 Thread Benjamin Black
On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro wrote: > If one machine is misbehaving it tends to fail pretty quickly, at which > point all the haproxies drop it (we have an haproxy on every client node, > so it acts like a connection pooling mechanism for the client). Cool. Except this is n

Re: Benchmarking Cassandra 0.6.5 with YCSB client ... drags to a halt

2010-08-28 Thread Benjamin Black
              0         0 >  INFO 23:56:20,614 MISCELLANEOUS-POOL                0         0 >  INFO 23:56:20,615 GMFD                              0         0 >  INFO 23:56:20,616 CONSISTENCY-MANAGER               0         0 >  INFO 23:56:20,616 LB-TARGET                         0  

Re: Benchmarking Cassandra 0.6.5 with YCSB client ... drags to a halt

2010-08-28 Thread Benjamin Black
cassandra.in.sh? storage-conf.xml? output of iostat -x while this is going on? turn GC log level to debug? On Sat, Aug 28, 2010 at 2:02 PM, Fernando Racca wrote: > Hi, > I'm currently executing some benchmarks against 0.6.5, which i plan to > compare against 0.7-beta1, using the YCSB client > I'm

Re: Cassandra & HAProxy

2010-08-28 Thread Benjamin Black
On Sat, Aug 28, 2010 at 2:34 PM, Anthony Molinaro wrote: > I think maybe he thought you meant put a layer between cassandra internal > communication. No, I took the question to be about client connections. > There's no problem balancing client connections with > haproxy, we've been pushing sever

Re: Cassandra & HAProxy

2010-08-28 Thread Benjamin Black
requests, including using describe_ring to discover nodes and open new connections as needed. On Sat, Aug 28, 2010 at 11:29 AM, Mark wrote: >  On 8/28/10 11:20 AM, Benjamin Black wrote: >> >> no and no. >> >> On Sat, Aug 28, 2010 at 10:28 AM, Mark  wrote: >>> >>

Re: Cassandra & HAProxy

2010-08-28 Thread Benjamin Black
across those connections. Should a node begin returning errors (for example, because it is overloaded), clients can remove it from rotation. On Sat, Aug 28, 2010 at 11:27 AM, Mark wrote: >  On 8/28/10 11:20 AM, Benjamin Black wrote: >> >> no and no. >> >> On Sat, Aug
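The client-side approach mentioned here (spread requests across nodes, drop a node from rotation when it starts returning errors) might look roughly like the sketch below. The class and method names are hypothetical, and no real Cassandra client API is used; a real client would also need reconnect and re-add logic that is omitted here.

```python
# Sketch only: round-robin node selection with removal on error.
# NodeRotation is an illustrative name, not part of any Cassandra client.
class NodeRotation:
    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._index = 0

    def next_node(self):
        """Return the next node in round-robin order."""
        if not self._nodes:
            raise RuntimeError("no nodes left in rotation")
        node = self._nodes[self._index % len(self._nodes)]
        self._index += 1
        return node

    def remove(self, node):
        """Take a misbehaving node out of rotation."""
        if node in self._nodes:
            self._nodes.remove(node)


if __name__ == "__main__":
    rotation = NodeRotation(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print(rotation.next_node())
    rotation.remove("10.0.0.2")  # e.g. after repeated errors from this node
    print(rotation.next_node())
```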

Re: RowMutationVerbHandler.java (line 78) Error in row mutation

2010-08-28 Thread Benjamin Black
Have you tried with beta1 and is there a repro you can put in a bug report in jira? On Sat, Aug 28, 2010 at 11:28 AM, Todd Burruss wrote: > Trunk > > > > -Original Message- > From: Benjamin Black [...@b3k.us] > Received: 8/28/10 10:05 AM > To: user

Re: Cassandra & HAProxy

2010-08-28 Thread Benjamin Black
no and no. On Sat, Aug 28, 2010 at 10:28 AM, Mark wrote: >  I will be loadbalancing between nodes using HAProxy. Is this recommended? > > Also is there a some sort of ping/health check uri available? > > Thanks >

Re: RowMutationVerbHandler.java (line 78) Error in row mutation

2010-08-28 Thread Benjamin Black
Todd, Are you using beta1 or trunk code? b On Fri, Aug 27, 2010 at 3:58 PM, B. Todd Burruss wrote: > i got the latest code this morning.  i'm testing with 0.7 > > > ERROR [ROW-MUTATION-STAGE:388] 2010-08-27 15:54:58,053 > RowMutationVerbHandler.java (line 78) Error in row mutation > org.apache

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-27 Thread Benjamin Black
ecapriolo's testing seemed to indicate it _did_ change the behavior. wonder what the difference is? On Fri, Aug 27, 2010 at 6:23 AM, Mikio Braun wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Dear all, > > thanks for your comments, and I'm glad that you found my post helpful. > > Co

Re: Trying to insert a TimeUUID via Java/Thrift -- "UUIDs must be exactly 16 bytes"

2010-08-27 Thread Benjamin Black
You are using the wrong part of the example. That code sample just produces the string representation. Scroll down in that FAQ entry to the sample labeled: "When you want to actually place the UUID into the Column then you'll want to convert it like this. This method is often used in conjuntion

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-27 Thread Benjamin Black
ug 26, 2010 at 9:57 PM, Benjamin Black wrote: >> imo, these should be part of the defaults. >> >> On Tue, Aug 24, 2010 at 8:29 AM, Mikio Braun wrote: >>> -BEGIN PGP SIGNED MESSAGE- >>> Hash: SHA1 >>> >>> Dear all, >>> >>

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-26 Thread Benjamin Black
imo, these should be part of the defaults. On Tue, Aug 24, 2010 at 8:29 AM, Mikio Braun wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Dear all, > > thanks again for all the comments I got on my last post. I've played a > bit with different GC settings and got my Cassandra instance

Re: Repair help

2010-08-26 Thread Benjamin Black
recommend "testing the waters" on release software (0.6.x), not beta. On Thu, Aug 26, 2010 at 2:53 PM, Mark wrote: >  I have a 2 node cluster  (testing the waters) w/ a replication factor of 2. > One node got completed screwed up (see any of my previous messages from > today) so I deleted the com

Re: is it my cassandra cluster ok?

2010-08-25 Thread Benjamin Black
No, it means manually assign tokens to evenly distribute ring range to the existing nodes. On Wed, Aug 25, 2010 at 7:29 PM, john xie wrote: > load balancing? is it means add more nodes? > > > 2010/8/26 Ryan King >> >> Looks like you need to do some load balancing. >> >> -ryan >> >> On Wed, Aug

Re: Cassandra and Lucene

2010-08-25 Thread Benjamin Black
Please put your storage-conf.xml and cassandra.in.sh files on pastie/dpaste/gist and send the link. (moving it back to the user list again) On Sun, Jul 25, 2010 at 11:51 PM, Michelan Arendse wrote: > I have 2 seeds in my cluster, with a replication of 2. I am using cassandra > 0.6.2. > > It keep

Re: Does the scan speed with CL.ALL is faster than CL.QUORUM and CL.ONE?

2010-08-25 Thread Benjamin Black
Did you run the tests in this order without changing anything but CL? You may be seeing the effects of OS page caching. Run them in the reverse order and see if the difference persists. On Tue, Aug 24, 2010 at 11:52 PM, ring_ayumi_king wrote: > Hi all, > > I ran my benchmark(OPP via get_range_sl

Re: get_slice slow

2010-08-24 Thread Benjamin Black
Todd, This is a really bad idea. What you are likely doing is spreading that single row across a large number of sstables. The more columns you insert, the more sstables you are likely inspecting, the longer the get_slice operations will take. You can test whether this is so by running nodetool

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Benjamin Black
http://riptano.blip.tv/file/4012133/ On Sun, Aug 22, 2010 at 12:11 PM, Moleza Moleza wrote: > Hi, > I am setting up a cluster on a linux box. > Everything seems to be working great and I am watching the ring with: > watch -d -n 2 nodetool -h localhost ring > Suddenly, I see that one of the nodes

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
want to use the base settings (which are intended for the 1G max heap which is way too small for anything interesting), expect suboptimal performance for your application. > after all an evaluation of whether Cassandra can replace Mysql. > > I thank everyone for their help. > > On Sun

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
Wayne, Bulk loading this much data is a very different prospect from needing to sustain that rate of updates indefinitely. As was suggested earlier, you likely need to tune things differently, including disabling minor compactions during the bulk load, to make this work efficiently. b On Sun,

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
> > Thank you for your advice, I am struggling with how to make this work. Any > insight you can provide would be greatly appreciated. > > > > On Sun, Aug 22, 2010 at 8:58 AM, Benjamin Black wrote: >> >> How much storage do you need?  240G SSDs quite capable of satur

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
>> guess that applies here? Do I need to spend $10k per node instead of $3.5k >> to get SUSTAINED 10k writes/sec per node? >> >> >> >> On Sat, Aug 21, 2010 at 11:03 PM, Benjamin Black wrote: >>> >>> My guess is that you have (at least) 2

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
0k writes/sec per node? > > > > On Sat, Aug 21, 2010 at 11:03 PM, Benjamin Black wrote: >> >> My guess is that you have (at least) 2 problems right now: >> >> You are writing 10k ops/sec to each node, but have default memtable >> flush settings.  This is resu

Re: Privileges

2010-08-21 Thread Benjamin Black
For reference, I learned this from reading the source: thrift/CassandraServer.java On Sat, Aug 21, 2010 at 4:19 PM, Mark wrote: >  Is there anyway to remove drop column family/keyspace privileges? >

Re: Privileges

2010-08-21 Thread Benjamin Black
My mistake, the access levels in 0.7 do now distinguish these operations (at access level FULL). On Sat, Aug 21, 2010 at 4:19 PM, Mark wrote: >  Is there anyway to remove drop column family/keyspace privileges? >

Re: Privileges

2010-08-21 Thread Benjamin Black
No. On Sat, Aug 21, 2010 at 4:19 PM, Mark wrote: >  Is there anyway to remove drop column family/keyspace privileges? >

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
My guess is that you have (at least) 2 problems right now: You are writing 10k ops/sec to each node, but have default memtable flush settings. This is resulting in memtable flushing every 30 seconds (default ops flush setting is 300k). You thus have a proliferation of tiny sstables and are seein
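The 30-second figure follows directly from the two numbers quoted in the post (10k ops/sec per node, 300k-operation flush threshold). A small sketch of that arithmetic, with an illustrative function name:

```python
# Sketch only: time between memtable flushes when the operation-count
# threshold is the trigger. The numbers below are the ones quoted in the post.
def seconds_between_flushes(ops_threshold, writes_per_second):
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    return ops_threshold / writes_per_second


if __name__ == "__main__":
    print(seconds_between_flushes(300_000, 10_000))
```

Raising the threshold lengthens the interval proportionally, so fewer and larger sstables are written for the same write rate.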

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
Perhaps I missed it in one of the earlier emails, but what is your disk subsystem config? On Sat, Aug 21, 2010 at 2:18 AM, Wayne wrote: > I am already running with those options. I thought maybe that is why they > never get completed as they keep pushed pushed down in priority? I am > getting tim

Re: questions regarding read and write in cassandra

2010-08-19 Thread Benjamin Black
More recent. Newest timestamp always wins. And I am moving this to the user list (again) so it can be with all its friendly threads on the exact same topic. On Thu, Aug 19, 2010 at 10:22 AM, Maifi Khan wrote: > Hi David > Thanks for your reply. > But what happens if I read and get 2 nodes has v
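"Newest timestamp always wins" can be sketched as picking the version with the largest timestamp among the replica responses. The tuple layout and the function name below are assumptions for illustration; how equal timestamps are resolved is not covered by this sketch.

```python
# Sketch only: choose the winning version of a column from several replicas.
# Each version is a (value, timestamp) pair; this layout is an assumption.
def reconcile(versions):
    versions = list(versions)
    if not versions:
        raise ValueError("no versions to reconcile")
    # The version with the highest timestamp wins.
    return max(versions, key=lambda version: version[1])


if __name__ == "__main__":
    replies = [("v1", 1282237300), ("v2", 1282237345), ("v1", 1282237300)]
    print(reconcile(replies))
```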

Re: Cassandra gem

2010-08-18 Thread Benjamin Black
great, thanks! On Tue, Aug 17, 2010 at 11:30 PM, Mark wrote: >  On 8/17/10 5:44 PM, Benjamin Black wrote: >> >> Updated code is now in my master branch, with the reversion to 10.0.0. >>  Please let me know of further trouble. >> >> >> b >> >

Re: data deleted came back after 9 days.

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li wrote: > Those data were inserted one node, then deleted on a remote node in less > than 2 seconds. So it is very possible some node lost tombstone when > connection lost. > My question, is a ConstencyLevel.ALL read can retrieve lost tombstone back > inste

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:55 PM, Chen Xinli wrote: > Hi, > > We are going to use cassandra for searching purpose like inbox search. > The reading qps is very high, we'd like to use ConsitencyLevel.One for > reading and disable read-repair at the same time. > In 0.7 you can set a probability for r

Re: Cassandra gem

2010-08-17 Thread Benjamin Black
Updated code is now in my master branch, with the reversion to 10.0.0. Please let me know of further trouble. b On Tue, Aug 17, 2010 at 8:31 AM, Mark wrote: >  On 8/16/10 11:37 PM, Benjamin Black wrote: >> >> I'm testing with the default cassandra.yaml. >> >>

Re: indexing rows ordered by int

2010-08-17 Thread Benjamin Black
r such a feature as well.  We use it on the MyNews - >> Top in 24 hours. Since we need timestamp ordering + sorting by how many >> friends touch a story. >> >> -Chris >> >> On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote: >> >> > http://code.googl

Re: move data between clusters

2010-08-17 Thread Benjamin Black
without answering your whole question, just fyi: there is a matching json2sstable command for going the other direction. On Tue, Aug 17, 2010 at 10:48 AM, Artie Copeland wrote: > what is the best way to move data between clusters.  we currently have a 4 > node prod cluster with 80G of data and wa

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
a > /Justus > > -----Original Message----- > From: Benjamin Black [mailto:b...@b3k.us] > Sent: 17 August 2010 08:37 > To: user@cassandra.apache.org > Subject: Re: Cassandra gem > > I'm testing with the default cassandra.yaml. > > I cannot reproduce the

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
7 beta1 is at Thrift interface version 10.0.0 or 11.0.0? b On Mon, Aug 16, 2010 at 9:03 PM, Mark wrote: >  On 8/16/10 8:51 PM, Mark wrote: >> >>  On 8/16/10 6:19 PM, Benjamin Black wrote: >>> >>> client = Cassandra.new('system', '127.

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
thrift (0.2.0.4) thrift_client (0.4.6, 0.4.3) On Mon, Aug 16, 2010 at 8:51 PM, Mark wrote: >  On 8/16/10 6:19 PM, Benjamin Black wrote: >> >> client = Cassandra.new('system', '127.0.0.1:9160') > > Brand new download of beta-0.7.0-beta1 > >

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
$ irb >> require "lib/cassandra/0.7" => true >> client = Cassandra.new('system', '127.0.0.1:9160') => # >> client.keyspaces => ["system"] >> client.partitioner => "org.apache.cassandra.dht.RandomPartitioner" &

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
can you gist the code? On Mon, Aug 16, 2010 at 5:46 PM, Mark wrote: >  On 8/16/10 3:58 PM, Benjamin Black wrote: >> >> If you pulled before a couple hours ago and did not use the 'trunk' >> branch, then you don't have current code.  I merged the trunk branch >

Re: Cassandra gem

2010-08-16 Thread Benjamin Black
If you pulled before a couple hours ago and did not use the 'trunk' branch, then you don't have current code. I merged the trunk branch to master earlier today and sent a pull request for the fauna repo to get the changes, as well. Also fixed a bug another user found when running with Ruby 1.9.

Re: File write errors but cassandra isn't crashing

2010-08-16 Thread Benjamin Black
Useful config option, perhaps? On Mon, Aug 16, 2010 at 8:51 AM, Jonathan Ellis wrote: > That's a tough call -- you can also come up with scenarios where you'd > rather have it read-only than completely dead. > > On Wed, Aug 11, 2010 at 12:38 PM, Ran Tavory wrote: >> Due to administrative error o

Re: indexing rows ordered by int

2010-08-15 Thread Benjamin Black
http://code.google.com/p/redis/ On Sat, Aug 14, 2010 at 11:51 PM, S Ahmed wrote: > For CF that I need to perform range scans on, I create separate CF that have > custom ordering. > Say a CF holds comments on a story (like comments on a reddit or digg story > post) > So if I need to order comments

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
#546 #1076 #1169 #1377 etc... On Sat, Aug 14, 2010 at 12:05 PM, Bill de hÓra wrote: > That data suggests the inbuilt tools are a hazard and manual workarounds > less so. > > Can you point me at the bugs? > > Bill > > > On Fri, 2010-08-13 at 20:30 -0700, Benjamin Bla

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
On Fri, Aug 13, 2010 at 10:13 PM, Stefan Kaufmann wrote: >> My recommendation is to leave Autobootstrap disabled, copy the >> datafiles over, and then run cleanup.  It is faster and more reliable >> than streaming, in my experience. > > I thought about copying da Data manually. However if I have a

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
g 13, 2010 at 2:05 PM, Bill de hÓra wrote: > On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote: > >> My recommendation is to leave Autobootstrap disabled, copy the >> datafiles over, and then run cleanup.  It is faster and more reliable >> than streaming, in my experienc

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote: > Benjamin Black b3k.us> writes: > >> > 3. I waited for the data to replicate, which didn't happen. >> >> Correct, you need to run nodetool repair because the nodes were not >> present when the w

Re: Data Distribution / Replication

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann wrote: > Hello again, > > last day's I started several tests with Cassandra and learned quite some > facts. > > However, of course, there are still enough things I need to > understand. One thing is, how the data replication works. > For my Testing

Re: Growing commit log directory.

2010-08-09 Thread Benjamin Black
what does the io load look like on those nodes? On Mon, Aug 9, 2010 at 1:50 PM, Edward Capriolo wrote: > I have a 16 node 6.3 cluster and two nodes from my cluster are giving > me major headaches. > > 10.71.71.56   Up         58.19 GB > 10827166220211678382926910108067277    |   ^ > 10.71.71.

Re: Question about column insert history

2010-08-08 Thread Benjamin Black
You don't, they are not preserved, as discussed in another, almost identical thread in the past 2 days. If you want to retain history, you must do so yourself, usually by maintaining indices. On Sun, Aug 8, 2010 at 1:29 PM, Kevin Cox wrote: > Taking from the CassandaraCLI example page for simpl

Re: TokenRange contains endpoints without any port information?

2010-08-08 Thread Benjamin Black
On Sun, Aug 8, 2010 at 5:21 AM, Carsten Krebs wrote: > > I'm wondering why a TokenRange returned by describe_ring(keyspace) of the > thrift API just returns endpoints consisting only of an address but omits any > port information? > My first thought was, this method could be used to expose some

Re: Columns limit

2010-08-07 Thread Benjamin Black
Certainly. There is also a performance penalty to unbounded row sizes. That penalty is your nodes OOMing. I strongly recommend you abandon that direction. On Sat, Aug 7, 2010 at 9:06 PM, Mark wrote: > On 8/7/10 7:04 PM, Benjamin Black wrote: >> >> certainly it matters: your p

Re: Columns limit

2010-08-07 Thread Benjamin Black
certainly it matters: your previous version is not bounded on time, so will grow without bound. ergo, it is not a good fit for cassandra. On Sat, Aug 7, 2010 at 2:51 PM, Mark wrote: > On 8/7/10 2:33 PM, Benjamin Black wrote: >> >> Right, this is an index row per time interval
