Re: What happens if there is a collision?

2010-10-25 Thread Chris Dean
Peter Schuller writes: >> The timestamp is an ever increasing clock so I wouldn't expect two api >> calls from the same machine in the same thread to have the same >> timestamp.  It is perfectly allowed behavior for the read value to not >> agree with the write value. > > In the *particular* case

Re: keys_cached percent?

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 9:43 PM, Damick, Jeffrey wrote: > Sure - so percents aren’t supported anymore in 0.7.x, which is fine, I just > wanted to clarify. > > thanks > > > On 10/25/10 9:31 PM, "Aaron Morton" wrote: > > To cache 100% set the value to 1. The comments in the yaml below do > explicit

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Jonathan Ellis
On Mon, Oct 25, 2010 at 3:10 PM, Eric Rosenberry wrote: >> I don't follow the reasoning there.  Row cache or fs cache, it will be >> hot after reading it once, the difference is that doing a read to the >> cached data is much faster from row cache. > > Yeah, I would have thought the same.  Benjami

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 10:19 PM, Takayuki Tsunakawa wrote: > Hello, Mike, > > Thank you for your advice. I'll close this thread with this mail (I was > afraid I was interrupting the community developers with vague questions.) > I'm happy to know that any clearly known limitation does not exi

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Takayuki Tsunakawa
Hello, Mike, Thank you for your advice. I'll close this thread with this mail (I was afraid I was interrupting the community developers with vague questions.) I'm happy to know that there is no clearly known limitation restricting a cluster to a couple hundred nodes. If our project

Re: keys_cached percent?

2010-10-25 Thread Damick, Jeffrey
Sure - so percents aren't supported anymore in 0.7.x, which is fine, I just wanted to clarify. thanks On 10/25/10 9:31 PM, "Aaron Morton" wrote: To cache 100% set the value to 1. The comments in the yaml below do explicitly say this, but it's discussed here http://wiki.apache.org/cassandra/S

remove

2010-10-25 Thread ke.yuan.whu
remove ,thanks 2010-10-26 ke.yuan.whu

Re: keys_cached percent?

2010-10-25 Thread Aaron Morton
To cache 100% set the value to 1. The comments in the yaml below do explicitly say this, but it's discussed here http://wiki.apache.org/cassandra/StorageConfiguration?highlight=(keys)|(cached) Aaron On 26 Oct, 2010, at 02:02 PM, "Damick, Jeffrey" wrote: I tried a few variations, but when I set it
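Since the 0.7 beta yaml loader rejects `100%` (see the error reported in this thread), an old 0.6-style percentage has to be rewritten as a plain number. The yaml comment quoted in this thread says to give a fraction (a value below 1) or an absolute key count, and Aaron's advice is that 1 means 100%. Below is a minimal sketch of a conversion helper under those assumptions; `KeysCachedValue` and `toYamlValue` are hypothetical names, not part of Cassandra, and the sketch does not try to reproduce how Cassandra itself interprets the value.

```java
// Hypothetical helper, not part of Cassandra: rewrites a 0.6-style
// percentage ("50%") into the plain number the 0.7 yaml accepts.
final class KeysCachedValue {
    private KeysCachedValue() {}

    /** Returns the number to write for keys_cached, given an old setting. */
    static double toYamlValue(String setting) {
        String s = setting.trim();
        if (s.endsWith("%")) {
            double percent = Double.parseDouble(s.substring(0, s.length() - 1).trim());
            if (percent < 0 || percent > 100) {
                throw new IllegalArgumentException("percentage out of range: " + setting);
            }
            // 100% becomes 1, the value suggested on the list for caching every key.
            return percent / 100.0;
        }
        // Plain numbers (a fraction below 1 or an absolute key count) pass through.
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"100%", "50%", "0.25", "200000"}) {
            System.out.println(s + " -> " + toYamlValue(s));
        }
    }
}
```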

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Mike Malone
Hey Takayuki, I don't think you're going to find anyone willing to promise that Cassandra will fit your petabyte scale data analysis problem. That's a lot of data, and there's not a ton of operational experience at that scale within the community. And the people who do work on that sort of problem

Re: keys_cached percent?

2010-10-25 Thread Damick, Jeffrey
I tried a few variations, but when I set it to 100% or "100%" or similar, I get: ERROR 19:37:31,898 Fatal error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=keyspaces for javabean=org.apache.cassandra.config.co

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Takayuki Tsunakawa
Hello, Edward, Thank you for giving me insight into large disk nodes. From: "Edward Capriolo" > Index sampling on start up. If you have very small rows your indexes > become large. These have to be sampled on start up and sampling our > indexes for 300GB of data can take 5 minutes. This is goin
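Edward's point about small rows can be made concrete with a toy model of index sampling: only every Nth entry of an sstable's on-disk index is held in memory, so the number of samples (and the startup work to build them) grows with the number of rows rather than the number of bytes. This is a simplified sketch, not Cassandra's implementation; `IndexSummarySketch` is a hypothetical class, and the interval of 128 is an assumption modeled on Cassandra's `index_interval` setting, whose value may differ in your version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified index summary: every `interval`-th entry of a
// sorted on-disk index is kept in memory, so a lookup only has to scan a
// short stretch of the index file. Not Cassandra's actual code.
final class IndexSummarySketch {
    private final List<String> sampledKeys = new ArrayList<>();
    private final List<Long> sampledPositions = new ArrayList<>();

    /** Samples a sorted index where sortedKeys.get(i) lives at positions.get(i). */
    IndexSummarySketch(List<String> sortedKeys, List<Long> positions, int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        if (sortedKeys.size() != positions.size()) throw new IllegalArgumentException("size mismatch");
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            sampledKeys.add(sortedKeys.get(i));
            sampledPositions.add(positions.get(i));
        }
    }

    /** Number of samples held in memory: roughly rows / interval. */
    int size() { return sampledKeys.size(); }

    /** Index-file position to start scanning from, or -1 if key sorts before every sample. */
    long scanStart(String key) {
        int idx = Collections.binarySearch(sampledKeys, key);
        if (idx >= 0) return sampledPositions.get(idx);
        int insertion = -idx - 1;        // first sample greater than key
        if (insertion == 0) return -1;   // key precedes the first sample
        return sampledPositions.get(insertion - 1);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        List<Long> positions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("key%04d", i));
            positions.add(i * 100L); // pretend each index entry takes 100 bytes
        }
        IndexSummarySketch summary = new IndexSummarySketch(keys, positions, 128);
        System.out.println(summary.size() + " samples held in memory");
        System.out.println("scan from " + summary.scanStart("key0200"));
    }
}
```

Halving the row size doubles the row count for the same data volume, and with it the number of samples to rebuild at startup.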

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Takayuki Tsunakawa
Hello, Jonathan, From: "Jonathan Ellis" > There is no reason Cassandra cannot scale to 1000s or more nodes with > the current architecture. Oh, really? I had the impression that the gossip exchanges limit the number of nodes in a cluster when I read the Dynamo paper and "Cassandra - A Decentra

Re: What happens if there is a collision?

2010-10-25 Thread Jérôme Verstrynge
Peter, thanks for the extensive feedback. Much appreciated. On 26/10/2010 0:47, Peter Schuller wrote: This doesn't mean that your problem is somehow invalid; but it doesn't sound like QUORUM consistency (over-writing) writes are the solution. What is the difference, from your application's perspec
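For reference, the usual quorum arithmetic behind this discussion: with replication factor N, QUORUM is N/2 + 1 replicas (integer division), and a read of R replicas is guaranteed to overlap a write of W replicas when R + W > N. The sketch below (a hypothetical `QuorumMath` class) only checks that overlap; it says nothing about which of two concurrent writes wins, which is decided by timestamps as discussed in this thread.

```java
// Back-of-the-envelope quorum arithmetic; hypothetical helper class.
final class QuorumMath {
    private QuorumMath() {}

    /** Replicas needed for QUORUM at the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every r-replica read must share a replica with every w-replica write. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int q = quorum(n);
            System.out.printf("RF=%d quorum=%d QUORUM/QUORUM overlap=%b ONE/ONE overlap=%b%n",
                    n, q, readsSeeLatestWrite(q, q, n), readsSeeLatestWrite(1, 1, n));
        }
    }
}
```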

Re: What happens if there is a collision?

2010-10-25 Thread Peter Schuller
> The timestamp is an ever increasing clock so I wouldn't expect two api > calls from the same machine in the same thread to have the same > timestamp.  It is perfectly allowed behavior for the read value to not > agree with the write value. In the *particular* case of a single instantiation of a
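The quoted assumption, that one client thread never issues two calls with the same timestamp, holds only if the client's clock source guarantees it. A minimal sketch of such a source, assuming microseconds since the epoch as the timestamp unit (a common client convention, not something Cassandra enforces): `MonotonicMicros` is a hypothetical helper that never returns the same value twice within one process, even if the wall clock stalls or steps backwards. It does nothing for writers on different machines, which can still collide; what happens on an exact tie is the subject of this thread and is not modeled here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-process timestamp source (microseconds since the epoch)
// that is strictly increasing across all threads of this JVM.
final class MonotonicMicros {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    long next() {
        while (true) {
            long now = System.currentTimeMillis() * 1000L;
            long prev = last.get();
            // Use the wall clock when it has moved forward, otherwise bump by one.
            long candidate = Math.max(now, prev + 1);
            if (last.compareAndSet(prev, candidate)) {
                return candidate;
            }
            // Another thread won the race; retry with its value as the floor.
        }
    }

    public static void main(String[] args) {
        MonotonicMicros clock = new MonotonicMicros();
        System.out.println(clock.next() + " < " + clock.next());
    }
}
```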

Re: What happens if there is a collision?

2010-10-25 Thread Peter Schuller
(sorry about the delay in responding - inbox backlog) > REM: I am not trying to make this discussion longer than necessary or to > play semantics. I am not into that at all and I appreciate the time you > take to answer me, really. No problem; and same here. I just think that a mutual understand

Re: Benchmarking & Testing

2010-10-25 Thread Peter Schuller
> My question is: what are the points in the system that you guys test? What > are the metrics for the test-points? Any flags that you guys use to see if > more capacity / nodes are needed? > > Thanks in advance. Trying to figure this out and figured I'd ask the > community with more experience

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Eric Rosenberry
I am going to respond to multiple questions in one email to keep down the thread insanity: On Mon, Oct 25, 2010 at 12:39 AM, David Dabbs wrote: > Sorry, Eric I’m not following you. You’ve set the JVM’s processor > affinity so it only runs on one of the processors? > My understanding is that Li

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 12:37 PM, Jonathan Ellis wrote: > On Sun, Oct 24, 2010 at 9:09 PM, Takayuki Tsunakawa > wrote: >> From: "Jonathan Ellis" >>> (b) Cassandra generates input splits from the sampling of keys each >>> node has in memory.  So if a node does end up with no data for a >>> keyspa

Re: keys_cached percent?

2010-10-25 Thread Aaron Morton
It should do, this is the comment from the yaml:
#     - keys_cached: specifies the number of keys per sstable whose
#        locations we keep in memory in "mostly LRU" order.  (JUST the key
#        locations, NOT any column values.) Specify a fraction (value less
#        than 1) or an absolute numbe

keys_cached percent?

2010-10-25 Thread Damick, Jeffrey
Does 0.7 not support percentages in the keys_cached setting (in the yaml config)? (I'm on 0.7.0b2, so maybe it has been fixed?) thanks

Re: batch_mutate in 0.7

2010-10-25 Thread Chris Oei
Thanks Gary and Jonathan. Yeah, I'm planning on switching to Hector sometime soon; I started with Thrift mostly because I wanted to see what was going on underneath the hood before using a higher-level interface. I suppose now is as good a time to switch as any. Thanks, Chris On Mon, Oct 25, 2010

Re: batch_mutate in 0.7

2010-10-25 Thread Jonathan Ellis
You want the set_keyspace method. What language are you using? We don't recommend using raw Thrift unless there's no other option. On Mon, Oct 25, 2010 at 12:59 PM, Chris Oei wrote: > So, I'm a bit puzzled about how to change my old 0.6 code to 0.7. > In 0.6, I used: >   client.batch_mutate(key

Re: batch_mutate in 0.7

2010-10-25 Thread Gary Dusbabek
client.set_keyspace() On Mon, Oct 25, 2010 at 12:59, Chris Oei wrote: > So, I'm a bit puzzled about how to change my old 0.6 code to 0.7. > In 0.6, I used: >   client.batch_mutate(keySpace, mutationMap, ConsistencyLevel.ONE); > But in 0.7, batch_mutate no longer has a keyspace argument, so I used
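The change in this thread is that 0.7 moves the keyspace from a per-call argument to per-connection state: call `client.set_keyspace(...)` once on the connection, then `client.batch_mutate(mutationMap, ConsistencyLevel.ONE)` as in Chris's 0.7 code. The sketch below models only that ordering with a hypothetical in-memory `SessionSketch` class; it is not the generated Thrift client, and its mutation map is flattened to row key to a list of mutation descriptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a 0.7-style connection (not Thrift's client):
// the keyspace is per-connection state set once, and batchMutate no
// longer takes it as an argument.
final class SessionSketch {
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    private String keyspace; // null until setKeyspace is called
    private final List<String> log = new ArrayList<>();

    void setKeyspace(String keyspace) { this.keyspace = keyspace; }

    void batchMutate(Map<String, List<String>> mutationMap, ConsistencyLevel cl) {
        if (keyspace == null) {
            // Mirrors the failure mode in the question: no keyspace on the connection.
            throw new IllegalStateException("no keyspace set on this connection");
        }
        for (Map.Entry<String, List<String>> e : mutationMap.entrySet()) {
            log.add(keyspace + "/" + e.getKey() + " x" + e.getValue().size() + " @" + cl);
        }
    }

    List<String> log() { return log; }

    public static void main(String[] args) {
        SessionSketch client = new SessionSketch();
        Map<String, List<String>> mutationMap = new HashMap<>();
        mutationMap.put("row1", List.of("insert name=value"));
        client.setKeyspace("Keyspace1"); // once per connection
        client.batchMutate(mutationMap, ConsistencyLevel.ONE);
        System.out.println(client.log());
    }
}
```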

batch_mutate in 0.7

2010-10-25 Thread Chris Oei
So, I'm a bit puzzled about how to change my old 0.6 code to 0.7. In 0.6, I used: client.batch_mutate(keySpace, mutationMap, ConsistencyLevel.ONE); But in 0.7, batch_mutate no longer has a keyspace argument, so I used: client.batch_mutate(mutationMap, ConsistencyLevel.ONE); Not surprisingl

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Jonathan Ellis
On Sun, Oct 24, 2010 at 9:09 PM, Takayuki Tsunakawa wrote: > From: "Jonathan Ellis" >> (b) Cassandra generates input splits from the sampling of keys each >> node has in memory.  So if a node does end up with no data for a >> keyspace (because of bad OPP balancing for instance) it will have no >>
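Jonathan's explanation implies that a node with no sampled keys for a keyspace contributes no input splits. A hypothetical model of that behavior: walk the node's sorted key samples and close a split every `samplesPerSplit` samples. `SplitSketch` and `samplesPerSplit` are illustrative names only; Cassandra's actual split computation differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of split generation from in-memory key samples.
// A node holding no samples for the keyspace yields no splits at all.
final class SplitSketch {
    static final class Split {
        final String startKey;
        final String endKey;

        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() { return "[" + startKey + ", " + endKey + "]"; }
    }

    private SplitSketch() {}

    static List<Split> splits(List<String> sortedSamples, int samplesPerSplit) {
        if (samplesPerSplit < 1) throw new IllegalArgumentException("samplesPerSplit must be >= 1");
        List<Split> out = new ArrayList<>();
        for (int i = 0; i < sortedSamples.size(); i += samplesPerSplit) {
            // The last split may cover fewer samples than the others.
            int end = Math.min(i + samplesPerSplit, sortedSamples.size()) - 1;
            out.add(new Split(sortedSamples.get(i), sortedSamples.get(end)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splits(List.of("a", "b", "c", "d", "e"), 2));
        System.out.println(splits(new ArrayList<String>(), 2)); // empty node: no splits
    }
}
```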

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Jonathan Ellis
On Mon, Oct 25, 2010 at 10:25 AM, Edward Capriolo wrote: >> 2. We gave up on using Cassandra's row cache as loading any reasonable >> amount of data into the cache would take days/weeks with our tiny row size. >>  We instead are using file system cache. I don't follow the reasoning there. Row ca

remove

2010-10-25 Thread Dave Wellman
remove

Hiring engineers

2010-10-25 Thread Dejan Diklic
Don't want to spam the list, but since we had awesome luck last time, here is the job posting: Software Engineer The Platform Engineering team is responsible for developing a highly specialized web-scale search infrastructure, including crawling, content processing, indexing, and query serving. Th

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 11:21 AM, Eric Rosenberry wrote: > Hey Chris- > That is tough to say as we started out with no data and have been > continuously loading data into the cluster.  Initially we had less data than > the amount of RAM in each node (48 gigs) but we have eventually exceeded > that

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Eric Rosenberry
Hey Chris- That is tough to say as we started out with no data and have been continuously loading data into the cluster. Initially we had less data than the amount of RAM in each node (48 gigs) but we have eventually exceeded that and now have many times more data on each node than in the entire

Tutorial Session at ApacheCon NA 2010

2010-10-25 Thread Eric Evans
I'll be giving a 3-hour tutorial[1] next week at ApacheCon in Atlanta. It covers everything from setup and configuration to cluster operations, and includes a number of hands-on programming exercises using Pycassa[2] and Twissandra[3]. If you're interested, there is still time to register. If a

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Chris Burroughs
On 10/24/2010 11:16 PM, Eric Rosenberry wrote: > I wanted to share back to the community some of the learnings we have come > across including the hardware configuration we have been successful with > (YMMV). This is still a work in progress naturally. > > I have written up a detailed blog post a

Re: Hung Repair

2010-10-25 Thread Gary Dusbabek
Can you produce a thread dump on the machine? kill -3 ought to do it. JConsole can be your friend at a time like this too. It might be painstaking, but you can check the CPU time used by each thread using the java.lang:type=Threading MBean. There's an interesting jconsole plugin that is supposed to
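The per-thread CPU numbers Gary mentions come from the JVM's Threading MXBean. A minimal in-process sketch using the standard `java.lang.management` API: `ThreadCpuReport.busiestThreads` is a hypothetical helper that lists the current JVM's threads by CPU time. To inspect a running Cassandra node you would read the same MBean remotely through JConsole, or take a `kill -3` thread dump as suggested.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Lists live threads of this JVM by CPU time, busiest first, using the
// same Threading MXBean that JConsole reads over JMX.
final class ThreadCpuReport {
    private ThreadCpuReport() {}

    static List<String> busiestThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> rows = new ArrayList<>(); // each row is {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            // getThreadCpuTime returns -1 if measurement is disabled or the thread died.
            long cpu = mx.isThreadCpuTimeSupported() ? mx.getThreadCpuTime(id) : -1L;
            rows.add(new long[] {id, cpu});
        }
        rows.sort((a, b) -> Long.compare(b[1], a[1]));
        List<String> out = new ArrayList<>();
        for (long[] row : rows) {
            if (out.size() >= limit) break;
            ThreadInfo info = mx.getThreadInfo(row[0]);
            if (info == null) continue; // thread exited after getAllThreadIds()
            String cpuMs = row[1] < 0 ? "n/a" : Long.toString(row[1] / 1_000_000);
            out.add(info.getThreadName() + " cpu_ms=" + cpuMs + " state=" + info.getThreadState());
        }
        return out;
    }

    public static void main(String[] args) {
        busiestThreads(10).forEach(System.out::println);
    }
}
```

A thread that keeps climbing to the top of this list across successive samples is a good candidate to look up in the thread dump.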

RE: Experiences with Cassandra hardware planning

2010-10-25 Thread David Dabbs
Sorry, Eric I'm not following you. You've set the JVM's processor affinity so it only runs on one of the processors? From: epros...@gmail.com [mailto:epros...@gmail.com] On Behalf Of Eric Rosenberry Sent: Monday, October 25, 2010 12:49 AM To: user@cassandra.apache.org Subject: Re: Experienc