Re: [RELEASE] 0.7.0 beta2

2010-10-02 Thread gabriele renzi
On Fri, Oct 1, 2010 at 8:56 PM, Eric Evans wrote: > > If you're coming from 0.6, there are some things to be aware of, so be > sure to read the release notes[2]. Should the release notes have the sentence "Stand up your cluster with the 0.7 version."? It kind of seems like it should be "start up" to me, but I may

Re: Log configuration

2010-07-18 Thread gabriele renzi
On Sun, Jul 18, 2010 at 5:28 PM, osishkin osishkin wrote: > I didn't find in the documentation a way to configure message logging > that I'm looking for, so I appologize if this is a trivial question. > Is there a simple guide to configuring logging options in Cassandra? > I saw references to outp

Re: Algorithm for distributing key of Cassandra

2010-06-01 Thread gabriele renzi
On Mon, May 31, 2010 at 8:50 PM, Jonathan Ellis wrote: > Doesn't ring a bell.  Maybe if you included the link to which you refer? I guess this is the related post: http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html though I believe the original poster misphrased or mi
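The linked post contrasts hash-based and order-preserving key placement. A minimal Python sketch of the ring idea follows, assuming each node owns the token range (previous token, its own token] and a key goes to the first node whose token is >= the key's token, wrapping around. `Ring`, `md5_token` and `identity_token` are hypothetical names for illustration, only loosely in the spirit of Cassandra's RandomPartitioner and OrderPreservingPartitioner, not their actual code:

```python
import bisect
import hashlib

def md5_token(key):
    """Hash-based token (hypothetical helper): even spread, key order lost."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def identity_token(key):
    """Order-preserving token (hypothetical helper): the key itself."""
    return key

class Ring:
    """Toy token ring; each node owns (previous token, its own token]."""

    def __init__(self, node_tokens, token_fn):
        # node_tokens maps node name -> token; all tokens must be mutually comparable.
        self.token_fn = token_fn
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def node_for(self, key):
        # First node whose token is >= the key's token;
        # past the last token, wrap around to the first node.
        i = bisect.bisect_left(self.tokens, self.token_fn(key))
        return self.nodes[i % len(self.nodes)]
```

With `Ring({"A": "g", "B": "n", "C": "t"}, identity_token)`, "apple" lands on A, "hello" on B, and "zebra" wraps around to A, so a scan over a contiguous key range only touches the nodes owning that range. With MD5 tokens the load evens out, but lexically adjacent keys end up on unrelated nodes: that is the trade-off named in the post's title.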

Re: ec2 tests

2010-05-28 Thread gabriele renzi
On Fri, May 28, 2010 at 3:48 PM, Mark Greene wrote: > First thing I would do is stripe your EBS volumes. I've seen blogs that say > this helps and blogs that say it's fairly marginal. Just to point out: another option is to stripe the ephemeral drives (if using instances larger than small)

Re: remove a row

2010-05-28 Thread gabriele renzi
On Fri, May 28, 2010 at 11:05 AM, huajun qi wrote: > Is there anyway to remove a row completely? > I use thrift client's remove method , it only deletes the columns under a > row, but the row with its key is still there. > How can I remove it completely? You can't really, with the Thrift API, s

Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Thu, May 27, 2010 at 9:23 PM, Sean Bridges wrote: > But doesn't having multiple similarly sized column families mean in-node > compaction does not require 50% of disk?  Looking at compaction manager, > only 1 thread is doing a compaction, so we only need enough free disk space > to compact the

Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Wed, May 26, 2010 at 8:00 PM, Sean Bridges wrote: > So after CASSANDRA-579, anti compaction won't be done on the source node, > and we can use more than 50% of the disk space if we use multiple column > families? Sorry if I misunderstand, but #579 seems to only solve half of your question, I b

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-26 Thread gabriele renzi
On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna wrote: > What is the use case? We sometimes end up with messed-up data in the database, so from time to time we run a MapReduce job to find irregular data. > Why are you using Cassandra versus using data stored in HDFS or HBase? As of now our MapReduce task i

Re: batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
On Wed, May 26, 2010 at 9:54 AM, Mishail wrote: > You could either use 1 remove(keyspace, key, column_path, timestamp, > consistency_level) per aech key, or wait till > https://issues.apache.org/jira/browse/CASSANDRA-494 fixed (to use > SliceRange in the Deletion) thanks, I'm already doing that b

batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
Hi everyone, in our test code we perform a dummy "clear" by reading all the rows and deleting them (while waiting for Cassandra 0.7 & CASSANDRA-531). A couple of days ago I updated our code to perform this operation using batchMutate, but there seems to be no way to perform a deletion of the whole

Re: Increment and Decrement operation

2010-05-13 Thread gabriele renzi
On Fri, May 14, 2010 at 12:43 AM, Paul Prescod wrote: > I'm curious what the relevance of CASSANDRA-1016 is. I guess if you had "operations" moved to the data, you could implement incr/decr easily: read the previous value, add one, write the new value. This does not yet seem to be what 1016 is for, thoug
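To illustrate why moving the operation to the data matters: a read-add-write done by two clients can lose an update when both read before either writes. Below is a toy, single-process Python sketch; `Store` and its `apply` method are hypothetical and ignore replication entirely, so they are not Cassandra's API:

```python
class Store:
    """Toy single-node key-value store (hypothetical, not Cassandra's API)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def put(self, key, value):
        self.data[key] = value

    def apply(self, key, op):
        # "Operation moved to the data": the store itself runs
        # read-modify-write, so no client write can land in between.
        self.data[key] = op(self.data.get(key, 0))
        return self.data[key]

def client_side_increments(store, key):
    # Two clients both read before either writes: a classic lost update.
    a = store.get(key)
    b = store.get(key)
    store.put(key, a + 1)
    store.put(key, b + 1)
    return store.get(key)

def store_side_increments(store, key):
    # Same two increments, executed where the data lives.
    store.apply(key, lambda v: v + 1)
    store.apply(key, lambda v: v + 1)
    return store.get(key)
```

Here `client_side_increments(Store(), "hits")` ends at 1 while `store_side_increments(Store(), "hits")` ends at 2. In a replicated store, running the operation next to the data still needs some way to order or reconcile concurrent operations across replicas; the sketch only shows the single-node race.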

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote: > Looking over the code this is in fact an issue in 0.6. > It's fixed in trunk/0.7. Connections will be reused and closed properly, see > https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. > > We can either backport that

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote: > On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: >> - is it possible that such errors show up on the client side as >> timeoutErrors when they could be reported better? > > No, if the node the client is talking t

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
A follow-up for anyone who may end up on this conversation again: I kept trying, and neither changing the number of concurrent map tasks nor the slice size helped. Finally, I found a screw-up in our logging system, which had prevented us from noticing a couple of recurring errors in the logs

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis wrote: > Sounds like you need to configure Hadoop to not create a whole bunch > of Map tasks at once Interesting: from a quick check, it seems there are a dozen threads running. Yet setNumMapTasks seems to be deprecated (together with JobConf) and

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 2:53 PM, Matt Revelle wrote: > There's also the mapred.task.timeout property that can be tweaked.  But > reporting is the correct way to fix timeouts during execution. Re: not reporting, I thought this was not needed with the new mapred API (Mapper class vs. Mapper interf

Re: timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
On Fri, May 7, 2010 at 3:02 PM, Joost Ouwerkerk wrote: > Joseph, the stacktrace suggests that it's Thrift that's timing out, > not the Task. > > Gabriele, I believe that your problem is caused by too much load on > Cassandra.  Get_range_slices is presently an expensive operation. I > had some succ

timeout while running simple hadoop job

2010-05-07 Thread gabriele renzi
Hi everyone, I am trying to develop a MapReduce job that does a simple selection+filter on the rows in our store. Of course, it is mostly based on the WordCount example :) Sadly, while the app seems to run fine on a test keyspace with little data, when run on a larger test index (but still on a

Re: Regarding Cassandra Scalability

2010-04-16 Thread gabriele renzi
On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang wrote: > FB also does pics and movies so 1MB is way off depending on where they > manage such binary data. Apparently not in Cassandra: http://www.facebook.com/note.php?note_id=76191543919 >I do agree that 1MB of text alone is a lot of text > which is

Re: Clarification on Ring operations in Cassandra 0.5.1

2010-04-16 Thread gabriele renzi
On Fri, Apr 16, 2010 at 1:10 AM, Anthony Molinaro wrote: > Hi, > >  I have a cluster running on ec2, and would like to do some ring > management.  Specifically, I'd like to replace an existing node > without another node (I want to change the instance type). Maybe `nodetool move` does what yo

Re: If a user has millions of followers, is there millions of iterate? (ref Twissandra)

2010-04-15 Thread gabriele renzi
On Thu, Apr 15, 2010 at 9:56 AM, Allen He wrote: > Hello folks, > > When Twissandra (Twitter clone example for Cassandra) post a tweet, it > iterate all of the followers to insert a tweet_id to their time lines(see > for follower_id in follower_ids: > TIMELINE.insert(str(follower_id)
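The loop quoted above is the fan-out-on-write pattern: each post costs one insert per follower, and in exchange reading a timeline is a single-row lookup. A minimal in-memory Python sketch, assuming plain dicts stand in for the follower and timeline column families (all names are hypothetical, not Twissandra's code):

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for a followers column family and a
# timeline column family; this is not Twissandra's actual code.
followers = defaultdict(set)   # user_id -> set of follower ids
timeline = defaultdict(dict)   # user_id -> {timestamp: tweet_id}

def post_tweet(author_id, tweet_id, ts):
    """Fan-out on write: one timeline insert per follower.
    Returns the number of inserts, which grows linearly with the follower count."""
    writes = 0
    for follower_id in followers[author_id]:
        timeline[follower_id][ts] = tweet_id
        writes += 1
    return writes

def read_timeline(user_id, limit=20):
    """Reading is a single-row lookup, newest first."""
    row = timeline[user_id]
    return [row[ts] for ts in sorted(row, reverse=True)[:limit]]
```

In this sketch a user with a million followers triggers a million timeline inserts per tweet. The alternative, fan-out on read, stores each tweet once and merges the followed users' tweets at read time, which moves the cost from writers to readers.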

Re: How do vector clocks and conflicts work?

2010-04-06 Thread gabriele renzi
On Tue, Apr 6, 2010 at 9:11 AM, Paul Prescod wrote: > This may be the blind leading the blind... > On Mon, Apr 5, 2010 at 11:54 PM, Tatu Saloranta > wrote: >>... > >> >> I think the key is that this is not automatic -- there is no general >> mechanism for aggregating distinct modifications. Point
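For readers following the vector clock discussion, here is a generic textbook sketch in Python (not Cassandra's implementation; all names are hypothetical). Clocks are dicts of node -> counter; comparing them tells whether one write descends from another, and concurrent writes are only detected, not merged:

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter bumped."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def compare(a, b):
    """Return 'equal', 'before' (a happened before b), 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

def merge(a, b):
    """Pointwise max: the clock a reconciled write should carry.
    It does not merge the values themselves; that is up to the application."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Starting from `{"A": 1}`, a write on node A and a write on node B give `{"A": 2}` and `{"A": 1, "B": 1}`, which compare as concurrent. `merge` only yields the clock for the reconciled value; choosing that value stays with the application, in line with the quoted point that there is no general mechanism for aggregating distinct modifications.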

Re: Memcached protocol?

2010-04-06 Thread gabriele renzi
On Tue, Apr 6, 2010 at 2:10 AM, Paul Prescod wrote: > On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta wrote: >> ... >> >> I would think that there is also possibility of losing some >> increments, or perhaps getting duplicate increments? > > I believe that with vector clocks in Cassandra 0.7 you w

Re: multinode cluster wiki page

2010-04-03 Thread gabriele renzi
On Sat, Apr 3, 2010 at 6:40 PM, Avinash Lakshman wrote: > We use anywhere from 3-5 seeds for clusters that have over 150 nodes. That > should suffice for larger sizes too since they are only for initial > discovery. Would it make sense to just use round-robin DNS on the available nodes and use

Re: how to store list ?

2010-04-02 Thread gabriele renzi
On Fri, Apr 2, 2010 at 12:46 PM, Shuge Lee wrote: > For example: > user['lee'] = { >     'name': 'lee', >     'age'; '21', >     'girls': ['java', 'actionscript', 'python'], > } > how to store above in Apache Cassndra ? Check what a SuperColumn is in the wiki. -- blog en: http://www.riffraff.inf
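One way to picture the SuperColumn suggestion: keep scalar fields under one super column and give each list its own super column, using zero-padded positions as column names so a byte-ordered comparator keeps the list in order. The Python sketch below uses plain dicts; the 'info' super column name and the padding scheme are assumptions for illustration, not a prescribed layout:

```python
def to_supercolumns(user):
    """Map a nested dict onto a hypothetical row layout:
    {super column name: {column name: value}}.
    Scalars go under an assumed 'info' super column; each list gets its own
    super column keyed by zero-padded position (4 digits: lists under 10,000 items)."""
    row = {"info": {}}
    for field, value in user.items():
        if isinstance(value, list):
            row[field] = {f"{i:04d}": item for i, item in enumerate(value)}
        else:
            row["info"][field] = value
    return row

def from_supercolumns(row):
    """Inverse mapping: sorted column names restore the list order."""
    user = dict(row.get("info", {}))
    for sc_name, columns in row.items():
        if sc_name != "info":
            user[sc_name] = [columns[k] for k in sorted(columns)]
    return user
```

The row key ('lee') stays outside this mapping, e.g. `rows = {"lee": to_supercolumns(user)}`. In this sketch a list field literally named 'info' would collide with the scalar super column.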

Re: NullPointerException in DatabaseDescriptor.getComparator

2010-03-24 Thread gabriele renzi
On Wed, Mar 24, 2010 at 3:36 PM, Oleg Mürk wrote: > Hi Jonathan, > > On Wed, Mar 24, 2010 at 4:32 PM, Jonathan Ellis wrote: >> >> probably 0.5.1 is allowing an invalid query and erroring out when it >> actually runs it. > > I am pretty sure that the same query works OK when I initially start > Ca

Re: updates on hector, a java cassandra client

2010-03-23 Thread gabriele renzi
On Sat, Mar 20, 2010 at 4:54 AM, Ran Tavory wrote: > Hector is a java client for cassandra, > see http://github.com/rantav/hector , http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/ , http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/ Hey Ran, > Over