Re: Using 5-6 bytes for cassandra timestamps vs 8…

2011-09-05 Thread Oleg Anastastasyev
> > I have a patch for trunk which I just have to get time to test a bit before I submit. > It is for super columns and will use the super column's timestamp as the base and only store variant-encoded offsets in the underlying columns. > Could you please measure how much real benefit it brings
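The general idea being discussed is to store, per sub-column, only a small zig-zag/varint-encoded delta against the super column's base timestamp instead of a full 8-byte long. A minimal sketch of that encoding in Java (the general technique only, not the actual patch):

    // Zig-zag then varint-encode a signed timestamp delta. Deltas close to the
    // base timestamp fit in 1-2 bytes instead of 8. Illustrative only; the real
    // patch's wire format may differ.
    static byte[] encodeDelta(long delta) {
        long zz = (delta << 1) ^ (delta >> 63);   // zig-zag: small |delta| -> small value
        byte[] buf = new byte[10];
        int i = 0;
        while ((zz & ~0x7FL) != 0) {              // 7 bits per byte, high bit = "more"
            buf[i++] = (byte) ((zz & 0x7FL) | 0x80L);
            zz >>>= 7;
        }
        buf[i++] = (byte) zz;
        return java.util.Arrays.copyOf(buf, i);
    }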

Re: 15 seconds to increment 17k keys?

2011-09-05 Thread Oleg Anastastasyev
> in the family. There are millions of rows. Each operation consists of > doing a batch_insert through pycassa, which increments ~17k keys. A > majority of these keys are new in each batch. > > Each operation is taking up to 15 seconds. For our system this is a > significant bottleneck. > Try t
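For context, a batched counter update at the Thrift level in 0.8 looks roughly like the sketch below, here split into smaller batch_mutate calls rather than one 17k-key batch. The column family "Counters" and column "hits" are illustrative, and this is not the (truncated) advice from the reply above.

    import java.nio.ByteBuffer;
    import java.util.*;
    import org.apache.cassandra.thrift.*;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // Sends counter increments in chunks instead of one huge batch_mutate call.
    public final class CounterBatch {
        static void incrementInChunks(Cassandra.Client client,
                                      Map<String, Long> increments,
                                      int chunkSize) throws Exception {
            Map<ByteBuffer, Map<String, List<Mutation>>> batch = new HashMap<>();
            for (Map.Entry<String, Long> e : increments.entrySet()) {
                CounterColumn cc = new CounterColumn(ByteBufferUtil.bytes("hits"), e.getValue());
                ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
                cosc.setCounter_column(cc);
                Mutation m = new Mutation();
                m.setColumn_or_supercolumn(cosc);
                batch.put(ByteBufferUtil.bytes(e.getKey()),
                          Collections.singletonMap("Counters", Collections.singletonList(m)));
                if (batch.size() >= chunkSize) {   // flush a chunk
                    client.batch_mutate(batch, ConsistencyLevel.ONE);
                    batch = new HashMap<>();
                }
            }
            if (!batch.isEmpty())
                client.batch_mutate(batch, ConsistencyLevel.ONE);
        }
    }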

Re: load balance issue

2011-09-05 Thread amulya rattan
This is golden! Thanks a heap, guys. On Mon, Sep 5, 2011 at 6:07 PM, Nick Bailey wrote: > You can place each of the 4 new nodes exactly in the middle of 2 of > the current nodes. This way each node will still be responsible for > the same amount of data but your old nodes did not move. > > On Mon,

Re: Bulk loader: Got an unknow host from describe_ring

2011-09-05 Thread Christopher Bottaro
That issue says you can work around the problem by turning off auto node discovery... any instructions on how to do that? Is it done on the cluster or just the sstableloader? Thanks. On Thu, Sep 1, 2011 at 5:34 PM, Jonathan Ellis wrote: > Sounds like https://issues.apache.org/jira/browse/CASSAN

Re: Pelops authentication

2011-09-05 Thread Dan Washusen
You can set up an instance of org.scale7.cassandra.pelops.SimpleConnectionAuthenticator and pass it to org.scale7.cassandra.pelops.IConnection.Config. Cheers, Dan On Monday, 5 September 2011 at 4:24 PM, lacosa2...@libero.it wrote: > Hi, > I wanna know if it exists and how to implement authentic
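A rough sketch of that wiring; the Config constructor arguments and especially the setter used to attach the authenticator are assumptions and may differ across Pelops versions, so check the Config javadoc:

    import org.scale7.cassandra.pelops.Cluster;
    import org.scale7.cassandra.pelops.IConnection;
    import org.scale7.cassandra.pelops.SimpleConnectionAuthenticator;

    // Assumption: Config exposes a setter for the authenticator; the method name
    // below is illustrative and may differ in your Pelops version.
    SimpleConnectionAuthenticator auth =
            new SimpleConnectionAuthenticator("username", "password");
    IConnection.Config config = new IConnection.Config(9160, true, 4000);
    config.setConnectionAuthenticator(auth);
    Cluster cluster = new Cluster("node1,node2", config, false);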

Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-05 Thread Evgeniy Ryabitskiy
Great, thanks! Evgeny.

Re: load balance issue

2011-09-05 Thread Nick Bailey
You can place each of the 4 new nodes exactly in the middle of 2 of the current nodes. This way each node will still be responsible for the same amount of data but your old nodes did not move. On Mon, Sep 5, 2011 at 2:56 PM, amulya rattan wrote: > Ah, missed that. Thanks for the pointer. > While
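A small self-contained sketch of the token arithmetic, assuming 4 evenly spaced RandomPartitioner tokens on a 2^127 ring; the node count is illustrative, not taken from this thread:

    import java.math.BigInteger;

    // Existing nodes sit at i * 2^127 / 4; each new node goes at the midpoint of
    // two neighbouring existing tokens, so the old nodes never have to move.
    public class MidpointTokens {
        public static void main(String[] args) {
            BigInteger ring = BigInteger.valueOf(2).pow(127);
            int existing = 4;
            for (int i = 0; i < existing; i++) {
                BigInteger current = ring.multiply(BigInteger.valueOf(i))
                                         .divide(BigInteger.valueOf(existing));
                BigInteger next = ring.multiply(BigInteger.valueOf(i + 1))
                                      .divide(BigInteger.valueOf(existing));
                BigInteger midpoint = current.add(next).divide(BigInteger.valueOf(2));
                System.out.println("new node " + i + " token: " + midpoint);
            }
        }
    }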

Re: Why no need to query all nodes on secondary index lookup?

2011-09-05 Thread Jonathan Ellis
The first node can answer the question as long as you've requested fewer rows than the first node has on it. Hence the "low cardinality" point in what you quoted. On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg wrote: > Hello Anyone > > I have a follow up question on a question from February

Re: Why no need to query all nodes on secondary index lookup?

2011-09-05 Thread Martin von Zweigbergk
Hi Magnus, I think the answer might be in https://issues.apache.org/jira/browse/CASSANDRA-749. For example, Jonathan writes: > Is it worth creating a secondary index that only contains local data, versus > a distributed secondary index (a normal ColumnFamily)? I think my initial reasoning was

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
On Mon, 2011-09-05 at 21:52 +0200, Patrik Modesto wrote: > I'm not sure about 0.8.x and 0.7.9 (to be released today with your > patch) but 0.7.8 will fail even with RF>1 when there is a Hadoop > TaskTracker without local Cassandra. So increasing RF is not a > solution. This isn't true (or not the in

Re: load balance issue

2011-09-05 Thread amulya rattan
Ah, missed that. Thanks for the pointer. While we are at it, the doc says that if I am doubling the strength of my cluster and I assign calculated tokens to the new nodes, I don't need to do the nodetool move for the old nodes. Won't I have to assign the old nodes their new respective tokens too?

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Patrik Modesto
On Mon, Sep 5, 2011 at 09:39, Mick Semb Wever wrote: > I've entered a jira issue covering this request. > https://issues.apache.org/jira/browse/CASSANDRA-3136 > > Would you mind attaching your patch to the issue. > (No review of it will happen anywhere else.) I see Jonathan didn't change his mind

Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-05 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-3082 On Mon, Sep 5, 2011 at 10:04 AM, Evgeniy Ryabitskiy wrote: > Hi, > > I'm trying to store a record with EACH_QUORUM consistency and RF=3, while > the same thing with RF=2 is working. > Could someone tell me why EACH_QUORUM is working with RF=2 but n

Thrift 7

2011-09-05 Thread tushar pal
Hi, I am facing some problems using Thrift 7. I downloaded the tar file, and the Windows exe too. I created a Thrift jar from the lib/java path and then generated the Java classes from the tutorial.thrift file. Now when I run the ant file inside the example I get an error that some of the

isolate the replication logic?

2011-09-05 Thread Yang
I'm interested in isolating the replication logic, so that Cassandra (or any NoSQL software, for that matter) is composed of a replication module, a DB engine, and possibly a key mapping module. This way we could swap out Cassandra replication (multi-master, async) with, for example, the ZAB protoco

Re: KeyRange in the ColumnFamilyInputFormat

2011-09-05 Thread Mick Semb Wever
On Mon, 2011-09-05 at 19:02 +0200, Mick Semb Wever wrote:
> ConfigHelper.setInputRange(
>     jobConf,
>     partitioner.getTokenFactory().toString(partitioner.getToken(myKey)),
>     partitioner.getTokenFactory().toString(partitioner.getToken(my
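For completeness, a self-contained version of that call; RandomPartitioner and the key names are assumptions for illustration, and both tokens are derived the same way as in the truncated quote above:

    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;

    // Restrict the Hadoop input to the token range [startToken, endToken].
    public class TokenRangeJob {
        public static void main(String[] args) {
            Configuration jobConf = new Configuration();   // normally the job's own config
            RandomPartitioner partitioner = new RandomPartitioner();
            String startToken = partitioner.getTokenFactory()
                    .toString(partitioner.getToken(ByteBufferUtil.bytes("myKey")));
            String endToken = partitioner.getTokenFactory()
                    .toString(partitioner.getToken(ByteBufferUtil.bytes("myOtherKey")));
            ConfigHelper.setInputRange(jobConf, startToken, endToken);
        }
    }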

get_range_slices efficiency question

2011-09-05 Thread Sanjeev Kulkarni
Hey guys, We are designing our data model for our app and this question came up. Let's say that I have a large number of rows (say 1M) and just one column family. Each row contains either columns (A, B, C) or (X, Y, Z). I want to run a get_range_slices query to fetch columns (A, B, C). Does cassandr
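To make the question concrete, such a query over raw Thrift with a name-based SlicePredicate looks roughly like this (the column family name is illustrative); note that rows containing only (X, Y, Z) still come back as keys with empty column lists, which is part of what the efficiency question is about:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // Ask only for columns A, B, C by name across a range of rows.
    public class RangeSliceExample {
        static List<KeySlice> fetchABC(Cassandra.Client client) throws Exception {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setColumn_names(Arrays.asList(
                    ByteBufferUtil.bytes("A"),
                    ByteBufferUtil.bytes("B"),
                    ByteBufferUtil.bytes("C")));
            KeyRange range = new KeyRange();
            range.setCount(1000);                              // page size, illustrative
            range.setStart_key(ByteBufferUtil.EMPTY_BYTE_BUFFER);
            range.setEnd_key(ByteBufferUtil.EMPTY_BYTE_BUFFER);
            return client.get_range_slices(
                    new ColumnParent("MyCF"), predicate, range, ConsistencyLevel.ONE);
        }
    }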

Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-05 Thread Evgeniy Ryabitskiy
One more thing: the Cassandra version is 0.8.4. And if I try the same thing from Pelops (Thrift), I get an UnavailableException.

Re: KeyRange in the ColumnFamilyInputFormat

2011-09-05 Thread Mick Semb Wever
On Mon, 2011-09-05 at 18:18 +0300, Vitaly Vengrov wrote:
> See these lines in the ColumnFamilyInputFormat.getSplits method:
>     assert jobKeyRange.start_key == null : "only start_token supported";
>     assert jobKeyRange.end_key == null : "only end_token supported";
> So, the

KeyRange in the ColumnFamilyInputFormat

2011-09-05 Thread Vitaly Vengrov
Hi guys. See these lines in the ColumnFamilyInputFormat.getSplits method:
    assert jobKeyRange.start_key == null : "only start_token supported";
    assert jobKeyRange.end_key == null : "only end_token supported";
So, the question is why start_key and end_key aren't sup

UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-05 Thread Evgeniy Ryabitskiy
Hi, I'm trying to store a record with EACH_QUORUM consistency and RF=3, while the same thing with RF=2 is working. Could someone tell me why EACH_QUORUM works with RF=2 but not with RF>=3? I have a 7-node cluster. All nodes are UP. Here is a simple CLI script: create keyspace kspace3 with placeme

Re: cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-05 Thread William Oberman
Yes, my cluster is working. I didn't realize it at the time, but the StorageService link I listed is already in 0.8.4, so yes the only file I had to patch was VersionedValue. Not sure what was going on with the pig jars, but after more configuration changes than I can count, I'm pretty sure remov

Re: java.io.IOException: Could not get input splits

2011-09-05 Thread Ji Cheng
Hi. We got the same problem here. Even the wordcount map/reduce example in the source tarball works fine with one node, but fails with the same exception on a two-node cluster. CASSANDRA-3044 mentioned that a temporary workaround is to disable node auto discovery. Can anyone tell me how to do that in

Re: load balance issue

2011-09-05 Thread Sylvain Lebresne
Have you done step 6 of 'To add nodes to a Cassandra cluster' at http://www.datastax.com/docs/0.8/operations/clustering#adding-capacity, i.e., run nodetool cleanup on the previously existing nodes? -- Sylvain On Sun, Sep 4, 2011 at 11:58 AM, amulya rattan wrote: > Hi there, > I had a 3 nodes
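That is, on each node that was already in the ring before the new ones joined, something like the following (host name illustrative):

    nodetool -h old-node-hostname cleanup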

Re: RF=1 w/ hadoop jobs

2011-09-05 Thread Mick Semb Wever
On Fri, 2011-09-02 at 09:28 +0200, Patrik Modesto wrote: > We use Cassandra as storage for web pages; we store the HTML, all > URLs that have the same HTML data, and some computed data. We run Hadoop > MR jobs to compute lexical and thematic data for each page and for > exporting the data to a bi