Re: inconsistent hadoop/cassandra results

2013-01-09 Thread Brian Jeltema
Sorry if this is a duplicate - I was having mailer problems last night: > Assuming there were no further writes, running repair or using CL all should > have fixed it. > > Can you describe the inconsistency between runs? Sure. The job output is generated by a single reducer and consists of a

Re: initial_token configuration

2013-01-09 Thread Manu Zhang
No bother. I've seen the code. On Wed, Jan 9, 2013 at 11:57 AM, Manu Zhang wrote: > # If blank, Cassandra will request a token bisecting the range of >> # the heaviest-loaded existing node. If there is no load information >> # available, such as is the case with a new cluster, it will pick >>
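The bisecting behaviour described in the quoted cassandra.yaml comment can be pictured with a small sketch. It assumes a RandomPartitioner-style ring of 2**127 tokens in which each node owns the range that ends at its own token; the function name and the load figures are hypothetical, and this is not Cassandra's actual implementation.

```python
RING = 2 ** 127  # assumed size of the token space (RandomPartitioner-style)


def bisecting_token(loads):
    """Return the token that splits the heaviest node's range in half.

    `loads` maps each existing node's token to its load (hypothetical numbers).
    """
    ring = sorted(loads)
    heaviest = max(ring, key=lambda t: loads[t])
    # The heaviest node owns (previous token, its own token]; index -1 wraps around.
    previous = ring[ring.index(heaviest) - 1]
    width = (heaviest - previous) % RING
    if width == 0:
        width = RING  # a single node owns the whole ring
    return (previous + width // 2) % RING


# Two nodes; the one at RING // 2 carries more data, so its range is bisected.
new_token = bisecting_token({0: 10, RING // 2: 50})
print(new_token == RING // 4)  # True
```

The sketch covers only the case where load information is available.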

Re: How long does it take for a write to actually happen?

2013-01-09 Thread Vitaly Sourikov
Aaron, thanks a lot for your response! It gave us many ideas for future re-factorings. Meanwhile, while trying to monitor Cassandra response times on all 3 servers (online, offline and cassandra itself), I have noticed that the system time was different on all 3. After I ran ntpdate on all of them,

Re: How long does it take for a write to actually happen?

2013-01-09 Thread Vegard Berget
Hi, The timestamp is generated on the client side, so actually if you have two clients which set the timestamp from their own system time, you will experience trouble. I don't know how Astyanax does it, and I am not sure if it would cause trouble when getting data? Could it be that the Process server
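To make the clock problem concrete, here is a minimal sketch of timestamp-based conflict resolution, where the cell with the highest timestamp wins. The skew values and the resolve helper are hypothetical, and tie-breaking on equal timestamps is left out.

```python
def resolve(current, incoming):
    """Keep whichever (timestamp, value) cell carries the higher timestamp."""
    return incoming if incoming[0] > current[0] else current


true_time = 1_000_000  # hypothetical "real" time, in microseconds
skew_a = 5_000_000     # client A's clock runs 5 seconds fast (assumed)

# Client A writes first, stamping the write with its fast clock.
cell = (true_time + skew_a, "written by A")

# Client B writes one second later, with a correct clock.
later_write = (true_time + 1_000_000, "written by B")

cell = resolve(cell, later_write)
print(cell[1])  # written by A
```

B's newer write is silently discarded because A's timestamp is larger, which is why synchronising the clocks (for example with ntpdate) matters when timestamps come from the clients.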

RE: Date Index?

2013-01-09 Thread Stephen.M.Thompson
Thanks Aaron, that helps. So is there anything approaching a "consensus" of how to do something like this? You mention a custom index ... is there a good document on creating a custom index? Google doesn't show me much. Steve From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday,

Re: Date Index?

2013-01-09 Thread Michael Kjellman
ElasticSearch is a nice option for ordered lists. In 2.0, triggers would make updates to ElasticSearch much easier, as right now it's up to your application logic to detect changes and update. On Jan 9, 2013, at 7:55 AM, "stephen.m.thomp...@wellsfargo.com"

Re: Wide rows in CQL 3

2013-01-09 Thread Hiller, Dean
Probably should read this http://www.datastax.com/dev/blog/cql3-for-cassandra-experts I don't see wide row support going away since they specifically made the change to enable 2 billion columns in a row according to that paper. Dean From: mrevilgnome mailto:mrevilgn...@gmail.com>> Reply-To: "us

Re: Wide rows in CQL 3

2013-01-09 Thread Ben Hood
I'm currently in the process of porting my app from Thrift to CQL3 and it seems to me that the underlying storage layout hasn't really changed fundamentally. The difference appears to be that CQL3 offers a neater abstraction on top of the wide row format. For example, in CQL3, your query results ar

Re: distribution of token ranges with virtual nodes

2013-01-09 Thread Manu Zhang
Is the cassandra-shuffle command in the trunk? Or is it only included in the Debian package? I can't find it in the trunk. On Sat, Nov 3, 2012 at 2:18 AM, Eric Evans wrote: > On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang > wrote: > >> It splits into a contiguous range, because truly upgrading to vno

change cluster name retaining keypsace

2013-01-09 Thread Tim Dunphy
Hello, I'm attempting to change my cluster name, yet retain my keyspace as it was. I know from what I've read that this requires changing it within the cassandra cli (using system), changing it in the cassandra.yaml file, and deleting the contents of the /var/lib/cassandra/data/system directory.

RE: remote datacentre consistency

2013-01-09 Thread Simon Guindon
Here's a good document on how hinted handoff works http://www.datastax.com/dev/blog/modern-hinted-handoff I believe if I understand that document correctly that a hinted handoff will get created if the replica is down in the other data center. Also since Cassandra is self-healing, reads will cau

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
I ask myself this every day. CQL3 is the "new way" to do things, including wide rows with collections. There is no "upgrade path". You adopt CQL3's sparse tables as soon as you start creating column families from CQL. There is not much backwards compatibility. CQL3 can query compact tables, but you may

Re: Date Index?

2013-01-09 Thread Tyler Hobbs
If you're going to be looking data up by date ranges frequently, I strongly suggest you go with a typical time-series pattern (what Aaron described as hand-rolled indexes): http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/ http://www.datastax.com/dev/blog/advanced-time-series-

RE: Date Index?

2013-01-09 Thread Stephen.M.Thompson
OK ... I think I understand these. So the idea is that you would use the time as the column key? So when I might have something like this: | time=2013/01/03 08:19:01 | user=john | site=Chicago | time=2013/01/05 01:55:34 | user=john | site=Chicago | time=2013/01/09 16:21:42 | user=john | site
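A rough sketch of the time-as-column-key idea, using plain Python dictionaries in place of a column family. It assumes one row per day (the bucket size is a choice, not something the thread specifies), and the insert and query helpers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Row key = day bucket such as "20130103"; column name = event time; column value = payload.
rows = defaultdict(dict)


def insert(ts, value):
    """Store an event in the row for its day, keyed by its timestamp."""
    rows[ts.strftime("%Y%m%d")][ts] = value


def query(start, end):
    """Return events with start <= time <= end, visiting one row per day."""
    results = []
    day = start.date()
    while day <= end.date():
        columns = rows.get(day.strftime("%Y%m%d"), {})
        for ts in sorted(columns):  # columns within a row are kept in time order
            if start <= ts <= end:
                results.append((ts, columns[ts]))
        day += timedelta(days=1)
    return results


insert(datetime(2013, 1, 3, 8, 19, 1), "user=john|site=Chicago")
insert(datetime(2013, 1, 5, 1, 55, 34), "user=john|site=Chicago")

hits = query(datetime(2013, 1, 4), datetime(2013, 1, 6))
print(len(hits))  # 1
```

A date-range lookup then touches only the rows for the days in the range instead of scanning every record.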

Re: Wide rows in CQL 3

2013-01-09 Thread Sylvain Lebresne
> There is no "upgrade path". I don't think that's true. The goal of the blog post you've linked is to discuss that upgrade path (and in particular show that for the most part, you can access your thrift data from CQL3 without any modification whatsoever). > You adopt CQL3's sparse tables as soon

Re: remote datacentre consistency

2013-01-09 Thread Jabbar
Hello Simon, I thought Hinted Handoff was for downed replicas in the local datacentre. I didn't realise that it would work with a remote datacenter. Likewise for Anti Entropy I thought it only worked for the replicas in the local datacentre. I have yet to find any definitive references which mention

Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-09 Thread Snehal Nagmote
Hello All, I am using Kundera 2.0.7 and Cassandra 1.0.8. I need to implement batching/pagination over row keys: for instance, scan a column family and get 100 records per batch until all keys are exhausted. I am using the random partitioner for the keyspace. I explored the limit option in cql and ,se

Re: JIRA for native IAuthorizer and IAuthenticator ?

2013-01-09 Thread aaron morton
Do these help? https://issues.apache.org/jira/browse/CASSANDRA-4874 https://issues.apache.org/jira/browse/CASSANDRA-4875 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 9/01/2013, at 1:21 PM, Frank Hsueh wrote: > I

Re: How long does it take for a write to actually happen?

2013-01-09 Thread aaron morton
And by default in CQL 3 the timestamp is generated server side. There is an option to provide them client side however. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 10/01/2013, at 3:32 AM, Vegard Berget wrote: >

Re: Date Index?

2013-01-09 Thread Tyler Hobbs
On Wed, Jan 9, 2013 at 3:37 PM, wrote: > OK … I think I understand these. So the idea is that you would use the > time as the column key? > > So when I might have something like this: > > | time=2013/01/03 08:19:01 | user=john | site=Chicago > > | time=2013/01/

RE: Helenos 1.3 released

2013-01-09 Thread S C
I tried Helenos 1.3. It looks pretty good. I created a test account with the role "ROLE_USER". With this user, I am able to create KS/CF's and drop them as well. Is this intended? I was expecting that the user with role "ROLE_USER" should be able to browse data but not create or delete it. Than

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
By "no upgrade path" I mean to say if I have a table with compact storage I cannot upgrade it to sparse storage. If I have an existing COMPACT table and I want to add a Map to it, I cannot. This is what I mean by no upgrade path. Column families that mix static and dynamic columns are pretty com

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
Also I have to say I do not get that blank sparse column. Ghost ranges are a little weird but they don't bother me. 1) It's a row of nothing. The definition of waste. 2) Suppose I have 1 billion rows and my distribution is mostly rows of 1 or 2 columns. My database is now significantly bigger. T

Re: change cluster name retaining keypsace

2013-01-09 Thread aaron morton
To change the cluster name: 1) Stop all nodes. 2) Delete or move the LocationInfo sstables from /var/log/cassandra/data/system/LocationInfo 3) Change the cluster_name in cassandra.yaml 4) Restart the nodes. You cannot do an incremental change of the cluster name. All nodes in the cluster mus
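Step 3 of the list above can be scripted. This is a minimal sketch, assuming cassandra.yaml contains a single top-level cluster_name line; the helper name and the sample cluster names are hypothetical, and the nodes still need to be stopped first as the steps describe.

```python
import re


def set_cluster_name(yaml_text, new_name):
    """Return yaml_text with its top-level cluster_name line replaced."""
    pattern = re.compile(r"^cluster_name:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash escapes in new_name being interpreted.
    updated, count = pattern.subn(
        lambda match: f"cluster_name: '{new_name}'", yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no cluster_name line found")
    return updated


sample = "cluster_name: 'Test Cluster'\ninitial_token:\n"
print(set_cluster_name(sample, "Prod Cluster"))
```

The sketch works on the file contents as a string, so reading and writing the actual file (and quoting names that contain a single quote) is left to the caller.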

Re: remote datacentre consistency

2013-01-09 Thread aaron morton
> I thought Hinted Handoff was for downed replicas in the local datacentre. I > didn't realise that it would work with a remote datacenter. The coordinator will store a hint if it detects that a replica is down before the request starts, or if the node does not respond within rpc_timeout. > Likew

Re: Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-09 Thread aaron morton
Try this http://wiki.apache.org/cassandra/FAQ#iter_world Take a look at the code examples it points to. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 10/01/2013, at 11:55 AM, Snehal Nagmote wrote: > Hello All, >
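The general shape of paging over all row keys can be sketched as follows: each batch starts at the last key of the previous batch, and that repeated first key is dropped. The get_range function here is a hypothetical in-memory stand-in for a range query, and the MD5-based ordering only imitates the fact that the random partitioner returns keys in token order rather than key order.

```python
import hashlib


def token(key):
    """Stand-in for a partitioner token: order keys by an MD5 hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def get_range(store, start_key, count):
    """Hypothetical range query: up to `count` keys in token order,
    starting at start_key (inclusive), or at the beginning if it is None."""
    ordered = sorted(store, key=token)
    begin = 0 if start_key is None else ordered.index(start_key)
    return ordered[begin:begin + count]


def iterate_all(store, page_size):
    """Yield every key exactly once, fetching page_size keys per request."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start = None
    while True:
        page = get_range(store, start, page_size)
        if start is not None:
            page = page[1:]  # first key repeats the last key of the previous page
        if not page:
            break
        yield from page
        start = page[-1]


keys = [f"user{i}" for i in range(10)]
seen = list(iterate_all(keys, 4))
print(len(seen))  # 10
```

Because the keys come back in hash order, the batches are not sorted by key, but every key is visited once.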

Re: distribution of token ranges with virtual nodes

2013-01-09 Thread Jason Wee
It should be in the trunk, check it https://github.com/apache/cassandra/blob/trunk/bin/cassandra-shuffle On Thu, Jan 10, 2013 at 1:18 AM, Manu Zhang wrote: > Is cassandra-shuffle command in the trunk? Or it is only included in the > Debian package? I don't find it in the trunk. > > > On Sat, No

Re: distribution of token ranges with virtual nodes

2013-01-09 Thread Manu Zhang
sorry, I missed it since it's not executable by default. On Thu, Jan 10, 2013 at 10:05 AM, Jason Wee wrote: > It should be in the trunk, check it > https://github.com/apache/cassandra/blob/trunk/bin/cassandra-shuffle > > > On Thu, Jan 10, 2013 at 1:18 AM, Manu Zhang wrote: > >> Is cassandra-shu

Re: remote datacentre consistency

2013-01-09 Thread Jabbar
Aaron, Thank you for your answers. On 10 Jan 2013 00:27, "aaron morton" wrote: > I thought Hinted Handoff was for downed replica's in the local datacentre. > I didn't realise that it would work with a remote datacenter. > > If the coordinator will store a hint if it detects a replica is down > b

Re: change cluster name retaining keypsace

2013-01-09 Thread Tim Dunphy
Hello, And thanks for your reply! Well so far it's just a single node. So I wouldn't think this should be so complicated. But one day hopefully from this node a cluster will grow, but that we shall have to wait and see. At any rate, at /var/log/cassandra I don't see a directory called system. All

Re: change cluster name retaining keypsace

2013-01-09 Thread Michael Kjellman
I think Aaron meant /var/lib/cassandra (by default). Check there (unless you changed your data directories in your cassandra.yaml). On Jan 9, 2013, at 7:36 PM, "Tim Dunphy" mailto:bluethu...@gmail.com>> wrote: Hello, And thanks for your reply! Well so far it's just a single node. So I wouldn't t

Re: Wide rows in CQL 3

2013-01-09 Thread Janne Jalkanen
On 10 Jan 2013, at 01:30, Edward Capriolo wrote: > Column families that mix static and dynamic columns are pretty common. In > fact it is pretty much the default case, you have a default validator then > some columns have specific validators. In the old days people used to say > "You only nee