Re: data model question : finding out the n most recent changes items

2013-07-11 Thread aaron morton
What you described sounds like the most appropriate: CREATE TABLE user_file ( user_id uuid, modified_date timestamp, file_id timeuuid, PRIMARY KEY(user_id, modified_date) ); If you normally need more information about the file then either store that as addit

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
What I mean is, I really just want the last modified date instead of a series of timestamps, and still be able to sort or order by it. (Maybe I should rephrase my question as how to sort or order by the last modified column in a row.) CREATE TABLE user_file ( user_id uuid, modified_date timest

RE: data model question : finding out the n most recent changes items

2013-07-11 Thread Lohith Samaga M
Hi, Do you need to store the history of updates to a file? If this is not required, then you can make the user id and file id the row key. You simply need to update the modified_date timestamp. There will be only one row per file per user. Thanks and Regards M. Lohith Samaga -Original

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
Thanks for the suggestion. I don't care about the history of the update times of a file, BUT I do want to order by the last one. The reason is that, without that, if a user has 10k+ files, I have to fetch the last modified times of all these 10k+ files and sort through them in my application a

Alternate "major compaction"

2013-07-11 Thread Tomàs Núnez
Hi, About a year ago, we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using SizeTieredCompaction strategy, and we'

listen_address and rpc_address address on different interface

2013-07-11 Thread Christopher Wirt
Hello, I was wondering if anyone has measured the performance improvement from having the listen address and client address bound to different interfaces? We have a 2gbit connection serving both at the moment and this doesn't come close to being saturated. But being very keen on fast reads at

Re: manually removing sstable

2013-07-11 Thread Theo Hultberg
a colleague of mine came up with an alternative solution that also seems to work, and I'd just like your opinion on if it's sound. we run find to list all old sstables, and then use cmdline-jmxclient to run the forceUserDefinedCompaction function on each of them, this is roughly what we do (but wi

Re: High performance hardware with lot of data per node - Global learning about configuration

2013-07-11 Thread Aiman Parvaiz
Hi, We also recently migrated to 3 hi1.4xlarge boxes (RAID 0 SSD) and the disk IO performance is definitely better than the earlier non-SSD servers; we are serving up to 14k reads/s with a latency of 3-3.5 ms/op. I wanted to share our config options and ask about the data backup strategy for Raid

Re: Alternate "major compaction"

2013-07-11 Thread Takenori Sato
Hi, I think it is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause; there are others. For example, I see two typical scenarios: 1. backup use case, 2. active wide row. In the case of 1, say, a piece of data is removed a year later. Thi

IllegalArgumentException on query with AbstractCompositeType

2013-07-11 Thread Pruner, Anne (Anne)
Hi, I've been tearing my hair out trying to figure out why this query fails. In fact, it only fails on machines with slower CPUs and after having previously run some other JUnit tests. I'm running JUnit tests against an embedded Cassandra server, which works well in pretty much all other

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Eric Stevens
I think there is not an extremely simple solution to your problem. You will probably need to use multiple tables to get the view you need. One keyed just by file UUID, which tracks some basic metadata about the file including the last modified time. Another as a materialized view of the most rece
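The preview above is cut off, so the following is only one reading of the two-table idea: a minimal in-memory sketch in Python rather than CQL. The names `file_meta`, `recent_by_user`, `touch` and `most_recent` are hypothetical and do not come from the thread. The first dict stands in for the table keyed by file id; the second stands in for the per-user view ordered by last modified time, where the old entry is removed whenever a file is modified again.

```python
import bisect
from collections import defaultdict

# Stand-in for the table keyed by file id: file_id -> (user_id, last_modified).
file_meta = {}
# Stand-in for the per-user view: user_id -> list of (last_modified, file_id),
# kept sorted by last_modified (hypothetical layout, not from the thread).
recent_by_user = defaultdict(list)


def touch(user_id, file_id, modified):
    """Record a modification, replacing any older entry for the same file."""
    old = file_meta.get(file_id)
    if old is not None:
        # Delete the stale view entry so each file appears only once per user.
        recent_by_user[user_id].remove((old[1], file_id))
    file_meta[file_id] = (user_id, modified)
    bisect.insort(recent_by_user[user_id], (modified, file_id))


def most_recent(user_id, n):
    """Return up to n file ids for the user, newest first."""
    if n <= 0:
        return []
    return [fid for _, fid in reversed(recent_by_user[user_id][-n:])]


touch("u1", "a", 1)
touch("u1", "b", 2)
touch("u1", "a", 3)  # file "a" is modified again and moves to the front
latest = most_recent("u1", 2)
```

The cost of this design is the extra delete on every modification; the benefit is that reading the n most recent files never requires fetching and sorting all of a user's files in the application.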

Re: Alternate "major compaction"

2013-07-11 Thread srmore
Thanks Takenori, Looks like the tool provides some good info that people can use. It would be great if you can share it with the community. On Thu, Jul 11, 2013 at 6:51 AM, Takenori Sato wrote: > Hi, > > I think it is a common headache for users running a large Cassandra > cluster in productio

Re: Alternate "major compaction"

2013-07-11 Thread Brian Tarbox
Perhaps I should already know this but why is running a major compaction considered so bad? We're running 1.1.6. Thanks. On Thu, Jul 11, 2013 at 7:51 AM, Takenori Sato wrote: > Hi, > > I think it is a common headache for users running a large Cassandra > cluster in production. > > > Running a

Re: Cassandra performance tuning...

2013-07-11 Thread Eric Stevens
You should be able to set the key_validation_class on the column family to use a different data type for the row keys. You may not be able to change this for a CF with existing data without some troubles due to a mismatch of data types; if that's a concern you'll have to create a separate CF and m

Re: Alternate "major compaction"

2013-07-11 Thread Michael Theroux
Information is only deleted from Cassandra during a compaction. Using SizeTieredCompaction, compaction only occurs when a number of similarly sized sstables are combined into a new sstable. When you perform a major compaction, all sstables are combined into one, very large, sstable. As a re
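To make the effect described above concrete, here is a rough Python sketch of size-based bucketing. It is a simplification, not Cassandra's actual implementation: the function names are hypothetical, and the `bucket_low`, `bucket_high` and `min_threshold` values are assumed to mirror the usual defaults of the size-tiered strategy. It shows why one very large sstable left by a major compaction finds no similarly sized peers and is therefore not picked up again.

```python
def bucket_by_size(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of roughly similar size."""
    buckets = []  # each bucket is a list of sizes in MB
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is close to the bucket's average.
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Return the buckets that hold enough sstables to be compacted."""
    return [b for b in bucket_by_size(sizes_mb) if len(b) >= min_threshold]


# One huge sstable left by a major compaction, plus a handful of fresh flushes.
sstables_mb = [200_000, 50, 55, 60, 48, 52]
candidates = compaction_candidates(sstables_mb)
```

In this toy run only the small sstables form a bucket large enough to compact; the 200 GB sstable sits alone in its own bucket until several others of comparable size exist.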

Re: High performance hardware with lot of data per node - Global learning about configuration

2013-07-11 Thread Mike Heffner
We've also noticed very good read and write latencies with the hi1.4xls compared to our previous instance classes. We actually ran a mixed cluster of hi1.4xls and m2.4xls to watch a side-by-side comparison. Despite the significant improvement in underlying hardware, we've noticed that streaming perf

Token Aware Routing: Routing Key Vs Composite Key with vnodes

2013-07-11 Thread Haithem Jarraya
Hi All, I am a bit confused about how the underlying token aware routing works in the case of a composite key. Let's say I have a column family like this: USERS( uuid userId, text firstname, text lastname, int age, PRIMARY KEY(userId, firstname, lastname)) My question is do we need to have the val

Re: Working with libcql

2013-07-11 Thread Sorin Manolache
On 2013-07-09 11:46, Shubham Mittal wrote: yeah I tried that and below is the output I get LOG: resolving remote host localhost:9160 libcql is an implementation for the "new binary transport protocol": https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol

Re: alter column family ?

2013-07-11 Thread Langston, Jim
Hi Rob, Are the schemas held somewhere else? Going through the process that you sent, when I restart the nodes, the original schemas show up (btw, you were correct on your assessment, even though the schema shows they are the same with the gossipinfo command, they are not the same when looking

Re: alter column family ?

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 9:17 AM, Langston, Jim wrote: > Are the schema's held somewhere else ? Going through the > process that you sent, when I restart the nodes, the original > schema's show up > If you do not stop all nodes at once and then remove the system CFs, the existing schema will re-p

Re: Logging Cassandra Reads/Writes

2013-07-11 Thread hajjat
Aaron, Thanks for the references! I'll try the things you mentioned and see how it goes! Best, Mohammad On Wed, Jul 10, 2013 at 8:07 PM, aaron morton [via cassandra-u...@incubator.apache.org] < ml-node+s3065146n7588930...@n2.nabble.com> wrote: > Some info on request tracing > http://www.datast

Re: Alternate "major compaction"

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 2:46 AM, Tomàs Núnez wrote: > Hi > > About a year ago, we did a major compaction in our cassandra cluster (a > n00b mistake, I know), and since then we've had huge sstables that never > get compacted, and we were condemned to repeat the major compaction process > every once

Re: node tool ring displays 33.33% owns on 3 node cluster with replication

2013-07-11 Thread Jason Tyler
Thanks Rob! I was able to confirm with getendpoints. Cheers, ~Jason From: Robert Coli <rc...@eventbrite.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesday, July 10, 2013 4:09 PM To: "user@cassandra.apache.org

Re: alter column family ?

2013-07-11 Thread Langston, Jim
Yes, I got the gist of what you were after, even making sure I broke out the schema dump and load them in individually, but I haven't gotten that far. It feels like the 2 nodes that are not coming up with the right schema are not seeing the nodes with the correct ones. And yes, I hear the beat of t

Re: alter column family ?

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 10:16 AM, Langston, Jim wrote: > It feels like the 2 node that are not coming up with > the right schema are not seeing the nodes with the correct ones. > At the time that the nodes come up, they should have no schema other than the system columnfamilies. Only once all 3

Re: High performance hardware with lot of data per node - Global learning about configuration

2013-07-11 Thread Aiman Parvaiz
Thanks for the info Mike, we ran into a race condition which was killing tablesnap. I want to share the problem and the solution/workaround, and maybe someone can shed some light on the effects of the solution. tablesnap was getting killed with this error message: Failed uploading %s. Abor

Re: alter column family ?

2013-07-11 Thread Langston, Jim
Thanks Rob, I went through the whole sequence again and now have gotten to the point of being able to try and pull in the schema, but now getting this error from the one node I'm executing on. [default@unknown] create keyspace OTracker ... with placement_strategy = 'SimpleStrategy' ... and str

Re: alter column family ?

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 11:00 AM, Langston, Jim wrote: > I went through the whole sequence again and now have gotten to the point > of > being able to try and pull in the schema, but now getting this error from > the one > node I'm executing on. > [default@unknown] create keyspace OTracker > 9209

Re: alter column family ?

2013-07-11 Thread Langston, Jim
Was just looking at a bug with uppercase; could that be the error? And, yes, definitely saved off the original system keyspaces. I'm tailing the logs when running the cassandra-cli, but I do not see anything in the logs. Jim From: Robert Coli mailto:rc...@eventbrite.com>> Reply-To: mailto:u

merge sstables

2013-07-11 Thread chandra Varahala
Hello, I have around 2000 small sstable files of about 5mb each. Is there a way I can merge them into bigger ones? thanks chandra

Re: merge sstables

2013-07-11 Thread Faraaz Sareshwala
I assume you are using the leveled compaction strategy because you have 5mb sstables and 5mb is the default size for leveled compaction. To change this default, you can run the following in the cassandra-cli: update column family cf_name with compaction_strategy_options = {sstable_size_in_mb: 25

Re: merge sstables

2013-07-11 Thread chandra Varahala
yes, but nodetool scrub is not working .. thanks chandra On Thu, Jul 11, 2013 at 2:39 PM, Faraaz Sareshwala < fsareshw...@quantcast.com> wrote: > I assume you are using the leveled compaction strategy because you have 5mb > sstables and 5mb is the default size for leveled compaction. > > To c

Re: High performance hardware with lot of data per node - Global learning about configuration

2013-07-11 Thread Mike Heffner
Aiman, I believe that is one of the cases we added a check for: https://github.com/librato/tablesnap/blob/master/tablesnap#L203-L207 Mike On Thu, Jul 11, 2013 at 1:54 PM, Aiman Parvaiz wrote: > Thanks for the info Mike, we ran in to a race condition which was killing > table snap, I want to

Re: merge sstables

2013-07-11 Thread sankalp kohli
Scrub will keep the file size the same. You need to move all sstables to L0. The way to do this is to remove the json file which has the level information. On Thu, Jul 11, 2013 at 11:48 AM, chandra Varahala < hadoopandcassan...@gmail.com> wrote: > yes, but nodetool scrub is not working .. > > > than

Re: merge sstables

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli wrote: > Scrub will keep the file size same. YOu need to move all sstables to be > L0. the way to do this is to remove the json file which has level > information. > This will work, but I believe is subject to this? "./src/java/org/apache/cassandra/

Rhombus - A time-series object store for Cassandra

2013-07-11 Thread Rob Righter
Hello, Just wanted to share a project that we have been working on. It's a time-series object store for Cassandra. We tried to generalize the common use cases for storing time-series data in Cassandra and automatically handle the denormalization, indexing, and wide row sharding. It currently exist

Re: Token Aware Routing: Routing Key Vs Composite Key with vnodes

2013-07-11 Thread Colin Blower
It is my understanding that you must have all parts of the partition key in order to calculate the token. The partition key is the first part of the primary key, in your case the userId. You should be able to get the token from the userId. Give it a try: cqlsh> select userId, token(userId) from us
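As a rough illustration of the point above, the Python sketch below derives a token from the partition key alone and ignores the clustering columns. `toy_token` and `route` are hypothetical names, and the MD5-based hash is only a stand-in chosen for illustration; it is not the token function of any real Cassandra partitioner, so the values will not match what `token(userId)` returns in cqlsh.

```python
import hashlib


def toy_token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit value (illustrative hash only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def route(row: dict) -> int:
    """Pick the token for a row using only its partition key, userId."""
    # firstname and lastname are clustering columns and play no part here.
    return toy_token(row["userId"])


row_a = {"userId": "u-1", "firstname": "Ada", "lastname": "Lovelace"}
row_b = {"userId": "u-1", "firstname": "Alan", "lastname": "Turing"}
same_token = route(row_a) == route(row_b)
```

Both rows share a userId, so they get the same token and would be routed to the same replicas, whatever the values of the other primary key columns.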

Re: merge sstables

2013-07-11 Thread sankalp kohli
He has around 10G of data so it should not be bad. This is only a problem if you have a lot of data. On Thu, Jul 11, 2013 at 2:10 PM, Robert Coli wrote: > On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli wrote: > >> Scrub will keep the file size same. YOu need to move all sstables to be >> L0. the way to do

Re: listen_address and rpc_address address on different interface

2013-07-11 Thread Robert Coli
On Thu, Jul 11, 2013 at 2:53 AM, Christopher Wirt wrote: > ** > > If we were to take down a node and change the listen address then re-join > the ring, the other nodes will mark the node as dead when we take it down > and assume we have a new node when we bring it back on a different address. > **

How many DCs can you have in a cluster?

2013-07-11 Thread Blair Zajac
In this C* Summit 2013 talk titled "A Deep Dive Into How Cassandra Resolves Inconsistent Data" [1], Jason Brown of Netflix mentions that they have 5 data centers in the same cluster, two in the US, one in Europe, one in Brazil and one in Asia (I'm going from memory now since I don't want to wa

Re: Alternate "major compaction"

2013-07-11 Thread Takenori Sato
Hi, I made the repository public. You can now check it out from here: https://github.com/cloudian/support-tools checksstablegarbage is the tool. Enjoy, and any feedback is welcome. Thanks, - Takenori On Thu, Jul 11, 2013 at 10:12 PM, srmore wrote: > Thanks Takenori, > Looks like the tool provi

Timeout reading row from CF with collections

2013-07-11 Thread Paul Ingalls
I'm running into a problem trying to read data from a column family that includes a number of collections. Cluster details: 4 nodes running 1.2.6 on VMs with 4 CPUs and 7 GB of RAM, RAID 0 striped across 4 disks for the data and logs; each node has about 500 MB of data currently loaded. Here is