Re: High disk read throughput on only one node.

2012-12-21 Thread Alain RODRIGUEZ
It looks like nobody has experienced this kind of trouble before, or even has a clue about it. Under heavy load this causes high latency (because of iowait) in my app in production, and we can't put up with it any longer. If nothing new turns up in the next few days, I think I'll drop this node and replace i

Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson
Hi list users, we have an application with a relatively unusual access pattern in Cassandra 1.1.6. Essentially, we read an entire multi-hundred-megabyte column family sequentially (little chance of a Cassandra cache hit), perform some operations on the data, and write the data back to ano

Re: Correct way to design a cassandra database

2012-12-21 Thread Hiller, Dean
If you have a way to partition tables, relational can be OK. Think of a business that has trillions of clients as customers, where each client is related to a whole slew of things. Partitioning by client can be a good way to go. Here are some patterns we have seen in NoSQL and perhaps they

CQL3 Compound Primary Keys - Do I have the right idea?

2012-12-21 Thread Adam Venturella
I'm trying to better grasp compound primary keys and what they are conceptually doing under the hood. When you create a table with a compound primary key in CQL3 (http://www.datastax.com/dev/blog/schema-in-cassandra-1-1), the first part of the key is the partition key. I get that, and the subsequent part
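A minimal in-memory sketch of the idea being asked about, assuming a simplified model: the first PRIMARY KEY column picks the partition (one internal storage row), and the remaining (clustering) columns order the entries inside it. The function `to_partitions` and the column names are hypothetical illustrations, not Cassandra's storage engine.

```python
from collections import defaultdict

def to_partitions(rows, partition_key, clustering_keys):
    """Group CQL3-style rows by partition key, ordering each partition's
    entries by the clustering columns (the rest of the PRIMARY KEY)."""
    partitions = defaultdict(list)
    key_columns = {partition_key, *clustering_keys}
    for row in rows:
        clustering = tuple(row[c] for c in clustering_keys)
        values = {k: v for k, v in row.items() if k not in key_columns}
        partitions[row[partition_key]].append((clustering, values))
    # Entries inside one partition are kept sorted by clustering key.
    return {pk: sorted(entries, key=lambda e: e[0])
            for pk, entries in partitions.items()}

# Hypothetical rows for a table with PRIMARY KEY (user_name, created_time)
rows = [
    {"user_name": "bob", "created_time": 2, "photo_id": "p2"},
    {"user_name": "alice", "created_time": 1, "photo_id": "p9"},
    {"user_name": "bob", "created_time": 1, "photo_id": "p1"},
]
storage = to_partitions(rows, "user_name", ["created_time"])
# storage["bob"] == [((1,), {"photo_id": "p1"}), ((2,), {"photo_id": "p2"})]
```

All of bob's rows land in one partition, already ordered by created_time, which is why range queries on the clustering column within a single partition are cheap.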

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
I have a few questions for you, James: 1. How many nodes are in your Cassandra ring? 2. What is the replication factor? 3. When you say sequentially, what do you mean? What partitioner do you use? 4. How many columns per row? How much data per row? Per column? 5. What client library do you use

Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
I am pretty new to Cassandra as well, but here goes nothing. Assumptions: - You are using a CQL3 client. - Remember I am a n00bsauce at this as well, so another member of the list may, and probably does, have a better, more enlightened answer than I. Everyone was new to this at one time, though, and

Re: TTL on SecondaryIndex Columns. A bug?

2012-12-21 Thread cscetbon.ext
Nice job, Aaron. AFAIU, you now set gc_before to the current time for secondary indexes. Since it was set to Integer.MAX_VALUE before your patch, the removeDeletedStandard function was testing if (column.getLocalDeletionTime() < MAX_VALUE), which is always true, and so it was removing all rows from t
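To make the reasoning concrete, here is a simplified sketch of the comparison as described in the message, assuming deletion times in seconds since the epoch and assuming an expiring (TTL) column's local deletion time is its expiry time. `is_purgeable` is a hypothetical stand-in, not Cassandra's actual removeDeletedStandard code.

```python
INTEGER_MAX_VALUE = 2**31 - 1  # Java's Integer.MAX_VALUE

def is_purgeable(local_deletion_time, gc_before):
    """Simplified check: a deleted/expiring cell may be dropped once its
    local deletion time is earlier than the gc_before cutoff."""
    return local_deletion_time < gc_before

now = 1_356_100_000                 # seconds since epoch (about 2012-12-21)
ttl_column_expires_at = now + 3600  # still live for another hour

# Before the patch: a cutoff of MAX_VALUE makes every such cell look purgeable.
before_patch = is_purgeable(ttl_column_expires_at, INTEGER_MAX_VALUE)  # True
# After the patch: a cutoff of "now" keeps the not-yet-expired cell.
after_patch = is_purgeable(ttl_column_expires_at, now)                 # False
```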

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson
Hi, thanks for the reply. On 21/12/12 14:36, Yiming Sun wrote: I have a few questions for you, James, 1. how many nodes are in your Cassandra ring? 2 or 3, depending on environment; it doesn't seem to make much difference to throughput. What is a 30-minute task on a 2-node environ

RE: what happens while node is bootstrapping?

2012-12-21 Thread DE VITO Dominique
> > From: Tyler Hobbs [mailto:ty...@datastax.com] > > Sent: Tuesday, 16 October 2012 17:04 > > To: user@cassandra.apache.org > > Subject: Re: what happens while node is bootstrapping? > > > > On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh wrote: > > Does it mean that during bootstrapping process o

Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
OK, so here is my latest thinking, including that index: CREATE TABLE Users ( user_name text, password text, PRIMARY KEY (user_name) ); ^ Same as before CREATE TABLE Photos( user_name text, photo_id uuid, created_time timestamp, data text, PRIMARY KEY (user_nam

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, with RandomPartitioner the order of the rows is random, so when you request these rows in "sequential" order (sorted by date?), Cassandra is not reading them sequentially. The sizes of the data, 200MB, 300MB, and 40MB: are these the size of each column? Or are these the total size of t
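A small sketch of why "sequential" keys are not read sequentially: the token below is modelled on RandomPartitioner (MD5 of the key, read as a signed 128-bit integer, made non-negative), and rows sit on the ring in token order rather than key order. The date-style keys are hypothetical examples.

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Token modelled on RandomPartitioner: the MD5 digest of the key read as
    a signed big-endian 128-bit integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

# Hypothetical row keys that are "sequential" by date...
keys = [f"2012-12-{day:02d}".encode() for day in range(15, 22)]
# ...but rows are laid out on the ring in token order, not key order.
ring_order = sorted(keys, key=random_partitioner_token)
for key in ring_order:
    print(key.decode(), random_partitioner_token(key))
```

Printing `ring_order` shows the dates scattered, which is the point being made: consecutive keys map to unrelated positions on disk.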

what happens while node is decommissioning?

2012-12-21 Thread DE VITO Dominique
> > From: Tyler Hobbs [mailto:ty...@datastax.com] > > Sent: Tuesday, 16 October 2012 17:04 > > To: user@cassandra.apache.org > > Subject: Re: what happens while node is bootstrapping? > > > > On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh wrote: > > Does it mean that during bootstrapping process o

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson
On 21/12/12 16:27, Yiming Sun wrote: James, using RandomPartitioner, the order of the rows is random, so when you request these rows in "Sequential" order (sort by the date?), Cassandra is not reading them sequentially. Yes, I understand the "next" row to be retrieved in sequence is likely t

Re: Moving data from one datacenter to another

2012-12-21 Thread Vegard Berget
Thanks for the answers. It went quite well. Note what Aaron writes about SSTable names: I did the job before his mail and renamed one file incorrectly :-), and that caused some trouble (a lot of missing-file errors). I think that was to blame for some counter CF being messed up. As it was not imp

Re: Last Modified Time Series in cassandra

2012-12-21 Thread Andrey Ilinykh
You can select a column slice (specifying a time range that is sure to contain the latest data) but ask Cassandra to return only one column: that will be the latest one. For the best performance, use a reversed sorting order. Andrey On Fri, Dec 21, 2012 at 6:40 AM, Ravikumar Govindarajan < ravikumar.govindara...@gmail.co
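A minimal sketch of that slice, assuming a wide row modelled as a Python list of (timestamp, value) columns sorted ascending by timestamp. Reading the slice from its end with a limit of one stands in for a reversed slice with count=1; `latest_in_range` is a hypothetical helper, not a client-library call.

```python
import bisect

def latest_in_range(columns, start, end):
    """Return the newest (timestamp, value) whose timestamp is in [start, end].

    `columns` is sorted ascending by timestamp, standing in for a wide row.
    Taking the last column of the slice mirrors a reversed slice, count=1.
    """
    names = [ts for ts, _ in columns]
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, end)
    return columns[hi - 1] if hi > lo else None

row = [(100, "a"), (200, "b"), (300, "c")]
newest = latest_in_range(row, 150, 1000)  # (300, "c")
```

With a reversed comparator the newest column is physically first, so the server can stop after reading a single column instead of walking the whole range.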

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, you could experiment with the row cache, using the off-heap JNA cache, and see if it helps. My own experience with the row cache was not good, and the OS cache seemed to be most useful, but in our case the data space was big, over 10TB. Your sequential access pattern certainly doesn't play well with LR
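The point about sequential access and LRU caching can be checked with a toy simulation: when each pass touches more rows than the cache holds, a least-recently-used policy evicts every row before the next pass comes back to it. The sizes below are arbitrary, and this is a generic LRU cache, not Cassandra's row cache implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Generic least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

cache = LRUCache(capacity=100)
for _ in range(3):              # three full sequential scans
    for row in range(1000):     # each scan is 10x the cache size
        cache.get(row, load=lambda k: k)
# Every lookup misses: cache.hits == 0, cache.misses == 3000
```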

[RELEASE CANDIDATE] Apache Cassandra 1.2.0-rc2 released

2012-12-21 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the second release candidate (and likely the last Cassandra release ever since it's the end of the world) for the future Apache Cassandra 1.2.0. Let me first stress that this is not the final release yet and as such is *not* ready for production use. This

Re: State of Cassandra and Java 7

2012-12-21 Thread Bryan Talbot
Brian, did any of your issues with Java 7 result in data corruption in Cassandra? We just ran into an issue after upgrading a test cluster from Cassandra 1.1.5 and Oracle JDK 1.6.0_29-b11 to Cassandra 1.1.7 and 7u10. What we saw is that values in columns with validation Class=org.apache.cassandra.db.m

Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
One more link that might be helpful. It's a system similar to the photos one, but instead of Photos/Albums it's Songs/Playlists: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts. It's not exactly 1:1, but it covers related concepts in making it work. On Fri, Dec 21, 2012 at 8:02 AM, Adam Ven

thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Qiaobing Xie
Hi, I am developing a Thrift client that inserts and removes columns from a column family (using batch_mutate calls). Everything seems to be working fine: my Thrift client can add, retrieve, delete, and add back columns as expected... until I manually deleted a column with cassandra-cli. (I was try

Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Edward Capriolo
The CLI uses microsecond precision; your client might be using something else, and inserts with lower timestamps are dropped. On Friday, December 21, 2012, Qiaobing Xie wrote: > Hi, > > I am developing a thrift client that inserts and removes columns from a column-family (using batch_mutate cal
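A minimal last-write-wins sketch of the explanation above, assuming the delete from cassandra-cli carries a microsecond timestamp while the client writes milliseconds. `apply_write` and the dict-shaped cells are hypothetical, and tie-breaking on equal timestamps is deliberately left out.

```python
def apply_write(current, write):
    """Last-write-wins: keep whichever version has the higher timestamp.
    (Tie-breaking on equal timestamps is ignored in this sketch.)"""
    if current is None or write["timestamp"] > current["timestamp"]:
        return write
    return current

now_s = 1_356_100_000  # a wall-clock time in seconds

# Delete issued from cassandra-cli: timestamp in microseconds.
tombstone = {"value": None, "timestamp": now_s * 1_000_000}
# Re-insert a minute later from a client that uses milliseconds.
reinsert_ms = {"value": "v2", "timestamp": (now_s + 60) * 1_000}
# The same re-insert with a microsecond timestamp.
reinsert_us = {"value": "v2", "timestamp": (now_s + 60) * 1_000_000}

after_ms = apply_write(tombstone, reinsert_ms)  # tombstone still wins
after_us = apply_write(tombstone, reinsert_us)  # re-insert is visible
```

Even though the millisecond write happens later in wall-clock time, its timestamp is about 1000 times smaller, so it loses to the tombstone; using the same precision everywhere avoids this.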

Very large HintsColumnFamily

2012-12-21 Thread Keith Wright
Hi all, I am seeing a VERY large HintsColumnFamily (40+ GB) on one of my nodes (I have 2 DCs with 3 nodes each, RF 2). As a result, nodetool ring reports the load as being way higher for that one node (the delta being the size of the HintsColumnFamily). This behavior seems to occur if I do a

Re: Correct way to design a cassandra database

2012-12-21 Thread Edward Capriolo
You could store the order as the first part of a composite string: say, the first picture as A and the second as B. To insert one between them, call it AA. If you shuffle a lot, the strings could get really long. It might be better to store the order in a separate column. Neither solution mentioned deals with concurr
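One way to generate such in-between strings is the technique often called fractional indexing; the sketch below is a variant of it over the letters A to Z, assuming keys never end in "A" (nothing sorts strictly between "A" and "AA") and that "" acts as the lowest bound. `key_between` is a hypothetical helper, and, as noted above, it does nothing about concurrent reorders or key growth.

```python
DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def key_between(a, b=None):
    """Return a key k with a < k < b (b=None means no upper bound).

    Keys are strings over A-Z that never end in "A"; "" is the lowest key.
    """
    if b is not None:
        # Copy the shared prefix (a missing char in `a` counts as "A").
        n = 0
        while n < len(b) and (a[n] if n < len(a) else "A") == b[n]:
            n += 1
        if n > 0:
            return b[:n] + key_between(a[n:], b[n:])
    lo = DIGITS.index(a[0]) if a else 0
    hi = DIGITS.index(b[0]) if b is not None else len(DIGITS)
    if hi - lo > 1:
        # Room for one character strictly between the first digits.
        return DIGITS[(lo + hi) // 2]
    if b is not None and len(b) > 1:
        # b's first character alone already sorts between a and b.
        return b[0]
    # Otherwise keep a's first digit and search above the rest of a.
    return DIGITS[lo] + key_between(a[1:], None)

first = key_between("")              # "N"
second = key_between(first)          # "T"
middle = key_between(first, second)  # "Q"
```

Repeatedly inserting between the same two neighbours makes keys grow by roughly one character every few inserts, which matches the concern about strings getting long.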

Re: Very large HintsColumnFamily

2012-12-21 Thread Rob Coli
Before we start... what version of Cassandra? On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright wrote: > This behavior seems to occur if I do a large > amount of data loading using that node as the coordinator node. In general, you want to use all nodes to coordinate, not a single one. > Nodetool net
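A minimal sketch of spreading coordination across all nodes on the client side, assuming a simple round-robin choice per request. The class and the node addresses are hypothetical; real client libraries typically offer their own load-balancing options.

```python
import itertools

class RoundRobinCoordinators:
    """Hand out coordinator nodes in turn so no single node coordinates
    every request."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        return next(self._cycle)

# Hypothetical node addresses for a 3-node data center.
picker = RoundRobinCoordinators(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
chosen = [picker.next_node() for _ in range(4)]
# chosen == ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.1"]
```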

Re: Very large HintsColumnFamily

2012-12-21 Thread Keith
1.1.7 Rob Coli wrote: >Before we start.. what version of cassandra? > >On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright wrote: >> This behavior seems to occur if I do a large >> amount of data loading using that node as the coordinator node. > >In general you want to use all nodes to coordinate, n

Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Qiaobing Xie
That makes sense; I think my client uses milliseconds. Thanks for pointing that out. -Q On 12/21/12 6:25 PM, Edward Capriolo wrote: The cli using microsecond precision your client might be using something else and the insert with lower timestamps are dropped. On Friday, December 21, 2012, Q

Re: Exception on running nodetool in windows

2012-12-21 Thread Vivek Mishra
Here it is: C:\Users\vivek.mishra\Downloads\training\cassandra\apache-cassandra-1.1.6-bin\apache-cassandra-1.1.6\bin>nodetool ring -h localhost Starting NodeTool Address DC Rack Status State Load Owns Token Exception in thread "main" java.lang.ClassCastExcep