Thank you very much for those pointers. I see that I'm still thinking too much in RDBMS terms. I'll try those out.
Stefan

On Tue, Aug 3, 2010 at 11:44 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> As Justus said, you need to consider the way you want to get the data back
> and then denormalise to suit. Do you need to support ad-hoc queries, or will
> you know how you want to query ahead of time?
>
> Some different approaches may be:
>
> Standard CF to hold the measurements taken, grouped by day:
> {
>     device_id/20100810 : {
>         date_and_time : value,
>         date_and_time : value
>     }
> }
> - this spreads the writes for each device around the cluster, but the same
>   nodes are used for every write for one device.
> - you can read all the measurements for one device for one day in one get.
>
> Super CF to hold all the measurements for a day, with super columns for the
> device:
> {
>     20100810 : {
>         device_id : {
>             date_and_time : value
>         }
>     }
> }
> - this concentrates the write load for a single day on the same nodes for
>   all devices.
> - may not be practical if you have a lot of devices.
> - you can read all the measurements for all devices for a single day in one
>   get.
>
> Standard CF to store each measurement as a row by itself:
> {
>     device/date_and_time : {
>         "timestamp" : date_and_time,
>         "measurement" : "the value"
>     }
> }
> - this spreads every write around the cluster for every device and day.
> - You can then also write the values into aggregate CFs, say grouped by day
>   or device as above. If you ever want to build new aggregates, you can use
>   the raw data in this CF.
>
> Try out some different ideas and see how easy it is to do your reporting.
>
> This post from Cloud Kick may help:
> https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
>
> Aaron
>
> On 03 Aug, 2010, at 07:37 PM, Thorvaldsson Justus
> <justus.thorvalds...@svenskaspel.se> wrote:
>
> It sounds to me like it's a good idea to use Cassandra in your case. I
> figured I'd help you, as we Europeans need to cooperate some, even though
> I've only worked with Cassandra for a month.
=)
>
> 1:
> What is the query you want to use when charting the data? Use it to decide
> how to store and sort your data.
>
> 2:
> Where is your row? You must model it correctly. I added my explanation here:
> http://www.justus.st/
> SCF-ROW-SC-C
> or
> CF-ROW-C
>
> 3:
> There are some limitations:
> 2 GB of data in a row in 0.6, 2 billion columns in 0.7.
> And a row must fit on a node.
>
> 4:
>> For my range-selections - I think I need the OrderPreservingPartitioner.
>> Right?
> I don't think you must, but you should sort by the time of measurement. The
> reason you don't need it is that an entire row always lives on the same
> node; OrderPreservingPartitioner concerns the ordering of row keys. You
> should check again how columns and supercolumns are sorted. I haven't added
> my bookmarks to the blog yet, but http://www.sodeso.nl/?p=421 was a good
> source of information, I think. There is more on the same blog as well.
>
> 5:
> There are always alternative designs; you should not give up too early, as
> these are the most important decisions.
>
> 6:
> Have a nice day Stefan
>
> /Justus
>
>
> -----Original message-----
> From: Stefan Kaufmann [mailto:sta...@gmail.com]
> Sent: 3 August 2010 09:21
> To: user@cassandra.apache.org
> Subject: Using Cassandra for storing measurement data
>
> Dear Cassandra Users,
>
> I'm quite new to Cassandra and I'm still trying to figure out if I'm on
> the right path for my requirements. I'd like to explain my Cassandra
> design and hope to receive feedback on whether this would work.
>
> I'd like to use Cassandra to store measurement data from several devices.
> Each device takes a measurement every minute, so there will be about
> 500,000 entries per device every year. The following data has to be
> stored:
> - device ID
> - measurement time (of course different from the Cassandra timestamp)
> - measurement value
>
> Later, the data should be charted, so I need to select time ranges from
> a device.
>
> My current solution is a super column family:
> {
>     name: "device1",
>     value: {
>         // measurement timestamps...
>         1280819205: {name: "value", value: "10", timestamp: 123456789},
>         1280819305: {name: "value", value: "15", timestamp: 123456789},
>         1280819405: {name: "value", value: "10", timestamp: 123456789},
>         // there will be millions of entries
>     },
>     name: "device2",
>     value: {
>         // measurement timestamps...
>         1280819205: {name: "value", value: "20", timestamp: 123456789},
>         1280819305: {name: "value", value: "15", timestamp: 123456789},
>         1280819405: {name: "value", value: "20", timestamp: 123456789},
>         // there will be millions of entries
>     }
> }
>
> My questions:
> My main concern is the huge number of subcolumns I'm using. All the
> Cassandra examples I saw on the web used them to store only a few columns
> (like a user profile). So would this work with millions of entries?
>
> For my range selections - I think I need the OrderPreservingPartitioner.
> Right?
>
> Are there alternative designs? Maybe one without a super column? I can't
> think of one...
>
> I'm looking forward to some answers.
> Thanks in advance,
> Stefan
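Aaron's first layout (one row per device per day, with the measurement timestamp as the column name) can be sketched in plain Python. This is only an in-memory illustration of the row/column shape, not a real Cassandra client; the names `DayBucketStore`, `insert`, and `get_slice` are hypothetical stand-ins. The point it demonstrates is Justus's answer to question 4: because a whole row lives on one node and columns within a row are kept sorted by the comparator, a time-range chart query is a column slice on a single row, so OrderPreservingPartitioner is not required.

```python
import bisect
import datetime
from collections import defaultdict


class DayBucketStore:
    """In-memory stand-in for a standard CF: row key -> sorted columns."""

    def __init__(self):
        # row_key -> sorted list of (timestamp, value), i.e. the row's
        # columns kept in comparator order, as Cassandra does per row.
        self.rows = defaultdict(list)

    @staticmethod
    def row_key(device_id, ts):
        # Group measurements by device and day, e.g. "device1/20100803".
        day = datetime.datetime.fromtimestamp(
            ts, datetime.timezone.utc).strftime("%Y%m%d")
        return f"{device_id}/{day}"

    def insert(self, device_id, ts, value):
        row = self.rows[self.row_key(device_id, ts)]
        bisect.insort(row, (ts, value))  # columns stay sorted by timestamp

    def get_slice(self, device_id, day_ts, start, finish):
        # One get returns a contiguous column range from a single row
        # (inclusive bounds) -- no ordering of row keys is needed.
        row = self.rows[self.row_key(device_id, day_ts)]
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (finish + 1,))
        return row[lo:hi]


store = DayBucketStore()
store.insert("device1", 1280819205, "10")
store.insert("device1", 1280819305, "15")
store.insert("device1", 1280819405, "10")

# Chart a time window of that day with a single slice.
print(store.get_slice("device1", 1280819205, 1280819300, 1280819400))
# -> [(1280819305, '15')]
```

With real Cassandra the equivalent read is a `get_slice` over one row with a `SliceRange` of start/finish column names; the day bucket also keeps each row well under the per-row limits Justus mentions.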
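Aaron's third layout (a raw row per measurement, with aggregate CFs derived from it) can be sketched the same way. Again this is an in-memory illustration under assumed names (`write`, `rebuild_daily_aggregate`), not a client API; it shows why keeping the raw rows is valuable: any new aggregate, such as a per-device daily mean, can be rebuilt from them at any time.

```python
import datetime
import statistics
from collections import defaultdict

# One raw "row" per measurement, keyed "device/timestamp", holding the
# two columns Aaron describes. This is the write path.
raw = {}


def write(device_id, ts, value):
    raw[f"{device_id}/{ts}"] = {"timestamp": ts, "measurement": value}


def rebuild_daily_aggregate():
    # Scan the raw rows and derive a per-device-per-day aggregate row.
    # In Cassandra this result would be written into a separate CF; new
    # aggregates can always be recomputed from the raw data.
    daily = defaultdict(list)
    for key, cols in raw.items():
        device_id = key.rsplit("/", 1)[0]
        day = datetime.datetime.fromtimestamp(
            cols["timestamp"], datetime.timezone.utc).strftime("%Y%m%d")
        daily[f"{device_id}/{day}"].append(float(cols["measurement"]))
    return {k: statistics.mean(v) for k, v in daily.items()}


write("device1", 1280819205, "10")
write("device1", 1280819305, "15")
write("device2", 1280819205, "20")
print(rebuild_daily_aggregate())
# -> {'device1/20100803': 12.5, 'device2/20100803': 20.0}
```

The trade-off Aaron notes holds here too: every write lands on a different row (spreading load across the cluster), at the cost of needing the aggregate rows to make reporting reads cheap.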