Based on what I know about Cassandra, here is my opinion on each requirement in your list:

1) 10k inserts per second should be no problem at all for Cassandra.
2) Cassandra should scale to that.
3) As the Cassandra homepage states, that amount of data should fit (source: http://cassandra.apache.org/ ).
4) Not Cassandra related.
5) Inserts are very fast in Cassandra.
6) You could create row keys in Cassandra that hold the values as columns, within a timespan (e.g. per second / minute). Please note that "The maximum of column per row is 2 billion" (source: http://wiki.apache.org/cassandra/CassandraLimitations ).
7) The most common row ordering in Cassandra is random. However, you could create some kind of index ColumnFamily (CF) whose columns are the row keys of your actual data CF. Columns are sorted by default.
8) Cassandra provides a time-to-live (TTL) mechanism: this suits your needs perfectly.
9) The column key could be something like "SENSORID~TIMESTAMP", e.g. "US123~1328539905".
10) Cassandra will take care of the column sorting.
11) Cassandra is released under the Apache 2.0 license, so it is open source.
12) OpsCenter from DataStax is a really nice GUI tool; enterprise usage requires a subscription.
13) The high availability that Cassandra provides will meet your requirements.
14) Your contact node will find out which nodes are responsible for your read or write. Adding, removing or moving nodes is also possible.
15) I have no experience with that, but I'm pretty sure there's someone around here who can help you.

Good luck with finding the best database for your problem.

2012/2/6 Heiner Bunjes <bun...@innotec-data.de>

> I need a database to log and retrieve sensor data.
>
> Is cassandra the right solution for this task and if, how should I
> set it up and which access methods should I use?
> If not, which other DB system might be a better fit?
>
>
> The details are as follows:
>
> ######## <requirements version="4">
>
> Glossary
>
> - Node = A computer on which an instance of the database
>   is running
>
> - Blip = one data record send by a sensor
>
> - Blip page = The sorted list of all blips for a specific sensor
>   and a specific time range.
>
>
> The scale is as follows:
>
> (01) 10E6 sensors deliver 1 blip every 100 seconds
>      -> Insert rate = 10 kiloblip/s
>      -> Insert rate ~ 315 gigablip/Year
>
> (02) They have to be stored for ~3 years
>      -> Size of database = 1 terablip
>
> (03) Each blip has about 200 bytes
>      -> Size of database = 200TB
>
> (04) The system will start with just 10E4 sensors but will
>      soon increase upto the described volume.
>
>
> The main operations on the data are:
>
> (05) Add the new blips to the database
>      (written blips are never changed)!
>
> (06) Return all blips for sensor X with a timestamp
>      between timestamp_a and timestamp_b!
>      With other words: Return a blip page.
>
> (07) Return all the blips specified in (06) ordered
>      by timestamp!
>
> (08) Delete all blips older than Y!
>
>
> Further the following is true:
>
> (09) Each added blip is clearly (without ambiguity) identifiable by
>      sensor_id+timestamp.
>
> (10) 99.9% of the blips are inserted in
>      chronological order, the rest is not.
>
> (11) The database system MUST be free and open source.
>
> (12) The DB SHOULD be easy to administrate.
>
> (13) All data MUST still be writable and readable while less
>      then the configurable number N of nodes are down (unexpectedly).
>
> (14) The mechanisms to distribute the data to the available
>      nodes SHOULD be handled by the database.
>      This means that the database SHOULD automatically
>      redistribute the data when nodes are added or removed.
>
> (15) The project is mainly implemented in erlang, so there must be
>      a stable erlang interface for database access.
>
> ######## </requirements>
>
>
> Many thanks in advance
> Heiner
>
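
To make the row key idea from answers 6 and 9 more concrete, here is a minimal sketch in plain Python (no Cassandra client involved) of one possible key layout. It is only an illustration under my own assumptions, not a recommended schema: the one-day bucket size, the function names, and the choice of putting the sensor id plus bucket start in the row key and the zero-padded timestamp in the column name are all hypothetical.

```python
from typing import List

# Assumption: one row per sensor per day. At 1 blip every 100 seconds
# that is 864 columns per row, far below the 2 billion column limit.
BUCKET_SECONDS = 86400


def row_key(sensor_id: str, ts: int) -> str:
    """Row key = sensor id plus the start of the time bucket containing ts."""
    bucket_start = ts - ts % BUCKET_SECONDS
    return f"{sensor_id}~{bucket_start}"


def column_name(ts: int) -> str:
    """Column name = zero-padded Unix timestamp, so string order matches time order."""
    return f"{ts:010d}"


def row_keys_for_range(sensor_id: str, ts_a: int, ts_b: int) -> List[str]:
    """Row keys that have to be read to build a blip page for [ts_a, ts_b]."""
    first_bucket = ts_a - ts_a % BUCKET_SECONDS
    return [
        f"{sensor_id}~{bucket}"
        for bucket in range(first_bucket, ts_b + 1, BUCKET_SECONDS)
    ]
```

With this layout, a blip from sensor "US123" at timestamp 1328539905 would go into the row "US123~1328486400" under the column "1328539905". A blip page query (requirement 06) then becomes a column slice over the handful of rows returned by row_keys_for_range, and the sorted columns cover requirement 07.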