I need a database to log and retrieve sensor data. Is Cassandra the right solution for this task, and if so, how should I set it up and which access methods should I use? If not, which other DB system might be a better fit?
The details are as follows:

######## <requirements version="4">

Glossary
- Node = A computer on which an instance of the database is running
- Blip = One data record sent by a sensor
- Blip page = The sorted list of all blips for a specific sensor and a specific time range

The scale is as follows:
(01) 10^6 sensors deliver 1 blip every 100 seconds
     -> Insert rate = 10 kiloblip/s
     -> Insert rate ~ 315 gigablip/year
(02) They have to be stored for ~3 years
     -> Size of database ~ 1 terablip
(03) Each blip has about 200 bytes
     -> Size of database ~ 200 TB
(04) The system will start with just 10^4 sensors but will soon increase up to the described volume.

The main operations on the data are:
(05) Add the new blips to the database (written blips are never changed)!
(06) Return all blips for sensor X with a timestamp between timestamp_a and timestamp_b! In other words: return a blip page.
(07) Return all the blips specified in (06) ordered by timestamp!
(08) Delete all blips older than Y!

Further, the following is true:
(09) Each added blip is unambiguously identifiable by sensor_id+timestamp.
(10) 99.9% of the blips are inserted in chronological order; the rest are not.
(11) The database system MUST be free and open source.
(12) The DB SHOULD be easy to administer.
(13) All data MUST still be writable and readable while fewer than a configurable number N of nodes are (unexpectedly) down.
(14) The mechanisms to distribute the data to the available nodes SHOULD be handled by the database. This means that the database SHOULD automatically redistribute the data when nodes are added or removed.
(15) The project is mainly implemented in Erlang, so there must be a stable Erlang interface for database access.

######## </requirements>

Many thanks in advance,
Heiner
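
PS: In case it helps to check my numbers, the scale figures in (01) to (03) can be reproduced with a small calculation like the one below. This is only a rough sketch for illustration (Python rather than Erlang); the 365-day year, the decimal units, and the name `scale_figures` are my assumptions, and the totals in the requirements are rounded.

```python
# Rough sanity check of the scale figures in requirements (01) to (03).
# Assumptions: 365-day year, decimal units (1 TB = 10**12 bytes).

SENSORS = 10**6                      # sensors at full scale, see (01)
BLIP_INTERVAL_S = 100                # one blip per sensor every 100 seconds
RETENTION_YEARS = 3                  # storage period, see (02)
BLIP_SIZE_BYTES = 200                # approximate blip size, see (03)
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumed year length


def scale_figures():
    """Return (blips per second, blips per year, total blips, total bytes)."""
    rate_per_s = SENSORS // BLIP_INTERVAL_S
    blips_per_year = rate_per_s * SECONDS_PER_YEAR
    total_blips = blips_per_year * RETENTION_YEARS
    total_bytes = total_blips * BLIP_SIZE_BYTES
    return rate_per_s, blips_per_year, total_blips, total_bytes


if __name__ == "__main__":
    rate, per_year, total, size = scale_figures()
    print(f"insert rate : {rate / 1e3:.0f} kiloblip/s")
    print(f"per year    : {per_year / 1e9:.0f} gigablip/year")
    print(f"stored blips: {total / 1e12:.2f} terablip")
    print(f"raw size    : {size / 1e12:.0f} TB")
```

The raw size here counts only the blip payload; replication and any per-record storage overhead of the database would come on top of that.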