Hi, I've been looking at a number of technologies for a simple application.
We are saving large amounts of data to disk. This is event-log/sensor data, with fields that may look something like:

    Version, Account, RequestID, Timestamp, Duration, IPAddr, Method, URL, HTTP Version, Response_Code, Size, Hit_Rate, Range_From, Range_To, Referrer, Agent, Content_Type, Accept_Encoding, Redirect_Code, Progress

For example:

    1 agora 27050938271286652285000000000368375 1289589216.893 1989.938 79.7.41.29 GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 953772216 725098308 713834308 -1 -1 - Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) application/octet-stream gzip - 0 progress

The data has no specific key to index on (we will be doing some parsing of the data on ingest to get basic information allowing for fast queries, but this is outside of Riak). The real issue is that we need to be able to run "analytic" (map-reduce) type queries on the data. These queries do not need to be real-time, but should not take days to run. For example: all GET requests for a specific URL within a specific time range.

The amount of data saved could be quite large (forcing us to use InnoDB instead of Bitcask). One estimate is ~1 billion records. Architecturally this data could be split over multiple nodes. The choice of client-side language is still open, with Erlang as the current favorite.

As I see it the advantages of Riak are:

1) HTTP-based API as well as Erlang and other client APIs (the system has a mix of programming languages including Python and C/C++).
2) More flexible/extensible data model (Cassandra requires you to predefine the keyspaces, columns etc. ahead of time).
3) Easier to install/set up, without the apparent bloat and complexity of Cassandra (which also includes Java setup).
4) Map-reduce queries.

The disadvantages of Riak are:

1) Write performance. We need to handle ~50,000 writes per second. I would recommend running our client app from within the same Erlang VM as Riak so hopefully we can gain something here.
Alternatively, use the innostore Erlang API directly for writes.

Questions:

1) Is Riak a good database for this application?
2) Can we write to InnoDB directly and still leverage the map-reduce queries on the data?

Regards,
Matt
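
P.S. To make the example query concrete ("all GET requests for a specific URL within a specific time range"), here is a rough sketch in plain Python. It is not Riak's map-reduce API, only the shape of the map and reduce steps run over records in memory. It assumes fields are whitespace-separated and names only the first ten (Version through Response_Code); the remaining tokens are kept unparsed. All function and variable names are made up for illustration.

```python
# Illustrative names for the first ten whitespace-separated fields.
LEADING_FIELDS = (
    "version", "account", "request_id", "timestamp", "duration",
    "ip_addr", "method", "url", "http_version", "response_code",
)


def parse_record(line):
    """Split one log line into named leading fields plus an unparsed tail."""
    tokens = line.split()
    if len(tokens) < len(LEADING_FIELDS):
        raise ValueError("record has too few fields")
    record = dict(zip(LEADING_FIELDS, tokens))
    record["timestamp"] = float(record["timestamp"])
    record["duration"] = float(record["duration"])
    record["response_code"] = int(record["response_code"])
    # Assumption: everything past the tenth token is left as raw strings.
    record["rest"] = tokens[len(LEADING_FIELDS):]
    return record


def map_get_for_url(record, url, start, end):
    """Map step: emit the record if it is a GET for url with start <= t < end."""
    if (record["method"] == "GET"
            and record["url"] == url
            and start <= record["timestamp"] < end):
        return [record]
    return []


def reduce_concat(mapped):
    """Reduce step: concatenate the per-record result lists."""
    results = []
    for chunk in mapped:
        results.extend(chunk)
    return results


def query(lines, url, start, end):
    """Run the map step over every line, then reduce to one result list."""
    return reduce_concat(
        map_get_for_url(parse_record(line), url, start, end) for line in lines
    )


SAMPLE = (
    "1 agora 27050938271286652285000000000368375 1289589216.893 1989.938 "
    "79.7.41.29 GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 "
    "953772216 725098308 713834308 -1 -1 - "
    "Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) "
    "application/octet-stream gzip - 0 progress"
)

hits = query(
    [SAMPLE],
    "http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe",
    1289589000.0,
    1289590000.0,
)
```

With the sample record above, `hits` holds that one record; a time window that excludes its timestamp gives an empty list.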