Hi,

I've been looking at a number of technologies for a simple application.

We are saving large amounts of event-log/sensor data to disk. Each record has 
fields along these lines:

Version, Account, RequestID, Timestamp, Duration, IPAddr, Method, URL, HTTP 
Version, Response_Code, Size, Hit_Rate, Range_From, Range_To, Referrer, Agent, 
Content_Type, Accept_Encoding, Redirect_Code, Progress


For example:

1 agora 27050938271286652285000000000368375 1289589216.893 1989.938 79.7.41.29 
GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 953772216 725098308 
713834308 -1 -1 - Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) 
application/octet-stream gzip - 0 progress
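
As a rough sketch of the ingest-side parsing (Python, since it is already in 
our language mix), something like the following could work. The names 
parse_record and LEADING_FIELDS are hypothetical. Note that the sample line 
splits into 22 whitespace-separated tokens while the field list has 20 names, 
so this sketch only names the first ten fields, which line up unambiguously, 
and keeps the remaining tokens as-is rather than guessing.

```python
# Names of the leading fields, taken from the field list above. Only the first
# ten are mapped, because the sample has more tokens than the list has names.
LEADING_FIELDS = [
    "version", "account", "request_id", "timestamp", "duration",
    "ip_addr", "method", "url", "http_version", "response_code",
]


def parse_record(line):
    """Split one log line on whitespace and name the leading fields."""
    tokens = line.split()
    if len(tokens) < len(LEADING_FIELDS):
        raise ValueError("record has too few fields: %r" % line)
    record = dict(zip(LEADING_FIELDS, tokens))
    # Convert only the fields whose types are unambiguous in the sample.
    record["timestamp"] = float(record["timestamp"])
    record["duration"] = float(record["duration"])
    record["response_code"] = int(record["response_code"])
    # Keep everything else untouched rather than guessing at its meaning.
    record["rest"] = tokens[len(LEADING_FIELDS):]
    return record


SAMPLE = (
    "1 agora 27050938271286652285000000000368375 1289589216.893 1989.938 "
    "79.7.41.29 GET http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe HTTP/1.1 200 "
    "953772216 725098308 713834308 -1 -1 - "
    "Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1) "
    "application/octet-stream gzip - 0 progress"
)

if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(rec["method"], rec["url"], rec["response_code"])
```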

The data has no natural key to index on (we will parse it on ingest to extract 
basic information that allows for fast queries, but that happens outside of 
Riak).

The real requirement is that we can run "analytic" (map-reduce) type queries 
over the data. These queries do not need to be real-time, but they should not 
take days to run.

For example: All GET requests for a specific URL within a specific time range.
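
To make the shape of that query concrete, here is a minimal sketch in plain 
Python. It is not Riak's map-reduce API, only an illustration of the map and 
reduce steps. The function names and the record layout (dicts with "method", 
"url" and "timestamp" keys) are assumptions, and every record except the first 
is made up for the example.

```python
from functools import reduce


def make_map(url, start, end):
    """Build a map function that emits [record] on a match, else []."""
    def map_fn(record):
        if (record["method"] == "GET"
                and record["url"] == url
                and start <= record["timestamp"] < end):
            return [record]
        return []
    return map_fn


def reduce_fn(acc, emitted):
    """Concatenate the lists emitted by the map phase."""
    return acc + emitted


def run_query(records, url, start, end):
    """Run the map phase over every record, then fold the results."""
    map_fn = make_map(url, start, end)
    return reduce(reduce_fn, map(map_fn, records), [])


TARGET_URL = "http://bi.sciagnij.pl/0/4/TWEE_Upgrade.exe"

# The first record mirrors the sample above; the others are invented.
RECORDS = [
    {"method": "GET", "url": TARGET_URL, "timestamp": 1289589216.893},
    {"method": "POST", "url": TARGET_URL, "timestamp": 1289589217.0},
    {"method": "GET", "url": "http://example.invalid/other",
     "timestamp": 1289589218.0},
    {"method": "GET", "url": TARGET_URL, "timestamp": 1289600000.0},
]

if __name__ == "__main__":
    hits = run_query(RECORDS, TARGET_URL, 1289589000.0, 1289590000.0)
    print(len(hits), "matching record(s)")
```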

The amount of data saved could be quite large (forcing us to use InnoDB instead 
of Bitcask). One estimate is ~1 billion records. Architecturally, this data 
could be split over multiple nodes.

The choice of client-side language is still open, with Erlang as the current 
favorite. As I see it the advantages of Riak are:

1) HTTP based API as well as Erlang and other client APIs (the system has a mix 
of programming languages including Python and C/C++).

2) More flexible/extensible data model (Cassandra requires you to define the 
keyspaces, columns, etc. ahead of time)

3) Easier to install/setup without the apparent bloat and complexity of 
Cassandra (which also includes Java setup)

4) Map-reduce queries

The disadvantages of Riak are:

1) Write performance. We need to handle ~50,000 writes per second.

I would recommend running our client app inside the same Erlang VM as Riak, 
which should hopefully gain us something here. Alternatively, we could use the 
innostore Erlang API directly for writes.
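
For a sense of scale, here is some back-of-the-envelope arithmetic relating the 
~50,000 writes per second to the ~1 billion record estimate. Both inputs are 
the rough figures from this mail, and the calculation ignores replication and 
any per-node limits, so treat the results as orders of magnitude only.

```python
# Rough figures quoted in this mail; both are estimates.
TOTAL_RECORDS = 1_000_000_000
WRITES_PER_SECOND = 50_000

# Time to write the full estimate at the sustained target rate.
seconds_to_load = TOTAL_RECORDS / WRITES_PER_SECOND
hours_to_load = seconds_to_load / 3600

# Records accumulated per day if the rate were sustained around the clock.
records_per_day = WRITES_PER_SECOND * 86_400

if __name__ == "__main__":
    print("seconds to write 1e9 records:", seconds_to_load)
    print("hours to write 1e9 records: %.2f" % hours_to_load)
    print("records per day at full rate:", records_per_day)
```

That is, at the full rate a billion records arrive in 20,000 seconds (a little 
over five and a half hours), and a full day at that rate would be about 4.3 
billion records.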

Questions:

1) Is Riak a good database for this application?

2) Can we write to InnoDB directly and still leverage the map-reduce queries on 
the data?

Regards

Matt



_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
