For limited mapreduce (where you know the keys in advance), Riak would be
a fine choice. 500 million keys at n_val 3 is readily achievable on
commodity hardware: say, four nodes with 128GB SSDs.
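As a rough back-of-the-envelope check on those numbers, the sketch below divides each node's disk across the replicas it would hold. It assumes one 128GB SSD per node (read as 128 * 10^9 bytes) and an even spread of replicas, and it ignores any per-object or backend overhead, so treat the result as an upper bound on the on-disk budget per replica rather than a sizing guarantee. The function name is made up for illustration.

```python
def replica_budget_bytes(keys, n_val, nodes, disk_bytes_per_node):
    """Bytes of disk available per stored replica, assuming an even spread."""
    # Every key is stored n_val times, and the replicas are shared by all nodes.
    replicas_per_node = keys * n_val / nodes
    return disk_bytes_per_node / replicas_per_node


# Figures from the paragraph above; one SSD per node is an assumption.
budget = replica_budget_bytes(
    keys=500_000_000,
    n_val=3,
    nodes=4,
    disk_bytes_per_node=128 * 10**9,
)
print(round(budget))  # 341
```

Changing `keys`, `nodes`, or the disk size shows quickly how much headroom a given cluster layout leaves for small JSON records.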
If large-scale mapreduce (more than a few hundred thousand keys) is
important, or listing keys is critical, you might consider HBase.
If you start hitting latency/write bottlenecks, it may be worth
accumulating metrics in Redis before flushing them to disk.
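A minimal sketch of that accumulate-then-flush pattern follows. To keep it self-contained, an in-memory `Counter` stands in for Redis, and `flush_to_store` is a hypothetical callback standing in for the write to the durable store; in a real deployment the increment would more likely be a Redis counter command such as HINCRBY. The class name and the `max_pending` threshold are assumptions for illustration.

```python
from collections import Counter


class MetricBuffer:
    """Accumulates event counts in memory and flushes them in batches.

    The Counter is a stand-in for Redis; flush_to_store is a hypothetical
    callback that would write one batch to the durable store.
    """

    def __init__(self, flush_to_store, max_pending=1000):
        self._counts = Counter()
        self._pending = 0
        self._flush_to_store = flush_to_store
        self._max_pending = max_pending

    def record(self, event_type, photo_id):
        # One cheap increment per event instead of one durable write per event.
        self._counts[(event_type, photo_id)] += 1
        self._pending += 1
        if self._pending >= self._max_pending:
            self.flush()

    def flush(self):
        if not self._counts:
            return
        batch = dict(self._counts)
        self._counts.clear()
        self._pending = 0
        self._flush_to_store(batch)


flushed = []  # stands in for the durable store
buf = MetricBuffer(flushed.append, max_pending=3)
for photo_id in (255, 255, 256):
    buf.record("Photo::Click", photo_id)
```

Here three click events become a single batch write containing two aggregated counters, which is the reduction in write load the suggestion is after. A production version would also flush on a timer so that a quiet period does not leave counts stranded in the buffer.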
At Showyou, we're also building a custom backend called Mecha which
integrates Riak and SOLR, specifically for this kind of analytics over
billions of keys. We haven't packaged it for open-source release yet,
but it might be worth talking about off-list.
--Kyle
On 11/28/2011 02:07 PM, Michael Dungan wrote:
Hi,
Sorry if this has been asked before - I couldn't find a searchable
archive of this list.
I was told to ask this list whether or not Riak would be appropriate for
tracking our site's metrics. We are currently using Redis for this but
are at the point where we need both clustering and m/r capability, and
on the surface, Riak looks to fit this bill (we already use Erlang
elsewhere in our app, so that's an additional plus).
The records are pretty small and can be represented easily in JSON. An
example:
{
"id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b",
"type": "Photo::Click",
"user_id": 2640,
"photo_id": 255,
"ip": "100.101.102.103",
"created_at": "2011/04/08 17:09:40 -0700"
}
We currently have around 25 million records similar to this one, and are
adding 4-5 million more each month.
Is Riak appropriate for this use case? Are there any gotchas I need to
be aware of?
thank you,
-mike