Hello!

I'm considering Riak for the statistics of a site that is approaching a
billion page views per month. The plan is to log a little information about
each page view and then to query that data.

I'm very new to Riak. I've gone over the documentation on the wiki, and I
know about map-reduce, secondary indexes and Riak Search. I've installed
Riak on a single node and run a test with the default configuration. The
results were a little below what I expected. For the test I used the
following requirement.

We want the page view count by day for registered and unregistered users.
We are storing session documents. Each document has a session identifier as
its key and a list of page views as the value (and a few additional
properties we can ignore). This document structure comes from CouchDB,
where I organised things like this to be able to query the database more
easily. I've written a basic JavaScript map-reduce query for this. The map
phase runs over each session (every k/v in a bucket) and returns the day of
the request together with the length of the page views array, placed in
either the registered or the unregistered field (the other is zero). The
first reduce groups the results by day and sums the two page view counts.
A second reduce then sorts the list by day.
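
To make the query concrete, here is a minimal sketch of the same three
steps in plain Python rather than the JavaScript that actually ran inside
Riak. The field names (`day`, `registered`, `page_views`) and the sample
sessions are assumptions made up for illustration, not the real schema.

```python
from collections import defaultdict

# Hypothetical session documents; the field names are assumptions,
# not the real schema.
sessions = [
    {"day": "2012-01-02", "registered": True,  "page_views": ["/", "/a", "/b"]},
    {"day": "2012-01-02", "registered": False, "page_views": ["/"]},
    {"day": "2012-01-01", "registered": False, "page_views": ["/", "/c"]},
]

def map_session(session):
    """Map: emit (day, registered_count, unregistered_count) for one session."""
    n = len(session["page_views"])
    if session["registered"]:
        return [(session["day"], n, 0)]
    return [(session["day"], 0, n)]

def reduce_by_day(rows):
    """First reduce: group rows by day and sum both counts."""
    totals = defaultdict(lambda: [0, 0])
    for day, registered, unregistered in rows:
        totals[day][0] += registered
        totals[day][1] += unregistered
    return [(day, reg, unreg) for day, (reg, unreg) in totals.items()]

def sort_by_day(rows):
    """Second reduce: order the per-day totals by day."""
    return sorted(rows, key=lambda row: row[0])

mapped = [row for s in sessions for row in map_session(s)]
result = sort_by_day(reduce_by_day(mapped))
print(result)
```

The first reduce returns rows in the same shape it receives, so it can be
applied again to its own partial results, which matters because a reduce
phase may be run more than once over intermediate output.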

This is very slow on a single-machine setup with the default Riak
configuration. 1,000 sessions take 6 seconds. 10,000 sessions take more
than 2 minutes (timeout). We want to handle at least 10,000,000 sessions.
Is there a way, maybe with secondary indexes, to make this go faster using
only Riak? Or must I use some kind of persistent cache to store this info
as time goes by? Or can I make Riak run 100 times faster by tweaking the
config? I don't want to need 1,000 machines to make this work.
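
To put those timings in perspective, here is a back-of-the-envelope
extrapolation. It assumes, probably optimistically, that the run time grows
linearly with the number of sessions; the 2-minute figure is only a lower
bound because that run timed out.

```python
# Measured figures from the test described above.
small_sessions, small_seconds = 1_000, 6      # 1,000 sessions in 6 s
large_sessions, large_seconds = 10_000, 120   # 10,000 sessions: over 2 min (timeout)
target_sessions = 10_000_000

def extrapolate_hours(sessions, seconds, target):
    """Linearly scale a measured run time to `target` sessions, in hours."""
    return seconds / sessions * target / 3600

best_case = extrapolate_hours(small_sessions, small_seconds, target_sessions)
lower_bound = extrapolate_hours(large_sessions, large_seconds, target_sessions)

print(f"from the 1,000-session run:  {best_case:.1f} h")
print(f"from the 10,000-session run: more than {lower_bound:.1f} h")
```

Under that assumption even a 100-fold speed-up would leave the full query
at roughly 10 to 20 minutes.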

Also, will updating the session documents be a problem for Riak? Would it
be better to store each page hit under a new key, so that the session
document is never updated? Because of the "multilevel" map-reduce this can
work on Riak, where it didn't work on CouchDB because of the limitations of
its view system. Unfortunately, with the document updates the CouchDB
database was growing way too fast for it to be a feasible solution.
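
As a rough illustration of the "one key per page hit" layout, the sketch
below uses a plain Python dict as a stand-in for a bucket; no Riak client
is involved. The key scheme and the field names are hypothetical.

```python
import uuid

# A plain dict stands in for a Riak bucket; this is not a Riak client.
hits = {}

def record_hit(session_id, day, registered, path):
    """Store one page hit under a fresh key, so no existing value is rewritten."""
    key = f"{session_id}-{uuid.uuid4().hex}"  # hypothetical key scheme
    hits[key] = {"session": session_id, "day": day,
                 "registered": registered, "path": path}
    return key

def map_hit(hit):
    """Map for this layout: every stored hit counts as exactly one page view."""
    return [(hit["day"], 1, 0)] if hit["registered"] else [(hit["day"], 0, 1)]

record_hit("s1", "2012-01-01", True, "/")
record_hit("s1", "2012-01-01", True, "/about")
record_hit("s2", "2012-01-01", False, "/")

mapped = [row for hit in hits.values() for row in map_hit(hit)]
registered_total = sum(reg for _, reg, _ in mapped)
unregistered_total = sum(unreg for _, _, unreg in mapped)
print(registered_total, unregistered_total)
```

With this layout the map phase emits one row per hit instead of one row per
session, so the same per-day reduce still applies, at the cost of many more
keys to walk.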


Any advice to make Riak work for this problem is greatly appreciated.

Thanks,
Marco
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
