Is Riak a good solution for this problem?
Hello!

I'm considering Riak for the statistics of a site that is approaching a billion page views per month. The plan is to log a little information about each page view and then to query that data.

I'm very new to Riak. I've gone over the documentation on the wiki, and I know about map-reduce, secondary indexes and Riak Search. I've installed Riak on a single node and ran a test with the default configuration. The results were a little below what I expected. For the test I used the following requirement.

We want the page view count by day for registered and unregistered users. We are storing session documents. Each document has a session identifier as its key and a list of page views as the value (plus a few additional properties we can ignore). This document structure comes from CouchDB, where I organised things this way to be able to query the database more easily. I've written a basic JavaScript map-reduce query for this: I map over each session (every key/value in a bucket), returning the length of the page-views array for either the registered or the unregistered field (the other is zero), plus the day of the request. In the reduce I collect the results by hashing on the day and summing the two page-view counts. A second reduce then sorts the list by day.

This is very slow on a single-machine setup with the default Riak configuration: 1,000 sessions take 6 seconds; 10,000 sessions take more than 2 minutes (timeout). We want to handle at least 10,000,000 sessions. Is there a way, maybe with secondary indexes, to make this faster using only Riak? Or must I use some kind of persistent cache to store this information as time goes by? Or can I make Riak run 100 times faster by tweaking the configuration? I don't want to need 1,000 machines to make this work.

Also, will updating the session documents be a problem for Riak? Would it be better to store each page hit under a new key, so as not to update the session document? Because of the "multilevel" map-reduce this can work on Riak, where it didn't work on CouchDB because of its view-system limitations. Unfortunately, with the updating of documents, the CouchDB database was growing far too fast for it to be a feasible solution.

Any advice on making Riak work for this problem is greatly appreciated.

Thanks,
Marco
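(For reference, a minimal sketch of the map and reduce phases described above. The session-document field names pageViews, registered and day are assumptions rather than the actual schema; Riak.mapValuesJson is one of Riak's built-in JavaScript map-reduce helpers.)

// Map phase: emit one per-day record for each session document.
function mapSessionViews(value, keyData, arg) {
  var doc = Riak.mapValuesJson(value)[0];
  var n = (doc.pageViews || []).length;
  return [{
    day: doc.day,
    registered: doc.registered ? n : 0,
    unregistered: doc.registered ? 0 : n
  }];
}

// Reduce phase: fold the per-session records into one record per day.
// Safe to re-reduce, since its output has the same shape as its input.
function reduceByDay(values, arg) {
  var byDay = {};
  values.forEach(function (v) {
    var d = byDay[v.day] || { day: v.day, registered: 0, unregistered: 0 };
    d.registered += v.registered;
    d.unregistered += v.unregistered;
    byDay[v.day] = d;
  });
  var out = [];
  for (var day in byDay) { out.push(byDay[day]); }
  return out;
}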
Re: Is Riak a good solution for this problem?
…ces the same way. If you give all your page-count objects a 2i index field, you can then pass it as an input to a map/reduce query; you are now instantly limiting which objects get scanned to only those with the 2i field. This has the added benefit of allowing you to range query (e.g. if your field is a UTC timestamp, you could look at only the page hits for sessions over the last week, month, day, minute, …).

Hope this helps. If you have the time/ability to try the above and give feedback on the results, I'd be very interested in learning them and helping further.

--
Jeffrey Massung
j...@basho.com
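(A hedged sketch of what such a job could look like when posted to Riak's /mapred endpoint, using a secondary-index range as the input set. The bucket and index names here are assumptions, and the named phase functions are assumed to have been made available to Riak, e.g. via the js_source_dir setting; the bucket/index/start/end inputs form is Riak's standard 2i map/reduce input.)

// Map/reduce job posted as JSON to /mapred. Only objects whose
// created_int index falls inside the range are fed to the map phase.
var job = {
  inputs: {
    bucket: "sessions",
    index: "created_int",   // integer 2i, e.g. seconds since the epoch
    start: 1328832000,      // 2012-02-10 00:00:00 UTC
    end:   1329436800       // 2012-02-17 00:00:00 UTC
  },
  query: [
    { map:    { language: "javascript", name: "mapSessionViews" } },
    { reduce: { language: "javascript", name: "reduceByDay" } }
  ]
};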
Problems writing objects to a half-full bucket
Hello!

I have a Riak cluster and I'm seeing a write failure rate of 10% to 30% (it varies by node). At the moment I am writing about 300 new objects per second to the same bucket. If I direct the writes to a new (empty) bucket the problem goes away and I don't see any failures. The non-empty bucket has between 2 and 3 million objects. Each object has between 4 and 8 secondary indexes (most have 4). When we started the system yesterday, it handled a peak of about 1,000 writes per second without problems, on the same hardware.

The cluster has 6 nodes, all Debian with Riak 1.0.3. We tried Riak 1.1 at first, but hit the known map-reduce problem and reverted.

I asked for help on the IRC channel and pharkmillups suggested that Riak is simply trying to write too many things to disk, given the secondary indexes. This is an issue report, but if someone has any idea of how a configuration change could fix this, please do tell. I would also like to know what the problem is (why this happens) and whether it can be fixed in the next few days, perhaps with a new release of Riak 1.1 along with the fixes for the map-reduce problems.

Thanks,
Marco
Re: Problems writing objects to a half-full bucket
Hi, David!

On 6 March 2012 04:37, David Smith wrote:

> 1. What sort of error are you getting when a write fails?

I'm using riak-js and the error I get is:

{ [Error: socket hang up] code: 'ECONNRESET' }

> 2. What backend are you using? (I'm guessing LevelDB)

LevelDB. The documentation says it is the only backend that supports 2i.

> 3. What do your keys look like? For example, are they date-based (and thus naturally increasing) or are they UUIDs? :)

UUIDs, created by Riak. All my queries use 2i. The 2i entries are integers (representing seconds) and random strings (length 16) used as identifiers for user sessions and the like.

Thanks,
Marco
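(As an aside, a minimal sketch of storing one such object with its secondary indexes over Riak's HTTP interface, using Node's core http module rather than riak-js. The bucket, index names and values are illustrative assumptions; the POST to /riak/sessions asks Riak to generate the key, and each x-riak-index-* header becomes one 2i entry.)

var http = require('http');

var body = JSON.stringify({ pageViews: [], registered: false });

// POST without a key lets Riak generate the UUID-style key for us.
var req = http.request({
  host: 'localhost',
  port: 8098,
  method: 'POST',
  path: '/riak/sessions',
  headers: {
    'content-type': 'application/json',
    'x-riak-index-created_int': '1331000000',       // integer 2i: seconds
    'x-riak-index-session_bin': '3f9c2a71b4d8e605'  // string 2i: 16-char id
  }
}, function (res) {
  console.log('status:', res.statusCode,
              'generated key in Location header:', res.headers.location);
});

req.end(body);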
Re: Problems writing objects to a half-full bucket
It makes sense, David. I'm going to give it a try. Hopefully this will keep things usable for the next month, until the issue is addressed.

I'll let you know how it goes.

Thanks,
Marco

On 6 March 2012 15:19, David Smith wrote:

> On Mon, Mar 5, 2012 at 9:55 PM, Marco Monteiro wrote:
>
> > I'm using riak-js and the error I get is:
> >
> > { [Error: socket hang up] code: 'ECONNRESET' }
>
> That is a strange error -- are there any corresponding errors in the server logs? I would have expected a timeout or some such...
>
> > UUIDs. They are created by Riak. All my queries use 2i. The 2i are integers (representing seconds) and random strings (length 16) used as identifiers for user sessions and similar.
>
> So, this explains why the problem goes away when you switch to an empty bucket. A bit of background...
>
> If you're using the functionality in Riak that automatically generates a UUID on PUT, you're going to get a uniformly distributed 160-bit number (since the implementation SHA-1 hashes the input). This sort of distribution is great for uniqueness, since there is roughly a 1 in 2^160 chance of encountering the same ID twice. It can be very bad from a caching perspective, however, if you have a cache that uses pages of information for locality purposes. In a scheme such as this (which is what LevelDB uses), the system winds up churning the cache constantly, since the odds are quite low that the next UUID to be accessed is already in memory (remember, the keys are uniformly distributed).
>
> LevelDB also makes this pathological case a bit worse by not having bloom filters -- when inserting a new UUID, you will potentially have to do 7 disk seeks just to determine that the UUID is not present. The Google team is working to address this problem, but I'm guessing it'll be a month or so before that's done, and then we have to integrate it with Riak -- so we can't count on that just yet.
>
> Now, all is not lost. :)
>
> If you craft your keys so that there is some temporal locality _and_ the access pattern of your keys has some sort of exponential-ish decay, you can still get very good performance out of LevelDB. One simple way to do this is to prefix the current date-time onto the front of the UUID, like so:
>
> 201203060806- (YMDhm-UUID)
>
> You could also use seconds since the epoch, etc. This has the effect of keeping recently accessed/hot UUIDs on (or close to) the same cache page, which avoids a lot of cache churn and typically improves LevelDB performance dramatically.
>
> Does this help/make sense?
>
> D.
> --
> Dave Smith
> VP, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
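(A minimal sketch of the key scheme David describes, written for Node.js. Everything here is illustrative -- the helper names and the 10-byte random suffix are assumptions; the point is only that the key starts with a coarse, increasing time prefix so that recently written keys land on nearby LevelDB pages.)

var crypto = require('crypto');

// Hypothetical helper: random hex suffix standing in for the UUID part.
function randomSuffix(bytes) {
  return crypto.randomBytes(bytes).toString('hex');
}

// "YMDhm-" style prefix, e.g. 201203060806-<random suffix>
function timePrefixedKey(date) {
  function pad(n) { return (n < 10 ? '0' : '') + n; }
  var prefix = '' + date.getUTCFullYear() +
    pad(date.getUTCMonth() + 1) + pad(date.getUTCDate()) +
    pad(date.getUTCHours()) + pad(date.getUTCMinutes());
  return prefix + '-' + randomSuffix(10);
}

// Or, as reported to work in the follow-up below: seconds since the epoch.
function epochPrefixedKey(date) {
  return Math.floor(date.getTime() / 1000) + '-' + randomSuffix(10);
}

console.log(timePrefixedKey(new Date()));
console.log(epochPrefixedKey(new Date()));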
Re: Problems writing objects to a half-full bucket
Having the keys prefixed with the seconds since the epoch solved the problem.

Thanks,
Marco