Hi,

I have been researching Riak use cases and patterns (I talked to a few people with production experience). A few months ago I tried using it to develop a project from scratch, but met too much resistance from my team, so I dropped it =). Still, I think the technology developed by Basho is great, so I want to write an article or maybe give a talk about it at some local developer conference =)

So, from my own experience and the words of those I talked with, I have developed the following picture in my head:

Riak works very well as a "real" k/v store:
 - user sessions, user data streams with append-only data (like a Facebook wall), and likes.

Not so good for relational/indexed data:
 - before 2i you had to roll your own indexes, either in Riak itself (but more than 2 disk reads can be quite costly) or in Redis/PostgreSQL/MySQL (and then handle their sharding, migration, etc. yourself, which makes little sense, since you expected Riak to take care of that part).
 - with 2i you are effectively running a mini M/R job, and this has some performance penalty.

Not that good as an M/R engine:
 - it is used for multiget (I suppose for languages like PHP, where running a map-reduce is easier than doing multi-threaded reads)
 - it is used for run-once tasks like building an index.
but:
 - using it ad hoc for analytics seems to be too slow (why doesn't Riak support a "pre"-reduce on the "map" vnode, like actual Google M/R?). [Actually, this statement is based on 0.14 usage; I don't know anybody who has used riak-pipe.]

RiakSearch seems to have been developed to tick the "we have full-text search" checkbox. I will compare it with Sphinx (http://sphinxsearch.com/), as I have quite a lot of experience with it (I have used it extensively and know the author personally =)).

RiakSearch does have this magical "availability" built in, but only for writes: if you lose a shard's data, you have to reindex ALL of your dataset, because you don't know which part of the dataset that shard held. (If you lose a Sphinx shard you have to reindex too, but because sharding is configured manually in Sphinx's conf, you only need to reindex the lost part, not all the data.) If a Sphinx shard dies you have to wait for it to time out, but that timeout is configurable; as far as I understand, you can't configure the timeout in RiakSearch.
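
To make the lost-shard point concrete, here is a toy Python sketch. It is not RiakSearch or Sphinx code: the corpus, the crc32 term hashing, the shard count, and the modulo document split are all assumptions made up for illustration. It only contrasts term-based placement (each term's posting list lives on a shard chosen by hashing the term) with a manually defined document-based split.

```python
import zlib

N_SHARDS = 4  # hypothetical number of shards

# Hypothetical corpus: integer doc id -> text.
DOCS = {
    1: "riak is a distributed key value store",
    2: "sphinx is a full text search engine",
    3: "riak search adds full text search to riak",
    4: "redis is an in memory store",
    5: "solr and elastic search are search engines",
    6: "a key value store needs no schema",
}


def term_shard(term):
    # Assumed term-based placement: hash the term, pick a shard.
    # crc32 is used because the built-in hash() is salted per process.
    return zlib.crc32(term.encode("utf-8")) % N_SHARDS


def doc_shard(doc_id):
    # Assumed stand-in for a manually configured document split.
    return doc_id % N_SHARDS


def docs_to_reindex_term_based(docs, lost_shard):
    """Docs that contributed at least one posting to the lost shard.

    Note that answering this already means re-tokenizing every document.
    """
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(term_shard(term) == lost_shard for term in text.split())
    }


def docs_to_reindex_doc_based(docs, lost_shard):
    """Docs assigned to the lost shard by the known, explicit split."""
    return {doc_id for doc_id in docs if doc_shard(doc_id) == lost_shard}


if __name__ == "__main__":
    for shard in range(N_SHARDS):
        print(
            shard,
            sorted(docs_to_reindex_term_based(DOCS, shard)),
            sorted(docs_to_reindex_doc_based(DOCS, shard)),
        )
```

With the document-based split, each document belongs to exactly one shard, so the set to reindex is known up front. With term-based placement, one document can have postings on several shards, and finding the affected documents requires scanning the whole dataset anyway.
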
RiakSearch can consume a lot of memory while searching low-cardinality terms, because it uses term-based sharding, and there is no way to fix this (except inline fields). Sphinx uses a fixed amount of memory for sorting and weighting, and document-based sharding (this obviously leads to approximate results plus a limit on the returned ids, but for full-text search that is acceptable). Can I configure the same fixed memory usage for a RiakSearch request? Plus, Sphinx has configurable ranking, grouping, sorting and weighting, and is just plain faster. Though Sphinx has accumulated at least 10-20 man-years of development and more features =)

So here are the questions:

1) I think I'm a little biased here, but why would I use RiakSearch at all? I think using Sphinx/Solr (Elastic Search) will give me more benefits, as they are specialized full-text search engines with most of the search-related problems already solved. And RiakSearch will force me to reindex a lost node's data just like any other search engine, so there is practically no gain from that feature.

2) Does anybody use Riak's M/R on a daily basis for analytics? Or does everybody tend to use specialized data-warehouse products for that?

3) Does anybody use 2i heavily? Using sharded Redis/*SQL seems wrong, but was that the only way before 2i? Maybe there is a trickier way to do this (keeping indexes in an ets/lets backend)? How do the majority of users do this?

4) What happens with 2i when one of the "covering" vnodes fails? Can I configure a timeout for it?

5) A theoretical question regarding quorums: what happens if I issue a put with w = 2, the first vnode acknowledges the write, but the second one fails, so the quorum is not met and the put reports a failure? Does the first vnode roll back the write (I assume not)? If it doesn't, will the write be deleted during read repair? Having a sibling from a write that I treated as failed and "replayed" later can cause too much inconsistency. What is the right way to handle this?

6) Does anybody use some sort of sorting/pagination with Riak?
Do you prebuild a specialized sorted index for it to work? Using RiakSearch for it doesn't seem to be a good idea.

Thanks in advance =)

--
email: bogu...@gmail.com
skype: i.bogunov
phone: +7 968 842 5783
Regards,
Bogunov Ilya

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com