Riak use cases\patterns and a few questions.

Bogunov Wed, 14 Dec 2011 09:38:04 -0800

Hi,
I have been researching Riak use cases and patterns (and talked to a
few people with production experience).
I tried using it to develop a project from scratch a few months
ago, but met too much resistance from my team, so I dropped it =).
Still, I think the technology developed by Basho is great, so I want to
write an article or maybe give a talk about it at a local developer
conference =)


So, from my own experience and the words of those I have talked with, I
have formed the following picture in my head:

Riak works very well as a "real" k/v store:
- user sessions, user data streams (append-only data, like a Facebook
wall), and likes.
It is not so good for relational/indexed data:
- before 2i you had to roll your own indexes, either in Riak (but >2
disk reads can be quite costly) or in Redis/PostgreSQL/MySQL (and handle
their sharding/migration/etc. yourself, which more or less forces you to
write your own sharding logic anyway, and that really doesn't make sense,
as you expected Riak to take care of that part).
- with 2i, every query is a mini-M/R job, and this has some performance
penalties.
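
To make the "roll your own indexes" cost concrete, here is a minimal
sketch. It assumes a plain in-memory dict as a stand-in for the k/v store
(the KV class, put_user and users_in_city are hypothetical names, not the
Riak client API) and only counts the reads a lookup needs.

```python
# Stand-in for a key/value store: a plain dict, NOT the Riak client API.
class KV:
    def __init__(self):
        self.data = {}
        self.reads = 0  # number of get() calls, to make the read cost visible

    def get(self, key, default=None):
        self.reads += 1
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value


def put_user(kv, user_id, user):
    """Store the user, then update a hand-rolled index on `city`."""
    kv.put(("users", user_id), user)
    index_key = ("idx_city", user["city"])
    members = kv.get(index_key, [])
    if user_id not in members:
        # Read-modify-write of the index object; not atomic in this sketch.
        kv.put(index_key, members + [user_id])


def users_in_city(kv, city):
    """One read for the index object plus one read per matching user."""
    ids = kv.get(("idx_city", city), [])
    return [kv.get(("users", uid)) for uid in ids]


kv = KV()
put_user(kv, "u1", {"name": "Ann", "city": "Moscow"})
put_user(kv, "u2", {"name": "Bob", "city": "Moscow"})
put_user(kv, "u3", {"name": "Eve", "city": "Kazan"})

kv.reads = 0
found = users_in_city(kv, "Moscow")
print([u["name"] for u in found], kv.reads)  # ['Ann', 'Bob'] 3
```

Every indexed write costs an extra read and write of the index object, and
every lookup costs one read more than the number of results, which is
where the ">2 disk reads" concern comes from.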

It works less well as an M/R engine:
  - it is used for multiget (I suppose in languages like PHP, because
doing map-reduce is easier than doing multi-threaded reads)
  - it is used for run-once tasks like building an index.
but:
  - using it ad hoc for analytics seems to be too slow (why doesn't Riak
support a "pre"-reduce on the "map" vnode, like actual Google M/R?).
[Actually, this statement is based on 0.14 usage; I don't know anybody who
has used riak-pipe.]
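
What I mean by a "pre"-reduce (a combiner in Google M/R terms) can be
sketched as below. This is a toy model in plain Python, not Riak code: the
partitions list is hypothetical data standing in for the records held by
three vnodes, and the sketch says nothing about what riak-pipe actually
does.

```python
from collections import Counter

# Hypothetical data: each inner list is the set of records on one vnode.
partitions = [
    ["GET", "GET", "POST", "GET"],
    ["POST", "GET", "GET"],
    ["GET", "PUT", "GET", "GET"],
]


def map_phase(records):
    """Emit one (key, 1) pair per record."""
    return [(r, 1) for r in records]


def reduce_pairs(pairs):
    """Sum counts per key; usable both as combiner and as final reducer."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


# Without a combiner: every mapped pair is shipped to the reducer.
shipped_plain = [p for part in partitions for p in map_phase(part)]

# With a combiner: each partition reduces locally, ships only partial sums.
shipped_combined = [
    p for part in partitions for p in reduce_pairs(map_phase(part))
]

# Same final answer, fewer pairs crossing the network.
assert reduce_pairs(shipped_plain) == reduce_pairs(shipped_combined)
print(len(shipped_plain), len(shipped_combined))  # 11 6
print(reduce_pairs(shipped_combined))
# [('GET', 8), ('POST', 2), ('PUT', 1)]
```

This only works when the reduce function is associative and commutative,
as a sum is, so that reducing partial results gives the same answer as
reducing the raw pairs.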

RiakSearch seems to have been developed to tick a checkbox: "we have
full-text search". I will compare it with Sphinx (http://sphinxsearch.com/),
as I have quite a lot of experience with it (I have used it extensively and
know the author personally =)).

RiakSearch does have this magical "availability" built in, but only for
writes: if you lose a shard's data you have to reindex ALL of your dataset,
because you don't know which part of the dataset that shard held (if you
lose a Sphinx shard you have to reindex too, but because sharding is
configured manually in Sphinx's conf, you only need to reindex the lost
part, not all the data).

If a Sphinx shard dies you have to wait for it to time out, but you can
configure that timeout; as far as I understand, you can't configure the
timeout in RiakSearch.

RiakSearch can consume a lot of memory while searching low-cardinality
terms because it uses term-based sharding, and there is no way to fix this
(except inline fields). Sphinx uses a fixed amount of memory for sorting
and weighting (and document-based sharding); this obviously leads to
approximate results plus a limit on the returned ids, but for full-text
search this is acceptable. Can I configure the same fixed memory usage for
a RiakSearch request?
Plus, Sphinx has configurable ranking, grouping, sorting and weighting, and
is plain faster. Then again, Sphinx has accumulated at least 10-20
man-years of development and more features =)
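
By "fixed amount of memory for sorting" I mean something like the sketch
below: keep only the best k matches while streaming over candidates. This
is a generic bounded top-k with a heap, written as an illustration; it is
not Sphinx's actual implementation, and top_k and the sample matches are
hypothetical.

```python
import heapq


def top_k(scored_docs, k):
    """Keep at most k (score, doc_id) pairs in memory while streaming."""
    heap = []  # min-heap; heap[0] is the weakest match kept so far
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            # Replace the weakest kept match with the better one.
            heapq.heapreplace(heap, (score, doc_id))
    # Best match first.
    return [doc_id for score, doc_id in sorted(heap, reverse=True)]


# Hypothetical (doc_id, relevance score) pairs for one query.
matches = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)]
best = top_k(matches, 3)
print(best)  # [2, 4, 3]
```

Memory stays proportional to k no matter how many documents match, at the
price of never being able to return more than k ids.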

So here are the questions:
1) I think I'm a little biased here, but why would I use RiakSearch at all?
I think using Sphinx/Solr (ElasticSearch) will give me more benefits, as
they are specialized full-text search engines with most of the
search-related problems already solved. And RiakSearch will force me to
reindex a lost node's data just like any other search engine, so there is
practically no gain from that feature.
2) Does anybody use Riak's M/R on a daily basis for analytics? Or does
everybody tend to use specialized data-warehouse products for that?
3) Does anybody use 2i heavily? Using sharded Redis/*SQL seems wrong, but
was that the only way before 2i? Maybe there is a trickier way to do this
(keeping indexes in an ets/dets backend)? How do the majority of users do
this?
4) What happens with 2i when one of the "covering" vnodes fails? Can I
configure a timeout for it?
5) A theoretical question regarding quorums: what happens if I issue a
(w = 2) put, the first vnode acknowledges the write, but the second fails
and the quorum reports a failure? Will the first node roll back the write
(I assume not)?
If it doesn't, will the write be deleted during read repair? Because having
a sibling of a write which I treated as failed and "replayed" later can
cause too much inconsistency. What is the right way to handle this?
6) Does anybody use some sort of sorting/pagination with Riak? Do you
prebuild a specialized sorted index for it to work? Using RiakSearch for it
doesn't seem to be a good idea.
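
To make the scenario in question 5 concrete, here is a toy model of a
quorum put under my assumption that nothing is rolled back. Replica and
quorum_put are hypothetical names for illustration only; this is not
Riak's code or client API.

```python
class Replica:
    """A toy replica that either accepts a write or fails it."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}

    def write(self, key, value):
        if not self.healthy:
            return False
        self.store[key] = value
        return True


def quorum_put(replicas, key, value, w):
    """Send the write to every replica; succeed only if >= w acknowledge.

    Assumption: there is no rollback, so replicas that accepted the
    write keep the value even when the put as a whole is reported failed.
    """
    acks = sum(1 for r in replicas if r.write(key, value))
    return acks >= w


replicas = [
    Replica("a"),
    Replica("b", healthy=False),
    Replica("c", healthy=False),
]
ok = quorum_put(replicas, "cart", "v1", w=2)
print(ok)                             # False
print(replicas[0].store.get("cart"))  # v1
```

In this model the client sees a failure, yet a later read that reaches
replica "a" can still return v1, which is exactly the inconsistency I am
asking about.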

Thanks in advance =)
-- 
email: bogu...@gmail.com
skype: i.bogunov
phone: +7 968 842 5783
Regards, Bogunov Ilya

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
