On 02/16/2012 01:07 AM, Jerome Renard wrote:
Hello,

I am really interested in Riak, but I would like to know whether my goals
can be achieved with it for my project.

The use case is the following :

- I need to support 10,000 writes/second minimum. Object size will be
from 1 KB to 5 KB;

Definitely. 10 SSDs should do it.

- I need to organize data in buckets. Bucket 'dc1' would store all my
data from data center one, bucket 'dc2' would store all my data from
data center two, etc. The size of a bucket is going to grow large
really quickly; (Would links be a more relevant alternative ?)

Yes; buckets are just prefixed namespaces for keys. You can have vast numbers of keys in a bucket. Links are something totally different.
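
To make that concrete: the bucket is just part of the URL path, so keys in 'dc1' and 'dc2' are namespaced independently. A quick sketch against a local node's default HTTP port (8098); bucket, key, and payload are made up:

```shell
# Same key name, two buckets -- they never collide.
curl -X PUT -H "Content-Type: application/json" \
  -d '{"event": "login", "ts": "2012-02-16T01:07:00Z"}' \
  http://localhost:8098/riak/dc1/event-0001

curl -X PUT -H "Content-Type: application/json" \
  -d '{"event": "login", "ts": "2012-02-16T01:08:00Z"}' \
  http://localhost:8098/riak/dc2/event-0001
```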

- I need to search this data: full-text search plus facets. Searches
will most likely be date-based range queries;

Riak Search and secondary indices may meet your needs, but read the docs carefully. You may also be interested in
http://www.meetup.com/San-Francisco-Riak-Meetup/events/51287272/
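
For date-based range queries, secondary indices (2i) look like a natural fit. A sketch, assuming the LevelDB backend (2i requires it) and an illustrative integer index named date_int that encodes dates as YYYYMMDD:

```shell
# Tag the object with an integer secondary index at write time.
curl -X PUT -H "Content-Type: application/json" \
  -H "x-riak-index-date_int: 20120216" \
  -d '{"event": "login"}' \
  http://localhost:8098/riak/dc1/event-0001

# Range query: all keys in dc1 indexed between Feb 1 and Feb 16, 2012.
curl http://localhost:8098/buckets/dc1/index/date_int/20120201/20120216
```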

- This data is meant to expire after a certain period of time, so I
will have to run large delete operations every week/month/year;

Yup, readily doable. Just be aware that listing keys in stock Riak can be expensive. Search or 2I may reduce that load.
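
One way to avoid the key-listing cost in a periodic purge: drive the deletes off a 2i range query instead. A sketch, assuming objects were written with an illustrative integer index named date_int (dates encoded as YYYYMMDD) and that the `jq` JSON tool is available:

```shell
# Find expired keys via a 2i range query (cheaper than listing the
# whole bucket), then delete them one by one.
for key in $(curl -s http://localhost:8098/buckets/dc1/index/date_int/0/20120101 \
             | jq -r '.keys[]'); do
  curl -s -X DELETE "http://localhost:8098/riak/dc1/$key"
done
```

For very large purges you'd likely want to batch this and pace the deletes so they don't compete with your write load.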

- I can get a replication factor of 2 if needed instead of 3 by default;

Yup. Just set n_val in the bucket properties once and you're good.
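
A sketch of setting it over HTTP (set n_val before you write any data to the bucket; changing it after data exists is not recommended):

```shell
# Drop the replication factor for bucket dc1 from the default 3 to 2.
curl -X PUT -H "Content-Type: application/json" \
  -d '{"props": {"n_val": 2}}' \
  http://localhost:8098/riak/dc1
```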

- I need to get my disk space back when I remove data. This may sound
odd to some users, but people who come from a MySQL/InnoDB
background know that removing large amounts of data does not mean you
will get your disk space back ;)

Bitcask and LevelDB both reclaim space. Bitcask is log-structured, so you'll see some (configurable) amount of dead space--in my continuous-compaction environment, roughly 15-30% dead bytes on top of the "real" dataset.

Based on the use case described above, my questions are:

- I am sure Riak can achieve the required write speed. But are there
any hardware recommendations for storing TB of quickly growing data?

It sounds like your data might be largely immutable and write heavy. In that case I would try a small number of partitions per host, bitcask, huge amounts of spinning disk, and lots of RAM cache on top of that. Those writes will translate into big stripey writes on top of the disk. If you can afford SSDs that's the obvious option.

Be aware that Bitcask does not support compression--but if your filesystem performs block-level compression then you should see excellent savings. The Riak object (riak_object) structure on disk is quite fluffy and compresses well. We see ~30% savings with Snappy block-level compression in LevelDB.

- Which storage backend would be the most relevant for me? Bitcask or LevelDB?

LevelDB supports secondary indexes, but I don't understand its performance characteristics across a variety of read/write loads. Maybe someone else can chime in?

- Does Riak Search support faceted search? (I would say yes, but I
found no documentation in the wiki about that)

- Will it be a problem if I decide to run Riak on ZFS + compression enabled ?

I suspect it would work quite well. If you try it, please report back!

If you need any more details feel free to ask.

Thanks in advance for your feedback.

Best Regards,

--
Jérôme Renard
http://39web.fr | http://jrenard.info | http://twitter.com/jeromerenard

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
