Hi Max,

Although secondary indexes can most likely be used to solve your problem, they may not be the most efficient way.
When migrating to Riak from a relational model, you often need to adapt the data model to how you will use and search the data, rather than migrating it directly from tables to buckets. Sean Cribbs has written a couple of blog posts [1,2] that describe this. With Riak it is often very useful to denormalize your model to some extent in order to make queries and updates more efficient.

Given the data model you provided, I would suggest storing a list of users together with the client. When doing this, you can also store additional relevant information about each user (especially if it is not updated frequently) alongside the user link, so that many of your queries can be answered through the client object alone. In the same way, I would also consider storing a list of servers as part of the user object.

Another very important thing to consider is using semantic keys. In your data model every object seems to have a serial primary key, which is a very common approach in relational databases. If these keys are stored in some external system and you usually know them before querying Riak, it may make sense to keep them; otherwise I would recommend using more meaningful keys. As each client has a unique title, the title could be a great candidate for a key in Riak, since you are likely to know it when looking up the client. Similarly, users could be identified by a key built as <client_id>_<user title>, as this appears to be a unique combination. If servers have a similarly unique field, you could build their keys in the same way, using the user_id as a prefix. If you use LevelDB as a backend, you can also add secondary indexes to help find objects efficiently by other criteria.

Riak is at its core a key-value system, and accessing data directly through a key is the fastest and most scalable way. The reason is that Riak can directly identify the partitions that hold the data and go straight to them, regardless of how many physical nodes and partitions you have.
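To make these suggestions more concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for Riak buckets; it does not use any Riak client library. The names `store`, `add_user`, `add_server`, `servers_of_user`, `primary_partitions`, `covering_set_size`, the `server_name` field and the values `RING_SIZE = 64` and `N_VAL = 3` are all hypothetical. It shows semantic keys (`<client_id>_<user title>` for users, the user key as a prefix for servers), client and user objects carrying denormalized lists of keys, a parallel get of servers by key, and a simplified model of why a key lookup touches only n partitions while a secondary index query touches a covering set of roughly ring size / n partitions. The hashing is an approximation of consistent hashing, not Riak's exact implementation.

```python
import hashlib
import math
from concurrent.futures import ThreadPoolExecutor

RING_SIZE = 64  # hypothetical ring size (number of partitions)
N_VAL = 3       # hypothetical n value (copies of each object)

# Hypothetical in-memory stand-in for Riak buckets: {bucket: {key: object}}
store: dict[str, dict[str, dict]] = {"clients": {}, "users": {}, "servers": {}}


def user_key(client_id: str, user_title: str) -> str:
    # Semantic key suggested in the text: <client_id>_<user title>
    return f"{client_id}_{user_title}"


def server_key(user_id: str, server_name: str) -> str:
    # Hypothetical: assumes each server has a "server_name" unique per user
    return f"{user_id}_{server_name}"


def add_user(client_id: str, user_title: str) -> str:
    key = user_key(client_id, user_title)
    store["users"][key] = {"title": user_title, "servers": []}
    # Denormalize: the client object carries the keys of its users
    client = store["clients"].setdefault(client_id, {"users": []})
    client["users"].append(key)
    return key


def add_server(user_id: str, server_name: str) -> str:
    key = server_key(user_id, server_name)
    store["servers"][key] = {"name": server_name}
    # Denormalize: the user object carries the keys of its servers
    store["users"][user_id]["servers"].append(key)
    return key


def servers_of_user(user_id: str) -> list[dict]:
    # One key lookup for the user, then a parallel get of its servers by key
    keys = store["users"][user_id]["servers"]
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the keys
        return list(pool.map(lambda k: store["servers"][k], keys))


def primary_partitions(bucket: str, key: str) -> list[int]:
    # Simplified consistent hashing: hash bucket/key onto a 2^160 ring, find
    # the owning partition, then take the next N_VAL - 1 partitions as well
    h = int.from_bytes(hashlib.sha1(f"{bucket}/{key}".encode()).digest(), "big")
    first = (h * RING_SIZE) >> 160
    return [(first + i) % RING_SIZE for i in range(N_VAL)]


def covering_set_size(ring_size: int, n_val: int) -> int:
    # Partitions consulted by a secondary index query: ring size / n, rounded up
    return math.ceil(ring_size / n_val)
```

For example, after `u = add_user("acme", "alice")` followed by `add_server(u, "web1")` and `add_server(u, "db1")`, the client object lists `["acme_alice"]`, `servers_of_user(u)` returns both server objects in insertion order, and `covering_set_size(64, 3)` is 22, compared with the 3 partitions returned by `primary_partitions` for a single key.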
Secondary indexes, on the other hand, must be queried across a covering set of partitions, as records matching the index may exist on any number of partitions. The size of the covering set is usually calculated as the ring size divided by the n value (rounded up); for example, a ring size of 64 and an n value of 3 give a covering set of 22 partitions, versus the 3 partitions holding the replicas of a single key. This means more partitions take part in a secondary index query than in a straight key-value lookup, which usually results in higher latencies and additional load on the system.

Having said this, secondary index queries still tend to be quite fast, and for your scenario they may well be a better way to go. Still, retrieving the user record directly through its key and then fetching the related server objects by key using a parallel get may well turn out to be faster than identifying the servers through a secondary index query and then retrieving them in parallel. The best way to find out which approach suits your particular problem is usually to benchmark it using data volumes and access patterns that are as realistic as possible.

[1] http://basho.com/schema-design-in-riak-introduction/
[2] http://basho.com/schema-design-in-riak-relationships/

I hope this helps.

Best regards,

Christian

--------------------
Christian Dahlqvist
Client Services Engineer
Basho Technologies
EMEA Office
E-mail: christ...@basho.com
Skype: c.dahlqvist
Mobile: +44 7890 590 910

On 22 Mar 2013, at 06:38, Max Lapshin <max.laps...@gmail.com> wrote:

> Hi. I'm looking for alternative to postgresql in my project.
> I have two problems with postgres:
> 1) if master goes down, problem must be resolved manually and very quickly
> 2) I want cross datacenter replication
>
> Here is my schema: http://pastie.org/7058511
>
> In short: client is a user that is logging in on my server and he has many
> sub-users and many servers.
>
> I need to make lookups like "give me all sub-users of this client" and "give
> me all servers of this client".
>
> Also, there is going to be a huge traffic to update "users" table: aggregated
> statistics about their usage.
>
> Is riak ok for it? Will secondary index help me with it?
>
> I have already looked at Couchbase. It has wonderful admin console, but it
> failed to launch when my IP address has changed and it has no erlang API.
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com