Re: Need help with moving to Riak

Christian Dahlqvist Fri, 22 Mar 2013 01:54:37 -0700

Hi Max,

Although secondary indexes can most likely be used to solve your problem, they 
may not be the most efficient approach.

When migrating to Riak from a relational model, one often needs to alter the 
data model based on how the data will be used and searched, rather than 
migrating it directly from table to bucket. Sean Cribbs has written a couple of 
blog posts [1,2] that describe this.

When using Riak, it is often very useful to denormalize your model to some 
extent in order to make queries and updates more efficient. Given the data 
model you provided, I would suggest storing a list of users together with the 
client. When doing this, you can also store additional relevant information 
alongside each user link (especially if it is not updated frequently), so that 
more of your queries can be answered through the client object alone.

In the same way, I would also consider storing a list of servers as part of 
each user object.
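To make the denormalized layout concrete, here is a minimal Python sketch of a client object that embeds links to its users and a user object that embeds its server keys. All field names and keys are hypothetical (the original schema is only linked, not reproduced), and JSON is just one possible encoding of the stored values.

```python
import json

# Hypothetical objects: every field name and key below is illustrative,
# not taken from the original schema.
client = {
    "title": "acme",
    # Denormalized: links to the client's users, each carrying a few
    # rarely-updated fields so listing sub-users needs only this object.
    "users": [
        {"key": "acme_alice", "title": "alice", "role": "admin"},
        {"key": "acme_bob", "title": "bob", "role": "viewer"},
    ],
}

user = {
    "title": "alice",
    "client": "acme",
    # Denormalized: keys of the servers that belong to this user.
    "servers": ["acme_alice_web1", "acme_alice_db1"],
}


def sub_user_keys(client_obj: dict) -> list:
    """'Give me all sub-users of this client', answered from one object."""
    return [link["key"] for link in client_obj["users"]]


def server_keys(user_obj: dict) -> list:
    """Keys of a user's servers, ready to be fetched directly."""
    return list(user_obj["servers"])


# Riak stores values as opaque blobs; JSON is one common encoding.
client_blob = json.dumps(client)
user_blob = json.dumps(user)
```

Fields that change constantly, such as the aggregated usage statistics mentioned in the question, are poor candidates for copying into the client object, since every update would then also rewrite the client.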

Another very important thing to consider is using semantic keys. In your data 
model every object seems to have a serial primary key, which is a very common 
approach in relational databases. If these IDs are stored in some external 
system and you usually know them before querying Riak, it may make sense to 
keep them; otherwise, I would recommend using more meaningful keys.

As each client has a unique title, the title would be a strong candidate for 
the client's key in Riak, as you are likely to know it when looking up the 
client. In the same way, users could be identified by a key built as 
<client_id>_<user title>, as this appears to be a unique combination. If 
servers have a similarly unique field, you could build their keys the same 
way, using the user_id as a prefix.
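The key scheme above can be sketched in a few lines of Python. The function names, the `server_name` field and the choice of `_` as the separator are assumptions for illustration only. The guard is there because with a plain `_` join, the pairs ("a_b", "c") and ("a", "b_c") would produce the same key.

```python
SEP = "_"  # assumption: this character never appears inside a key component


def _join(*parts: str) -> str:
    """Join key components, rejecting ones that would make keys ambiguous."""
    for part in parts:
        if not part or SEP in part:
            raise ValueError(f"invalid key component: {part!r}")
    return SEP.join(parts)


def client_key(title: str) -> str:
    # The client's unique title doubles as its Riak key.
    return _join(title)


def user_key(client_id: str, user_title: str) -> str:
    # <client_id>_<user title>
    return _join(client_id, user_title)


def server_key(user_id: str, server_name: str) -> str:
    # user_id is itself a composite key built by user_key(), so only the
    # new (hypothetical) server_name component is validated before appending.
    return user_id + SEP + _join(server_name)
```

For example, `server_key(user_key("acme", "alice"), "web1")` yields `acme_alice_web1`, which can be computed from information you already hold and fetched with a single key lookup.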

If you use LevelDB as the backend, you can also add secondary indexes to help 
find objects by other criteria efficiently.
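As a sketch of how such an index might look over Riak's HTTP interface: index entries are sent as `x-riak-index-<name>` headers on the PUT, where the name ends in `_bin` (string) or `_int` (integer), and an exact-match query is a GET on `/buckets/<bucket>/index/<index>/<value>`, which returns the matching keys as JSON. The helper functions and the `client_bin`/`logins_int` index names below are hypothetical, and the code only builds headers and paths without sending anything; please check the 2i documentation for your Riak version before relying on it.

```python
from urllib.parse import quote


def index_headers(indexes: dict) -> dict:
    """Turn {index_name: value} into Riak 2i HTTP headers for a PUT."""
    headers = {}
    for name, value in indexes.items():
        # The suffix tells Riak how to treat the value: _bin = string, _int = integer.
        if not (name.endswith("_bin") or name.endswith("_int")):
            raise ValueError(f"index name must end in _bin or _int: {name!r}")
        headers[f"x-riak-index-{name}"] = str(value)
    return headers


def index_query_path(bucket: str, index: str, value) -> str:
    """Path for an exact-match secondary index query."""
    return "/buckets/{}/index/{}/{}".format(
        quote(bucket, safe=""), quote(index, safe=""), quote(str(value), safe="")
    )
```

Tagging every user object with `client_bin` set to its client's key would let a query on `index_query_path("users", "client_bin", "acme")` answer "give me all sub-users of this client" without maintaining a list in the client object.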

Riak is at its core a key-value system, and accessing data directly through a 
key is the fastest and most scalable way. The reason is that Riak can directly 
identify the partitions that hold the data and go straight to them, regardless 
of how many physical nodes and partitions you have. Secondary index queries, on 
the other hand, need to be sent to a covering set of partitions, as records 
matching the index may exist on any number of partitions. The size of the 
covering set is usually calculated as the ring size divided by the n value 
(rounded up). As a result, more partitions take part in a secondary index query 
than in a straight key-value lookup, which usually means higher latency and 
additional load on the system. Having said this, secondary index queries still 
tend to be quite fast, and they may well be the better approach for your 
scenario.
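The arithmetic can be worked through with a small Python helper that applies the rule of thumb above (ring size divided by n value, rounded up). With Riak's default ring size of 64 and n value of 3, a secondary index query involves about 22 partitions, while a key lookup is sent only to the 3 replicas for that key. The helper names are illustrative.

```python
def covering_set_size(ring_size: int, n_val: int) -> int:
    """Rule of thumb: ring size divided by n_val, rounded up (integer ceiling)."""
    return (ring_size + n_val - 1) // n_val


def key_lookup_size(n_val: int) -> int:
    """A plain key lookup goes only to the n_val replicas owning that key."""
    return n_val


for ring_size, n_val in [(64, 3), (128, 3), (256, 3)]:
    print(f"ring={ring_size} n_val={n_val}: "
          f"2i query ~{covering_set_size(ring_size, n_val)} partitions, "
          f"key lookup {key_lookup_size(n_val)}")
```

The covering set grows with the ring size (22, 43 and 86 partitions for the rings above), whereas the key lookup stays at n_val however large the cluster becomes.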

Given this, retrieving the user record directly by its key and then fetching 
the related server objects by their keys with a parallel get may well turn out 
to be faster than identifying the servers through a secondary index query and 
then retrieving them in parallel. The best way to find out which approach suits 
your particular problem is usually to benchmark both, using data volumes and 
access patterns that are as realistic as possible.
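The key-based path described above can be sketched in Python with a thread pool. The in-memory `STORE` dict and the simulated `fetch()` latency are stand-ins for a real Riak client call (for example one HTTP GET per key), so this only shows the shape of the access pattern; timings from it say nothing about a real cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Riak: semantic keys mapped to stored objects.
STORE = {
    "acme_alice": {"title": "alice", "servers": ["acme_alice_web1", "acme_alice_db1"]},
    "acme_alice_web1": {"host": "web1"},
    "acme_alice_db1": {"host": "db1"},
}


def fetch(key: str, latency: float = 0.01) -> dict:
    """Simulated key lookup; replace with a real client GET in a benchmark."""
    time.sleep(latency)  # pretend network round trip
    return STORE[key]


def user_with_servers(user_key: str, max_workers: int = 8):
    user = fetch(user_key)  # one direct key lookup for the user
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Parallel gets for the server keys; map() preserves input order.
        servers = list(pool.map(fetch, user["servers"]))
    return user, servers


start = time.perf_counter()
user, servers = user_with_servers("acme_alice")
elapsed = time.perf_counter() - start
print(f"fetched {len(servers)} servers in {elapsed * 1000:.1f} ms")
```

A secondary index variant would replace the first `fetch()` with an index query and then fetch the returned keys the same way, so both paths make two round trips; what differs is the cost of that first step, which is why timing both against your own cluster is the reliable way to choose.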

[1] http://basho.com/schema-design-in-riak-introduction/
[2] http://basho.com/schema-design-in-riak-relationships/

I hope this helps.

Best regards,

Christian

--------------------
Christian Dahlqvist
Client Services Engineer
Basho Technologies
EMEA Office
E-mail: christ...@basho.com
Skype: c.dahlqvist
Mobile: +44 7890 590 910

On 22 Mar 2013, at 06:38, Max Lapshin <max.laps...@gmail.com> wrote:

> Hi. I'm looking for an alternative to PostgreSQL in my project.
> I have two problems with Postgres:
> 1) if the master goes down, the problem must be resolved manually and very quickly
> 2) I want cross-datacenter replication
> 
> Here is my schema: http://pastie.org/7058511
> 
> In short: a client is a user who logs in to my server, and he has many 
> sub-users and many servers.
> 
> I need to make lookups like "give me all sub-users of this client" and "give 
> me all servers of this client".
> 
> Also, there is going to be heavy traffic updating the "users" table with 
> aggregated statistics about their usage.
> 
> Is Riak OK for this? Will secondary indexes help me with it?
> 
> I have already looked at Couchbase. It has a wonderful admin console, but it 
> failed to launch when my IP address changed, and it has no Erlang API.
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
