Hi all.

I've been looking around for a while for some sort of guidelines on
how best to structure search indexes within Riak 2.0, and have yet to
come up with anything that answers my questions.

I came across https://github.com/basho/yokozuna/blob/develop/docs/ADMIN.md,
which talks about the one-to-one and many-to-one ways of associating
buckets with indexes. It mentions in passing the potential for lower
query latency and more efficient deletion of index data with the
one-to-one method, but doesn't say much about when one method would
significantly outperform the other.
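
To make sure I'm reading that correctly, here is roughly how I picture
the two styles using the official Python client (just a sketch, with
search enabled in riak.conf; all the index and bucket names below are
ones I've made up):

    import riak

    client = riak.RiakClient(protocol='pbc', pb_port=8087)

    # One-to-one: each bucket gets its own dedicated index.
    # (Index creation is asynchronous, so in practice you may need to
    # wait briefly before associating the bucket with it.)
    client.create_search_index('contacts_list42_idx')
    client.bucket('contacts_list42').set_property(
        'search_index', 'contacts_list42_idx')

    # Many-to-one: several buckets all share one index.
    client.create_search_index('contacts_idx')
    for name in ('contacts_list42', 'contacts_list43'):
        client.bucket(name).set_property('search_index', 'contacts_idx')

As I understand it, with one-to-one, getting rid of a list's index
data is as simple as dropping that one index, whereas with many-to-one
every query and delete goes through the single big index.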

However, something I'm still not sure about is *when* it is considered
a good idea to use multiple indexes versus one massive index.

If you'll bear with me, I'll use this simple scenario:
I have lists, and I have contacts within these lists. In total, I'm
dealing with 100 million contacts. Each of them is no more than 20KB
in size, and they all follow exactly the same JSON object structure.
Ignoring application design for simplicity's sake, let's say I could
choose between the following two ways of storing lists and contacts
(sketched in code after the list):

   1. Having two buckets: *lists* and *contacts*.
   All 100 million contacts are stored in the *contacts* bucket. Each
   contact object is linked to its corresponding list through a *list_key*
   property, and all the contacts are stored in the same single search index.

   2. Having multiple buckets: *lists*, and for each list, a separate
   bucket *contacts_{listkey}*.
   Using this structure, each *contacts_{listkey}* bucket would have
   its own search index.
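
In code, I imagine the difference at query time would look roughly
like this (again just a sketch using the Python client; field names
like list_key_s are assumptions based on the default schema's dynamic
*_s string fields, and the index names are invented):

    import riak

    client = riak.RiakClient(protocol='pbc', pb_port=8087)

    # Scenario 1: one big bucket, one big index.
    contacts = client.bucket('contacts')
    contacts.new('contact-1', data={'name_s': 'Alice',
                                    'list_key_s': 'list-42'}).store()
    # Every query has to filter the single index by list:
    results = client.fulltext_search('contacts_idx',
                                     'list_key_s:list-42')

    # Scenario 2: one bucket and one index per list.
    per_list = client.bucket('contacts_list-42')
    per_list.new('contact-1', data={'name_s': 'Alice'}).store()
    # Queries go straight to that list's own, much smaller index:
    results = client.fulltext_search('contacts_list-42_idx',
                                     'name_s:Alice')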

With these two scenarios in mind, and assuming we're dealing with 100
million contacts:

   1. Which would be the better method of implementing the search indexes?
   2. At which point would one solution be far better than the other?
   3. How much does Yokozuna differ from stock-standard Solr? Nothing
   I could find on Solr specifically talked about indexes larger than
   around 60,000 objects, yet Riak needs to be able to handle hundreds
   of millions of objects.

Any help at all with this is really appreciated.
I do realise that at some point I will need to set this up myself and
perform my own tests on it. However, I was hoping that those currently
using Riak in production might have some more insight into this.

Regards,
Geoff