Hi all. I've been looking around for a while for some sort of guidelines on how best to structure search indexes within Riak 2.0, and have yet to find anything that answers my questions.
I came across https://github.com/basho/yokozuna/blob/develop/docs/ADMIN.md, which describes the one-to-one and many-to-one ways of indexing. It mentions in passing the potential for lower query latency and more efficient deletion of index data with the one-to-one method, but says little about when one method would significantly outperform the other.

What I'm still not sure about is *when* it is considered a good idea to use multiple indexes rather than one massive index. If you'll bear with me, I'll use this simple scenario: I have lists, and I have contacts within these lists. In total, I am dealing with 100 million contacts. Each of them is no more than 20KB in size, and they all follow the exact same JSON object structure.

Ignoring application design for simplicity's sake, let's say I could choose between the following two ways of storing lists and contacts:

1. Having two buckets: *lists* and *contacts*. All 100 million contacts are stored in the *contacts* bucket. Each contact object is linked to its corresponding list through a *list_key* property, and all the contacts are stored in the same single search index.
2. Having multiple buckets: *lists*, and for each list, a separate bucket *contacts_{listkey}*. Using this structure, each *contacts_{listkey}* bucket would have its own search index.

With these two scenarios in mind, and assuming that we're dealing with 100 million contacts:

1. Which would be the better method of implementing the search indexes?
2. At which point would one solution be far better than the other?
3. How much does Yokozuna differ from stock-standard Solr? None of the search results I could find on Solr specifically discussed indexes larger than 60,000 objects, yet Riak is required to be able to deal with hundreds of millions of rows.

Any help at all with this is really appreciated.
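To make the two options concrete, here is a rough sketch of how a search would be routed under each layout. It is only an illustration of the scenario above: the helper function names, the index name `contacts`, and the use of `list_key` as an indexed field are my own assumptions (the real field name would depend on the Solr schema in use), and it assumes list keys contain no quote characters.

```python
# Hypothetical helpers: none of these names come from Riak or Yokozuna.
# They only show which index and which Solr query each layout would use.

def single_index_target(list_key, user_query):
    """Option 1: one shared index; the list is selected by a filter on list_key."""
    index = "contacts"  # assumed name of the single shared index
    # Assumes list_key is an indexed field and contains no double quotes.
    query = 'list_key:"{}" AND ({})'.format(list_key, user_query)
    return index, query


def per_list_index_target(list_key, user_query):
    """Option 2: one index per list; the index name itself selects the list."""
    index = "contacts_{}".format(list_key)
    return index, user_query


if __name__ == "__main__":
    print(single_index_target("abc", "name:geoff"))
    print(per_list_index_target("abc", "name:geoff"))
```

In the first layout every query carries an extra clause and runs against the full 100 million documents, while in the second the query is unchanged but the number of indexes grows with the number of lists. That trade-off is what I'm trying to understand.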
At some point, I do realise that I will need to set this up for myself and perform my own tests on it. However, I was hoping that those currently using Riak in production might have some more insight into this.

Regards,
Geoff
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com