Hey Eric.

Awesome! Thanks so much for the feedback. I really appreciate the help.

I guess I'll have to test each method on our infrastructure to really know for 
myself.

Cheers,
Geoff

Eric Redmond <eredm...@basho.com> wrote:

>Geoff, comments inline.
>
>
>On Nov 13, 2014, at 3:13 PM, Geoff Garbers <ge...@totalsend.com> wrote:
>
>
>Hi all.
>
>
>I've been looking around for a while for some guidelines on how best to
>structure search indexes within Riak 2.0, and have yet to find anything that
>answers my questions.
>
>
>I came across https://github.com/basho/yokozuna/blob/develop/docs/ADMIN.md,
>which describes the one-to-one and many-to-one approaches to indexing. It
>mentions in passing that the one-to-one approach can offer lower query latency
>and more efficient deletion of index data, but it says little about when one
>approach would significantly outperform the other.
>
>
>However, something I'm still not sure about is when it's considered a good
>idea to use multiple indexes versus one massive index.
>
>
>If you'll bear with me, I'll use this simple scenario:
>
>I have lists, and I have contacts within these lists. In total, I'm dealing
>with 100 million contacts. Each of them is no more than 20KB in size, and they
>all follow exactly the same JSON object structure. Ignoring application design
>for simplicity's sake, let's say I could choose between the following two ways
>of storing lists and contacts:
>
>
>Having two buckets: lists and contacts.
>All 100 million contacts are stored in the contacts bucket. Each contact 
>object is linked to its corresponding list through a list_key property, and 
>all the contacts are stored in a single shared search index.
>
>Having multiple buckets: lists, and for each list, having a separate bucket 
>contacts_{listkey}.
>Using this structure, each contacts_{listkey} bucket would have its own search
>index.
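>
>In case it makes the two options clearer, here is a rough (untested) sketch of
>what I mean, using the Riak Python client. All index, bucket, and field names
>are made up for illustration, and the _s suffixes assume the default Yokozuna
>schema's dynamic string fields:
>
>    import riak
>
>    client = riak.RiakClient()
>
>    # Option 1: one contacts bucket attached to a single shared index.
>    client.create_search_index('contacts_idx')
>    contacts = client.bucket('contacts')
>    contacts.set_property('search_index', 'contacts_idx')
>    contacts.new('contact-1', data={
>        'list_key_s': 'list-42',            # links back to the owning list
>        'email_s': 'someone@example.com',
>    }).store()
>
>    # Option 2: a separate bucket and index per list.
>    list_key = 'list-42'
>    client.create_search_index('contacts_%s_idx' % list_key)
>    per_list = client.bucket('contacts_%s' % list_key)
>    per_list.set_property('search_index', 'contacts_%s_idx' % list_key)
>    per_list.new('contact-1', data={'email_s': 'someone@example.com'}).store()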
>
>With these two scenarios in mind, and assuming we're dealing with 100 million
>contacts:
>
>Which would be the better method of implementing the search indexes?
>
>If you have 100M contacts, giving each list its own index might be fine, but
>note that indexes have their own overhead in both Solr and Riak cluster
>metadata. I wouldn't go this route if the number of contacts_{listkey} buckets
>measures in the hundreds or thousands.
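>
>With the single shared index, scoping a query to one list is just a Solr
>filter query on the list_key field. Roughly, and untested, with the Python
>client; the index and field names are the ones from your example, and the _s
>fields assume the default schema:
>
>    import riak
>
>    client = riak.RiakClient()
>    results = client.fulltext_search(
>        'contacts_idx',                # the one shared index
>        '*:*',                         # match everything...
>        fq='list_key_s:"list-42"',     # ...then filter down to a single list
>        rows=100)
>    for doc in results['docs']:
>        print(doc['_yz_rk'])           # Riak key of each matching contact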
>
>At which point would one solution be far better than the other?
>
>If your cluster has 100M objects, note that a single Solr shard wouldn't hold
>100M objects. Instead, on, say, a 10-node cluster, and depending on your
>replication value, a single Solr node would hold around 30M.
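>(Assuming the default n_val of 3, that works out to 100M objects x 3 replicas
>/ 10 nodes = 30M Solr documents per node; the node count is just the
>illustrative figure above.)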
>
>How much does Yokozuna differ from stock-standard Solr? All the material I
>could find on Solr specifically didn't discuss indexes larger than 60,000
>objects, yet Riak is expected to handle hundreds of millions of rows.
>
>
>Solr can manage far more than 60k objects (I've run 10M on my laptop, 100M per
>shard is safe, and I hear the absolute upper limit is 2 billion unique terms
>per index segment, due to Lucene's implementation). I think you'll have to
>experiment with your use case and hardware, but you shouldn't have a problem.
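>
>If you want a quick first feel for it on your own hardware before committing
>to a layout, something like this rough, untested sketch is usually enough to
>start with; names are made up, and bear in mind Yokozuna indexes
>asynchronously, so give Solr a moment to soft-commit before timing queries:
>
>    import time
>    import riak
>
>    client = riak.RiakClient()
>    client.create_search_index('contacts_bench_idx')
>    bucket = client.bucket('contacts_bench')
>    bucket.set_property('search_index', 'contacts_bench_idx')
>
>    # Write a batch of synthetic contacts and watch write throughput.
>    start = time.time()
>    for i in range(10000):
>        bucket.new('contact-%d' % i, data={
>            'list_key_s': 'list-%d' % (i % 100),
>            'email_s': 'user%d@example.com' % i,
>        }).store()
>    print('indexed 10k contacts in %.1fs' % (time.time() - start))
>
>    time.sleep(5)  # allow Solr's soft commit to catch up before querying
>
>    # Time a per-list query against the index.
>    start = time.time()
>    results = client.fulltext_search('contacts_bench_idx',
>                                     'list_key_s:"list-7"', rows=10)
>    print('%d hits in %.3fs' % (results['num_found'], time.time() - start))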
>
>
>Any help at all with this is really appreciated.
>
>I do realise that at some point I will need to set this up for myself and
>perform my own tests on it. However, I was hoping that those currently using
>Riak in production might have some more insight into this.
>
>
>Regards,
>
>Geoff
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
