Sorry, worst typo ever. "Solr can't manage...60k objects" should be "Solr can manage...60k objects"
Eric

On Nov 22, 2014, at 6:13 AM, Eric Redmond <eredm...@basho.com> wrote:

> Geoff, comments inline.
>
> On Nov 13, 2014, at 3:13 PM, Geoff Garbers <ge...@totalsend.com> wrote:
>
>> Hi all.
>>
>> I've been looking around for a while for some sort of guidelines on how
>> best to structure search indexes within Riak 2.0, and have yet to come
>> up with anything that answers my questions.
>>
>> I came across https://github.com/basho/yokozuna/blob/develop/docs/ADMIN.md,
>> which talks about the one-to-one and many-to-one ways of indexing. It
>> mentions in passing the potential for lower query latency and more
>> efficient deletion of index data with the one-to-one method, without
>> saying much about when one method significantly outperforms the other.
>>
>> However, something I'm still not sure about is when it is considered a
>> good idea to use multiple indexes versus one massive index.
>>
>> If you'll bear with me, I'll use this simple scenario: I have lists, and
>> I have contacts within those lists. In total, I am dealing with 100
>> million contacts. Each is no more than 20KB in size, and all of them
>> follow the exact same JSON object structure. Ignoring application design
>> for simplicity's sake, let's say I could choose between the following
>> two ways of storing lists and contacts:
>>
>> 1. Two buckets: lists and contacts. All 100 million contacts are stored
>>    in the contacts bucket. Each contact object is linked to its
>>    corresponding list through a list_key property, and all the contacts
>>    are stored in the same single search index.
>>
>> 2. Multiple buckets: lists, plus a separate bucket contacts_{listkey}
>>    for each list. With this structure, each contacts_{listkey} bucket
>>    would have its own search index.
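As a concrete aside, the two layouts just described boil down to a naming and query-routing choice. Here is a minimal sketch; the bucket and index names and the helper functions are illustrative only, not any Riak API:

```python
# Sketch of the two index layouts described above.
# Names (contacts_idx, contacts_{listkey}) are illustrative only.

def single_index_target(list_key):
    """Design 1: one bucket + one index; queries filter on list_key."""
    bucket, index = "contacts", "contacts_idx"
    query = "list_key:%s" % list_key  # standard Solr field query
    return bucket, index, query

def per_list_target(list_key):
    """Design 2: one bucket and one index per list; no filter needed."""
    bucket = "contacts_%s" % list_key
    index = "contacts_%s_idx" % list_key
    query = "*:*"  # match everything in this list's own index
    return bucket, index, query

print(single_index_target("abc"))  # ('contacts', 'contacts_idx', 'list_key:abc')
print(per_list_target("abc"))      # ('contacts_abc', 'contacts_abc_idx', '*:*')
```

The trade-off in the rest of the thread follows from this: design 1 keeps one large index and pushes selection into the query, while design 2 multiplies the number of indexes the cluster must carry.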
>>
>> With these two scenarios in mind, and making the assumption that we're
>> dealing with 100 million contacts:
>>
>> Which would be the better method of implementing the search indexes?
>
> With 100M contacts, giving each list's contacts their own index might be
> fine, but note that indexes carry their own overhead in both Solr and
> Riak cluster metadata. I wouldn't go this route if the number of
> contacts_{listkey} buckets measures in the hundreds or thousands.
>
>> At which point would one solution be far better than the other?
>
> If your cluster has 100M objects, note that a single Solr shard wouldn't
> hold all 100M of them. Instead, if you had, say, a 10-node cluster, then
> depending on your replication value (with Riak's default n_val of 3), a
> single Solr node would hold about 30M.
>
>> How much does Yokozuna differ from stock-standard Solr? All the search
>> results I could find on Solr specifically weren't talking about indexes
>> greater than 60,000 objects, yet Riak is required to be able to deal
>> with hundreds of millions of rows.
>
> Solr can't manage far more than 60k objects (I've run 10M on my laptop,
> 100M per shard is safe, and I hear the tip-top limit is 2 billion unique
> terms per index segment, due to Lucene's implementation). I think you'll
> have to experiment with your use case and hardware, but you shouldn't
> have a problem.
>
>> Any help at all with this is really appreciated. At some point, I do
>> realise that I will need to set this up for myself and perform my own
>> tests on it. However, I was hoping that those currently using Riak in
>> production might have some more insight into this.
>>
>> Regards,
>> Geoff
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
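Eric's per-node arithmetic above (100M objects spread over a 10-node cluster) can be written out as a quick sketch. Assumption flagged: n_val=3 is Riak's default replication value and is what makes his 30M figure work out.

```python
def solr_docs_per_node(total_objects, n_val, nodes):
    """Rough Solr document count per node: every object is stored
    n_val times across the cluster, spread evenly over the nodes."""
    return total_objects * n_val // nodes

# Eric's example: 100M objects, n_val=3 (Riak's default), 10 nodes.
print(solr_docs_per_node(100_000_000, 3, 10))  # 30000000
```

This is back-of-the-envelope only; real per-node counts also depend on ring size and how partitions happen to land on nodes.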