Sorry, worst typo ever. "Solr can't manage...60k objects" should be "Solr can manage...60k objects"
Eric

On Nov 22, 2014, at 6:13 AM, Eric Redmond <eredm...@basho.com> wrote:

> Geoff, comments inline.
>
> On Nov 13, 2014, at 3:13 PM, Geoff Garbers <ge...@totalsend.com> wrote:
>
>> Hi all.
>>
>> I've been looking around for a while for some sort of guidelines on how
>> best to structure search indexes within Riak 2.0, and have yet to come
>> up with anything that answers my questions.
>>
>> I came across https://github.com/basho/yokozuna/blob/develop/docs/ADMIN.md,
>> which talks about the one-to-one and many-to-one ways of indexing. It
>> mentions in passing the potential for lower query latency and more
>> efficient deletion of index data with the one-to-one method, without
>> saying much about when one method significantly outperforms the other.
>>
>> However, something I'm still not sure about is when it is considered a
>> good idea to use multiple indexes versus one massive index.
>>
>> If you'll bear with me, I'll use this simple scenario: I have lists, and
>> I have contacts within those lists. In total, I am dealing with 100
>> million contacts. Each is no more than 20KB in size, and all of them
>> follow the exact same JSON object structure. Ignoring application design
>> for simplicity's sake, let's say I could choose between the following
>> two ways of storing lists and contacts:
>>
>> 1. Two buckets: lists and contacts. All 100 million contacts are stored
>>    in the contacts bucket. Each contact object is linked to its
>>    corresponding list through a list_key property, and all the contacts
>>    are stored in the same single search index.
>>
>> 2. Multiple buckets: lists, plus a separate bucket contacts_{listkey}
>>    for each list. With this structure, each contacts_{listkey} bucket
>>    would have its own search index.
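As a concrete aside, the two layouts just described boil down to a naming and query-routing choice. Here is a minimal sketch; the bucket and index names and the helper functions are illustrative only, not any Riak API:

```python
# Sketch of the two index layouts described above.
# Names (contacts_idx, contacts_{listkey}) are illustrative only.

def single_index_target(list_key):
    """Design 1: one bucket + one index; queries filter on list_key."""
    bucket, index = "contacts", "contacts_idx"
    query = "list_key:%s" % list_key  # standard Solr field query
    return bucket, index, query

def per_list_target(list_key):
    """Design 2: one bucket and one index per list; no filter needed."""
    bucket = "contacts_%s" % list_key
    index = "contacts_%s_idx" % list_key
    query = "*:*"  # match everything in this list's own index
    return bucket, index, query

print(single_index_target("abc"))  # ('contacts', 'contacts_idx', 'list_key:abc')
print(per_list_target("abc"))      # ('contacts_abc', 'contacts_abc_idx', '*:*')
```

The trade-off in the rest of the thread follows from this: design 1 keeps one large index and pushes selection into the query, while design 2 multiplies the number of indexes the cluster must carry.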
>>
>> With these two scenarios in mind, and making the assumption that we're
>> dealing with 100 million contacts:
>>
>> Which would be the better method of implementing the search indexes?
>
> With 100M contacts, giving each list's contacts their own index might be
> fine, but note that indexes carry their own overhead in both Solr and
> Riak cluster metadata. I wouldn't go this route if the number of
> contacts_{listkey} buckets measures in the hundreds or thousands.
>
>> At which point would one solution be far better than the other?
>
> If your cluster has 100M objects, note that a single Solr shard wouldn't
> hold all 100M of them. Instead, if you had, say, a 10-node cluster, then
> depending on your replication value (with Riak's default n_val of 3), a
> single Solr node would hold about 30M.
>
>> How much does Yokozuna differ from stock-standard Solr? All the search
>> results I could find on Solr specifically weren't talking about indexes
>> greater than 60,000 objects, yet Riak is required to be able to deal
>> with hundreds of millions of rows.
>
> Solr can't manage far more than 60k objects (I've run 10M on my laptop,
> 100M per shard is safe, and I hear the tip-top limit is 2 billion unique
> terms per index segment, due to Lucene's implementation). I think you'll
> have to experiment with your use case and hardware, but you shouldn't
> have a problem.
>
>> Any help at all with this is really appreciated. At some point, I do
>> realise that I will need to set this up for myself and perform my own
>> tests on it. However, I was hoping that those currently using Riak in
>> production might have some more insight into this.
>>
>> Regards,
>> Geoff
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
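Eric's per-node arithmetic above (100M objects spread over a 10-node cluster) can be written out as a quick sketch. Assumption flagged: n_val=3 is Riak's default replication value and is what makes his 30M figure work out.

```python
def solr_docs_per_node(total_objects, n_val, nodes):
    """Rough Solr document count per node: every object is stored
    n_val times across the cluster, spread evenly over the nodes."""
    return total_objects * n_val // nodes

# Eric's example: 100M objects, n_val=3 (Riak's default), 10 nodes.
print(solr_docs_per_node(100_000_000, 3, 10))  # 30000000
```

This is back-of-the-envelope only; real per-node counts also depend on ring size and how partitions happen to land on nodes.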