I think this is a reasonable question to ask, re the 1/N question during query time. This is currently an implementation detail of secondary indices in Riak. The document vs. term-based index partitioning seems to be a common debate. It almost feels like the emacs/vim flamewar of the distributed index field. A couple issues to take with the document-based approach is that since you must query 1/N of your nodes you a) reduce the throughput of concurrent queries across the cluster (because queries must contend for resources) and b) as N grows you increase your chance of hitting the tcp incast problem [1] which can cause major problems. A couple of issues with term-based are a) your data and it's index are no longer co-located which can be nice b) indexing a single document will often cause a write to all nodes.
To fight 1/N in document-based perhaps you could do a chained query approach where the query is sent around the ring to avoid the incast problem, but at the cost of higher latencies (although you could parallelize it to some set number X and skip nodes in the ring at the end having X nodes converging to the coordinator). To fight term-based's write to all nodes on index problem perhaps you could hash on something less granular like index/field or even just index but replicate to more nodes and play some tricks at the index level for better concurrent access. If this all seems hand wavy that's because it is. I also think it's perfectly reasonable to keep your index completely separate from the data itself. Look no further than a library for a real life example of this, the card catalog. Yes, you have potential consistency problems, you end up having special nodes in the system (e.g. special nodes dedicated to indexing), multiple hops to get from index lookup to object retrieval, but at the end of the day these are the tradeoffs you must weigh in light of your system. These are questions I continue to ponder myself and it's my belief that Riak will continue to get stronger at querying your data. However, sometimes you may also need to use another solution to store your index that works alongside Riak and I also think that is a perfectly reasonable choice as long as you understand the system you're building and the tradeoffs you are making. I think in the future it would even be good for Riak to have integration points with other solutions to make stuff like this easier. Please voice your opinions on the mailing list if you have them. I would love to hear them. [1]: http://www.snookles.com/slf-blog/2012/01/05/tcp-incast-what-is-it/ -Ryan On Wed, Jan 25, 2012 at 10:12 AM, Roberto Calero <roberto_cal...@hotmail.com > wrote: > > > ------------------------------ > From: jeremiah.pesc...@gmail.com > Date: Wed, 25 Jan 2012 06:48:45 -0800 > Subject: Re: Should Riak have used dedicated nodes for secondary indices? > To: runar.jord...@gmail.com > CC: riak-users@lists.basho.com > > Good news! Riak doesn't use sharding. > > Data locality is critical in a distributed system. When you create an > index, your structure looks something like: > > indexed_value:record_id > > Reading from an index requires locating indexed_value, finding all > matching values, and then retrieving all matching record_ids. By keeping > index data on the same node as the source data, Riak avoids having to > remote the query to retrieve object data. This is a Good Thing. The network > is slow and unreliable. Just ask an Australian. > > Riak's approach is intended to provide a uniform system where you can > treat any node equally. The idea that there should be an unsharded index > node is a bit ludicrous. Let's say you have 1TB of raw data. Your indexes > are pretty light and are only about 20% of your data size. This means that > you need 200GB of good storage (not some cheap $150 SATA HDD you found on > NewEgg). 200GB of RAID 10 SAS storage isn't that pricey to put in a single > unsharded machine. Over time as your data grows and your indexing changes, > you may have 10TB and your index size is ~40% of your data. Your unsharded > index server now has to have 4TB of fast, reliable storage. And, since this > is an unsharded system, you'll want multiple replicas of your unsharded > index server to make sure that a hardware hiccup doesn't take down your > ability to perform fast lookups. Besides - a single indexing server becomes > a single bottleneck and a single point of failure in your system. > > Most people using Lucene as their indexing store are sharding Lucene. From > an anecdotal standpoint, about 70% of the people I've talked to using > Lucene are getting to the point of sharding their replicated Lucene indexes. > > I'm not saying that either approach is good or bad; just remember that > every solution has drawbacks. > --- > Jeremiah Peschka, SQL Server MVP > Managing Director, Brent Ozar PLF, LLC > > > On Wed, Jan 25, 2012 at 5:15 AM, Runar Jordahl <runar.jord...@gmail.com>wrote: > > Siddharth Anand, says that secondary indices (for a key-value store) > best is placed on a separate node, avoiding the need to look up 1 / N > nodes during a query: > > "Systems that shard data based on a primary key will do well when > routed by that key. When routed by a secondary key, the system will > need to “spray” a query across all shards. If one of the shards is > experiencing high latency, the system will return either no results or > incomplete (i.e. inconsistent) results. For this reason, it would make > sense to store the secondary index on an unsharded (but replicated) > system." > > http://highscalability.com/blog/2012/1/24/the-state-of-nosql-in-2012.html > > If I understand Riak correctly, it takes the opposite approach, > storing secondary indices together with the data. > > To me at appears like Riak’s approach gives a more uniform system, > with all nodes having the same responsibilities. Does anyone else have > any thoughts on this? > > Kind regards > Runar Jordahl > blog.epigent.com > > _______________________________________________ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > > _______________________________________________ riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > _______________________________________________ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > >
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com