Hey, I'm as up for a good and clever hack as anybody. But the question is: just because you can, should you? Who will maintain your hack after you're dead? I'm still maintaining crap I wrote years ago, and even though I'm paid for it, sometimes I would rather not have the headache. Why would you use a product that specifically does not support such hackery? Scaling Postgres or Mongo is a known and solvable problem, especially for bounded data sets like, say, all the points on a globe. Now, if you were storing checkins, that would be a different problem, and one suitable for, say, Riak.
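To put a finer point on the "solved problem" claim: in Postgres with the PostGIS extension, a radius query over points on the globe is a few lines, and the spatial index does the heavy lifting. A minimal sketch in Python; the table, column, and connection details are made up:

# Sketch only: assumes PostGIS is installed and a hypothetical
# "points" table with a geography column named geom.
import psycopg2

conn = psycopg2.connect("dbname=geo")  # made-up connection string
cur = conn.cursor()

# Everything within 5 km of a lon/lat point. ST_DWithin on a geography
# column uses the GiST spatial index, so this stays fast at scale.
cur.execute(
    """
    SELECT id, name
      FROM points
     WHERE ST_DWithin(
               geom,
               ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
               %s)  -- metres
    """,
    (-79.38, 43.65, 5000),
)
print(cur.fetchall())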
On Tue, May 1, 2012 at 14:09, Mark Rose <markr...@markrose.ca> wrote:

> Well, I'd be indexing items over the entire globe, looking at resolutions
> from a whole-world view down to city block. I'm thinking of using geohashes
> as an index to restrict the result set, then further filtering and sorting
> by mapreducing the remaining items. So I only need enough granularity to
> reduce the number of items to a reasonable amount. At the world-view level
> I'd filter out most results using mapreduce, but the local-level queries
> would be far more common, so an index would be highly advantageous. The
> geometry I'd want to query is a window that arbitrarily overlaps one or
> more geohash regions. Basically, think plotting items in, say, Google Maps.
>
> Can you use a secondary index inside mapreduce? I haven't seen any examples
> of that, only a secondary index being used to feed a mapreduce. I am new to
> Riak.
>
> I imagine my number of points would be at most 100 items per square km, but
> typically less than 1 per square km. A 1 km resolution would be sufficient;
> a 32-bit geohash would cover that fine. Vast regions of the Earth would
> contain no points at all.
>
> -Mark
>
> On Tue, May 1, 2012 at 1:16 PM, Sean Cribbs <s...@basho.com> wrote:
>
>> In contrast to Alexander's assessment, I'd say "it depends". I have built
>> some geospatial indexes on top of Riak using a geohashing scheme based on
>> the Hilbert space-filling curve. However, I had to choose specific levels
>> of "zoom" and precompute them. Now that we have secondary indexes, you
>> could perhaps bypass the precomputation step. In general, if you know the
>> geometry of the space you want to query, you can fairly trivially compute
>> the names of the geohashes you need to look up, and then either fetch
>> individual keys for those (if you precomputed them) or use MapReduce to
>> fetch a range of them. It's not automatic, for sure, but the greatest
>> complexity will be in deciding which granularities of index to support.
>>
>> On Tue, May 1, 2012 at 12:44 PM, Alexander Sicular <sicul...@gmail.com> wrote:
>>
>>> My advice is to not use Riak. Check mongo or Postgres.
>>>
>>> @siculars on twitter
>>> http://siculars.posterous.com
>>>
>>> Sent from my iRotaryPhone
>>>
>>> On May 1, 2012, at 9:18, Mark Rose <markr...@markrose.ca> wrote:
>>>
>>> > Hello everyone!
>>> >
>>> > I'm going to be implementing Riak as a storage engine for geographic
>>> > data. Research has led me to geohashing as a useful way to filter out
>>> > results outside a region of interest. However, I've run into some
>>> > stumbling blocks and I'm looking for advice on the best way to proceed.
>>> >
>>> > Querying efficiently by geohash involves querying several regions
>>> > around a point. From what I can tell, Riak offers no way to query a
>>> > secondary index with multiple ranges. Having to query several ranges,
>>> > merge them in the application layer, then pass them off to mapreduce
>>> > seems rather silly (and could mean passing GBs of data). Alternatively,
>>> > I could start straight with mapreduce, but key filtering seems to work
>>> > only with the primary key, which would force me into using the geohashed
>>> > location as the primary key (and that would lead to collisions if two
>>> > things existed at the same point). I'd also like to avoid using the
>>> > geohash as the primary key because if an item moves I'd have to change
>>> > all the references to it. Lastly, I could do a less efficient mapreduce
>>> > over a less precise geohash, but that doesn't solve the issue of the
>>> > equator (anything near the equator would require mapreducing the entire
>>> > dataset).
>>> >
>>> > Is there any way to query multiple ranges with a secondary index and
>>> > pass that off to mapreduce? Or should I just stick with the less
>>> > efficient mapreduce and, when near the equator, run two queries and
>>> > merge them afterwards? Or am I going about this the wrong way?
>>> >
>>> > In any case, the final stage of my queries will involve mapreduce, as
>>> > I'll need to further filter the items found in a region.
>>> >
>>> > Thank you,
>>> > Mark
>>
>> --
>> Sean Cribbs <s...@basho.com>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
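P.S. For anyone who wants to see what Sean's "compute the names of the geohashes you need to look up" step actually looks like, here's a rough sketch in Python. The encoder is the standard geohash algorithm; the precision values are arbitrary, and the corners-only cover is a simplification that only works for windows no wider than about one cell (a real cover would enumerate the cells in between too):

# Standard geohash encoder plus a naive "which cells does my window
# touch" helper. Hand-rolled sketch, not any particular library's API.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, length=6):
    """Interleave lon/lat bisection bits; base32-encode 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even, bit_count, ch, code = True, 0, 0, []
    while len(code) < length:
        if even:  # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:     # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            code.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(code)

def window_cells(lat_min, lon_min, lat_max, lon_max, length=4):
    """Geohash cells containing the window's corners, deduplicated.
    These are the index prefixes to fetch (keys, 2i ranges, or m/r)."""
    corners = [(lat_min, lon_min), (lat_min, lon_max),
               (lat_max, lon_min), (lat_max, lon_max)]
    return sorted({geohash(la, lo, length) for la, lo in corners})

# e.g. a window over downtown Toronto:
# print(window_cells(43.63, -79.42, 43.68, -79.35, length=5))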
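And on Mark's question about secondary indexes and mapreduce: if memory serves, recent Riak releases let a 2i range be the *input* to a MapReduce job, though it's one range per job, so multiple ranges still mean multiple jobs merged client-side. A minimal sketch over the HTTP /mapred endpoint; the bucket and index names here are made up, and Riak.mapValuesJson is one of the built-in JavaScript map functions:

# Sketch, assuming a Riak version where a 2i range can feed MapReduce.
import json
from urllib import request

def mapred_over_geohash_range(start, end):
    job = {
        "inputs": {
            "bucket": "points",      # hypothetical bucket
            "index": "geohash_bin",  # hypothetical 2i field
            "start": start,
            "end": end,
        },
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson"}}
        ],
    }
    req = request.Request(
        "http://127.0.0.1:8098/mapred",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# One call per geohash prefix from window_cells(); merge client-side:
# results = [r for cell in cells
#              for r in mapred_over_geohash_range(cell, cell + "~")]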
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com