Re: Map Reduce Requirements

Jeremiah Peschka Tue, 23 Aug 2011 07:01:46 -0700

On Aug 22, 2011, at 8:50 PM, bill robertson wrote:

> I wonder if it would be feasible to deploy an erlang web-service in the riak 
> node's webmachine instance that could translate meta-data into Erlang funs 
> and drive the map reduce operation that way. I'm not sure if I could get 
> around having specific knowledge of the protobuf structures baked into that 
> code, but I don't think it matters in this case.
> 
> I also wonder how much 1.0 will change this picture.
> 
> > Additionally, are secondary indexes meta-data?  i.e. If I built some 
> > secondary indices, these are stored in some form internal to Riak, and 
> > therefore available for query regardless of the type of data they're 
> > associated with. Is this correct?
> 
> Secondary indexes are a separate physical structure, or so I gather. (Rusty 
> could be full of lies.) They're stored separately from the initial data and 
> not as metadata in the object headers. So, yes, you can store whatever you 
> want in secondary indexes and query it however you want, provided there's an 
> API that supports what you're doing.
> 
> Would secondary indexes eliminate the need for key-filtering? Logically, it 
> would seem that you could make do with indexes, but do they have similar 
> performance characteristics?  (i.e. does one suck more than the other?)


Key filters will always perform a list-keys operation, meaning that they result 
in an in-memory scan of all keys in the key space. 
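
For reference, a key-filtered MapReduce job looks roughly like the sketch 
below. The bucket name and key layout (EXCHANGE:SYMBOL:YYYYMMDD) are 
assumptions borrowed from the trades example further down, and the phase list 
is only illustrative. The filter is applied to every key that list-keys 
returns, which is where the scan cost comes from.

```python
import json

# Hypothetical bucket and key layout: trades / EXCHANGE:SYMBOL:YYYYMMDD.
# "tokenize" splits each key on ":" and keeps the 2nd token (1-based);
# "eq" then keeps only the keys whose token equals "F".
job = {
    "inputs": {
        "bucket": "trades",
        "key_filters": [["tokenize", ":", 2], ["eq", "F"]],
    },
    # Illustrative single map phase using Riak's built-in JavaScript helper.
    "query": [
        {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
    ],
}

# The JSON body that would be POSTed to the node's /mapred endpoint.
body = json.dumps(job)
print(body)
```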

Not knowing entirely how indexes are implemented internally (reading the source 
is on my to-do list), I can only guess from my experience with other databases 
how this would work. Indexes generally work best when you have a low search 
cardinality, that is, when you're seeking only a few records from the index. As 
long as you can structure secondary indexes to answer the questions you're 
asking, indexes make it easy to perform fast queries. 

The difference depends on your storage mechanism. With bitcask, all keys are in 
memory, so that list-keys scan only happens between RAM and CPU and isn't THAT 
expensive an operation. If indexes are not a memory-resident structure, then a 
scan of an index (when you're doing a search that's some kind of substring or 
ends-with operation) will be painfully slow, much like when you have to perform 
a table scan in an RDBMS.
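
To make the seek-versus-scan distinction concrete, here is a toy sketch in 
plain Python. It is not how Riak implements either feature; the key list, the 
helper names, and the dict-based "index" are all made up for illustration. An 
ends-with search has to touch every key, while an exact-match lookup on an 
indexed field goes straight to the matching entries.

```python
from collections import defaultdict

# Made-up keys in an EXCHANGE:SYMBOL:YYYYMMDD layout.
keys = [
    "NYSE:F:20110822",
    "NYSE:F:20110819",
    "NYSE:GM:20110822",
    "LSE:BP:20110822",
]

def scan_ends_with(all_keys, suffix):
    """Ends-with search: every key must be examined (a full scan)."""
    return [k for k in all_keys if k.endswith(suffix)]

def build_symbol_index(all_keys):
    """Toy 'secondary index': maps the symbol token to the keys holding it."""
    index = defaultdict(list)
    for k in all_keys:
        _exchange, symbol, _day = k.split(":")
        index[symbol].append(k)
    return index

index = build_symbol_index(keys)

matches_scan = scan_ends_with(keys, ":20110822")  # touches all four keys
matches_seek = index["F"]                         # one lookup, no scan

print(matches_scan)
print(matches_seek)
```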

The upside of key filtering, and composite key names in general, is that you 
can create meaningful keys that you can assemble on the fly. For example, to 
get yesterday's trades of Ford stock on the NYSE (assuming you have a trades 
bucket), you could get at yesterday's trading history through something like 
http://my_riak_server:8091/riak/trades/NYSE:F:20110822 . Being able to perform 
ad hoc seeks like that is really powerful.
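
Assembling such a key on the fly could look something like the sketch below. 
The helper names are hypothetical, and the host, port, and bucket are simply 
the ones from the example URL above.

```python
from datetime import date

def trade_key(exchange: str, symbol: str, day: date) -> str:
    """Build a composite key such as 'NYSE:F:20110822'."""
    return ":".join([exchange, symbol, day.strftime("%Y%m%d")])

def trade_url(base: str, bucket: str, key: str) -> str:
    """Build the HTTP URL for a direct fetch of one key from one bucket."""
    return f"{base}/riak/{bucket}/{key}"

key = trade_key("NYSE", "F", date(2011, 8, 22))
url = trade_url("http://my_riak_server:8091", "trades", key)

print(key)
print(url)
```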

TL;DR - key filters and secondary indexes serve different purposes.

> 
> Thanks again,
> Bill Robertson


---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
