Re: The suitability of MapReduce

Guido Medina Tue, 09 Apr 2013 03:00:43 -0700

Rohman,

It is more complicated than that. Most big data systems use more than one DB engine (including Facebook, which uses something like 5 different engines). For example (and we are not as big as Facebook), we use a relational SQL database, a text search engine and Riak. You have to balance each tool's weakness with a different tool and use each tool for what it does best. In the case of Riak:


 * JSON storage where you know your keys (and it is easy for you to
   fetch keys concurrently)
 * If you need to "reduce", let's say, find 100 keys out of a million
   and then "programmatically" reduce those 100 to 25, you can enable 2i.
 * If you need a sophisticated search, you could hook into Yokozuna,
   which uses Solr (we use Solr separately)
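
As a rough illustration of the first point, here is a minimal Python sketch of fetching a known set of keys concurrently instead of running a MapReduce job. It assumes a Riak node reachable over HTTP at localhost:8098; the names `RIAK_URL`, `http_fetch` and `fetch_many` are made up for this example and are not part of any Riak client library.

```python
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.parse
import urllib.request

RIAK_URL = "http://localhost:8098"  # assumption: local node on the default HTTP port


def http_fetch(bucket, key):
    """Fetch one JSON object through Riak's HTTP interface."""
    url = "{}/buckets/{}/keys/{}".format(
        RIAK_URL,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_many(bucket, keys, fetch=http_fetch, workers=8):
    """Fetch the given keys concurrently; results come back in the order of `keys`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda key: fetch(bucket, key), keys))
```

The `fetch` argument is injectable so the concurrency logic can be tried without a running cluster, for example `fetch_many("users", ["a", "b"], fetch=lambda bucket, key: {"id": key})`.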

I would say there is no ideal solution: you use each tool for what it does best and counter its weaknesses with something else.


Hope that helps,

Guido.

On 09/04/13 10:42, Antonio Rohman Fernandez wrote:

But... then... I wonder how to do the following task, as I assumed MR would be the right thing to do:

- Imagine Facebook's "news feed", which every so often recompiles the statuses, photos, comments, likes, etc. of all your contacts.

Shouldn't this be done by MR? And if so... shouldn't the user be able to execute it on demand if they want to refresh the news feed (or at least have it refreshed in the background every X minutes), and be able to GET the refreshed, compiled data?


Merci,
Rohman

On 09.04.2013 01:26, Matt Black wrote:

I think a short and explicit discussion of using sequential GETs would be good to add to the docs in [1]. It'll be helpful to put the alternate option in the reader's head so they can evaluate it as they're going through the article.

Cheers
Matt

On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:


    I want to follow up on the recent "Map phase timeout" thread [2].
    In part out of curiosity and in part as a documentation clean
    up... Should the documentation at [1] be changed? Specifically,
    the docs say MR should be used:

      * *When you know the set of objects you want to MapReduce over
        (the bucket-key pairs)* (emphasis added)
      * When you want to return actual objects or pieces of the
        object -- not just the keys, as do Search & Secondary Indexes
      * When you need utmost flexibility in querying your data.
        MapReduce gives you full access to your object and lets you
        pick it apart any way you want.
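
To make the quoted criteria concrete, here is a minimal in-memory Python sketch of the map and reduce phases over an already-known set of objects. It only illustrates the paradigm and is not Riak's MapReduce API; the `status` field and the function names are assumptions chosen for the example.

```python
from collections import Counter
from functools import reduce


def map_phase(obj):
    """Map: pick one piece out of the full object and emit a partial count."""
    return Counter({obj["status"]: 1})


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def count_by_status(objects):
    """Run the map phase over every object, then fold the partial results."""
    return reduce(reduce_phase, map(map_phase, objects), Counter())
```

Calling `count_by_status` on a list of order dictionaries returns a `Counter` keyed by status value.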

    It seems to me that a lot of discussions around MR in Riak come
    down to "You're close but this isn't the best use case of
    MapReduce in Riak." Would it be better, for the purposes of a
    general discussion, to say that MapReduce is the appropriate
    paradigm when you want to:

      * manipulate a large amount of data inside the Riak cluster in
        bulk - e.g. read all of my sales orders and where the version
        is 1, perform the changes necessary to update the order
        format to version 2.
      * burn a lot of I/O and make your admin sad
      * move data from one bucket to another
      * re-write an entire bucket so all data is indexed for 2i,
        search, etc
      * Anything where the query can be resumed with no knowledge of
        state at the time the last run of the query failed.
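
The first and last bullets can be sketched together: a bulk rewrite that is safe to rerun from the start after a failure because each step is idempotent. This is a minimal Python sketch against a plain dictionary standing in for a bucket; the order fields (`version`, `total`, `totals`) and the function names are hypothetical.

```python
def upgrade_order(order):
    """Return a version-2 copy of a version-1 order; any other input is returned as-is."""
    if order.get("version") != 1:
        return order
    upgraded = dict(order)
    upgraded["version"] = 2
    # Hypothetical format change: v1 has a flat "total", v2 nests it under "totals".
    upgraded["totals"] = {"gross": upgraded.pop("total")}
    return upgraded


def migrate(store, keys):
    """Upgrade every listed key in place and return how many objects changed.

    No progress state is kept: after a crash the whole run can simply be
    repeated, because already-upgraded orders are left untouched.
    """
    changed = 0
    for key in keys:
        before = store[key]
        after = upgrade_order(before)
        if after is not before:
            store[key] = after
            changed += 1
    return changed
```

Running `migrate` a second time over the same keys changes nothing, which is the property the last bullet asks for.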

    Are there other use cases when MR is the better approach?
    [1]: http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
    [2]: http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results

    ---
    Jeremiah Peschka - Founder, Brent Ozar Unlimited
    MCITP: SQL Server 2008, MVP
    Cloudera Certified Developer for Apache Hadoop

    _______________________________________________
    riak-users mailing list
    riak-users@lists.basho.com
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




--
Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
roh...@mahalostudio.com

Projects:
MaruBatsu.es <http://marubatsu.es>
PupCloud.com <http://pupcloud.com>
Wedding Album <http://wedding.mahalostudio.com>
