On 13/07/2015 20:20, Carlo Miron wrote:
<http://marcio.io/2015/07/cheap-mapreduce-in-go/>

tl;dr

Sometimes you don’t need overly complex infrastructures or systems to do a job
well. In this case, we were running these exact same aggregations over close to
20 EMR instances that would take a few minutes to execute the entire MapReduce
job over hundreds of Gigabytes of data each day.

When we decided to take a look at this problem again, we rewrote this task using
Go, and we now simply run this on a single 8-core machine and the whole daily
execution takes about 10 minutes. We cut a lot of the costs associated with
maintaining and running these EMR systems and we just schedule this Go app to
run once a day over our daily dataset.

You can find the entire code here:
https://gist.github.com/mcastilho/e051898d129b44e2f502

A while back something similar came out, built around a rather touching pipeline:
http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html

--
Riccardo Magliocchetti
@rmistaken

http://menodizero.it
_______________________________________________
Python mailing list
Python@lists.python.it
http://lists.python.it/mailman/listinfo/python
