- After loading large RDDs that occupy more than 60-70% of total cluster memory, (k, v) operations such as finding uniques/distinct, groupByKey, and set operations will be network bound.
- A multi-stage map-reduce DAG should be a good test. When we tried this for Hadoop, we used examples from genomics. Has anyone tried BLAST with Spark?
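The shuffle-heavy workloads above can be sketched as a minimal benchmark job. This is only a sketch under assumptions: the input path and key scheme are placeholders, and it targets the Spark Scala API of that era. The idea is that groupByKey and distinct each force a full shuffle, so with an input sized at 60-70% of cluster memory the stage time should be dominated by network transfer:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a network-bound benchmark: both actions below trigger a
// full shuffle of the dataset across the cluster.
object ShuffleBench {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shuffle-bench"))

    // Hypothetical input path -- replace with a dataset sized at
    // roughly 60-70% of total cluster memory.
    val pairs = sc.textFile("hdfs:///bench/input")
      .map(line => (line.hashCode % 10000, line))

    val t0 = System.nanoTime()
    val groups  = pairs.groupByKey().count()      // shuffles every value to its key's partition
    val uniques = pairs.values.distinct().count() // shuffles to deduplicate values
    val secs = (System.nanoTime() - t0) / 1e9
    println(s"groups=$groups uniques=$uniques time=${secs}s")

    sc.stop()
  }
}
```

Running this before and after the 1g-to-10g upgrade, and comparing shuffle read/write times in the Spark UI for the same stages, should isolate the network's contribution.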
Cheers
<k/>

On Fri, Jun 27, 2014 at 5:07 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> We are going to upgrade our cluster from 1g to 10g ethernet. I'd like
> to run some benchmarks before and after the upgrade. Can anyone
> suggest a few typical Spark workloads that are network-bound?