That seems reasonable, but it is pretty unfair to the HPC setup, in which
the master reads all the data. Basically, you can make HPC look arbitrarily
worse just by adding more nodes to Spark.
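
To make that concrete, here is a rough back-of-envelope model (the
throughput numbers below are made-up assumptions, just to show the shape of
the argument, not measurements from either system):

    # Toy model: serial master read + parallel processing (the HPC setup as
    # described) vs. a locality-aware setup where the read is also parallel.
    DATA_GB = 10
    READ_GBPS = 0.2              # assumed single-node sequential read bandwidth
    PROC_GBPS_PER_NODE = 0.5     # assumed per-node word-count throughput

    def master_reads_everything(nodes):
        # Master reads the whole file serially, then workers process in parallel.
        return DATA_GB / READ_GBPS + DATA_GB / (PROC_GBPS_PER_NODE * nodes)

    def parallel_read(nodes):
        # Both the read and the processing scale out with the node count.
        return DATA_GB / (READ_GBPS * nodes) + DATA_GB / (PROC_GBPS_PER_NODE * nodes)

    for n in (1, 4, 16, 64):
        print(n, master_reads_everything(n) / parallel_read(n))

The serial read term never shrinks, so the ratio between the two keeps
growing as you add nodes; past a certain scale the benchmark is really
measuring where the data starts out, not the frameworks themselves.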

On Monday, February 29, 2016, yasincelik <yasinceli...@gmail.com> wrote:

> Hello,
>
> I am working on a project as part of my research. The system I am working
> on is basically an in-memory computing system, and I want to compare its
> performance with Spark. Here is how I conduct the experiments. For my
> project, I have a software-defined network (SDN) that allows HPC
> applications to share data, such as by sending and receiving messages
> through this network. For example, in a word count application, a master
> reads a 10GB text file from the hard drive, slices it into small chunks,
> and distributes the chunks. Each worker fetches some chunks, processes
> them, and sends the results back to the SDN. Then the master collects the
> results.
>
> To compare with Spark, I run a word count application. I run Spark in
> standalone mode and do not use any cluster manager. There is no
> pre-installed HDFS. I use PBS to reserve nodes, which gives me a list of
> nodes, and then I simply run Spark on those nodes. Here is the command to
> run Spark:
> ~/SPARK/bin/spark-submit --class word.JavaWordCount  --num-executors 1
> spark.jar ~/data.txt  > ~/wc
>
> Technically, these experiments are run under the same conditions: read the
> file, cut it into small chunks, distribute the chunks, process them, and
> collect the results.
> Do you think this is a reasonable comparison? Or can someone make this
> claim: "Well, Spark is designed to work on top of HDFS, in which the data
> is already stored on the nodes, and Spark jobs are submitted to these
> nodes to take advantage of data locality"?
>
>
> Any comment is appreciated.
>
> Thanks
>
