Hi xiefeng,

SparkContext initialization takes some time, and Spark does not really
shine for small-data computations:
http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html

But when working with terabytes (or even petabytes) of data, those ~35
seconds of initialization don't really matter.
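
For a sense of scale, below is a minimal, self-contained sketch (not from
this thread; the class name is made up, and it assumes a local[*] master
with Spark on the classpath) that times repeated trivial actions on a tiny
cached RDD. Each rdd.first() call is scheduled as a full Spark job, so the
fixed scheduling overhead, not computation, is what bounds the throughput
seen in JMeter:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class OverheadSketch {
    public static void main(String[] args) {
        // Local mode just for illustration; a standalone master behaves
        // similarly, with extra network hops between driver and executors.
        SparkConf conf = new SparkConf()
                .setAppName("overhead-sketch")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c")).cache();
        rdd.count(); // materialize the cache once, as in your initSimpleRDDs()

        int jobs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < jobs; i++) {
            rdd.first(); // every call here is scheduled as a separate Spark job
        }
        double avgMs = (System.nanoTime() - start) / 1e6 / jobs;
        System.out.println("avg per-job latency: " + avgMs + " ms");

        sc.stop();
    }
}

If the measured per-job latency comes out on the order of a few tens of
milliseconds, that alone caps throughput at roughly the ~35 requests/sec
you observed, which is also why adding JMeter threads or caching the RDD
barely changes the numbers.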

Regards,

-- 
  Bedrytski Aliaksandr
  sp...@bedryt.ski

On Wed, Aug 31, 2016, at 11:45, xiefeng wrote:
> I installed Spark standalone and run the cluster (one master and one
> worker) on a Windows 2008 server with 16 cores and 24 GB of memory.
> 
> I have done a simple test: just create a String RDD and return it. I
> use JMeter to measure throughput, but the highest is around 35
> requests/sec. I understand that Spark is powerful at distributed
> computation, but why is the throughput so limited in such a simple test
> scenario, which involves only task dispatch and no real calculation?
> 
> 1. In JMeter I tested with both 10 and 100 threads; the difference is
> small, around 2-3 requests/sec.
> 2. I tested with the RDD both cached and not cached; there is little
> difference.
> 3. During the test, CPU and memory usage stay low.
> 
> Below is my test code:
> @RestController
> public class SimpleTest {
>     @RequestMapping(value = "/SimpleTest", method = RequestMethod.GET)
>     @ResponseBody
>     public String testProcessTransaction() {
>         return SparkShardTest.simpleRDDTest();
>     }
> }
> 
> // In SparkShardTest (JavaSC and data are initialized elsewhere):
> final static Map<String, JavaRDD<String>> simpleRDDs = initSimpleRDDs();
> 
> public static Map<String, JavaRDD<String>> initSimpleRDDs()
> {
>     Map<String, JavaRDD<String>> result =
>             new ConcurrentHashMap<String, JavaRDD<String>>();
>     JavaRDD<String> rddData = JavaSC.parallelize(data);
>     rddData.cache().count();    // this cache improves throughput by 1-2/sec
>     result.put("MyRDD", rddData);
>     return result;
> }
> 
> public static String simpleRDDTest()
> {
>     JavaRDD<String> rddData = simpleRDDs.get("MyRDD");
>     return rddData.first();
> }
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
