How did you exclude it?
I am not sure that is possible, since each task needs to contain its
chunk of data.
> On Jun 24, 2015, at 6:07 PM, xing wrote:
>
> When we compare the performance, we already excluded this part of time
> difference.
>
>
>
> --
> View this message in context:
> http://a
When we compare the performance, we already excluded this part of the time
difference.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871p12873.html
Sent from the Apache Spark Developers List mailing list archive
If you read the file chunk by chunk and then use parallelize, the file is
read by a single thread on a single machine.
On Wednesday, June 24, 2015, xing wrote:
> We have a large file and we used to read chunks and then use parallelize
> method (distData = sc.parallelize(chunk)) and then do the map/reduce