I'm puzzled by the following results I got from executing an application that just generates data and writes it to HDFS. The 16 tasks that ran for that app look like this:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26345/Screen_Shot_2016-02-26_at_14.png>

So 4 tasks wrote 2.9GB to the output. But if I check in HDFS, I actually have 16 files of 2.9GB each. How come?

abrandon@granduc-13:/opt/hadoop/bin$ ./hdfs dfs -ls -h kMeans
16/02/26 13:44:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 17 items
-rw-r--r--   3 abrandon supergroup      0 2016-02-26 13:40 kMeans/_SUCCESS
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00000
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:38 kMeans/part-00001
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:38 kMeans/part-00002
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:38 kMeans/part-00003
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00004
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00005
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00006
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00007
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00008
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00009
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00010
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00011
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00012
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00013
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:39 kMeans/part-00014
-rw-r--r--   3 abrandon supergroup  2.9 G 2016-02-26 13:40 kMeans/part-00015
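As a sanity check on the listing, here is a minimal Python sketch (not part of the application) that totals the part-file sizes from `hdfs dfs -ls -h` text so the sum can be compared with what the web UI reports. The names `total_part_bytes`, `LINE`, and `UNITS` are made up for illustration, and the sketch assumes the `-h` suffixes are binary units (K = 1024). Because `-h` rounds each size, the total is only approximate.

```python
import re

# Assumption: `-h` sizes use binary units (K = 1024, M = 1024**2, ...).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches one entry: perms, replication, owner, group, size [unit], date, time, path.
LINE = re.compile(
    r"^\S+\s+\d+\s+\S+\s+\S+\s+([\d.]+)(?:\s+([KMGT]))?"
    r"\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}\s+(\S+)$"
)


def total_part_bytes(listing):
    """Return (number of part files, approximate total bytes) for a listing."""
    count, total = 0, 0.0
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip warnings, "Found N items", blank lines
        size, unit, path = match.groups()
        if "/part-" not in path:
            continue  # skip markers such as _SUCCESS
        total += float(size) * UNITS[unit or ""]
        count += 1
    return count, total


if __name__ == "__main__":
    # Three entries copied from the listing above, for brevity.
    sample = """Found 17 items
-rw-r--r-- 3 abrandon supergroup 0 2016-02-26 13:40 kMeans/_SUCCESS
-rw-r--r-- 3 abrandon supergroup 2.9 G 2016-02-26 13:39 kMeans/part-00000
-rw-r--r-- 3 abrandon supergroup 2.9 G 2016-02-26 13:38 kMeans/part-00001"""
    n, size = total_part_bytes(sample)
    print(f"{n} part files, about {size / 1024**3:.1f} G in total")
```

Fed the full listing, this gives a single figure for the whole output directory, which is easier to set against the per-task and per-stage output sizes in the UI than 16 separate entries.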