Hi, everyone!
I consider flatmap as a narrow dependency , but why it has shuffle?
as shown on the web UI:
my code is as below :
val transferRDD = sc.textFile("hdfs://host:port/path")
val rdd = transferRDD.map(line => {
val trunks = line.split("\t")
if(trunks.length == 32){
(trunks(11), trunks(13),
Try(java.lang.Long.parseLong(trunks(9))).getOrElse(0l), trunks(14), trunks(19))
}
}).filter(arg =>arg != ()).map(arg =>
arg.asInstanceOf[(String, String, Long, String, String)]).filter(arg => arg._3
!= 0)val flatMappedRDD = rdd.flatMap(arg => List((arg._1, (arg._2, arg._3, 1)),
(arg._2, (arg._1, arg._3, 0))))
Thank for your help!
qinwei