Hey guys, I am not following why this happens.

DATASET
=======
Tab-separated values (164 columns)

Spark command 1
===============
val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")

val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  (tokens(23), tokens(7))
})

mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[(String, String)] = MappedRDD[18] at map at <console>:14
Spark command 2
===============
val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")

val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  if (tokens.length == 164 && tokens(23) != null) {
    (tokens(23), tokens(7))
  }
})

mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[Any] = MappedRDD[19] at map at <console>:14

In the second case, why does it say org.apache.spark.rdd.RDD[Any] and not org.apache.spark.rdd.RDD[(String, String)]?

Thanks,
Sanjay
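
P.S. For reference, a minimal sketch of what I think is the same behavior, reduced to a plain Scala REPL with no Spark involved (the strings here are made up for illustration):

scala> // if with no else: the REPL infers Any, not (String, String)
scala> val withoutElse = if ("a\tb".split("\t").length == 2) ("a", "b")
withoutElse: Any = (a,b)

scala> // same expression with an else branch of the same tuple type
scala> val withElse = if ("a\tb".split("\t").length == 2) ("a", "b") else ("", "")
withElse: (String, String) = (a,b)

So the missing else branch seems to be what widens the inferred type, but I do not understand the rule behind it.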