Hey guys,
I am not following why this happens:
Dataset
=======
Tab-separated values (164 columns)
Spark command 1
===============
val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  (tokens(23), tokens(7))
})

mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[(String, String)] = MappedRDD[18] at map at <console>:14

Spark command 2
===============
val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  if (tokens.length == 164 && tokens(23) != null) (tokens(23), tokens(7))
})

mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[Any] = MappedRDD[19] at map at <console>:14

In the second case above, why is the inferred type org.apache.spark.rdd.RDD[Any] and not org.apache.spark.rdd.RDD[(String, String)]?
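The type I'm asking about can be reproduced in plain Scala without Spark. Here is a minimal sketch (the `pair` helper is hypothetical, not from my actual code, and uses 2 columns instead of 164):

```scala
object TypeSketch {
  def main(args: Array[String]): Unit = {
    val tokens = Array("a", "b")

    // An `if` without an `else` implicitly returns Unit on the false
    // branch, so the expression's type is the least upper bound of
    // (String, String) and Unit -- which Scala infers as Any.
    val withoutElse = if (tokens.length == 2) (tokens(0), tokens(1))
    println(withoutElse)  // prints (a,b), but its static type is Any

    // Hypothetical alternative: return an Option from both branches so
    // the element type stays (String, String); with Spark one would then
    // use flatMap instead of map to drop the None rows.
    def pair(ts: Array[String]): Option[(String, String)] =
      if (ts.length == 2) Some((ts(0), ts(1))) else None

    println(pair(Array("a", "b")))  // prints Some((a,b))
    println(pair(Array("a")))       // prints None
  }
}
```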

thanks
sanjay
