Hi, I'm new to Spark and Scala, so apologies if this is obvious.
Every RDD appears to be typed, which I can see from the output in the spark-shell when I execute 'take':

scala> val t = sc.parallelize(Array(1,2,3))
t: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:12

scala> t.take(3)
res4: Array[Int] = Array(1, 2, 3)

scala> val u = sc.parallelize(Array(1,Array(2,2,2,2,2),3))
u: org.apache.spark.rdd.RDD[Any] = ParallelCollectionRDD[3] at parallelize at <console>:12

scala> u.take(3)
res5: Array[Any] = Array(1, Array(2, 2, 2, 2, 2), 3)

The Array's type stays Any even when the elements actually returned are all of a single type:

scala> u.take(1)
res6: Array[Any] = Array(1)

Is there some way to get the name of the type of the entire RDD from a function call? I would also really like the same functionality in pyspark, so I'm wondering whether it exists on that side, since the underlying RDD is clearly typed (I'd be fine with either the Scala or the Python type name).

Thank you,
Evan
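
P.S. To clarify what I'm after, here is the kind of helper I've been writing by hand (a rough sketch; rddTypeName is my own name, not a Spark API). It just captures the element type at the call site with a ClassTag, since the JVM erases RDD's type parameter at runtime, so it echoes the compile-time type rather than asking the RDD itself:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Capture the element type at the call site via an implicit ClassTag;
// the JVM erases RDD's type parameter, so nothing on the RDD object
// itself carries this information at runtime.
def rddTypeName[T: ClassTag](rdd: RDD[T]): String =
  implicitly[ClassTag[T]].runtimeClass.getName

// In the shell:
//   rddTypeName(sc.parallelize(Array(1, 2, 3)))                    // "int"
//   rddTypeName(sc.parallelize(Array(1, Array(2, 2, 2, 2, 2), 3))) // "java.lang.Object"

Something built in that does this, ideally reachable from pyspark too, is what I'm hoping exists.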