Pretty easy to do in Scala: rdd.elementClassTag.runtimeClass
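
For example, here's a minimal (untested) spark-shell sketch against the two RDDs from your session below, assuming elementClassTag is accessible from the shell; the results shown are what I'd expect, not a captured transcript:

scala> val t = sc.parallelize(Array(1, 2, 3))
scala> t.elementClassTag.runtimeClass
res0: Class[_] = int

scala> val u = sc.parallelize(Array(1, Array(2, 2, 2, 2, 2), 3))
scala> u.elementClassTag.runtimeClass
res1: Class[_] = class java.lang.Object

Note that for an RDD[Any] this can only report java.lang.Object: the ClassTag captures the statically inferred element type when the RDD is created, not anything finer.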
You can access this method from Python as well by using the internal _jrdd. It would look something like this (warning, I have not tested it):

rdd._jrdd.classTag().runtimeClass()

(The method name is "classTag" for JavaRDDLike, and "elementClassTag" for Scala's RDD.)

On Thu, Sep 4, 2014 at 1:32 PM, esamanas <evan.sama...@gmail.com> wrote:

> Hi,
>
> I'm new to Spark and Scala, so apologies if this is obvious.
>
> Every RDD appears to be typed, which I can see from the output in the
> spark-shell when I execute 'take':
>
> scala> val t = sc.parallelize(Array(1,2,3))
> t: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize
> at <console>:12
>
> scala> t.take(3)
> res4: Array[Int] = Array(1, 2, 3)
>
> scala> val u = sc.parallelize(Array(1,Array(2,2,2,2,2),3))
> u: org.apache.spark.rdd.RDD[Any] = ParallelCollectionRDD[3] at parallelize
> at <console>:12
>
> scala> u.take(3)
> res5: Array[Any] = Array(1, Array(2, 2, 2, 2, 2), 3)
>
> The Array type stays the same even if only one element is returned:
>
> scala> u.take(1)
> res6: Array[Any] = Array(1)
>
> Is there some way to just get the name of the type of the entire RDD from
> some function call? I would also really like this same functionality in
> pyspark, so I'm wondering if that exists on that side, since clearly the
> underlying RDD is typed (I'd be fine with either the Scala or Python type
> name).
>
> Thank you,
>
> Evan