Can you show the output of df.printSchema()? Just a guess, but I think I ran into something similar with a column that was part of a path in Parquet. E.g. we had an account_id in the Parquet file data itself, which was of type string, but we also named the files in the following manner: /somepath/account_id=.../file.parquet. Since Spark uses the paths for partition discovery, it was actually inferring that account_id is a numeric type, and upon reading the data we ran into the exception you're describing (this was in Spark 1.4).
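
To make the guess above concrete, here is a rough, self-contained sketch of the kind of inference I mean. It is not Spark's actual code: the class and helper names are hypothetical, the value 12345 is made up (the real path value is elided above), and the "try a numeric parse first, fall back to string" rule is a simplification of whatever partition discovery really does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration only; NOT Spark's implementation.
public class PartitionTypeSketch {

    // Hypothetical helper: guess a type name for a raw partition value,
    // trying a numeric parse first and falling back to string.
    static String inferType(String raw) {
        try {
            Long.parseLong(raw);
            return "long";
        } catch (NumberFormatException e) {
            return "string";
        }
    }

    // Hypothetical helper: collect key=value directory segments from a path
    // and record the guessed type for each key.
    static Map<String, String> inferPartitionTypes(String path) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                String key = segment.substring(0, eq);
                String value = segment.substring(eq + 1);
                types.put(key, inferType(value));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // The account_id values below are invented for the example.
        System.out.println(inferPartitionTypes("/somepath/account_id=12345/file.parquet"));
        System.out.println(inferPartitionTypes("/somepath/account_id=abc/file.parquet"));
    }
}
```

The point is only that a value which always looks like a number in the directory name can be typed as numeric from the path, even though the same column is stored as a string inside the files. If batch_id shows up in your directory layout as batch_id=..., that would fit the exception you are seeing.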

On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartsho...@gmail.com> wrote:

> Hi there,
>
> I have saved my records in Parquet format and am using Spark 1.5. But when
> I try to fetch the columns, it throws the exception
> *java.lang.ClassCastException: java.lang.Long cannot be cast to
> org.apache.spark.unsafe.types.UTF8String*.
>
> This field is saved as String while writing Parquet. Here is the sample
> code and the output for the same:
>
> logger.info("troubling thing is ::" +
>     sqlContext.sql(fileSelectQuery).schema().toString());
> DataFrame df = sqlContext.sql(fileSelectQuery);
> JavaRDD<Row> rdd2 = df.toJavaRDD();
>
> The first line in the code (the logger) prints this:
> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>
> But the moment after it, the exception comes up.
>
> Any idea why it is treating the field as Long? (Yeah, one unique thing about
> the column is that it is always a number, e.g. a timestamp.)
>
> Any help is appreciated.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SQLcontext-changing-String-field-to-Long-tp25005.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.