Here is what df.schema.toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))
I think you nailed the problem: this field is part of our HDFS file path. We have partitioned our data into folders by batch_id. How did you get around it?

Thanks for the help. :)

On Sat, Oct 10, 2015 at 7:55 AM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> Can you show the output of df.printSchema? Just a guess, but I think I ran
> into something similar with a column that was part of a path in Parquet.
> E.g. we had an account_id in the Parquet file data itself, which was of type
> string, but we also named the files in the following manner:
> /somepath/account_id=.../file.parquet. Since Spark uses the paths for
> partition discovery, it was actually inferring that account_id is a numeric
> type, and upon reading the data we ran into the exception you're describing
> (this is in Spark 1.4).
>
> On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartsho...@gmail.com> wrote:
>
>> Hi there,
>>
>> I have saved my records in Parquet format and am using Spark 1.5. But when
>> I try to fetch the columns, it throws an exception:
>> java.lang.ClassCastException: java.lang.Long cannot be cast to
>> org.apache.spark.unsafe.types.UTF8String.
>>
>> This field is saved as a String when writing the Parquet file. Here is the
>> sample code and its output:
>>
>> logger.info("troubling thing is ::" +
>>     sqlContext.sql(fileSelectQuery).schema().toString());
>> DataFrame df = sqlContext.sql(fileSelectQuery);
>> JavaRDD<Row> rdd2 = df.toJavaRDD();
>>
>> The first line in the code (the logger call) prints this:
>> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>>
>> But the moment after that, the exception comes up.
>>
>> Any idea why it is treating the field as a Long? (Yes, one unique thing
>> about the column is that it is always a number, e.g. a timestamp.)
>>
>> Any help is appreciated.
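For anyone following the thread, the inference Yana describes can be pictured with a toy sketch in plain Java. This is only an illustration of the idea, not Spark's actual partition-discovery code: the class name `PartitionTypeSketch`, its two helper methods, and the example path with `batch_id=1444400000` are all hypothetical. It shows how a value read from a `column=value` directory name can be guessed to be numeric even though the same column was written as a string inside the Parquet file.

```java
// Toy model of partition-value type inference; NOT Spark's actual code.
// All names and the example path below are hypothetical.
class PartitionTypeSketch {

    // Returns the raw value of `column` taken from a "column=value" path
    // segment, or null if the path has no such segment.
    static String partitionValue(String path, String column) {
        String prefix = column + "=";
        for (String segment : path.split("/")) {
            if (segment.startsWith(prefix)) {
                return segment.substring(prefix.length());
            }
        }
        return null;
    }

    // Guesses a type name from the raw value alone: anything that parses
    // as a long is treated as numeric, everything else as a string.
    static String inferType(String rawValue) {
        try {
            Long.parseLong(rawValue);
            return "LongType";
        } catch (NumberFormatException e) {
            return "StringType";
        }
    }

    public static void main(String[] args) {
        // Hypothetical path: a batch_id folder whose name is all digits.
        String path = "/somepath/batch_id=1444400000/file.parquet";
        String raw = partitionValue(path, "batch_id");
        System.out.println(raw + " -> " + inferType(raw));
        // prints "1444400000 -> LongType"
    }
}
```

Because the guess is made from the directory name alone, a batch_id that is always a number (such as a timestamp) would look like a Long here, which would be consistent with the ClassCastException quoted above. The Spark SQL documentation describes a setting named spark.sql.sources.partitionColumnTypeInference.enabled for turning this inference off; whether it is available in a given Spark version is something to check before relying on it.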
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SQLcontext-changing-String-field-to-Long-tp25005.html

--
Regards,
Shobhit Gupta.

"If you salute your job, you have to salute nobody. But if you pollute your job, you have to salute everybody..!!"