Great, that helped a lot; the issue is fixed now. :) Thank you very much!
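In case it helps others searching the archives, here is a minimal sketch of the config-based fix from the docs Yana linked below. This assumes the Spark 1.5 Java API; the "rootpath/" location is illustrative, not a real path:

    // Sketch only: disable automatic type inference for partition columns,
    // so a path segment like batch_id=333 is read as StringType, not LongType.
    // The config key comes from the Spark SQL partition-discovery docs.
    sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false");

    // Illustrative read; "rootpath/" stands in for the real HDFS location.
    DataFrame df = sqlContext.read().parquet("rootpath/");
    df.printSchema(); // batch_id now shows up as string

Note that disabling inference makes every partition column a string, so cast explicitly in SQL if you ever need batch_id as a number.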
On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> In our case, we do not actually need partition inference, so the
> workaround was easy: instead of using the path as
> rootpath/batch_id=333/..., we changed the paths to rootpath/333/....
> This works for us because we compute the set of HDFS paths manually and
> then register a DataFrame with a SQLContext.
>
> But it seems there is a nicer solution:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
> Notice that the data types of the partitioning columns are automatically
> inferred. Currently, numeric data types and string type are supported.
> Sometimes users may not want to automatically infer the data types of
> the partitioning columns. For these use cases, the automatic type
> inference can be configured by
> spark.sql.sources.partitionColumnTypeInference.enabled, which defaults
> to true. When type inference is disabled, string type will be used for
> the partitioning columns.
>
> On Sat, Oct 10, 2015 at 9:52 PM, shobhit gupta <smartsho...@gmail.com>
> wrote:
>
>> Here is what df.schema().toString() prints:
>>
>> DF Schema is ::StructType(StructField(batch_id,StringType,true))
>>
>> I think you nailed the problem; this field is part of our HDFS file
>> path. We have effectively partitioned our data by batch_id folders.
>>
>> How did you get around it?
>>
>> Thanks for the help. :)
>>
>> On Sat, Oct 10, 2015 at 7:55 AM, Yana Kadiyska <yana.kadiy...@gmail.com>
>> wrote:
>>
>>> Can you show the output of df.printSchema()? Just a guess, but I think
>>> I ran into something similar with a column that was part of a path in
>>> parquet. E.g. we had an account_id in the parquet file data itself,
>>> which was of type string, but we also named the files in the following
>>> manner: /somepath/account_id=.../file.parquet. Since Spark uses the
>>> paths for partition discovery, it was actually inferring that
>>> account_id is a numeric type, and upon reading the data we ran into
>>> the exception you're describing (this is in Spark 1.4).
>>>
>>> On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartsho...@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have saved my records in parquet format and am using Spark 1.5. But
>>>> when I try to fetch the columns, it throws the exception
>>>> java.lang.ClassCastException: java.lang.Long cannot be cast to
>>>> org.apache.spark.unsafe.types.UTF8String.
>>>>
>>>> This field is saved as a String while writing the parquet, so here is
>>>> the sample code and output for the same:
>>>>
>>>> logger.info("troubling thing is ::" +
>>>>     sqlContext.sql(fileSelectQuery).schema().toString());
>>>> DataFrame df = sqlContext.sql(fileSelectQuery);
>>>> JavaRDD<Row> rdd2 = df.toJavaRDD();
>>>>
>>>> The first line in the code (the logger) prints this:
>>>> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>>>>
>>>> But the moment after it, the exception comes up.
>>>>
>>>> Any idea why it is treating the field as a Long? (One unique thing
>>>> about the column is that it is always a number, e.g. a timestamp.)
>>>>
>>>> Any help is appreciated.
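P.S. For completeness, a rough sketch of the path-renaming workaround Yana described, as it might look in a Java job; the paths, the array, and the table name are made up for illustration:

    // Workaround sketch: no key=value segment in the paths, so partition
    // discovery never runs and never infers a type for batch_id.
    String[] paths = { "rootpath/333/", "rootpath/334/" }; // computed from HDFS in reality
    DataFrame df = sqlContext.read().parquet(paths);
    df.registerTempTable("batches"); // hypothetical table name

With the key= prefix gone, batch_id is read only from the file data itself, where it was written as a String.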
-- 
Regards,
Shobhit Gupta.
"If you salute your job, you have to salute nobody. But if you pollute your job, you have to salute everybody..!!"