Great, that helped a lot; the issue is fixed now. :) Thank you very much!
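In case it helps others searching the archives, here is a minimal sketch of the config-based fix from the docs Yana linked below. This assumes the Spark 1.5 Java API; the "rootpath/" location is illustrative, not a real path:

    // Sketch only: disable automatic type inference for partition columns,
    // so a path segment like batch_id=333 is read as StringType, not LongType.
    // The config key comes from the Spark SQL partition-discovery docs.
    sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false");

    // Illustrative read; "rootpath/" stands in for the real HDFS location.
    DataFrame df = sqlContext.read().parquet("rootpath/");
    df.printSchema(); // batch_id now shows up as string

Note that disabling inference makes every partition column a string, so cast explicitly in SQL if you ever need batch_id as a number.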
On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

> In our case, we do not actually need partition inference, so the
> workaround was easy: instead of using the path as
> rootpath/batch_id=333/..., we changed the paths to rootpath/333/....
> This works for us because we compute the set of HDFS paths manually and
> then register a DataFrame with a SQLContext.
>
> But it seems there is a nicer solution:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
> Notice that the data types of the partitioning columns are automatically
> inferred. Currently, numeric data types and string type are supported.
> Sometimes users may not want to automatically infer the data types of
> the partitioning columns. For these use cases, the automatic type
> inference can be configured by
> spark.sql.sources.partitionColumnTypeInference.enabled, which defaults
> to true. When type inference is disabled, string type will be used for
> the partitioning columns.
>
> On Sat, Oct 10, 2015 at 9:52 PM, shobhit gupta <smartsho...@gmail.com>
> wrote:
>
>> Here is what df.schema().toString() prints:
>>
>> DF Schema is ::StructType(StructField(batch_id,StringType,true))
>>
>> I think you nailed the problem; this field is part of our HDFS file
>> path. We have effectively partitioned our data by batch_id folders.
>>
>> How did you get around it?
>>
>> Thanks for the help. :)
>>
>> On Sat, Oct 10, 2015 at 7:55 AM, Yana Kadiyska <yana.kadiy...@gmail.com>
>> wrote:
>>
>>> Can you show the output of df.printSchema()? Just a guess, but I think
>>> I ran into something similar with a column that was part of a path in
>>> parquet. E.g. we had an account_id in the parquet file data itself,
>>> which was of type string, but we also named the files in the following
>>> manner: /somepath/account_id=.../file.parquet. Since Spark uses the
>>> paths for partition discovery, it was actually inferring that
>>> account_id is a numeric type, and upon reading the data we ran into
>>> the exception you're describing (this is in Spark 1.4).
>>>
>>> On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartsho...@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have saved my records in parquet format and am using Spark 1.5. But
>>>> when I try to fetch the columns, it throws the exception
>>>> java.lang.ClassCastException: java.lang.Long cannot be cast to
>>>> org.apache.spark.unsafe.types.UTF8String.
>>>>
>>>> This field is saved as a String while writing the parquet, so here is
>>>> the sample code and output for the same:
>>>>
>>>> logger.info("troubling thing is ::" +
>>>>     sqlContext.sql(fileSelectQuery).schema().toString());
>>>> DataFrame df = sqlContext.sql(fileSelectQuery);
>>>> JavaRDD<Row> rdd2 = df.toJavaRDD();
>>>>
>>>> The first line in the code (the logger) prints this:
>>>> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>>>>
>>>> But the moment after it, the exception comes up.
>>>>
>>>> Any idea why it is treating the field as a Long? (One unique thing
>>>> about the column is that it is always a number, e.g. a timestamp.)
>>>>
>>>> Any help is appreciated.
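P.S. For completeness, a rough sketch of the path-renaming workaround Yana described, as it might look in a Java job; the paths, the array, and the table name are made up for illustration:

    // Workaround sketch: no key=value segment in the paths, so partition
    // discovery never runs and never infers a type for batch_id.
    String[] paths = { "rootpath/333/", "rootpath/334/" }; // computed from HDFS in reality
    DataFrame df = sqlContext.read().parquet(paths);
    df.registerTempTable("batches"); // hypothetical table name

With the key= prefix gone, batch_id is read only from the file data itself, where it was written as a String.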
-- 
Regards,
Shobhit Gupta.
"If you salute your job, you have to salute nobody. But if you pollute your job, you have to salute everybody..!!"