I dug further into this issue:

1. The problem seems to originate from the Hive metastore: when I tried to
execute a query against a sub-column containing special characters, it did
not work for me even with backticks added.

2. I solved it by explicitly casting the column with a SQL type expression
in which the special characters in the sub-column names are replaced.

Example

Source data:

{
  "address": {
    "lane-one": "mark street",
    "lane:two": "sub street"
  }
}
Python code:

from pyspark.sql.functions import col

# Casting renames the struct's fields positionally to Hive-safe names.
schema = 'struct<lane_one:string, lane_two:string>'
clean_df = data_frame_from_json.select(col('address').cast(schema))
I have verified this on much more complex JSON and XML structures and the
data looks good.
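
For deeply nested schemas the same cast can be generated automatically
instead of writing the DDL string by hand. Below is a minimal, untested
sketch of that idea ('sanitize' is just an illustrative helper name, not
something from the code above):

import re
from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, StructField, StructType

def sanitize(dtype):
    # Recursively replace characters Hive's type parser rejects
    # (anything other than letters, digits and '_') in struct field names.
    if isinstance(dtype, StructType):
        return StructType([StructField(re.sub(r'\W', '_', f.name),
                                       sanitize(f.dataType),
                                       f.nullable)
                           for f in dtype.fields])
    if isinstance(dtype, ArrayType):
        return ArrayType(sanitize(dtype.elementType), dtype.containsNull)
    return dtype

clean_df = data_frame_from_json.select(
    [col(f.name).cast(sanitize(f.dataType)).alias(re.sub(r'\W', '_', f.name))
     for f in data_frame_from_json.schema.fields])

The cast renames struct fields positionally, so the sanitized type must
mirror the original structure exactly.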

Thanks,
Abhijeet

On Wed, May 16, 2018 at 6:13 PM, abhijeet bedagkar <qadevel...@gmail.com>
wrote:

> Hi,
>
> I am using Spark to read XML / JSON files into a dataframe and save it
> as a Hive table.
>
> Sample XML file:
> <revolt_configuration>
>     <id>101</id>
>     <testexecutioncontroller>
>         <execution-timeout>45</execution-timeout>
>         <execution-method>COMMAND</execution-method>
>     </testexecutioncontroller>
> </revolt_configuration>
>
> Note the field 'execution-timeout' under 'testexecutioncontroller'.
>
> Below is the schema inferred by the dataframe after reading the XML file:
>
> |-- id: long (nullable = true)
> |-- testexecutioncontroller: struct (nullable = true)
> |    |-- execution-timeout: long (nullable = true)
> |    |-- execution-method: string (nullable = true)
>
> While saving this dataframe to a Hive table, I get the exception below:
>
> Caused by: java.lang.IllegalArgumentException: Error: : expected at the
> position 24 of
> 'bigint:struct<execution-timeout:bigint,execution-method:string>' but '-'
> is found.
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765)
>     at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:111)
>     at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
>     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
>     at org.apache
>
> It looks like the issue is caused by the special character '-' in the
> field name, as everything works properly after removing it.
>
> Could you please let me know if there is a way to rename all child
> columns so that the dataframe can be saved as a table without any issue?
>
> Creating a StructField from df.schema and recursively building another
> StructField with renamed columns is one solution I am aware of (see the
> sketch below), but I still wanted to check whether there is an easier
> way to do this.
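>
> A rough, untested sketch of that recursive rename ('rename_fields' is
> just an illustrative helper name):
>
> from pyspark.sql.types import StructField, StructType
>
> def rename_fields(dtype):
>     # Recursively replace '-' with '_' in every struct field name.
>     if isinstance(dtype, StructType):
>         return StructType([StructField(f.name.replace('-', '_'),
>                                        rename_fields(f.dataType),
>                                        f.nullable)
>                            for f in dtype.fields])
>     return dtype
>
> df_renamed = spark.createDataFrame(df.rdd, rename_fields(df.schema))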
>
> Thanks,
> Abhijeet
>
