I dug further into this issue:

1. The problem seems to originate in the Hive metastore: when I executed a query against a sub-column containing special characters, it did not work for me even after adding backticks.

2. I solved the issue by explicitly passing a SQL type expression to the dataframe, with the special characters in the sub-column names replaced.
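For completeness, here is a minimal end-to-end sketch of the workaround. This is only a sketch: the file path, table name, and variable names are hypothetical, and it assumes the sample JSON shown under 'Ex source data' below.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical path holding the sample JSON document shown below.
df = spark.read.json('/tmp/address.json')

# Inferred schema: address: struct<lane-one:string,lane:two:string>.
# The '-' and ':' in the sub-column names break Hive's type-string parser.
# Cast the struct to a type string whose field names contain no special
# characters; a struct cast matches fields by position, so this renames
# the sub-columns without touching the values.
clean = df.select(
    col('address')
    .cast('struct<lane_one:string,lane_two:string>')
    .alias('address'))

clean.write.saveAsTable('address_table')  # hypothetical table name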
Ex source data:

{
  "address": {
    "lane-one": "mark street",
    "lane:two": "sub street"
  }
}

Python code:

schema = 'struct<lane_one:string, lane_two:string>'
data_frame_from_json.select(col('address').cast(schema))

I have verified the data for much more complex JSON and XML structures and it looks good.

Thanks,
Abhijeet

On Wed, May 16, 2018 at 6:13 PM, abhijeet bedagkar <qadevel...@gmail.com> wrote:

> Hi,
>
> I am using Spark to read XML / JSON files to create a dataframe and
> save it as a Hive table.
>
> Sample XML file:
>
> <revolt_configuration>
>   <id>101</id>
>   <testexecutioncontroller>
>     <execution-timeout>45</execution-timeout>
>     <execution-method>COMMAND</execution-method>
>   </testexecutioncontroller>
> </revolt_configuration>
>
> Note the field 'execution-timeout' under 'testexecutioncontroller'.
>
> Below is the schema populated by the dataframe after reading the XML file:
>
> |-- id: long (nullable = true)
> |-- testexecutioncontroller: struct (nullable = true)
> |    |-- execution-timeout: long (nullable = true)
> |    |-- execution-method: string (nullable = true)
>
> While saving this dataframe to a Hive table I get the exception below:
>
> Caused by: java.lang.IllegalArgumentException: Error: : expected at the position 24 of 'bigint:struct<execution-timeout:bigint,execution-method:string>' but '-' is found.
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
>     at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765)
>     at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:111)
>     at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
>     at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
>     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
>     at org.apache
>
> The issue appears to be caused by the special character '-' in the
> field name, as everything works properly after removing it.
>
> Could you please let me know if there is a way to rename all child
> columns so that the dataframe can be saved as a table without any
> issue?
>
> Creating a StructField from df.schema and recursively building another
> StructField with renamed columns is one solution I am aware of, but I
> still wanted to check whether there is an easier way to do this.
>
> Thanks,
> Abhijeet
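P.S. For the recursive schema-rewrite approach mentioned in the quoted message above, a sketch along the following lines should work. The sanitize helper is hypothetical and handles only struct and array types; map types would need the same treatment.

import re

from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, StructField, StructType

def sanitize(dtype):
    # Recursively replace special characters in struct field names with
    # underscores; leave all other types unchanged.
    if isinstance(dtype, StructType):
        return StructType([
            StructField(re.sub(r'[^0-9A-Za-z_]', '_', f.name),
                        sanitize(f.dataType),
                        f.nullable)
            for f in dtype.fields])
    if isinstance(dtype, ArrayType):
        return ArrayType(sanitize(dtype.elementType), dtype.containsNull)
    return dtype

# Cast every top-level column to its sanitized type and rename it, so the
# resulting dataframe can be saved as a Hive table.
clean_df = df.select([
    col(f.name)
    .cast(sanitize(f.dataType))
    .alias(re.sub(r'[^0-9A-Za-z_]', '_', f.name))
    for f in df.schema.fields])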