[ https://issues.apache.org/jira/browse/SPARK-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931622#comment-17931622 ]
Dyno Fu commented on SPARK-17398:
---------------------------------

> Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast
> to org.apache.hive.hcatalog.data.HCatRecord

I ran into a similar issue when changing the serde from `org.apache.hadoop.hive.serde2.JsonSerDe` to `org.apache.hive.hcatalog.data.JsonSerDe` because of a Trino upgrade requirement. It turns out that the latter requires all-lowercase column names when the Hive table is defined.

```
CREATE EXTERNAL TABLE xx (
  errorLog STRING -- has to be errorlog
)
PARTITIONED BY (date STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://somebucket/tables/xx'
```

> Failed to query on external JSON partitioned table
> --------------------------------------------------
>
>                 Key: SPARK-17398
>                 URL: https://issues.apache.org/jira/browse/SPARK-17398
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: pin_zhang
>            Assignee: Wing Yew Poon
>            Priority: Major
>             Fix For: 2.4.5, 3.0.0
>
>         Attachments: screenshot-1.png
>
>
> 1. Create an external JSON partitioned table with the SerDe in
> hive-hcatalog-core-1.2.1.jar, downloaded from
> https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
> 2. Querying the table hits the exception below; the same query works in Spark 1.5.2:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure:
> Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException:
> java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
> at org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
>
> 3.
Test Code:

> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
>
> object JsonBugs {
>   def main(args: Array[String]): Unit = {
>     val table = "test_json"
>     val location = "file:///g:/home/test/json"
>     val create = s"""CREATE EXTERNAL TABLE ${table}
>       (id string, seq string)
>       PARTITIONED BY (index int)
>       ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>       LOCATION "${location}"
>       """
>     val add_part = s"""
>       ALTER TABLE ${table} ADD
>       PARTITION (index=1) LOCATION '${location}/index=1'
>       """
>     val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
>     conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
>     val ctx = new SparkContext(conf)
>     val hctx = new HiveContext(ctx)
>     val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
>     if (!exist) {
>       hctx.sql(create)
>       hctx.sql(add_part)
>     } else {
>       hctx.sql("show partitions " + table).show()
>     }
>     hctx.sql("select * from test_json").show()
>   }
> }

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
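As an aside on the workaround described in the comment: rather than remembering to type lowercase column names by hand, the requirement can be enforced when the DDL string is generated. A minimal sketch, assuming a hypothetical `Column` case class and `createTableDdl` helper (neither is part of Hive or Spark):

```scala
// Sketch: build the CREATE TABLE DDL with column names forced to lower case,
// since org.apache.hive.hcatalog.data.JsonSerDe expects lower-case column
// names in the table definition. Column and createTableDdl are hypothetical
// helpers for illustration only.
case class Column(name: String, hiveType: String)

def createTableDdl(table: String, columns: Seq[Column], location: String): String = {
  // Lower-case every column name before it reaches the DDL.
  val cols = columns
    .map(c => s"  ${c.name.toLowerCase} ${c.hiveType}")
    .mkString(",\n")
  s"""CREATE EXTERNAL TABLE $table (
     |$cols
     |)
     |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
     |STORED AS TEXTFILE
     |LOCATION '$location'""".stripMargin
}

val ddl = createTableDdl("xx", Seq(Column("errorLog", "STRING")), "s3://somebucket/tables/xx")
println(ddl)
```

The generated statement contains `errorlog STRING` regardless of how the caller spells the column, so the serde's lowercase expectation cannot be violated by a typo in application code.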