Hi,

We are currently trying to replace Hive with the Spark Thrift Server and have run into a problem with the following SQL:

    CREATE TABLE test_db.test_sink AS SELECT [some columns] FROM test_db.test_source

The statement runs successfully, but when we query test_db.test_sink the data comes back as gibberish. After some inspection we found that the table's files on HDFS are ORC (they can be read with spark.read.orc), while the table's metadata records the format as text. When we read the files with spark.read.orc(...).show, the column names are not the columns of test_db.test_source but generated placeholders like:

    |_col0|_col1|_col2|_col3|_col4|_col5|_col6|_col7|_col8|_col9|_col10|_col11|_col12|
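For context, this is roughly how we checked it in a spark-shell; this is only a sketch, and the HDFS path below is just an example of where the table's files might live, not our actual warehouse location:

    // The table metadata reports a text InputFormat/SerDe.
    spark.sql("DESCRIBE FORMATTED test_db.test_sink").show(200, false)

    // The files themselves read fine as ORC, but only with generated
    // column names (_col0, _col1, ...) instead of the source columns.
    // Example path; substitute the table's real location on HDFS.
    spark.read.orc("hdfs:///user/hive/warehouse/test_db.db/test_sink").show()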
What is mysterious is that if we rerun the same SQL without any changes, the table comes out fine: the file contents and the file format recorded in the metadata match. Has anyone encountered the same problem? Any response is appreciated.