Hi,
We are currently trying to replace Hive with the Spark Thrift Server, and we
have run into a problem with the following SQL:
    create table test_db.test_sink as select [some columns] from test_db.test_source
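
For context, the statement is submitted through the Thrift Server's
HiveServer2-compatible JDBC endpoint; a rough sketch of the submission is below
(the host, port, user, and database are placeholders, and * stands in for the
actual column list, which is omitted here):

    import java.sql.DriverManager

    // Connect to the Spark Thrift Server over its HiveServer2-compatible
    // JDBC endpoint (host, port, and user are placeholders).
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://thrift-server-host:10000/test_db", "user", "")
    val stmt = conn.createStatement()

    // The CTAS from above; * stands in for the real column list.
    stmt.execute(
      "create table test_db.test_sink as select * from test_db.test_source")

    stmt.close()
    conn.close()
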
After the SQL ran successfully, we queried data from test_db.test_sink, and the
data was gibberish. After some inspection, we found that test_db.test_sink has
ORC files on HDFS (which can be read with spark.read.orc), but the table's
metadata says the format is text. When reading the files with
spark.read.orc().show, the output column names are not the column names from
test_db.test_source, but something like:
|_col0|_col1|_col2|_col3|_col4|_col5|_col6|_col7|_col8|_col9|_col10|_col11|_col12|
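
For reference, this is roughly how we inspected it (a minimal sketch from a
spark-shell session, where spark is the predefined SparkSession; the warehouse
path is a placeholder, not our actual location):

    // Read the ORC files written by the CTAS directly, bypassing the metastore.
    val df = spark.read.orc("hdfs:///user/hive/warehouse/test_db.db/test_sink")
    df.printSchema()  // shows _col0, _col1, ... instead of the source column names
    df.show(5)

    // Check the storage format recorded in the metastore; in the problematic
    // state it reports a text format even though the files on HDFS are ORC.
    spark.sql("DESCRIBE FORMATTED test_db.test_sink").show(100, truncate = false)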

What is mysterious is that after rerunning the same SQL, without any changes,
the table becomes fine: the file content and the file format in the metadata
match.

I wonder if anyone has encountered the same problem.

Any response would be appreciated.
