Hello, I posted the following on a Cloudera forum but haven’t had much luck. I’m hoping someone here can tell me what step I have probably missed:
Hello,

I'm using Hive (v1.2.1) to convert our data files from CSV into Parquet for use in AWS Athena. However, no matter what I try, the resulting Parquet file always has the column titles [_col0, _col1, ..., _colN].

After researching, I read that the line SET parquet.column.index.access=false was supposed to make Parquet use the column titles of my Hive table; however, it has been unsuccessful so far.

Below is an example script I use to create the Parquet file from the data:

    SET parquet.column.index.access=false;

    CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS(
      `release` STRING,
      `customer` STRING,
      `cookie` STRING,
      `category` STRING,
      `end_time` STRING,
      `start_time` STRING,
      `first_name` STRING,
      `email` STRING,
      `phone` STRING,
      `last_name` STRING,
      `site` STRING,
      `source` STRING,
      `subject` STRING,
      `raw` STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    LOCATION '${INPUT}';

    INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet'
    STORED AS PARQUET
    SELECT * FROM EVENTS;

Using parquet-tools, I read the resulting file. Below is an example of the output:

    _col0 = 0.1.2
    _col1 = customer1
    _col2 = NULL
    _col3 = api
    _col4 = 2018-01-21T06:57:57Z
    _col5 = 2018-01-21T06:57:56Z
    _col6 = Brandon
    _col7 = bran...@fakesite.com
    _col8 = 999-999-9999
    _col9 = Pompei
    _col10 = Boston
    _col11 = Wifi
    _col12 = NULL
    _col13 = eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI

This is problematic because it is impossible to load the file into an Athena table (or even back into Hive) without using these index-based column titles. I need Hive's column titles to carry over to the Parquet file.

I've searched for a very long time and have come up short. Am I doing something wrong? Please let me know if I can provide more information.

Thank you! I appreciate your time.
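For reference, here is a sketch of one variation that could be tested. It is untested here, and it assumes the column names are lost only on the INSERT OVERWRITE DIRECTORY path. It declares a second external table stored as Parquet at the output location and inserts into that table, so the writer has a named schema to work from. The table name EVENTS_PARQUET is hypothetical; the columns and the ${OUTPUT} variable are taken from the script above.

```sql
-- Hypothetical target table: EVENTS_PARQUET is an illustrative name only.
-- Same columns as EVENTS, but stored as Parquet at the output location.
CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS_PARQUET(
  `release` STRING,
  `customer` STRING,
  `cookie` STRING,
  `category` STRING,
  `end_time` STRING,
  `start_time` STRING,
  `first_name` STRING,
  `email` STRING,
  `phone` STRING,
  `last_name` STRING,
  `site` STRING,
  `source` STRING,
  `subject` STRING,
  `raw` STRING
)
STORED AS PARQUET
LOCATION '${OUTPUT}/parquet';

-- Write through the table rather than directly to a directory.
INSERT OVERWRITE TABLE EVENTS_PARQUET
SELECT * FROM EVENTS;
```

If the files written this way show the real column names in parquet-tools, that would point to the directory-insert path as the source of the _colN titles rather than the parquet.column.index.access setting.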
Sincerely,
Brandon Cooke
Software Engineer
engage.cx
5500 Interstate N Parkway Suite 130