Hello,

I posted the following on a Cloudera forum but haven’t had much luck.
I’m hoping someone here can tell me what step I have probably missed:

Hello,
 
I'm using HIVE (v1.2.1) to convert our data files from CSV into Parquet for use 
in AWS Athena.
However, no mater what I try the resulting Parquet always has columns titles 
[_col0, _col1, ..., _colN]
 
After researching, I read that the line SET parquet.column.index.access=false 
was supposed to allow for Parquet to use the column titles of my HIVE table; 
however, it has been unsuccessful so far.
 
Below is an example script I use to create the Parquet from data
 
SET parquet.column.index.access=false;

CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS(
    `release` STRING,
    `customer` STRING,
    `cookie` STRING,
    `category` STRING,
    `end_time` STRING,
    `start_time` STRING,
    `first_name` STRING,
    `email` STRING,
    `phone` STRING,
    `last_name` STRING,
    `site` STRING,
    `source` STRING,
    `subject` STRING,
    `raw` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION '${INPUT}';

INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet'
STORED AS PARQUET
SELECT *
FROM EVENTS;
 
Using parquet-tools, I read the resulting file and below is an example output:
 
_col0 = 0.1.2
_col1 = customer1
_col2 = NULL
_col3 = api
_col4 = 2018-01-21T06:57:57Z 
_col5 = 2018-01-21T06:57:56Z 
_col6 = Brandon
_col7 = bran...@fakesite.com
_col8 = 999-999-9999
_col9 = Pompei
_col10 = Boston
_col11 = Wifi
_col12 = NULL
_col13 = 
eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI
 
This is problematic because it is impossible to transfer it to an Athena table 
(or even back to HIVE) without using these index-based column titles. I need 
HIVE's column titles to transfer over to the Parquet file. 
 
I've search for a very long time and have come up short. Am I doing something 
wrong? 
Please let me know if I can provide more information. Thank you!

I appreciate your time.
Sincerely,

Brandon Cooke
Software Engineer
engage.cx
5500 Interstate N Parkway Suite 130

Reply via email to