Hi Prasad,

I have actually tried this and got the same result, although I am certainly willing to try again.
Sincerely,
Brandon Cooke

> On Jan 26, 2018, at 11:29 AM, Prasad Nagaraj Subramanya
> <prasadn...@gmail.com> wrote:
>
> Hi Brandon,
>
> Have you tried creating an external table with the required names for
> Parquet:
>
> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS_PARQUET(
>   `release` STRING,
>   `customer` STRING,
>   `cookie` STRING,
>   `category` STRING,
>   `end_time` STRING,
>   `start_time` STRING,
>   `first_name` STRING,
>   `email` STRING,
>   `phone` STRING,
>   `last_name` STRING,
>   `site` STRING,
>   `source` STRING,
>   `subject` STRING,
>   `raw` STRING
> )
> STORED AS PARQUET
> LOCATION '${OUTPUT}';
>
> and then inserting data into this table from your CSV table:
>
> INSERT OVERWRITE TABLE EVENTS_PARQUET SELECT * FROM EVENTS;
>
> This will create Parquet files at the specified location (${OUTPUT}).
>
> Thanks,
> Prasad
>
> On Fri, Jan 26, 2018 at 7:45 AM, Brandon Cooke
> <brandon.co...@engage.cx> wrote:
>
> Hello,
>
> I posted the following on a Cloudera forum but haven't had much luck.
> I'm hoping someone here can tell me what step I have probably missed:
>
> Hello,
>
> I'm using Hive (v1.2.1) to convert our data files from CSV into Parquet
> for use in AWS Athena.
> However, no matter what I try, the resulting Parquet always has the
> column titles [_col0, _col1, ..., _colN].
>
> After researching, I read that the line SET parquet.column.index.access=false
> was supposed to allow Parquet to use the column titles of my Hive table;
> however, it has been unsuccessful so far.
>
> Below is an example script I use to create the Parquet from data:
>
> SET parquet.column.index.access=false;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS(
>   `release` STRING,
>   `customer` STRING,
>   `cookie` STRING,
>   `category` STRING,
>   `end_time` STRING,
>   `start_time` STRING,
>   `first_name` STRING,
>   `email` STRING,
>   `phone` STRING,
>   `last_name` STRING,
>   `site` STRING,
>   `source` STRING,
>   `subject` STRING,
>   `raw` STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> LOCATION '${INPUT}';
>
> INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet'
> STORED AS PARQUET
> SELECT *
> FROM EVENTS;
>
> Using parquet-tools, I read the resulting file; below is an example output:
>
> _col0 = 0.1.2
> _col1 = customer1
> _col2 = NULL
> _col3 = api
> _col4 = 2018-01-21T06:57:57Z
> _col5 = 2018-01-21T06:57:56Z
> _col6 = Brandon
> _col7 = bran...@fakesite.com
> _col8 = 999-999-9999
> _col9 = Pompei
> _col10 = Boston
> _col11 = Wifi
> _col12 = NULL
> _col13 = eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI
>
> This is problematic because it is impossible to transfer the data to an
> Athena table (or even back to Hive) without using these index-based column
> titles. I need Hive's column titles to carry over to the Parquet file.
>
> I've searched for a very long time and have come up short. Am I doing
> something wrong?
> Please let me know if I can provide more information. Thank you!
>
> I appreciate your time.
> Sincerely,
>
> Brandon Cooke
> Software Engineer
> engage.cx
> 5500 Interstate N Parkway Suite 130
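
Putting the two messages together, the full workaround reads as one script. This is a sketch only, not a tested solution: it assumes the behavior suggested above, namely that Hive embeds real column names in the Parquet footer only when inserting through a table, while INSERT OVERWRITE DIRECTORY ... STORED AS PARQUET falls back to the positional names _col0.._colN. ${INPUT} and ${OUTPUT} are the placeholders used in the thread.

```sql
-- Source table over the existing CSV files (from Brandon's original script).
CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS(
  `release` STRING, `customer` STRING, `cookie` STRING, `category` STRING,
  `end_time` STRING, `start_time` STRING, `first_name` STRING, `email` STRING,
  `phone` STRING, `last_name` STRING, `site` STRING, `source` STRING,
  `subject` STRING, `raw` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION '${INPUT}';

-- Parquet-backed target table with the same columns (from Prasad's reply);
-- its table metadata is what supplies the column names in the output files.
CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS_PARQUET(
  `release` STRING, `customer` STRING, `cookie` STRING, `category` STRING,
  `end_time` STRING, `start_time` STRING, `first_name` STRING, `email` STRING,
  `phone` STRING, `last_name` STRING, `site` STRING, `source` STRING,
  `subject` STRING, `raw` STRING
)
STORED AS PARQUET
LOCATION '${OUTPUT}';

-- Copy the CSV rows through the table; Parquet files land under ${OUTPUT},
-- and Athena can then point an external table at that location.
INSERT OVERWRITE TABLE EVENTS_PARQUET SELECT * FROM EVENTS;
```

Reading one of the resulting files back with parquet-tools should then show the named columns (release, customer, ...) instead of _col0.._col13.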