Prasad and Jorn, Prasad’s fix did the trick! An output table and OVERWRITE statement worked!
Up to this point, what I had been doing is - Create the initial Events table - Create Parquet_table ( …. ) Stored as Parquet - Insert INTO parquet_table select * from Events I wasn’t using LOCATION for the Parquet table either. Are you able to tell me why the above steps and my example below (where I Overwrite Directory Stored as Parquet) did not work? Thanks so much by the way! -Brandon > On Jan 26, 2018, at 1:09 PM, Jörn Franke <jornfra...@gmail.com> wrote: > > Drop the old parquet table before and then create it with explicit > statements. The above statement keeps using the old parquet table if it > existed > > On 26. Jan 2018, at 17:35, Brandon Cooke <brandon.co...@engage.cx > <mailto:brandon.co...@engage.cx>> wrote: > >> Hi Prasad, >> >> I actually have tried this and I had that same result. >> Although I am certainly willing to try again. >> >> Sincerely, >> >> Brandon Cooke >> >>> On Jan 26, 2018, at 11:29 AM, Prasad Nagaraj Subramanya >>> <prasadn...@gmail.com <mailto:prasadn...@gmail.com>> wrote: >>> >>> Hi Brandon, >>> >>> Have you tried creating an external table with the required names for >>> parquet - >>> >>> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS_PARQUET( >>> `release` STRING, >>> `customer` STRING, >>> `cookie` STRING, >>> `category` STRING, >>> `end_time` STRING, >>> `start_time` STRING, >>> `first_name` STRING, >>> `email` STRING, >>> `phone` STRING, >>> `last_name` STRING, >>> `site` STRING, >>> `source` STRING, >>> `subject` STRING, >>> `raw` STRING >>> ) >>> STORED AS PARQUET >>> LOCATION '${OUTPUT}'; >>> >>> And then inserting data into this table from your csv table - >>> >>> INSERT OVERWRITE TABLE EVENTS_PARQUET SELECT * FROM EVENTS; >>> >>> This will create a parquet file at the specified location (${OUTPUT}) >>> >>> Thanks, >>> Prasad >>> >>> On Fri, Jan 26, 2018 at 7:45 AM, Brandon Cooke <brandon.co...@engage.cx >>> <mailto:brandon.co...@engage.cx>> wrote: >>> Hello, >>> >>> I posted the following on a Cloudera forum but haven’t had much luck. >>> I’m hoping someone here can tell me what step I have probably missed: >>> >>> Hello, >>> >>> I'm using HIVE (v1.2.1) to convert our data files from CSV into Parquet for >>> use in AWS Athena. >>> However, no mater what I try the resulting Parquet always has columns >>> titles [_col0, _col1, ..., _colN] >>> >>> After researching, I read that the line SET >>> parquet.column.index.access=false was supposed to allow for Parquet to use >>> the column titles of my HIVE table; however, it has been unsuccessful so >>> far. >>> >>> Below is an example script I use to create the Parquet from data >>> >>> SET parquet.column.index.access=false; >>> >>> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS( >>> `release` STRING, >>> `customer` STRING, >>> `cookie` STRING, >>> `category` STRING, >>> `end_time` STRING, >>> `start_time` STRING, >>> `first_name` STRING, >>> `email` STRING, >>> `phone` STRING, >>> `last_name` STRING, >>> `site` STRING, >>> `source` STRING, >>> `subject` STRING, >>> `raw` STRING >>> ) >>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' >>> LOCATION '${INPUT}'; >>> >>> INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet' >>> STORED AS PARQUET >>> SELECT * >>> FROM EVENTS; >>> >>> Using parquet-tools, I read the resulting file and below is an example >>> output: >>> >>> _col0 = 0.1.2 >>> _col1 = customer1 >>> _col2 = NULL >>> _col3 = api >>> _col4 = 2018-01-21T06:57:57Z >>> _col5 = 2018-01-21T06:57:56Z >>> _col6 = Brandon >>> _col7 = bran...@fakesite.com <mailto:bran...@fakesite.com> >>> _col8 = 999-999-9999 >>> _col9 = Pompei >>> _col10 = Boston >>> _col11 = Wifi >>> _col12 = NULL >>> _col13 = >>> eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI >>> >>> This is problematic because it is impossible to transfer it to an Athena >>> table (or even back to HIVE) without using these index-based column titles. >>> I need HIVE's column titles to transfer over to the Parquet file. >>> >>> I've search for a very long time and have come up short. Am I doing >>> something wrong? >>> Please let me know if I can provide more information. Thank you! >>> >>> I appreciate your time. >>> Sincerely, >>> >>> Brandon Cooke >>> Software Engineer >>> engage.cx <http://engage.cx/> >>> 5500 Interstate N Parkway Suite 130 >>> <https://maps.google.com/?q=5500+Interstate+N+Parkway+Suite+130&entry=gmail&source=g> >>