Drop the old parquet table first and then recreate it with explicit column names. A CREATE TABLE IF NOT EXISTS statement silently keeps using the old parquet table (and its old schema) if one already existed.
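As a concrete sketch, that drop-and-recreate sequence might look like the following HiveQL, reusing the table, column names, and the ${OUTPUT} variable from Brandon's script further down the thread:

```sql
-- Drop the stale table first so that CREATE TABLE ... IF NOT EXISTS
-- cannot silently reuse an old definition with _col0-style names.
DROP TABLE IF EXISTS EVENTS_PARQUET;

CREATE EXTERNAL TABLE EVENTS_PARQUET(
  `release`    STRING,
  `customer`   STRING,
  `cookie`     STRING,
  `category`   STRING,
  `end_time`   STRING,
  `start_time` STRING,
  `first_name` STRING,
  `email`      STRING,
  `phone`      STRING,
  `last_name`  STRING,
  `site`       STRING,
  `source`     STRING,
  `subject`    STRING,
  `raw`        STRING
)
STORED AS PARQUET
LOCATION '${OUTPUT}';

-- Repopulate from the CSV-backed table; the Parquet files written
-- here should carry the column names declared above.
INSERT OVERWRITE TABLE EVENTS_PARQUET SELECT * FROM EVENTS;
```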
> On 26. Jan 2018, at 17:35, Brandon Cooke <brandon.co...@engage.cx> wrote:
>
> Hi Prasad,
>
> I actually have tried this and I had that same result,
> although I am certainly willing to try again.
>
> Sincerely,
>
> Brandon Cooke
>
>> On Jan 26, 2018, at 11:29 AM, Prasad Nagaraj Subramanya
>> <prasadn...@gmail.com> wrote:
>>
>> Hi Brandon,
>>
>> Have you tried creating an external table with the required names for
>> Parquet?
>>
>> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS_PARQUET(
>>   `release` STRING,
>>   `customer` STRING,
>>   `cookie` STRING,
>>   `category` STRING,
>>   `end_time` STRING,
>>   `start_time` STRING,
>>   `first_name` STRING,
>>   `email` STRING,
>>   `phone` STRING,
>>   `last_name` STRING,
>>   `site` STRING,
>>   `source` STRING,
>>   `subject` STRING,
>>   `raw` STRING
>> )
>> STORED AS PARQUET
>> LOCATION '${OUTPUT}';
>>
>> And then inserting data into this table from your CSV table:
>>
>> INSERT OVERWRITE TABLE EVENTS_PARQUET SELECT * FROM EVENTS;
>>
>> This will create a Parquet file at the specified location (${OUTPUT}).
>>
>> Thanks,
>> Prasad
>>
>>> On Fri, Jan 26, 2018 at 7:45 AM, Brandon Cooke <brandon.co...@engage.cx>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I posted the following on a Cloudera forum but haven't had much luck.
>>> I'm hoping someone here can tell me which step I have probably missed:
>>>
>>> I'm using Hive (v1.2.1) to convert our data files from CSV into Parquet
>>> for use in AWS Athena.
>>> However, no matter what I try, the resulting Parquet always has column
>>> titles [_col0, _col1, ..., _colN].
>>>
>>> After researching, I read that the line
>>> SET parquet.column.index.access=false
>>> was supposed to make Parquet use the column titles of my Hive table;
>>> however, it has been unsuccessful so far.
>>>
>>> Below is an example script I use to create the Parquet from data:
>>>
>>> SET parquet.column.index.access=false;
>>>
>>> CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS(
>>>   `release` STRING,
>>>   `customer` STRING,
>>>   `cookie` STRING,
>>>   `category` STRING,
>>>   `end_time` STRING,
>>>   `start_time` STRING,
>>>   `first_name` STRING,
>>>   `email` STRING,
>>>   `phone` STRING,
>>>   `last_name` STRING,
>>>   `site` STRING,
>>>   `source` STRING,
>>>   `subject` STRING,
>>>   `raw` STRING
>>> )
>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>>> LOCATION '${INPUT}';
>>>
>>> INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet'
>>> STORED AS PARQUET
>>> SELECT *
>>> FROM EVENTS;
>>>
>>> Using parquet-tools, I read the resulting file; below is an example
>>> output:
>>>
>>> _col0 = 0.1.2
>>> _col1 = customer1
>>> _col2 = NULL
>>> _col3 = api
>>> _col4 = 2018-01-21T06:57:57Z
>>> _col5 = 2018-01-21T06:57:56Z
>>> _col6 = Brandon
>>> _col7 = bran...@fakesite.com
>>> _col8 = 999-999-9999
>>> _col9 = Pompei
>>> _col10 = Boston
>>> _col11 = Wifi
>>> _col12 = NULL
>>> _col13 = eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI
>>>
>>> This is problematic because it is impossible to transfer it to an Athena
>>> table (or even back to Hive) without using these index-based column
>>> titles. I need Hive's column titles to transfer over to the Parquet file.
>>>
>>> I've searched for a very long time and have come up short. Am I doing
>>> something wrong?
>>> Please let me know if I can provide more information. Thank you!
>>>
>>> I appreciate your time.
>>> Sincerely,
>>>
>>> Brandon Cooke
>>> Software Engineer
>>> engage.cx
>>> 5500 Interstate N Parkway Suite 130