Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Dmitry Goldenberg
I'm assuming, given this: CREATE TABLE IF NOT EXISTS db.mytable ( `item_id` string, `timestamp` string, `item_comments` string) PARTITIONED BY (`date`, `content_type`) STORED AS PARQUET; we'd have to organize the input Parquet files into subdirectories where each subdirectory contains data

Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Dmitry Goldenberg
>> properly split and partition your data before using LOAD if you want hive to be able to find it again. If the destination table is defined as CREATE TABLE IF NOT EXISTS db.mytable ( `item_id` string, `timestamp` string, `item_comments` string) PARTITIONED BY (`date`, `content_type`) STORE
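A minimal sketch of what "split and partition before LOAD" could look like in HiveQL. The quoted DDL lists the partition columns without types, which Hive requires, so `string` is assumed here for both. The HDFS path and the partition values are hypothetical. The files under that path are assumed to already be Parquet and to contain only the three non-partition columns, since LOAD only moves files and does not convert or repartition them.

```sql
-- Assumption: both partition columns are strings (the quoted DDL omits their types).
CREATE TABLE IF NOT EXISTS db.mytable (
  `item_id` string,
  `timestamp` string,
  `item_comments` string)
PARTITIONED BY (`date` string, `content_type` string)
STORED AS PARQUET;

-- Hypothetical source directory holding Parquet files for exactly one partition.
-- LOAD DATA moves the files as-is into the partition's directory.
LOAD DATA INPATH '/tmp/staging/date=2017-04-06/content_type=article'
INTO TABLE db.mytable
PARTITION (`date`='2017-04-06', `content_type`='article');
```

One such LOAD statement would be needed per (`date`, `content_type`) combination, which is why the input has to be organized into one directory per partition beforehand.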

Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Dmitry Goldenberg
Thank you, Ryan and Furcy for your detailed responses. Our application doesn't necessarily have to have the data in the CSV format. We read data from "a source" and load it in memory (not all at once), basically as a continuous stream of records. These are meant to be processed and written to Hive

RE: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Ryan Harris
“If we represent our data as delimited files” ... The question is how you plan on getting your data into these Parquet files, since it doesn’t sound like your data is already in that format. If your data is not already in Parquet format, you are going to need to run *some* process to get it into

Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Furcy Pin
Hi Dmitry, If I understand what you said correctly: at the beginning you have CSV files on HDFS, and at the end you want a partitioned Hive table stored as Parquet. And your question is: "can I do this using only one Hive table and a LOAD statement?" The answer to that question is "no". The correct
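The message is cut off at this point, so the following is not necessarily what was proposed. It is a sketch of one common two-table pattern consistent with the thread's mention of a temp table: a text-backed staging table over the CSV files, followed by an INSERT with dynamic partitioning into the Parquet table. The staging table name, its location, and the comma delimiter are hypothetical. It also assumes `db.mytable` already exists with typed partition columns (`date` string, `content_type` string) and that the CSV rows carry the partition values as their last two fields.

```sql
-- Hypothetical staging table over comma-delimited files already on HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS db.mytable_staging (
  `item_id` string,
  `timestamp` string,
  `item_comments` string,
  `date` string,
  `content_type` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/incoming/mytable_csv';

-- Let Hive derive the target partitions from the selected rows.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hive rewrites the rows as Parquet; the partition columns must come last in the SELECT.
INSERT INTO TABLE db.mytable PARTITION (`date`, `content_type`)
SELECT `item_id`, `timestamp`, `item_comments`, `date`, `content_type`
FROM db.mytable_staging;
```

Plain `ROW FORMAT DELIMITED` does not handle quoted fields containing commas; if `item_comments` can contain the delimiter, a CSV-aware SerDe or a different delimiter would be needed.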

Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-06 Thread Dmitry Goldenberg
Thanks, Ryan. I was actually more curious about scenario B. If we represent our data as delimited files, why don't we just use LOAD DATA INPATH and load it right into the final, partitioned Parquet table in one step, bypassing the temp table? Are there any advantages to having a tem