Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

Dmitry Goldenberg Tue, 04 Apr 2017 09:48:05 -0700

Dudu,

This is still in design stages, so we have a way to get the data from its
source. The data is *not* in the Parquet format.  It's up to us to format
it the best and most efficient way.  We can roll with CSV or Parquet;
ultimately the data must make it into a pre-defined PARQUET, PARTITIONED
table in Hive.
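
For the CSV route, a minimal sketch of the external-table-plus-INSERT approach suggested further down the thread might look like the following. The staging table name (db.mytable_staging), the LOCATION path, the STRING types for the partition columns, and the simple comma-delimited layout (no quoted or embedded commas) are all assumptions for illustration, not something settled in this thread. SELECTED_DATE is assumed to be passed in with --hivevar.

```sql
-- Hypothetical external table: just a read interface over the raw delimited
-- files. Dropping it later does not delete the files.
CREATE EXTERNAL TABLE IF NOT EXISTS db.mytable_staging (
  item_id        STRING,
  `timestamp`    STRING,
  item_comments  STRING,
  `date`         STRING,   -- assumed type
  content_type   STRING)   -- assumed type
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/staging/mytable_csv';   -- hypothetical path

-- Allow every partition column to be resolved dynamically from the SELECT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The INSERT rewrites the rows as Parquet. The partition columns must come
-- last in the SELECT list, in the same order as in the PARTITION clause.
INSERT INTO TABLE db.mytable PARTITION (`date`, content_type)
SELECT item_id, `timestamp`, item_comments, `date`, content_type
FROM db.mytable_staging
WHERE `date` = '${SELECTED_DATE}';
```

The external table holds no copy of the data; the only conversion to Parquet happens in the INSERT.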


Thanks,
- Dmitry

On Tue, Apr 4, 2017 at 12:20 PM, Markovitz, Dudu <dmarkov...@paypal.com>
wrote:

> Are your files already in Parquet format?
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> *Sent:* Tuesday, April 04, 2017 7:03 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Is it possible to use LOAD DATA INPATH with a PARTITIONED,
> STORED AS PARQUET table?
>
>
>
> Thanks, Dudu.
>
>
>
> Just to reiterate: the way I'm reading your response is that yes, we can
> use LOAD INPATH for a PARQUET, PARTITIONED table, provided that the data in
> the delimited file is properly formatted.  Then we can LOAD it into the
> table (mytable in my example) directly and avoid the creation of the temp
> table (origtable in my example).  Correct so far?
>
>
>
> I did not quite follow the latter part of your response:
>
> >> You should only create an external table which is an interface to read
> the files and use it in an INSERT operation.
>
>
>
> My assumption was that we would LOAD INPATH and not have to use INSERT
> at all.  Am I missing something in grokking this latter part of your
> response?
>
>
>
> Thanks,
>
> - Dmitry
>
>
>
> On Tue, Apr 4, 2017 at 11:26 AM, Markovitz, Dudu <dmarkov...@paypal.com>
> wrote:
>
> Since LOAD DATA INPATH only moves files, the answer is very simple.
>
> If your files are already in a format that matches the destination table
> (storage type, number and types of columns, etc.), then yes; if not,
> then no.
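
As a hedged illustration of the "yes" case: if the files are already Parquet and contain exactly the table's non-partition columns, a LOAD with a fully specified static partition might look like this. The staging path and the partition values are hypothetical, and the partition columns are assumed to be strings.

```sql
-- Assumes the (hypothetical) directory below holds Parquet files whose schema
-- is exactly (item_id, timestamp, item_comments). LOAD only moves the files
-- into the partition's location; it does not parse or convert them.
LOAD DATA INPATH '/staging/mytable/2017-04-04/article'
INTO TABLE db.mytable
PARTITION (`date` = '2017-04-04', content_type = 'article');
```

Every partition column needs an explicit value here, so each (date, content_type) combination would need its own directory and its own LOAD statement.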
>
>
>
> But –
>
> You don’t need to load the files into an intermediary table.
>
> You should only create an external table which is an interface to read the
> files and use it in an INSERT operation.
>
>
>
> Dudu
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> *Sent:* Tuesday, April 04, 2017 4:52 PM
> *To:* user@hive.apache.org
> *Subject:* Is it possible to use LOAD DATA INPATH with a PARTITIONED,
> STORED AS PARQUET table?
>
>
>
> We have a table such as the following defined:
>
> CREATE TABLE IF NOT EXISTS db.mytable (
>   `item_id` string,
>   `timestamp` string,
>   `item_comments` string)
> PARTITIONED BY (`date` string, `content_type` string)
> STORED AS PARQUET;
>
> Currently we insert data into this PARQUET, PARTITIONED table as follows,
> using an intermediary table:
>
> INSERT INTO TABLE db.mytable PARTITION(date, content_type)
> SELECT itemid as item_id, itemts as timestamp, date, content_type
> FROM db.origtable
> WHERE date = "${SELECTED_DATE}"
> GROUP BY item_id, date, content_type;
>
> Our question is, would it be possible to use the LOAD DATA INPATH.. INTO
> TABLE syntax to load the data from delimited data files into 'mytable'
> rather than populating mytable from the intermediary table?
>
>
>
> I see in the Hive documentation that:
>
> * Load operations are currently pure copy/move operations that move
> datafiles into locations corresponding to Hive tables.
>
> * If the table is partitioned, then one must specify a specific partition
> of the table by specifying values for all of the partitioning columns.
>
>
>
> This seems to indicate that using LOAD is possible; however, looking at
> this discussion, perhaps not:
> http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables
>
>
>
> We'd like to understand if using LOAD in the case of PARQUET, PARTITIONED
> tables is possible and if so, then how does one go about using LOAD in that
> case?
>
>
>
> Thanks,
>
> - Dmitry
>
>
>
>
>
