Hi,
I apologize for the wide distribution and if this is not the right mailing
list for this.
We write Avro files to Parquet and load them to HDFS so they can be
accessed via an EXTERNAL Hive table. These records have two timestamp
fields which are expressed in the Avro schema as type = long and [...]

And how would it match content_type='Presentation' -- would
the file just need to be named "Presentation"?
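(For what it's worth: with LOAD DATA into a partitioned table, the partition
values normally come from an explicit PARTITION clause rather than from the
file name. A minimal sketch, with the path and values invented for
illustration, against the table defined at the bottom of this thread:

-- the partition comes from the PARTITION clause; the file names themselves do not matter
LOAD DATA INPATH '/landing/2017-04-06/presentations'
INTO TABLE db.mytable
PARTITION (`date`='2017-04-06', content_type='Presentation');

The loaded files simply end up under that partition's directory in the table
location.)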
On Thu, Apr 6, 2017 at 5:05 PM, Dmitry Goldenberg wrote:
> >> [...] properly split and partition your data before using LOAD if you want
> >> Hive to be able to find it again.
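(To make that concrete: each partition of a table maps to its own
subdirectory under the table's location, and that is where Hive looks for the
partition's files. A hypothetical layout for the table quoted at the bottom
of this thread, assuming the default warehouse path:

/user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Presentation/part-00000.parquet
/user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Video/part-00001.parquet

A LOAD DATA ... PARTITION statement fills exactly one such directory, so the
source files have to be split along those partition boundaries beforehand.)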
> [...] a preprocessing step to convert
> the data from delimited text to Hive-compatible Parquet, I don't see a
> reason to use any tool OTHER than Hive to perform that conversion.
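(A sketch of doing that conversion entirely in Hive, with the staging table
name, location, and delimiter made up for the example: expose the delimited
files through a TEXTFILE table, then let an INSERT rewrite them as Parquet
into the partitioned table quoted at the bottom of this thread:

CREATE EXTERNAL TABLE db.mytable_staging (
  `item_id` string,
  `timestamp` string,
  `item_comments` string,
  `date` string,
  `content_type` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/landing/mytable_text';

-- let the partition values come from the SELECT itself (dynamic partitioning)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- partition columns go last in the SELECT, matching the PARTITION clause
INSERT INTO TABLE db.mytable PARTITION (`date`, `content_type`)
SELECT item_id, `timestamp`, item_comments, `date`, content_type
FROM db.mytable_staging;

The output is written as Parquet simply because that is how db.mytable is
declared.)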
>
> LOAD DATA is generally used in situations where you **know** that the
> data format is already 100% exactly compatible with your destination
> table… which most often occurs when the source of the data is the raw data
> backing an existing Hive managed table (possibly [...]
> [...] files to the duplicate cluster using
> LOAD DATA to ensure the metadata is recorded in the Hive metastore.
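(A hedged sketch of that duplicate-cluster pattern, with namenode addresses
and paths invented for the example: copy the raw partition directory across,
then let LOAD DATA register it on the target side:

# copy the raw files from the source cluster to a staging path on the duplicate cluster
hadoop distcp \
  hdfs://prod-nn:8020/user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Presentation \
  hdfs://dr-nn:8020/staging/mytable/date=2017-04-04/content_type=Presentation

-- then, in Hive on the duplicate cluster: move the files into the table and record the partition
LOAD DATA INPATH '/staging/mytable/date=2017-04-04/content_type=Presentation'
INTO TABLE db.mytable
PARTITION (`date`='2017-04-04', content_type='Presentation');

Without the LOAD (or an ALTER TABLE ... ADD PARTITION / MSCK REPAIR TABLE),
the copied files would not be visible to Hive on that cluster.)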
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> *Sent:* Tuesday, April 04, 2017 3:31 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Is it possible to use LOAD DATA INPATH [...]
> [...] from their
> current HDFS location to the location defined for the table.
>
> Later on when you query the table the files will be scanned. If they are
> in the right format you'll get results. If not, then no.
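(If it helps to verify the move: the table and partition locations can be
checked, and the files listed afterwards; a small sketch, assuming the
default warehouse path:

-- shows the Location: field for the table and for one partition
DESCRIBE FORMATTED db.mytable;
DESCRIBE FORMATTED db.mytable PARTITION (`date`='2017-04-04', content_type='Presentation');

# after the LOAD, the files should appear under that location
hdfs dfs -ls /user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Presentation/

The source INPATH directory is left empty, since LOAD DATA INPATH moves the
files rather than copying them.)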
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> · LOAD DATA INPATH is just a file moving operation within the HDFS.
> You can achieve the same results by using hdfs dfs -mv …
>
> · LOAD DATA LOCAL INPATH is just a file copying operation from the
> shell to the HDFS.
> You can achieve the same results by using hdfs dfs -put …
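(Spelled out side by side, with placeholder paths; the practical difference is
that the LOAD statements also update the Hive metastore, while the raw hdfs
commands only move bytes:

# file already in HDFS: roughly an HDFS move
hdfs dfs -mv /landing/f0.parquet \
  /user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Presentation/

LOAD DATA INPATH '/landing/f0.parquet'
INTO TABLE db.mytable PARTITION (`date`='2017-04-04', content_type='Presentation');

# file on the local filesystem: roughly an HDFS put
hdfs dfs -put /tmp/f0.parquet \
  /user/hive/warehouse/db.db/mytable/date=2017-04-04/content_type=Presentation/

LOAD DATA LOCAL INPATH '/tmp/f0.parquet'
INTO TABLE db.mytable PARTITION (`date`='2017-04-04', content_type='Presentation');

For a partitioned table the LOAD also creates the partition in the metastore
if it does not exist yet, which the plain hdfs commands never do.)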
>
>
> From: Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> Sent: Tuesday, April 04, 2017 [...]
[...] a PARTITIONED table in Hive.
Thanks,
- Dmitry
On Tue, Apr 4, 2017 at 12:20 PM, Markovitz, Dudu wrote:
> Are your files already in Parquet format?
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> *Sent:* Tuesday, April 04, 2017 7:03 PM
> *To:* user@hive.apache.org
> [...] an interface to read the
> files and use it in an INSERT operation.
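(One possible reading of that, assuming purely for illustration that the
source files are Avro, as the very first snippet on this page mentions: put
an external table over them and let an INSERT ... SELECT rewrite the data
into the Parquet table:

CREATE EXTERNAL TABLE db.mytable_src (
  `item_id` string,
  `timestamp` string,
  `item_comments` string)
STORED AS AVRO
LOCATION '/landing/mytable_avro';

INSERT INTO TABLE db.mytable
PARTITION (`date`='2017-04-04', content_type='Presentation')
SELECT item_id, `timestamp`, item_comments
FROM db.mytable_src;

The external table is only the "interface" for reading the source files; the
INSERT then writes the rows back out in the destination table's Parquet
format.)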
>
>
>
> Dudu
>
>
>
> *From:* Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com]
> *Sent:* Tuesday, April 04, 2017 4:52 PM
> *To:* user@hive.apache.org
> *Subject:* Is it possible to use LOAD DATA INPATH [...]
We have a table such as the following defined:
CREATE TABLE IF NOT EXISTS db.mytable (
`item_id` string,
`timestamp` string,
`item_comments` string)
PARTITIONED BY (`date` string, `content_type` string)
STORED AS PARQUET;
Currently we insert data into this PARQUET, PARTITIONED table as follows,
using an [...]