You could create an external table at a location of your choice, declared with the row format you need, and then do an INSERT ... SELECT into that table. The files written at the table's location will then be in the format you want, and you can copy them from there.
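For example, something along these lines should work (the table name, columns, location and the tab delimiter below are only placeholders; swap in your own schema and whatever format your downstream step expects):

  -- External table whose files live at a location you control,
  -- stored as plain delimited text.
  CREATE EXTERNAL TABLE txn_export (
    txn_id     BIGINT,
    user_id    STRING,
    amount     DOUBLE,
    created_at STRING
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
  STORED AS TEXTFILE
  LOCATION '/user/hive/export/txn_export';

  -- Populate it from your source table; the files under the LOCATION
  -- above come out as tab-delimited text.
  INSERT OVERWRITE TABLE txn_export
  SELECT txn_id, user_id, amount, created_at
  FROM transactions;

Because the table is external and stored as TEXTFILE with an explicit field delimiter, the output is plain delimited text, which a step expecting TextInputFormat can read directly.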
Best Regards,
Sonal

Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Tue, Oct 15, 2013 at 6:04 AM, Sonya Ling <threecup...@gmail.com> wrote:

> Hi:
>
> Currently, our hive_to_hdfs function has two parts. The first part
> retrieves transaction records from Hive and puts them into a temporary
> file on the local file system. The second part puts that temporary file
> from the local file system into HDFS. The second part runs on the
> NameNode, outside of the Hadoop process, and takes time. I would like to
> make hive_to_hdfs write directly using
>
> INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...
>
> I did speed up the process using the direct write above. However, I found
> that the step that follows cannot process the generated data because of
> its unexpected format. That step expects TextInputFormat. I checked the
> Hive Language Manual. It says:
>
> "Data written to the filesystem is serialized as text with columns
> separated by ^A and rows separated by newlines. If any of the columns are
> not of primitive type, then those columns are serialized to JSON format."
>
> How can I make them compatible? It does not look like I have a way to
> change the default format generated. What can I set InputFormat to so
> that it is compatible?
>
> Thanks.
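(For reference, the direct write described in the quoted message would look roughly like the sketch below; the directory path and table name are made up. As the Language Manual passage you quote says, the columns in those files are separated by ^A by default, which is what the next step then sees.)

  -- Direct write from Hive to a directory (hypothetical path and table).
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_to_hdfs_out'
  SELECT *
  FROM transactions;

  -- By default the files under /tmp/hive_to_hdfs_out are text with rows
  -- terminated by newlines and columns separated by ^A (\001).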