You could create an external table at a location of your choice, declared with the row format you need, and then do an INSERT ... SELECT into that table. The files written at the table's location will then be in the format you want, and you can copy them from there.
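For example, something along these lines should work (the table name, columns, location and the tab delimiter below are only placeholders; swap in your own schema and whatever format your downstream step expects):

  -- External table whose files live at a location you control,
  -- stored as plain delimited text.
  CREATE EXTERNAL TABLE txn_export (
    txn_id     BIGINT,
    user_id    STRING,
    amount     DOUBLE,
    created_at STRING
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
  STORED AS TEXTFILE
  LOCATION '/user/hive/export/txn_export';

  -- Populate it from your source table; the files under the LOCATION
  -- above come out as tab-delimited text.
  INSERT OVERWRITE TABLE txn_export
  SELECT txn_id, user_id, amount, created_at
  FROM transactions;

Because the table is external and stored as TEXTFILE with an explicit field delimiter, the output is plain delimited text, which a step expecting TextInputFormat can read directly.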
Best Regards,
Sonal

Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Tue, Oct 15, 2013 at 6:04 AM, Sonya Ling <threecup...@gmail.com> wrote:

> Hi:
>
> Currently, our hive_to_hdfs function has two parts. The first part
> retrieves transaction records from Hive and puts them into a temporary
> file on the local file system. The second part puts that temporary file
> from the local file system into HDFS. The second part runs on the
> NameNode, outside of the Hadoop process, and takes time. I would like to
> make hive_to_hdfs write directly using
>
> INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...
>
> I did speed up the process using the direct write above. However, I found
> that the step that follows cannot process the generated data because of
> its unexpected format. That step expects TextInputFormat. I checked the
> Hive Language Manual. It says:
>
> "Data written to the filesystem is serialized as text with columns
> separated by ^A and rows separated by newlines. If any of the columns are
> not of primitive type, then those columns are serialized to JSON format."
>
> How can I make them compatible? It does not look like I have a way to
> change the default format generated. What can I set InputFormat to so
> that it is compatible?
>
> Thanks.
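(For reference, the direct write described in the quoted message would look roughly like the sketch below; the directory path and table name are made up. As the Language Manual passage you quote says, the columns in those files are separated by ^A by default, which is what the next step then sees.)

  -- Direct write from Hive to a directory (hypothetical path and table).
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_to_hdfs_out'
  SELECT *
  FROM transactions;

  -- By default the files under /tmp/hive_to_hdfs_out are text with rows
  -- terminated by newlines and columns separated by ^A (\001).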