Did you do anything to mitigate this issue? For example, putting the data directly on HDFS, or going through Spark instead of Hive?

From: Qiuzhuang Lian [mailto:[email protected]]
Sent: 09 December 2016 04:02
To: [email protected]
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Yes, we did run into this issue too. It typically happens when the text Hive table being converted into an ORC table exceeds 100 million records.

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola <[email protected]> wrote:

Hi List,

The transformation from a textfile table to a stored ORC table takes quite a long time.

Steps followed:

1. Create one normal table using the textFile format.
2. Load the data normally into this table.
3. Create one table with the schema of the expected results of your normal Hive table, using stored as orcfile.
4. Run an insert overwrite query to copy the data from the textFile table to the orcfile table.

I have about 1,5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am currently running a standalone server, but with 25G and 14 cores on that worker.

BR
Joaquin

This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
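
For reference, the four steps in the original message could be written out roughly as follows. This is only a sketch: the table names, column list, input path, and field delimiter are hypothetical placeholders, not taken from the thread, and the helper just builds the HiveQL text so that a wide schema (such as the 550 fields mentioned) does not have to be typed twice. It uses `STORED AS ORC`, which Hive accepts as the storage clause for ORC tables.

```python
def textfile_to_orc_statements(text_table, orc_table, columns, data_path, delimiter=","):
    """Build the four HiveQL statements for the textfile-to-ORC workflow.

    columns is a list of (name, hive_type) pairs. Every name passed in
    here is a hypothetical placeholder.
    """
    # Shared column list, so both tables get exactly the same schema.
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # Step 1: plain text staging table.
        f"CREATE TABLE {text_table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE",
        # Step 2: load the raw file into the text table.
        f"LOAD DATA INPATH '{data_path}' INTO TABLE {text_table}",
        # Step 3: target table with the same schema, stored as ORC.
        f"CREATE TABLE {orc_table} (\n  {cols}\n)\nSTORED AS ORC",
        # Step 4: copy the rows across; this is the slow step in the thread.
        f"INSERT OVERWRITE TABLE {orc_table} SELECT * FROM {text_table}",
    ]


if __name__ == "__main__":
    # Hypothetical two-column schema standing in for the real 550 fields.
    example_columns = [("id", "INT"), ("name", "STRING")]
    for stmt in textfile_to_orc_statements(
        "records_txt", "records_orc", example_columns, "/tmp/records.csv"
    ):
        print(stmt + ";\n")
```

The generated statements still have to be run through Hive (or another engine that understands HiveQL); the helper does not change how long step 4 takes.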
