Marcin is correct: either split the gzip file into smaller files of at least 
one HDFS block each, or use bzip2 with block compression, which keeps the 
files splittable.
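
Something along these lines should work (a rough sketch, not a tested recipe: 
the 256 MB chunk size, the HDFS paths, and GNU split's -C option for keeping 
whole lines together are all assumptions about your environment):

# split into ~256 MB chunks of whole lines (at least one 128 MB HDFS block each)
$ gunzip -c my_table.dat.gz | split -C 256m - my_table_part_
# bzip2 uses block compression, so each part stays splittable after compressing
$ bzip2 my_table_part_*
$ hdfs dfs -mkdir -p /var/lib/txt/my_table_split
$ hdfs dfs -put my_table_part_*.bz2 /var/lib/txt/my_table_split/

Then point the staging load at the directory instead of a single file:

HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table_split' OVERWRITE INTO TABLE STAGE_my_table;

With splittable parts, the INSERT ... SELECT into the ORC table should get one 
mapper per split rather than a single mapper reading the whole 100 GB.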
What is the original format of the table?

> On 22 Jun 2016, at 01:50, Marcin Tustin <mtus...@handybook.com> wrote:
> 
> This is because a GZ file is not splittable at all. Basically, try creating 
> this from an uncompressed file, or even better, split up the file and put the 
> resulting files in a directory in hdfs/s3/whatever. 
> 
>> On Tue, Jun 21, 2016 at 7:45 PM, @Sanjiv Singh <sanjiv.is...@gmail.com> 
>> wrote:
>> Hi,
>> 
>> I have a big compressed data file, my_table.dat.gz (approx. 100 GB).
>> 
>> # load staging table STAGE_my_table from file my_table.dat.gz
>> 
>> HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.gz' OVERWRITE INTO TABLE 
>> STAGE_my_table;
>> 
>> # insert into ORC table "my_table"
>> 
>> HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
>> ....
>> INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
>> ....
>> 
>> 
>> The insert into the ORC table has been running for 5-6 hours now. Everything 
>> seems to be running sequentially, with a single mapper reading the complete file. 
>> 
>> Please suggest how I can improve the ORC table load.
>> 
>> 
>> 
>> 
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
> 
