This is because a GZ file is not splittable at all, so a single mapper ends up
reading the entire compressed file. Try loading from an uncompressed file
instead, or even better, split the file into multiple parts and put them in a
directory in HDFS/S3/whatever store you are using.
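
Something like this, as a rough sketch only: the chunk size, the HDFS staging
directory and the part-file names below are placeholders I made up, and it
assumes STAGE_my_table is a plain TEXTFILE staging table.

  # decompress and split into line-bounded ~1 GB chunks on an edge node
  gunzip -c my_table.dat.gz | split -C 1G -d - my_table.part.
  # copy the parts into a staging directory in HDFS (path is illustrative)
  hdfs dfs -mkdir -p /tmp/my_table_parts
  hdfs dfs -put my_table.part.* /tmp/my_table_parts/

  HIVE>> LOAD DATA INPATH '/tmp/my_table_parts' OVERWRITE INTO TABLE STAGE_my_table;
  HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;

With the data in several uncompressed (or splittable-compressed) parts, the
INSERT ... SELECT should get roughly one mapper per split instead of one
mapper chewing through the whole 100 GB gzip.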

On Tue, Jun 21, 2016 at 7:45 PM, @Sanjiv Singh <sanjiv.is...@gmail.com>
wrote:

> Hi ,
>
> I have a big compressed data file, my_table.dat.gz (approx. size 100 GB).
>
> # load staging table STAGE_my_table from file my_table.dat.gz
>
> HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.gz' OVERWRITE INTO
> TABLE STAGE_my_table;
>
> # insert into ORC table "my_table"
>
> HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
> ....
> INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
> ....
>
>
> The insert into the ORC table has been running for 5-6 hours now. It seems
> everything is going sequentially, with one mapper reading the complete file.
>
> Please suggest how I can improve the ORC table load.
>
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>

