Thanks Marcin, it worked. I uncompressed the file and then loaded it into
the Hive table.

Now it's quick, just a few minutes.
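
Roughly, these were the steps (a sketch; exact paths as in my earlier mail
below):

$ gunzip my_table.dat.gz
$ hdfs dfs -put my_table.dat /var/lib/txt/
HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat' OVERWRITE INTO TABLE STAGE_my_table;
HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;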




Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jun 22, 2016 at 7:44 AM, Jörn Franke <jornfra...@gmail.com> wrote:

>
>
> Marcin is correct: either split up the gzip file into smaller files of at
> least one HDFS block each, or use bzip2 with block compression.
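>
> A rough sketch of the bzip2 route (bzip2 output is splittable, so Hive
> can schedule several mappers over a single file; names here are
> illustrative):
>
> $ gunzip -c my_table.dat.gz | bzip2 > my_table.dat.bz2
> $ hdfs dfs -put my_table.dat.bz2 /var/lib/txt/
>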
> What is the original format of the table?
>
> On 22 Jun 2016, at 01:50, Marcin Tustin <mtus...@handybook.com> wrote:
>
> This is because a GZ file is not splittable at all. Basically, try
> creating this from an uncompressed file, or even better, split up the file
> and put the pieces in a directory in hdfs/s3/whatever.
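>
> For example (split size and paths are illustrative; split -C cuts only at
> line boundaries, so rows stay intact):
>
> $ gunzip -c my_table.dat.gz | split -C 256m - my_table.part.
> $ hdfs dfs -mkdir -p /var/lib/txt/my_table
> $ hdfs dfs -put my_table.part.* /var/lib/txt/my_table/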
>
> On Tue, Jun 21, 2016 at 7:45 PM, @Sanjiv Singh <sanjiv.is...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a big compressed data file *my_table.dat.gz* (approx. 100 GB).
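>>
>> For context, the tables look roughly like this (a sketch; the column
>> list is illustrative, not the actual schema):
>>
>> HIVE>> CREATE TABLE STAGE_my_table (id INT, val STRING)
>>        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>>        STORED AS TEXTFILE;
>> HIVE>> CREATE TABLE my_table (id INT, val STRING) STORED AS ORC;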
>>
>> # load staging table *STAGE_my_table* from file *my_table.dat.gz*
>>
>> HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.gz' OVERWRITE INTO TABLE STAGE_my_table;
>>
>> # insert into ORC table *my_table*
>>
>> HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
>> ....
>> INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
>> ....
>>
>>
>> The insert into the ORC table has been running for 5-6 hours. Everything
>> seems to run sequentially, with one mapper reading the complete file?
>>
>> Any suggestions? Please help me improve the ORC table load.
>>
>>
>>
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>
>
