Re: only one mapper

Rajesh Balamohan Wed, 21 Aug 2013 20:00:55 -0700

Create the LZO index after moving the file to the Hive directory (i.e. after
executing your LOAD DATA statement). The index file is needed only during job
execution, and if it is not present in the same directory as the .lzo file,
the large file will not be split.
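In other words, the index is only picked up when it sits next to the data file and is named after it (data_rowkey.lzo and data_rowkey.lzo.index). One way to catch the situation in this thread is to list the table directory and flag every .lzo file that has no sibling index. The following is a minimal Python sketch that assumes the directory is reachable as a local path (on HDFS you would list it with `hdfs dfs -ls` or an HDFS client instead); the function name find_unindexed_lzo and the temporary directory layout are hypothetical.

```python
import os
import tempfile


def find_unindexed_lzo(directory):
    """Return the .lzo files in `directory` that have no sibling .lzo.index.

    The index is looked up as <file>.lzo.index in the same directory, so an
    index left behind somewhere else is simply not found.
    """
    names = set(os.listdir(directory))
    return sorted(
        name for name in names
        if name.endswith(".lzo") and name + ".index" not in names
    )


# Simulate the thread's situation locally: LOAD DATA moved the .lzo file
# into the table directory, but the index stayed in the staging directory.
with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "data_split")
    table_dir = os.path.join(root, "data_zh")
    os.makedirs(staging)
    os.makedirs(table_dir)
    open(os.path.join(staging, "data_rowkey.lzo.index"), "wb").close()
    open(os.path.join(table_dir, "data_rowkey.lzo"), "wb").close()
    print(find_unindexed_lzo(table_dir))  # ['data_rowkey.lzo']
```

If a file is flagged, re-running the indexer against the file's new location (for example hadoop-lzo's LzoIndexer or DistributedLzoIndexer) after LOAD DATA should let the job split it again.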



On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 <[email protected]> wrote:

> In Hive I use SET mapreduce.input.fileinputformat.split.maxsize=134217728;
> but it has no effect. I found that when I use
>
> LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh
>
> the HDFS data is moved to the Hive directory (the table was created with
> CREATE EXTERNAL TABLE), but data_rowkey.lzo.index still exists in the HDFS
> /data_split/ directory. So the data moves to the Hive directory while the
> index file stays in the original HDFS directory; they are not in the same
> directory.
>
>
> 2013/8/22 Sanjay Subramanian <[email protected]>
>
>>  Hi
>>
>>  Try this setting in your Hive query:
>>
>>  SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;
>>
>>  If you set this value low, the MR job will use this size to split
>> the input LZO files and you will get multiple mappers (make sure the
>> input LZO files are indexed, i.e. the .lzo.index files are created).
>>
>>  sanjay
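Why a lower maxsize yields more mappers: for a splittable file, Hadoop's FileInputFormat picks a split size of max(minSize, min(maxSize, blockSize)) and carves the file into pieces of that size, allowing the last piece to run up to 10% over. The Python sketch below reproduces only that arithmetic, using the input size from the job log in the original question. It assumes the file is splittable (for LZO that means indexed, with boundaries then snapped to compressed block starts) and ignores any split combining Hive may do; plan_splits and compute_split_size are hypothetical names.

```python
# Sketch of the split-size arithmetic in Hadoop's FileInputFormat.
# Assumes the file is splittable (e.g. an indexed LZO file).

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    # Same rule as FileInputFormat.computeSplitSize
    return max(min_size, min(max_size, block_size))


def plan_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Return (offset, length) pairs covering one splittable file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_size
    # Carve full-size splits while more than SPLIT_SLOP splits' worth remains
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most 1.1 * split_size) becomes the final split
    if remaining > 0:
        splits.append((file_size - remaining, remaining))
    return splits


INPUT_SIZE = 2304560827  # "Input Size" from the job log
BLOCK_SIZE = 128 * 1024 * 1024

print(len(plan_splits(INPUT_SIZE, BLOCK_SIZE)))  # 18
print(len(plan_splits(INPUT_SIZE, BLOCK_SIZE, max_size=64 * 1024 * 1024)))  # 35
```

With the default (effectively unlimited) maxsize the block size wins, giving the 18 splits the poster expected; setting maxsize below the block size is what pushes the count higher.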
>>
>>
>>   From: Edward Capriolo <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, August 21, 2013 10:43 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: only one mapper
>>
>>   LZO files are only splittable if you index them. Sequence files
>> compressed with LZO are splittable without being indexed.
>>
>>  Snappy + SequenceFile is a better option than LZO.
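The reason a plain .lzo file needs an index: an LZO stream can only be decoded starting at a compressed block boundary, so a reader handed an arbitrary byte range has to move its start and end to recorded block offsets. Without an index the only safe boundary is offset 0, so the whole file becomes one split whatever maxsize is set to, which matches the one-mapper symptom when the index sits in a different directory. The sketch below is a simplified Python model of that alignment, not hadoop-lzo's actual code; block_offsets stands in for the contents of the .lzo.index file, and align_splits is a hypothetical name.

```python
import bisect


def align_splits(splits, block_offsets, file_size):
    """Snap (start, end) byte ranges to LZO block boundaries.

    block_offsets models the positions stored in file.lzo.index;
    None means no index was found next to the file.
    """
    if block_offsets is None:
        # No index: decoding can only begin at offset 0, so one split.
        return [(0, file_size)]
    offsets = sorted(block_offsets)

    def next_block(pos):
        # First block start at or after pos, or None past the last block.
        i = bisect.bisect_left(offsets, pos)
        return offsets[i] if i < len(offsets) else None

    aligned = []
    for start, end in splits:
        if start != 0:
            start = next_block(start)
            if start is None or start >= end:
                continue  # no block begins in this range; drop the split
        new_end = next_block(end)
        aligned.append((start, file_size if new_end is None else new_end))
    return aligned


ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
print(align_splits(ranges, [0, 30, 60, 90], 100))
# [(0, 30), (30, 60), (60, 90), (90, 100)]
print(align_splits(ranges, [0, 60], 100))
# [(0, 60), (60, 100)]
print(align_splits(ranges, None, 100))
# [(0, 100)]
```

Because each end and the following start are snapped to the same "next block" position, the aligned ranges stay contiguous and never overlap; ranges that contain no block start simply disappear, so sparse blocks mean fewer mappers than planned splits.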
>>
>>
>> On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <[email protected]> wrote:
>>
>>>  LZO files are combinable, so check your max split setting.
>>>
>>> http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%[email protected]%3E
>>>
>>>  igor
>>> decide.com
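"Combinable" refers to split combining: when Hive reads through CombineHiveInputFormat (selected via hive.input.format), adjacent splits are packed together until they reach the max split size, so with no limit a whole file can land in a single mapper. The Python sketch below models only that size rule (keep adding chunks until the running total reaches the limit, then start a new combined split); the real CombineFileInputFormat also groups chunks by node and rack, which is ignored here. combine_splits is a hypothetical name, and the chunks are the 128 MB pieces of the 2304560827-byte input from the original question.

```python
def combine_splits(split_lengths, max_split_size=None):
    """Pack consecutive split lengths into combined splits.

    max_split_size=None models an unset limit. Node/rack locality,
    which the real CombineFileInputFormat also considers, is ignored.
    """
    combined = []
    current = 0
    for length in split_lengths:
        current += length
        # Close the combined split once it reaches the limit
        if max_split_size is not None and current >= max_split_size:
            combined.append(current)
            current = 0
    if current:
        combined.append(current)  # leftover chunks form the last split
    return combined


MB = 1024 * 1024
chunks = [128 * MB] * 17 + [22859451]  # the 2304560827-byte input in 128 MB pieces

print(len(combine_splits(chunks)))  # 1
print(len(combine_splits(chunks, 128 * MB)))  # 18
print(len(combine_splits(chunks, 256 * MB)))  # 9
```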
>>>
>>>
>>>
>>> On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <[email protected]> wrote:
>>>
>>>>  Hi all, when I use Hive, the job launches only one mapper, although my
>>>> file is split into 18 blocks (block size 128 MB, data size about 2 GB).
>>>> I use LZO compression: I created file.lzo and built the index
>>>> file.lzo.index. I use Hive 0.10.0.
>>>>
>>>>  Total MapReduce jobs = 1
>>>> Launching Job 1 out of 1
>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>> Cannot run job locally: Input Size (= 2304560827) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
>>>> Starting Job = job_1377071515613_0003, Tracking URL = http://hydra0001:8088/proxy/application_1377071515613_0003/
>>>> Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill job_1377071515613_0003
>>>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
>>>> 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
>>>> 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
>>>> 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
>>>> 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
>>>> 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
>>>> 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
>>>> 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.0 sec
>>>>
>>>>  --
>>>>
>>>> In the Hadoop world I am just a novice exploring the entire Hadoop
>>>> ecosystem; I hope one day I can contribute my own code.
>>>>
>>>> YanBit
>>>> [email protected]
>>>>
>>>>
>>>
>>
>>
>


-- 
~Rajesh.B
