Thanks a lot for the valuable response, Bejoy. Actually, I wanted to
know if it is possible to set the size of file splits, or the criterion
on which file splits are created (in turn controlling the creation of
mappers), for a Hive query. For example, if I want to take 'n' lines
from a file as one split instead of taking each individual row, I can
use NLineInputFormat. Is it possible to do something similar at Hive's
level, or do I need to look into the source code?

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <bejoy...@yahoo.com> wrote:
> Hi Mohammed
>
> Splits are associated with the MapReduce framework and not necessarily with
> Hive. A split is the data processed by one mapper. Based on your InputFormat
> and the min and max split size properties, the MR framework decides which HDFS
> blocks a mapper should process. (It can be just one block, or more if
> CombineFileInputFormat is used.) The choice of which HDFS blocks form a split
> takes data locality into account. The number of mappers/map tasks created by a
> job is equal to the number of splits thus determined, i.e. one map task per
> split.
>
> Hope it is clear. Feel free to revert if you still have any queries.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <donta...@gmail.com>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <user@hive.apache.org>; <bejoy...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive mapper creation
>
> Hello Nitin, Bejoy,
>
>        Thanks a lot for the quick response. Could you please tell me
> what the default criterion of split creation is? How are the splits for a
> Hive query created? (Pardon my ignorance.)
>
> Regards,
>     Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <bejoy...@yahoo.com> wrote:
>> Hi Mohammed
>>
>> Internally, Hive does its processing using MapReduce. So, as in
>> MapReduce, the splits are calculated on job submission and a mapper is
>> assigned per split. So a mapper ideally processes a split and not a row.
>>
>> You can store data in various formats such as text, sequence files, RC
>> files, etc. Tables are not restricted to text files.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <donta...@gmail.com>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Hive mapper creation
>>
>> Hello list,
>>
>>         Since Hive tables are assumed to be of text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table? Please correct me if my understanding is wrong. Also, let me
>> know how mappers are created for a Hive query. Many
>> thanks.
>>
>> Regards,
>>     Mohammad Tariq
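
For reference, the relationship described in this thread (InputFormat
plus min and max split size properties plus HDFS block size giving the
number of splits, and so the number of map tasks) can be sketched
roughly as below. This is only an illustration in Python, modelled on
the arithmetic in Hadoop's FileInputFormat: split size =
max(minSize, min(maxSize, blockSize)), with a 10% slop on the last
split. The function names are hypothetical, and the sketch assumes a
single splittable, uncompressed file; it ignores data locality and
CombineFileInputFormat, which can pack several blocks into one split.

```python
# Rough model of FileInputFormat-style split counting (illustrative only).

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    """Split size is the block size, clamped by the min/max split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Return how many splits (one map task each) a single file would yield.

    Assumption: the file is splittable; an empty file yields no splits
    in this sketch.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = 0
    remaining = file_size
    # Carve off full-sized splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 x split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024

if __name__ == "__main__":
    # 1 GB file, 64 MB blocks, default min/max: one split per block.
    print(count_splits(1024 * MB, 64 * MB))
    # Lowering the max split size produces more, smaller splits.
    print(count_splits(1024 * MB, 64 * MB, max_size=32 * MB))
    # Raising the min split size produces fewer, larger splits.
    print(count_splits(1024 * MB, 64 * MB, min_size=128 * MB))
```

In this model, lowering the max split size increases the number of
mappers and raising the min split size decreases it; in neither case is
a mapper created per row.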
