Ok Bejoy. I'll proceed as directed by you and get back to you in case of any difficulty. Thanks again for the help.

Regards,
Mohammad Tariq

On Fri, Jun 29, 2012 at 12:59 AM, Bejoy KS <[email protected]> wrote:
> Hi Mohammed
>
> If it is to control the split size, and thereby the number of map tasks, you
> just need to play with the min and max split size properties.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <[email protected]>
> Date: Fri, 29 Jun 2012 00:55:54
> To: <[email protected]>; <[email protected]>
> Reply-To: [email protected]
> Subject: Re: Hive mapper creation
>
> Thanks a lot for the valuable response, Bejoy. Actually I wanted to
> know if it is possible to set the size of file splits, or the criterion
> on which file splits are created (in turn controlling the creation of
> mappers), for a Hive query. For example, if I want to take 'n' lines
> from a file as one split instead of taking each individual row, I can
> use NLineInputFormat. Is it possible to do something similar at Hive's
> level, or do I need to look into the source code?
>
> Regards,
> Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <[email protected]> wrote:
>> Hi Mohammed
>>
>> Splits belong to the MapReduce framework and not specifically to
>> Hive. A split is the chunk of data processed by one mapper. Based on your
>> InputFormat and the min and max split size properties, the MR framework
>> decides which HDFS blocks a mapper should process. (It can be just one
>> block, or more if CombineFileInputFormat is used.) The choice of which
>> HDFS blocks form a split takes data locality into consideration. The
>> number of mappers/map tasks created by a job is equal to the number of
>> splits thus determined, i.e. one map task per split.
>>
>> Hope it is clear. Feel free to revert if you still have any queries.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <[email protected]>
>> Date: Fri, 29 Jun 2012 00:29:13
>> To: <[email protected]>; <[email protected]>
>> Reply-To: [email protected]
>> Subject: Re: Hive mapper creation
>>
>> Hello Nitin, Bejoy,
>>
>> Thanks a lot for the quick response. Could you please tell me
>> what the default criterion for split creation is? How are the splits
>> for a Hive query created? (Pardon my ignorance.)
>>
>> Regards,
>> Mohammad Tariq
>>
>>
>> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <[email protected]> wrote:
>>> Hi Mohammed
>>>
>>> Internally, Hive does its processing using MapReduce. So, as in
>>> MapReduce, the splits are calculated on job submission and a mapper is
>>> assigned per split. So a mapper ideally processes a split and not a row.
>>>
>>> You can store data in various formats such as text, sequence files, RC
>>> files, etc. There is no restriction to text files only.
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: Mohammad Tariq <[email protected]>
>>> Date: Fri, 29 Jun 2012 00:17:05
>>> To: user<[email protected]>
>>> Reply-To: [email protected]
>>> Subject: Hive mapper creation
>>>
>>> Hello list,
>>>
>>> Since Hive tables are assumed to be of text input format, is
>>> it right to assume that a mapper is created per row of a particular
>>> table? Please correct me if my understanding is wrong. Also let me
>>> know how mappers are created corresponding to a Hive query. Many
>>> thanks.
>>>
>>> Regards,
>>> Mohammad Tariq
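
For anyone reading this thread later, here is a rough sketch of how the min and max split size properties discussed above can translate into a mapper count. It is not Hive or Hadoop code. It assumes the rule used by Hadoop's newer FileInputFormat, as far as I recall it: split size = max(min size, min(max size, block size)), with the last split allowed to run about 10% over. The function names are made up for illustration, and the sketch assumes a single splittable, non-empty file. Hive's real behaviour depends on the input format it is configured with (a combining input format can pack several blocks or files into one split), and the property names vary by Hadoop version (in releases of this era they were commonly mapred.min.split.size and mapred.max.split.size), so please check against your own setup.

```python
MB = 1024 * 1024

# Assumption: Hadoop lets the final split be up to 10% larger than the
# target size instead of creating a tiny extra split.
SPLIT_SLOP = 1.1


def compute_split_size(block_size, min_size, max_size):
    """Target split size: the block size clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Estimate the number of splits (one map task each) for one splittable,
    non-empty file. Hypothetical helper, not part of any Hadoop or Hive API."""
    split_size = compute_split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 x split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits
```

With a 1 GB file and 64 MB blocks, count_splits(1024 * MB, 64 * MB) gives 16. Lowering the max split size to 32 MB doubles that to 32, and raising the min split size to 128 MB halves it to 8, which is the sense in which "playing with" the two properties controls the number of map tasks.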
