Thanks a lot for the valuable response, Bejoy. Actually, I wanted to know whether it is possible to set the size of file splits, or the criterion by which file splits are created (and thereby control the creation of mappers), for a Hive query. For example, if I want to take 'n' lines from a file as one split instead of taking each individual row, I can use NLineInputFormat. Is it possible to do something similar at Hive's level, or do I need to look into the source code?

Regards,
Mohammad Tariq


On Fri, Jun 29, 2012 at 12:37 AM, Bejoy KS <bejoy...@yahoo.com> wrote:
> Hi Mohammed
>
> Splits are associated with MapReduce framework and not necessarily with hive.
> It is the data processed by a mapper. Based on your InputFormat, min and max
> split size properties MR framework considers hdfs blocks that a mapper should
> process.( It can be just one block or more if CombineFileInputFormat is
> used.) This choice of which all hdfs blocks forms a split is determined under
> the consideration of data locality. Number of mappers/map tasks created by a
> job is equal to the number of splits thus determined. ie one map task per
> split.
>
> Hope it is clear. Feel free to revert if you still have any queries.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <donta...@gmail.com>
> Date: Fri, 29 Jun 2012 00:29:13
> To: <user@hive.apache.org>; <bejoy...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive mapper creation
>
> Hello Nitin, Bejoy,
>
> Thanks a lot for the quick response. Could you please tell me
> what is the default criterion of split creation??How the splits for a
> Hive query are created??(Pardon my ignorance).
>
> Regards,
> Mohammad Tariq
>
>
> On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <bejoy...@yahoo.com> wrote:
>> Hi Mohammed
>>
>> Internally In hive the processing is done using MapReduce. So like in
>> mapreduce the splits are calculated on job submission and a mapper is
>> assigned per split. So a mapper ideally process a split and not a row.
>>
>> You can store data in various formats as text, sequence files, RC files etc.
>> No restriction just on text files.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Mohammad Tariq <donta...@gmail.com>
>> Date: Fri, 29 Jun 2012 00:17:05
>> To: user<user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Hive mapper creation
>>
>> Hello list,
>>
>> Since Hive tables are assumed to be of text input format, is
>> it right to assume that a mapper is created per row of a particular
>> table??Please correct me if my understanding is wrong. Also let me
>> know how mappers are created corresponding to a Hive query. Many
>> thanks.
>>
>> Regards,
>> Mohammad Tariq
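
For reference, the relationship Bejoy describes between block size, the min/max split size properties, and the number of mappers can be sketched roughly as below. This is only an illustration in Python, modelled on the split arithmetic of Hadoop's new-API FileInputFormat for a single non-empty, splittable file. The function names and default bounds are assumptions made for this example, and the input format Hive is actually configured with (for instance a combining input format) may group blocks differently.

```python
# Illustrative sketch only: the names below are hypothetical, not Hive or Hadoop APIs.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def compute_split_size(block_size, min_size, max_size):
    """Clamp the HDFS block size between the min and max split size settings."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Estimate the number of splits (one map task each) for one non-empty file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    # Carve off full-size splits while the remainder is clearly more than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly slightly oversized) split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
print(count_splits(1024 * MB, 64 * MB))                      # default bounds
print(count_splits(1024 * MB, 64 * MB, max_size=32 * MB))    # smaller splits, more mappers
print(count_splits(1024 * MB, 64 * MB, min_size=128 * MB))   # larger splits, fewer mappers
```

Under these assumptions, lowering the maximum split size increases the number of map tasks and raising the minimum split size decreases it. In Hadoop releases of that era the two bounds were commonly set through `mapred.min.split.size` and `mapred.max.split.size`, which is presumably what the "min and max split size properties" above refer to.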