Unless the argument (args[0]) to your job is a comma-separated set of paths, you are only adding a single input path. It may be that you want to pass args rather than args[0]:

    FileInputFormat.setInputPaths(c, args[0]);
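As a quick illustration of the point (a plain-Java sketch with no Hadoop dependency; `toPaths` is a hypothetical helper that mimics how the String overload of `FileInputFormat.setInputPaths` splits its argument on commas), a single args[0] only yields multiple input paths when the string itself contains commas:

```java
// Sketch: why one args[0] String usually means one input path.
// toPaths is a stand-in for the comma-splitting that the String
// overload of FileInputFormat.setInputPaths performs.
public class InputPathsDemo {
    static String[] toPaths(String commaSeparated) {
        return commaSeparated.split(",");
    }

    public static void main(String[] args) {
        String single = "/user/nguyen/maps";                      // one path
        String multiple = "/user/nguyen/maps,/user/nguyen/more";  // two paths

        System.out.println(toPaths(single).length);    // 1
        System.out.println(toPaths(multiple).length);  // 2
    }
}
```

So if the job is launched with a single directory argument, only that one directory is added, no matter how many arguments follow it on the command line.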
On Thu, Apr 23, 2009 at 7:10 PM, nguyenhuynh.mr <[email protected]> wrote:

> Edward J. Yoon wrote:
>
>> As far as I know, FileInputFormat.getSplits() returns the number of
>> splits automatically computed from the number of files and blocks.
>> BTW, what version of Hadoop/HBase?
>>
>> I tried to test that code (http://wiki.apache.org/hadoop/Hbase/MapReduce)
>> on my cluster (Hadoop 0.19.1 and HBase 0.19.0). The number of input
>> paths was 2, and there were 274 map tasks.
>>
>> Below is my changed code for v0.19.0.
>> ---
>> public JobConf createSubmittableJob(String[] args) {
>>   JobConf c = new JobConf(getConf(), TestImport.class);
>>   c.setJobName(NAME);
>>   FileInputFormat.setInputPaths(c, args[0]);
>>   c.set("input.table", args[1]);
>>   c.setMapperClass(InnerMap.class);
>>   c.setNumReduceTasks(0);
>>   c.setOutputFormat(NullOutputFormat.class);
>>   return c;
>> }
>>
>> On Thu, Apr 23, 2009 at 6:19 PM, nguyenhuynh.mr <[email protected]> wrote:
>>
>>> Edward J. Yoon wrote:
>>>
>>>> How do you add input paths?
>>>>
>>>> On Wed, Apr 22, 2009 at 5:09 PM, nguyenhuynh.mr <[email protected]> wrote:
>>>>
>>>>> Edward J. Yoon wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In that case, the atomic unit of a split is a file, so you need to
>>>>>> increase the number of files, or use TextInputFormat as below:
>>>>>>
>>>>>>   jobConf.setInputFormat(TextInputFormat.class);
>>>>>>
>>>>>> On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> I have an MR job used to import content into HBase.
>>>>>>>
>>>>>>> The content is a text file in HDFS. I use map files to store the
>>>>>>> local paths of the content. Each content item has a map file (the
>>>>>>> map is a text file in HDFS containing one line of info).
>>>>>>>
>>>>>>> I created a "maps" directory to hold the map files, and this maps
>>>>>>> directory is used as the input path for the job.
>>>>>>>
>>>>>>> When I run the job, the number of map tasks equals the number of
>>>>>>> map files. E.g. 5 map files -> 5 map tasks.
>>>>>>>
>>>>>>> Therefore, the map phase is slow :(
>>>>>>>
>>>>>>> Why is the map phase slow when the number of map tasks is large
>>>>>>> and equal to the number of files?
>>>>>>>
>>>>>>> P/S: I run jobs on 3 nodes: 1 master and 2 slaves.
>>>>>>>
>>>>>>> Please help me!
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Nguyen.
>>>>>
>>>>> Currently, I use TextInputFormat as the InputFormat for the map phase.
>>>>
>>>> Thanks for your help!
>>>
>>> I use FileInputFormat to add input paths. Something like:
>>>
>>>   FileInputFormat.setInputPath(new Path("dir"));
>>>
>>> The "dir" is a directory containing the input files.
>>>
>>> Best,
>>> Nguyen
>
> Thanks!
>
> I am using Hadoop version 0.18.2.
>
> Cheers,
> Nguyen.

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
