For the approach that I outlined, you need to subclass the file input format.
In that subclass, you store the list of URIs (in a new variable) and override the "createInputSplits()" method.

Stephan

On Tue, Jul 14, 2015 at 6:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:

> Hi Stephan, I started working on this today, but I am having a problem.
>
> Can you be a little more detailed about the procedure?
> Actually, I don’t understand how to give the input format the list of
> URIs, since it will try to put it in a Path variable.
>
> createInputSplits() does not receive the path; it takes the path from
> that variable.
>
> Thanks,
> Michele
>
> On 26 Jun 2015, at 12:28, Michele Bertoni
> <michele1.bert...@mail.polimi.it> wrote:
>
> Right!
> Later I will post the question and quote your answer with the solution :)
>
> On 26 Jun 2015, at 12:27, Stephan Ewen <se...@apache.org> wrote:
>
> Seems like a good idea to collect these questions.
>
> Stackoverflow is also a good place for "useful tricks"...
>
> On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni
> <michele1.bert...@mail.polimi.it> wrote:
>
>> Got it!
>> I will try, thanks! :)
>>
>> What about writing a section about it in the programming guide?
>> I found a couple of topics about the readers in the mailing list, so it
>> seems it may be helpful.
>>
>> On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org> wrote:
>>
>> Sure, just override the "createInputSplits()" method. Call
>> "super.createInputSplits()" for each of your file paths and then combine
>> the results into one array that you return.
>>
>> That should do it...
>>
>> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni
>> <michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi Stephan, thanks for answering,
>>> right now I am using an extension of the DelimitedInputFormat. Is there
>>> a way to combine it with option 2?
>>>
>>> On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> There are two ways you can realize that:
>>>
>>> 1) Create multiple sources and union them. This is easy, but probably
>>> a bit less efficient.
>>>
>>> 2) Override the FileInputFormat's createInputSplits method to take a
>>> union of the paths to create a list of all files and file splits that
>>> will be read.
>>>
>>> Stephan
>>>
>>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni
>>> <michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Hi everybody,
>>>> is there a way to specify a list of URIs ("hdfs://file1","hdfs://file2",…)
>>>> and open them as different files?
>>>> I know I can open the entire directory, but I want to be able to select
>>>> a subset of the files in the directory.
>>>>
>>>> Thanks
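
For anyone finding this thread later, here is a rough sketch of the pattern discussed above: keep the URIs in a new field of the subclass, and in the overridden createInputSplits() point the inherited path variable at each URI in turn, call super.createInputSplits(), and combine the results into one array. To keep it runnable without any dependencies, the classes below (Split, SinglePathFormat, MultiPathFormat, MultiPathDemo) are hypothetical stand-ins, not Flink classes. In Flink the corresponding pieces would presumably be FileInputFormat (or your DelimitedInputFormat extension) and its file split type, and the exact signatures should be checked against the version you use. Renumbering the combined splits is also an assumption on my part, made so that split numbers stay unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input split: one unit of work for one reader task.
final class Split {
    final int number;
    final String path;

    Split(int number, String path) {
        this.number = number;
        this.path = path;
    }
}

// Hypothetical stand-in for the base file input format. Like the format
// described in the thread, it takes its path from a field, so
// createInputSplits() does not receive a path argument.
class SinglePathFormat {
    protected String filePath;

    void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    Split[] createInputSplits(int minNumSplits) {
        // A real format would list files and cut them into several splits;
        // this stand-in simply produces one split for the current path.
        return new Split[] { new Split(0, filePath) };
    }
}

// The subclass from the thread: it stores the list of URIs in a new field and
// overrides createInputSplits().
class MultiPathFormat extends SinglePathFormat {
    private final List<String> uris;

    MultiPathFormat(List<String> uris) {
        this.uris = new ArrayList<>(uris);
    }

    @Override
    Split[] createInputSplits(int minNumSplits) {
        List<Split> all = new ArrayList<>();
        for (String uri : uris) {
            // Point the inherited path variable at the current URI, then let
            // the parent compute the splits for that single path.
            setFilePath(uri);
            for (Split s : super.createInputSplits(minNumSplits)) {
                // Renumber so split numbers are unique in the combined array.
                all.add(new Split(all.size(), s.path));
            }
        }
        return all.toArray(new Split[0]);
    }
}

class MultiPathDemo {
    public static void main(String[] args) {
        MultiPathFormat format =
                new MultiPathFormat(Arrays.asList("hdfs://file1", "hdfs://file2"));
        for (Split s : format.createInputSplits(1)) {
            System.out.println(s.number + " " + s.path);
        }
    }
}
```

The point that answers the question about the Path variable is the setFilePath(uri) call inside the loop: the parent still reads its path from the field, and the subclass changes that field before each super call.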